Fine-tune and deploy a Wav2Vec2 model for speech recognition with Hugging Face and Amazon SageMaker

Automatic speech recognition (ASR) is a machine learning (ML) technology commonly used in our daily lives and business scenarios. Applications such as voice-controlled assistants like Alexa and Siri, and voice-to-text applications like automatic video subtitling and meeting transcription, are all powered by this technology. These applications take audio clips as input and convert speech signals to text, and are also referred to as speech-to-text applications.

This technology has matured in recent years, and many of the latest models achieve very good performance, such as the transformer-based models Wav2Vec2 and Speech2Text. The transformer is a sequence-to-sequence deep learning architecture originally proposed for machine translation; it has since been extended to solve all kinds of natural language processing (NLP) tasks, such as text classification, text summarization, and ASR. The transformer architecture yields very good model performance across various NLP tasks; however, the models’ sizes (the number of parameters) and the amount of data they’re pre-trained on grow rapidly in the pursuit of better performance. Training a transformer from scratch is therefore very time-consuming and costly; for example, training a BERT model from scratch could take 4 days and cost $6,912 (for more information, see The Staggering Cost of Training SOTA AI Models). Hugging Face, an AI company, provides an open-source platform where developers can share and reuse thousands of pre-trained transformer models. With the transfer learning technique, you can fine-tune such a model with a small set of labeled data for a target use case. This reduces the overall compute cost, speeds up the development lifecycle, and lessens the carbon footprint of the community.

AWS announced a collaboration with Hugging Face in 2021. Developers can easily work with Hugging Face models on Amazon SageMaker and benefit from the best of both worlds: you can fine-tune and optimize any model from Hugging Face, while SageMaker provides managed training and inference services with high-performance resources and high scalability via the Amazon SageMaker distributed training libraries. This collaboration can help you accelerate your NLP tasks’ journey to production and realize business benefits.

This post shows how to use SageMaker to easily fine-tune the latest Wav2Vec2 model from Hugging Face, and then deploy the model with a custom-defined inference process to a SageMaker managed inference endpoint. Finally, you can test the model performance with sample audio clips, and review the corresponding transcription as output.

Wav2Vec2 background

Wav2Vec2 is a transformer-based architecture for ASR tasks that was released in September 2020. The following diagram shows its simplified architecture. For more details, see the original paper. As the diagram shows, the model is composed of a multi-layer convolutional neural network (CNN) that acts as a feature extractor: it takes an input audio signal and outputs audio representations, which serve as features. These features are fed into a transformer network to generate contextualized representations. This part of training can be self-supervised; the transformer can be trained on unlabeled speech and learn from it. The model is then fine-tuned on labeled data with the Connectionist Temporal Classification (CTC) algorithm for specific ASR tasks. The base model we use in this post is Wav2Vec2-Base-960h, fine-tuned on 960 hours of Librispeech 16 kHz sampled speech audio.

CTC is a character-based algorithm. During training, it’s able to demarcate each character of the transcription in the speech automatically, so timeframe alignment between the audio signal and the transcription isn’t required. For example, if the audio clip says “Hello World,” we don’t need to know in which second the word “hello” is located. This saves a lot of labeling effort for ASR use cases. For more information about how the algorithm works, refer to Sequence Modeling With CTC.
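To make the decoding step concrete, the following illustrative snippet (our own example, not code from the model or this post’s repository) shows how a sequence of per-frame character predictions collapses into a transcription: consecutive duplicate tokens are merged first, then blank tokens are removed.

# Illustrative only: collapse raw per-frame CTC predictions into text.
def ctc_collapse(frame_tokens, blank="[PAD]"):
    collapsed = []
    previous = None
    for token in frame_tokens:
        if token != previous:  # merge consecutive duplicate predictions
            collapsed.append(token)
        previous = token
    # drop the blank token after merging duplicates
    return "".join(t for t in collapsed if t != blank)

# The blank between the two L's is what lets CTC emit repeated letters:
print(ctc_collapse(["H", "H", "E", "[PAD]", "L", "L", "[PAD]", "L", "O"]))
# -> HELLO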

Solution overview

In this post, we use the SUPERB (Speech processing Universal PERformance Benchmark) dataset available from the Hugging Face Datasets library, and fine-tune the Wav2Vec2 model and deploy it as a SageMaker endpoint for real-time inference for an ASR task. SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks.

The following diagram provides a high-level view of the solution workflow.

First, we show how to load and preprocess the SUPERB dataset in a SageMaker environment in order to obtain a tokenizer and feature extractor, which are required for fine-tuning the Wav2Vec2 model. Then we use SageMaker Script Mode for training and inference steps, which allows you to define and use custom training and inference scripts, and SageMaker provides supported Hugging Face framework Docker containers. For more information about training and serving Hugging Face models on SageMaker, see Use Hugging Face with Amazon SageMaker. This functionality is available through the development of Hugging Face AWS Deep Learning Containers (DLCs).

The notebook and code from this post are available on GitHub. The notebook is tested in both Amazon SageMaker Studio and SageMaker notebook environments.

Data preprocessing

In this section, we walk through the steps to preprocess the data.

Process the dataset

In this post, we use the SUPERB dataset, which you can load from the Hugging Face Datasets library directly using the load_dataset function. The SUPERB dataset also includes speaker_id and chapter_id columns; we remove these and keep only the audio files and transcriptions to fine-tune the Wav2Vec2 model for an ASR task, which transcribes speech to text. To speed up the fine-tuning process for this example, we take only the test split from the original dataset, then split it into train and test datasets. See the following code:

from datasets import load_dataset, DatasetDict

data = load_dataset("superb", 'asr', ignore_verifications=True)
data = data.remove_columns(['speaker_id', 'chapter_id', 'id'])
# reduce the data volume for this example: only take the test split
# from the original dataset for fine-tuning
data = data['test']

# hold out 20% of the examples for evaluation
train_test = data.train_test_split(test_size=0.2)
dataset = DatasetDict({
    'train': train_test['train'],
    'test': train_test['test']})

After we process the data, the dataset structure is as follows:

DatasetDict({
    train: Dataset({
        features: ['file', 'audio', 'text'],
        num_rows: 2096
    })
    test: Dataset({
        features: ['file', 'audio', 'text'],
        num_rows: 524
    })
})

Let’s print one data point from the train dataset and examine the information in each feature. ‘file’ is the path where the audio file is saved and cached in the local repository. ‘audio’ contains three components: ‘path’ is the same as ‘file’, ‘array’ is the numerical representation of the raw waveform of the audio file in NumPy array format, and ‘sampling_rate’ is the number of audio samples recorded per second. ‘text’ is the transcript of the audio file.

print(dataset['train'][0])
result: 
{'file': '/root/.cache/huggingface/datasets/downloads/extracted/e0f3d50e856945385982ba36b58615b72eef9b2ba5a2565bdcc225b70f495eed/LibriSpeech/test-clean/7021/85628/7021-85628-0000.flac',
 'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/e0f3d50e856945385982ba36b58615b72eef9b2ba5a2565bdcc225b70f495eed/LibriSpeech/test-clean/7021/85628/7021-85628-0000.flac',
  'array': array([-0.00018311, -0.00024414, -0.00018311, ...,  0.00061035,
          0.00064087,  0.00061035], dtype=float32),
  'sampling_rate': 16000},
 'text': 'but anders cared nothing about that'}

Build a vocabulary file

The Wav2Vec2 model uses the CTC algorithm to train deep neural networks on sequence problems, and its output is a single character or a blank token. It uses a character-based tokenizer. Therefore, we extract the distinct characters from the dataset and build the vocabulary file using the following code:

import json

def extract_characters(batch):
    texts = " ".join(batch["text"])
    vocab = list(set(texts))
    return {"vocab": [vocab], "texts": [texts]}

vocabs = dataset.map(extract_characters, batched=True, batch_size=-1,
                     keep_in_memory=True, remove_columns=dataset.column_names["train"])

# merge the character sets found in the train and test splits
vocab_list = list(set(vocabs["train"]["vocab"][0]) | set(vocabs["test"]["vocab"][0]))
vocab_dict = {v: k for k, v in enumerate(vocab_list)}
# make the word delimiter visible by replacing the space character with "|"
vocab_dict["|"] = vocab_dict[" "]
del vocab_dict[" "]

vocab_dict["[UNK]"] = len(vocab_dict) # add "unknown" token
vocab_dict["[PAD]"] = len(vocab_dict) # add a padding token that corresponds to CTC's "blank token"

with open('vocab.json', 'w') as vocab_file:
    json.dump(vocab_dict, vocab_file)

Create a tokenizer and feature extractor

The Wav2Vec2 model uses a tokenizer and a feature extractor. In this step, we use the vocab.json file that we created in the previous step to create the Wav2Vec2CTCTokenizer. We use Wav2Vec2FeatureExtractor to make sure that the dataset used for fine-tuning has the same audio sampling rate as the dataset used for pre-training. Finally, we create a Wav2Vec2 processor that wraps the feature extractor and the tokenizer into a single processor. See the following code:

from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                          Wav2Vec2Processor)

# create Wav2Vec2 tokenizer
tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                  pad_token="[PAD]", word_delimiter_token="|")

# create Wav2Vec2 feature extractor
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000,
                                             padding_value=0.0, do_normalize=True, return_attention_mask=False)
# create a processor pipeline
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

Prepare the train and test datasets

Next, we extract the array representation of the audio files and their sampling rates from the dataset, and process them using the processor, so that the resulting train and test data can be consumed by the model:

# extract the numerical representation from the dataset
def extract_array_samplingrate(batch):
    batch["speech"] = batch['audio']['array'].tolist()
    batch["sampling_rate"] = batch['audio']['sampling_rate']
    batch["target_text"] = batch["text"]
    return batch

dataset = dataset.map(extract_array_samplingrate, 
                      remove_columns=dataset.column_names["train"])

# process the dataset with the processor pipeline we created above
def process_dataset(batch):  
    batch["input_values"] = processor(batch["speech"], 
                            sampling_rate=batch["sampling_rate"][0]).input_values

    with processor.as_target_processor():
        batch["labels"] = processor(batch["target_text"]).input_ids
    return batch

data_processed = dataset.map(process_dataset, 
                    remove_columns=dataset.column_names["train"], batch_size=8, 
                    batched=True)

train_dataset = data_processed['train']
test_dataset = data_processed['test']

Then we upload the train and test data to Amazon Simple Storage Service (Amazon S3) using the following code:

from datasets.filesystems import S3FileSystem
s3 = S3FileSystem()

# save train_dataset to s3
training_input_path = f's3://{BUCKET}/{PREFIX}/train'
train_dataset.save_to_disk(training_input_path,fs=s3)

# save test_dataset to s3
test_input_path = f's3://{BUCKET}/{PREFIX}/test'
test_dataset.save_to_disk(test_input_path,fs=s3)

Fine-tune the Hugging Face model (Wav2Vec2)

We use SageMaker script mode with the Hugging Face DLCs to construct the training and inference jobs, which allows you to write custom training and serving code and use the Hugging Face framework containers that are maintained and supported by AWS.

When we create a training job using script mode, the entry_point script, hyperparameters, its dependencies (inside requirements.txt), and input data (the train and test datasets) are copied into the container. SageMaker then invokes the entry_point training script, which loads the train and test datasets, performs the training steps, and saves the model artifacts in /opt/ml/model in the container. After training, artifacts in this directory are uploaded to Amazon S3 for later model hosting.

You can inspect the training script in the GitHub repo, in the scripts/ directory.
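As a rough orientation, such an entry-point script typically has the following shape. This is a simplified sketch based on SageMaker script mode conventions, not the repository’s full train.py:

# Simplified sketch of a SageMaker script mode entry point; the real
# train.py in the repo implements the full Wav2Vec2 fine-tuning logic.
import argparse
import os

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # hyperparameters passed to the estimator arrive as CLI arguments
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--train_batch_size", type=int, default=8)
    parser.add_argument("--model_name", type=str)
    parser.add_argument("--vocab_url", type=str)
    # SageMaker sets these environment variables inside the container
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train_dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--test_dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TEST"))
    args, _ = parser.parse_known_args()

    # 1. Load the datasets saved to the train and test channels.
    # 2. Build the model, data collator, and Trainer.
    # 3. Train, then save artifacts to args.model_dir so SageMaker
    #    uploads them to Amazon S3 when the job completes.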

Create an estimator and start a training job

We use the Hugging Face estimator class to train our model. When creating the estimator, you need to specify the following parameters:

  • entry_point – The name of the training script. It loads data from the input channels, configures training with hyperparameters, trains a model, and saves the model.
  • source_dir – The location of the training scripts.
  • transformers_version – The Hugging Face Transformers library version we want to use.
  • pytorch_version – The PyTorch version that’s compatible with the Transformers library.

For this use case and dataset, we use one ml.p3.2xlarge instance, and the training job finishes in around 2 hours. You can select a more powerful instance with more memory and GPUs to reduce the training time; however, this incurs more cost.

When you create a Hugging Face estimator, you can configure hyperparameters and pass custom parameters to the training script, such as vocab_url in this example. You can also specify metric definitions in the estimator; SageMaker parses these metrics from the training logs and sends them to Amazon CloudWatch so you can monitor and track training performance. For more details, see Monitor and Analyze Training Jobs Using Amazon CloudWatch Metrics.

import time
from sagemaker.huggingface import HuggingFace

# create a unique ID to tag the training job, model name, and endpoint name
id = int(time.time())

TRAINING_JOB_NAME = f"huggingface-wav2vec2-training-{id}"
vocab_url = f"s3://{BUCKET}/{PREFIX}/vocab.json"

hyperparameters = {'epochs':10, # you can increase the epoch number to improve model accuracy
                   'train_batch_size': 8,
                   'model_name': "facebook/wav2vec2-base",
                   'vocab_url': vocab_url
                  }
                  
# define metrics definitions
metric_definitions=[
        {'Name': 'eval_loss', 'Regex': "'eval_loss': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'eval_wer', 'Regex': "'eval_wer': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'eval_runtime', 'Regex': "'eval_runtime': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'eval_samples_per_second', 'Regex': "'eval_samples_per_second': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'epoch', 'Regex': "'epoch': ([0-9]+(.|e-)[0-9]+),?"}]

OUTPUT_PATH= f's3://{BUCKET}/{PREFIX}/{TRAINING_JOB_NAME}/output/'

huggingface_estimator = HuggingFace(entry_point='train.py',
                                    source_dir='./scripts',
                                    output_path= OUTPUT_PATH, 
                                    instance_type='ml.p3.2xlarge',
                                    instance_count=1,
                                    transformers_version='4.6.1',
                                    pytorch_version='1.7.1',
                                    py_version='py36',
                                    role=ROLE,
                                    hyperparameters = hyperparameters,
                                    metric_definitions = metric_definitions,
                                   )

# Start the training job using the fit function; training takes approximately 2 hours to complete.
huggingface_estimator.fit({'train': training_input_path, 'test': test_input_path},
                          job_name=TRAINING_JOB_NAME)

In the following figure of CloudWatch training job logs, you can see that after 10 epochs of training, the model’s evaluation word error rate (WER) reaches around 0.17 on the subset of the SUPERB dataset. WER is a commonly used metric to evaluate speech recognition model performance, and the objective is to minimize it. You can increase the number of epochs or use the full SUPERB dataset to improve the model further.
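WER is computed from the word-level edit distance between the reference transcript and the model’s hypothesis: WER = (substitutions + deletions + insertions) / number of reference words. The following standalone example (our own illustration, independent of the training script’s metric implementation) shows the computation:

# Illustrative word error rate (WER) computation: the word-level
# Levenshtein distance normalized by the reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("but anders cared nothing about that",
          "but anders cared nothing about it"))  # 1 error / 6 words = ~0.167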

Deploy the model as an endpoint on SageMaker and run inference

In this section, we walk through the steps to deploy the model and perform inference.

Inference script

We use the SageMaker Hugging Face Inference Toolkit to host our fine-tuned model. It provides default functions for preprocessing, predicting, and postprocessing for certain tasks. However, the default capabilities can’t serve our model properly. Therefore, we define the custom functions model_fn(), input_fn(), predict_fn(), and output_fn() in the inference.py script to override the default settings with our custom requirements. For more details, refer to the GitHub repo.

As of January 2022, the Inference Toolkit can handle inference for models whose architectures end with 'TapasForQuestionAnswering', 'ForQuestionAnswering', 'ForTokenClassification', 'ForSequenceClassification', 'ForMultipleChoice', 'ForMaskedLM', 'ForCausalLM', 'ForConditionalGeneration', 'MTModel', 'EncoderDecoderModel', 'GPT2LMHeadModel', and 'T5WithLMHeadModel'. The Wav2Vec2 model is not currently supported, which is why we provide a custom inference script.

You can inspect the full inference script in the GitHub repo, in the scripts/ directory.
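As a rough orientation, the four handlers divide the work as sketched below. This is a simplified version written against the request format used later in this post; the repository contains the complete, tested script.

# Simplified sketch of the custom handlers in inference.py
# (see the GitHub repo for the complete script).
import json

import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

def model_fn(model_dir):
    # load the fine-tuned artifacts that SageMaker unpacked from Amazon S3
    processor = Wav2Vec2Processor.from_pretrained(model_dir)
    model = Wav2Vec2ForCTC.from_pretrained(model_dir)
    return model, processor

def input_fn(request_body, request_content_type):
    # the client sends {"speech_array": [...], "sampling_rate": 16000}
    return json.loads(request_body)

def predict_fn(data, model_and_processor):
    model, processor = model_and_processor
    inputs = processor(data["speech_array"],
                       sampling_rate=data["sampling_rate"],
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]

def output_fn(prediction, response_content_type):
    return json.dumps(prediction)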

Create a Hugging Face model from the estimator

We use the Hugging Face Model class to create a model object, which you can deploy to a SageMaker endpoint. When creating the model, specify the following parameters:

  • entry_point – The name of the inference script. The methods defined in the inference script are implemented to the endpoint.
  • source_dir – The location of the inference scripts.
  • transformers_version – The Hugging Face Transformers library version we want to use. It should be consistent with the training step.
  • pytorch_version – The PyTorch version that is compatible with the Transformers library. It should be consistent with the training step.
  • model_data – The Amazon S3 location of a SageMaker model data .tar.gz file.

from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
        entry_point = 'inference.py',
        source_dir='./scripts',
        name = f'huggingface-wav2vec2-model-{id}',
        transformers_version='4.6.1', 
        pytorch_version='1.7.1', 
        py_version='py36',
        model_data=huggingface_estimator.model_data,
        role=ROLE,
    )

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge", 
    endpoint_name = f'huggingface-wav2vec2-endpoint-{id}'
)

When you create a predictor by using the model.deploy function, you can change the instance count and instance type based on your performance requirements.

Inference audio files

After you deploy the endpoint, you can run prediction tests to check the model performance. You can download an audio file from the S3 bucket by using the following code:

import boto3
s3 = boto3.client('s3')
s3.download_file(BUCKET, 'huggingface-blog/sample_audio/xxx.wav', 'downloaded.wav')
file_name ='downloaded.wav'

Alternatively, you can download a sample audio file to run the inference request:

import soundfile
!wget https://datashare.ed.ac.uk/bitstream/handle/10283/343/MKH800_19_0001.wav
file_name ='MKH800_19_0001.wav'
speech_array, sampling_rate = soundfile.read(file_name)
json_request_data = {"speech_array": speech_array.tolist(),
                     "sampling_rate": sampling_rate}

prediction = predictor.predict(json_request_data)
print(prediction)

The predicted result is as follows:

['"she had your dark suit in grecy wash water all year"', 'application/json']

Clean up

When you’re finished using the solution, delete the SageMaker endpoint to avoid ongoing charges:

predictor.delete_endpoint()

Conclusion

In this post, we showed how to fine-tune the pre-trained Wav2Vec2 model on SageMaker using a Hugging Face estimator, and also how to host the model on SageMaker as a real-time inference endpoint using the SageMaker Hugging Face Inference Toolkit. For both the training and inference steps, we provided custom-defined scripts for greater flexibility, which are enabled and supported by the SageMaker Hugging Face DLCs. You can use the method from this post to fine-tune a Wav2Vec2 model with your own datasets, or to fine-tune and deploy a different transformer model from Hugging Face.

Check out the notebook and code of this project from GitHub, and let us know your comments. For more comprehensive information, see Hugging Face on SageMaker and Use Hugging Face with Amazon SageMaker.

In addition, Hugging Face and AWS announced a partnership in 2022 that makes it even easier to train Hugging Face models on SageMaker. This functionality is available through the development of Hugging Face AWS DLCs. These containers include the Hugging Face Transformers, Tokenizers, and Datasets libraries, which allow us to use these resources for training and inference jobs. For a list of the available DLC images, see Available Deep Learning Containers Images. They are maintained and regularly updated with security patches. You can find many examples of how to train Hugging Face models with these DLCs and the Hugging Face Python SDK in the following GitHub repo.


About the Author

Ying Hou, PhD, is a Machine Learning Prototyping Architect at AWS. Her main areas of interests are deep learning, computer vision, NLP, and time series data prediction. In her spare time, she enjoys reading novels and hiking in national parks in the UK.


Build a virtual credit approval agent with Amazon Lex, Amazon Textract, and Amazon Connect

Banking and financial institutions review thousands of credit applications per week. The credit approval process requires financial organizations to invest time and resources in reviewing documents like W2s, bank statements, and utility bills. The overall experience can be costly for the organization. At the same time, organizations have to consider borrowers, who are waiting for decisions on their credit applications. To retain customers, organizations need to process borrower applications quickly with low turnaround times.

With an automated credit approval assistant using machine learning, financial organizations can expedite the process, reduce cost, and provide a better customer experience with faster decisions. Banks and fintechs can build a virtual agent that can review a customer’s financial documents and provide a decision instantly. Building an effective credit approval process not only improves the customer experience, but also lowers the cost.

In this post, we show how to build a virtual credit approval assistant that reviews the financial documents required for loan approval and makes decisions instantly for a seamless customer experience. The solution uses Amazon Lex, Amazon Textract, and Amazon Connect, among other AWS services.

Overview of the solution

You can deploy the solution using an AWS CloudFormation template. The solution creates a virtual agent using Amazon Lex and associates it with Amazon Connect, which acts as the conversational interface with customers and asks the loan applicant to upload the necessary documents. The documents are stored in an Amazon Simple Storage Service (Amazon S3) bucket used only for that customer.

This solution is completely serverless and uses Amazon S3 to store a static website that hosts the front end and custom JavaScript to enable the rest of the requests. Amazon CloudFront serves as a content delivery network (CDN) to allow a public front end for the website. CloudFront is a fast CDN service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds, all within a developer-friendly environment.

This is a sample project designed to be easily deployable for experimentation. The AWS Identity and Access Management (IAM) policy permissions in this solution use least privilege; however, the CloudFront and Amazon API Gateway resources deployed are publicly accessible. To take the appropriate measures to secure your CloudFront distribution and API Gateway resources, refer to Configuring secure access and restricting access to content and Security in Amazon API Gateway, respectively.

Additionally, the backend features API Gateway with HTTP routes for two AWS Lambda functions. The first function creates the session with Amazon Connect for chat; the second passes the pre-signed URL link fetched by the front end from Amazon Connect to Amazon Lex. Amazon Lex triggers the Lambda function associated with it and lets Amazon Textract read the documents and capture all the fields and information in them. This function also makes the credit decisions based on business processes previously defined by the organization. The solution is integrated with Amazon Connect to let customers connect to contact center agents if the customer is having difficulty or needs help through the process.
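The first of these Lambda functions reduces to a single Amazon Connect API call. The following is a hedged sketch, not the solution’s exact code; the instance and contact flow IDs shown are the placeholder values used later in this post:

import boto3

connect = boto3.client('connect')

def lambda_handler(event, context):
    """Create a chat session with Amazon Connect for the web client."""
    response = connect.start_chat_contact(
        InstanceId='11111111-1111-1111-1111-111111111111',     # placeholder
        ContactFlowId='22222222-2222-2222-2222-222222222222',  # placeholder
        ParticipantDetails={'DisplayName': event.get('name', 'Customer')},
    )
    # the front end needs these tokens to join the chat session
    return {
        'ContactId': response['ContactId'],
        'ParticipantId': response['ParticipantId'],
        'ParticipantToken': response['ParticipantToken'],
    }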

The following example depicts the interaction between bot and borrower.

The following diagram illustrates the solution architecture.

The solution workflow is as follows:

  1. Customers navigate to a URL served by CloudFront, which fetches webpages from an S3 bucket and sends JavaScript to the web browser.
  2. The web browser renders the webpages and makes an API call to API Gateway.
  3. API Gateway triggers the associated Lambda function.
  4. The function initiates a startChatContact API call with Amazon Connect and triggers the contact flow associated with it.
  5. Amazon Connect triggers Amazon Lex with the utterance to classify the intent. After the intent is classified, Amazon Lex elicits the required slots and asks the customer to upload the document to fulfill the intent.
  6. The applicant uploads the W2 document to the S3 bucket using the upload attachment icon in the chat window.

As a best practice, consider implementing encryption at rest for the S3 bucket using AWS Key Management Service (AWS KMS). Additionally, you can attach a bucket policy to the S3 bucket to ensure data is always encrypted in transit. Consider enabling server access logging for the S3 bucket to capture detailed records of requests to assist with security and access audits. For more information, see Security Best Practices for Amazon S3.

  7. The web browser makes a call to Amazon Connect to retrieve a pre-signed URL of the uploaded image. Make sure the pre-signed URLs expire a few minutes after the Lambda function runs the logic.
  8. After the document has been uploaded successfully, the web application makes an API call to API Gateway to update the file location for use in Amazon Lex session attributes.
  9. API Gateway triggers a Lambda function to pass the W2 pre-signed URL location. The function updates the session attributes in Amazon Lex with the pre-signed URL of the W2 document.
  10. The web browser also updates the slot to uploaded, which fulfills the intent.
  11. Amazon Lex triggers a Lambda function, which downloads the W2 image data and sends it to Amazon Textract for processing.
  12. Amazon Textract reads all the fields from the W2 image document, converts them into key-value pairs, and passes the data back to the Lambda function (a sketch of this step follows this list).
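The key-value extraction in the last step uses Amazon Textract’s forms analysis. The following is a hedged sketch of that pattern (not the solution’s exact Lambda code): analyze_document returns a list of blocks, and the KEY_VALUE_SET blocks are walked to pair each key with its value text.

import boto3

textract = boto3.client('textract')

def extract_key_values(image_bytes):
    """Return the form fields Textract finds in a document image."""
    response = textract.analyze_document(
        Document={'Bytes': image_bytes},
        FeatureTypes=['FORMS'],
    )
    blocks = {block['Id']: block for block in response['Blocks']}

    def text_of(block):
        # concatenate the WORD children of a KEY or VALUE block
        words = []
        for rel in block.get('Relationships', []):
            if rel['Type'] == 'CHILD':
                for child_id in rel['Ids']:
                    child = blocks[child_id]
                    if child['BlockType'] == 'WORD':
                        words.append(child['Text'])
        return ' '.join(words)

    pairs = {}
    for block in response['Blocks']:
        if (block['BlockType'] == 'KEY_VALUE_SET'
                and 'KEY' in block.get('EntityTypes', [])):
            for rel in block.get('Relationships', []):
                if rel['Type'] == 'VALUE':
                    for value_id in rel['Ids']:
                        pairs[text_of(block)] = text_of(blocks[value_id])
    return pairs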

Amazon Textract conforms to the AWS shared responsibility model, which outlines the responsibilities for data protection between AWS and the customer. For more information, refer to Data Protection in Amazon Textract.

  13. Lambda uses the W2 data to evaluate the loan application and returns the result to the web browser.

Follow the best practices for enabling logging in Lambda. Refer to part 1 and part 2 of the blog series “Operating Lambda: Building a solid security foundation.”

Data in transit is secured using TLS, and it’s highly recommended to encrypt data at rest. For more information about protecting data inside your S3 bucket, refer to Strengthen the security of sensitive data stored in Amazon S3 by using additional AWS services.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  1. An AWS account.
  2. An Amazon Connect contact center instance in the us-east-1 Region. You can use an existing one or create a new one. For instructions, refer to Get started with Amazon Connect. If you have an existing Amazon Connect instance and chat isn’t enabled, refer to Enabling Chat in an Existing Amazon Connect Contact Center.
  3. Chat attachments enabled in Amazon Connect. For instructions, refer to Enable attachments to share files using chat. For the CORS setup, use option 2, which uses the * wildcard for AllowedOrigin.
  4. The example project located in the GitHub repository. You need to clone this repository on your local machine and use AWS Serverless Application Model (AWS SAM) to deploy the project. To install the AWS SAM CLI and configure AWS credentials, refer to Getting started with AWS SAM.
  5. Python 3.9 runtime to support the AWS SAM deployment.

Import the Amazon Connect flow

To import the Amazon Connect flow, complete the following steps:

  1. Log in to your Amazon Connect instance.
  2. Under Routing, choose Contact Flows.
  3. Choose Create contact flow.
  4. On the Save menu, choose Import flow.
  5. Choose Select and choose the import flow file located in the /flow subdirectory, called Loan_App_Connect_Flow.
  6. Save the flow. Do not publish yet.
  7. Expand Show additional flow information and choose the copy icon to capture the ARN.
  8. Save these IDs for use as parameters in the CloudFormation template to be deployed in the next step:
    arn:aws:connect:us-east-1:123456789012:instance/11111111-1111-1111-1111-111111111111/contact-flow/22222222-2222-2222-2222-222222222222

The Amazon Connect instance ID is the long alphanumeric value between the slashes immediately following instance in the ARN. For this post, the instance ID is 11111111-1111-1111-1111-111111111111.

The contact flow ID is the long value after the slash following contact-flow in the ARN. For this post, the flow ID is 22222222-2222-2222-2222-222222222222.

Deploy with AWS SAM

With the instance and flow IDs captured, we’re ready to deploy the project.

  1. Open a terminal window and clone the GitHub repository in a directory of your choice.
  2. Navigate to the amazon-connect-virtual-credit-agent directory and follow the deployment instructions in the GitHub repo.
  3. Record the Amazon Lex bot name from the Outputs section of the deployment for the next steps (called Loan_App_Bot if you accepted the default name).
  4. Return to these instructions once the AWS SAM deploy completes successfully.

Update the contact flow blocks

To update the contact flow blocks, complete the following steps:

  1. Log in to your Amazon Connect instance
  2. Under Routing, choose Contact Flows.
  3. Choose the flow named Loan_App_Flow.
  4. Choose the Get customer input block.
  5. Under the Amazon Lex section, choose the bot named Loan_App_Bot and the dev alias created earlier.
  6. Choose Save.
  7. Choose the Set working queue block.
  8. Choose the X icon and on the drop-down menu, choose BasicQueue.
  9. Choose Save.
  10. Save the flow.
  11. Publish the flow.

Test the solution

You’re now ready to test the solution.

  1. Log in to your Amazon Connect instance to set up an Amazon Connect agent for chat.
  2. On the dashboard, choose the phone icon to open the Contact Control Panel (CCP) in a separate window.
  3. In the CCP, change the agent state to Available.
  4. On the Outputs tab for your CloudFormation stack, choose the value for cloudFrontDistribution.

This is a link to your CloudFront URL. You’re redirected to a webpage with your loan services bot. A floating action button (FAB) is on the bottom right of the screen.

  5. Choose the FAB to open the chat bot.
  6. After you get the welcome message, enter I need a loan.
  7. When prompted, choose a loan type and enter a loan amount.
  8. Upload an image of a W2 document.

A sample W2 image file is located in the project repository in the /img subdirectory. The file is called w2.png.

After the image is uploaded, the bot asks you if you want to submit the application.

  9. Choose Yes to submit.

After submission, the bot evaluates the W2 image and provides a response. After a few seconds, you’re connected to an agent.

You should see a request to connect with chat in the CCP.

  10. Choose the request to accept.

The agent is now connected to the chat user. You can simulate each side of the conversation to test the chat session.

  11. Choose End Chat when you’re done.

Troubleshooting

After you deploy the stack, if you see an Amazon S3 permission error when viewing the CloudFront URL, it means the domain isn’t ready yet. The CDN can take up to 1 hour to be ready.

If you can’t add your attachments, check your CORS setting. For instructions, refer to Enable attachments to share files using chat. For the CORS setup, use option 2, which uses the * wildcard for AllowedOrigin.

Clean up

To avoid incurring future charges, remove all resources created by deleting the CloudFormation stack.

Conclusion

In this post, we demonstrated how to quickly and securely set up a loan application processing solution. Data at rest and in transit are both encrypted and secured. This solution can act as a blueprint to build other self-service processing flows where Amazon Connect and Amazon Lex provide a conversational interface for customer engagement. We look forward to seeing what other solutions you build using this architecture.

Should you need assistance building these capabilities and Amazon Connect contact flows, please reach out to one of the dozens of Amazon Connect partners available worldwide.


About the Authors

Dipkumar Mehta is a Senior Conversational AI Consultant with the Amazon ProServe Natural Language AI team. He focuses on helping customers design, deploy, and scale end-to-end conversational AI solutions in production on AWS. He is also passionate about improving customer experience and driving business outcomes by leveraging data.

Cecil Patterson is a Natural Language AI consultant with AWS Professional services based in North Texas. He has many years of experience working with large enterprises to enable and support global infrastructure solutions. Cecil uses his experience and diverse skill set to build exceptional conversational solutions for customers of all types.

Sanju Sunny is a Digital Innovation Specialist with Amazon ProServe. He engages with customers in a variety of industries around Amazon’s distinctive customer-obsessed innovation mechanisms in order to rapidly conceive, validate and prototype new products, services and experiences.

Matt Kurio is a Security Transformation Consultant with the Amazon ProServe Shared Delivery Team.  He excels helping enterprise customers build secure platforms and manage security effectively and efficiently.  He also enjoys relaxing at the beach and outdoor activities with his family.


Control access to Amazon SageMaker Feature Store offline using AWS Lake Formation

You can establish feature stores to provide a central repository for machine learning (ML) features that can be shared with data science teams across your organization for training, batch scoring, and real-time inference. Data science teams can reuse features stored in the central repository, avoiding the need to reengineer feature pipelines for different projects and as a result eliminating rework and duplication.

To satisfy security and compliance needs, you may need granular control over how these shared ML features are accessed. These needs often go beyond table- and column-level access control to individual row-level access control. For example, you may want to let account representatives see rows from a sales table for only their accounts and mask the prefix of sensitive data like credit card numbers. Fine-grained access controls are needed to protect feature store data and grant access based on an individual’s role. This is specifically important for customers and stakeholders in industries that are required to audit access to feature data and ensure the right level of security is in place.

In this post, we provide an overview of how to implement granular access control to feature groups and features stored in an offline feature store using Amazon SageMaker Feature Store and AWS Lake Formation. If you’re new to Feature Store, you may want to refer to Understanding the key capabilities of Amazon SageMaker Feature Store for additional background before diving into the rest of this post. Note that for the online feature store, you can use AWS Identity and Access Management (IAM) policies with conditions to restrict user access against feature groups.

Solution overview

The following architecture uses Lake Formation to implement row-, column-, or cell-level access to limit which feature groups or features within a feature group can be accessed by a data scientist working in Amazon SageMaker Studio. Although we focus on restricting access to users working in Studio, the same approach is applicable for users accessing the offline feature store using services like Amazon Athena.

Feature Store is a purpose-built solution for ML feature management that helps data science teams reuse ML features across teams and models, serve features for model predictions at scale with low latency, and train and deploy new models more quickly and effectively.

Lake Formation is a fully managed service that helps you build, secure, and manage data lakes, and provide access control for data in the data lake. Lake Formation supports the following security levels:

  • Row-level permissions – Restricts access to specific rows based on data compliance and governance policies
  • Column-level permissions – Restricts access to specific columns based on data filters
  • Cell-level permissions – Combines row- and column-level controls by granting access to specific rows and columns of the database tables

Lake Formation also provides centralized auditing and compliance reporting by identifying which principals accessed what data, when, and through which services.

By combining Feature Store and Lake Formation, you can implement granular access to ML features on your existing offline feature store.

In this post, we provide an approach for use cases in which you have created feature groups in Feature Store and need to provide access to your data science teams for feature exploration and creating models for their projects. At a high level, a Lake Formation admin defines and creates a permission model in Lake Formation and assigns it to individual Studio users or groups of users.

We walk you through the following steps:

  1. Register the offline feature store in Lake Formation.
  2. Create the Lake Formation data filters for fine-grained access control.
  3. Grant feature groups (tables) and features (columns) permissions.

Prerequisites

To implement this solution, you need to create a Lake Formation admin user in IAM and sign in as that admin user. For instructions, refer to Create a Data Lake Administrator.

We begin by setting up test data using synthetic grocery orders for synthetically generated customer lists created with the Faker Python library. You can try it yourself by following the module on GitHub. For each customer, the notebook generates between 1 and 10 orders, with products purchased in each order. Then you can use the following notebook to create the three feature groups for the customers, products, and orders datasets in the feature store. Before creating the feature groups, make sure that your Studio environment is set up in your AWS account. For instructions, refer to Onboard to Amazon SageMaker Domain.

The goal is to illustrate how to use Feature Store to store the features and use Lake Formation to control access to these features. The following screenshot shows the definition of the orders feature group using the Studio console.

Feature Store uses an Amazon Simple Storage Service (Amazon S3) bucket in your account to store offline data. You can use query engines like Athena against the offline data store in Amazon S3 to extract training datasets or to analyze feature data, and you can join more than one feature group in a single query. Feature Store automatically builds the AWS Glue Data Catalog for feature groups during feature group creation, which allows you to use this catalog to access and query the data from the offline store using Athena or open-source tools like Presto.

Register the offline feature store in Lake Formation

To start using Lake Formation permissions with your existing Feature Store databases and tables, you must revoke the Super permission from the IAMAllowedPrincipals group on the database and the associated feature group tables in Lake Formation.

  1. Sign in to the AWS Management Console as a Lake Formation administrator.
  2. In the navigation pane, under Data Catalog, choose Databases.
  3. Select the database sagemaker_featurestore, which is the database associated to the offline feature store.

Because Feature Store automatically builds an AWS Glue Data Catalog when you create the feature groups, the offline feature store is visible as a database in Lake Formation.

  4. On the Actions menu, choose Edit.
  5. On the Edit database page, if you want Lake Formation permissions to work for newly created feature groups too and not have to revoke the IAMAllowedPrincipals for each table, deselect Use only IAM access control for new tables in this database, then choose Save.
  6. On the Databases page, select the sagemaker_featurestore database.
  7. On the Actions menu, choose View permissions.
  8. Select the IAMAllowedPrincipals group and choose Revoke.

Similarly, you need to perform these steps for all feature group tables that are associated with your offline feature store.

  9. In the navigation pane, under Data Catalog, choose Tables.
  10. Select the table with your feature group name.
  11. On the Actions menu, choose View permissions.
  12. Select the IAMAllowedPrincipals group and choose Revoke.

To switch the offline feature store to the Lake Formation permission model, you need to turn on Lake Formation permissions for the Amazon S3 location of the offline feature store. For this, you have to register the Amazon S3 location.

  13. In the navigation pane, under Register and Ingest, choose Data lake locations.
  14. Choose Register location.
  15. Select the location of the offline feature store in Amazon S3 for the Amazon S3 path.

The location is the S3Uri that was provided in the feature group’s offline store configuration and can be found in the DescribeFeatureGroup API’s ResolvedOutputS3Uri field.

  16. Select the default AWSServiceRoleForLakeFormationDataAccess IAM role and choose Register location.

Lake Formation integrates with AWS Key Management Service (AWS KMS); this approach also works with Amazon S3 locations that have been encrypted with an AWS managed key or with the recommended approach of a customer managed key. For further reading, refer to Registering an encrypted Amazon S3 location.

Create Lake Formation data filters for fine-grained access control

You can implement row-level and cell-level security by creating data filters. You select a data filter when you grant the SELECT Lake Formation permission on tables. In this case, we use this capability to implement a set of filters that limit access to feature groups and specific features within a feature group.

Let’s use the following figure to explain how data filters work. The figure shows two feature groups: customers and orders. A row-level data filter is applied to the customers feature group, so that only records where feature1 = ‘12’ are returned. Similarly, access to the orders feature group is restricted using a cell-level data filter to only feature records where feature2 = ‘22’, while also excluding feature1 from the resulting dataset.

To create a new data filter, in the navigation pane on the Lake Formation console, under Data Catalog, choose Data filters and then choose Create new filter.

When you select Access to all columns and provide a row filter expression, you’re establishing row-level security (row filtering) only. In this example, we create a filter that limits a data scientist’s access to only the records in the orders feature group where the feature customer_id = 'C7782'.

When you include or exclude specific columns and also provide a row filter expression, you’re establishing cell-level security (cell filtering). In this example, we create a filter that limits a data scientist’s access to certain features of a feature group (we exclude sex and is_married) and to the subset of records in the customers feature group where customer_id = 'C3126'.

The following screenshot shows the data filters created.
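You can also create the same filters programmatically. The following boto3 sketch creates the cell-level filter for the customers feature group; the account ID and table name are placeholders that you would replace with your own values:

import boto3

lf = boto3.client('lakeformation')

# Cell-level filter: restrict rows to one customer and hide the
# sex and is_married columns. Account ID and table name are placeholders.
lf.create_data_cells_filter(
    TableData={
        'TableCatalogId': '111122223333',                # your AWS account ID
        'DatabaseName': 'sagemaker_featurestore',
        'TableName': 'customers-feature-group-example',  # your feature group table
        'Name': 'customers-cell-filter',
        'RowFilter': {'FilterExpression': "customer_id = 'C3126'"},
        'ColumnWildcard': {'ExcludedColumnNames': ['sex', 'is_married']},
    }
)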

Grant feature groups (tables) and features (columns) permission

In this section, you grant the granular access control and permissions defined in Lake Formation to a SageMaker user by assigning the data filter to the SageMaker execution role associated with the user who originally created the feature groups. The SageMaker execution role is created as part of the SageMaker Studio domain setup and by default starts with AmazonSageMaker-ExecutionRole-*. You need to give this role permissions on the Lake Formation APIs (GetDataAccess, StartQueryPlanning, GetQueryState, GetWorkUnits, and GetWorkUnitResults) and AWS Glue APIs (GetTables and GetDatabases) in IAM so that it can access the data.

Create the following policy in IAM, name the policy LakeFormationDataAccess, and attach it to the SageMaker execution role. You also need to attach the AmazonAthenaFullAccess policy to access Athena.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LakeFormationDataAccess",
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "lakeformation:StartQueryPlanning",
                "lakeformation:GetQueryState",
                "lakeformation:GetWorkUnits",
                "lakeformation:GetWorkUnitResults",
                "glue:GetTables",
                "glue:GetDatabases"
            ],
            "Resource": "*"
        }
    ]
}

Next, you need to grant access to the Feature Store database and specific feature group table to the SageMaker execution role and assign it one of the data filters created previously. To grant data permissions inside Lake Formation, in the navigation pane, under Permissions, choose Data Lake Permissions, then choose Grant. The following screenshot demonstrates how to grant permissions with a data filter for row-level access to a SageMaker execution role.

Similarly, you can grant permissions with the data filter created for cell-level access to the SageMaker execution role.
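The same grant can be scripted with boto3. In the following sketch, the role ARN, account ID, and the filter and table names are placeholders matching the example above:

import boto3

lf = boto3.client('lakeformation')

# Grant SELECT through the cell filter to the Studio user's execution role.
lf.grant_permissions(
    Principal={
        'DataLakePrincipalIdentifier':
            'arn:aws:iam::111122223333:role/AmazonSageMaker-ExecutionRole-example'
    },
    Resource={
        'DataCellsFilter': {
            'TableCatalogId': '111122223333',
            'DatabaseName': 'sagemaker_featurestore',
            'TableName': 'customers-feature-group-example',
            'Name': 'customers-cell-filter',
        }
    },
    Permissions=['SELECT'],
)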

Test Feature Store access

In this section, you validate the access controls set up in Lake Formation using a Studio notebook. This implementation uses the Feature Store Python SDK and Athena to query data from the offline feature store that has been registered in Lake Formation.

First, you test row-level access by creating an Athena query for your feature group orders with the following code. The table_name is the AWS Glue table that is automatically generated by Feature Store.

orders_query = orders_fg.athena_query()
orders_table = orders_query.table_name

You query all records from the orders table using the following query string:

orders_query_string = f'SELECT * FROM "sagemaker_featurestore"."{orders_table}" '

orders_query.run(query_string=orders_query_string, output_location=output_location)
orders_query.wait()
orders_df = orders_query.as_dataframe()
orders_df

Only records with customer_id = ‘C7782’ are returned as per the data filters created in Lake Formation.

Secondly, you test cell-level access by creating an Athena query for your feature group customers with the following code. The table_name is the AWS Glue table that is automatically generated by Feature Store.

customers_query = customers_fg.athena_query()
customers_table = customers_query.table_name

You query all records from the customers table using the following query string:

customers_query_string = f'SELECT * FROM "sagemaker_featurestore"."{customers_table}" '

customers_query.run(query_string=customers_query_string, output_location=output_location)
customers_query.wait()
customers_df = customers_query.as_dataframe()
customers_df

Only records with customer_id = 'C3126' are returned, as per the data filters created in Lake Formation. In addition, the features sex and is_married aren’t visible.

With this approach, you can implement granular permission access control to an offline feature store. With the Lake Formation permission model, you can limit access to certain feature groups or specific features within a feature group for individuals based on their role in the organization.

To explore the complete code example, and to try it out in your own account, see the GitHub repo.

Conclusion

SageMaker Feature Store provides a purpose-built feature management solution to help organizations scale ML development across business units and data science teams. In this post, we explained how you can use Lake Formation to implement fine-grained access control for your offline feature store. Give it a try, and let us know what you think in the comments.


About the Authors

Arnaud Lauer is a Senior Partner Solutions Architect in the Public Sector team at AWS. He enables partners and customers to understand how best to use AWS technologies to translate business needs into solutions. He brings more than 16 years of experience in delivering and architecting digital transformation projects across a range of industries, including the public sector, energy, and consumer goods. Artificial intelligence and machine learning are some of his passions. Arnaud holds 12 AWS certifications, including the ML Specialty Certification.

Ioan Catana is an Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He helps customers develop and scale their ML solutions in the AWS Cloud. Ioan has over 20 years of experience, mostly in software architecture design and cloud engineering.

Swagat Kulkarni is a Senior Solutions Architect at AWS and an AI/ML enthusiast. He is passionate about solving real-world problems for customers with cloud-native services and machine learning. Swagat has over 15 years of experience delivering several digital transformation initiatives for customers across multiple domains, including retail, travel and hospitality, and healthcare. Outside of work, Swagat enjoys traveling, reading, and meditating.

Charu Sareen is a Sr. Product Manager for Amazon SageMaker Feature Store. Prior to AWS, she was leading growth and monetization strategy for SaaS services at VMware. She is a data and machine learning enthusiast and has over a decade of experience spanning product management, data engineering, and advanced analytics. She has a bachelor’s degree in Information Technology from National Institute of Technology, India and an MBA from University of Michigan, Ross School of Business.


Manage dialog to elicit Amazon Lex slots in Amazon Connect contact flows

Amazon Lex can add powerful automation to contact center solutions, so you can enable self-service via interactive voice response (IVR) interactions or route calls to the appropriate agent based on caller input. These capabilities can increase customer satisfaction by streamlining the user experience, and improve containment rates in the contact center.

In both the self-service and call routing scenarios, you may need to configure the bot so that it can obtain information commonly required in customer service calls. For example, to enable a self-service experience when the caller requests a transfer from their checking account to their savings account, you may have to first get their account ID.

Bots are more effective at processing a response if they know the related request or prompt (for example, “What is your account ID?”). Amazon Lex provides comprehensive dialog management capabilities, so that context can be maintained across a conversation. However, sometimes the initial prompt may occur before the Amazon Lex bot is engaged.

In the case of an IVR solution, for example, the welcome prompt (“Welcome to ACME bank. To get started, can you tell me your account ID?”) may be defined in the client (Amazon Connect) contact flow. In this case, the Amazon Lex bot isn’t aware that you already prompted the user for their account ID. This could be a source of ambiguity for the bot (imagine if someone called you and started a conversation by saying, “123456”).

To create the best customer experience in cases like this, we recommend that you provide your Amazon Lex bot with details about the prompt. In this post, we show a simple way to inform Amazon Lex about details such as a prior prompt already provided to the user.

Solution overview

For this example, we use an Amazon Lex bot that provides self-service capabilities as part of an Amazon Connect contact flow. When the user calls in on their phone, they’re prompted for their account ID (for example, a six-digit number). We demonstrate how the Amazon Connect contact flow passes context about the information requested (in this case, the AccountId slot) to the Amazon Lex bot. As a best practice, we recommend setting the Amazon Lex dialog state to “slot elicitation” any time a user is prompted for a slot value.

We use the following sample banking interaction to model our Amazon Lex bot:

IVR: Hi, welcome to ACME bank customer service. To get started, please tell me your account ID.

User: 123456.

IVR: Thanks. How can I help? You can check account balances, transfer funds, and order checks.

User: What’s my balance in checking?

IVR: The balance for your checking account is $875. Is there anything else I can help you with?

User: No thanks, that’s it.

IVR: Okay, thanks for contacting us today. We appreciate your business!

Let’s deploy an Amazon Lex bot to see how this works.

Solution architecture

In this sample solution, we use AWS CloudFormation to deploy an Amazon Lex bot with an AWS Lambda fulfillment function, along with an example Amazon Connect contact flow that is integrated with the bot. The welcome prompt (“Welcome to ACME bank. To get started, please tell me your account ID.”) is configured in a “Play prompt” block in the contact flow.

The contact flow uses a Lambda helper function to inform Amazon Lex that the user has already been prompted for a slot value. This is done via an “Invoke AWS Lambda function” block in the contact flow. The helper function calls the Amazon Lex put-session API to tell Amazon Lex to elicit the AccountId slot value. See the following code:

import boto3

lexClient = boto3.client('lexv2-runtime')

# bot_id, bot_alias_id, session_id, intent, slot_to_elicit,
# elicitation_style, context_name, context_ttl, and context_turns
# are derived from the contact flow input in the full helper function
bot_response = lexClient.put_session(
        botId=bot_id,
        botAliasId=bot_alias_id,
        localeId='en_US',
        sessionId=session_id,
        sessionState={
            'dialogAction': {
                'type': 'ElicitSlot',
                'slotElicitationStyle': elicitation_style,
                'slotToElicit': slot_to_elicit
            },            
            'intent': {
                'name': intent,
                'slots': {},
                'state': 'InProgress',
                'confirmationState': 'None'
            },
            'activeContexts': [
                {
                    'name': context_name,
                    'contextAttributes': {},
                    'timeToLive': {
                        'timeToLiveInSeconds': int(context_ttl),
                        'turnsToLive': int(context_turns)
                    }
                }
            ],            
            'sessionAttributes': {}
        },
        requestAttributes={},
        responseContentType='text/plain; charset=utf-8'
    )

Next, control passes to the “Get customer input” block in the contact flow to trigger the Amazon Lex bot. Because the bot is ready for the account ID slot, the conversation is more efficient. You can also handle scenarios where the caller doesn’t have the requested information, by creating an intent to respond to inputs such as “I don’t know.” Although the bot is expecting a number (account ID), if the user provides a different response, the appropriate intent is triggered.

Prerequisites

Before deploying this solution, you should have the following prerequisites:

Deploy the sample solution

To deploy the solution, complete the following steps:

  1. Sign in to the AWS Management Console in your AWS account, and choose the following link:

This launches a new CloudFormation stack to create the example banking bot.

  2. For Stack name, enter a name (for example, lex-elicit-slot-example).
  3. For ConnectInstanceARN, enter the ARN (Amazon Resource Name) for the Amazon Connect instance you’ll use for testing the solution.
  4. Leave the other parameters at their default or change them as needed.
  5. Choose Next.
  6. Add any tags you may want for your stack (this step is optional).
  7. Choose Next.
  8. Review the stack details and select the check box to acknowledge that IAM resources will be created.
  9. Choose Create stack.

After a few minutes, your stack is complete, and includes the following resources:

  • A Lex bot, including a published version with an alias (Development-Alias)
  • A Lambda fulfillment function for the bot (BotHandler)
  • A Lambda helper function, which calls the Amazon Lex put-session API to enable slot elicitation mode (SlotElicitor)
  • A CloudWatch Logs log group for Amazon Lex conversation logs (optional)
  • Required IAM roles
  • A custom resource that adds a sample contact flow to your Amazon Connect instance

Test the bot on the Amazon Lex console

At this point, you can try the example interaction on the Amazon Lex console. You should see the sample bot with the name that you specified in the CloudFormation template (banking-bot-sample).

  1. On the Amazon Lex console, choose this bot and choose Bot versions in the navigation pane.
  2. Choose Version 1, then choose Intents in the navigation pane.

You can see a list of intents.

  3. Choose Test.
  4. Select Development-Alias and choose Confirm.

The test window opens.

  5. Try “What’s my balance?” to get started. You can also say “order some checks,” “transfer 100 dollars,” and “goodbye.”

You will be prompted for an account ID.

Test the bot with Amazon Connect

Now let’s try this with voice using an Amazon Connect instance. We have already configured a sample contact flow in your Amazon Connect instance.

All you need to do is set up a phone number and associate it with this contact flow. To do this, follow these steps:

  1. On the Amazon Connect console, open your instance by choosing Access URL and logging in to the instance.
  2. On the Dashboard, choose View phone numbers.
  3. Choose Claim a number.
  4. Choose a country on the Country drop-down menu, and choose a number.
  5. For Description, enter a description, such as Example contact flow that elicits a slot with Amazon Lex.
  6. For Contact flow, choose the contact flow you just created.
  7. Choose Save.

You’re now ready to call in to your Amazon Connect instance to test your bot using voice. Just dial the number on your phone and give it a try!

Clean up

You may want to clean up the resources created as part of the CloudFormation template when you’re done using the bot, to avoid incurring ongoing charges. To do this, delete the CloudFormation stack.

Conclusion

Amazon Lex offers powerful automated speech recognition (ASR) and natural language understanding (NLU) capabilities that you can use to capture information from your users to provide automated, self-service functionality, or to route callers to the right agents. Amazon Lex uses slot elicitation to collect information commonly needed in a customer service call. It’s important to provide the bot details on the type of information it should be expecting at the right times—in some cases, even on the first turn of a conversation. You can incorporate this technique in your own Amazon Lex conversation flows.


About the Authors

Brian Yost is a Senior Technical Program Manager on the AWS Lex team. In his spare time, he enjoys mountain biking, home brewing, and tinkering with technology.
