Enhance Amazon Lex with conversational FAQ features using LLMs

Amazon Lex is a service that allows you to quickly and easily build conversational bots (“chatbots”), virtual agents, and interactive voice response (IVR) systems for applications such as Amazon Connect.

Artificial intelligence (AI) and machine learning (ML) have been a focus for Amazon for over 20 years, and many of the capabilities that customers use with Amazon are driven by ML. Today, large language models (LLMs) are transforming the way developers and enterprises solve historically complex challenges related to natural language understanding (NLU). We recently announced Amazon Bedrock, which democratizes foundation model access for developers to easily build and scale generative AI-based applications using familiar AWS tools and capabilities. One of the challenges enterprises face is incorporating their business knowledge into LLMs to deliver accurate and relevant responses. When leveraged effectively, enterprise knowledge bases can deliver tailored self-service and assisted-service experiences, providing information that helps customers solve problems independently and augmenting an agent's knowledge. Today, a bot developer can improve self-service experiences without utilizing LLMs in a couple of ways. First, by creating intents, sample utterances, and responses, thereby covering all anticipated user questions within an Amazon Lex bot. Second, developers can integrate bots with search solutions, which can index documents stored across a wide range of repositories and find the most relevant document to answer their customer's question. These methods are effective, but they require developer resources, which can make getting started difficult.

One of the benefits offered by LLMs is the ability to create relevant and compelling conversational self-service experiences. They do so by leveraging enterprise knowledge bases and delivering more accurate and contextual responses. This blog post introduces a powerful solution for augmenting Amazon Lex with LLM-based FAQ features using Retrieval Augmented Generation (RAG). We will review how the RAG approach augments Amazon Lex FAQ responses using your company data sources. In addition, we will demonstrate Amazon Lex integration with LlamaIndex, an open-source data framework that provides knowledge source and format flexibility to the bot developer. As bot developers gain confidence using LlamaIndex to explore LLM integration, they can scale the Amazon Lex capability further. They can also use enterprise search services such as Amazon Kendra, which is natively integrated with Amazon Lex.

In this solution, we showcase the practical application of an Amazon Lex chatbot with LLM-based RAG enhancement. We use the Zappos customer support use case as an example to demonstrate the effectiveness of this solution, which takes the user through an enhanced, LLM-powered FAQ experience rather than the default fallback response (without an LLM).

Solution overview

RAG combines the strengths of traditional retrieval-based and generative AI-based approaches to Q&A systems. This methodology harnesses the power of large language models, such as Amazon Titan or open-source models (for example, Falcon), to perform generative tasks in retrieval systems. It also takes into account the semantic context from stored documents more effectively and efficiently.

RAG starts with an initial retrieval step to retrieve relevant documents from a collection based on the user’s query. It then employs a language model to generate a response by considering both the retrieved documents and the original query. By integrating RAG into Amazon Lex, we can provide accurate and comprehensive answers to user queries, resulting in a more engaging and satisfying user experience.

The RAG approach requires document ingestion so that embeddings can be created to enable LLM-based search. The following diagram shows how the ingestion process creates the embeddings that are then used by the chatbot during fallback to answer the customer’s question.

This solution architecture lets you choose the most suitable LLM for your use case. It also provides an inference endpoint choice between Amazon Bedrock (in limited preview) and models hosted on Amazon SageMaker JumpStart, offering additional LLM flexibility.

The document is uploaded to an Amazon Simple Storage Service (Amazon S3) bucket. The S3 bucket has an event listener attached that invokes an AWS Lambda function on changes to the bucket. The Lambda function ingests the new document and places the embeddings in another S3 bucket. The embeddings are then used by the RAG implementation in the Amazon Lex bot during the fallback intent to answer the customer's question. The next diagram shows the architecture of how an FAQ bot within Amazon Lex can be enhanced with LLMs and RAG.

Let’s explore how we can integrate RAG based on LlamaIndex into an Amazon Lex bot. We provide code examples and an AWS Cloud Development Kit (AWS CDK) import to assist you in setting up the integration. You can find the code examples in our GitHub repository. The following sections provide a step-by-step guide to help you set up the environment and deploy the necessary resources.

How RAG works with Amazon Lex

The flow of RAG involves an iterative process where the retriever component retrieves relevant passages, the question and passages help construct the prompt, and the generation component produces a response. This combination of retrieval and generation techniques allows the RAG model to take advantage of the strengths of both approaches, providing accurate and contextually appropriate answers to user questions. The workflow provides the following capabilities:

  • Retriever engine – The RAG model begins with a retriever component responsible for retrieving relevant documents from a large corpus. This component typically uses an information retrieval technique like TF-IDF or BM25 to rank and select documents that are likely to contain the answer to a given question. The retriever scans the document corpus and retrieves a set of relevant passages.
  • Prompt helper – After the retriever has identified the relevant passages, the RAG model moves to prompt creation. The prompt is a combination of the question and the retrieved passages, serving as additional context for the prompt, which is used as input to the generator component. To create the prompt, the model typically augments the question with the selected passages in a specific format.
  • Response generation – The prompt, consisting of the question and relevant passages, is fed into the generation component of the RAG model. The generation component is usually a language model capable of reasoning through the prompt to generate a coherent and relevant response.
  • Final response – Finally, the RAG model selects the highest-ranked answer as the output and presents it as the response to the original question. The selected answer can be further postprocessed or formatted as necessary before being returned to the user. In addition, the solution enables filtering of the generated response if the retrieval results yield a low confidence score, implying that the query likely falls out of distribution (OOD). A minimal sketch of this flow follows the list.
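
To make the flow concrete, the following is a minimal, framework-agnostic sketch of the retrieve, prompt, and generate steps. The retriever and llm objects, their method names, and the confidence_cutoff value are illustrative placeholders, not the exact components used in the accompanying GitHub repository.

def answer_with_rag(question, retriever, llm, confidence_cutoff=0.7):
    # 1. Retriever engine: fetch passages relevant to the question
    passages = retriever.retrieve(question)

    # Filter low-confidence results that are likely out of distribution (OOD)
    passages = [p for p in passages if p.score >= confidence_cutoff]
    if not passages:
        return "Sorry, I couldn't find an answer to that in our knowledge base."

    # 2. Prompt helper: combine the question with the retrieved passages
    context = "\n".join(p.text for p in passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Response generation: the LLM reasons over the prompt to produce the answer
    return llm.generate(prompt)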

LlamaIndex: An open-source data framework for LLM-based applications

In this post, we demonstrate the RAG solution based on LlamaIndex. LlamaIndex is an open-source data framework specifically designed to facilitate LLM-based applications. It offers a robust and scalable solution for managing document collections in different formats. With LlamaIndex, bot developers can effortlessly integrate LLM-based QA (question answering) capabilities into their applications, eliminating the complexities of managing solutions for large-scale document collections. Furthermore, this approach proves to be cost-effective for smaller-sized document repositories.

Prerequisites

You should have the following prerequisites:

Set up your development environment

The main third-party package requirements are the llama_index package and the SageMaker Python SDK. Follow the specified commands in our GitHub repository's README to set up your environment properly.

Deploy the required resources

This step involves creating an Amazon Lex bot, S3 buckets, and a SageMaker endpoint. Additionally, you need to Dockerize the code in the Docker image directory and push the images to Amazon Elastic Container Registry (Amazon ECR) so that they can run in Lambda. Follow the specified commands in our GitHub repository's README to deploy the services.

During this step, we demonstrate LLM hosting via SageMaker Deep Learning Containers. Adjust the settings according to your computation needs:

  • Model – To find a model that meets your requirements, you can explore resources like the Hugging Face model hub. It offers a variety of models such as Falcon 7B or Flan-T5-XXL. Additionally, you can find detailed information about various officially supported model architectures, helping you make an informed decision. For more information about different model types, refer to optimized architectures.
  • Model inference endpoint – Define the path of the model (for example, Falcon 7B), choose your instance type (for example, g5.4xlarge), and use quantization (for example, int-8 quantization). Note: This solution provides you the flexibility to choose another model inference endpoint. You can also use Amazon Bedrock, which provides access to other LLMs such as Amazon Titan. A hedged deployment sketch follows this list.
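
The following is a minimal deployment sketch, assuming you host the model with the Hugging Face LLM Deep Learning Container on SageMaker; the model ID, instance type, quantization setting, and environment variables are illustrative choices, not the repository's exact configuration.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # in a notebook; otherwise pass an IAM role ARN

# Retrieve the Hugging Face LLM (text-generation-inference) container image
image_uri = get_huggingface_llm_image_uri("huggingface")

# Model configuration: model path, GPU count, and optional int-8 quantization
llm_model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",  # example model; swap for your choice
        "SM_NUM_GPUS": "1",
        "HF_MODEL_QUANTIZE": "bitsandbytes",  # optional quantization setting
    },
)

# Deploy the inference endpoint on the chosen instance type
llm_predictor = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
)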

Set up your document index via LlamaIndex

To set up your document index, first upload your document data. We assume that you have the source of your FAQ content, such as a PDF or text file.

After the document data is uploaded, the LlamaIndex system will automatically initiate the process of creating the document index. This task is performed by a Lambda function, which generates the index and saves it to an S3 bucket.
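
As a rough illustration, an ingestion Lambda function along these lines could build and persist the index; the bucket names, environment variable, and use of SimpleDirectoryReader are assumptions for this sketch, so refer to the GitHub repository for the actual implementation.

import os
import boto3
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

s3 = boto3.client("s3")
EMBEDDINGS_BUCKET = os.environ.get("EMBEDDINGS_BUCKET", "faq-bot-storage-001")  # assumed environment variable


def lambda_handler(event, context):
    # The S3 event carries the bucket and key of the newly uploaded document
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    # Download the document to the Lambda scratch space
    local_path = "/tmp/" + os.path.basename(key)
    s3.download_file(bucket, key, local_path)

    # Build the vector index from the document and persist it locally
    documents = SimpleDirectoryReader(input_files=[local_path]).load_data()
    index = GPTVectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir="/tmp/index")

    # Copy the index files to the embeddings bucket used by the bot's fallback Lambda
    for file_name in os.listdir("/tmp/index"):
        s3.upload_file("/tmp/index/" + file_name, EMBEDDINGS_BUCKET, file_name)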

To enable efficient retrieval of relevant information, configure the document retriever using the LlamaIndex Retriever Query Engine. This engine offers several customization options, such as the following (a configuration sketch follows this list):

  • Embedding models – You can choose your embedding model, such as Hugging Face embedding.
  • Confidence cutoff – Specify a confidence cutoff threshold to determine the quality of retrieval results. If the confidence score falls below this threshold, you can choose to provide out-of-scope responses, indicating that the query is beyond the scope of the indexed documents.
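
Putting these options together, a configuration sketch could look like the following; the embedding model, cutoff value, and import paths are assumptions that vary between LlamaIndex releases, so adjust them to the version pinned in the repository.

from llama_index import ServiceContext, StorageContext, load_index_from_storage, LangchainEmbedding
from llama_index.indices.postprocessor import SimilarityPostprocessor
from langchain.embeddings import HuggingFaceEmbeddings

# Use a Hugging Face sentence-transformer for embeddings (illustrative choice)
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

# Load the index persisted by the ingestion step and attach a confidence (similarity) cutoff
storage_context = StorageContext.from_defaults(persist_dir="/tmp/")
index = load_index_from_storage(storage_context, service_context=service_context)
query_engine = index.as_query_engine(
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)]
)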

Test the integration

Define your bot definition with a fallback intent and use the Amazon Lex console to test your FAQ requests. For more details, refer to the GitHub repository. The following screenshot shows an example conversation with the bot.

Tips to boost your bot efficiency

The following tips can further improve the efficiency of your bot:

  • Index storage – Store your index in an S3 bucket or a service with vector database capabilities such as Amazon OpenSearch. By utilizing cloud-based storage solutions, you can enhance the accessibility and scalability of your index, leading to faster retrieval times and improved overall performance. Also, refer to this blog post for an Amazon Lex bot that utilizes an Amazon Kendra search solution.
  • Retrieval optimization – Experiment with different sizes of embedding models for the retriever. The choice of embedding model can significantly impact the input requirements of your LLM. Finding the optimal balance between model size and retrieval performance can result in improved efficiency and faster response times.
  • Prompt engineering – Experiment with different prompt formats, lengths, and styles to optimize the performance and quality of your bot’s answers.
  • LLM model selection – Select the most suitable LLM model for your specific use case. Consider factors such as model size, language capabilities, and compatibility with your application requirements. Choosing the right LLM model ensures optimal performance and efficient utilization of system resources.

Contact center conversations can span from self-service to a live human interaction. For use cases involving human-to-human interactions over Amazon Connect, you can use Wisdom to search and find content across multiple repositories, such as frequently asked questions (FAQs), wikis, articles, and step-by-step instructions for handling different customer issues.

Clean up

To avoid incurring future expenses, proceed with deleting all the resources that were deployed as part of this exercise. We have provided a script to shut down the SageMaker endpoint gracefully. Usage details are in the README. Additionally, to remove all the other resources you can run cdk destroy in the same directory as the other cdk commands to deprovision all the resources in your stack.

Summary

This post discussed the following steps to enhance Amazon Lex with LLM-based QA features using the RAG strategy and LlamaIndex:

  • Install the necessary dependencies, including LlamaIndex libraries
  • Set up model hosting via Amazon SageMaker or Amazon Bedrock (in limited preview)
  • Configure LlamaIndex by creating an index and populating it with relevant documents
  • Integrate RAG into Amazon Lex by modifying the bot's fallback configuration and configuring RAG to use LlamaIndex for document retrieval
  • Test the integration by engaging in conversations with the chatbot and observing its retrieval and generation of accurate responses

By following these steps, you can seamlessly incorporate powerful LLM-based QA capabilities and efficient document indexing into your Amazon Lex chatbot, resulting in more accurate, comprehensive, and contextually aware interactions with users. As a follow up, we also invite you to review our next blog post, which explores enhancing the Amazon Lex FAQ experience using URL ingestion and LLMs.


About the authors

Max Henkel-Wallace is a Software Development Engineer at AWS Lex. He enjoys leveraging technology to maximize customer success. Outside of work he is passionate about cooking, spending time with friends, and backpacking.

Song Feng is a Senior Applied Scientist at AWS AI Labs, specializing in Natural Language Processing and Artificial Intelligence. Her research explores various aspects of these fields including document-grounded dialogue modeling, reasoning for task-oriented dialogues, and interactive text generation using multimodal data.

Saket Saurabh is an engineer with the AWS Lex team. He works on improving the Lex developer experience to help developers build more human-like chatbots. Outside of work, he enjoys traveling, discovering diverse cuisines, and learning about different cultures.

Enhance Amazon Lex with LLMs and improve the FAQ experience using URL ingestion

In today’s digital world, most consumers prefer to find answers to their customer service questions on their own rather than taking the time to reach out to businesses and/or service providers. This blog post explores an innovative solution to build a question and answer chatbot in Amazon Lex that uses existing FAQs from your website. This AI-powered tool can provide quick, accurate responses to real-world inquiries, allowing the customer to quickly and easily solve common problems independently.

Single URL ingestion

Many enterprises have a published set of answers for FAQs available for their customers on their website. In this case, we want to offer customers a chatbot that can answer their questions from our published FAQs. In the blog post titled Enhance Amazon Lex with conversational FAQ features using LLMs, we demonstrated how you can use a combination of Amazon Lex and LlamaIndex to build a chatbot powered by your existing knowledge sources, such as PDF or Word documents. To support a simple FAQ bot based on a website of FAQs, we need to create an ingestion process that can crawl the website and create embeddings that LlamaIndex can use to answer customer questions. In this case, we build on the bot created in the previous blog post, which queries those embeddings with a user's utterance and returns the answer from the website FAQs.

The following diagram shows how the ingestion process and the Amazon Lex bot work together for our solution.

In the solution workflow, the website with FAQs is ingested via AWS Lambda. This Lambda function crawls the website and stores the resulting text in an Amazon Simple Storage Service (Amazon S3) bucket. The S3 bucket then triggers a Lambda function that uses LlamaIndex to create embeddings that are stored in Amazon S3. When a question from an end-user arrives, such as “What is your return policy?”, the Amazon Lex bot uses its Lambda function to query the embeddings using a RAG-based approach with LlamaIndex. For more information about this approach and the prerequisites, refer to the blog post Enhance Amazon Lex with conversational FAQ features using LLMs.

After the prerequisites from the aforementioned blog post are complete, the first step is to ingest the FAQs into a document repository that can be vectorized and indexed by LlamaIndex. The following code shows how to accomplish this:

import logging
import sys
import requests
import html2text
from llama_index.readers.schema.base import Document
from llama_index import GPTVectorStoreIndex
from typing import List

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


class EZWebLoader:

    def __init__(self, default_header: dict = None):
        self._html_to_text_parser = html2text
        if default_header is None:
            self._default_header = {"User-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
        else:
            self._default_header = default_header

    def load_data(self, urls: List[str], headers: dict = None) -> List[Document]:
        if headers is None:
            headers = self._default_header

        documents = []
        for url in urls:
            response = requests.get(url, headers=headers).text
            response = self._html_to_text_parser.html2text(response)  # reduce HTML to text
            documents.append(Document(response))
        return documents


url = "http://www.zappos.com/general-questions"
loader = EZWebLoader()
documents = loader.load_data([url])
index = GPTVectorStoreIndex.from_documents(documents)

In the preceding example, we take a predefined FAQ website URL from Zappos and ingest it using the EZWebLoader class. With this class, we have navigated to the URL and loaded all the questions on the page into an index. We can now ask a question like “Does Zappos have gift cards?” and get the answers directly from our FAQs on the website. The following screenshot shows the Amazon Lex bot test console answering that question from the FAQs.
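
For reference, querying the index from a notebook looks like the following; the question matches the one in the screenshot, and the exact response text will vary with the model and index contents.

# Ask a question against the index built from the crawled FAQ page
query_engine = index.as_query_engine()
response = query_engine.query("Does Zappos have gift cards?")
print(response.response)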

We were able to achieve this because we had crawled the URL in the first step and created embeddings that LlamaIndex could use to search for the answer to our question. Our bot's Lambda function shows how this search is run whenever the fallback intent is returned:

import time
import json
import os
import logging
import boto3
from llama_index import StorageContext, load_index_from_storage


logger = logging.getLogger()
logger.setLevel(logging.DEBUG)


def download_docstore():
    # Create an S3 client
    s3 = boto3.client('s3')

    # List all objects in the S3 bucket and download each one
    try:
        bucket_name = 'faq-bot-storage-001'
        s3_response = s3.list_objects_v2(Bucket=bucket_name)

        if 'Contents' in s3_response:
            for item in s3_response['Contents']:
                file_name = item['Key']
                logger.debug("Downloading to /tmp/" + file_name)
                s3.download_file(bucket_name, file_name, '/tmp/' + file_name)

            logger.debug('All files downloaded from S3 and written to local filesystem.')

    except Exception as e:
        logger.error(e)
        raise e


# download the doc store locally
download_docstore()

storage_context = StorageContext.from_defaults(persist_dir="/tmp/")
# load index
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()


def lambda_handler(event, context):
    """
    Route the incoming request based on intent.
    The JSON body of the request is provided in the event slot.
    """
    # By default, treat the user request as coming from the America/New_York time zone.
    os.environ['TZ'] = 'America/New_York'
    time.tzset()
    logger.debug("===== START LEX FULFILLMENT ====")
    logger.debug(event)
    slots = {}
    if "currentIntent" in event and "slots" in event["currentIntent"]:
        slots = event["currentIntent"]["slots"]
    intent = event["sessionState"]["intent"]

    dialogaction = {"type": "Delegate"}
    message = []
    if str.lower(intent["name"]) == "fallbackintent":
        # execute the query from the input given by the user
        response = str.strip(query_engine.query(event["inputTranscript"]).response)
        dialogaction["type"] = "Close"
        message.append({'content': f'{response}', 'contentType': 'PlainText'})

    final_response = {
        "sessionState": {
            "dialogAction": dialogaction,
            "intent": intent
        },
        "messages": message
    }

    logger.debug(json.dumps(final_response, indent=1))
    logger.debug("===== END LEX FULFILLMENT ====")

    return final_response

This solution works well when a single webpage has all the answers. However, most FAQ sites are not built on a single page. For instance, in our Zappos example, if we ask the question “Do you have a price matching policy?”, then we get a less-than-satisfactory answer, as shown in the following screenshot.

In the preceding interaction, the price-matching policy answer isn’t helpful for our user. This answer is short because the FAQ referenced is a link to a specific page about the price matching policy and our web crawl was only for the single page. Achieving better answers will mean crawling these links as well. The next section shows how to get answers to questions that require two or more levels of page depth.

N-level crawling

When we crawl a web page for FAQ knowledge, the information we want can be contained in linked pages. For example, in our Zappos example, we ask the question “Do you have a price matching policy?” and the answer is “Yes please visit <link> to learn more.” If someone asks “What is your price matching policy?” then we want to give a complete answer with the policy. Achieving this means we need to traverse links to get the actual information for our end-user. During the ingestion process, we can use our web loader to find the anchor links to other HTML pages and then traverse them. The following code change to our web crawler allows us to find links in the pages we crawl. It also includes some additional logic to avoid circular crawling and to allow filtering by a prefix.

import logging
import requests
import html2text
from llama_index.readers.schema.base import Document
from llama_index import GPTVectorStoreIndex
from typing import List
import re


def find_http_urls_in_parentheses(s: str, prefix: str = None):
    # html2text renders links as [text](url), so look for URLs wrapped in parentheses
    pattern = r'\((https?://[^)]+)\)'
    urls = re.findall(pattern, s)

    matched = []
    if prefix is not None:
        for url in urls:
            if str(url).startswith(prefix):
                matched.append(url)
    else:
        matched = urls

    return list(set(matched))  # remove duplicates by converting to a set, then convert back to a list


class EZWebLoader:

    def __init__(self, default_header: dict = None):
        self._html_to_text_parser = html2text
        if default_header is None:
            self._default_header = {"User-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
        else:
            self._default_header = default_header

    def load_data(self,
                  urls: List[str],
                  num_levels: int = 0,
                  level_prefix: str = None,
                  headers: dict = None) -> List[Document]:

        logging.info(f"Number of urls: {len(urls)}.")

        if headers is None:
            headers = self._default_header

        documents = []
        visited = {}
        for url in urls:
            q = [url]
            depth = num_levels
            for page in q:
                if page not in visited:  # prevent cycles by checking to see if we already crawled a link
                    logging.info(f"Crawling {page}")
                    visited[page] = True  # add entry to visited to prevent re-crawling pages
                    response = requests.get(page, headers=headers).text
                    response = self._html_to_text_parser.html2text(response)  # reduce html to text
                    documents.append(Document(response))
                    if depth > 0:
                        # crawl linked pages
                        ingest_urls = find_http_urls_in_parentheses(response, level_prefix)
                        logging.info(f"Found {len(ingest_urls)} pages to crawl.")
                        q.extend(ingest_urls)
                        depth -= 1  # reduce the depth counter so we go only num_levels deep in our crawl
                else:
                    logging.info(f"Skipping {page} as it has already been crawled")
        logging.info(f"Number of documents: {len(documents)}.")
        return documents


url = "http://www.zappos.com/general-questions"
loader = EZWebLoader()
# crawl the site with 1 level depth and prefix of "/c/" for customer service root
documents = loader.load_data([url],
                             num_levels=1, level_prefix="https://www.zappos.com/c/")
index = GPTVectorStoreIndex.from_documents(documents)

In the preceding code, we introduce the ability to crawl N levels deep, and we give a prefix that allows us to restrict crawling to only things that begin with a certain URL pattern. In our Zappos example, the customer service pages are all rooted at zappos.com/c, so we include that as a prefix to limit our crawls to a smaller and more relevant subset. The code shows how we can ingest up to two levels deep. Our bot's Lambda logic remains the same because nothing has changed except that the crawler ingests more documents.

We now have all the documents indexed and we can ask a more detailed question. In the following screenshot, our bot provides the correct answer to the question “Do you have a price matching policy?”

We now have a complete answer to our question about price matching. Instead of simply being told “Yes see our policy,” it gives us the details from the second-level crawl.

Clean up

To avoid incurring future expenses, proceed with deleting all the resources that were deployed as part of this exercise. We have provided a script to shut down the SageMaker endpoint gracefully. Usage details are in the README. Additionally, to remove all the other resources you can run cdk destroy in the same directory as the other cdk commands to deprovision all the resources in your stack.

Conclusion

The ability to ingest a set of FAQs into a chatbot enables your customers to find the answers to their questions with straightforward, natural language queries. By combining the built-in support in Amazon Lex for fallback handling with a RAG solution such as LlamaIndex, we can provide a quick path for our customers to get satisfying, curated, and approved answers to FAQs. By applying N-level crawling in our solution, we can allow for answers that span multiple FAQ links and provide deeper answers to our customer's queries. By following these steps, you can seamlessly incorporate powerful LLM-based Q&A capabilities and efficient URL ingestion into your Amazon Lex chatbot. This results in more accurate, comprehensive, and contextually aware interactions with users.


About the authors

Max Henkel-Wallace is a Software Development Engineer at AWS Lex. He enjoys leveraging technology to maximize customer success. Outside of work he is passionate about cooking, spending time with friends, and backpacking.

Song Feng is a Senior Applied Scientist at AWS AI Labs, specializing in Natural Language Processing and Artificial Intelligence. Her research explores various aspects of these fields including document-grounded dialogue modeling, reasoning for task-oriented dialogues, and interactive text generation using multimodal data.

John Baker is a Principal SDE at AWS where he works on Natural Language Processing, Large Language Models and other ML/AI related projects. He has been with Amazon for 9+ years and has worked across AWS, Alexa and Amazon.com. In his spare time, John enjoys skiing and other outdoor activities throughout the Pacific Northwest.

Build an email spam detector using Amazon SageMaker

Spam emails, also known as junk mail, are sent to a large number of users at once and often contain scams, phishing content, or cryptic messages. Spam emails are sometimes sent manually by a human, but most often they are sent using a bot. Examples of spam emails include fake ads, chain emails, and impersonation attempts. There is a risk that a particularly well-disguised spam email may land in your inbox, which can be dangerous if clicked on. It’s important to take extra precautions to protect your device and sensitive information.

As technology improves, detecting spam emails becomes a challenging task due to their changing nature. Spam is quite different from other types of security threats. It may at first appear like an annoying message rather than a threat, but it has an immediate effect. Also, spammers often adopt new techniques. Organizations that provide email services want to minimize spam as much as possible to avoid any damage to their end customers.

In this post, we show how straightforward it is to build an email spam detector using Amazon SageMaker. The built-in BlazingText algorithm offers optimized implementations of Word2vec and text classification algorithms. Word2vec is useful for various natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, and machine translation. Text classification is essential for applications like web searches, information retrieval, ranking, and document classification.

Solution overview

This post demonstrates how you can set up an email spam detector and filter spam emails using SageMaker. Let's see how a spam detector typically works, as shown in the following diagram.

Emails are sent through a spam detector. An email is sent to the spam folder if the spam detector detects it as spam. Otherwise, it’s sent to the customer’s inbox.

We walk you through the following steps to set up our spam detector model:

  1. Download the sample dataset from the GitHub repo.
  2. Load the data in an Amazon SageMaker Studio notebook.
  3. Prepare the data for the model.
  4. Train, deploy, and test the model.

Prerequisites

Before diving into this use case, complete the following prerequisites:

  1. Set up an AWS account.
  2. Set up a SageMaker domain.
  3. Create an Amazon Simple Storage Service (Amazon S3) bucket. For instructions, see Create your first S3 bucket.

Download the dataset

Download the email_dataset.csv from GitHub and upload the file to the S3 bucket.

The BlazingText algorithm expects a single preprocessed text file with space-separated tokens. Each line in the file should contain a single sentence. If you need to train on multiple text files, concatenate them into one file and upload the file in the respective channel.

Load the data in SageMaker Studio

To perform the data load, complete the following steps:

  1. Download the spam_detector.ipynb file from GitHub and upload the file in SageMaker Studio.
  2. In your Studio notebook, open the spam_detector.ipynb notebook.
  3. If you are prompted to choose a Kernel, choose the Python 3 (Data Science 3.0) kernel and choose Select. If not, verify that the right kernel has been automatically selected.

  4. Import the required Python library and set the roles and the S3 buckets. Specify the S3 bucket and prefix where you uploaded email_dataset.csv.

  5. Run the data load step in the notebook.

  6. Check if the dataset is balanced or not based on the Category labels.

We can see our dataset is balanced.

Prepare the data

The BlazingText algorithm expects the data in the following format:

__label__<label> "<features>"

Here’s an example:

__label__0 "This is HAM"
__label__1 "This is SPAM"

For more information, see Training and Validation Data Format for the BlazingText Algorithm.

You now run the data preparation step in the notebook.

  1. First, you need to convert the Category column to an integer. The following cell replaces the SPAM value with 1 and the HAM value with 0.

  2. The next cell adds the prefix __label__ to each Category value and tokenizes the Message column.

  3. The next step is to split the dataset into train and validation datasets and upload the files to the S3 bucket. A condensed sketch of these steps follows this list.
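
A condensed sketch of these three steps is shown below; df is assumed to be the pandas DataFrame loaded in the previous step, and the bucket, prefix, and split ratio are illustrative.

import boto3
import nltk

nltk.download("punkt")

# 1. Convert the Category column to an integer label (SPAM -> 1, HAM -> 0);
#    adjust the key casing to match your copy of the dataset
df["Category"] = df["Category"].replace({"SPAM": 1, "HAM": 0, "spam": 1, "ham": 0})

# 2. Add the __label__ prefix and tokenize the Message column
df["Label"] = "__label__" + df["Category"].astype(str)
df["Tokens"] = df["Message"].apply(lambda m: " ".join(nltk.word_tokenize(str(m).lower())))

# 3. Split into train/validation sets, write them in BlazingText format, and upload to S3
train_df = df.sample(frac=0.8, random_state=42)
val_df = df.drop(train_df.index)

s3 = boto3.client("s3")
for name, split in [("train", train_df), ("validation", val_df)]:
    local_file = f"/tmp/{name}.txt"
    with open(local_file, "w") as f:
        for label, tokens in zip(split["Label"], split["Tokens"]):
            f.write(f"{label} {tokens}\n")
    s3.upload_file(local_file, bucket, f"{prefix}/{name}/{name}.txt")  # bucket and prefix assumed defined earlier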

Train the model

To train the model, complete the following steps in the notebook:

  1. Set up the BlazingText estimator and create an estimator instance passing the container image.

  2. Set the learning mode hyperparameter to supervised.

BlazingText has both unsupervised and supervised learning modes. Our use case is text classification, which is supervised learning.

  3. Create the train and validation data channels.

  4. Start training the model.

  5. Get the accuracy of the train and validation dataset. A condensed sketch of these steps follows this list.
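
The following sketch condenses these steps; the instance type, hyperparameters, and S3 paths are illustrative, and bucket and prefix are assumed to be defined earlier in the notebook.

import sagemaker
from sagemaker import image_uris
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Step 1: create the estimator with the built-in BlazingText container image
container = image_uris.retrieve("blazingtext", session.boto_region_name)
bt_model = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    output_path=f"s3://{bucket}/{prefix}/output",
    sagemaker_session=session,
)

# Step 2: supervised mode enables text classification
bt_model.set_hyperparameters(mode="supervised", epochs=10, min_count=2)

# Step 3: point the train and validation channels at the prepared files
data_channels = {
    "train": TrainingInput(f"s3://{bucket}/{prefix}/train", content_type="text/plain"),
    "validation": TrainingInput(f"s3://{bucket}/{prefix}/validation", content_type="text/plain"),
}

# Step 4: start training; train and validation accuracy appear in the job logs (step 5)
bt_model.fit(inputs=data_channels, logs=True)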

Deploy the model

In this step, we deploy the trained model as an endpoint. Choose your preferred instance type based on your cost and performance needs.

Test the model

Let’s provide an example of three email messages that we want to get predictions for:

  • Click on below link, provide your details and win this award
  • Best summer deal here
  • See you in the office on Friday.

Tokenize the email messages and specify the payload to use when calling the REST API.

Now we can predict the email classification for each email. Call the predict method of the text classifier, passing the tokenized sentence instances (payload) into the data argument.
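
A minimal prediction sketch follows; text_classifier is assumed to be the predictor returned by the deploy step, and the response format noted in the comment reflects the BlazingText text classification output.

import json

import nltk
from sagemaker.serializers import JSONSerializer

nltk.download("punkt")

emails = [
    "Click on below link, provide your details and win this award",
    "Best summer deal here",
    "See you in the office on Friday.",
]

# BlazingText expects space-separated tokens for each sentence
payload = {"instances": [" ".join(nltk.word_tokenize(email.lower())) for email in emails]}

# text_classifier is assumed to be the predictor returned when the model was deployed
text_classifier.serializer = JSONSerializer()
predictions = json.loads(text_classifier.predict(data=payload))
for email, prediction in zip(emails, predictions):
    print(prediction["label"], "<-", email)  # e.g., ['__label__1'] indicates spam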

Clean up

Finally, you can delete the endpoint to avoid any unexpected costs.

Also, delete the data file from the S3 bucket.
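
For example, assuming text_classifier is the deployed predictor and the dataset lives at the illustrative bucket and key below:

import boto3

# Remove the hosted endpoint and its model to stop incurring charges
text_classifier.delete_model()
text_classifier.delete_endpoint()

# Delete the uploaded dataset from Amazon S3 (bucket and key are illustrative)
boto3.resource("s3").Object("your-spam-detector-bucket", "email_dataset.csv").delete()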

Conclusion

In this post, we walked you through the steps to create an email spam detector using the SageMaker BlazingText algorithm. With the BlazingText algorithm, you can scale to large datasets. BlazingText is used for textual analysis and text classification problems, and has both unsupervised and supervised learning modes. You can use the algorithm for use cases like customer sentiment analysis and text classification.

To learn more about the BlazingText algorithm, check out BlazingText algorithm.


About the Author

Dhiraj Thakur is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and enjoys building and experimenting in the analytics and AI/ML space.

Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart

Today, we are excited to announce that Llama 2 foundation models developed by Meta are available for customers through Amazon SageMaker JumpStart. The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned LLMs, called Llama-2-chat, are optimized for dialogue use cases. You can easily try out these models and use them with SageMaker JumpStart, which is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML.

In this post, we walk through how to use Llama 2 models via SageMaker JumpStart.

What is Llama 2

Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. Llama 2 is intended for commercial and research use in English. It comes in a range of parameter sizes—7 billion, 13 billion, and 70 billion—as well as pre-trained and fine-tuned variations. According to Meta, the tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. Llama 2 was pre-trained on 2 trillion tokens of data from publicly available sources. The tuned models are intended for assistant-like chat, whereas pre-trained models can be adapted for a variety of natural language generation tasks. Regardless of which version of the model a developer uses, the responsible use guide from Meta can assist in guiding additional fine-tuning that may be necessary to customize and optimize the models with appropriate safety mitigations.

What is SageMaker JumpStart

With SageMaker JumpStart, ML practitioners can choose from a broad selection of open source foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances from a network isolated environment and customize models using SageMaker for model training and deployment.

You can now discover and deploy Llama 2 with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security. Llama 2 models are available today in Amazon SageMaker Studio, initially in the us-east-1 and us-west-2 Regions.

Discover models

You can access the foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

Once you're in SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.

From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. You can find two flagship Llama 2 models in the Foundation Models: Text Generation carousel. If you don’t see Llama 2 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Apps.

You can also find the other four model variants by choosing Explore all Text Generation Models or searching for llama in the search box.

You can choose the model card to view details about the model such as license, data used to train, and how to use. You can also find two buttons, Deploy and Open Notebook, which help you use the model.

When you choose either button, a pop-up will show the end-user license agreement and acceptable use policy for you to acknowledge.

Upon acknowledging, you will proceed to the next step to use the model.

Deploy a model

When you choose Deploy and acknowledge the terms, model deployment will start. Alternatively, you can deploy through the example notebook that shows up by choosing Open Notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using a notebook, we start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel
my_model = JumpStartModel(model_id = "meta-textgeneration-llama-2-70b-f")
predictor = my_model.deploy()

This deploys the model on SageMaker with default configurations, including default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {
    "inputs": [
        [
            {"role": "system", "content": "Always answer with Haiku"},
            {"role": "user", "content": "I am going to Paris, what should I see?"},
        ]
    ],
    "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
}

Fine-tuned chat models (Llama-2-7b-chat, Llama-2-13b-chat, Llama-2-70b-chat) accept a history of chat between the user and the chat assistant, and generate the subsequent chat. The pre-trained models (Llama-2-7b, Llama-2-13b, Llama-2-70b) require a string prompt and perform text completion on the provided prompt. See the following code:

predictor.predict(payload, custom_attributes="accept_eula=true")

Note that by default, accept_eula is set to false. You need to set accept_eula=true to invoke the endpoint successfully. By doing so, you accept the user license agreement and acceptable use policy as mentioned earlier. You can also download the license agreement.

The custom_attributes field used to pass the EULA consists of key/value pairs. The key and value are separated by = and pairs are separated by ;. If the user passes the same key more than once, the last value is kept and passed to the script handler (that is, in this case, used for conditional logic). For example, if accept_eula=false; accept_eula=true is passed to the server, then accept_eula=true is kept and passed to the script handler.

Inference parameters control the text generation process at the endpoint. The maximum new tokens control refers to the size of the output generated by the model. Note that this is not the same as the number of words because the vocabulary of the model is not the same as the English language vocabulary, and each token may not be an English language word. Temperature controls the randomness in the output. Higher temperature results in more creative and hallucinated outputs. All the inference parameters are optional.

The following table lists all the Llama models available in SageMaker JumpStart along with the model_ids, default instance types, and the maximum number of total tokens (sum of number of input tokens and number of generated tokens) supported for each of these models.

Model Name        Model ID                            Max Total Tokens   Default Instance Type
Llama-2-7b        meta-textgeneration-llama-2-7b      4096               ml.g5.2xlarge
Llama-2-7b-chat   meta-textgeneration-llama-2-7b-f    4096               ml.g5.2xlarge
Llama-2-13b       meta-textgeneration-llama-2-13b     4096               ml.g5.12xlarge
Llama-2-13b-chat  meta-textgeneration-llama-2-13b-f   4096               ml.g5.12xlarge
Llama-2-70b       meta-textgeneration-llama-2-70b     4096               ml.g5.48xlarge
Llama-2-70b-chat  meta-textgeneration-llama-2-70b-f   4096               ml.g5.48xlarge

Note that SageMaker endpoints have a timeout limit of 60 seconds. Thus, even though the model may be able to generate 4096 tokens, if text generation takes more than 60 seconds, the request will fail. For the 7B, 13B, and 70B models, we recommend setting max_new_tokens no greater than 1500, 1000, and 500, respectively, while keeping the total number of tokens less than 4K.

Inference and example prompts for Llama-2-70b

You can use Llama models for text completion for any piece of text. Through text generation, you can perform a variety of tasks, such as answering questions, language translation, sentiment analysis, and many more. Input payload to the endpoint looks like the following code:

payload = {
    "inputs": str,
    (optional) "parameters": {"max_new_tokens": int, "top_p": float, "temperature": float}
}

The following are some sample example prompts and the text generated by the model. All outputs are generated with inference parameters {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}.

Input: Simply put, the theory of relativity states that  

 Output: 
1) the laws of physics are the same for all observers in uniform motion relative to one another, and 2) the speed of light is the same for all observers, regardless of their relative motion or of the motion of the source of the light.
In the late 1800s, scientists believed that light travelled through a medium they called the luminiferous ether. But in 1887, Albert Michelson and Edward Morley designed an experiment to measure the speed of light in the ether, and found that the speed of light was the same no matter how they oriented their equipment. This result seemed to contradict the idea of the ether, and Michelson and Morley’s experiment became one of the most famous failed experiments in history.
In 1905, Albert Einstein published a paper that used the results of the Michelson-Morley experiment to develop the theory of special relativity. In 1915, Einstein published a paper on general relativity, which is a theory of gravitation.
In physics, the speed of light is considered to be a constant, and its value is 299,792,458 metres

Input: I believe the meaning of life is 

 Output: 
to be happy, to find joy in life, to love and be loved. To have the courage to be yourself and to be kind to others. To be able to express yourself, to create and to be inspired. To be able to give and receive. To be able to learn and grow. To be able to find beauty in the world and to be able to appreciate it. To be able to find peace and contentment. To be able to find meaning and purpose. To be able to make a difference in the world. To be able to find joy in the simple things. To be able to find joy in the difficult things. To be able to find joy in the journey. To be able to find joy in the destination. To be able to find joy in the present moment. To be able to find joy in the future. To be able to find joy in the past. To be able to find joy in the present. To be able to find joy in the future. To be able to find joy in the past. To be able to find joy in the present. To be able to find joy in the future. To be able to find joy in the past. To be able to find joy in the present. To be able to find joy

Input: A brief message congratulating the team on the launch:

        Hi everyone,
        
        I just  

 Output: 
wanted to send a quick note to congratulate everyone on the launch of our new website. I know it’s been a long time coming and I’m sure we’re all relieved that it’s finally out there.
        
        Thanks to everyone for their hard work and dedication over the last few months. I know it wasn’t always easy, but I think we can all be proud of the final product.
        
        I’m looking forward to seeing how our users respond to the new site and I’m confident that it will be a success.
        
        Thanks again, and enjoy the rest of your day!
        
        Best,
        
        [Your Name]
        
        [Your Title]
        
        [Your Company]
        
        [Your Email]
        
        [Your Phone Number]
        
        [Your Website]
        
        [Your Social Media Handles]
        
        [Your Address]
        
        [Your City, State, Zip Code]
        
        [Your Country]
        
        [Your Postal Code]
        
        [Your Website]
        
        [

In the next example, we show how to use Llama models with few-shot in-context learning, where we provide training samples available to the model. Note that we only make inference on the deployed model and during this process, model weights don’t change.

Input: Translate English to French:
        
        sea otter => loutre de mer
        peppermint => menthe poivrée
        plush girafe => girafe peluche
        cheese => 

 Output: 
fromage
        chocolate => chocolat
        car => voiture
        hamster => hamster
        beer => bière
        bicycle => bicyclette
        banana => banane
        strawberry => fraise
        pizza => pizza
        cheeseburger => cheeseburger
        pineapple => ananas
        carrot => carotte
        watermelon => pastèque
        whale => baleine
        eggplant => aubergine
        cat => chat
        penguin => manchot
        lion => lion
        elephant => éléphant
        panda => panda
        tiger => tigre
        salmon => saumon
        squirrel => écureuil
        kangaroo => kangourou
        turtle => tortue
        octopus => poulpe
        frog => grenouille
        pig => cochon
        goldfish => poisson rouge
        tiger shark => requin tigre
        gorilla => gorille
        dol

Inference and example prompts for Llama-2-70b-chat

With Llama-2-Chat models, which are optimized for dialogue use cases, the input to the chat model endpoints is the previous history between the chat assistant and the user. You can ask questions contextual to the conversation that has happened so far. You can also provide the system configuration, such as personas that define the chat assistant’s behavior. The input payload to the endpoint looks like the following code:

payload = {
    "inputs": [
        [
            (optional) {"role": "system", "content": str},
            (repeated k times) {"role": "user", "content": str}, {"role": "assistant", "content": str},
            {"role": "user", "content": str},
        ]
    ],
    (optional) "parameters": {"max_new_tokens": int, "top_p": float, "temperature": float}
}

The following are some sample example prompts and the text generated by the model. All outputs are generated with the inference parameters {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}.

Input:
User: what is the recipe of mayonnaise?

Output:
Assistant:  Sure, I can help you with that! Mayonnaise is a thick, creamy condiment made from a mixture of egg yolks, oil, vinegar or lemon juice, and seasonings. Here's a basic recipe for homemade mayonnaise:

Ingredients:

* 2 egg yolks
* 1/2 cup (120 ml) neutral-tasting oil, such as canola or grapeseed
* 1 tablespoon (15 ml) vinegar or lemon juice
* 1/2 teaspoon salt
* 1/4 teaspoon black pepper

Instructions:

1. In a medium-sized bowl, whisk together the egg yolks, salt, and black pepper until well combined.
2. Slowly pour the oil into the bowl while continuously whisking the mixture. It's important to add the oil very slowly, otherwise the mixture can break and you'll end up with a separated, curdled mess.
3. Continue whisking until the mixture thickens and emulsifies, which should take about 5-7 minutes. You'll know it's ready when the mixture has doubled in volume and has a thick, creamy consistency.
4. Add the vinegar or lemon juice and continue whisking for another minute or two, until the flavors are fully incorporated.
5. Taste the mayonnaise and adjust the seasoning as needed. You can add more salt, pepper, or acid if desired.
6. Cover the bowl with plastic wrap and refrigerate the mayonnaise for at least 30 minutes before serving. This will allow the flavors to meld together and the mayonnaise to thicken further.

That's it! This recipe makes about 1 1/2 cups (360 ml) of mayonnaise, which should be enough for a small to medium-sized batch of sandwiches or salads. You can also customize the recipe by adding different flavorings, such as garlic, mustard, or herbs, to create a unique taste. Enjoy!

In the following example, the user has had a conversation with the assistant about tourist sites in Paris. Next, the user is inquiring about the first option recommended by the chat assistant.

Input:
User: I am going to Paris, what should I see?
Assistant: Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:

1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.
2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.
3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.

These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.
User: What is so great about #1?

Output:
Assistant:  The Eiffel Tower is a world-renowned iconic landmark that symbolizes Paris and France. It was built for the World's Fair in 1889 and was initially intended to be a temporary structure, but it has become a permanent part of the Parisian skyline. Here are some reasons why the Eiffel Tower is so great:

1. Engineering marvel: The Eiffel Tower was a groundbreaking feat of engineering at the time of its construction. It stands at 324 meters (1,063 feet) tall and was the tallest structure in the world for over 40 years.
2. Iconic design: The Eiffel Tower's distinctive shape and design make it instantly recognizable. It has been the subject of countless works of art, photographs, and films, and has become a symbol of Paris and France.
3. Panoramic views: The Eiffel Tower offers breathtaking views of the city from its observation decks. Visitors can enjoy a 360-degree view of Paris, including famous landmarks such as the Seine River, the Arc de Triomphe, and the Montmartre neighborhood.
4. Historical significance: The Eiffel Tower has played a significant role in Paris's history, having been the site of numerous important events, including the famous "Battle of the Eiffel Tower" during World War II.
5. Cultural impact: The Eiffel Tower has had a significant impact on popular culture, inspiring countless works of art, literature, and music. It has also been the subject of numerous documentaries and films, including the classic movie "An American in Paris."
6. Romantic setting: The Eiffel Tower is known for its romantic atmosphere, making it a popular destination for couples and newlyweds. It has been the site of countless marriage proposals and has been featured in numerous romantic films and literature.
7. Accessibility: The Eiffel Tower is easily accessible by public transportation, making it a convenient destination for visitors. It is also wheelchair-friendly, with elevators and ramps providing access to all levels.

Overall, the Eiffel Tower is a must-see destination for anyone visiting Paris, offering a unique and unforgettable experience that combines history, culture, engineering, and rom

In the following examples, we set the system’s configuration:

Input:
System: Always answer with Haiku
User: I am going to Paris, what should I see?

Output:
Assistant:  Eiffel Tower high
Art in the Louvre Museum
River Seine flows

Input:
System: Always answer with emojis
User: How to go from Beijing to NY?

Output:
Assistant:  Here are the emojis that represent the steps to go from Beijing to New York:

🇨🇳🛫🛬🇺🇸🗽🏙🚕💨🛩🛬🛫😴💤🛩🛬🛫😍

Clean up

After you're done running the notebook, make sure to delete all the resources that you created in the process so that your billing is stopped:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Llama 2 models in SageMaker Studio. With this, you have access to six Llama 2 foundation models that contain billions of parameters. Because foundation models are pre-trained, they can also help lower training and infrastructure costs and enable customization for your use case. To get started with SageMaker JumpStart, visit the following resources:


About the authors

June Won is a product manager with SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications. His experience at Amazon also includes mobile shopping application and last mile delivery.

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from the University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Sundar Ranganathan is the Global Head of GenAI/Frameworks GTM Specialists at AWS. He focuses on developing GTM strategy for large language models, GenAI, and large-scale ML workloads across AWS services like Amazon EC2, EKS, EFA, AWS Batch, and Amazon SageMaker. His experience includes leadership roles in product management and product development at NetApp, Micron Technology, Qualcomm, and Mentor Graphics.

Read More

Configure cross-account access of Amazon Redshift clusters in Amazon SageMaker Studio using VPC peering

Configure cross-account access of Amazon Redshift clusters in Amazon SageMaker Studio using VPC peering

With cloud computing making compute power and data more available, machine learning (ML) is now making an impact across every industry and is becoming a core part of every business.

Amazon SageMaker Studio is the first fully integrated ML development environment (IDE) with a web-based visual interface. You can perform all ML development steps and have complete access, control, and visibility into each step required to build, train, and deploy models.

Amazon Redshift is a fully managed, fast, secure, and scalable cloud data warehouse. Organizations often want to use SageMaker Studio to get predictions from data stored in a data warehouse such as Amazon Redshift.

As described in the AWS Well-Architected Framework, separating workloads across accounts enables your organization to set common guardrails while isolating environments. This can be particularly useful for certain security requirements, as well as to simplify cost controls and monitoring between projects and teams. Organizations with a multi-account architecture typically have Amazon Redshift and SageMaker Studio in two separate AWS accounts. Also, Amazon Redshift and SageMaker Studio are typically configured in VPCs with private subnets to improve security and reduce the risk of unauthorized access as a best practice.

Amazon Redshift natively supports cross-account data sharing when RA3 node types are used. If you’re using any other Amazon Redshift node types, such as DS2 or DC2, you can use VPC peering to establish a cross-account connection between Amazon Redshift and SageMaker Studio.

In this post, we walk through step-by-step instructions to establish a cross-account connection to any Amazon Redshift node type (RA3, DC2, DS2) by connecting the Amazon Redshift cluster located in one AWS account to SageMaker Studio in another AWS account in the same Region using VPC peering.

Solution overview

We start with two AWS accounts: a producer account with the Amazon Redshift data warehouse, and a consumer account for Amazon SageMaker ML use cases that has SageMaker Studio set up. The following is a high-level overview of the workflow:

  1. Set up SageMaker Studio with VPCOnly mode in the consumer account. This prevents SageMaker from providing internet access to your studio notebooks. All SageMaker Studio traffic is through the specified VPC and subnets.
  2. Update your SageMaker Studio domain to turn on SourceIdentity to propagate the user profile name.
  3. Create an AWS Identity and Access Management (IAM) role in the Amazon Redshift producer account that the SageMaker Studio IAM role will assume to access Amazon Redshift.
  4. Update the SageMaker IAM execution role in the SageMaker Studio consumer account that SageMaker Studio will use to assume the role in the producer Amazon Redshift account.
  5. Set up a peering connection between VPCs in the Amazon Redshift producer account and SageMaker Studio consumer account.
  6. Query Amazon Redshift in SageMaker Studio in the consumer account.

The following diagram illustrates our solution architecture.

Prerequisites

The steps in this post assume that Amazon Redshift is launched in a private subnet in the Amazon Redshift producer account. Launching Amazon Redshift in a private subnet provides an additional layer of security and isolation compared to launching it in a public subnet, because a private subnet is not directly accessible from the internet and is therefore better protected from external attacks.

To download public libraries, you must create a VPC with a private and a public subnet in the SageMaker consumer account. Then launch a NAT gateway in the public subnet (with an internet gateway attached to the VPC) and route the private subnet’s outbound traffic through it so that SageMaker Studio in the private subnet can access the internet. For instructions on how to establish a connection to a private subnet, refer to How do I set up a NAT gateway for a private subnet in Amazon VPC?
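If you prefer to script this prerequisite, the following boto3 sketch shows one way to do it; the subnet, allocation, and route table IDs are placeholders for your own resources:

import boto3

ec2 = boto3.client("ec2")

# Allocate an Elastic IP and create the NAT gateway in the public subnet
eip = ec2.allocate_address(Domain="vpc")
nat = ec2.create_nat_gateway(
    SubnetId="<public-subnet-id>",
    AllocationId=eip["AllocationId"],
)
nat_gateway_id = nat["NatGateway"]["NatGatewayId"]
ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_gateway_id])

# Route internet-bound traffic from the private subnet through the NAT gateway
ec2.create_route(
    RouteTableId="<private-subnet-route-table-id>",
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=nat_gateway_id,
)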

Set up SageMaker Studio with VPCOnly mode in the consumer account

To create SageMaker Studio with VPCOnly mode, complete the following steps:

  1. On the SageMaker console, choose Studio in the navigation pane.
  2. Launch SageMaker Studio, choose Standard setup, and choose Configure.

If you’re already using AWS IAM Identity Center (successor to AWS Single Sign-On) for accessing your AWS accounts, you can use it for authentication. Otherwise, you can use IAM for authentication and use your existing federated roles.

  3. In the General settings section, select Create a new role.
  4. In the Create an IAM role section, optionally specify your Amazon Simple Storage Service (Amazon S3) buckets by selecting Any, Specific, or None, then choose Create role.

This creates a SageMaker execution role, such as AmazonSageMaker-ExecutionRole-00000000.

  5. In the Network and Storage section, choose the VPC, subnet (private subnet), and security group that you created as a prerequisite.
  6. Select VPC Only, then choose Next.
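The same VPCOnly setup can also be scripted. The following boto3 sketch is a rough programmatic equivalent; the domain name is hypothetical, and the role, VPC, subnet, and security group IDs are placeholders for the resources described above:

import boto3

sm = boto3.client("sagemaker")

sm.create_domain(
    DomainName="studio-consumer-domain",  # hypothetical domain name
    AuthMode="IAM",
    AppNetworkAccessType="VpcOnly",  # keep all Studio traffic inside the VPC
    VpcId="<vpc-id>",
    SubnetIds=["<private-subnet-id>"],
    DefaultUserSettings={
        "ExecutionRole": "arn:aws:iam::<account-id>:role/service-role/AmazonSageMaker-ExecutionRole-00000000",
        "SecurityGroups": ["<security-group-id>"],
    },
)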

Update your SageMaker Studio domain to turn on SourceIdentity to propagate the user profile name

SageMaker Studio is integrated with AWS CloudTrail to enable administrators to monitor and audit user activity and API calls from SageMaker Studio notebooks. You can configure SageMaker Studio to record the user identity (specifically, the user profile name) to monitor and audit user activity and API calls from SageMaker Studio notebooks in CloudTrail events.

To log specific user activity among several user profiles, we recommend that you turn on SourceIdentity so that the SageMaker Studio domain propagates the user profile name. This allows you to persist the user information in the session so you can attribute actions to a specific user. This attribute is also persisted when you chain roles, so you can get fine-grained visibility into a user’s actions in the producer account. As of the time this post was written, you can only configure this using the AWS Command Line Interface (AWS CLI) or another command line tool.

To update this configuration, all apps in the domain must be in the Stopped or Deleted state.

Use the following code to enable the propagation of the user profile name as the SourceIdentity:

aws sagemaker update-domain
--domain-id <value>
[--default-user-settings <value>]
[--domain-settings-for-update "ExecutionRoleIdentityConfig=USER_PROFILE_NAME"]

This requires that you add sts:SetSourceIdentity in the trust relationship for your execution role.
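For example, the trust policy on the SageMaker execution role could be updated along the lines of the following sketch (the role name is a placeholder), with sts:SetSourceIdentity allowed alongside sts:AssumeRole:

import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
        }
    ],
}

# Update the trust relationship of the SageMaker execution role (placeholder name)
iam.update_assume_role_policy(
    RoleName="AmazonSageMaker-ExecutionRole-00000000",
    PolicyDocument=json.dumps(trust_policy),
)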

Create an IAM role in the Amazon Redshift producer account that SageMaker Studio must assume to access Amazon Redshift

To create a role that SageMaker will assume to access Amazon Redshift, complete the following steps:

  1. Open the IAM console in the Amazon Redshift producer account.
  2. Choose Roles in the navigation pane, then choose Create role.
  3. On the Select trusted entity page, select Custom trust policy.
  4. Enter the following custom trust policy into the editor and provide your SageMaker consumer account ID and the SageMaker execution role that you created:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<<SageMaker-Consumer-Account-ID>>:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXX"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetSourceIdentity"
           ]
            
        }
    ]
}

  5. Choose Next.
  6. On the Add required permissions page, choose Create policy.
  7. Add the following sample policy and make necessary edits based on your configuration.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetRedshiftCredentials",
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": [
                "arn:aws:redshift:<<redshift-region-name>>:<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:dbname:<<redshift-cluster-name>>/<<redshift-db-name>>",
                "arn:aws:redshift:<<redshift-region-name>>:<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:dbuser:<<redshift-cluster-name>>/${redshift:DbUser}",
                "arn:aws:redshift:<<redshift-region-name>>:<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:cluster:<<redshift-cluster-name>>"
            ],
            "Condition": {
                "StringEquals": {
                    "redshift:DbUser": "${aws:SourceIdentity}"
                }
            }
        },
        {
            "Sid": "DynamicUserCreation",
            "Effect": "Allow",
            "Action": "redshift:CreateClusterUser",
            "Resource": "arn:aws:redshift:<<redshift-region-name>>:<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:dbuser:<<redshift-cluster-name>>/${redshift:DbUser}",
            "Condition": {
                "StringEquals": {
                    "redshift:DbUser": "${aws:SourceIdentity}"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "redshift:JoinGroup",
            "Resource": "arn:aws:redshift:<<redshift-region-name>>:<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:dbgroup:<<redshift-cluster-name>>/*"
        },
        {
            "Sid": "DataAPIPermissions",
            "Effect": "Allow",
            "Action": [
                "redshift-data:ExecuteStatement",
                "redshift-data:CancelStatement",
                "redshift-data:ListStatements",
                "redshift-data:GetStatementResult",
                "redshift-data:DescribeStatement",
                "redshift-data:ListDatabases",
                "redshift-data:ListSchemas",
                "redshift-data:ListTables",
                "redshift-data:DescribeTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ReadPermissions",
            "Effect": "Allow",
            "Action": [
                "redshift:Describe*",
                "redshift:ViewQueriesInConsole"
            ],
            "Resource": "*"
        }
    ]
}

  8. Save the policy by adding a name, such as RedshiftROAPIUserAccess.

The SourceIdentity attribute is used to tie the identity of the original SageMaker Studio user to the Amazon Redshift database user. The actions by the user in the producer account can then be monitored using CloudTrail and Amazon Redshift database audit logs.

  9. On the Name, review, and create page, enter a role name, review the settings, and choose Create role.

Update the IAM role in the SageMaker consumer account that SageMaker Studio assumes in the Amazon Redshift producer account

To update the SageMaker execution role for it to assume the role that we just created, complete the following steps:

  1. Open the IAM console in the SageMaker consumer account.
  2. Choose Roles in the navigation pane, then choose the SageMaker execution role that we created (AmazonSageMaker-ExecutionRole-*).
  3. In the Permissions policy section, on the Add permissions menu, choose Create inline policy.

  4. In the editor, on the JSON tab, enter the following policy, where <StudioRedshiftRoleARN> is the ARN of the role you created in the Amazon Redshift producer account:
{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "<StudioRedshiftRoleARN>"
    }
}

You can get the ARN of the role created in the Amazon Redshift producer account on the IAM console, as shown in the following screenshot.

  5. Choose Review policy.
  6. For Name, enter a name for your policy.
  7. Choose Create policy.

Your permission policies should look similar to the following screenshot.

Set up a peering connection between the VPCs in the Amazon Redshift producer account and SageMaker Studio consumer account

To establish communication between the SageMaker Studio VPC and Amazon Redshift VPC, the two VPCs need to be peered using VPC peering. Complete the following steps to establish a connection:

  1. In either the Amazon Redshift or SageMaker account, open the Amazon VPC console.
  2. In the navigation pane, choose Peering connections, then choose Create peering connection.
  3. For Name, enter a name for your connection.
  4. Under Select a local VPC to peer with, choose a local VPC.
  5. Under Select another VPC to peer with, specify another VPC in the same Region and another account.
  6. Choose Create peering connection.

  7. Review the VPC peering connection and choose Accept request to activate it.

After the VPC peering connection is successfully established, you create routes on both the SageMaker and Amazon Redshift VPCs to complete connectivity between them.

  8. In the SageMaker account, open the Amazon VPC console.
  9. Choose Route tables in the navigation pane, then choose the route table that is associated with the SageMaker VPC and edit its routes.
  10. Add a route with the Amazon Redshift VPC CIDR as the destination and the peering connection as the target.
  11. Additionally, add a route to the NAT gateway.
  12. Choose Save changes.

  13. In the Amazon Redshift account, open the Amazon VPC console.
  14. Choose Route tables in the navigation pane, then choose the route table that is associated with the Amazon Redshift VPC and edit its routes.
  15. Add a route with the SageMaker VPC CIDR as the destination and the peering connection as the target.
  16. Additionally, add a route to the internet gateway.
  17. Choose Save changes.
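If you would rather automate these networking steps, a boto3 sketch along the following lines could create and accept the peering connection and add the routes. All IDs, CIDRs, and the profile name are placeholders, and the accept call must run with credentials for the accepter (Amazon Redshift producer) account:

import boto3

# Requester side: the SageMaker consumer account
ec2_consumer = boto3.client("ec2")
peering = ec2_consumer.create_vpc_peering_connection(
    VpcId="<sagemaker-vpc-id>",
    PeerVpcId="<redshift-vpc-id>",
    PeerOwnerId="<REDSHIFT-PRODUCER-ACCOUNT-ID>",
    PeerRegion="<region-name>",
)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# Accepter side: run with credentials for the Amazon Redshift producer account
ec2_producer = boto3.Session(profile_name="redshift-producer").client("ec2")
ec2_producer.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# Route from the SageMaker VPC to the Amazon Redshift VPC CIDR via the peering connection
ec2_consumer.create_route(
    RouteTableId="<sagemaker-route-table-id>",
    DestinationCidrBlock="<redshift-vpc-cidr>",
    VpcPeeringConnectionId=pcx_id,
)

# Route from the Amazon Redshift VPC back to the SageMaker VPC CIDR
ec2_producer.create_route(
    RouteTableId="<redshift-route-table-id>",
    DestinationCidrBlock="<sagemaker-vpc-cidr>",
    VpcPeeringConnectionId=pcx_id,
)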

You can connect to SageMaker Studio from your VPC through an interface endpoint in your VPC instead of connecting over the internet. When you use a VPC interface endpoint, communication between your VPC and the SageMaker API or runtime is conducted entirely and securely within the AWS network.

  18. To create a VPC endpoint, in the SageMaker account, open the VPC console.
  19. Choose Endpoints in the navigation pane, then choose Create endpoint.
  20. Specify the SageMaker VPC, the respective subnets and appropriate security groups to allow inbound and outbound NFS traffic for your SageMaker notebooks domain, and choose Create VPC endpoint.
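As a rough programmatic sketch of that last step (shown here for the SageMaker API endpoint; the service name must use your Region, and the IDs are placeholders):

import boto3

ec2 = boto3.client("ec2")

# Interface endpoint for the SageMaker API inside the SageMaker VPC
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="<sagemaker-vpc-id>",
    ServiceName="com.amazonaws.<region-name>.sagemaker.api",
    SubnetIds=["<private-subnet-id>"],
    SecurityGroupIds=["<security-group-id>"],
    PrivateDnsEnabled=True,
)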

Query Amazon Redshift in SageMaker Studio in the consumer account

After all the networking has been successfully established, follow the steps in this section to connect to the Amazon Redshift cluster in the SageMaker Studio consumer account using the AWS SDK for pandas library:

  1. In SageMaker Studio, create a new notebook.
  2. If the AWS SDK for pandas package is not installed, you can install it using the following command:
!pip install awswrangler #AWS SDK for pandas

This installation is not persistent and will be lost if the KernelGateway App is deleted. Custom packages can be added as part of a Lifecycle Configuration.

  3. Enter the following code in the first cell and run the code. Replace RoleArn and region_name values based on your account settings:
import boto3
import awswrangler as wr
import pandas as pd
from datetime import datetime
import json
sts_client = boto3.client('sts')

# Call the assume_role method of the STSConnection object and pass the role
# ARN and a role session name.
assumed_role_object=sts_client.assume_role(
    RoleArn="arn:aws:iam::<<REDSHIFT-PRODUCER-ACCOUNT-ID>>:role/<<redshift-account-role>>",
    RoleSessionName="RedshiftSession"
)
credentials=assumed_role_object['Credentials']

# Use the temporary credentials that AssumeRole returns to make a 
# connection to Amazon S3  
redshift_session=boto3.Session(
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken'],
    region_name="<<redshift-region-name>>",
)
  4. Enter the following code in a new cell and run the code to get the current SageMaker user profile name:
def get_userprofile_name():
    metadata_file_path = '/opt/ml/metadata/resource-metadata.json'
    with open(metadata_file_path, 'r') as logs:
        metadata = json.load(logs)
    return metadata.get("UserProfileName")
  5. Enter the following code in a new cell and run the code:
con_redshift = wr.redshift.connect_temp(
    cluster_identifier="<<redshift-cluster-name>>",
    database="<<redshift-db-name>>",
    user=get_userprofile_name(),
    auto_create=True,
    db_groups=[<<list-redshift-user-group>>],
    boto3_session = redshift_session
)

To successfully query Amazon Redshift, your database administrator needs to grant the newly created user the required read permissions within the Amazon Redshift cluster in the producer account.

  6. Enter the following code in a new cell, update the query to match your Amazon Redshift table, and run the cell. This should return the records successfully for further data processing and analysis.
df = wr.redshift.read_sql_query(
    sql="SELECT * FROM users",
    con=con_redshift
)

You can now start building your data transformations and analysis based on your business requirements.

Clean up

To clean up any resources to avoid incurring recurring costs, delete the SageMaker VPC endpoints, Amazon Redshift cluster, and SageMaker Studio apps, users, and domain. Also delete any S3 buckets and objects you created.

Conclusion

In this post, we showed how to establish a cross-account connection between private Amazon Redshift and SageMaker Studio VPCs in different accounts using VPC peering and access Amazon Redshift data in SageMaker Studio using IAM role chaining, while also logging the user identity when the user accessed Amazon Redshift from SageMaker Studio. With this solution, you eliminate the need to manually move data between accounts to access data. We also walked through how to access the Amazon Redshift cluster using the AWS SDK for pandas library in SageMaker Studio and prepare the data for your ML use cases.

To learn more about Amazon Redshift and SageMaker, refer to the Amazon Redshift Database Developer Guide and Amazon SageMaker Documentation.


About the Authors

Supriya Puragundla is a Senior Solutions Architect at AWS. She helps key customer accounts on their AI and ML journey. She is passionate about data-driven AI and the area of depth in machine learning.

Marc Karp is a Machine Learning Architect with the Amazon SageMaker team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Read More

Effectively solve distributed training convergence issues with Amazon SageMaker Hyperband Automatic Model Tuning

Effectively solve distributed training convergence issues with Amazon SageMaker Hyperband Automatic Model Tuning

Recent years have shown amazing growth in deep learning neural networks (DNNs). This growth can be seen in more accurate models and even new possibilities opened up by generative AI: large language models (LLMs) that synthesize natural language, text-to-image generators, and more. These increased capabilities of DNNs come with the cost of having massive models that require significant computational resources in order to be trained. Distributed training addresses this problem with two techniques: data parallelism and model parallelism. Data parallelism is used to scale the training process over multiple nodes and workers, and model parallelism splits a model into parts and distributes those parts over the designated infrastructure. Amazon SageMaker distributed training jobs enable you, with one click (or one API call), to set up a distributed compute cluster, train a model, save the result to Amazon Simple Storage Service (Amazon S3), and shut down the cluster when complete. Furthermore, SageMaker has continuously innovated in the distributed training space by launching features like heterogeneous clusters and distributed training libraries for data parallelism and model parallelism.

Efficient training on a distributed environment requires adjusting hyperparameters. A common example of good practice when training on multiple GPUs is to multiply batch (or mini-batch) size by the GPU number in order to keep the same batch size per GPU. However, adjusting hyperparameters often impacts model convergence. Therefore, distributed training needs to balance three factors: distribution, hyperparameters, and model accuracy.

In this post, we explore the effect of distributed training on convergence and how to use Amazon SageMaker Automatic Model Tuning to fine-tune model hyperparameters for distributed training using data parallelism.

The source code mentioned in this post can be found on the GitHub repository (an m5.xlarge instance is recommended).

Scale out training from a single to distributed environment

Data parallelism is a way to scale the training process to multiple compute resources and achieve faster training time. With data parallelism, data is partitioned among the compute nodes, and each node computes the gradients based on its partition and updates the model. These updates can be done using one or multiple parameter servers in an asynchronous, one-to-many, or all-to-all fashion. Another way can be to use an AllReduce algorithm. For example, in the ring-allreduce algorithm, each node communicates with only two of its neighboring nodes, thereby reducing the overall data transfers. To learn more about parameter servers and ring-allreduce, see Launching TensorFlow distributed training easily with Horovod or Parameter Servers in Amazon SageMaker. With regard to data partitioning, if there are n compute nodes, then each node should get a subset of the data, approximately 1/n in size.
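As a toy illustration of that 1/n partitioning (not the mechanism SageMaker or any particular framework uses internally), each worker could be assigned a disjoint, strided shard of the samples:

def shard_dataset(samples, num_workers, worker_rank):
    """Return the subset of samples assigned to one worker (roughly 1/num_workers of the data)."""
    # A simple strided shard: worker k takes samples k, k + n, k + 2n, ...
    return samples[worker_rank::num_workers]

# Example: 4 workers each receive roughly a quarter of 1,000 samples
data = list(range(1000))
shards = [shard_dataset(data, num_workers=4, worker_rank=r) for r in range(4)]
print([len(s) for s in shards])  # [250, 250, 250, 250]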

To demonstrate the effect of scaling out training on model convergence, we ran two simple experiments: training a DNN image classification model on the MNIST dataset, and training an XGBoost binary classification model on the Direct Marketing dataset.

Each model training ran twice: on a single instance and distributed over multiple instances. For the DNN distributed training, in order to fully utilize the distributed processors, we multiplied the mini-batch size by the number of instances (four). The following table summarizes the setup and results.

| | DNN (single instance) | DNN (distributed) | XGBoost (single instance) | XGBoost (distributed) |
|---|---|---|---|---|
| Problem type | Image classification | Image classification | Binary classification | Binary classification |
| Instance | ml.c4.xlarge | ml.c4.xlarge | ml.m5.2xlarge | ml.m5.2xlarge |
| Dataset | MNIST (labeled images) | MNIST (labeled images) | Direct Marketing (tabular, numeric and vectorized categories) | Direct Marketing (tabular, numeric and vectorized categories) |
| Validation metric | Accuracy | Accuracy | AUC | AUC |
| Epochs/Rounds | 20 | 20 | 150 | 150 |
| Number of instances | 1 | 4 | 1 | 3 |
| Distribution type | N/A | Parameter server | N/A | AllReduce |
| Training time (minutes) | 8 | 3 | 3 | 1 |
| Final validation score | 0.97 | 0.11 | 0.78 | 0.63 |

For both models, the training time was reduced almost linearly by the distribution factor. However, model convergence suffered a significant drop. This behavior is consistent for the two different models, the different compute instances, the different distribution methods, and different data types. So, why did distributing the training process affect model accuracy?

There are a number of theories that try to explain this effect:

  • When tensor updates are big in size, traffic between workers and the parameter server can get congested. Therefore, asynchronous parameter servers will suffer significantly worse convergence due to delays in weights updates [1].
  • Increasing batch size can lead to over-fitting and poor generalization, thereby reducing the validation accuracy [2].
  • When asynchronously updating model parameters, some DNNs might not be using the most recent updated model weights; therefore, they will be calculating gradients based on weights that are a few iterations behind. This leads to weight staleness [3] and can be caused by a number of reasons.
  • Some hyperparameters are model or optimizer specific. For example, the XGBoost official documentation says that the exact value for the tree_method hyperparameter doesn’t support distributed training, because XGBoost employs row-based data splitting whereas the exact tree method works on a sorted column format.
  • Some researchers proposed that configuring a larger mini-batch may lead to gradients with less stochasticity. This can happen when the loss function contains local minima and saddle points and no change is made to the step size, causing the optimization to get stuck in such local minima or saddle points [4].

Optimize for distributed training

Hyperparameter optimization (HPO) is the process of searching and selecting a set of hyperparameters that are optimal for a learning algorithm. SageMaker Automatic Model Tuning (AMT) provides HPO as a managed service by running multiple training jobs on the provided dataset. SageMaker AMT searches the ranges of hyperparameters that you specify and returns the best values, as measured by a metric that you choose. You can use SageMaker AMT with the built-in algorithms or use your custom algorithms and containers.

However, optimizing for distributed training differs from common HPO because instead of launching a single instance per training job, each job actually launches a cluster of instances. This means a greater impact on cost (especially if you consider costly GPU-accelerated instances, which are typical for DNNs). In addition to AMT limits, you could possibly hit SageMaker account limits for the number of concurrent training instances. Finally, launching clusters can introduce operational overhead due to longer startup times. SageMaker AMT has specific features to address these issues. Hyperband with early stopping ensures that well-performing hyperparameter configurations are fine-tuned and those that underperform are automatically stopped. This enables efficient use of training time and reduces unnecessary costs. SageMaker AMT also fully supports the use of Amazon EC2 Spot Instances, which can reduce the cost of training by up to 90% compared to On-Demand Instances. With regard to long start times, SageMaker AMT automatically reuses training instances within each tuning job, thereby reducing the average startup time of each training job by 20 times. Additionally, you should follow AMT best practices, such as choosing the relevant hyperparameters, their appropriate ranges and scales, and the best number of concurrent training jobs, and setting a random seed to reproduce results.
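For example, Spot capacity is enabled on the estimator that the tuner launches. The following is a hedged sketch that mirrors the distributed XGBoost setup described earlier, with the role ARN and bucket as placeholders:

import sagemaker
from sagemaker.estimator import Estimator

region = sagemaker.Session().boto_region_name
xgb_image = sagemaker.image_uris.retrieve("xgboost", region, "1.5-1")

# Hypothetical distributed estimator with Spot capacity enabled
xgb3_spot = Estimator(
    image_uri=xgb_image,
    role="<execution-role-arn>",
    instance_count=3,
    instance_type="ml.m5.2xlarge",
    output_path="s3://<bucket>/output/",
    use_spot_instances=True,  # Spot capacity can cut training cost by up to ~90%
    max_run=3600,             # maximum training time in seconds
    max_wait=7200,            # must be >= max_run when Spot is enabled
)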

In the next section, we see these features in action as we configure, run, and analyze an AMT job using the XGBoost example we discussed earlier.

Configure, run, and analyze a tuning job

As mentioned earlier, the source code can be found on the GitHub repo. In Steps 1–5, we download and prepare the data, create the xgb3 estimator (the distributed XGBoost estimator is set to use three instances), run the training jobs, and observe the results. In this section, we describe how to set up the tuning job for that estimator, assuming you already went through Steps 1–5.

A tuning job computes optimal hyperparameters for the training jobs it launches by using a metric to evaluate performance. You can configure your own metric, which your training job emits to stdout and SageMaker parses based on a regex that you configure, or use the metrics of SageMaker built-in algorithms. In this example, we use the built-in XGBoost objective metric, so we don’t need to configure a regex. To optimize for model convergence, we optimize based on the validation AUC metric:

objective_metric_name="validation:auc"

We tune seven hyperparameters:

  • num_round – Number of rounds for boosting during the training.
  • eta – Step size shrinkage used in updates to prevent overfitting.
  • alpha – L1 regularization term on weights.
  • min_child_weight – Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, the building process gives up further partitioning.
  • max_depth – Maximum depth of a tree.
  • colsample_bylevel – Subsample ratio of columns for each split, in each level. This subsampling takes place once for every new depth level reached in a tree.
  • colsample_bytree – Subsample ratio of columns when constructing each tree. For every tree constructed, the subsampling occurs once.

To learn more about XGBoost hyperparameters, see XGBoost Hyperparameters. The following code shows the seven hyperparameters and their ranges:

hyperparameter_ranges = {
    "num_round": IntegerParameter(100, 200),
    "eta": ContinuousParameter(0, 1),
    "min_child_weight": ContinuousParameter(1, 10),
    "alpha": ContinuousParameter(0, 2),
    "max_depth": IntegerParameter(1, 10),
    "colsample_bylevel": ContinuousParameter(0, 1),
    "colsample_bytree": ContinuousParameter(0, 1),
}

Next, we provide the configuration for the Hyperband strategy and the tuner object configuration using the SageMaker SDK. HyperbandStrategyConfig can use two parameters: max_resource (optional) for the maximum number of iterations to be used for a training job to achieve the objective, and min_resource – the minimum number of iterations to be used by a training job before stopping the training. We use HyperbandStrategyConfig to configure StrategyConfig, which is later used by the tuning job definition. See the following code:

hsc = HyperbandStrategyConfig(max_resource=30, min_resource=1)
sc = StrategyConfig(hyperband_strategy_config=hsc)

Now we create a HyperparameterTuner object, to which we pass the following information:

  • The XGBoost estimator, set to run with three instances
  • The objective metric name and definition
  • Our hyperparameter ranges
  • Tuning resource configurations such as number of training jobs to run in total and how many training jobs can be run in parallel
  • Hyperband settings (the strategy and configuration we configured in the last step)
  • Early stopping (early_stopping_type) set to Off

Why do we set early stopping to Off? Training jobs can be stopped early when they are unlikely to improve the objective metric of the hyperparameter tuning job. This can help reduce compute time and avoid overfitting your model. However, Hyperband uses an advanced built-in mechanism to apply early stopping. Therefore, the parameter early_stopping_type must be set to Off when using the Hyperband internal early stopping feature. See the following code:

tuner = HyperparameterTuner(
    xgb3,
    objective_metric_name,
    hyperparameter_ranges,
    max_jobs=30,
    max_parallel_jobs=4,
    strategy="Hyperband",
    early_stopping_type="Off",
    strategy_config=sc
)

Finally, we start the automatic model tuning job by calling the fit method. If you want to launch the job in an asynchronous fashion, set wait to False. See the following code:

tuner.fit(
    {"train": s3_input_train, "validation": s3_input_validation},
    include_cls_metadata=False,
    wait=True,
)

You can follow the job progress and summary on the SageMaker console. In the navigation pane, under Training, choose Hyperparameter tuning jobs, then choose the relevant tuning job. The following screenshot shows the tuning job with details on the training jobs’ status and performance.

When the tuning job is complete, we can review the results. In the notebook example, we show how to extract results using the SageMaker SDK. First, we examine how the tuning job increased model convergence. You can attach the HyperparameterTuner object using the job name and call the describe method. The method returns a dictionary containing tuning job metadata and results.

In the following code, we retrieve the value of the best-performing training job, as measured by our objective metric (validation AUC):

tuner = HyperparameterTuner.attach(tuning_job_name=tuning_job_name)
tuner.describe()["BestTrainingJob"]["FinalHyperParameterTuningJobObjectiveMetric"]["Value"]

The result is 0.78 in AUC on the validation set. That’s a significant improvement over the initial 0.63!

Next, let’s see how fast our training job ran. For that, we use the HyperparameterTuningJobAnalytics method in the SDK to fetch results about the tuning job, and read into a Pandas data frame for analysis and visualization:

tuner_analytics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)
full_df = tuner_analytics.dataframe()
full_df.sort_values(by=["FinalObjectiveValue"], ascending=False).head()

Let’s see the average time a training job took with Hyperband strategy:

full_df["TrainingElapsedTimeSeconds"].mean()

The average training job took approximately 1 minute. This is consistent with the Hyperband strategy mechanism, which stops underperforming training jobs early. In terms of cost, the tuning job charged us for a total of 30 minutes of training time. Without Hyperband early stopping, the total billable training duration was expected to be 90 minutes (30 jobs * 1 minute per job * 3 instances per job). That is a threefold cost saving! Finally, we see that the tuning job ran 30 training jobs and took a total of 12 minutes. That is almost 50% less than the expected time (30 jobs / 4 jobs in parallel * 3 minutes per job).
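To sanity-check these numbers yourself, you can aggregate the analytics DataFrame from the previous step (a quick sketch that assumes 3 instances per training job, matching our cluster configuration):

# Approximate billable training minutes: per-job elapsed time summed, times instances per job
instances_per_job = 3
billable_minutes = full_df["TrainingElapsedTimeSeconds"].sum() / 60 * instances_per_job
print(f"Approximate billable training minutes: {billable_minutes:.0f}")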

Conclusion

In this post, we described some observed convergence issues when training models in distributed environments. We saw that SageMaker AMT using Hyperband addressed the main concerns that optimizing data parallel distributed training introduced: convergence (which improved by more than 10%), operational efficiency (the tuning job took 50% less time than a sequential, non-optimized job would have taken), and cost-efficiency (30 versus 90 billable minutes of training time). The following table summarizes our results:

| Improvement metric | No tuning/naive model tuning implementation | SageMaker Hyperband Automatic Model Tuning | Measured improvement |
|---|---|---|---|
| Model quality (measured by validation AUC) | 0.63 | 0.78 | 15% |
| Cost (measured by billable training minutes) | 90 | 30 | 66% |
| Operational efficiency (measured by total running time, minutes) | 24 | 12 | 50% |

In order to fine-tune with regards to scaling (cluster size), you can repeat the tuning job with multiple cluster configurations and compare the results to find the optimal hyperparameters that satisfy speed and model accuracy.

We included the steps to achieve this in the last section of the notebook.

References

[1] Lian, Xiangru, et al. “Asynchronous decentralized parallel stochastic gradient descent.” International Conference on Machine Learning. PMLR, 2018.

[2] Keskar, Nitish Shirish, et al. “On large-batch training for deep learning: Generalization gap and sharp minima.” arXiv preprint arXiv:1609.04836 (2016).

[3] Dai, Wei, et al. “Toward understanding the impact of staleness in distributed machine learning.” arXiv preprint arXiv:1810.03264 (2018).

[4] Dauphin, Yann N., et al. “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization.” Advances in neural information processing systems 27 (2014).


About the Author

Uri Rosenberg is the AI & ML Specialist Technical Manager for Europe, Middle East, and Africa. Based out of Israel, Uri works to empower enterprise customers to design, build, and operate ML workloads at scale. In his spare time, he enjoys cycling, hiking, and complaining about data preparation.

Read More

Access private repos using the @remote decorator for Amazon SageMaker training workloads

Access private repos using the @remote decorator for Amazon SageMaker training workloads

As more and more customers are looking to put machine learning (ML) workloads in production, there is a large push in organizations to shorten the development lifecycle of ML code. Many organizations prefer writing their ML code in a production-ready style in the form of Python methods and classes as opposed to an exploratory style (writing code without using methods or classes) because this helps them ship production-ready code faster.

With Amazon SageMaker, you can use the @remote decorator to run a SageMaker training job simply by annotating your Python code with an @remote decorator. The SageMaker Python SDK will automatically translate your existing workspace environment and any associated data processing code and datasets into a SageMaker training job that runs on the SageMaker training platform.

Running a Python function locally often requires several dependencies, which may not come with the local Python runtime environment. You can install them via package and dependency management tools like pip or conda.

However, organizations operating in regulated industries like banking, insurance, and healthcare operate in environments that have strict data privacy and networking controls in place. These controls often mandate having no internet access available to any of their environments. The reason for such a restriction is to have full control over egress and ingress traffic so they can reduce the chances of unscrupulous actors sending or receiving non-verified information through their network. It’s also often mandated to have such network isolation as part of audit and industry compliance rules. When it comes to ML, this restricts data scientists from downloading any package from public repositories like PyPI, Anaconda, or Conda-Forge.

To provide data scientists access to the tools of their choice while also respecting the restrictions of the environment, organizations often set up their own private package repository hosted in their own environment. You can set up private package repositories on AWS in multiple ways:

In this post, we focus on the first option: using CodeArtifact.

Solution overview

The following architecture diagram shows the solution architecture.

Solution-Architecture-vpc-no-internet

The high-level steps to implement the solution are as follows:

  • Set up a virtual private cloud (VPC) with no internet access using an AWS CloudFormation template.
  • Use a second CloudFormation template to set up CodeArtifact as a private PyPI repository and provide connectivity to the VPC, and set up an Amazon SageMaker Studio environment to use the private PyPI repository.
  • Train a classification model based on the MNIST dataset using an @remote decorator from the open-source SageMaker Python SDK. All the dependencies will be downloaded from the private PyPI repository.

Note that using SageMaker Studio in this post is optional. You can choose to work in any integrated development environment (IDE) of your choice. You just need to set up your AWS Command Line Interface (AWS CLI) credentials correctly. For more information, refer to Configure the AWS CLI.

Prerequisites

You need an AWS account with an AWS Identity and Access Management (IAM) role with permissions to manage resources created as part of the solution. For details, refer to Creating an AWS account.

Set up a VPC with no internet connection

Create a new CloudFormation stack using the vpc.yaml template. This template creates the following resources:

  • A VPC with two private subnets across two Availability Zones with no internet connectivity
  • A Gateway VPC endpoint for accessing Amazon S3
  • Interface VPC endpoints for SageMaker, CodeArtifact, and a few other services to allow the resources in the VPC to connect to AWS services via AWS PrivateLink

Provide a stack name, such as No-Internet, and complete the stack creation process.

vpc-no-internet-stack

Wait for the stack creation process to complete.

Set up a private repository and SageMaker Studio using the VPC

The next step is to deploy another CloudFormation stack using the sagemaker_studio_codeartifact.yaml template. This template creates the following resources:

Provide a stack name and keep the default values or adjust the parameters for the CodeArtifact domain name, private repository name, user profile name for SageMaker Studio, and name for the upstream public PyPI repository. You also need to provide the VPC stack name created in the previous step.

Studio-CodeArtifact-stack

When the stack creation is complete, the SageMaker domain should be visible on the SageMaker console.

studio-domain

To verify there is no internet connection available in SageMaker Studio, launch SageMaker Studio. Choose File, New, and Terminal to launch a terminal and try to curl any internet resource. It should fail to connect, as shown in the following screenshot.

terminal-showing-no-internet

Train an image classifier using an @remote decorator with the private PyPI repository

In this section, we use the @remote decorator to run a PyTorch training job that produces a MNIST image classification model. To achieve this, we set up a configuration file, develop the training script, and run the training code.

Set up a configuration file

We set up a config.yaml file and provide the configurations needed to do the following:

  • Run a SageMaker training job in the no-internet VPC created earlier
  • Download the required packages by connecting to the private PyPI repository created earlier

The file looks like the following code:

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: '../config/requirements.txt'
        InstanceType: 'ml.m5.xlarge'
        PreExecutionCommands:
            - 'aws codeartifact login --tool pip --domain <domain-name> --domain-owner <AWS account number> --repository <private repository name> --endpoint-url <VPC-endpoint-url-prefixed with https://>'
        RoleArn: '<execution role ARN for running training job>'
        S3RootUri: '<s3 bucket to store the job output>'
        VpcConfig:
            SecurityGroupIds: 
            - '<security group id used by SageMaker Studio>'
            Subnets: 
            - '<VPC subnet id 1>'
            - '<VPC subnet id 2>'

The Dependencies field contains the path to requirements.txt, which contains all the dependencies needed. Note that all the dependencies will be downloaded from the private repository. The requirements.txt file contains the following code:

torch
torchvision
sagemaker>=2.156.0,<3

The PreExecutionCommands section contains the command to connect to the private PyPI repository. To get the CodeArtifact VPC endpoint URL, use the following code:

import boto3

# Create a session and an EC2 client to look up the CodeArtifact API VPC endpoint
boto3_session = boto3.Session()
ec2 = boto3_session.client('ec2')

response = ec2.describe_vpc_endpoints(
    Filters=[
        {
            'Name': 'service-name',
            'Values': [
                f'com.amazonaws.{boto3_session.region_name}.codeartifact.api'
            ]
        },
    ]
)

code_artifact_api_vpc_endpoint = response['VpcEndpoints'][0]['DnsEntries'][0]['DnsName']

endpoint_url = f'https://{code_artifact_api_vpc_endpoint}'
endpoint_url

Generally, we get two VPC endpoints for CodeArtifact, and we can use any of them in the connection commands. For more details, refer to Use CodeArtifact from a VPC.

Additionally, configurations like execution role, output location, and VPC configurations are provided in the config file. These configurations are needed to run the SageMaker training job. To know more about all the configurations supported, refer to Configuration file.

It’s not mandatory to use the config.yaml file in order to work with the @remote decorator. This is just a cleaner way to supply all configurations to the @remote decorator. All the configs could also be supplied directly in the decorator arguments, but that reduces readability and maintainability of changes in the long run. Also, the config file can be created by an admin and shared with all the users in an environment.
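For illustration, the same settings could instead be passed directly as decorator arguments, along the lines of the following sketch; verify the exact parameter names against the version of the SageMaker Python SDK you use:

from sagemaker.remote_function import remote

@remote(
    include_local_workdir=True,
    instance_type="ml.m5.xlarge",
    dependencies="../config/requirements.txt",
    pre_execution_commands=[
        "aws codeartifact login --tool pip --domain <domain-name> "
        "--domain-owner <AWS account number> --repository <private repository name> "
        "--endpoint-url <VPC-endpoint-url-prefixed with https://>"
    ],
    role="<execution role ARN for running training job>",
    s3_root_uri="<s3 bucket to store the job output>",
)
def perform_train(train_data, test_data):
    # pytorch native training code........
    ...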

Develop the training script

Next, we prepare the training code in simple Python files. We have divided the code into three files:

  • load_data.py – Contains the code to download the MNIST dataset
  • model.py – Contains the code for the neural network architecture for the model
  • train.py – Contains the code for training the model by using load_data.py and model.py
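As an illustration only (the actual architecture in the repository may differ), the model.py file described above could define a small convolutional network for MNIST along these lines:

# model.py -- hypothetical minimal MNIST classifier
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.fc1 = nn.Linear(64 * 12 * 12, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)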

In train.py, we need to decorate the main training function as follows:

@remote(include_local_workdir=True)
def perform_train(train_data,
                  test_data,
                  *,
                  batch_size: int = 64,
                  test_batch_size: int = 1000,
                  epochs: int = 3,
                  lr: float = 1.0,
                  gamma: float = 0.7,
                  no_cuda: bool = True,
                  no_mps: bool = True,
                  dry_run: bool = False,
                  seed: int = 1,
                  log_interval: int = 10,
                  ):
    # pytorch native training code........

Now we’re ready to run the training code.

Run the training code with an @remote decorator

We can run the code from a terminal or from any executable prompt. In this post, we use a SageMaker Studio notebook cell to demonstrate this:

!python ./train.py

Running the preceding command triggers the training job. In the logs, we can see that it’s downloading the packages from the private PyPI repository.

training-job-logs

This concludes the implementation of an @remote decorator working with a private repository in an environment with no internet access.

Clean up

To clean up the resources, follow the instructions in CLEANUP.md.

Conclusion

In this post, we learned how to effectively use the @remote decorator’s capabilities while working in restrictive environments without any internet access. We also learned how we can integrate CodeArtifact private repository capabilities with the help of configuration file support in SageMaker. This solution makes iterative development much simpler and faster. Another added advantage is that you can still continue to write the training code in a more natural, object-oriented way and still use SageMaker capabilities to run training jobs on a remote cluster with minimal changes in your code. All the code shown as part of this post is available in the GitHub repository.

As a next step, we encourage you to check out the @remote decorator functionality and Python SDK API and use it in your choice of environment and IDE. Additional examples are available in the amazon-sagemaker-examples repository to get you started quickly. You can also check out the post Run your local machine learning code as Amazon SageMaker Training jobs with minimal code changes for more details.


About the author

Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.

Read More