The event is over, but Amazon Science interviewed each of the six speakers within the Science of Machine Learning track. See what they had to say.
3 questions with George Karypis: Making learning from data embedded in graphs easy and scalable
Karypis is a featured speaker at the first virtual Amazon Web Services Machine Learning Summit on June 2.
Automate weed detection in farm crops using Amazon Rekognition Custom Labels
Amazon Rekognition Custom Labels makes automated weed detection in crops easier. Instead of manually locating weeds, you can automate the process with Amazon Rekognition Custom Labels, which allows you to build machine learning (ML) models that can be trained with only a handful of images and yet are capable of accurately predicting which areas of a crop have weeds and need treatment. This saves farmers time, effort, and weed treatment costs.
Every farm has weeds. Weeds compete with crops and if not controlled can take up precious space, sunlight, water, and nutrients from crops and reduce their yield. Weeds grow much faster than crops and need immediate and effective control. Detecting weeds in crops is a lengthy and time-consuming process and is currently done manually. Although weed spray machines exist that can be coded to go to an exact location in a field and spray weed treatment in just those spots, the process of locating where those weeds exist is not yet automated.
Weed location automation isn’t an easy process. This is where computer vision and AI come in. Amazon Rekognition is a fully managed computer vision service that allows developers to analyze images and videos for a variety of use cases, including face identification and verification, media intelligence, custom industrial automation, and workplace safety. Detecting custom objects and scenes can be hard. Training and improving the accuracy of a computer vision model requires a large amount of data and is a complex problem. Amazon Rekognition Custom Labels allows you to detect custom labeled objects and scenes with just a handful of training images.
In this post, we use Amazon Rekognition Custom Labels to build an ML model that detects weeds in crops. We’re presently helping researchers at a US university automate this process for local farmers.
Create and train a weed detection model
We solve this problem by feeding images of crops with and without weeds to Amazon Rekognition Custom Labels and building an ML model. After the model is built and deployed, we can perform inference by feeding the model images from field cameras. This way farmers can automate weed detection in their fields. Our experiments showed that highly accurate models can be built with as few as 32 images.
- On the Amazon Rekognition console, choose Use Custom Labels.
- Choose Projects.
- Choose Create project.
- For Project name, enter a name (for example, Weed-detection-in-crops).
- Choose Create project.
Next, we create a dataset.
- On the Amazon Rekognition Custom Labels console, choose Datasets.
- Choose Create dataset.
- Enter a name for your dataset, such as crop-weed-ds.
- Select your training data location (for this post, we select Upload images from your computer).
- Choose Add images to upload your images.
For this post, we use 32 field images, of which half are images of crops without weeds and half are weed-infected crops.
- After you upload your training images, choose Add labels to add labels to your training data.
For this post, we define two labels: good-crop and weed.
- Assign your uploaded images one of these two labels depending on that image type.
- Save these changes.
We now have labeled images for both the classes we defined.
- Create another dataset for testing called test-ds, which contains four labeled images for testing purposes.
We’re now ready to train a new model.
- Select the project you created and choose Train new model.
- Choose the training dataset and test dataset that you created earlier.
- Choose Train.
After the model is trained, we can see how it performed. On our four-image test set, the model achieved an F1 score of 1.0, with precision and recall of 1.0 as well.
We can choose View test results to see how this model performed on our test data. The following screenshot shows that good crops were predicted accurately as good crops and weed-infected crops were detected as containing weeds.
Test the model via your browser
We offer an AWS CloudFormation template in the GitHub repo that allows you to test the model through a browser. Choose the appropriate template depending on your Region. The template launches the required resources for you to test the model.
The template asks for your email when you launch it. When the template is ready, it emails you the required credentials. The Outputs tab for the CloudFormation stack has a website URL for testing the model.
- On the browser front end, choose Start the model.
- Enter 1 for inference units.
- Choose Start the model.
- When the model is running, you can upload any image to it and get classification results.
- Stop the model once your testing is completed.
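The same start and stop steps can also be scripted. The following is a minimal sketch, assuming a boto3 Rekognition client is passed in. The helper names wait_until_running and running_model are illustrative and are not part of the post's repo. The sketch polls describe_project_versions until the model reports RUNNING and stops the model in a finally block, so the model is not left running if testing fails midway.

```python
import time
from contextlib import contextmanager


def wait_until_running(client, project_arn, version_name,
                       poll_seconds=30, max_polls=60):
    """Poll until the model version reports RUNNING (hypothetical helper)."""
    for _ in range(max_polls):
        response = client.describe_project_versions(
            ProjectArn=project_arn, VersionNames=[version_name])
        status = response["ProjectVersionDescriptions"][0]["Status"]
        if status == "RUNNING":
            return
        if status == "FAILED":
            raise RuntimeError("Model failed to start")
        time.sleep(poll_seconds)
    raise TimeoutError("Model did not reach RUNNING in time")


@contextmanager
def running_model(client, project_arn, version_name, model_arn,
                  inference_units=1, poll_seconds=30):
    """Start the model, yield while it runs, and always stop it afterwards."""
    client.start_project_version(ProjectVersionArn=model_arn,
                                 MinInferenceUnits=inference_units)
    try:
        wait_until_running(client, project_arn, version_name, poll_seconds)
        yield model_arn
    finally:
        client.stop_project_version(ProjectVersionArn=model_arn)


# Usage (requires AWS credentials; the ARNs and version name are placeholders):
# import boto3
# client = boto3.client("rekognition")
# with running_model(client, project_arn, version_name, model_arn):
#     ...  # call detect_custom_labels here
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub before pointing them at a real account.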
Perform inference using the SDK
Inference from the model is also possible using the SDK. The following code runs on the same image as in the previous section:
import boto3

def show_custom_labels(model, bucket, image, min_confidence):
    client = boto3.client('rekognition')
    # Call DetectCustomLabels on an image stored in Amazon S3
    response = client.detect_custom_labels(
        Image={'S3Object': {'Bucket': bucket, 'Name': image}},
        MinConfidence=min_confidence,
        ProjectVersionArn=model)
    # Print each detected label with its confidence score
    for customLabel in response['CustomLabels']:
        print('Label ' + str(customLabel['Name']))
        print('Confidence ' + str(customLabel['Confidence']) + "\n")
    return len(response['CustomLabels'])

def main():
    bucket = 'crop-weed-bucket'
    image = "Weed-1.jpg"
    model = 'arn:aws:rekognition:us-east-2:xxxxxxxxxxxx:project/Weed-detection-in-crops/version/Weed-detection-in-crops.2021-03-30T10.02.49/yyyyyyyyyy'
    min_confidence = 1
    label_count = show_custom_labels(model, bucket, image, min_confidence)
    print("Custom labels detected: " + str(label_count))

if __name__ == "__main__":
    main()
The results from using the SDK are the same as earlier from the browser:
Label weed
Confidence 92.1469955444336
Label good-crop
Confidence 7.852999687194824
Custom labels detected: 2
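To act on this output, for example to decide whether a field location needs treatment, you can pick the highest-confidence label. The following is a minimal sketch, assuming the response shape shown above. The function name needs_treatment and the 80% threshold are illustrative choices, not values from this post.

```python
def needs_treatment(response, weed_label="weed", threshold=80.0):
    """Return True when the top-scoring custom label is the weed label
    and its confidence meets the (illustrative) threshold."""
    labels = response.get("CustomLabels", [])
    if not labels:
        return False
    top = max(labels, key=lambda label: label["Confidence"])
    return top["Name"] == weed_label and top["Confidence"] >= threshold


# Response fragment mirroring the output above
sample = {"CustomLabels": [
    {"Name": "weed", "Confidence": 92.1469955444336},
    {"Name": "good-crop", "Confidence": 7.852999687194824},
]}
print(needs_treatment(sample))
```

Because the SDK call above uses a very low MinConfidence, both labels come back. Applying the decision threshold in your own code keeps that choice explicit and easy to tune.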
Best practices
Consider the following best practices when using Amazon Rekognition Custom Labels:
- Use images that have high resolution
- Crop out any background noise in the image
- Have a good contrast between the object you’re trying to detect and other objects in the image
- Delete any resources that you have created once your project is completed
Conclusion
In this post, we showed how you can automate weed detection in crops by building custom ML models with Amazon Rekognition Custom Labels. Amazon Rekognition Custom Labels takes care of deep learning complexities behind the scenes, allowing you to build powerful image classification models with just a handful of training images. You can improve model accuracy by increasing the number of images in your training data and resolution of those images. Farmers can deploy models such as these into their weed spray machines in order to reduce cost and manual effort. To learn more, including other use cases and video tutorials, visit the Amazon Rekognition Custom Labels webpage.
About the Author
Raju Penmatcha is a Senior AI/ML Specialist Solutions Architect at AWS. He works with education, government, and nonprofit customers on machine learning and artificial intelligence related projects, helping them build solutions using AWS. When not helping customers, he likes traveling to new places.
What kind of data scientist should you be?
Amazon’s Daliana Liu helps others in the field chart their own paths.
Fine-tune and deploy the ProtBERT model for protein classification using Amazon SageMaker
Proteins, the fundamental macromolecules that govern biological processes, are composed of amino acids. The 20 standard amino acids, each represented by a capital letter, combine to form a protein sequence, which can be used to predict the subcellular localization (the location of the protein in a cell) and the structure of proteins.
Figure 1: Protein Sequence
The study of protein localization is important for understanding protein function; proteins are essential to the structure, function, and regulation of the body’s tissues and organs. Protein localization has great importance for drug design and other applications. For example, we can investigate methods to disrupt the binding of the spiky S1 protein of the SARS-CoV-2 virus. The binding of the S1 protein to the human receptor ACE2 is the mechanism that led to the COVID-19 pandemic [1]. Protein localization also plays an important role in characterizing the cellular function of hypothetical and newly discovered proteins [2].
Figure 2: SARS-CoV-2 virus binding to ACE2 human receptor
Protein sequences are constrained to adopting particular 3D shapes (referred to as protein 3D structure) optimized for accomplishing particular functions. These constraints mirror the rules of grammar and meaning in natural language, thereby allowing us to map algorithms from natural language processing (NLP) directly onto protein sequences. During training, the language model learns to extract those constraints from millions of examples and store the derived knowledge in its weights [1]. Existing solutions in protein bioinformatics [11, 12, 13, 14, 15, 16] usually have to search for evolutionary-related proteins in exponentially growing databases. Language models offer a potential alternative to this increasingly time-consuming database search because they extract features directly from single protein sequences. Additionally, the performance of existing solutions deteriorates if a sufficient number of related sequences can’t be found; for example, the quality of predicted protein structures correlates strongly with the number of effective sequences found in today’s databases [17].
Several research endeavors currently aim to localize whole proteomes by using high-throughput approaches [2, 3, 4]. These large datasets provide important information about protein function, and more generally global cellular processes. However, they currently don’t achieve 100% coverage of proteomes, and the methodology used can in some cases cause mislocalization of subsets of proteins [5, 6]. Therefore, complementary methods are necessary to address these problems.
In this post, we use NLP techniques for protein sequence classification. The idea is to interpret protein sequences as sentences and their constituents (amino acids) as single words [7]. More specifically, we fine-tune the PyTorch ProtBERT model from the Hugging Face library using Amazon SageMaker.
What is ProtBERT?
ProtBERT is a model pretrained on protein sequences using a masked language modeling objective. It’s based on the BERT architecture and is pretrained on a large corpus of protein sequences in a self-supervised fashion. This means it was pretrained on the raw protein sequences only, with no humans labeling them in any way (which is why it can use lots of publicly available data), using an automatic process to generate inputs and labels from those protein sequences [8]. For more information about ProtBERT, see ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing.
Solution overview
The post focuses on fine-tuning the PyTorch ProtBERT model (see the following diagram). We first extend the pretrained ProtBERT model to classify the protein sequences.
We then deploy the model using SageMaker, which is the most comprehensive and fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. During the training, we use the SageMaker distributed data parallel (SDP) feature, which extends SageMaker’s training capabilities on deep learning models with near-linear scaling efficiency, achieving fast time-to-train with minimal code changes.
The notebook and code from this post are available on GitHub. To run it yourself, clone the GitHub repository and open the Jupyter notebook file.
Dataset
In this post, we use an open-source DeepLoc [10] public dataset of protein sequences to train the model. The dataset is a FASTA file composed of header and protein sequence. The header is composed of the accession number from Uniprot, the annotated subcellular localization, and possibly a description field indicating if the protein was part of the test set. The subcellular localization includes an additional label, where S indicates soluble, M membrane, and U unknown [9]. The following code is a sample of the data:
>Q9SMX3 Mitochondrion-M test
MVKGPGLYTEIGKKARDLLYRDYQGDQKFSVTTYSSTGVAITTTGTNKGSLFLGDVATQVKNNNFTADVKVST
DSSLLTTLTFDEPAPGLKVIVQAKLPDHKSGKAEVQYFHDYAGISTSVGFTATPIVNFSGVVGTNGLSLGTDV
AYNTESGNFKHFNAGFNFTKDDLTASLILNDKGEKLNASYYQIVSPSTVVGAEISHNFTTKENAITVGTQHAL
DPLTTVKARVNNAGVANALIQHEWRPKSFFTVSGEVDSKAIDKSAKVGIALALKP
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The definition line (defline) is distinguished from the sequence data by a greater-than (>) symbol at the beginning. The word following the > symbol is the identifier of the sequence, and the rest of the line is the description.
We download the FASTA formatted dataset and read it by directly filtering out the columns that are of interest. The dataset consists of roughly 14,000 sequences and 6 columns in total. The columns we use are as follows:
- id – Unique identifier given to each sequence in the dataset.
- sequence – Protein sequence. Each character is separated by a space. This is useful for the BERT tokenizer.
- sequence_length – Character length of each protein sequence.
- location – Classification given to each sequence. The dataset has 10 unique classes (subcellular localization).
- is_train – Indicates whether the record should be used for training or test. It is also used to separate the dataset for training and validation.
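To make the mapping from FASTA records to these columns concrete, the following is a minimal parsing sketch in plain Python. It assumes the header layout described above (accession, location with an S/M/U suffix, optional test marker). The function name parse_fasta and the membrane key are illustrative labels of my own, and the sample sequence is shortened; the notebook in the repo may read the file differently.

```python
def parse_fasta(text):
    """Parse DeepLoc-style FASTA text into a list of record dicts."""
    records = []
    header, chunks = None, []

    def flush():
        # Turn the header and sequence lines collected so far into one record.
        if header is None:
            return
        fields = header.split()
        raw = "".join(chunks)
        # "Mitochondrion-M" -> location "Mitochondrion", suffix "M"
        location, _, membrane = fields[1].rpartition("-")
        if not location:  # no suffix present
            location, membrane = membrane, ""
        records.append({
            "id": fields[0],
            "sequence": " ".join(raw),      # space-separated for the tokenizer
            "sequence_length": len(raw),
            "location": location,
            "membrane": membrane,           # hypothetical key for S/M/U
            "is_train": "test" not in fields[2:],
        })

    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    flush()
    return records


# Shortened sample record for illustration
sample = """>Q9SMX3 Mitochondrion-M test
MVKGPGLYTEIGKK
ARDLLYRDYQ
"""
for record in parse_fasta(sample):
    print(record["id"], record["location"], record["is_train"])
```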
When we plot the sequence lengths of each record as a histogram, we observe the following distribution.
This is an important observation because the ProtBERT model receives a fixed sentence length as input. Usually, the maximum length of a sentence depends on the data we’re working on. For sentences that are shorter than this maximum length, we have to add paddings (empty tokens) to the sentences to make up the length.
In the preceding plot, most of the sequences are under 1,500 characters in length, so max_length = 1536 would be a good choice. However, that increases the training time for this sample notebook, so we use max_length = 512.
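As a simplified illustration of what fixed-length encoding means here, the following sketch pads or truncates a list of residue tokens by hand. It is not the Hugging Face implementation; it only assumes the usual BERT convention of one [CLS] and one [SEP] token, which leaves max_length - 2 positions for residues. The function name pad_or_truncate is hypothetical.

```python
def pad_or_truncate(residues, max_length, pad="[PAD]"):
    """Mimic BERT-style fixed-length encoding on a list of residue tokens."""
    body = residues[:max_length - 2]      # leave room for [CLS] and [SEP]
    tokens = ["[CLS]"] + body + ["[SEP]"]
    attention_mask = [1] * len(tokens)    # 1 marks real tokens
    padding = max_length - len(tokens)
    tokens += [pad] * padding
    attention_mask += [0] * padding       # 0 marks padding positions
    return tokens, attention_mask


tokens, mask = pad_or_truncate(list("MVKG"), max_length=8)
print(tokens)
print(mask)
```

With max_length = 512, any sequence longer than 510 residues loses its tail under this scheme, which is the trade-off accepted here in exchange for shorter training time.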
When we’re retrieving each sequence record using the PyTorch DataLoaders during training, we must ensure that each sequence is tokenized, truncated, and padded to the same max_length value. To encapsulate this process, we define the ProteinSequenceDataset class, which uses the encode_plus() API provided by the Hugging Face transformers library:
# data_prep.py
import torch
from torch import nn
import torch.utils.data
import torch.utils.data.distributed
from torch.utils.data import Dataset, DataLoader, RandomSampler, TensorDataset

class ProteinSequenceDataset(Dataset):
    def __init__(self, sequence, targets, tokenizer, max_len):
        self.sequence = sequence
        self.targets = targets
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.sequence)

    def __getitem__(self, item):
        sequence = str(self.sequence[item])
        target = self.targets[item]
        # Tokenize, truncate, and pad every sequence to max_len
        encoding = self.tokenizer.encode_plus(
            sequence,
            truncation=True,
            add_special_tokens=True,
            max_length=self.max_len,
            return_token_type_ids=False,
            padding='max_length',
            return_attention_mask=True,
            return_tensors='pt',
        )
        return {
            'protein_sequence': sequence,
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'targets': torch.tensor(target, dtype=torch.long)
        }
Next, we divide the dataset into training and test sets. We can use the is_train column to do the split, which results in 11,231 records for the training set and 2,773 records for the test set (about an 80:20 data split). Finally, we upload this test and train data to our Amazon Simple Storage Service (Amazon S3) location in order to accommodate model training on SageMaker.
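The split itself is a simple filter on the flag column. The following sketch works on plain dictionaries with toy records; the helper names are hypothetical, and the notebook may instead use a DataFrame. The second helper computes the training share from the record counts quoted above.

```python
def split_by_flag(records, flag="is_train"):
    """Separate records into (train, test) lists using a boolean flag."""
    train = [r for r in records if r[flag]]
    test = [r for r in records if not r[flag]]
    return train, test


def train_fraction(n_train, n_test):
    """Share of all records that land in the training set."""
    return n_train / (n_train + n_test)


# Toy records for illustration only
records = [
    {"id": "a", "is_train": True},
    {"id": "b", "is_train": False},
    {"id": "c", "is_train": True},
]
train, test = split_by_flag(records)
print(len(train), len(test))
print(round(train_fraction(11231, 2773), 3))
```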
ProtBERT fine-tuning
In computational biology and bioinformatics, we have gold mines of data from protein sequences, but we need high computing resources to train the models, which can be limiting and costly. One way to overcome this challenge is to use transfer learning.
Transfer learning is an ML method in which a pretrained model, such as a pretrained BERT model for text classification, is reused as the starting point for a different but related problem. By reusing parameters from pretrained models, you can save significant amounts of training time and cost.
In our notebook, we use the pretrained prot_bert_bfd_localization model on the DeepLoc dataset for predicting protein subcellular localization by adding a classification layer, as shown in the following code:
# model_def.py
from transformers import BertModel, BertTokenizer, AdamW, get_linear_schedule_with_warmup
import torch
import torch.nn.functional as F
import torch.nn as nn

PRE_TRAINED_MODEL_NAME = 'Rostlab/prot_bert_bfd_localization'

class ProteinClassifier(nn.Module):
    def __init__(self, n_classes):
        super(ProteinClassifier, self).__init__()
        self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
        # Classification head on top of the pooled BERT output
        self.classifier = nn.Sequential(nn.Dropout(p=0.2),
                                        nn.Linear(self.bert.config.hidden_size, n_classes),
                                        nn.Tanh())

    def forward(self, input_ids, attention_mask):
        output = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        return self.classifier(output.pooler_output)
We use ProteinClassifier defined in the model_def.py script for training.
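The estimator configuration later in this post passes a frozen_layers hyperparameter, but the part of train.py that consumes it isn't reproduced here. One common way to apply such a setting in transfer learning is to disable gradients for the first encoder layers so that only the remaining layers and the classification head are updated. The following sketch is an assumption about how that could be done, not the repo's code; freeze_first_layers is a hypothetical helper that works on any sequence of modules exposing parameters().

```python
def freeze_first_layers(layers, n_frozen):
    """Disable gradients for the first n_frozen layers.

    Returns the number of parameter tensors that were frozen.
    """
    frozen = 0
    for layer in list(layers)[:n_frozen]:
        for param in layer.parameters():
            param.requires_grad = False
            frozen += 1
    return frozen


# Usage with the classifier defined above (assumes torch and transformers
# are installed and the pretrained weights can be downloaded):
# model = ProteinClassifier(n_classes=10)
# freeze_first_layers(model.bert.encoder.layer, 15)
```

Freezing the lower layers reduces the number of gradients to compute and store, which is one reason a large pretrained model can be fine-tuned in only a few epochs.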
Training script
We use the PyTorch-Transformers library, which contains PyTorch implementations and pretrained model weights for many NLP models, including BERT. As mentioned earlier, we use the ProtBERT model, which is pretrained on protein sequences.
We also use the distributed data parallel feature launched in December 2020 to speed up the training by distributing the data on multiple GPUs. Our training script is very similar to a PyTorch training script you might run outside of SageMaker, but modified to run with SDP. SDP’s PyTorch client provides an alternative to PyTorch’s native DDP. For details about how to use SDP in your native PyTorch script, see Get Started with Distributed Training.
The training script saves the model artifacts learned during training to a file path, model_dir, as mandated by the SageMaker PyTorch image. The following excerpt from the script imports the SDP modules and initializes the process group:
# SageMaker Distributed code.
from smdistributed.dataparallel.torch.parallel.distributed import DistributedDataParallel as DDP
import smdistributed.dataparallel.torch.distributed as dist
# initializes the process group for distributed training
dist.init_process_group()
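To illustrate what data parallelism does with the dataset once the process group exists, the following plain-Python sketch shows how sample indices can be divided among workers. It mirrors the padded, strided split that PyTorch's DistributedSampler performs when shuffling is disabled; it is a simplification for illustration and is not taken from the training script. The function name shard_indices is hypothetical.

```python
import math


def shard_indices(dataset_size, num_replicas, rank):
    """Return the sample indices one worker processes in an epoch."""
    indices = list(range(dataset_size))
    # Pad by wrapping around so every worker gets the same number of samples.
    per_worker = math.ceil(dataset_size / num_replicas)
    total = per_worker * num_replicas
    indices += indices[: total - dataset_size]
    # Each worker takes every num_replicas-th index, starting at its rank.
    return indices[rank:total:num_replicas]


for rank in range(4):
    print(rank, shard_indices(10, num_replicas=4, rank=rank))
```

Each GPU therefore sees a different slice of every epoch, and gradients are averaged across workers after each step.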
When training is complete, SageMaker uploads model artifacts saved in model_dir to Amazon S3 so they’re available for deployment. The following code in the script saves the trained model artifacts:
def save_model(model, model_dir):
    path = os.path.join(model_dir, 'model.pth')
    # recommended way from http://pytorch.org/docs/master/notes/serialization.html
    torch.save(model.state_dict(), path)
    logger.info(f"Saving model: {path} \n")
Because PyTorch-Transformers isn’t included natively in SageMaker PyTorch images, we have to provide a requirements.txt file so that SageMaker installs this library for training and inference. A requirements.txt file is a text file that contains a list of items that are installed by using pip install. You can also specify the version of an item to install. To install PyTorch-Transformers and other libraries, we add the following lines to the requirements.txt file:
transformers
torch-optimizer
sagemaker==2.19.0
boto3
You can view the entire file in the GitHub repo; it also goes into the code/ directory. For more information about the format of a requirements.txt file, see Requirements Files.
Train on SageMaker
We use SageMaker to train and deploy a model using our custom PyTorch code. The SageMaker Python SDK makes it easy to run a PyTorch script in SageMaker using its PyTorch estimator. After that, we can use the SageMaker Python SDK to deploy the trained model and run predictions. For more information on how to use this SDK with PyTorch, see Use PyTorch with the SageMaker Python SDK.
To start, we use the PyTorch estimator class to train our model. When creating our estimator, we make sure to specify a few things:
- entry_point – The name of our PyTorch script. It contains our training script, which loads data from the input channels, configures training with hyperparameters, trains a model, and saves the model. It also contains code to load and run the model during inference.
- source_dir – The location of our training scripts and requirements.txt file. The requirements file lists packages you want to use with your script.
- framework_version – The PyTorch version we want to use.
The PyTorch estimator supports both single-machine and multi-machine, distributed PyTorch training using SDP. Our training script supports distributed training for only GPU instances.
Instance types
SDP supports model training on SageMaker with the following instance types only:
- p3.16xlarge
- p3dn.24xlarge (Recommended)
- p4d.24xlarge (Recommended)
Instance count
To get the best performance out of SDP, you should use at least two instances, but you can also use one for testing this example, which implements the script in a single instance, multiple GPU mode, taking advantage of the eight GPUs on the instance to train faster and cheaper.
Distribution strategy
To use DDP mode, you update the distribution strategy and set it to use smdistributed dataparallel.
After we create the estimator, we call fit(), which launches a training job. We use the Amazon S3 URIs that we uploaded the training data to earlier. See the following code:
import time

from sagemaker.pytorch import PyTorch

TRAINING_JOB_NAME = "protbert-training-pytorch-{}".format(time.strftime("%m-%d-%Y-%H-%M-%S"))
print('Training job name: ', TRAINING_JOB_NAME)

estimator = PyTorch(
    entry_point="train.py",
    source_dir="code",
    role=role,
    framework_version="1.6.0",
    py_version="py36",
    instance_count=1,  # this script supports distributed training for only GPU instances.
    instance_type="ml.p3.16xlarge",
    distribution={'smdistributed': {
        'dataparallel': {
            'enabled': True
        }
    }},
    debugger_hook_config=False,
    hyperparameters={
        "epochs": 3,
        "num_labels": num_classes,
        "batch-size": 4,
        "test-batch-size": 4,
        "log-interval": 100,
        "frozen_layers": 15,
    },
    # Raw strings keep the regex backslashes intact
    metric_definitions=[
        {'Name': 'train:loss', 'Regex': r'Training Loss: ([0-9\.]+)'},
        {'Name': 'test:accuracy', 'Regex': r'Validation Accuracy: ([0-9\.]+)'},
        {'Name': 'test:loss', 'Regex': r'Validation loss: ([0-9\.]+)'},
    ]
)
estimator.fit({"training": inputs_train, "testing": inputs_test}, job_name=TRAINING_JOB_NAME)
With max_length=512 and running the model for only three epochs, we get a validation accuracy of around 65%, which is reasonable for such a short training run. You can optimize it further by trying a bigger sequence length, increasing the number of epochs, and tuning other hyperparameters. Make sure to increase the GPU memory or reduce the batch size when you increase the sequence length; otherwise, you might get a CUDA out of memory error.
For more details on optimizing the model, see ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing.
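The metric_definitions passed to the estimator are regular expressions that SageMaker applies to the training logs. Before launching a long job, it can help to check locally that the patterns match what the script prints. The following sketch does that with Python's re module; the log lines are hypothetical examples, and extract_metrics is an illustrative helper, not a SageMaker API.

```python
import re

metric_definitions = [
    {"Name": "train:loss", "Regex": r"Training Loss: ([0-9\.]+)"},
    {"Name": "test:accuracy", "Regex": r"Validation Accuracy: ([0-9\.]+)"},
    {"Name": "test:loss", "Regex": r"Validation loss: ([0-9\.]+)"},
]


def extract_metrics(log_line, definitions):
    """Return {metric name: value} for every pattern that matches the line."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


# Hypothetical log lines
print(extract_metrics("Validation Accuracy: 0.65", metric_definitions))
print(extract_metrics("validation accuracy: 0.65", metric_definitions))
```

The second call returns an empty dictionary because the patterns are case-sensitive, so the strings logged by the training script need to match the definitions exactly.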
Deploy the model on SageMaker
After we train our model, we host it on a SageMaker endpoint. To make the endpoint load the model and serve predictions, we implement a few methods in inference.py:
- model_fn() – Loads the saved model and returns a model object that can be used for model serving. The SageMaker PyTorch model server loads our model by invoking model_fn:
def model_fn(model_dir):
    logger.info('model_fn')
    print('Loading the trained model...')
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = ProteinClassifier(10)  # pass number of classes, in our case it's 10
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)
- input_fn() – Deserializes and prepares the prediction input. In this example, our request body is first serialized to JSON and then sent to the model serving endpoint. Therefore, in input_fn(), we first deserialize the JSON-formatted request body and return the input as a torch.tensor, as required for the ProtBERT model:
def input_fn(request_body, request_content_type):
    """Deserialize a JSON request body and tokenize the protein sequence."""
    if request_content_type == "application/json":
        sequence = json.loads(request_body)
        print("Input protein sequence: ", sequence)
        encoded_sequence = tokenizer.encode_plus(
            sequence,
            max_length=MAX_LEN,
            add_special_tokens=True,
            return_token_type_ids=False,
            padding='max_length',
            return_attention_mask=True,
            return_tensors='pt'
        )
        input_ids = encoded_sequence['input_ids']
        attention_mask = encoded_sequence['attention_mask']
        return input_ids, attention_mask
    raise ValueError("Unsupported content type: {}".format(request_content_type))
- predict_fn() – Performs the prediction and returns the result. (To deploy the endpoint, we later call deploy(), passing in our desired number of instances and instance type.) The function is defined as follows:
def predict_fn(input_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()
    input_id, input_mask = input_data
    input_id = input_id.to(device)
    input_mask = input_mask.to(device)
    with torch.no_grad():
        output = model(input_id, input_mask)
        # Index of the highest-scoring class for each input
        _, prediction = torch.max(output, dim=1)
    return prediction
Create a model object
You define the model object by using the SageMaker SDK’s PyTorchModel and pass in the model from the estimator and the entry_point. The model_fn function in the entry point loads the model and sets it to use a GPU, if available. See the following code:
import time

import sagemaker
from sagemaker.pytorch import PyTorchModel

ENDPOINT_NAME = "protbert-inference-pytorch-1-{}".format(time.strftime("%m-%d-%Y-%H-%M-%S"))
print("Endpoint name: ", ENDPOINT_NAME)
model = PyTorchModel(model_data=model_data, source_dir='code',
                     entry_point='inference.py', role=role,
                     framework_version='1.6.0', py_version='py3')
Deploy the model on an endpoint
You create a predictor by using the model.deploy function. You can optionally change both the instance count and instance type:
%%time
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.2xlarge', endpoint_name=ENDPOINT_NAME)
Predict protein subcellular localization
Now that we have deployed the model endpoint, we can provide some protein sequences and let the model endpoint identify their subcellular localization, using the predictor we created:
prediction = predictor.predict(protein_sequence)
print(prediction)
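The tokenizer expects residues separated by spaces, and input_fn expects a JSON body, so a raw sequence string needs a small amount of preparation before it is sent. The following sketch shows one way to do that; to_model_input is a hypothetical helper, and the commented serializer lines are an assumption about how the predictor could be configured with the SageMaker Python SDK, not code reproduced from the notebook.

```python
import json


def to_model_input(raw_sequence):
    """Upper-case a raw amino-acid string and separate residues with spaces."""
    residues = "".join(raw_sequence.split()).upper()
    return " ".join(residues)


protein_sequence = to_model_input("MGKKDASTTR")
payload = json.dumps(protein_sequence)  # what a JSON serializer would send
print(protein_sequence)
print(payload)

# Assumed predictor configuration (SageMaker Python SDK v2). With a JSON
# serializer attached, pass protein_sequence itself, not payload:
# from sagemaker.serializers import JSONSerializer
# from sagemaker.deserializers import JSONDeserializer
# predictor.serializer = JSONSerializer()
# predictor.deserializer = JSONDeserializer()
# prediction = predictor.predict(protein_sequence)
```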
The following table summarizes some of our results.
Sequence | Ground Truth | Prediction |
M G K K D A S T T R T P V D Q Y R K Q I G R Q D Y K K N K P V L K A T R L K A E A K K A A I G I K E V I L V T I A I L V L L F A F Y A F F F L N L T K T D I Y E D S N N | Endoplasmic.reticulum | Endoplasmic.reticulum |
M S M T I L P L E L I D K C I G S N L W V I M K S E R E F A G T L V G F D D Y V N I V L K D V T E Y D T V T G V T E K H S E M L L N G N G M C M L I P G G K P E | Nucleus | Nucleus |
M G G P T R R H Q E E G S A E C L G G P S T R A A P G P G L R D F H F T T A G P S K A D R L G D A A Q I H R E R M R P V Q C G D G S G E R V F L Q S P G S I G T L Y I R L D L N S Q R S T C C C L L N A G T K G M C | Cytoplasm | Cytoplasm |
Clean up resources
Remember to delete the SageMaker endpoint and SageMaker notebook instance created to avoid charges. See the following code:
predictor.delete_endpoint()
Conclusion
In this post, we used a pretrained ProtBERT model (prot_bert_bfd_localization) as a starting point and fine-tuned it for the downstream task of identifying the subcellular localization of protein sequences. We used SageMaker capabilities to train the model, deploy it, and run inference. Furthermore, we explored the SageMaker data parallel feature to make our training process efficient. You can use the same concept to perform other downstream tasks, such as amino-acid-level classification (for example, predicting the secondary structure of a protein). For more about using PyTorch with SageMaker, see Using PyTorch with the SageMaker Python SDK.
References
- [1] ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing (https://www.biorxiv.org/content/10.1101/2020.07.12.199554v2.full.pdf)
- [2] Protein sequence diagram (https://www.technologynetworks.com/applied-sciences/articles/essential-amino-acids-chart-abbreviations-and-structure-324357)
- [3] Refining Protein Subcellular Localization (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1289393/)
- [4] Kumar A, Agarwal S, Heyman JA, Matson S, Heidtman M, et al. Subcellular localization of the yeast proteome. Genes Dev. 2002;16:707–719.
- [5] Huh WK, Falvo JV, Gerke LC, Carroll AS, Howson RW, et al. Global analysis of protein localization in budding yeast. Nature. 2003;425:686–691.
- [6] Wiemann S, Arlt D, Huber W, Wellenreuther R, Schleeger S, et al. From ORFeome to biology: A functional genomics pipeline. Genome Res. 2004;14:2136–2144.
- [7] Davis TN. Protein localization in proteomics. Curr Opin Chem Biol. 2004;8:49–53.
- [8] Scott MS, Thomas DY, Hallett MT. Predicting subcellular localization via protein motif co-occurrence. Genome Res. 2004;14:1957–1966.
- [9] ProtBERT Hugging Face (https://huggingface.co/Rostlab/prot_bert)
- [10] DeepLoc-1.0: Eukaryotic protein subcellular localization predictor (http://www.cbs.dtu.dk/services/DeepLoc-1.0/data.php)
- [11] M. S. Klausen, M. C. Jespersen et al., “NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning,” Proteins: Structure, Function, and Bioinformatics, vol. 87, no. 6, pp. 520–527, 2019, https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.25674.
- [12] J. J. Almagro Armenteros, C. K. Sønderby et al., “DeepLoc: Prediction of protein subcellular localization using deep learning,” Bioinformatics, vol. 33, no. 21, pp. 3387–3395, Nov. 2017.
- [13] J. Yang, I. Anishchenko et al., “Improved protein structure prediction using predicted interresidue orientations,” Proceedings of the National Academy of Sciences, vol. 117, no. 3, pp. 1496–1503, Jan. 2020.
- [14] A. Kulandaisamy, J. Zaucha et al., “Pred-MutHTP: Prediction of disease-causing and neutral mutations in human transmembrane proteins,” Human Mutation, vol. 41, no. 3, pp. 581–590, 2020, https://onlinelibrary.wiley.com/doi/pdf/10.1002/humu.23961.
- [15] M. Schelling, T. A. Hopf, and B. Rost, “Evolutionary couplings and sequence variation effect predict protein binding sites,” Proteins: Structure, Function, and Bioinformatics, vol. 86, no. 10, pp. 1064–1074, 2018, https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.25585.
- [16] M. Bernhofer, E. Kloppmann et al., “TMSEG: Novel prediction of transmembrane helices,” Proteins: Structure, Function, and Bioinformatics, vol. 84, no. 11, pp. 1706–1716, 2016, https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.25155.
- [17] D. S. Marks, L. J. Colwell et al., “Protein 3D Structure Computed from Evolutionary Sequence Variation,” PLOS ONE, vol. 6, no. 12, p. e28766, Dec. 2011.
About the Authors
Mani Khanuja is an Artificial Intelligence and Machine Learning Specialist SA at Amazon Web Services (AWS). She helps customers use machine learning to solve their business challenges on AWS. She spends most of her time diving deep and teaching customers on AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. She is passionate about ML at the edge, so she has created her own lab with a self-driving kit and a prototype manufacturing production line, where she spends a lot of her free time.
Shamika Ariyawansa is a Solutions Architect at AWS helping customers run a variety of applications on AWS and machine learning workloads in particular. He is based out of Denver, Colorado. In his spare time, he enjoys off-roading adventures in the Colorado mountains and competing in machine learning competitions.
Vaijayanti Joshi is a Boston-based Solutions Architect for AWS. She is passionate about technology and enjoys helping customers find innovative solutions to complex business challenges. Her core areas of focus are machine learning and analytics. When she’s not working with customers on their journey to the cloud, she enjoys biking, swimming, and exploring new places.
Predicting answers to product questions using similar products
A new method predicts with high accuracy the answer to a product question, based on knowledge from similar products.
Gain valuable ML skills with the AWS Machine Learning Engineer Nanodegree Scholarship from Udacity
Amazon Web Services is partnering with Udacity to help educate developers of all skill levels on machine learning (ML) concepts with the AWS Machine Learning Scholarship Program by Udacity by offering 425 scholarships, with a focus on women and underrepresented groups.
Machine learning is an exciting and rapidly developing technology that has the power to create millions of jobs and transform our daily lives. According to the Future of Jobs Report 2020 by the World Economic Forum, by 2025, 97 million new roles may be created as a result of ML innovation. However, only today’s developers have the skills to act on these opportunities now. Proximity to high-quality education, cost of traditional education, and allocating time to start and complete new learning projects make learning ML more complicated.
To address this challenge, AWS invests in educating developers, data scientists, and ML developers with a variety of education solutions, such as exploring reinforcement learning concepts with AWS DeepRacer, training and validation with the AWS Certified Machine Learning – Specialty certification, and hands-on tutorials from the AWS Machine Learning Community.
Up-leveling ML skills and opening new career opportunities
With the AWS Machine Learning Engineer Nanodegree by Udacity, developers can learn valuable skills for an ML career path with an interactive, cost-effective, and accessible ML education. All students who enroll in the scholarship program have access to AWS Machine Learning Foundations, a free course covering an introduction to ML concepts, including reinforcement learning, computer vision, and generative artificial intelligence, with expert-led interactive tutorials with AWS AI Devices such as AWS DeepRacer, AWS DeepLens, and AWS DeepComposer.
The AWS Machine Learning Scholarship Program is open to all for registration starting May 26, 2021, through June 23, 2021. Your learning journey begins with the free AWS Machine Learning Foundations course on June 28, 2021, in which you learn the fundamental aspects of ML, ML techniques and algorithms, programming best practices, Python coding, and interactive tutorials with AWS AI Devices. You have 3 months to study and complete your assessment by October 11, 2021, with the top 425 students eligible for a scholarship to the AWS Machine Learning Engineer Nanodegree.
The AWS Machine Learning Foundations Course (free) includes the following objectives:
- Learn the fundamentals of ML
- Learn object-oriented programming best practices
- Learn computer vision with AWS DeepLens, reinforcement learning with AWS DeepRacer, and generative AI with AWS DeepComposer.
- Dedicate 3–5 hours a week on the course and work towards earning one of the follow-up Nanodegree program scholarships
In the Machine Learning Engineer Nanodegree program (a $1,000 value course), you learn advanced ML techniques and algorithms, including how to package and deploy models to a production environment.
“The AWS Machine Learning Nanodegree Program enabled me to learn and achieve valuable machine learning skills at my own pace with interactive modules that made learning fun and effective,” said Juv Chan, AWS ML Hero and AWS Machine Learning Nanodegree Program Alumni. “Carving out time to learn machine learning can be very hard, especially under the demanding schedules that software engineers work from. The flexibility offered by Udacity Nanodegrees lets me learn new skills on a timetable that works for me.”
This year, we added 100 additional scholarships on top of the 325 scholarships allocated in 2020, and updated the content for students with advanced ML techniques and algorithms and expert-led tutorials on deploying ML models at scale with Amazon SageMaker.
AWS is also collaborating with several nonprofit organizations through the We Power Tech Program to increase the diversity and talent in technical roles, including organizations like Girls In Tech and the National Society of Black Engineers. As part of these ongoing relationships, the nonprofit organizations will help encourage women and underrepresented groups to participate in the AWS Machine Learning Engineer Nanodegree Scholarship Program. Organizations like these develop programs to inspire, support, train, and empower people from underrepresented groups to pursue careers in tech.
“AWS strives to help level the playing field for women and people of color, who have been underrepresented in the tech industry for far too long. We are thrilled to collaborate with Udacity to make this sort of technical training more widely available and accessible,” said LaDavia Drane, global head of Inclusion, Diversity & Equity at AWS. “We look forward to seeing the incredible innovations in machine learning that are sure to come from this initiative.”
“Tech needs representation from women, BIPOC, and other marginalized communities in every aspect of our industry. Companies must make meaningful and measurable change in the areas of diversity, equity, and inclusion to reach their greatest potential, and skills training programs uniquely tailored to increase representation from these groups are necessary for technology to achieve all that it’s capable of. Girls in Tech applauds our collaborator AWS, as well as Udacity, for breaking down the barriers that so often leave women behind in tech. Together, we aim to give everyone a seat at the table.” Adriana Gascoigne, Founder and CEO, Girls in Tech.
How the AWS Machine Learning Engineer Nanodegree Scholarship works
Scholarship enrollment is open from May 26, 2021, through June 23, 2021. Students begin their learning journey with the AWS Machine Learning Foundations Course from June 28, 2021, through October 11, 2021. At the end of the course, learners take an assessment, and the top 425 students are selected for scholarships. Those students then take the AWS Machine Learning Engineer Nanodegree from October 25, 2021, through January 25, 2022.
Get started and leverage the community
Get started by enrolling now and accelerate your ML journey! Connect with experts and like-minded aspiring ML developers on the AWS Machine Learning Slack channel.
About the Author
Cameron Peron is Senior Marketing Manager for AWS AI/ML Education and the AWS AI/ML community. He evangelizes how AI/ML innovation solves complex challenges facing community, enterprise, and startups alike. Out of the office, he enjoys staying active with kettlebell-sport, spending time with his family and friends, and is an avid fan of Euro-league basketball.
Ten university teams selected to participate in Alexa Prize TaskBot Challenge
Teams from three continents will compete to develop agents that assist customers in completing multi-step tasks.
How Contentsquare reduced TensorFlow inference latency with TensorFlow Serving on Amazon SageMaker
In this post, we present the results of a model serving experiment made by Contentsquare scientists with an innovative DL model trained to analyze HTML documents. We show how the Amazon SageMaker TensorFlow Serving solution helped Contentsquare address several serving challenges.
Contentsquare’s challenge
Contentsquare is a fast-growing French technology company empowering brands to build better digital experiences. In their own words, “Our experience analytics platform tracks and visualizes billions of digital behaviors, delivering intelligent recommendations that everyone can use to grow revenue, increase loyalty, and fuel innovation.”
Contentsquare scientists developed several ML and deep learning models, and wanted to find solutions for cost-effective and performant real-time model serving. For this experiment, they chose a custom multi-input, multi-task deep neural network developed with TensorFlow-backed Keras, which can answer several questions in one single inference on large payloads consisting of HTML pages.
Baseline deployment: Flask on Amazon EC2
As a baseline deployment, the Contentsquare team served the TensorFlow-backed Keras model from a Flask server hosted on an Amazon Elastic Compute Cloud (Amazon EC2) p2.xlarge GPU machine. Flask is a popular Python web framework (52,000 stars on GitHub) appreciated for its simplicity and large community. The EC2 p2.xlarge instance was fitted with an NVIDIA Tesla K80 GPU card. On a reference input payload used as a benchmark, this design provided a single-request inference latency of approximately 5 seconds.
Optimized deployment: TensorFlow Serving on SageMaker
To reduce management overhead and get a simpler deployment experience, the Contentsquare team experimented with Amazon SageMaker. SageMaker is a managed service supporting the development lifecycle of custom models, from annotation up to production deployment and monitoring. Beyond enabling a faster time to market, SageMaker provides state-of-the-art open-source pre-written serving containers for XGBoost (container, SDK), Scikit-Learn (container, SDK), PyTorch (container, SDK), TensorFlow (container, SDK) and Apache MXNet (container, SDK). In particular, the SageMaker TensorFlow serving container is built on top of TensorFlow Serving (TensorFlow-Serving: Flexible, High-Performance ML Serving, Olston et al.), the official, high-performance serving stack for TensorFlow. The SageMaker team further improved the TensorFlow Serving experience by adding the option to run custom inference code in front of TensorFlow Serving (for example, for preprocessing or postprocessing).
Slim Frikha, a Contentsquare scientist, says, “That is one of the reasons why we use TensorFlow Serving on SageMaker: TensorFlow Serving runs performant inference, SageMaker provides easy deployment, and the combination of both brings the extra possibility to do preprocessing and postprocessing with TensorFlow Serving.”
Preprocessing and postprocessing are important capabilities that ML practitioners look for when choosing an ML serving solution. To use the custom processing capacity of SageMaker TensorFlow Serving, developers can provide a custom inference.py script containing handling functions. For more information, see Create Python Scripts for Custom Input and Output Formats.
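As a rough illustration of what such a script can look like, the following sketch uses the `input_handler` and `output_handler` function pair that the SageMaker TensorFlow Serving container looks for in inference.py. The CSV branch is a hypothetical preprocessing step chosen for brevity; it is not Contentsquare's actual HTML processing logic.

```python
import json


def input_handler(data, context):
    """Pre-process the request before it is forwarded to TensorFlow Serving.

    `data` is a file-like request body; `context` carries request metadata
    such as `request_content_type`.
    """
    body = data.read()
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    if context.request_content_type == "application/json":
        # Pass JSON through unchanged (assumes it is already well formed).
        return text
    if context.request_content_type == "text/csv":
        # Hypothetical preprocessing: one CSV row of floats -> "instances".
        row = [float(v) for v in text.strip().split(",")]
        return json.dumps({"instances": [row]})
    raise ValueError(f"Unsupported content type: {context.request_content_type}")


def output_handler(data, context):
    """Post-process the TensorFlow Serving response before returning it."""
    if data.status_code != 200:
        raise ValueError(data.content.decode("utf-8"))
    # Return the raw prediction bytes and the content type the client asked for.
    return data.content, context.accept_header
```

Any heavier transformation (tokenizing, feature extraction, label mapping) would go in the same two functions, which keeps the client payload format independent of the tensor layout the model expects.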
The following figures show a high-level view of the internal architecture of the current SageMaker TensorFlow Serving container. Two web servers are collocated in each instance of the endpoint instance fleet. An NGINX server handles the communication with the requesting client and can optionally run ad hoc data processing via an inference.py script running in Gunicorn. A TensorFlow Serving server internally exposes TensorFlow models for consumption by the Gunicorn server. In-server communication between Gunicorn and TensorFlow Serving can be done in REST or gRPC when using an inference.py custom inference script, and with REST when using the default setup without the custom inference script. In both cases, external requests are done with REST.
The Contentsquare team tested both gRPC and HTTP for internal communication with TensorFlow Serving, and found gRPC to be much faster than HTTP, because HTTP required a JSON dump of the very large preprocessed input. On the specific benchmark inference payload, deploying in SageMaker TensorFlow Serving on an ml.p2.xlarge hosting instance reduced the global serving latency from 5 seconds to 3 seconds, compared to Keras deployed in Flask on an Amazon EC2 p2.xlarge instance: a 40% improvement! This gain is driven by serving optimizations internal to TensorFlow Serving and by the decoding of inputs into TensorFlow tensors, which can be faster when using gRPC.
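To give a feel for why a JSON dump of a large numeric input is costly, the following sketch compares the size of a JSON-encoded list of floats with the same values packed as raw 32-bit floats. The 100,000-value input is a hypothetical stand-in for a preprocessed payload, and the packed array only approximates a binary tensor encoding; it is not an actual gRPC request. The small helper at the end reproduces the latency arithmetic quoted above.

```python
import json
import random
from array import array

random.seed(0)
# Hypothetical preprocessed input: 100,000 float features.
values = [random.random() for _ in range(100_000)]

# Text encoding, as sent over the REST path.
json_bytes = len(json.dumps({"instances": [values]}).encode("utf-8"))
# Packed float32 encoding: 4 bytes per value.
binary_bytes = len(array("f", values).tobytes())

print(f"JSON payload:   {json_bytes:,} bytes")
print(f"Binary payload: {binary_bytes:,} bytes")
print(f"JSON is {json_bytes / binary_bytes:.1f}x larger")


def latency_improvement(baseline_s, optimized_s):
    """Relative latency reduction, as a fraction of the baseline."""
    return (baseline_s - optimized_s) / baseline_s


print(f"{latency_improvement(5, 3):.0%}")  # 40%
```

Beyond the size difference, the JSON path also has to format and parse every number as text, which adds CPU time on both sides of the internal call.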
Conclusion
Contentsquare scientists successfully completed their benchmark and found a cost-effective, high-performance serving solution for their custom TensorFlow model that reduced latency by 40% vs. a reasonable baseline. Another axis of improvement, not evaluated in this benchmark but worth consideration for extra gains, would be to evaluate different instance types. For example, the EC2 G4 instances, more recent than the P2, demonstrated great performance and economics in several inference cases. If you are interested in learning more about TensorFlow Serving on SageMaker, you can find guidance in the documentation, view the container source code on GitHub and navigate our examples gallery.
About the Author
Olivier Cruchant is a Machine Learning Specialist Solutions Architect at AWS, based in Lyon, France. Olivier helps French customers – from small startups to large enterprises – develop and deploy production-grade machine learning applications. In his spare time, he enjoys reading research papers and exploring the wilderness with friends and family.
Host multiple TensorFlow computer vision models using Amazon SageMaker multi-model endpoints
Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML. SageMaker accelerates innovation within your organization by providing purpose-built tools for every step of ML development, including labeling, data preparation, feature engineering, statistical bias detection, AutoML, training, tuning, hosting, explainability, monitoring, and workflow automation.
Companies are increasingly training ML models based on individual user data. For example, an image sharing service designed to enable discovery of information on the internet trains custom models based on each user’s uploaded images and browsing history to personalize recommendations for that user. The company can also train custom models based on search topics for recommending images per topic. Building custom ML models for each use case leads to higher inference accuracy, but increases the cost of deploying and managing models. These challenges become more pronounced when not all models are accessed at the same rate but still need to be available at all times.
SageMaker multi-model endpoints provide a scalable and cost-effective way to deploy large numbers of ML models in the cloud. SageMaker multi-model endpoints enable you to deploy multiple ML models behind a single endpoint and serve them using a single serving container. Your application simply needs to include an API call with the target model to this endpoint to achieve low-latency, high-throughput inference. Instead of paying for a separate endpoint for every single model, you can host many models for the price of a single endpoint. For more information about SageMaker multi-model endpoints, see Save on inference costs by using Amazon SageMaker multi-model endpoints.
In this post, we demonstrate how to use SageMaker multi-model endpoints to host two computer vision models with different model architectures and datasets for image classification. In practice, you can deploy tens of thousands of models on multi-model endpoints.
Overview of solution
SageMaker multi-model endpoints work with several frameworks, such as TensorFlow, PyTorch, MXNet, and sklearn, and you can build your own container with a multi-model server. Multi-model endpoints are also supported natively in the following popular SageMaker built-in algorithms: XGBoost, Linear Learner, Random Cut Forest (RCF), and K-Nearest Neighbors (KNN). You can directly use the SageMaker-provided containers while using these algorithms without having to build your own custom container.
The following diagram is a simplified illustration of how you can host multiple (for this post, six) models using SageMaker multi-model endpoints. In practice, multi-model endpoints can accommodate hundreds to tens of thousands of ML models behind an endpoint. In our architecture, if we host more models using model artifacts stored in Amazon Simple Storage Service (Amazon S3), multi-model endpoints dynamically unload some of the least-used models to accommodate newer models.
In this post, we show how to host two computer vision models trained using the TensorFlow framework behind a single SageMaker multi-model endpoint. We use the TensorFlow Serving container enabled for multi-model endpoints to host these models. For our first model, we train a smaller version of AlexNet CNN to classify images from the CIFAR-10 dataset. For the second model, we use a VGG16 CNN model pretrained on the ImageNet dataset and fine-tuned on the Sign Language Digits Dataset to classify hand symbol images. We also provide a fully functional notebook to demonstrate all the steps.
Model 1: CIFAR-10 image classification
CIFAR-10 is a benchmark dataset for image classification in computer vision and ML. CIFAR images are colored (three channels) with dramatic variation in how the objects appear. It consists of 32 × 32 color images in 10 classes, with 6,000 images per class. It contains 50,000 training images and 10,000 test images. The following image shows a sample of the images grouped by the labels.
To build the image classifier, we use a simplified version of the classical AlexNet CNN. The original network is composed of five convolutional and pooling layers, and three fully connected layers. Our simplified architecture stacks three convolutional layers and two fully connected (dense) layers.
The first step is to load the dataset into train and test objects. The TensorFlow framework provides the CIFAR dataset for us to load using the load_data() method. Next, we rescale the input images by dividing the pixel values by 255: [0,255] ⇒ [0,1]. We also need to prepare the labels using one-hot encoding. One hot encoding is a process by which the categorical variables are converted into a numerical form. The following code snippet shows these steps in action:
import numpy as np
from tensorflow.keras import utils
from tensorflow.keras.datasets import cifar10

# load dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
# rescale input images
X_train = X_train.astype('float32')/255
X_test = X_test.astype('float32')/255
# one hot encode target labels
num_classes = len(np.unique(y_train))
y_train = utils.to_categorical(y_train, num_classes)
y_test = utils.to_categorical(y_test, num_classes)
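To make the two transformations concrete, here is a dependency-free sketch of what rescaling and one-hot encoding do to a handful of values. The helper names `rescale` and `one_hot` are illustrative only; the notebook relies on NumPy and Keras utilities for the real dataset.

```python
def rescale(pixels):
    """Map 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [p / 255 for p in pixels]


def one_hot(labels, num_classes):
    """Turn integer class IDs into one-hot rows, one row per label."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)]
            for label in labels]


print(rescale([0, 51, 255]))   # [0.0, 0.2, 1.0]
print(one_hot([0, 2], 3))      # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

With 10 CIFAR-10 classes, each label becomes a row of 10 values with a single 1.0, which matches the shape of the softmax output the network is trained to produce.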
After the dataset is prepared and ready for training, it’s saved to Amazon S3 to be used by SageMaker. The model architecture and the training code for the image classifier are assembled into a training script (cifar_train.py). To generate batches of tensor image data for the training process, we use ImageDataGenerator. This enables us to apply data augmentation transformations such as rotation and width and height shifts to our training data.
In the next step, we use the training script to create a TensorFlow estimator using the SageMaker SDK (see the following code). We use the estimator to fit the CNN model on CIFAR-10 inputs. When the training is complete, the model is saved to Amazon S3.
from sagemaker.tensorflow import TensorFlow

model_name = 'cifar-10'
hyperparameters = {'epochs': 50}
estimator_parameters = {'entry_point': 'cifar_train.py',
                        'instance_type': 'ml.m5.2xlarge',
                        'instance_count': 2,
                        'model_dir': '/opt/ml/model',
                        'role': role,
                        'hyperparameters': hyperparameters,
                        'output_path': f's3://{BUCKET}/{PREFIX}/cifar_10/out',
                        'base_job_name': f'mme-cv-{model_name}',
                        'framework_version': TF_FRAMEWORK_VERSION,
                        'py_version': 'py37',
                        'script_mode': True}
# create the estimator and start the training job
estimator_1 = TensorFlow(**estimator_parameters)
estimator_1.fit(inputs)
Later, we demonstrate how to host this model using a SageMaker multi-model endpoint alongside our second model (the sign language digits classifier).
Model 2: Sign language digits classification
For our second model, we use the sign language digits dataset. This dataset distinguishes the sign language digits from 0–9. The following image shows a sample of the dataset.
The dataset contains 100 x 100 images in RGB color and has 10 classes (digits 0–9). The training set contains 1,712 images, the validation set 300, and the test set 50.
This dataset is very small. Training a network from scratch on this small a dataset doesn’t achieve good results. To achieve higher accuracy, we use transfer learning. Transfer learning is usually the go-to approach when starting a classification project, especially when you don’t have much training data. It migrates the knowledge learned from the source dataset to the target dataset, to save training time and computational cost.
To train this model, we use a pretrained VGG16 CNN model trained on the ImageNet dataset and fine-tune it to work on our sign language digits dataset. A pretrained model is a network that has been previously trained on a large dataset, typically on a large-scale image classification task. The VGG16 model architecture we use has 13 convolutional layers in total. For the sign language dataset, because its domain is different from the source domain of the ImageNet dataset, we only fine-tune the last few layers. Fine-tuning here refers to freezing a few of the network layers that are used for feature extraction, and jointly training both the non-frozen layers and the newly added classifier layers of the pretrained model.
The training script (sign_language_train.py) encapsulates the model architecture and the training logic for the sign language digits classifier. First, we load the pretrained weights from the VGG16 network trained on the ImageNet dataset. Next, we freeze part of the feature extractor, followed by adding the new classifier layers. Finally, we compile the network and run the training process to optimize the model for the smaller dataset.
Next, we use this training script to create a TensorFlow estimator using the SageMaker SDK. This estimator is used to fit the sign language digits classifier on the supplied inputs. When the training is complete, the model is saved to Amazon S3 to be hosted by SageMaker multi-model endpoints. See the following code:
model_name = 'sign-language'
hyperparameters = {'epochs': 50}
estimator_parameters = {'entry_point': 'sign_language_train.py',
                        'instance_type': 'ml.m5.2xlarge',
                        'instance_count': 2,
                        'hyperparameters': hyperparameters,
                        'model_dir': '/opt/ml/model',
                        'role': role,
                        'output_path': f's3://{BUCKET}/{PREFIX}/sign_language/out',
                        'base_job_name': f'cv-{model_name}',
                        'framework_version': TF_FRAMEWORK_VERSION,
                        'py_version': 'py37',
                        'script_mode': True}
# create the estimator and train on the train and validation channels
estimator_2 = TensorFlow(**estimator_parameters)
estimator_2.fit({'train': train_input, 'val': val_input})
Deploy a multi-model endpoint
SageMaker multi-model endpoints provide a scalable and cost-effective solution to deploy large numbers of models. They use a shared serving container that is enabled to host multiple models. This reduces hosting costs by improving endpoint utilization compared to using single-model endpoints. It also reduces deployment overhead because SageMaker manages loading models in memory and scaling them based on the traffic patterns to them.
To create the multi-model endpoint, first we need to copy the trained models for the individual estimators (1 and 2) from their saved S3 locations to a common S3 prefix that can be used by the multi-model endpoint:
tf_model_1 = estimator_1.model_data
output_1 = f's3://{BUCKET}/{PREFIX}/mme/cifar.tar.gz'
tf_model_2 = estimator_2.model_data
output_2 = f's3://{BUCKET}/{PREFIX}/mme/sign-language.tar.gz'
!aws s3 cp {tf_model_1} {output_1}
!aws s3 cp {tf_model_2} {output_2}
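The reason for the common prefix is that a multi-model endpoint addresses each model by its path relative to that prefix, and this relative path is what you later pass as the target model at invocation time. The following sketch makes that relationship explicit; the helper name `target_model_for` and the bucket and prefix values are hypothetical.

```python
def target_model_for(model_data_prefix, artifact_uri):
    """Return the TargetModel value for an artifact stored under the prefix."""
    prefix = model_data_prefix.rstrip("/") + "/"
    if not artifact_uri.startswith(prefix):
        raise ValueError(f"{artifact_uri} is not under {prefix}")
    # The target model is the artifact path relative to the shared prefix.
    return artifact_uri[len(prefix):]


# Hypothetical bucket and prefix, for illustration only.
prefix = "s3://example-bucket/example-prefix/mme/"
print(target_model_for(prefix, prefix + "cifar.tar.gz"))          # cifar.tar.gz
print(target_model_for(prefix, prefix + "sign-language.tar.gz"))  # sign-language.tar.gz
```

An artifact copied anywhere outside the prefix is not reachable through the endpoint, which is why both trained models are copied before the endpoint is created.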
After the models are copied to the common location designated by the S3 prefix, we create a serving model using the TensorFlowModel class from the SageMaker SDK. The serving model is created for one of the models to be hosted under the multi-model endpoint. In this case, we use the first model (the CIFAR-10 image classifier). Next, we use the MultiDataModel class from the SageMaker SDK to create a multi-model data model using the serving model for model-1, which we created in the previous step:
from sagemaker.tensorflow.serving import TensorFlowModel
from sagemaker.multidatamodel import MultiDataModel

# serving model for the first model (CIFAR-10 image classifier)
model_1 = TensorFlowModel(model_data=output_1,
                          role=role,
                          image_uri=IMAGE_URI)
# multi-model data model built on top of the serving model
mme = MultiDataModel(name=f'mme-tensorflow-{current_time}',
                     model_data_prefix=model_data_prefix,
                     model=model_1,
                     sagemaker_session=sagemaker_session)
Finally, we deploy the MultiDataModel by calling the deploy() method, providing the attributes needed to create the hosting infrastructure required to back the multi-model endpoint:
predictor = mme.deploy(initial_instance_count=2,
                       instance_type='ml.m5.2xlarge',
                       endpoint_name=f'mme-tensorflow-{current_time}')
The deploy call returns a predictor instance, which we can use to make inference calls. We see this in the next section.
Test the multi-model endpoint for real-time inference
Multi-model endpoints enable sharing memory resources across your models. If the model to be referenced is already cached, multi-model endpoints run inference immediately. On the other hand, if the particular requested model isn’t cached, SageMaker has to download the model, which increases latency for that initial request. However, this takes only a fraction of the time it would take to launch an entirely new infrastructure (instances) to host the model individually on SageMaker. After a model is cached in the multi-model endpoint, subsequent requests are served in real time (unless the model is removed). As a result, you can run many models from a single instance, effectively decoupling the number of models from your cost of deployment. This makes it easy to manage ML deployments at scale and lowers your model deployment costs through increased usage of the endpoint and its underlying compute instances. For more information and a demonstration of cost savings of over 90% for a 1,000-model example, see Save on inference costs using Amazon SageMaker multi-model endpoints.
Multi-model endpoints also unload unused models from the container when the instances backing the endpoint reach memory capacity and more models need to be loaded into its container. SageMaker deletes unused model artifacts from the instance storage volume when the volume is reaching capacity and new models need to be downloaded. The first invocation to a newly added model takes longer because the endpoint takes time to download the model from Amazon S3 to the container’s memory of the instances backing the multi-model endpoint. Models that are unloaded remain on the instance’s storage volume and can be loaded into the container’s memory later without being downloaded again from the S3 bucket.
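The load and unload behavior described above can be pictured with a toy cache. The sketch below assumes a least-recently-used eviction policy and a fixed number of slots; both are simplifications for illustration, because the real endpoint reacts to memory pressure and its exact policy is managed by SageMaker. The class name `ModelCache` and the model names are hypothetical.

```python
from collections import OrderedDict


class ModelCache:
    """Toy in-memory cache that evicts the least recently used model."""

    def __init__(self, capacity):
        self.capacity = capacity       # max models held in memory (assumed)
        self.loaded = OrderedDict()    # model name -> loaded model object

    def get(self, name, load_fn):
        if name in self.loaded:
            # Cache hit: mark the model as most recently used.
            self.loaded.move_to_end(name)
            return self.loaded[name]
        if len(self.loaded) >= self.capacity:
            # Memory is full: unload the least recently used model.
            evicted, _ = self.loaded.popitem(last=False)
            print(f"unloading {evicted}")
        # Cache miss: the slow path that downloads and loads the model.
        self.loaded[name] = load_fn(name)
        return self.loaded[name]


def fake_load(name):
    """Stand-in for downloading a model from Amazon S3 and loading it."""
    return f"<model {name}>"


cache = ModelCache(capacity=2)
for request in ["cifar", "sign-language", "cifar", "third-model"]:
    cache.get(request, fake_load)
print(list(cache.loaded))  # ['cifar', 'third-model']
```

In this run the third model displaces the sign language model because the CIFAR model was requested more recently, which mirrors why rarely used models pay the loading latency again on their next request.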
Let’s see how to make an inference from the CIFAR-10 image classifier (model-1) hosted under the multi-model endpoint. First, we load a sample image from one of the classes (airplane) and prepare it to be sent to the multi-model endpoint using the predictor we created in the previous step.
With this predictor, we can call the predict() method along with the initial_args parameter, which specifies the name of the target model to invoke. In this case, the target model is cifar.tar.gz. The following snippet demonstrates this process in detail:
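import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# load the sample image at the CIFAR-10 input size and rescale to [0, 1]
img = load_img('./data/cifar_10/raw_images/airplane.png', target_size=(32, 32))
data = img_to_array(img)
data = data.astype('float32')
data = data / 255.0
# add the batch dimension expected by the model: (1, 32, 32, 3)
data = data.reshape(1, 32, 32, 3)
payload = {'instances': data}
# route the request to the CIFAR-10 model artifact
y_pred = predictor.predict(data=payload, initial_args={'TargetModel': 'cifar.tar.gz'})
# the response is a dict of the form {'predictions': [[...]]}; use the first row
predicted_label = CIFAR10_LABELS[np.argmax(y_pred['predictions'][0])]
print(f'Predicted Label: [{predicted_label}]')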
Running the preceding code returns the prediction output as the label airplane, which is correctly interpreted by our served model:
Predicted Label: [airplane]
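Applications that don't use the SageMaker SDK predictor can make the same call through the SageMaker runtime API, where the target model is a parameter of `invoke_endpoint`. The following is a minimal sketch under stated assumptions: the helper names and the endpoint name are hypothetical, and the `invoke` function is not called here because it needs boto3, AWS credentials, and a live endpoint.

```python
import json


def build_invoke_kwargs(endpoint_name, target_model, instances):
    """Assemble keyword arguments for sagemaker-runtime invoke_endpoint."""
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,   # artifact name under the S3 prefix
        "ContentType": "application/json",
        "Body": json.dumps({"instances": instances}),
    }


def invoke(endpoint_name, target_model, instances):
    """Call the endpoint; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        **build_invoke_kwargs(endpoint_name, target_model, instances)
    )
    return json.loads(response["Body"].read())


# Hypothetical endpoint name; a tiny nested list stands in for an image batch.
kwargs = build_invoke_kwargs("my-mme-endpoint", "cifar.tar.gz",
                             [[[[0.0, 0.0, 0.0]]]])
print(kwargs["TargetModel"])  # cifar.tar.gz
```

Switching models is then only a matter of changing the `TargetModel` value, with no change to the endpoint itself.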
Next, let’s see how to dynamically load the sign language digit classifier (model-2) into a multi-model endpoint by invoking the endpoint with sign-language.tar.gz as the target model.
We use the following sample image of the hand sign digit 0.
The following snippet shows how to invoke the multi-model endpoint with the sample image to get back the correct response:
import numpy as np
import matplotlib.image as mpimg
from tensorflow.keras.preprocessing import image

test_path = './data/sign_language/test'
img = mpimg.imread(f'{test_path}/0/IMG_4159.JPG')

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

data = path_to_tensor(f'{test_path}/0/IMG_4159.JPG')
payload = {'instances': data}
y_pred = predictor.predict(data=payload, initial_args={'TargetModel': 'sign-language.tar.gz'})
# the response is a dict of the form {'predictions': [[...]]}; use the first row
predicted_label = np.argmax(y_pred['predictions'][0])
print(f'Predicted Label: [{predicted_label}]')
The following output is our response, with the label 0:
Predicted Label: [0]
Conclusion
In this post, we demonstrated how to use SageMaker multi-model endpoints to optimize inference costs. Multi-model endpoints are useful when you’re dealing with hundreds to tens of thousands of models and you don’t need to deploy each model as an individual endpoint. Models are loaded and unloaded dynamically, according to usage and the amount of memory available on the endpoint.
This post discussed how to host multiple computer vision models trained using the TensorFlow framework under one SageMaker multi-model endpoint. The image classification models were of different model architectures and trained on different datasets. The notebook included with the post provides detailed instructions on training and hosting the models.
Give SageMaker multi-model endpoints a try for your use case and leave your feedback in the comments.
About the Authors
Arunprasath Shankar is an Artificial Intelligence and Machine Learning (AI/ML) Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.
Mark Roy is a Principal Machine Learning Architect for AWS, helping AWS customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including Insurance, Financial Services, Media and Entertainment, Healthcare, Utilities, and Manufacturing. Mark holds six AWS certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for 25+ years, including 19 years in financial services.