Building better pangenomes to improve the equity of genomics

For decades, researchers worked together to assemble a complete copy of the molecular instructions for a human — a map of the human genome. The first draft was finished in 2000, but with several missing pieces. Even when a complete reference genome was achieved in 2022, the work was not finished. A single reference genome can’t incorporate known genetic variation, such as the variants of the gene that determines whether a person has blood type A, B, AB, or O. Furthermore, the reference genome didn’t represent the vast diversity of human ancestries, making it less useful for detecting disease or finding cures for people from some backgrounds than for others. For the past three years, we have been part of an international collaboration of 119 scientists across 60 institutions, called the Human Pangenome Research Consortium, working to address these challenges by creating a new and more representative map of the human genome, a pangenome.

We are excited to share that today, in “A draft human pangenome reference”, published in Nature, this group is announcing the completion of the first human pangenome reference. The pangenome combines 47 individual genome reference sequences and better represents the genomic diversity of global populations. Building on Google’s deep learning technologies and past advances in genomics, we used tools based on convolutional neural networks (CNNs) and transformers to tackle the challenges of building accurate pangenome sequences and using them for genome analysis. These contributions helped the consortium build an information-rich resource for geneticists, researchers and clinicians around the world.

Using graphs to build pangenomes

In the typical analysis workflow for high-throughput DNA sequencing, a sequencing instrument reads millions of short pieces of an individual’s genome, and a program called a mapper or aligner then estimates where those pieces best fit relative to the single, linear human reference sequence. Next, variant caller software identifies the unique parts of the individual’s sequence relative to the reference.

But because humans carry a diverse set of sequences, sections that are present in an individual’s DNA but absent from the reference genome can’t be analyzed. One study of 910 individuals of African descent found that a total of 300 million DNA base pairs — 10% of the roughly three billion base pairs in the reference genome — are missing from the previous linear reference but occur in at least one of the 910 individuals.

To address this issue, the consortium used graph data structures, which are powerful for genomics because they can represent the sequences of many people simultaneously, which is exactly what a pangenome requires. Nodes in a graph genome contain the known set of sequences in a population, and paths through those nodes compactly describe the unique sequences of an individual’s DNA.
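
To make the idea concrete, here is a minimal sketch in Python (our illustration, not the consortium’s actual data structure) of how a graph genome can encode both shared sequence and individual variation: nodes hold sequence fragments, and each individual’s genome is a path through those nodes.

nodes = {
    1: "ACGT",  # sequence shared by everyone
    2: "A",     # reference allele at a single nucleotide variant (SNV) site
    3: "G",     # alternate allele at the same site
    4: "TTAG",  # more shared sequence
    5: "CCA",   # an insertion carried by only some individuals
}

paths = {
    "reference":    [1, 2, 4],
    "individual_1": [1, 3, 4],     # carries the SNV
    "individual_2": [1, 2, 4, 5],  # carries the insertion
}

def sequence_of(path_name: str) -> str:
    """Reconstruct an individual's sequence by walking its path through the graph."""
    return "".join(nodes[node] for node in paths[path_name])

print(sequence_of("individual_2"))  # ACGTATTAGCCA

Real pangenome graphs are far larger and richer, but the same node-and-path structure is what lets one data structure describe many genomes at once.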

Schematic of a graph genome. Each color represents the sequence path of a different individual. Multiple paths passing through the same node indicate that multiple individuals share that sequence, while some paths diverge at single nucleotide variants (SNVs), insertions, or deletions. Illustration credit Darryl Leja, National Human Genome Research Institute (NHGRI).

Actual graph genome for the major histocompatibility complex (MHC) region of the genome. Genes in the MHC region are essential to immune function and are associated with a person’s resistance and susceptibility to infectious disease and autoimmune disorders (e.g., ankylosing spondylitis and lupus). The graph shows the linear human genome reference (green) and the sequences of different individuals (gray).

Using graphs creates new challenges: they require highly accurate reference sequences, as well as new methods that can take the graph data structure as input. However, new sequencing technologies (such as consensus sequencing and phased assembly methods) have driven exciting progress toward solving these problems.

Long-read sequencing technology, which reads larger pieces of the genome (10,000 to millions of DNA characters long) at a time, is essential to the creation of high-quality reference sequences because larger pieces can be stitched together into assembled genomes more easily than the short pieces read out by earlier technologies. Short-read sequencing reads pieces of the genome that are only 100 to 300 DNA characters long, but it has been the highly scalable basis for the high-throughput sequencing methods developed in the 2000s. Though long-read sequencing is newer and has advantages for reference genome creation, many informatics methods developed for short reads hadn’t yet been adapted to long-read technologies.

Evolving DeepVariant for error correction

Google initially developed DeepVariant, an open-source CNN-based variant caller framework, to analyze the short-read sequencing evidence of local regions of the genome. We were later able to re-train DeepVariant to yield accurate analysis of Pacific Biosciences’ long-read data.

Training and evaluation schematic for DeepVariant.

We next teamed up with researchers at the University of California, Santa Cruz (UCSC) Genomics Institute to participate in a United States Food and Drug Administration competition for another long-read sequencing technology, from Oxford Nanopore. Together, we won the award for highest accuracy in the nanopore category, with single nucleotide variant (SNV) accuracy that matched short-read sequencing. This work has since been used to detect and treat genetic diseases in critically ill newborns. The use of DeepVariant on long-read technologies provided the foundation for the consortium’s use of DeepVariant for error correction of pangenomes.

DeepVariant’s ability to use multiple long-read sequencing modalities proved useful for error correction in the Telomere-to-Telomere (T2T) Consortium’s effort that generated the first complete assembly of a human genome. Completing this first genome set the stage to build the multiple reference genomes required for pangenomes, and T2T was already working closely with the Human Pangenome Project (with many shared members) to scale those practices.

With a set of high-quality human reference genomes on the horizon, developing methods that could use those assemblies grew in importance. We worked to adapt DeepVariant to use the pangenome developed by the consortium. In partnership with UCSC, we built an end-to-end analysis workflow for graph-based variant detection, and demonstrated improved accuracy across several thousand samples. The use of the pangenome allows many previously missed variants to be correctly identified.

Visualization of variant calls in the KCNE1 gene (a gene with variants associated with cardiac arrhythmias and sudden death) using a pangenome reference versus the prior linear reference. Each dot represents a variant call that is either correct (blue dot), incorrect (green dot) — when a variant is identified but is not really there — or missed (red dot). The top box shows variant calls made by DeepVariant using the pangenome reference, while the bottom shows variant calls made using the linear reference. Figure adapted from A Draft Human Pangenome Reference.

Improving pangenome sequences using transformers

Just as new sequencing technologies enabled new pangenome approaches, new informatics technologies enabled improvements for sequencing methods. Google adapted transformer architectures from analysis of human language to genome sequences to develop DeepConsensus. A key enabler for this was the development of a differentiable loss function that could handle the insertions and deletions common in sequencing data. This enabled us to have high accuracy without needing a decoder, allowing the speed required to keep up with terabytes of sequencer output.

Transformer architecture for DeepConsensus. DeepConsensus takes as input the repeated sequence of the DNA molecule, measured from the fluorescent light detected as each base is added. DeepConsensus also uses as input more detailed information about the sequencing process, including the duration of the light pulse (referred to here as pulse width or PW), the time between pulses (IP), the signal-to-noise ratio (SN), and which side of the double helix is being measured (strand).
Effect of the alignment loss function in training on evaluation of model output. Better accounting of insertions and deletions by a differentiable alignment function enables the model training process to better estimate errors.

DeepConsensus improves the yield and accuracy of instrument data. Because PacBio sequencing provides the primary sequence information for the 47 genome assemblies, we could apply DeepConsensus to improve those assemblies. With DeepConsensus applied, consortium members built a genome assembler that was able to reach an assembly base-level accuracy of 99.9997%.

Conclusion

We developed multiple new approaches to improve genetic sequencing methods, which we then used to construct pangenome references that enable more robust genome analysis.

But this is just the beginning of the story. In the next stage, a larger, worldwide group of scientists and clinicians will use this pangenome reference to study genetic diseases and make new drugs. And future pangenomes will represent even more individuals, realizing a vision summarized this way in a recent Nature story: “Every base, everywhere, all at once.” Read our post on the Keyword Blog to learn more about the human pangenome reference announcement.

Acknowledgements

Many people were involved in creating the pangenome reference, including 119 authors across 60 organizations within the Human Pangenome Reference Consortium. This blog post highlights Google’s contributions to the broader work. We thank the research groups at the UCSC Genomics Institute (GI) under Professors Benedict Paten and Karen Miga, the genome polishing efforts of Arang Rhie at the National Institutes of Health (NIH), the genome assembly and polishing work of Adam Phillippy’s group, and the standards group of Justin Zook at the National Institute of Standards and Technology (NIST). We thank Google contributors: Pi-Chuan Chang, Maria Nattestad, Daniel Cook, Alexey Kolesnikov, Anastaysia Belyaeva, and Gunjan Baid. We thank Lizzie Dorfman, Elise Kleeman, Erika Hayden, Cory McLean, Shravya Shetty, Greg Corrado, Katherine Chou, and Yossi Matias for their support, coordination, and leadership. Last but not least, thanks to the research participants who provided their DNA to help build the pangenome resource.

Read More

Reduce Amazon SageMaker inference cost with AWS Graviton

Amazon SageMaker provides a broad selection of machine learning (ML) infrastructure and model deployment options to help meet your ML inference needs. It’s a fully-managed service and integrates with MLOps tools so you can work to scale your model deployment, reduce inference costs, manage models more effectively in production, and reduce operational burden. SageMaker provides multiple inference options so you can pick the option that best suits your workload.

New generations of CPUs offer a significant performance improvement in ML inference due to specialized built-in instructions. In this post, we focus on how you can take advantage of the AWS Graviton3-based Amazon Elastic Compute Cloud (EC2) C7g instances to help reduce inference costs by up to 50% relative to comparable EC2 instances for real-time inference on Amazon SageMaker. We show how you can evaluate the inference performance and switch your ML workloads to AWS Graviton instances in just a few steps.

To cover a broad range of popular customer applications, in this post we discuss the inference performance of the PyTorch, TensorFlow, XGBoost, and scikit-learn frameworks. We cover computer vision (CV), natural language processing (NLP), classification, and ranking scenarios for models, and use ml.c6g, ml.c7g, ml.c5, and ml.c6i SageMaker instances for benchmarking.

Benchmarking results

AWS measured up to 50% cost savings for PyTorch, TensorFlow, XGBoost, and scikit-learn model inference with AWS Graviton3-based EC2 C7g instances relative to comparable EC2 instances on Amazon SageMaker. At the same time, the latency of inference is also reduced.

For comparison, we used four different instance types:

  • c5.4xlarge
  • c6i.4xlarge
  • c6g.4xlarge (AWS Graviton2)
  • c7g.4xlarge (AWS Graviton3)

All four instances have 16 vCPUs and 32 GiB of memory.

In the following graph, we measured the cost per million inferences for the four instance types. We further normalized the cost-per-million-inferences results to a c5.4xlarge instance, which is measured as 1 on the Y-axis of the chart. You can see that for the XGBoost models, the cost per million inferences for c7g.4xlarge (AWS Graviton3) is about 50% of that of c5.4xlarge and 40% of c6i.4xlarge; for the PyTorch NLP models, the cost savings are about 30–50% compared to the c5 and c6i.4xlarge instances. For other models and frameworks, we measured at least 30% cost savings compared to the c5 and c6i.4xlarge instances.
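
The normalization itself is simple arithmetic. The following Python sketch uses made-up prices and throughputs (not the measured values behind the chart) purely to show how cost per million inferences and the c5.4xlarge-normalized ratio are computed.

# Hypothetical numbers for illustration only; these are not the benchmark results.
instances = {
    "c5.4xlarge":  {"price_per_hour": 0.68, "inferences_per_second": 100.0},
    "c6i.4xlarge": {"price_per_hour": 0.68, "inferences_per_second": 120.0},
    "c7g.4xlarge": {"price_per_hour": 0.58, "inferences_per_second": 200.0},
}

def cost_per_million(price_per_hour: float, inferences_per_second: float) -> float:
    # Time to serve one million requests (in hours) multiplied by the hourly price.
    return price_per_hour * (1_000_000 / inferences_per_second) / 3600

baseline = cost_per_million(**instances["c5.4xlarge"])
for name, spec in instances.items():
    print(f"{name}: {cost_per_million(**spec) / baseline:.2f}x the c5.4xlarge cost")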

Similar to the preceding inference cost comparison graph, the following graph shows the model p90 latency for the same four instance types. We further normalized the latency results to the c5.4xlarge instance, which is measured as 1 on the Y-axis of the chart. The c7g.4xlarge (AWS Graviton3) model inference latency is up to 50% better than the latencies measured on c5.4xlarge and c6i.4xlarge.

Migrate to AWS Graviton instances

To deploy your models to AWS Graviton instances, you can either use AWS Deep Learning Containers (DLCs) or bring your own containers that are compatible with the ARMv8.2 architecture.

The migration (or new deployment) of your models to AWS Graviton instances is straightforward because not only does AWS provide containers to host models with PyTorch, TensorFlow, scikit-learn, and XGBoost, but the models are architecturally agnostic as well. You can also bring your own libraries, but be sure that your container is built with an environment that supports the ARMv8.2 architecture. For more information, see Building your own algorithm container.

You need to complete three steps to deploy your model (a minimal boto3 sketch of these calls follows the list):

  1. Create a SageMaker model. This will contain, among other parameters, the information about the model file location, the container that will be used for the deployment, and the location of the inference script. (If you have an existing model already deployed in a compute optimized inference instance, you can skip this step.)
  2. Create an endpoint configuration. This will contain information about the type of instance you want for the endpoint (for example, ml.c7g.xlarge for AWS Graviton3), the name of the model you created in the previous step, and the number of instances per endpoint.
  3. Launch the endpoint with the endpoint configuration created in the previous step.
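
As a rough sketch of those three steps with the boto3 SageMaker client (the image URI, S3 path, role ARN, and resource names are placeholders, not values from this post):

import boto3

sm = boto3.client("sagemaker")

# Step 1: create a SageMaker model pointing at an arm64-compatible container and model artifact.
sm.create_model(
    ModelName="my-graviton-model",
    PrimaryContainer={
        "Image": "<arm64-compatible-inference-image-uri>",
        "ModelDataUrl": "s3://<your-bucket>/model/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::<account-id>:role/<sagemaker-execution-role>",
)

# Step 2: create an endpoint configuration that requests an AWS Graviton3 instance.
sm.create_endpoint_config(
    EndpointConfigName="my-graviton-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-graviton-model",
            "InstanceType": "ml.c7g.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
)

# Step 3: launch the endpoint from the endpoint configuration.
sm.create_endpoint(
    EndpointName="my-graviton-endpoint",
    EndpointConfigName="my-graviton-endpoint-config",
)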

For detailed instructions, refer to Run machine learning inference workloads on AWS Graviton-based instances with Amazon SageMaker.

Benchmarking methodology

We used Amazon SageMaker Inference Recommender to automate performance benchmarking across different instances. This service compares the performance of your ML model in terms of latency and cost on different instances and recommends the instance and configuration that gives the best performance for the lowest cost. We have collected the aforementioned performance data using Inference Recommender. For more details, refer to the GitHub repo.
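
For reference, a Default Inference Recommender job can be started with a single boto3 call along the lines of the following sketch; the job name, role ARN, and model package ARN are placeholders, and the Inference Recommender documentation covers the full set of input options.

import boto3

sm = boto3.client("sagemaker")

sm.create_inference_recommendations_job(
    JobName="graviton-inference-benchmark",  # placeholder job name
    JobType="Default",  # a Default job compares a set of candidate instance types
    RoleArn="arn:aws:iam::<account-id>:role/<sagemaker-execution-role>",
    InputConfig={
        "ModelPackageVersionArn": "arn:aws:sagemaker:<region>:<account-id>:model-package/<group>/<version>",
    },
)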

You can use the sample notebook to run the benchmarks and reproduce the results. We used the following models for benchmarking:

Conclusion

AWS measured up to 50% cost savings for PyTorch, TensorFlow, XGBoost, and scikit-learn model inference with AWS Graviton3-based EC2 C7g instances relative to comparable EC2 instances on Amazon SageMaker. You can migrate your existing inference use cases or deploy new ML models on AWS Graviton by following the steps provided in this post. You can also refer to the AWS Graviton Technical Guide, which provides the list of optimized libraries and best practices that will help you achieve cost benefits with AWS Graviton instances across different workloads.

If you find use cases where similar performance gains are not observed on AWS Graviton, please reach out to us. We will continue to add more performance improvements to make AWS Graviton the most cost-effective and efficient general-purpose processor for ML inference.


About the authors

Sunita Nadampalli is a Software Development Manager at AWS. She leads Graviton software performance optimizations for machine learning, HPC, and multimedia workloads. She is passionate about open-source development and delivering cost-effective software solutions with Arm SoCs.

Jaymin Desai is a Software Development Engineer with the Amazon SageMaker Inference team. He is passionate about taking AI to the masses and improving the usability of state-of-the-art AI assets by productizing them into features and services. In his free time, he enjoys exploring music and traveling.

Mike Schneider is a Systems Developer, based in Phoenix AZ. He is a member of the Deep Learning Containers team, supporting various framework container images, including Graviton inference. He is dedicated to infrastructure efficiency and stability.

Mohan Gandhi is a Senior Software Engineer at AWS. He has been with AWS for the last 10 years and has worked on various AWS services like EMR, EFA and RDS. Currently, he is focused on improving the SageMaker Inference Experience. In his spare time, he enjoys hiking and marathons.

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Wayne Toh is a Specialist Solutions Architect for Graviton at AWS. He focuses on helping customers adopt ARM architecture for large scale container workloads. Prior to joining AWS, Wayne worked for several large software vendors, including IBM and Red Hat.

Lauren Mullennex is a Solutions Architect based in Denver, CO. She works with customers to help them architect solutions on AWS. In her spare time, she enjoys hiking and cooking Hawaiian cuisine.

Read More

How Sleepme uses Amazon SageMaker for automated temperature control to maximize sleep quality in real time

This is a guest post co-written with Trey Robinson, CTO at Sleepme Inc.

Sleepme is an industry leader in sleep temperature management and monitoring products, including an Internet of Things (IoT) enabled sleep tracking sensor suite equipped with heart rate, respiration rate, bed and ambient temperature, humidity, and pressure sensors.

Sleepme offers a smart mattress topper system that can be scheduled to cool or heat your bed using the companion application. The system can be paired with a sleep tracker that gathers insights such as heart rate, respiration rate, humidity in the room, wake up times, and when the user was in and out of bed. At the end of a given sleep session, it will aggregate sleep tracker insights, along with sleep stage data, to produce a sleep quality score.

This smart mattress topper works like a thermostat for your bed and gives customers control of their sleep climate. Sleepme products help you lower your body temperature, which is linked with falling into deep sleep, while being too hot can reduce the likelihood of falling and staying asleep.

In this post, we share how Sleepme used Amazon SageMaker to develop a machine learning (ML) model proof of concept that recommends temperatures to maximize your sleep score.

“The adoption of AI opens new avenues to improve customers’ sleeping experience. These changes will be implemented in the Sleepme product line, allowing the client to leverage the technical and marketing value of the new features during deployment.”

– Trey Robinson, Chief Technology Officer of Sleepme.

Using ML to improve sleep in real time

Sleepme is a science-driven organization that uses scientific studies, international journals, and cutting-edge research to bring customers the latest in sleep health and wellness. Sleepme provides sleep science information on their website.

Sleepme discusses how only 44% of Americans report a restful night’s sleep almost every night, and that 35% of adults sleep less than 7 hours per night. Getting a full night’s sleep helps you feel more energized and has proven benefits to your mind, weight, and heart. This represents a huge population of people with opportunities to improve their sleep and health.

Sleepme saw an opportunity to improve the sleep of their users by changing the user’s sleep environment during the night. By capturing environment data like temperature and humidity and connecting it with personalized user data like restlessness, heart rate, and sleep cycle, Sleepme determined they were able to change the user’s environment to optimize their rest. This use case demanded an ML model that served real-time inference.

Sleepme needed a highly available inference model that provides low-latency recommendations. With a focus on delivering new features and products for their customers, Sleepme needed an out-of-the-box solution that doesn’t require infrastructure management.

To address these challenges, Sleepme turned to Amazon SageMaker.

Using Amazon SageMaker to build an ML model for sleep temperature recommendations

SageMaker accelerates the deployment of ML workloads by simplifying the ML build process. It provides a set of ML capabilities that run on a managed infrastructure on AWS. This reduces the operational overhead and complexity associated with ML development.

Sleepme chose SageMaker because of the capabilities it provides in model training, endpoint deployment process, and infrastructure management. The following diagram illustrates their AWS architecture.

Solution Diagram

Sleepme is focused on delivering new products and features for their customers. They didn’t want to dedicate their resources to a lengthy ML model training process.

SageMaker Model Training allowed Sleepme to use their historical data to quickly develop a proprietary ML model. SageMaker Model Training provides dozens of built-in training algorithms and hundreds of pre-trained models, increasing Sleepme’s agility in model creation. By managing the underlying compute instances, SageMaker Model Training enabled Sleepme to focus on enhancing model performance.
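
Below is a minimal sketch of what such a training job can look like with the SageMaker Python SDK, assuming the built-in XGBoost algorithm and placeholder S3 paths and role (Sleepme’s actual model, features, and data locations are not public):

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()

# Retrieve the URI of the built-in XGBoost training image for the current Region.
xgboost_image = image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=xgboost_image,
    role="arn:aws:iam::<account-id>:role/<sagemaker-execution-role>",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://<your-bucket>/sleep-model/output",
    sagemaker_session=session,
)

# The built-in XGBoost algorithm requires at least an objective and a number of boosting rounds.
estimator.set_hyperparameters(objective="reg:squarederror", num_round=100)

# Launch a managed training job against historical sleep data staged in S3 (CSV format assumed).
estimator.fit({"train": TrainingInput("s3://<your-bucket>/sleep-model/train", content_type="text/csv")})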

This ML model needed to make sleep environment adjustments in real time. To achieve this, Sleepme used SageMaker real-time inference to host their model. This endpoint receives data from Sleepme’s smart mattress topper and sleep tracker to make a temperature recommendation for the user’s sleep in real time. Additionally, with automatic scaling, SageMaker inference gives Sleepme the option to add or remove instances to meet demand.
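
At inference time, a device backend can call the real-time endpoint through the SageMaker runtime, sketched below with a hypothetical endpoint name and feature payload standing in for Sleepme’s actual schema.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical sensor readings from the sleep tracker and smart mattress topper.
payload = {
    "heart_rate": 58,
    "respiration_rate": 14,
    "bed_temperature_c": 27.5,
    "ambient_temperature_c": 21.0,
    "humidity_percent": 45,
}

response = runtime.invoke_endpoint(
    EndpointName="sleep-temperature-recommender",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)

recommendation = json.loads(response["Body"].read())
print(recommendation)  # for example, a recommended bed temperature adjustment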

SageMaker also provides Sleepme with useful features as their workload evolves. They could use shadow tests to evaluate the performance of new model versions before they are deployed to customers, SageMaker Model Registry to manage model versions and automate model deployment, and SageMaker Model Monitor to monitor the quality of their model in production. These features give Sleepme the opportunity to take their ML use cases to the next level, without developing new capabilities on their own.

Conclusion

With Amazon SageMaker, Sleepme was able to build and deploy a custom ML model in a matter of weeks that identifies the recommended temperature adjustment, which the Sleepme devices then apply to the user’s environment.

Sleepme IoT devices capture sleep data and can now make adjustments to a customer’s bed in minutes. This capability proved to be a business differentiator. Now, users’ sleep can be optimized in real time for higher quality.

To learn more about how you can quickly build ML models, refer to Train Models or get started on the SageMaker console.


About the Authors

Trey Robinson has been a mobile and IoT-focused software engineer leading teams as the CTO of Sleepme Inc and Director of Engineering at Passport Inc. He has worked on dozens of mobile apps, backends, and IoT projects over the years. Before moving to Charlotte, NC, Trey grew up in Ninety Six, South Carolina, and studied Computer Science at Clemson University.

Benon Boyadjian is a Solutions Architect in the Private Equity group at Amazon Web Services. Benon works directly with Private Equity Firms and their portfolio companies, helping them leverage AWS to achieve business objectives and increase enterprise value.

Read More

Publish predictive dashboards in Amazon QuickSight using ML predictions from Amazon SageMaker Canvas

Understanding business trends, customer behavior, sales revenue, increase in demand, and buyer propensity all start with data. Exploring, analyzing, interpreting, and finding trends in data is essential for businesses to achieve successful outcomes.

Business analysts play a pivotal role in facilitating data-driven business decisions through activities such as the visualization of business metrics and the prediction of future events. Quick iteration and faster time-to-value can be achieved by providing these analysts with a visual business intelligence (BI) tool for simple analysis, supported by technologies like machine learning (ML).

Amazon QuickSight is a fully managed, cloud-native BI service that makes it easy to connect to your data, create interactive dashboards and reports, and share these with tens of thousands of users, either within QuickSight or embedded in your application or website. Amazon SageMaker Canvas is a visual interface that enables business analysts to generate accurate ML predictions on their own, without requiring any ML experience or having to write a single line of code.

In this post, we show how you can publish predictive dashboards in QuickSight using ML-based predictions from Canvas, without explicitly downloading predictions and importing them into QuickSight. This solution helps you send predictions from Canvas to QuickSight, enabling accelerated, ML-driven decision-making to achieve effective business outcomes.

Solution overview

In the following sections, we discuss steps that will help administrators configure the right permissions to seamlessly redirect users from Canvas to QuickSight. Then we detail how to build a model and run predictions, and demonstrate the business analyst experience.

Prerequisites

The following prerequisites are needed to implement this solution:

Make sure to use the same QuickSight Region as Canvas. You can change the Region by navigating from the profile icon on the QuickSight console.

Administrator setup

In this section, we detail the steps to set up IAM resources, prepare the data, train a model with the training dataset, and run inference on the validation dataset. Thereafter, we send the data to QuickSight for further analysis.

Create a new IAM policy for QuickSight access

To create an IAM policy, complete the following steps:

  1. On the IAM console, choose Policies in the navigation pane.
  2. Choose Create policy.
  3. On the JSON tab, enter the following permissions policy into the editor:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "quicksight:CreateDataSet",
                "quicksight:ListNamespaces",
                "quicksight:CreateDataSource",
                "quicksight:PassDataSet",
                "quicksight:PassDataSource"
            ],
            "Resource":[
                "arn:aws:quicksight:*:<AWS-account-id>:datasource/*", #replace account id
                "arn:aws:quicksight:*:<AWS-account-id>:user/*", #replace account id
                "arn:aws:quicksight:*:<AWS-account-id>:namespace/*", #replace account id
                "arn:aws:quicksight:*:<AWS-account-id>:dataset/*" #replace account id
            ]
        }
    ]
}

For details about the IAM policy language, see IAM JSON policy reference.

  4. Choose Next: Tags.
  5. You can add metadata to the policy by attaching tags as key-value pairs, then choose Next: Review.

For more information about using tags in IAM, see Tagging IAM resources.

  6. On the Review policy page, enter a name (for example, canvas-quicksight-access-policy) and an optional description of the policy.
  7. Review the Summary section to see the permissions that are granted by your policy.
  8. Choose Create policy to save your work.

After you create the policy, attach it to your Studio execution role to grant your users the necessary permissions to send batch predictions to QuickSight.

Attach the policy to your Studio execution role

To attach the policy to your Studio execution role, complete the following steps:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose your domain.
  3. Choose Domain settings.
  4. Copy the role name under Execution role.

  5. On the IAM console, choose Roles in the navigation pane.
  6. In the search bar, enter the execution role you copied, then choose the role.

  7. On the page for the user’s role, navigate to the Permissions policies section.
  8. On the Add permissions menu, choose Attach policies.
  9. Search for the previously created policy (canvas-quicksight-access-policy), select it, and choose Add permissions.

Now you have an IAM policy attached to your execution role that grants your users the necessary permissions to send batch predictions to users in QuickSight.
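
If you prefer to script this step instead of using the console, the same attachment can be made with a single boto3 call; the role name and account ID below are placeholders.

import boto3

iam = boto3.client("iam")

# Attach the customer managed policy created earlier to the Studio execution role.
iam.attach_role_policy(
    RoleName="<your-studio-execution-role-name>",
    PolicyArn="arn:aws:iam::<AWS-account-id>:policy/canvas-quicksight-access-policy",
)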

Download the datasets

Let’s download the datasets that we use to train the model and make the predictions:

Build a model and run predictions

In this section, we cover how we can build a model and run predictions on the loan dataset. Then we send the data to the QuickSight dashboard to get business insights.

Launch Canvas

To launch Canvas, complete the following steps:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose your domain.
  3. On the Launch menu, choose Canvas.

Upload training and validation datasets

Complete the following steps to upload your datasets to Canvas:

  1. On the Canvas home page, choose Datasets.
  2. Choose Import data, then upload lending_club_loan_data_train.csv and lending_club_loan_data_test.csv.
  3. Choose Save & Close, then choose Import data.

Now let’s create a new model.

  4. Choose My models in the navigation pane.
  5. Choose New model.
  6. Enter a name for your model (Loan_Prediction) and choose Create.

If this is the first time creating a Canvas model, you will be welcomed by an informative pop-up about how to build your first model in four simple steps. You can read this through, then come back to this guide.

  7. In the model view, on the Select tab, select the lending_club_loan_data_train dataset.

This dataset has 18 columns and 32,000 rows.

  8. Choose Select dataset.

  9. On the Build tab, choose the target column, in our case loan_status.

Canvas will automatically detect that this is a 3+ category prediction problem (also known as multi-class classification).

  10. If another model type is detected, change it manually by choosing Change type.

  11. Choose Quick build, and select Start quick build from the pop-up.

You can also choose Standard build, which goes through the complete AutoML cycle, generating multiple models before recommending the best model.

Now your model is being built. Quick build usually takes 2–15 minutes.

After the model is built, you can find the model status on the Analyze tab.

Make predictions with the model

After we build and train the model, we can generate predictions on this model.

  1. Choose Predict on the Analyze tab, or choose the Predict tab.
  2. Run a single prediction by choosing Single prediction and providing entries.

You will see the loan_status prediction on the right side of the page. You can copy the prediction by choosing Copy, or download it by choosing Download prediction. This is ideal for generating what-if scenarios and testing how different columns impact the predictions of our model.

  3. To run batch predictions, choose Batch prediction.

This is best when you’d like to make predictions for an entire dataset. You should make predictions with a dataset that matches your input dataset.

For each prediction or set of predictions, Canvas returns the predicted values and the probability of the predicted value being correct.

Let’s make predictions from the trained model using the validation dataset.

  4. Choose Select the dataset.
  5. Select lending_club_loan_data_test and choose Generate predictions.

When your predictions are ready, you can find them in the Dataset section. You can preview the prediction, download it to a local machine, delete it, or send it to QuickSight.

Send predictions to QuickSight

You can now share predictions from these ML models as QuickSight datasets that will serve as a new source for enterprise-wide dashboards. You can analyze trends, risks, and business opportunities. Through this capability, ML becomes more accessible to business teams so they can accelerate data-driven decision-making. Sharing data with QuickSight users grants them owner permissions on the dataset. Multiple inferred datasets can be sent at once to QuickSight.

Note that you can only send predictions to users in the default namespace of the QuickSight account, and the user must have the Author or Admin role in QuickSight. Predictions sent to QuickSight are available in the same Region as Canvas.

  1. Select the inferred batch dataset and choose Send to Amazon QuickSight.

  2. Enter one or multiple QuickSight user names to share the dataset with and press Enter.
  3. Choose Send to share data.

After you send your batch predictions, the QuickSight field for the datasets you sent shows as Sent.

  4. In the confirmation box, you can choose Open Amazon QuickSight to open your QuickSight application.
  5. If you’re done using Canvas, log out of the Canvas application.

You can send batch predictions to QuickSight for numeric, categorical prediction, and time series forecasting models. You can also send predictions generated with the bring your own model (BYOM) method. Single-label image prediction and multi-category text prediction models are excluded.

The QuickSight users that you’ve sent datasets to can open their QuickSight console and view the Canvas datasets that have been shared with them. Then they can create predictive dashboards with the data. For more information, see Getting started with Amazon QuickSight data analysis.

By default, all the users to whom you send predictions have owner permissions for the dataset in QuickSight. Owners are able to create analyses, refresh, edit, delete, and reshare datasets. The changes that owners make to a dataset change the dataset for all users with access. To change the permissions, go to the dataset in QuickSight and manage its permissions. For more information, see Viewing and editing the permissions of users that a dataset is shared with.

Business analyst experience

With QuickSight, you can visualize your data to better understand it. We start by getting some high-level information.

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Create an analysis on the batch prediction dataset shared from Canvas by choosing Create analysis on the drop-down options menu (three vertical dots).

  3. On the analysis page, choose the sheet name and rename it to Loan Data Analysis.

Let’s create a visual to show the count by loan status.

  4. For Visual types, choose Donut chart.
  5. Use the loan_status field for Group/Color.

We can see that 99% are fully paid, 1% are current, and 0% are charged off.

Now we add a second visual to show the amount of loans by status.

  6. On the top-left corner, choose the plus sign and choose Add visual.
  7. For Visual types, choose Waterfall chart.
  8. Use the loan_status field for Category.
  9. Use the loan_amount field for Value.

We can see that the total loan amount is around $88 million, with around $221,000 charged off.

Let’s try to detect some risk drivers for defaulting on loans.

  10. Choose the plus sign and choose Add visual.
  11. For Visual types, choose Horizontal bar chart.
  12. Use the loan_status field for Y axis.
  13. Use the loan_amount field for Value.
  14. Modify the Value field aggregation from Sum to Average.

We can see that on average, the loan amount was around $3,500 lower for the fully paid loans compared to the current loans, and around $3,500 lower for the fully paid loans compared to the charged off loans. There seems to be a correlation between the loan amount and the credit risk.

  15. To duplicate the visual, choose the options menu (three dots), choose Duplicate visual to, and choose This sheet.
  16. Choose the duplicated visual to modify its configuration.
  17. For Visual types, choose Horizontal bar chart.
  18. Use the loan_status field for Y axis.
  19. Use the loan_amount field for Value.
  20. Modify the Value field aggregation from Sum to Average.

You can create additional visuals to check for additional risk drivers. For example:

  • Loan term
  • Open credit lines
  • Revolving line utilization rate
  • Total credit lines
  21. After you add the visuals, publish the dashboard using the Share option on the analyses page and share the dashboard with the business stakeholders.

Clean up

To avoid incurring future charges, delete or shut down the resources you created while following this post. Refer to Logging out of Amazon SageMaker Canvas for more details.

Conclusion

In this post, we trained an ML model using Canvas without writing a single line of code thanks to its user-friendly interfaces and clear visualizations. We then generated single and batch predictions for this model in Canvas. To assess the trends, risks, and business opportunities across the enterprise, we sent the predictions of this ML model to QuickSight. As business analysts, we created various visualizations to assess the trends in QuickSight.

This capability is available in all Regions where Canvas is now supported. You can learn more on the Canvas product page and documentation.


About the Authors

Ajjay Govindaram is a Senior Solutions Architect at AWS. He works with strategic customers who are using AI/ML to solve complex business problems. His experience lies in providing technical direction as well as design assistance for modest to large-scale AI/ML application deployments. His knowledge ranges from application architecture to big data, analytics, and machine learning. He enjoys listening to music while resting, experiencing the outdoors, and spending time with his loved ones.

Varun Mehta is a Solutions Architect at AWS. He is passionate about helping customers build enterprise-scale well-architected solutions on the AWS Cloud. He works with strategic customers who are using AI/ML to solve complex business problems.

Shyam Srinivasan is a Principal Product Manager on the AWS AI/ML team, leading product management for Amazon SageMaker Canvas. Shyam cares about making the world a better place through technology and is passionate about how AI and ML can be a catalyst in this journey.

Read More

How AI and Crowdsourcing Can Advance mRNA Vaccine Distribution

Artificial intelligence is teaming up with crowdsourcing to improve the thermo-stability – the ability to avoid breaking down under heat stress – of mRNA vaccines, making distribution more accessible worldwide.

In this episode of NVIDIA’s AI Podcast, host Noah Kravitz interviewed Bojan Tunguz, a physicist and senior system software engineer at NVIDIA, and Johnny Israeli, senior manager of AI and cloud software at NVIDIA.

The guests delved into AI’s potential in drug discovery and the Stanford Open Vaccine competition, a machine-learning contest using crowdsourcing to tackle the thermo-stability challenges of mRNA vaccines.

Kaggle, the online machine learning competition platform, hosted the Stanford Open Vaccine competition. Tunguz, a quadruple Kaggle grandmaster, shared how Kaggle has grown to encompass not just competitions, but also datasets, code and discussions. Competitors can earn points, rankings and status achievements across these four areas.

The fusion of AI, crowdsourcing and machine learning competitions is opening new possibilities in drug discovery and vaccine distribution. By tapping into the collective wisdom and skills of participants worldwide, it becomes possible to solve pressing global problems, such as enhancing the thermo-stability of mRNA vaccines, allowing for a more efficient and widely accessible distribution process.

You Might Also Like

Driver’s Ed: How Waabi Uses AI, Simulation to Teach Autonomous Vehicles to Drive

Teaching the AI brains of autonomous vehicles to understand the world as humans do requires billions of miles of driving experience. The road to achieving this astronomical level of driving leads to the virtual world. Learn how Waabi uses powerful high-fidelity simulations to train and develop production-level autonomous vehicles.

Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans

Driving enjoyment and autonomous driving capabilities can complement one another in intelligent, sustainable vehicles. Learn about the automaker’s plans to unveil its third vehicle, the Polestar 3, the tech inside it, and what the company’s racing heritage brings to the intersection of smarts and sustainability.

GANTheftAuto: Harrison Kinsley on AI-Generated Gaming Environments

Humans playing games against machines is nothing new, but now computers can develop their own games for people to play. Programming enthusiast and social media influencer Harrison Kinsley created GANTheftAuto, an AI-based neural network that generates a playable chunk of the classic video game Grand Theft Auto V.

Subscribe to the AI Podcast: Now Available on Amazon Music

The AI Podcast is now available through Amazon Music.

In addition, get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Read More

Announcing new Jupyter contributions by AWS to democratize generative AI and scale ML workloads

Project Jupyter is a multi-stakeholder, open-source project that builds applications, open standards, and tools for data science, machine learning (ML), and computational science. The Jupyter Notebook, first released in 2011, has become a de facto standard tool used by millions of users worldwide across every possible academic, research, and industry sector. Jupyter enables users to work with code and data interactively, and to build and share computational narratives that provide a full and reproducible record of their work.

Given the importance of Jupyter to data scientists and ML developers, AWS is an active sponsor and contributor to Project Jupyter. Our goal is to work in the open-source community to help Jupyter to be the best possible notebook platform for data science and ML. AWS is a platinum sponsor of Project Jupyter through the NumFOCUS Foundation, and I am proud and honored to lead a dedicated team of AWS engineers who contribute to Jupyter’s software and participate in Jupyter’s community and governance. Our open-source contributions to Jupyter include JupyterLab, Jupyter Server, and the Jupyter Notebook subprojects. We are also members of the Jupyter working groups for Security, and Diversity, Equity, and Inclusion (DEI). In parallel to these open-source contributions, we have AWS product teams who are working to integrate Jupyter with products such as Amazon SageMaker.

Today at JupyterCon, we are excited to announce several new tools for Jupyter users to improve their experience and boost development productivity. All of these tools are open-source and can be used anywhere you are running Jupyter.

Introducing two generative AI extensions for Jupyter

Generative AI can significantly boost the productivity of data scientists and developers as they write code. Today, we are announcing two Jupyter extensions that bring generative AI to Jupyter users through a chat UI, IPython magic commands, and autocompletion. These extensions enable you to perform a wide range of development tasks using generative AI models in JupyterLab and Jupyter notebooks.

Jupyter AI, an open-source project to bring generative AI to Jupyter notebooks

Using the power of large language models like ChatGPT, AI21’s Jurassic-2, and (coming soon) Amazon Titan, Jupyter AI is an open-source project that brings generative AI features to Jupyter notebooks. For example, using a large language model, Jupyter AI can help a programmer generate, debug, and explain their source code. Jupyter AI can also answer questions about local files and generate entire notebooks from a simple natural language prompt. Jupyter AI offers both magic commands that work in any notebook or IPython shell, and a friendly chat UI in JupyterLab. Both of these experiences work with dozens of models from a wide range of model providers. JupyterLab users can select any text or notebook cells, enter a natural language prompt to perform a task with the selection, and then insert the AI-generated response wherever they choose. Jupyter AI is integrated with Jupyter’s MIME type system, which lets you work with inputs and outputs of any type that Jupyter supports (text, images, etc.). Jupyter AI also provides integration points that allows third parties to configure their own models. Jupyter AI is an official open-source project of Project Jupyter.
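
As a rough illustration of the magic-command experience (the exact syntax and available model aliases depend on the jupyter_ai version and the providers you have configured, so treat "chatgpt" below as an assumed alias), you first load the magics in a notebook cell:

%load_ext jupyter_ai_magics

Then, in a separate cell, the %%ai cell magic (which must be the first line of its cell) sends a natural-language prompt to a configured model:

%%ai chatgpt
Explain what a pandas DataFrame is in two sentences.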

Amazon CodeWhisperer Jupyter extension

Autocompletion is foundational for developers, and generative AI can significantly enhance the code suggestion experience. That is why we announced the general availability of Amazon CodeWhisperer earlier in 2023. CodeWhisperer is an AI coding companion that uses foundation models under the hood to radically improve developer productivity. It works by generating code suggestions in real time based on developers’ comments in natural language and prior code in their integrated development environment (IDE).

Today, we are excited to announce that JupyterLab users can install and use the CodeWhisperer extension for free to generate real-time, single-line, or full-function code suggestions for Python notebooks in JupyterLab and Amazon SageMaker Studio. With CodeWhisperer, you can write a comment in natural language that outlines a specific task in English, such as “Create a pandas dataframe using a CSV file.” Based on this information, CodeWhisperer recommends one or more code snippets directly in the notebook that can accomplish the task. You can quickly and easily accept the top suggestion, view more suggestions, or continue writing your own code.
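
As a hedged illustration of that comment-to-code flow (the actual suggestion will vary, and this is not a captured CodeWhisperer output), the accepted code for that comment might look like the following:

# Create a pandas dataframe using a CSV file
import pandas as pd

df = pd.read_csv("data.csv")  # "data.csv" is a placeholder file name
df.head()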

During its preview, CodeWhisperer proved it is excellent at generating code to accelerate coding tasks, helping developers complete tasks an average of 57% faster. Additionally, developers who used CodeWhisperer were 27% more likely to complete a coding task successfully than those who did not. This is a giant leap forward in developer productivity. CodeWhisperer also includes a built-in reference tracker that detects whether a code suggestion might resemble open-source training data and can flag such suggestions.

Introducing new Jupyter extensions to build, train, and deploy ML at scale

Our mission at AWS is to democratize access to ML across industries. To achieve this goal, in 2017 we launched the Amazon SageMaker notebook instance—a fully managed compute instance running Jupyter that includes all the popular data science and ML packages. In 2019, we made a significant leap forward with the launch of SageMaker Studio, an IDE for ML built on top of JupyterLab that enables you to build, train, tune, debug, deploy, and monitor models from a single application. Tens of thousands of customers are using Studio to empower data science teams of all sizes. In 2021, we further extended the benefits of SageMaker to the community of millions of Jupyter users by launching Amazon SageMaker Studio Lab—a free notebook service, again based on JupyterLab, that includes free compute and persistent storage.

Today, we are excited to announce three new capabilities to help you scale ML development faster.

Notebooks scheduling

In 2022, we released a new capability to enable our customers to run notebooks as scheduled jobs in SageMaker Studio and Studio Lab. Thanks to this capability, many of our customers have saved time by not having to manually set up complex cloud infrastructure to scale their ML workflows.

We are excited to announce that the notebooks scheduling tool is now an open-source Jupyter extension that allows JupyterLab users to run and schedule notebooks on SageMaker anywhere JupyterLab runs. Users can select a notebook and automate it as a job that runs in a production environment via a simple yet powerful user interface. After a notebook is selected, the tool takes a snapshot of the entire notebook, packages its dependencies in a container, builds the infrastructure, runs the notebook as an automated job on a schedule set by the user, and deprovisions the infrastructure upon job completion. This reduces the time it takes to move a notebook to production from weeks to hours.

SageMaker open-source distribution

Data scientists and developers want to begin developing ML applications quickly, and it can be complex to install the mutually compatible versions of all the necessary packages. To remove the manual work and improve productivity, we are excited to announce a new open-source distribution that includes the most popular packages for ML, data science, and data visualization. This distribution includes deep learning frameworks like PyTorch, TensorFlow, and Keras; popular Python packages like NumPy, scikit-learn, and pandas; and IDEs like JupyterLab and the Jupyter Notebook. The distribution is versioned using SemVer and will be released on a regular basis moving forward. The container is available via Amazon ECR Public Gallery, and its source code is available on GitHub. This provides enterprises transparency into the packages and build process, thereby making it easier for them to reproduce, customize, or re-certify the distribution. The base image comes with pip and Conda/Mamba, so that data scientists can quickly install additional packages to meet their specific needs.

Amazon CodeGuru Jupyter extension

Amazon CodeGuru Security now supports security and code quality scans in JupyterLab and SageMaker Studio. This new capability assists notebook users in detecting security vulnerabilities such as injection flaws, data leaks, weak cryptography, or missing encryption within the notebook cells. You can also detect many common issues that affect the readability, reproducibility, and correctness of computational notebooks, such as misuse of ML library APIs, invalid run order, and nondeterminism. When vulnerabilities or quality issues are identified in the notebook, CodeGuru generates recommendations that enable you to remediate those issues based on AWS security best practices.

Conclusion

We are excited to see how the Jupyter community will use these tools to scale development, increase productivity, and take advantage of generative AI to transform their industries. Check out the following resources to learn more about Jupyter on AWS and how to install and get started with these new tools:


About the Author

Brian Granger is a leader of the IPython project, co-founder of Project Jupyter, and an active contributor to a number of other open-source projects focused on data science in Python. In 2016, he co-created the Altair package for statistical visualization in Python. He is an advisory board member of the NumFOCUS Foundation, a faculty fellow of the Cal Poly Center for Innovation and Entrepreneurship, and a Sr. Principal Technologist at AWS.

Read More

Schedule your notebooks from any JupyterLab environment using the Amazon SageMaker JupyterLab extension

Jupyter notebooks are highly favored by data scientists for their ability to interactively process data, build ML models, and test these models by making inferences on data. However, there are scenarios in which data scientists may prefer to transition from interactive development on notebooks to batch jobs. Examples of such use cases include scaling up a feature engineering job that was previously tested on a small sample dataset on a small notebook instance, running nightly reports to gain insights into business metrics, and retraining ML models on a schedule as new data becomes available.

Migrating from interactive development on notebooks to batch jobs required you to copy code snippets from the notebook into a script, package the script with all its dependencies into a container, and schedule the container to run. To run this job repeatedly on a schedule, you had to set up, configure, and oversee cloud infrastructure to automate deployments, resulting in a diversion of valuable time away from core data science development activities.

To help simplify the process of moving from interactive notebooks to batch jobs, in December 2022, Amazon SageMaker Studio and Studio Lab introduced the capability to run notebooks as scheduled jobs, using notebook-based workflows. You can now use the same capability to run your Jupyter notebooks from any JupyterLab environment such as Amazon SageMaker notebook instances and JupyterLab running on your local machine. SageMaker provides an open-source extension that can be installed on any JupyterLab environment and be used to run notebooks as ephemeral jobs and on a schedule.

In this post, we show you how to run your notebooks from your local JupyterLab environment as scheduled notebook jobs on SageMaker.

Solution overview

The solution architecture for scheduling notebook jobs from any JupyterLab environment is shown in the following diagram. The SageMaker extension expects the JupyterLab environment to have valid AWS credentials and permissions to schedule notebook jobs. We discuss the steps for setting up credentials and AWS Identity and Access Management (IAM) permissions later in this post. In addition to the IAM user and assumed role session scheduling the job, you also need to provide a role for the notebook job instance to assume for access to your data in Amazon Simple Storage Service (Amazon S3) or to connect to Amazon EMR clusters as needed.

In the following sections, we show how to set up the architecture and install the open-source extension, run a notebook with the default configurations, and also use the advanced parameters to run a notebook with custom settings.

Prerequisites

For this post, we assume a locally hosted JupyterLab environment. You can follow the same installation steps for an environment hosted in the cloud as well.

The following steps assume that you already have a valid Python 3 and JupyterLab environment (this extension works with JupyterLab v3.0 or higher).

Install the AWS Command Line Interface (AWS CLI) if you don’t already have it installed. See Installing or updating the latest version of the AWS CLI for instructions.

Set up IAM credentials

You need an IAM user or an active IAM role session to submit SageMaker notebook jobs. To set up your IAM credentials, you can configure the AWS CLI with your AWS credentials for your IAM user, or assume an IAM role. For instructions on setting up your credentials, see Configuring the AWS CLI. The IAM principal (user or assumed role) needs the following permissions to schedule notebook jobs. To add the policy to your principal, refer to Adding IAM identity permissions.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EventBridgeSchedule",
            "Effect": "Allow",
            "Action": [
                "events:TagResource",
                "events:DeleteRule",
                "events:PutTargets",
                "events:DescribeRule",
                "events:EnableRule",
                "events:PutRule",
                "events:RemoveTargets",
                "events:DisableRule"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
                }
            }
        },
        {
            "Sid": "IAMPassRoleToNotebookJob",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/SagemakerJupyterScheduler*",
            "Condition": {
                "StringLike": {
                    "iam:PassedToService": [
                        "sagemaker.amazonaws.com",
                        "events.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Sid": "IAMListRoles",
            "Effect": "Allow",
            "Action": "iam:ListRoles",
            "Resource": "*" 
        },
        {
            "Sid": "S3ArtifactsAccess",
            "Effect": "Allow",
            "Action": [
                "s3:PutEncryptionConfiguration",
                "s3:CreateBucket",
                "s3:PutBucketVersioning",
                "s3:ListBucket",
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetEncryptionConfiguration",
                "s3:DeleteObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::sagemaker-automated-execution-*"
            ]
        },
        {
            "Sid": "S3DriverAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::sagemakerheadlessexecution-*"
            ]
        },
        {
            "Sid": "SagemakerJobs",
            "Effect": "Allow",
            "Action": [
                "sagemaker:DescribeTrainingJob",
                "sagemaker:StopTrainingJob",
                "sagemaker:DescribePipeline",
                "sagemaker:CreateTrainingJob",
                "sagemaker:DeletePipeline",
                "sagemaker:CreatePipeline"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
                }
            }
        },
         {
            "Sid": "AllowSearch",
            "Effect": "Allow",
            "Action": "sagemaker:Search",
            "Resource": "*"
        },
         {
            "Sid": "SagemakerTags",
            "Effect": "Allow",
            "Action": [
                "sagemaker:ListTags",
                "sagemaker:AddTags"
            ],
            "Resource": [
                "arn:aws:sagemaker:*:*:pipeline/*",
                "arn:aws:sagemaker:*:*:space/*",
                "arn:aws:sagemaker:*:*:training-job/*",
                "arn:aws:sagemaker:*:*:user-profile/*"
            ]
        },
        {
            "Sid": "ECRImage",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchGetImage"
            ],
            "Resource": "*"
        }
   ]
}

If your notebook jobs need to be encrypted with customer managed AWS Key Management Service (AWS KMS) keys, add the policy statement allowing AWS KMS access as well. For a sample policy, see Install policies and permissions for local Jupyter environments.
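If you prefer to attach the scheduling policy shown above programmatically instead of through the console, the following is a minimal boto3 sketch. The user name and policy name are placeholders, and scheduling_policy is assumed to hold the JSON policy document shown above:

import json
import boto3

iam = boto3.client("iam")

# Attach the scheduling policy as an inline policy on the IAM user (names are placeholders)
iam.put_user_policy(
    UserName="my-data-scientist",
    PolicyName="SageMakerNotebookJobScheduling",
    PolicyDocument=json.dumps(scheduling_policy),
)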

Set up an IAM role for the notebook job instance

SageMaker requires an IAM role to run jobs on the user’s behalf, such as running the notebook job. This role should have access to the resources required for the notebook to complete the job, such as access to data in Amazon S3.

The scheduler extension automatically looks for IAM roles with the prefix SagemakerJupyterScheduler in the AWS account to run the notebook jobs.

To create an IAM role, create an execution role for Amazon SageMaker with the AmazonSageMakerFullAccess policy. Name the role SagemakerJupyterSchedulerDemo, or provide a name with the expected prefix.

After the role is created, on the Trust relationships tab, choose Edit trust policy. Replace the existing trust policy with the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "sagemaker.amazonaws.com",
                    "events.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

The AmazonSageMakerFullAccess policy is fairly permissive and is generally preferred for experimentation and getting started with SageMaker. We strongly encourage you to create a minimum scoped policy for any future workloads in accordance with security best practices in IAM. For the minimum set of permissions required for the notebook job, see Install policies and permissions for local Jupyter environments.
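If you prefer to script the role setup, the following is a minimal boto3 sketch, assuming trust_policy holds the trust policy JSON shown above; remember to scope the permissions down for production use:

import json
import boto3

iam = boto3.client("iam")

# Create the notebook job execution role with the expected name prefix
iam.create_role(
    RoleName="SagemakerJupyterSchedulerDemo",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Execution role assumed by SageMaker notebook jobs",
)

# Attach the broad getting-started policy; replace with a least-privilege policy for real workloads
iam.attach_role_policy(
    RoleName="SagemakerJupyterSchedulerDemo",
    PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
)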

Install the extension

Open a terminal on your local machine and install the extension by running the following command:

pip install amazon-sagemaker-jupyter-scheduler

After this command runs, you can start JupyterLab by running jupyter lab.

If you’re installing the extension from within the JupyterLab terminal, restart the Jupyter server to load the extension. You can restart the Jupyter server by choosing Shut Down on the File menu from your JupyterLab, and starting JupyterLab from your command line by running jupyter lab.

Submit a notebook job

After the extension is installed on your environment, you can run any self-contained notebook as an ephemeral job. Let’s submit a simple “Hello world” notebook to run as a scheduled job.

  1. On the File menu, choose New and Notebook.
  2. Enter the following contents:
    # install packages
    !pip install pandas
    !pip install boto3
    
    # import block
    import boto3
    import pandas as pd
    
    # download a sample dataset
    s3 = boto3.client("s3")
    # Load the dataset
    file_name = "abalone.csv"
    s3.download_file(
        "sagemaker-sample-files", f"datasets/tabular/uci_abalone/abalone.csv", file_name
    )
    
    # display the dataset
    df = pd.read_csv(file_name)
    df.head()

After the extension is successfully installed, you’ll see the notebook scheduling icon on the notebook.

  3. Choose the icon to create a notebook job.

Alternatively, you can right-click on the notebook in your file explorer and choose Create notebook job.

  4. Provide the job name, input file, compute type, and additional parameters.
  5. Leave the remaining settings at the default and choose Create.

After the job is scheduled, you’re redirected to the Notebook Jobs tab, where you can view the list of notebook jobs and their status, and view the notebook output and logs after the job is complete. You can also access this notebook jobs window from the Launcher, as shown in the following screenshot.

Advanced configurations

From your local compute, notebooks automatically run on the SageMaker Base Python image, which is the official Python 3.8 image from Docker Hub with Boto3 and the AWS CLI included. In real-world cases, data scientists need to install specific packages or frameworks for their notebooks. There are three ways to achieve a reproducible environment:

  • The simplest option is to install the packages and frameworks directly in the first cell of your notebook.
  • You can also provide an initialization script in the Additional options section, pointing to a bash script on your local storage that is run by the notebook job when the notebook starts up. In the following section, we show an example of using initialization scripts to install packages.
  • Finally, if you want maximum flexibility in configuring your run environment, you can build your own custom image with a Python3 kernel, push the image to Amazon Elastic Container Registry (Amazon ECR), and provide the ECR image URI to your notebook job under Additional options. The ECR image should follow the requirements for SageMaker images, as listed in Custom SageMaker image specifications.

In addition, your enterprise might set up guardrails like running jobs in internet-free mode within an Amazon VPC, using a custom least-privilege role for the job, and enforcing encryption. You can specify such configurations for your notebook jobs in the Additional options section as well. For a detailed list of advanced configurations, see Additional options.

Add an initialization script

To showcase the initialization script, we now run the sample notebook for Studio notebook jobs available on GitHub. To run this notebook, you need to install the required packages through an initialization script. Complete the following steps:

  1. From your JupyterLab terminal, run the following command to download the file:
    curl https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/sagemaker-notebook-jobs/studio-scheduling/scheduled-example.ipynb > scheduled-example.ipynb

  2. On the File menu, choose New and Text file.
  3. Enter the following contents to your file, and save the file under the name init-script.sh:
    echo "Installing required packages"
    
    pip install --upgrade sagemaker
    pip install pandas numpy matplotlib scikit-learn

  4. Choose scheduled-example.ipynb from your file explorer to open the notebook.
  5. Choose the notebook job icon to schedule the notebook, and expand the Additional options section.
  6. For Initialization script location, enter the full path of your script.

You can also optionally customize the input and output S3 folders for your notebook job. SageMaker creates an input folder in a specified S3 location to store the input files, and creates an output S3 folder where the notebook outputs are stored. You can specify encryption, IAM role, and VPC configurations here. See Constraints and considerations for custom image and VPC specifications.

  7. For now, simply update the initialization script, choose Run now for the schedule, and choose Create.

When the job is complete, you can view the notebook with outputs and the output log under Output files, as shown in the following screenshot. In the output log, you should be able to see the initialization script being run before running the notebook.

To further customize your notebook job environment, you can use your own image by specifying the ECR URI of your custom image. If you’re bringing your own image, ensure you install a Python3 kernel when building your image. For a sample Dockerfile that can run a notebook using TensorFlow, see the following code:

FROM tensorflow/tensorflow:latest
RUN pip install ipykernel && \
    python -m ipykernel install --sys-prefix

Conclusion

In this post, we showed you how to run your notebooks from any JupyterLab environment hosted locally as SageMaker training jobs, using the SageMaker Jupyter scheduler extension. Being able to run notebooks in a headless manner, on a schedule, greatly reduces undifferentiated heavy lifting for data scientists, such as refactoring notebooks into Python scripts, setting up Amazon EventBridge event triggers, and creating AWS Lambda functions or SageMaker pipelines to start the training jobs. SageMaker notebook jobs are run on demand, so you only pay for the time that the notebook runs, and you can use the notebook jobs extension to view the notebook outputs anytime from your JupyterLab environment. We encourage you to try scheduled notebook jobs, and connect with the Machine Learning & AI community on re:Post for feedback!


About the authors

Bhadrinath Pani is a Software Development Engineer at Amazon Web Services, working on Amazon SageMaker interactive ML products, with over 12 years of experience in software development across domains like automotive, IoT, AR/VR, and computer vision. Currently, his main focus is on developing machine learning tools aimed at simplifying the experience for data scientists. In his free time, he enjoys spending time with his family and exploring the beauty of the Pacific Northwest.

Durga Sury is an ML Solutions Architect on the Amazon SageMaker Service SA team. She is passionate about making machine learning accessible to everyone. In her 4 years at AWS, she has helped set up AI/ML platforms for enterprise customers. When she isn’t working, she loves motorcycle rides, mystery novels, and long walks with her 5-year-old husky.

Read More

Announcing provisioned concurrency for Amazon SageMaker Serverless Inference

Announcing provisioned concurrency for Amazon SageMaker Serverless Inference

Amazon SageMaker Serverless Inference allows you to serve model inference requests in real time without having to explicitly provision compute instances or configure scaling policies to handle traffic variations. You can let AWS handle the undifferentiated heavy lifting of managing the underlying infrastructure and save costs in the process. A Serverless Inference endpoint spins up the relevant infrastructure, including the compute, storage, and network, to stage your container and model for on-demand inference. You can simply select the amount of memory to allocate and the number of max concurrent invocations to have a production-ready endpoint to service inference requests.

With on-demand serverless endpoints, if your endpoint doesn’t receive traffic for a while and then suddenly receives new requests, it can take some time for your endpoint to spin up the compute resources to process the requests. This is called a cold start. A cold start can also occur if your concurrent requests exceed the current concurrent request usage. With provisioned concurrency on Serverless Inference, you can mitigate cold starts and get predictable performance characteristics for your workloads. You can add provisioned concurrency to your serverless endpoints, and for the predefined amount of provisioned concurrency, Amazon SageMaker will keep the endpoints warm and ready to respond to requests instantaneously. In addition, you can now use Application Auto Scaling with provisioned concurrency to address inference traffic dynamically based on target metrics or a schedule.

In this post, we discuss what provisioned concurrency and Application Auto Scaling are, how to use them, and some best practices and guidance for your inference workloads.

Provisioned concurrency with Application Auto Scaling

With provisioned concurrency on Serverless Inference endpoints, SageMaker manages the infrastructure that can serve multiple concurrent requests without incurring cold starts. SageMaker uses the ProvisionedConcurrency value specified in your endpoint configuration when you create or update an endpoint. With provisioned concurrency enabled on the serverless endpoint, you can expect SageMaker to serve the number of concurrent requests you have set without a cold start. See the following code:

endpoint_config_response_pc = client.create_endpoint_config(
    EndpointConfigName=xgboost_epc_name_pc,
    ProductionVariants=[
        {
            "VariantName": "byoVariant",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,
                "MaxConcurrency": 1,
                #Provisioned Concurrency value setting example 
                "ProvisionedConcurrency": 1
            },
        },
    ],
)

By understanding your workloads and knowing how many cold starts you want to mitigate, you can set this to a preferred value.

Serverless Inference with provisioned concurrency also supports Application Auto Scaling, which allows you to optimize costs based on your traffic profile or schedule to dynamically set the amount of provisioned concurrency. This can be set in a scaling policy, which can be applied to an endpoint.

To specify the metrics and target values for a scaling policy, you can configure a target-tracking scaling policy. Define the scaling policy as a JSON block in a text file. You can then use that text file when invoking the AWS Command Line Interface (AWS CLI) or the Application Auto Scaling API. To define a target-tracking scaling policy for a serverless endpoint, use the SageMakerVariantProvisionedConcurrencyUtilization predefined metric:

{
    "TargetValue": 0.5,
    "PredefinedMetricSpecification": 
    {
        "PredefinedMetricType": "SageMakerVariantProvisionedConcurrencyUtilization"
    },
    "ScaleOutCooldown": 1,
    "ScaleInCooldown": 1
}
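To register the serverless endpoint variant as a scalable target and apply this target-tracking policy through the Application Auto Scaling API, you can also use boto3. The following is a minimal sketch; the endpoint and variant names are placeholders, and the capacity bounds are illustrative:

import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/MyEndpoint/variant/MyVariant"  # placeholder endpoint/variant names

# Register the variant's provisioned concurrency as a scalable target
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredProvisionedConcurrency",
    MinCapacity=1,
    MaxCapacity=10,
)

# Apply the target-tracking policy shown above
aas.put_scaling_policy(
    PolicyName="ProvisionedConcurrencyTargetTracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.5,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantProvisionedConcurrencyUtilization"
        },
        "ScaleOutCooldown": 1,
        "ScaleInCooldown": 1,
    },
)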

To scale based on a schedule (for example, every day at 12:15 PM UTC), you can also create a scheduled action. If the current capacity is below the value specified for MinCapacity, Application Auto Scaling scales out to the value specified by MinCapacity. The following code is an example of how to set this via the AWS CLI:

aws application-autoscaling put-scheduled-action \
  --service-namespace sagemaker --schedule 'cron(15 12 * * ? *)' \
  --scheduled-action-name 'ScheduledScalingTest' \
  --resource-id endpoint/MyEndpoint/variant/MyVariant \
  --scalable-dimension sagemaker:variant:DesiredProvisionedConcurrency \
  --scalable-target-action 'MinCapacity=10'
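If you prefer to work in Python, the same scheduled action can be created through the Application Auto Scaling API. The following is a minimal boto3 sketch equivalent to the preceding CLI call:

import boto3

aas = boto3.client("application-autoscaling")

# Scale out to at least 10 provisioned concurrency every day at 12:15 PM UTC
aas.put_scheduled_action(
    ServiceNamespace="sagemaker",
    Schedule="cron(15 12 * * ? *)",
    ScheduledActionName="ScheduledScalingTest",
    ResourceId="endpoint/MyEndpoint/variant/MyVariant",
    ScalableDimension="sagemaker:variant:DesiredProvisionedConcurrency",
    ScalableTargetAction={"MinCapacity": 10},
)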

With Application Auto Scaling, you can ensure that your workloads can mitigate cold starts, meet business objectives, and optimize cost in the process.

You can monitor your endpoints and provisioned concurrency specific metrics using Amazon CloudWatch. There are four metrics to focus on that are specific to provisioned concurrency:

  • ServerlessProvisionedConcurrencyExecutions – The number of concurrent runs handled by the endpoint
  • ServerlessProvisionedConcurrencyUtilization – The number of concurrent runs divided by the allocated provisioned concurrency
  • ServerlessProvisionedConcurrencyInvocations – The number of InvokeEndpoint requests handled by the provisioned concurrency
  • ServerlessProvisionedConcurrencySpilloverInvocations – The number of InvokeEndpoint requests not handled by provisioned concurrency, which are handled by on-demand Serverless Inference instead

By monitoring and making decisions based on these metrics, you can tune your configuration with cost and performance in mind and optimize your SageMaker Serverless Inference endpoint.
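As an example, the following sketch pulls one of these metrics with boto3 and Amazon CloudWatch. The namespace, dimension names, and endpoint name here are assumptions for illustration; check the metrics emitted for your endpoint in the CloudWatch console:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Average provisioned concurrency utilization over the last hour (namespace/dimensions assumed)
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ServerlessProvisionedConcurrencyUtilization",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-serverless-endpoint"},  # placeholder
        {"Name": "VariantName", "Value": "byoVariant"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])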

For SageMaker Serverless Inference, you can choose either a SageMaker-provided container or bring your own. SageMaker provides containers for its built-in algorithms and prebuilt Docker images for some of the most common machine learning (ML) frameworks, such as Apache MXNet, TensorFlow, PyTorch, and Chainer. For a list of available SageMaker images, see Available Deep Learning Containers Images. If you’re bringing your own container, you must modify it to work with SageMaker. For more information about bringing your own container, see Adapting Your Own Inference Container.

Notebook example

Creating a serverless endpoint with provisioned concurrency is a very similar process to creating an on-demand serverless endpoint. For this example, we use a model using the SageMaker built-in XGBoost algorithm. We work with the Boto3 Python SDK to create three SageMaker inference entities:

  • SageMaker model – Create a SageMaker model that packages your model artifacts for deployment on SageMaker using the CreateModel API. You can also complete this step via AWS CloudFormation using the AWS::SageMaker::Model resource.
  • SageMaker endpoint configuration – Create an endpoint configuration using the CreateEndpointConfig API and its ServerlessConfig options, or by selecting the serverless option on the SageMaker console. You can also complete this step via AWS CloudFormation using the AWS::SageMaker::EndpointConfig resource. You must specify the memory size, which, at a minimum, should be as large as your runtime model object, and the maximum concurrency, which represents the maximum number of concurrent invocations for a single endpoint. For our endpoint with provisioned concurrency enabled, we specify that parameter in the endpoint configuration step; its value must be greater than 0 and less than or equal to the maximum concurrency.
  • SageMaker endpoint – Finally, using the endpoint configuration that you created in the previous step, create your endpoint using either the SageMaker console or the CreateEndpoint API. You can also complete this step via AWS CloudFormation using the AWS::SageMaker::Endpoint resource.

In this post, we don’t cover the training and SageMaker model creation; you can find all these steps in the complete notebook. We focus primarily on how you can specify provisioned concurrency in the endpoint configuration and compare performance metrics for an on-demand serverless endpoint with a provisioned concurrency enabled serverless endpoint.

Configure a SageMaker endpoint

In the endpoint configuration, you can specify the serverless configuration options. For Serverless Inference, there are two inputs required, and they can be configured to meet your use case:

  • MaxConcurrency – This can be set from 1–200
  • MemorySizeInMB – This can be one of the following values: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB

For this example, we create two endpoint configurations: one on-demand serverless endpoint and one provisioned concurrency enabled serverless endpoint. You can see an example of both configurations in the following code:

xgboost_epc_name_pc = "xgboost-serverless-epc-pc" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
xgboost_epc_name_on_demand = "xgboost-serverless-epc-on-demand" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

endpoint_config_response_pc = client.create_endpoint_config(
    EndpointConfigName=xgboost_epc_name_pc,
    ProductionVariants=[
        {
            "VariantName": "byoVariant",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,
                "MaxConcurrency": 1,
                # Providing Provisioned Concurrency in EPC
                "ProvisionedConcurrency": 1
            },
        },
    ],
)

endpoint_config_response_on_demand = client.create_endpoint_config(
    EndpointConfigName=xgboost_epc_name_on_demand,
    ProductionVariants=[
        {
            "VariantName": "byoVariant",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,
                "MaxConcurrency": 1,
            },
        },
    ],
)

print("Endpoint Configuration Arn Provisioned Concurrency: " + endpoint_config_response_pc["EndpointConfigArn"])
print("Endpoint Configuration Arn On Demand Serverless: " + endpoint_config_response_on_demand["EndpointConfigArn"])

For a SageMaker Serverless Inference endpoint with provisioned concurrency, you also need to set the following, which is reflected in the preceding code:

  • ProvisionedConcurrency – This value can be set from 1 to the value of your MaxConcurrency

Create SageMaker on-demand and provisioned concurrency endpoints

We use our two different endpoint configurations to create two endpoints: an on-demand serverless endpoint with no provisioned concurrency enabled and a serverless endpoint with provisioned concurrency enabled. See the following code:

endpoint_name_pc = "xgboost-serverless-ep-pc" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

create_endpoint_response = client.create_endpoint(
    EndpointName=endpoint_name_pc,
    EndpointConfigName=xgboost_epc_name_pc,
)

print("Endpoint Arn Provisioned Concurrency: " + create_endpoint_response["EndpointArn"])

endpoint_name_on_demand = "xgboost-serverless-ep-on-demand" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

create_endpoint_response = client.create_endpoint(
    EndpointName=endpoint_name_on_demand,
    EndpointConfigName=xgboost_epc_name_on_demand,
)

print("Endpoint Arn Provisioned Concurrency: " + create_endpoint_response["EndpointArn"])

Compare invocation and performance

Next, we can invoke both endpoints with the same payload:

%%time

#On Demand Serverless Endpoint Test
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name_on_demand,
    Body=b".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0",
    ContentType="text/csv",
)

print(response["Body"].read())

%%time

#Provisioned Endpoint Test
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name_pc,
    Body=b".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0",
    ContentType="text/csv",
)

print(response["Body"].read())

When timing both cells for the first request, we immediately notice a drastic improvement in end-to-end latency for the provisioned concurrency enabled serverless endpoint. To validate this, we can send five requests to each endpoint with 10-minute intervals between each request. With the 10-minute gap, we can ensure that the on-demand endpoint is cold. Therefore, we can fairly compare cold start performance between the on-demand and provisioned concurrency serverless endpoints. See the following code:

import time
import numpy as np
print("Testing cold start for serverless inference with PC vs no PC")

pc_times = []
non_pc_times = []

# ~50 minutes
for i in range(5):
    time.sleep(600)
    start_pc = time.time()
    pc_response = runtime.invoke_endpoint(
        EndpointName=endpoint_name_pc,
        Body=b".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0",
        ContentType="text/csv",
    )
    end_pc = time.time() - start_pc
    pc_times.append(end_pc)

    start_no_pc = time.time()
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name_on_demand,
        Body=b".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0",
        ContentType="text/csv",
    )
    end_no_pc = time.time() - start_no_pc
    non_pc_times.append(end_no_pc)

pc_cold_start = np.mean(pc_times)
non_pc_cold_start = np.mean(non_pc_times)

print("Provisioned Concurrency Serverless Inference Average Cold Start: {}".format(pc_cold_start))
print("On Demand Serverless Inference Average Cold Start: {}".format(non_pc_cold_start))

We can then plot these average end-to-end latency values across the five requests and see that the average cold start for provisioned concurrency was approximately 200 milliseconds end to end, as opposed to nearly 6 seconds with the on-demand serverless endpoint.
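One way to produce such a comparison from the latencies collected above is with matplotlib; the following is an illustrative sketch:

import matplotlib.pyplot as plt

# Compare the five measured end-to-end latencies for each endpoint
requests = range(1, len(pc_times) + 1)
plt.plot(requests, pc_times, marker="o", label="Provisioned concurrency")
plt.plot(requests, non_pc_times, marker="o", label="On-demand serverless")
plt.xlabel("Request number")
plt.ylabel("End-to-end latency (seconds)")
plt.title("Cold start latency comparison")
plt.legend()
plt.show()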

When to use Serverless Inference with provisioned concurrency

Provisioned concurrency is a cost-effective solution for low throughput and spiky workloads requiring low latency guarantees. Provisioned concurrency is suitable when throughput is low and you want to reduce costs compared with instance-based hosting while still getting predictable performance, or for workloads with predictable traffic bursts and low latency requirements. For example, a chatbot application run by a tax filing software company typically sees high demand during the last week of March from 10:00 AM to 5:00 PM because it’s close to the tax filing deadline. You can choose on-demand Serverless Inference for the remaining part of the year to serve requests from end-users, but for the last week of March, you can add provisioned concurrency to handle the spike in demand. As a result, you can reduce costs during idle time while still meeting your performance goals.

On the other hand, if your inference workload is steady, has high throughput (enough traffic to keep the instances saturated and busy), has a predictable traffic pattern, and requires ultra-low latency, or it includes large or complex models that require GPUs, Serverless Inference isn’t the right option for you, and you should deploy on real-time inference. Synchronous use cases with burst behavior that don’t require performance guarantees are more suitable for using on-demand Serverless Inference. The traffic patterns and the right hosting option (serverless or real-time inference) are depicted in the following figures:

  • Real-time inference endpoint – Traffic is mostly steady with predictable peaks. The high throughput is enough to keep the instances behind the auto scaling group busy and saturated. This will allow you to efficiently use the existing compute and be cost-effective along with providing ultra-low latency guarantees. For the predictable peaks, you can choose to use the scheduled auto scaling policy in SageMaker for real-time inference endpoints. Read more about the best practices for selecting the right auto scaling policy at Optimize your machine learning deployments with auto scaling on Amazon SageMaker.

  • On-demand Serverless Inference – This option is suitable for traffic with unpredictable peaks, but the ML application is tolerant to cold start latencies. To help determine whether a serverless endpoint is the right deployment option from a cost and performance perspective, use the SageMaker Serverless Inference benchmarking toolkit, which tests different endpoint configurations and compares the most optimal one against a comparable real-time hosting instance.

  • Serverless Inference with provisioned concurrency – This option is suitable for the traffic pattern with predictable peaks but is otherwise low or intermittent. This option provides you additional low latency guarantees for ML applications that can’t tolerate cold start latencies.

Use the following factors to determine which hosting option (real-time inference, on-demand Serverless Inference, or Serverless Inference with provisioned concurrency) is right for your ML workloads:

  • Throughput – This represents requests per second or any other metric that represents the rate of incoming requests to the inference endpoint. We define high throughput in the following diagram as any throughput that is enough to keep the instances behind the auto scaling group busy and saturated so you get the most out of your compute.
  • Traffic pattern – This represents the type of traffic, including traffic with predictable or unpredictable spikes. If the spikes are unpredictable but the ML application needs low-latency guarantees, Serverless Inference with provisioned concurrency might be cost-effective if it’s a low throughput application.
  • Response time – If the ML application needs low-latency guarantees, use Serverless Inference with provisioned concurrency for low throughput applications with unpredictable traffic spikes. If the application can tolerate cold start latencies and has low throughput with unpredictable traffic spikes, use on-demand Serverless Inference.
  • Cost – Consider the total cost of ownership, including infrastructure costs (compute, storage, networking), operational costs (operating, managing, and maintaining the infrastructure), and security and compliance costs.

The following figure illustrates this decision tree.

Best practices

With Serverless Inference with provisioned concurrency, you should still adhere to best practices for workloads that don’t use provisioned concurrency:

  • Avoid installing packages and other operations during container startup and ensure containers are already in their desired state to minimize cold start time when being provisioned and invoked while staying under the 10 GB maximum supported container size. To monitor how long your cold start time is, you can use the CloudWatch metric OverheadLatency to monitor your serverless endpoint. This metric tracks the time it takes to launch new compute resources for your endpoint.
  • Set the MemorySizeInMB value to be large enough to meet your needs as well as increase performance. Larger values will also devote more compute resources. At some point, a larger value will have diminishing returns.
  • Set the MaxConcurrency to accommodate the peaks of traffic while considering the resulting cost.
  • We recommend creating only one worker in the container and only loading one copy of the model. This is unlike real-time endpoints, where some SageMaker containers may create a worker for each vCPU to process inference requests and load the model in each worker.
  • Use Application Auto Scaling to automate your provisioned concurrency setting based on target metrics or schedule. By doing so, you can have finer-grained, automated control of the amount of the provisioned concurrency used with your SageMaker serverless endpoint.

In addition, with the ability to configure ProvisionedConcurrency, you should set this value to the integer representing how many cold starts you would like to avoid when requests come in a short time frame after a period of inactivity. Using the metrics in CloudWatch can help you tune this value to be optimal based on preferences.

Pricing

As with on-demand Serverless Inference, when provisioned concurrency is enabled, you pay for the compute capacity used to process inference requests, billed by the millisecond, and the amount of data processed. You also pay for provisioned concurrency usage based on the memory configured, duration provisioned, and amount of concurrency enabled.

Pricing can be broken down into two components: provisioned concurrency charges and inference duration charges. For more details, refer to Amazon SageMaker Pricing.

Conclusion

SageMaker Serverless Inference with provisioned concurrency provides a very powerful capability for workloads when cold starts need to be mitigated and managed. With this capability, you can better balance cost and performance characteristics while providing a better experience to your end-users. We encourage you to consider whether provisioned concurrency with Application Auto Scaling is a good fit for your workloads, and we look forward to your feedback in the comments!

Stay tuned for follow-up posts where we will provide more insight into the benefits, best practices, and cost comparisons using Serverless Inference with provisioned concurrency.


About the Authors

James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends. You can find him on LinkedIn.

Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including NLP and computer vision domains. He helps customers achieve high-performance model inference on Amazon SageMaker.

Ram Vegiraju is a ML Architect with the SageMaker Service team. He focuses on helping customers build and optimize their AI/ML solutions on Amazon SageMaker. In his spare time, he loves traveling and writing.

Rupinder Grewal is a Sr. AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work he enjoys playing tennis and biking on mountain trails.

Rishabh Ray Chaudhury is a Senior Product Manager with Amazon SageMaker, focusing on Machine Learning inference. He is passionate about innovating and building new experiences for Machine Learning customers on AWS to help scale their workloads. In his spare time, he enjoys traveling and cooking. You can find him on LinkedIn.

Shruti Sharma is a Sr. Software Development Engineer in AWS SageMaker team. Her current work focuses on helping developers efficiently host machine learning models on Amazon SageMaker. In her spare time she enjoys traveling, skiing and playing chess. You can find her on LinkedIn.

Hao Zhu is a Software Development Engineer with Amazon Web Services. In his spare time he loves to hit the slopes and ski. He also enjoys exploring new places, trying different foods, experiencing different cultures and is always up for a new adventure.

Read More

Accelerate protein structure prediction with the ESMFold language model on Amazon SageMaker

Accelerate protein structure prediction with the ESMFold language model on Amazon SageMaker

Proteins drive many biological processes, such as enzyme activity, molecular transport, and cellular support. The three-dimensional structure of a protein provides insight into its function and how it interacts with other biomolecules. Experimental methods to determine protein structure, such as X-ray crystallography and NMR spectroscopy, are expensive and time-consuming.

In contrast, recently-developed computational methods can rapidly and accurately predict the structure of a protein from its amino acid sequence. These methods are critical for proteins that are difficult to study experimentally, such as membrane proteins, the targets of many drugs. One well-known example of this is AlphaFold, a deep learning-based algorithm celebrated for its accurate predictions.

ESMFold is another highly-accurate, deep learning-based method developed to predict protein structure from its amino acid sequence. ESMFold uses a large protein language model (pLM) as a backbone and operates end to end. Unlike AlphaFold2, it doesn’t need a lookup or Multiple Sequence Alignment (MSA) step, nor does it rely on external databases to generate predictions. Instead, the development team trained the model on millions of protein sequences from UniRef. During training, the model developed attention patterns that elegantly represent the evolutionary interactions between amino acids in the sequence. This use of a pLM instead of an MSA enables up to 60 times faster prediction times than other state-of-the-art models.

In this post, we use the pre-trained ESMFold model from Hugging Face with Amazon SageMaker to predict the heavy chain structure of trastuzumab, a monoclonal antibody first developed by Genentech for the treatment of HER2-positive breast cancer. Quickly predicting the structure of this protein could be useful if researchers wanted to test the effect of sequence modifications. This could potentially lead to improved patient survival or fewer side effects.

This post provides an example Jupyter notebook and related scripts in the following GitHub repository.

Prerequisites

We recommend running this example in an Amazon SageMaker Studio notebook running the PyTorch 1.13 Python 3.9 CPU-optimized image on an ml.r5.xlarge instance type.

Visualize the experimental structure of trastuzumab

To begin, we use the biopython library and a helper script to download the trastuzumab structure from the RCSB Protein Data Bank:

from Bio.PDB import PDBList, MMCIFParser
from prothelpers.structure import atoms_to_pdb

target_id = "1N8Z"
pdbl = PDBList()
filename = pdbl.retrieve_pdb_file(target_id, pdir="data")
parser = MMCIFParser()
structure = parser.get_structure(target_id, filename)
pdb_string = atoms_to_pdb(structure)

Next, we use the py3Dmol library to visualize the structure as an interactive 3D visualization:

import py3Dmol

view = py3Dmol.view()
view.addModel(pdb_string)
view.setStyle({'chain':'A'},{"cartoon": {'color': 'orange'}})
view.setStyle({'chain':'B'},{"cartoon": {'color': 'blue'}})
view.setStyle({'chain':'C'},{"cartoon": {'color': 'green'}})
view.show()

The following figure represents the 3D protein structure 1N8Z from the Protein Data Bank (PDB). In this image, the trastuzumab light chain is displayed in orange, the heavy chain is blue (with the variable region in light blue), and the HER2 antigen is green.

We’ll first use ESMFold to predict the structure of the heavy chain (Chain B) from its amino acid sequence. Then, we will compare the prediction to the experimentally determined structure shown above.

Predict the trastuzumab heavy chain structure from its sequence using ESMFold

Let’s use the ESMFold model to predict the structure of the heavy chain and compare it to the experimental result. To start, we’ll use a pre-built notebook environment in Studio that comes with several important libraries, like PyTorch, pre-installed. Although we could use an accelerated instance type to improve the performance of our notebook analysis, we’ll instead use a non-accelerated instance and run the ESMFold prediction on a CPU.

First, we load the pre-trained ESMFold model and tokenizer from Hugging Face Hub:

from transformers import AutoTokenizer, EsmForProteinFolding

tokenizer = AutoTokenizer.from_pretrained("facebook/esmfold_v1")
model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1", low_cpu_mem_usage=True)

Next, we copy the model to our device (CPU in this case) and set some model parameters:

import torch

device = torch.device("cpu")
model.esm = model.esm.float()
model = model.to(device)
model.trunk.set_chunk_size(64)

To prepare the protein sequence for analysis, we need to tokenize it. This translates the amino acid symbols (EVQLV…) into a numerical format that the ESMFold model can understand (6,19,5,10,19,…):

tokenized_input = tokenizer([experimental_sequence], return_tensors="pt", add_special_tokens=False)["input_ids"]
tokenized_input = tokenized_input.to(device)

Next, we copy the tokenized input to the model, make a prediction, and save the result to a file:

with torch.no_grad():
    notebook_prediction = model.infer_pdb(experimental_sequence)

with open("data/prediction.pdb", "w") as f:
    f.write(notebook_prediction)

This takes about 3 minutes on a non-accelerated instance type, like an r5.

We can check the accuracy of the ESMFold prediction by comparing it to the experimental structure. We do this using the US-Align tool developed by the Zhang Lab at the University of Michigan:

from prothelpers.usalign import tmscore

tmscore("data/prediction.pdb", "data/experimental.pdb", pymol="data/superimposed")
PDBchain1 PDBchain2 TM-Score
data/prediction.pdb:A data/experimental.pdb:B 0.802

The template modeling score (TM-score) is a metric for assessing the similarity of protein structures. A score of 1.0 indicates a perfect match. Scores above 0.7 indicate that proteins share the same backbone structure. Scores above 0.9 indicate that the proteins are functionally interchangeable for downstream use. With a TM-score of 0.802, the ESMFold prediction would likely be appropriate for applications like structure scoring or ligand binding experiments, but may not be suitable for use cases like molecular replacement that require extremely high accuracy.

We can validate this result by visualizing the aligned structures. The two structures show a high, but not perfect, degree of overlap. Protein structure prediction is a rapidly evolving field, and many research teams are developing ever more accurate algorithms!

Deploy ESMFold as a SageMaker inference endpoint

Running model inference in a notebook is fine for experimentation, but what if you need to integrate your model with an application? Or an MLOps pipeline? In this case, a better option is to deploy your model as an inference endpoint. In the following example, we’ll deploy ESMFold as a SageMaker real-time inference endpoint on an accelerated instance. SageMaker real-time endpoints provide a scalable, cost-effective, and secure way to deploy and host machine learning (ML) models. With automatic scaling, you can adjust the number of instances running the endpoint to meet the demands of your application, optimizing costs and ensuring high availability.

The pre-built SageMaker container for Hugging Face makes it easy to deploy deep learning models for common tasks. However, for novel use cases like protein structure prediction, we need to define a custom inference.py script to load the model, run the prediction, and format the output. This script includes much of the same code we used in our notebook. We also create a requirements.txt file to define some Python dependencies for our endpoint to use. You can see the files we created in the GitHub repository.
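The following is a minimal sketch of what such a handler could look like; the actual inference.py and requirements.txt used in this post are available in the GitHub repository, and the details below are illustrative rather than a copy of those files:

# inference.py (sketch) -- custom handler hooks for the SageMaker Hugging Face container
import torch
from transformers import AutoTokenizer, EsmForProteinFolding


def model_fn(model_dir):
    # Load the ESMFold model and tokenizer from the unpacked model artifact
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = EsmForProteinFolding.from_pretrained(model_dir, low_cpu_mem_usage=True)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    return {"model": model, "tokenizer": tokenizer}


def predict_fn(data, model_artifacts):
    # Fold the submitted amino acid sequence and return the structure as a PDB string
    model = model_artifacts["model"]
    sequence = data["inputs"] if isinstance(data, dict) else data
    with torch.no_grad():
        pdb_string = model.infer_pdb(sequence)
    return [pdb_string]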

In the following figure, the experimental (blue) and predicted (red) structures of the trastuzumab heavy chain are very similar, but not identical.

After we’ve created the necessary files in the code directory, we deploy our model using the SageMaker HuggingFaceModel class. This uses a pre-built container to simplify the process of deploying Hugging Face models to SageMaker. Note that it may take 10 minutes or more to create the endpoint, depending on the availability of ml.g4dn instance types in our Region.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from datetime import datetime

huggingface_model = HuggingFaceModel(
    model_data=model_artifact_s3_uri,  # Previously staged in S3
    name="emsfold-v1-model-" + datetime.now().strftime("%Y%m%d%s"),
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
    role=role,
    source_dir="code",
    entry_point="inference.py",
)

rt_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",
    endpoint_name="my-esmfold-endpoint",
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
)

When the endpoint deployment is complete, we can resubmit the protein sequence and display the first few rows of the prediction:

endpoint_prediction = rt_predictor.predict(experimental_sequence)[0]
print(endpoint_prediction[:900])

Because we deployed our endpoint to an accelerated instance, the prediction should only take a few seconds. Each row in the result corresponds to a single atom and includes the amino acid identity, three spatial coordinates, and a pLDDT score representing the prediction confidence at that location.

PDB_GROUP ID ATOM_LABEL RES_ID CHAIN_ID SEQ_ID CARTN_X CARTN_Y CARTN_Z OCCUPANCY PLDDT ATOM_ID
ATOM 1 N GLU A 1 14.578 -19.953 1.47 1 0.83 N
ATOM 2 CA GLU A 1 13.166 -19.595 1.577 1 0.84 C
ATOM 3 CA GLU A 1 12.737 -18.693 0.423 1 0.86 C
ATOM 4 CB GLU A 1 12.886 -18.906 2.915 1 0.8 C
ATOM 5 O GLU A 1 13.417 -17.715 0.106 1 0.83 O
ATOM 6 cg GLU A 1 11.407 -18.694 3.2 1 0.71 C
ATOM 7 cd GLU A 1 11.141 -18.042 4.548 1 0.68 C
ATOM 8 OE1 GLU A 1 12.108 -17.805 5.307 1 0.68 O
ATOM 9 OE2 GLU A 1 9.958 -17.767 4.847 1 0.61 O
ATOM 10 N VAL A 2 11.678 -19.063 -0.258 1 0.87 N
ATOM 11 CA VAL A 2 11.207 -18.309 -1.415 1 0.87 C
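As a quick check of the endpoint output, you could, for example, compute the average pLDDT over all atoms. The following snippet is illustrative and assumes endpoint_prediction is the whitespace-delimited string shown above, with the pLDDT score in the eleventh column:

# Average per-atom confidence (pLDDT) from the returned rows
plddt_scores = [
    float(line.split()[10])
    for line in endpoint_prediction.splitlines()
    if line.startswith("ATOM")
]
print(f"Mean pLDDT across {len(plddt_scores)} atoms: {sum(plddt_scores) / len(plddt_scores):.2f}")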

Using the same method as before, we see that the notebook and endpoint predictions are identical.

PDBchain1 PDBchain2 TM-Score
data/endpoint_prediction.pdb:A data/prediction.pdb:A 1.0

As observed in the following figure, the ESMFold predictions generated in-notebook (red) and by the endpoint (blue) show perfect alignment.

Clean up

To avoid further charges, we delete our inference endpoint and test data:

rt_predictor.delete_endpoint()
bucket = boto_session.resource("s3").Bucket(bucket)
bucket.objects.filter(Prefix=prefix).delete()
os.system("rm -rf data obsolete code")

Summary

Computational protein structure prediction is a critical tool for understanding the function of proteins. In addition to basic research, algorithms like AlphaFold and ESMFold have many applications in medicine and biotechnology. The structural insights generated by these models help us better understand how biomolecules interact. This can then lead to better diagnostic tools and therapies for patients.

In this post, we show how to deploy the ESMFold protein language model from Hugging Face Hub as a scalable inference endpoint using SageMaker. For more information about deploying Hugging Face models on SageMaker, refer to Use Hugging Face with Amazon SageMaker. You can also find more protein science examples in the Awesome Protein Analysis on AWS GitHub repo. Please leave us a comment if there are any other examples you’d like to see!


About the Authors

Brian Loyal is a Senior AI/ML Solutions Architect in the Global Healthcare and Life Sciences team at Amazon Web Services. He has more than 17 years’ experience in biotechnology and machine learning, and is passionate about helping customers solve genomic and proteomic challenges. In his spare time, he enjoys cooking and eating with his friends and family.

Shamika Ariyawansa is an AI/ML Specialist Solutions Architect in the Global Healthcare and Life Sciences team at Amazon Web Services. He passionately works with customers to accelerate their AI and ML adoption by providing technical guidance and helping them innovate and build secure cloud solutions on AWS. Outside of work, he loves skiing and off-roading.

Yanjun Qi is a Senior Applied Science Manager at the AWS Machine Learning Solution Lab. She innovates and applies machine learning to help AWS customers speed up their AI and cloud adoption.

Read More