Prepare data from Amazon EMR for machine learning using Amazon SageMaker Data Wrangler

Data preparation is a principal component of machine learning (ML) pipelines. In fact, it is estimated that data professionals spend about 80 percent of their time on data preparation. In today's intensely competitive market, teams want to analyze data and extract meaningful insights quickly, and customers are adopting more efficient, visual ways to build data processing systems.

Amazon SageMaker Data Wrangler simplifies the data preparation and feature engineering process, reducing the time it takes from weeks to minutes by providing a single visual interface for data scientists to select and clean data, create features, and automate data preparation in ML workflows without writing any code. You can import data from multiple data sources, such as Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Redshift, and Snowflake. You can now also use Amazon EMR as a data source in Data Wrangler to easily prepare data for ML.

Analyzing, transforming, and preparing large amounts of data is a foundational step of any data science and ML workflow. Data professionals such as data scientists want to leverage the power of Apache Spark, Hive, and Presto running on Amazon EMR for fast data preparation, but the learning curve is steep. Our customers wanted the ability to connect to Amazon EMR to run ad hoc SQL queries on Hive or Presto to query data in the internal metastore or external metastore (e.g., AWS Glue Data Catalog), and prepare data within a few clicks.

This post discusses how customers can now find and connect to existing Amazon EMR clusters using a visual experience in SageMaker Data Wrangler. They can visually inspect the database, tables, schema, and Presto queries to prepare for modeling or reporting. They can then quickly profile data using a visual interface to assess data quality, identify abnormalities or missing or erroneous data, and receive information and recommendations on how to address these issues. Additionally, they can analyze, clean, and engineer features with the aid of more than a dozen built-in analyses and 300+ built-in transformations backed by Spark, without writing a single line of code.

Solution overview 

Data professionals can quickly find and connect to existing EMR clusters using SageMaker Studio configurations. Additionally, using predefined templates, they can create and terminate EMR clusters on demand with only a few clicks from SageMaker Studio. With the help of these tools, customers can jump right into the SageMaker Studio universal notebook and write code in Apache Spark, Hive, Presto, or PySpark to perform data preparation at scale. However, because writing Spark code to prepare data has a steep learning curve, not all data professionals are comfortable with this procedure. With Amazon EMR as a data source for Amazon SageMaker Data Wrangler, you can now quickly and easily connect to Amazon EMR without writing a single line of code.

The following diagram represents the different components used in this solution.

We demonstrate two authentication options that can be used to establish a connection to the EMR cluster. For each option, we deploy a separate AWS CloudFormation stack.

The CloudFormation template performs the following actions when each option is selected:

  • Creates a Studio Domain in VPC-only mode, along with a user profile named studio-user.
  • Creates building blocks, including the VPC, endpoints, subnets, security groups, EMR cluster, and other required resources to successfully run the examples.
  • For the EMR cluster, connects the AWS Glue Data Catalog as metastore for EMR Hive and Presto, creates a Hive table in EMR, and fills it with data from a US airport dataset.
  • For the LDAP CloudFormation template, creates an Amazon Elastic Compute Cloud (Amazon EC2) instance to host the LDAP server to authenticate the Hive and Presto LDAP user.

Option 1: Lightweight Directory Access Protocol (LDAP)

For the LDAP authentication CloudFormation template, we provision an Amazon EC2 instance with an LDAP server and configure the EMR cluster to use this server for authentication. TLS is enabled on this connection.

Option 2: No-Auth

For the No-Auth CloudFormation template, we use a standard EMR cluster with no authentication enabled.

Deploy the resources with AWS CloudFormation

Complete the following steps to deploy the environment:

  1. Sign in to the AWS Management Console as an AWS Identity and Access Management (IAM) user, preferably an admin user.
  2. Choose Launch Stack to launch the CloudFormation template for the appropriate authentication scenario. Make sure the Region used to deploy the CloudFormation stack has no existing Studio Domain. If you already have a Studio Domain in a Region, you may choose a different Region.
    • LDAP Launch Stack
    • No Auth Launch Stack
  3. Choose Next.
  4. For Stack name, enter a name for the stack (for example, dw-emr-blog).
  5. Leave the other values as default.
  6. Choose Next on both the stack details page and the stack options page. The LDAP stack uses the following credentials:
    • username: david
    • password:  welcome123
  7. On the review page, select the check box to acknowledge that AWS CloudFormation might create IAM resources.
  8. Choose Create stack. Wait until the status of the stack changes from CREATE_IN_PROGRESS to CREATE_COMPLETE. The process usually takes 10–15 minutes.

Note: If you would like to try multiple stacks, please follow the steps in the Clean up section. Remember that you must delete the SageMaker Studio Domain before the next stack can be successfully launched.

Set up the Amazon EMR as a data source in Data Wrangler

In this section, we cover connecting to the existing Amazon EMR cluster created through the CloudFormation template as a data source in Data Wrangler.

Create a new data flow

To create your data flow, complete the following steps:

  1. On the SageMaker console, choose Amazon SageMaker Studio in the navigation pane.
  2. Choose Open Studio.
  3. In the Launcher, choose New data flow. Alternatively, on the File drop-down, choose New, then choose Data Wrangler flow.
  4. Creating a new flow can take a few minutes. After the flow has been created, you see the Import data page.

Add Amazon EMR as a data source in Data Wrangler

On the Add data source menu, choose Amazon EMR.

You can browse all the EMR clusters that your Studio execution role has permissions to see. You have two options to connect to a cluster: one is through the interactive UI, and the other is to first create a secret using AWS Secrets Manager containing the JDBC URL with the EMR cluster information, and then provide the stored AWS secret ARN in the UI to connect to Presto. In this post, we follow the first option. Select the cluster that you want to use, choose Next, and select endpoints.
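As an aside, if you prefer the Secrets Manager option, the following is a minimal sketch of creating such a secret with Boto3. This is not the exact payload format Data Wrangler prescribes; we assume the secret simply stores the Presto JDBC URL for the EMR primary node, and the secret name and host are placeholders.

import boto3

secrets_client = boto3.client("secretsmanager")

# Hypothetical secret holding the Presto JDBC URL for the EMR primary node.
response = secrets_client.create_secret(
    Name="demo-emr-presto-connection",
    SecretString="jdbc:presto://<emr-primary-node-dns>:8889/hive/default",
)

# Paste this ARN into the Data Wrangler connection UI.
print(response["ARN"])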

Select Presto, connect to Amazon EMR, enter a name to identify your connection, and choose Next.

Select Authentication type, either LDAP or No Authentication, and click Connect.

  • For Lightweight Directory Access Protocol (LDAP), provide username and password to be authenticated.

  • For No Authentication, you will be connected to EMR Presto within the VPC without providing user credentials, and you land on Data Wrangler’s SQL explorer page for EMR.

Once connected, you can interactively view the database tree and preview tables or schemas. You can also query, explore, and visualize data from EMR. The table preview returns up to 100 records by default. For a customized query, you can enter SQL statements in the query editor box; when you click Run, the query is executed on EMR’s Presto engine.

The Cancel query button allows ongoing queries to be canceled if they are taking an unusually long time.

The last step is to import the queried data. When you are ready, you can update the sampling settings, choosing the sampling type (FirstK, Random, or Stratified) and the sampling size, before importing the data into Data Wrangler.

Click Import. The prepare page will be loaded, allowing you to add various transformations and essential analysis to the dataset.

Navigate to the data flow from the top of the screen and add more steps to the flow as needed for transformations and analysis. You can run a data insight report to identify data quality issues and get recommendations to fix those issues. Let’s look at some example transforms.

Go to your data flow; you should see a screen showing that we are using EMR as a data source via the Presto connector.

Let’s click on the + button to the right of Data types and select Add transform. When you do that, the following screen should pop up:

Let’s explore the data. We see that it has multiple features such as iata_code, airport, city, state, country, latitude, and longitude. We can see that the entire dataset is based in one country, which is the US, and there are missing values in Latitude and Longitude. Missing data can cause bias in the estimation of parameters, and it can reduce the representativeness of the samples, so we need to perform some imputation and handle missing values in our dataset.

Let’s click on the Add Step button on the navigation bar to the right. Select Handle missing. The configurations can be seen in the following screenshots. Under Transform, select Impute. Select the column type as Numeric and column names Latitude and Longitude. We will be imputing the missing values using an approximate median value. Preview and add the transform.

Let us now look at another example transform. When building a machine learning model, columns are removed if they are redundant or don’t help your model. The most common way to remove a column is to drop it. In our dataset, the feature country can be dropped since the dataset is specifically for US airport data. Let’s see how we can manage columns. Let’s click on the Add step button on the navigation bar to the right. Select Manage columns. The configurations can be seen in the following screenshots. Under Transform, select Drop column, and under Columns to drop, select Country.
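Data Wrangler transforms are backed by Spark, so no code is required for any of this. Purely as an illustration (not the code Data Wrangler generates), the two transforms above roughly correspond to the following PySpark sketch; the S3 path and lowercase column names are assumptions about how the airports table is laid out.

from pyspark.sql import SparkSession
from pyspark.ml.feature import Imputer

spark = SparkSession.builder.appName("us-airports-prep").getOrCreate()
df = spark.read.parquet("s3://<your-bucket>/us-airports/")  # hypothetical location

# Impute missing latitude/longitude values with an approximate median.
imputer = Imputer(
    strategy="median",
    inputCols=["latitude", "longitude"],
    outputCols=["latitude", "longitude"],
)
df = imputer.fit(df).transform(df)

# Drop the redundant country column (every row is in the US).
df = df.drop("country")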

You can continue adding steps based on the different transformations required for your dataset. Let us go back to our data flow. You will now see two more blocks showing the transforms that we performed. In our scenario, you can see Impute and Drop column.

ML practitioners spend a lot of time crafting feature engineering code, applying it to their initial datasets, training models on the engineered datasets, and evaluating model accuracy. Given the experimental nature of this work, even the smallest project will lead to multiple iterations. The same feature engineering code is often run again and again, wasting time and compute resources on repeating the same operations. In large organizations, this can cause an even greater loss of productivity because different teams often run identical jobs or even write duplicate feature engineering code because they have no knowledge of prior work. To avoid the reprocessing of features, we will now export our transformed features to Amazon SageMaker Feature Store. Let’s click on the + button to the right of Drop column. Select Export to and choose SageMaker Feature Store (via Jupyter notebook).

You can easily export your generated features to SageMaker Feature Store by selecting it as the destination. You can save the features into an existing feature group or create a new one.
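The exported notebook handles the feature group creation and ingestion for you. As a rough, hedged sketch of what that amounts to with the SageMaker Python SDK (the feature group name, record identifier, and the transformed_df DataFrame holding the Data Wrangler output are assumptions):

import time
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# transformed_df is assumed to be a pandas DataFrame with the transformed airport features.
transformed_df["record_id"] = transformed_df["iata_code"]
transformed_df["event_time"] = time.time()

feature_group = FeatureGroup(name="us-airports-features", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=transformed_df)
feature_group.create(
    s3_uri=f"s3://{session.default_bucket()}/us-airports-feature-store",
    record_identifier_name="record_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)

# Wait for the feature group to become active, then ingest the records.
while feature_group.describe()["FeatureGroupStatus"] == "Creating":
    time.sleep(5)
feature_group.ingest(data_frame=transformed_df, max_workers=3, wait=True)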

We have now created features with Data Wrangler and easily stored those features in Feature Store. We showed an example workflow for feature engineering in the Data Wrangler UI. Then we saved those features into Feature Store directly from Data Wrangler by creating a new feature group. Finally, we ran a processing job to ingest those features into Feature Store. Data Wrangler and Feature Store together helped us build automatic and repeatable processes to streamline our data preparation tasks with minimum coding required. Data Wrangler also provides us flexibility to automate the same data preparation flow using scheduled jobs. We can also automate training or feature engineering with SageMaker Pipelines (via Jupyter Notebook) and deploy to the Inference endpoint with SageMaker inference pipeline (via Jupyter Notebook).

Clean up

If your work with Data Wrangler is complete, select the stack created from the CloudFormation page and delete it to avoid incurring additional fees.

Conclusion

In this post, we went over how to set up Amazon EMR as a data source in Data Wrangler, how to transform and analyze a dataset, and how to export the results to a data flow for use in a Jupyter notebook. After visualizing our dataset using Data Wrangler’s built-in analytical features, we further enhanced our data flow. The fact that we created a data preparation pipeline without writing a single line of code is significant.

To get started with Data Wrangler, see Prepare ML Data with Amazon SageMaker Data Wrangler, and see the latest information on the Data Wrangler product page.


About the authors

Ajjay Govindaram is a Senior Solutions Architect at AWS. He works with strategic customers who are using AI/ML to solve complex business problems. His experience lies in providing technical direction as well as design assistance for modest to large-scale AI/ML application deployments. His knowledge ranges from application architecture to big data, analytics, and machine learning. He enjoys listening to music while resting, experiencing the outdoors, and spending time with his loved ones.

Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area. She helps AWS enterprise customers grow by understanding their goals and challenges, and guides them on how they can architect their applications in a cloud-native manner while making sure they are resilient and scalable. She’s passionate about machine learning technologies and environmental sustainability.

Rui Jiang is a Software Development Engineer at AWS based in the New York City area. She is a member of the SageMaker Data Wrangler team helping develop engineering solutions for AWS enterprise customers to achieve their business needs. Outside of work, she enjoys exploring new foods, life fitness, outdoor activities, and traveling.


Exafunction supports AWS Inferentia to unlock best price performance for machine learning inference

Across all industries, machine learning (ML) models are getting deeper, workflows are getting more complex, and workloads are operating at larger scales. Significant effort and resources are put into making these models more accurate since this investment directly results in better products and experiences. On the other hand, making these models run efficiently in production is a non-trivial undertaking that’s often overlooked, despite being key to achieving performance and budget goals. In this post we cover how Exafunction and AWS Inferentia work together to unlock easy and cost-efficient deployment for ML models in production.

Exafunction is a start-up focused on enabling companies to perform ML at scale as efficiently as possible. One of their products is ExaDeploy, an easy-to-use SaaS solution to serve ML workloads at scale. ExaDeploy efficiently orchestrates your ML workloads across mixed resources (CPU and hardware accelerators) to maximize resource utilization. It also takes care of auto scaling, compute colocation, network issues, fault tolerance, and more, to ensure efficient and reliable deployment. AWS Inferentia-based Amazon EC2 Inf1 instances are purpose built to deliver the lowest cost-per-inference in the cloud. ExaDeploy now supports Inf1 instances, which allows users to get both the hardware-based savings of accelerators and the software-based savings of optimized resource virtualization and orchestration at scale.

Solution overview

How ExaDeploy solves for deployment efficiency

To ensure efficient utilization of compute resources, you need to consider proper resource allocation, auto scaling, compute co-location, network cost and latency management, fault tolerance, versioning and reproducibility, and more. At scale, any inefficiencies materially affect costs and latency, and many large companies have addressed these inefficiencies by building internal teams and expertise. However, it’s not practical for most companies to assume this financial and organizational overhead of building generalizable software that isn’t the company’s desired core competency.

ExaDeploy is designed to solve these deployment efficiency pain points, including those seen in some of the most complex workloads, such as autonomous vehicle and natural language processing (NLP) applications. On some large batch ML workloads, ExaDeploy has reduced costs by over 85% without sacrificing latency or accuracy, with integration time as low as one engineer-day. ExaDeploy has been proven to auto scale and manage thousands of simultaneous hardware accelerator resource instances without any system degradation.

Key features of ExaDeploy include:

  • Runs in your cloud: None of your models, inputs, or outputs ever leave your private network. Continue to use your cloud provider discounts.
  • Shared accelerator resources: ExaDeploy optimizes the accelerators used by enabling multiple models or workloads to share accelerator resources. It can also identify if multiple workloads are deploying the same model, and then share the model across those workloads, thereby optimizing the accelerator used. Its automatic rebalancing and node draining capabilities maximize utilization and minimize costs.

  • Scalable serverless deployment model: ExaDeploy auto scales based on accelerator resource saturation. Dynamically scale down to 0 or up to thousands of resources.
  • Support for a variety of computation types: You can offload deep learning models from all major ML frameworks as well as arbitrary C++ code, CUDA kernels, custom ops, and Python functions.
  • Dynamic model registration and versioning: New models or model versions can be registered and run without having to rebuild or redeploy the system.
  • Point-to-point execution: Clients connect directly to remote accelerator resources, which enables low latency and high throughput. They can even store the state remotely.
  • Asynchronous execution: ExaDeploy supports asynchronous execution of models, which allows clients to parallelize local computation with remote accelerator resource work.
  • Fault-tolerant remote pipelines: ExaDeploy allows clients to dynamically compose remote computations (models, preprocessing, etc.) into pipelines with fault tolerance guarantee. The ExaDeploy system handles pod or node failures with automatic recovery and replay, so that the developers never have to think about ensuring fault tolerance.
  • Out-of-the-box monitoring: ExaDeploy provides Prometheus metrics and Grafana dashboards to visualize accelerator resource usage and other system metrics.

ExaDeploy supports AWS Inferentia

AWS Inferentia-based Amazon EC2 Inf1 instances are designed specifically for deep learning inference workloads. These instances provide up to 2.3x higher throughput and up to 70% lower cost per inference compared to the current generation of GPU-based inference instances.

ExaDeploy now supports AWS Inferentia, and together they unlock the increased performance and cost-savings achieved through purpose-built hardware-acceleration and optimized resource orchestration at scale. Let’s look at the combined benefits of ExaDeploy and AWS Inferentia by considering a very common modern ML workload: batched, mixed-compute workloads.

Hypothetical workload characteristics:

  • 15 ms of CPU-only pre-process/post-process
  • Model inference (15 ms on GPU, 5 ms on AWS Inferentia)
  • 10 clients, each making a request every 20 ms
  • Approximate relative cost of CPU:Inferentia:GPU is 1:2:4 (Based on Amazon EC2 On-Demand pricing for c5.xlarge, inf1.xlarge, and g4dn.xlarge)

The table below shows how each of the options shapes up:

Setup | Resources needed | Cost (relative) | Latency
GPU without ExaDeploy | 2 CPU, 2 GPU per client (total 20 CPU, 20 GPU) | 100 | 30 ms
GPU with ExaDeploy | 8 GPUs shared across 10 clients, 1 CPU per client | 42 | 30 ms
AWS Inferentia without ExaDeploy | 1 CPU, 1 AWS Inferentia per client (total 10 CPU, 10 Inferentia) | 30 | 20 ms
AWS Inferentia with ExaDeploy | 3 AWS Inferentia shared across 10 clients, 1 CPU per client | 16 | 20 ms
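As a quick sanity check, the cost and latency figures above follow directly from the stated assumptions; the short script below simply redoes the arithmetic.

# Relative per-unit costs from the assumptions: CPU = 1, Inferentia = 2, GPU = 4.
CPU, INF, GPU = 1, 2, 4
clients = 10

relative_costs = {
    "GPU without ExaDeploy": clients * (2 * CPU + 2 * GPU),         # 2 CPU + 2 GPU per client -> 100
    "GPU with ExaDeploy": 8 * GPU + clients * CPU,                  # 8 shared GPUs + 1 CPU per client -> 42
    "AWS Inferentia without ExaDeploy": clients * (CPU + INF),      # 1 CPU + 1 Inferentia per client -> 30
    "AWS Inferentia with ExaDeploy": 3 * INF + clients * CPU,       # 3 shared Inferentia + 1 CPU per client -> 16
}
for setup, cost in relative_costs.items():
    print(f"{setup}: relative cost {cost}")

# Latency = 15 ms pre/post-processing + inference time (15 ms on GPU, 5 ms on Inferentia).
print("GPU latency:", 15 + 15, "ms")        # 30 ms
print("Inferentia latency:", 15 + 5, "ms")  # 20 ms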

ExaDeploy on AWS Inferentia example

In this section, we go over the steps to configure ExaDeploy through an example with inf1 nodes on a BERT PyTorch model. We saw an average throughput of 1140 samples/sec for the bert-base model, which demonstrates that little to no overhead was introduced by ExaDeploy for this single model, single workload scenario.

Step 1: Set up an Amazon Elastic Kubernetes Service (Amazon EKS) cluster

An Amazon EKS cluster can be brought up with our Terraform AWS module. For our example, we used an inf1.xlarge for AWS Inferentia.

Step 2: Set up ExaDeploy

The second step is to set up ExaDeploy. In general, the deployment of ExaDeploy on inf1 instances is straightforward. Setup mostly follows the same procedure as it does on graphics processing unit (GPU) instances. The primary difference is to change the model tag from GPU to AWS Inferentia and recompile the model. For example, moving from g4dn to inf1 instances using ExaDeploy’s application programming interfaces (APIs) required only approximately 10 lines of code to be changed.
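The recompilation step amounts to tracing the model with the AWS Neuron SDK instead of plain TorchScript. The following is a hedged sketch of that compilation for the BERT model used in this example, based on the standard torch-neuron workflow; the sequence length and output file name are assumptions, and this is not ExaDeploy-specific code.

import torch
import torch.neuron  # noqa: F401 -- provides the Neuron tracing API
import transformers

MAX_LENGTH = 128  # assumed sequence length

tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased-finetuned-mrpc", torchscript=True
)
model.eval()

encoding = tokenizer.encode_plus(
    "The company Exafunction is based in the Bay Area",
    "Exafunction's headquarters are situated in Mountain View",
    max_length=MAX_LENGTH,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
example_inputs = (
    encoding["input_ids"],
    encoding["attention_mask"],
    encoding["token_type_ids"],
)

# Compile the model for Inferentia; the result is TorchScript that runs on NeuronCores.
model_neuron = torch.neuron.trace(model, example_inputs)
model_neuron.save("bert_neuron.pt")  # serialized artifact to register with ExaDeploy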

  • One simple method is to use Exafunction’s Terraform AWS Kubernetes module or Helm chart. These deploy the core ExaDeploy components to run in the Amazon EKS cluster.
  • Compile the model into a serialized format (e.g., TorchScript, TF SavedModel, ONNX). For AWS Inferentia, we followed this tutorial.
  • Register the compiled model in ExaDeploy’s module repository.
    with exa.ModuleRepository(MODULE_REPOSITORY_ADDRESS) as repo:
       repo.register_py_module(
           "BertInferentia",
           module_class="TorchModule",
           context_data=BERT_NEURON_TORCHSCRIPT_AS_BYTES,
           config={
               "_torchscript_input_names": ",".join(BERT_INPUT_NAMES).encode(),
               "_torchscript_output_names": BERT_OUTPUT_NAME.encode(),
               "execution_type": "inferentia".encode(),
           },
       )

  • Prepare the data for the model (i.e., not ExaDeploy-specific).
    tokenizer = transformers.AutoTokenizer.from_pretrained(
       "bert-base-cased-finetuned-mrpc"
    )
    
    batch_encoding = tokenizer.encode_plus(
       "The company Exafunction is based in the Bay Area",
       "Exafunction’s headquarters are situated in Mountain View",
       max_length=MAX_LENGTH,
       padding="max_length",
       truncation=True,
       return_tensors="pt",
    )

  • Run the model remotely from the client.
    with exa.Session(
       scheduler_address=SCHEDULER_ADDRESS,
       module_tag="BertInferentia",
       constraint_config={
           "KUBERNETES_NODE_SELECTORS": "role=runner-inferentia",
           "KUBERNETES_ENV_VARS": "AWS_NEURON_VISIBLE_DEVICES=ALL",
       },
    ) as sess:
       bert = sess.new_module("BertInferentia")
       classification_logits = bert.run(
           **{
               key: value.numpy()
               for key, value in batch_encoding.items()
           }
       )[BERT_OUTPUT_NAME].numpy()
    
       # Assert that the model classifies the two statements as paraphrase.
       assert classification_logits[0].argmax() == 1

ExaDeploy and AWS Inferentia: Better together

AWS Inferentia is pushing the boundaries of throughput for model inference and delivering lowest cost-per-inference in the cloud. That being said, companies need the proper orchestration to enjoy the price-performance benefits of Inf1 at scale. ML serving is a complex problem that, if addressed in-house, requires expertise that’s removed from company goals and often delays product timelines. ExaDeploy, which is Exafunction’s ML deployment software solution, has emerged as the industry leader. It serves even the most complex ML workloads, while providing smooth integration experiences and support from a world-class team. Together, ExaDeploy and AWS Inferentia unlock increased performance and cost-savings for inference workloads at scale.

Conclusion

In this post, we showed you how Exafunction supports AWS Inferentia for performance ML. For more information on building applications with Exafunction, visit Exafunction. For best practices on building deep learning workloads on Inf1, visit Amazon EC2 Inf1 instances.


About the Authors

Nicholas Jiang, Software Engineer, Exafunction

Jonathan Ma, Software Engineer, Exafunction

Prem Nair, Software Engineer, Exafunction

Anshul Ramachandran, Software Engineer, Exafunction

Shruti Koparkar, Sr. Product Marketing Manager, AWS


Damage assessment using Amazon SageMaker geospatial capabilities and custom SageMaker models

In this post, we show how to train and deploy a model to predict natural disaster damage with Amazon SageMaker geospatial capabilities. We use the new SageMaker geospatial capabilities to generate new inference data to test the model. Many government and humanitarian organizations need quick and accurate situational awareness when a disaster strikes. Knowing the severity, cause, and location of damage can assist in first responders’ response strategy and decision-making. A lack of accurate and timely information can contribute to an incomplete or misdirected relief effort.

As the frequency and severity of natural disasters increases, it’s important that we equip decision-makers and first responders with fast and accurate damage assessment. In this example, we use geospatial imagery to predict natural disaster damage. Geospatial data can be used in the immediate aftermath of a natural disaster for rapidly identifying damage to buildings, roads, or other critical infrastructure. In this post, we show you how to train and deploy a geospatial segmentation model to be used for disaster damage classification. We break down the application into three topics: model training, model deployment, and inference.

Model training

In this use case, we built a custom PyTorch model using Amazon SageMaker for image segmentation of building damage. The geospatial capabilities in SageMaker include trained models for you to utilize. These built-in models include cloud segmentation and removal, and land cover segmentation. For this post, we train a custom model for damage segmentation. We first trained the SegFormer model on data from the xView2 competition. SegFormer is a transformer-based architecture that was introduced in the 2021 paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. It’s based on the transformer architectures that are quite popular with natural language processing workloads; however, the SegFormer architecture is built for semantic segmentation. It combines a transformer-based encoder with a lightweight decoder, which allows for better performance than previous methods while keeping model sizes significantly smaller. Both pre-trained and untrained SegFormer models are available from the popular Hugging Face Transformers library. For this use case, we download a pre-trained SegFormer architecture and train it on a new dataset.

The dataset used in this example comes from the xView2 data science competition. This competition released the xBD dataset, one of the largest and highest-quality publicly available datasets of high-resolution satellite imagery annotated with building location and damage scores (classes) before and after natural disasters. The dataset contains data from 15 countries including 6 types of disasters (earthquake/tsunami, flood, volcanic eruption, wildfire, wind) with geospatial data containing 850,736 building annotations across 45,362 km^2 of imagery. The following image shows an example of the dataset. This image shows the post-disaster image with the building damage segmentation mask overlayed. Each image includes the following: pre-disaster satellite image, pre-disaster building segmentation mask, post-disaster satellite image, and post-disaster building segmentation mask with damage classes.


In this example, we only use the pre- and post-disaster imagery to predict the post-disaster damage classification (segmentation mask). We don’t use the pre-disaster building segmentation masks. This approach was selected for simplicity. There are other options for approaching this dataset. A number of the winning approaches for the xView2 competition used a two-step solution: first, predict the pre-disaster building outline segmentation mask. The building outlines and the post-damage images are then used as input for predicting the damage classification. We leave this to the reader to explore other modeling approaches to improve classification and detection performance.

The pre-trained SegFormer architecture is built to accept a single three-channel color image as input and outputs a segmentation mask. There are a number of ways we could have modified the model to accept both the pre- and post-disaster satellite images as input; however, we used a simple stacking technique to stack both images together into a six-channel image. We trained the model using standard augmentation techniques on the xView2 training dataset to predict the post-disaster segmentation mask. Note that we resized all the input images from 1024 to 512 pixels to further reduce the spatial resolution of the training data. The model was trained with SageMaker using a single p3.2xlarge GPU-based instance. An example of the trained model output is shown in the following figures. The first set of images shows the pre- and post-damage images from the validation set.
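The stacking and resizing step is simple to express in code. The following is an illustrative sketch (not the exact training pipeline) of turning a pre/post image pair into the six-channel, 512x512 input described above.

import numpy as np
import torch
import torch.nn.functional as F

def make_model_input(pre_img: np.ndarray, post_img: np.ndarray) -> torch.Tensor:
    """Stack pre- and post-disaster RGB images of shape (1024, 1024, 3) into a (1, 6, 512, 512) tensor."""
    stacked = np.concatenate([pre_img, post_img], axis=-1)                 # (1024, 1024, 6)
    tensor = torch.from_numpy(stacked).permute(2, 0, 1).float() / 255.0    # (6, 1024, 1024)
    tensor = F.interpolate(tensor.unsqueeze(0), size=(512, 512),
                           mode="bilinear", align_corners=False)           # (1, 6, 512, 512)
    return tensor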

The following figures show the predicted damage mask and ground truth damage mask.

At first glance, it seems like the model doesn’t perform well as compared to the ground truth data. Many of the buildings are incorrectly classified, confusing minor damage for no damage and showing multiple classifications for a single building outline. However, one interesting finding when reviewing the model performance is that it appears to have learned to localize the building damage classification. Each building can be classified into No Damage, Minor Damage, Major Damage, or Destroyed. The predicted damage mask shows that the model has classified the large building in the middle into mostly No Damage, but the top right corner is classified as Destroyed. This sub-building damage localization can further assist responders by showing the localized damage per building.

Model deployment

The trained model was then deployed to an asynchronous SageMaker inference endpoint. Note that we chose an asynchronous endpoint to allow for longer inference times, larger payload input sizes, and the ability to scale the endpoint down to zero instances (no charges) when not in use. The following figure shows the high-level code for asynchronous endpoint deployment. We first compress the saved PyTorch state dictionary and upload the compressed model artifacts to Amazon Simple Storage Service (Amazon S3). We create a SageMaker PyTorch model pointing to our inference code and model artifacts. The inference code is required to load and serve our model. For more details on the required custom inference code for a SageMaker PyTorch model, refer to Use PyTorch with the SageMaker Python SDK.
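Because the figure isn’t reproduced here, the following is a hedged sketch of what that deployment looks like with the SageMaker Python SDK; the model artifact location, entry point script, framework version, and instance type are assumptions rather than the exact values used in this post.

import sagemaker
from sagemaker.pytorch import PyTorchModel
from sagemaker.async_inference import AsyncInferenceConfig

session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = session.default_bucket()

pytorch_model = PyTorchModel(
    model_data=f"s3://{bucket}/damage-segmentation/model.tar.gz",  # compressed state dict + inference code
    role=role,
    entry_point="inference.py",   # custom model_fn/predict_fn for the SegFormer model
    source_dir="src",
    framework_version="1.12",
    py_version="py38",
)

predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path=f"s3://{bucket}/damage-segmentation/async-output",
    ),
)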

The following figure shows the code for the auto scaling policy for the asynchronous inference endpoint.
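Again as a hedged sketch, an auto scaling policy for an asynchronous endpoint typically registers the endpoint variant as a scalable target (with a minimum capacity of 0) and tracks the backlog of pending requests; the endpoint name, capacities, and target value below are assumptions.

import boto3

autoscaling_client = boto3.client("application-autoscaling")
endpoint_name = "damage-segmentation-async"  # hypothetical name of the deployed endpoint
resource_id = f"endpoint/{endpoint_name}/variant/AllTraffic"

# Allow the endpoint to scale between 0 and 2 instances.
autoscaling_client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=2,
)

# Scale based on the approximate number of queued requests per instance.
autoscaling_client.put_scaling_policy(
    PolicyName="async-backlog-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 300,
    },
)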

Note that there are other endpoint options, such as real time, batch, and serverless, that could be used for your application. You’ll want to pick the option that is best suited for the use case and recall that Amazon SageMaker Inference Recommender is available to help recommend machine learning (ML) endpoint configurations.

Model inference

With the trained model deployed, we can now use SageMaker geospatial capabilities to gather data for inference. With SageMaker geospatial capabilities, several built-in models are available out of the box. In this example, we use the band stacking operation for stacking the red, green, and blue color channels for our earth observation job. The job gathers the data from the Sentinel-2 dataset. To configure an earth observation job, we first need the coordinates of the location of interest. Second, we need the time range of the observation. With this we can now submit an earth observation job using the stacking feature. Here we stack the red, green, and blue bands to produce a color image. The following figure shows the job configuration used to generate data from the floods in Rochester, Australia, in mid-October 2022. We utilize images from before and after the disaster as input to our trained ML model.

After the job configuration is defined, we can submit the job. When the job is complete, we export the results to Amazon S3. Note that we can only export the results after the job has completed. The results of the job can be exported to an Amazon S3 location specified by the user in the export job configuration. Now with our new data in Amazon S3, we can get damage predictions using the deployed model. We first read the data into memory and stack the pre- and post-disaster imagery together.

The results of the segmentation mask for the Rochester floods are shown in the following images. Here we can see that the model has identified locations within the flooded region as likely damaged. Note also that the spatial resolution of the inference image is different from that of the training data. Increasing the spatial resolution could help model performance; however, this is less of an issue for the SegFormer model than for other models, thanks to its multiscale architecture.


Conclusion

In this post, we showed how to train and deploy a model to predict natural disaster damage with SageMaker geospatial capabilities. We used the new SageMaker geospatial capabilities to generate new inference data to test the model. The code for this post is in the process of being released, and this post will be updated with links to the full training, deployment, and inference code. This application allows first responders, governments, and humanitarian organizations to optimize their response, providing critical situational awareness immediately following a natural disaster. It is only one example of what is possible with modern ML tools such as SageMaker.

Try SageMaker geospatial capabilities today using your own models; we look forward to seeing what you build next.


About the author

Aaron Sengstacken is a machine learning specialist solutions architect at Amazon Web Services. Aaron works closely with public sector customers of all sizes to develop and deploy production machine learning applications. He is interested in all things machine learning, technology, and space exploration.


Deploy Amazon SageMaker Autopilot models to serverless inference endpoints

Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning (ML) models based on your data, while allowing you to maintain full control and visibility. Autopilot can also deploy trained models to real-time inference endpoints automatically.

If you have workloads with spiky or unpredictable traffic patterns that can tolerate cold starts, then deploying the model to a serverless inference endpoint would be more cost efficient.

Amazon SageMaker Serverless Inference is a purpose-built inference option ideal for workloads with unpredictable traffic patterns and that can tolerate cold starts. Unlike a real-time inference endpoint, which is backed by a long-running compute instance, serverless endpoints provision resources on demand with built-in auto scaling. Serverless endpoints scale automatically based on the number of incoming requests and scale down resources to zero when there are no incoming requests, helping you minimize your costs.

In this post, we show how to deploy Autopilot trained models to serverless inference endpoints using the Boto3 libraries for Amazon SageMaker.

Autopilot training modes

Before creating an Autopilot experiment, you can either let Autopilot select the training mode automatically, or you can select the training mode manually.

Autopilot currently supports three training modes:

  • Auto – Based on dataset size, Autopilot automatically chooses either ensembling or HPO mode. For datasets larger than 100 MB, Autopilot chooses HPO; otherwise, it chooses ensembling.
  • Ensembling – Autopilot uses the AutoGluon ensembling technique using model stacking and produces an optimal predictive model.
  • Hyperparameter optimization (HPO) – Autopilot finds the best version of a model by tuning hyperparameters using Bayesian optimization or multi-fidelity optimization while running training jobs on your dataset. HPO mode selects the algorithms most relevant to your dataset and selects the best range of hyperparameters to tune your models.

To learn more about Autopilot training modes, refer to Training modes.

Solution overview

In this post, we use the UCI Bank Marketing dataset to predict if a client will subscribe to a term deposit offered by the bank. This is a binary classification problem type.

We launch two Autopilot jobs using the Boto3 libraries for SageMaker. The first job uses ensembling as the chosen training mode. We then deploy the single ensemble model generated to a serverless endpoint and send inference requests to this hosted endpoint.

The second job uses the HPO training mode. For classification problem types, Autopilot generates three inference containers. We extract these three inference containers and deploy them to separate serverless endpoints. Then we send inference requests to these hosted endpoints.

For more information about regression and classification problem types, refer to Inference container definitions for regression and classification problem types.

We can also launch Autopilot jobs from the Amazon SageMaker Studio UI. If you launch jobs from the UI, ensure to turn off the Auto deploy option in the Deployment and Advanced settings section. Otherwise, Autopilot will deploy the best candidate to a real-time endpoint.

Prerequisites

Ensure you have the latest version of Boto3 and the SageMaker Python packages installed:

pip install -U boto3 sagemaker

We need the SageMaker package version >= 2.110.0 and Boto3 version >= 1.24.84.

Launch an Autopilot job with ensembling mode

To launch an Autopilot job using the SageMaker Boto3 libraries, we use the create_auto_ml_job API. We then pass in AutoMLJobConfig, InputDataConfig, and AutoMLJobObjective as inputs to the create_auto_ml_job. See the following code:

import boto3
import sagemaker
from time import gmtime, strftime

region = boto3.Session().region_name
session = sagemaker.Session()

bucket = session.default_bucket()
role = sagemaker.get_execution_role()
prefix = "autopilot/bankadditional"
sm_client = boto3.Session().client(service_name='sagemaker',region_name=region)

timestamp_suffix = strftime('%d%b%Y-%H%M%S', gmtime())
automl_job_name = f"uci-bank-marketing-{timestamp_suffix}"
max_job_runtime_seconds = 3600
max_runtime_per_job_seconds = 1200
target_column = "y"
problem_type="BinaryClassification"
objective_metric = "F1"
training_mode = "ENSEMBLING"

automl_job_config = {
    'CompletionCriteria': {
      'MaxRuntimePerTrainingJobInSeconds': max_runtime_per_job_seconds,
      'MaxAutoMLJobRuntimeInSeconds': max_job_runtime_seconds
    },    
    "Mode" : training_mode
}

automl_job_objective= { "MetricName": objective_metric }

input_data_config = [
    {
      'DataSource': {
        'S3DataSource': {
          'S3DataType': 'S3Prefix',
          'S3Uri': f's3://{bucket}/{prefix}/raw/bank-additional-full.csv'
        }
      },
      'TargetAttributeName': target_column
    }
  ]

output_data_config = {
    'S3OutputPath': f's3://{bucket}/{prefix}/output'
}


sm_client.create_auto_ml_job(
    AutoMLJobName=automl_job_name,
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    AutoMLJobConfig=automl_job_config,
    ProblemType=problem_type,
    AutoMLJobObjective=automl_job_objective,
    RoleArn=role)

Autopilot returns the BestCandidate model object that has the InferenceContainers required to deploy the models to inference endpoints. To get the BestCandidate for the preceding job, we use the describe_auto_ml_job function:

job_response = sm_client.describe_auto_ml_job(AutoMLJobName=automl_job_name)
best_candidate = job_response['BestCandidate']
inference_container = job_response['BestCandidate']['InferenceContainers'][0]
print(inference_container)
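Note that the best candidate is only available after the Autopilot job has finished. Before running the call above, a minimal way to wait is to poll the job status with the same sm_client, as in the following sketch (the polling interval is arbitrary):

import time

while True:
    job_response = sm_client.describe_auto_ml_job(AutoMLJobName=automl_job_name)
    status = job_response["AutoMLJobStatus"]
    if status in ("Completed", "Failed", "Stopped"):
        print(f"Autopilot job finished with status: {status}")
        break
    print(f"Autopilot job status: {status} ({job_response['AutoMLJobSecondaryStatus']})")
    time.sleep(60)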

Deploy the trained model

We now deploy the preceding inference container to a serverless endpoint. The first step is to create a model from the inference container, then create an endpoint configuration in which we specify the MemorySizeInMB and MaxConcurrency values for the serverless endpoint along with the model name. Finally, we create an endpoint with the endpoint configuration created above.

We recommend choosing your endpoint’s memory size according to your model size. The memory size should be at least as large as your model size. Your serverless endpoint has a minimum RAM size of 1024 MB (1 GB), and the maximum RAM size you can choose is 6144 MB (6 GB).

The memory sizes you can choose are 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.

To help determine whether a serverless endpoint is the right deployment option from a cost and performance perspective, we encourage you to refer to the SageMaker Serverless Inference Benchmarking Toolkit, which tests different endpoint configurations and compares the most optimal one against a comparable real-time hosting instance.

Note that serverless endpoints only accept SingleModel for inference containers. Autopilot in ensembling mode generates a single model, so we can deploy this model container as is to the endpoint. See the following code:

# Example names for the model, endpoint config, and endpoint, plus serverless settings;
# adjust these to your own naming conventions.
model_name = f"automl-ensemble-model-{timestamp_suffix}"
endpoint_config_name = f"automl-ensemble-epc-{timestamp_suffix}"
endpoint_name = f"automl-ensemble-ep-{timestamp_suffix}"
memory = 2048
max_concurrency = 10

# Create Model
model_response = sm_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    Containers=[inference_container]
)

# Create Endpoint Config
epc_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "ModelName": model_name,
            "VariantName": "AllTraffic",
            "ServerlessConfig": {
                "MemorySizeInMB": memory,
                "MaxConcurrency": max_concurrency
            }
        }
    ]
)

# Create Endpoint
ep_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)

When the serverless inference endpoint is InService, we can test the endpoint by sending an inference request and observe the predictions. The following diagram illustrates the architecture of this setup.

Note that we can send raw data as a payload to the endpoint. The ensemble model generated by Autopilot automatically incorporates all required feature-transform and inverse-label transform steps, along with the algorithm model and packages, into a single model.

Send inference request to the trained model

Use the following code to send inference on your model trained using ensembling mode:

from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer


payload = "34,blue-collar,married,basic.4y,no,no,no,telephone,may,tue,800,4,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=session,
    serializer=CSVSerializer(),
)

prediction = predictor.predict(payload).decode('utf-8')
print(prediction)

Launch an Autopilot Job with HPO mode

In HPO mode, for CompletionCriteria, besides MaxRuntimePerTrainingJobInSeconds and MaxAutoMLJobRuntimeInSeconds, we could also specify the MaxCandidates to limit the number of candidates an Autopilot job will generate. Note that these are optional parameters and are only set to limit the job runtime for demonstration. See the following code:

training_mode = "HYPERPARAMETER_TUNING"

automl_job_config["Mode"] = training_mode
automl_job_config["CompletionCriteria"]["MaxCandidates"] = 15
hpo_automl_job_name =  f"{model_prefix}-HPO-{timestamp_suffix}"

response = sm_client.create_auto_ml_job(
					  AutoMLJobName=hpo_automl_job_name,
					  InputDataConfig=input_data_config,
					  OutputDataConfig=output_data_config,
					  AutoMLJobConfig=automl_job_config,
					  ProblemType=problem_type,
					  AutoMLJobObjective=automl_job_objective,
					  RoleArn=role,
					  Tags=tags_config
				)

To get the BestCandidate for the preceding job, we can again use the describe_automl_job function:

job_response = sm_client.describe_auto_ml_job(AutoMLJobName=hpo_automl_job_name)
best_candidate = job_response['BestCandidate']
inference_containers = job_response['BestCandidate']['InferenceContainers']
print(inference_containers)

Deploy the trained model

Autopilot in HPO mode for the classification problem type generates three inference containers.

The first container handles the feature-transform steps. Next, the algorithm container generates the predicted_label with the highest probability. Finally, the post-processing inference container performs an inverse transform on the predicted label and maps it to the original label. For more information, refer to Inference container definitions for regression and classification problem types.

We extract these three inference containers and deploy them to separate serverless endpoints. For inference, we invoke the endpoints in sequence by sending the payload first to the feature-transform container, then passing the output from this container to the algorithm container, and finally passing the output from the previous inference container to the post-processing container, which outputs the predicted label.

The following diagram illustrates the architecture of this setup.

We extract the three inference containers from the BestCandidate with the following code:

job_response = sm_client.describe_auto_ml_job(AutoMLJobName=hpo_automl_job_name)
inference_containers = job_response['BestCandidate']['InferenceContainers']

models = list()
endpoint_configs = list()
endpoints = list()

# For brevity, we've encapsulated create_model, create_endpoint_config, and create_endpoint as helper functions
for idx, container in enumerate(inference_containers):
    (status, model_arn) = create_autopilot_model(
        sm_client,
        hpo_automl_job_name,
        role,
        container,
        idx)
    model_name = model_arn.split('/')[1]
    models.append(model_name)

    endpoint_config_name = f"epc-{model_name}"
    endpoint_name = f"ep-{model_name}"
    (status, epc_arn) = create_serverless_endpoint_config(
        sm_client,
        endpoint_config_name,
        model_name,
        memory=2048,
        max_concurrency=10)
    endpoint_configs.append(endpoint_config_name)

    response = create_serverless_endpoint(
        sm_client,
        endpoint_name,
        endpoint_config_name)
    endpoints.append(endpoint_name)
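The helper functions themselves are not shown in this post. As a hedged sketch, create_serverless_endpoint_config might look something like the following; it simply wraps the create_endpoint_config call shown earlier in the ensembling section, and the (status, ARN) return shape is an assumption to match how it's used above.

def create_serverless_endpoint_config(sm_client, endpoint_config_name, model_name,
                                      memory=2048, max_concurrency=10):
    """Create a serverless endpoint configuration for a single model and return (status, ARN)."""
    response = sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[
            {
                "ModelName": model_name,
                "VariantName": "AllTraffic",
                "ServerlessConfig": {
                    "MemorySizeInMB": memory,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    )
    return ("Created", response["EndpointConfigArn"])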

Send inference request to the trained model

For inference, we send the payload in sequence: first to the feature-transform container, then to the model container, and finally to the inverse-label transform container.


See the following code:

from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer

payload = "51,technician,married,professional.course,no,yes,no,cellular,apr,thu,687,1,0,1,success,-1.8,93.075,-47.1,1.365,5099.1"


for _, endpoint in enumerate(endpoints):
    try:
        print(f"payload: {payload}")
        predictor = Predictor(
            endpoint_name=endpoint,
            sagemaker_session=session,
            serializer=CSVSerializer(),
        )
        prediction = predictor.predict(payload)
        payload=prediction
    except Exception as e:
        print(f"Error invoking Endpoint; {endpoint} n {e}")
        break

The full implementation of this example is available in the following Jupyter notebook.

Clean up

To clean up resources, you can delete the created serverless endpoints, endpoint configs, and models:

sm_client = boto3.Session().client(service_name='sagemaker',region_name=region)

for _, endpoint in enumerate(endpoints):
    try:
        sm_client.delete_endpoint(EndpointName=endpoint)
    except Exception as e:
        print(f"Exception:n{e}")
        continue
        
for _, endpoint_config in enumerate(endpoint_configs):
    try:
        sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config)
    except Exception as e:
        print(f"Exception:n{e}")
        continue

for _, autopilot_model in enumerate(models):
    try:
        sm_client.delete_model(ModelName=autopilot_model)
    except Exception as e:
        print(f"Exception:n{e}")
        continue

Conclusion

In this post, we showed how we can deploy Autopilot generated models both in ensemble and HPO modes to serverless inference endpoints. This solution can speed up your ability to use and take advantage of cost-efficient and fully managed ML services like Autopilot to generate models quickly from raw data, and then deploy them to fully managed serverless inference endpoints with built-in auto scaling to reduce costs.

We encourage you to try this solution with a dataset relevant to your business KPIs. You can refer to the solution implemented in a Jupyter notebook in the GitHub repo.



About the Author

Praveen Chamarthi is a Senior AI/ML Specialist with Amazon Web Services. He is passionate about AI/ML and all things AWS. He helps customers across the Americas to scale, innovate, and operate ML workloads efficiently on AWS. In his spare time, Praveen loves to read and enjoys sci-fi movies.
