Creating high-quality machine learning models for financial services using Amazon SageMaker Autopilot

Machine learning (ML) is used throughout the financial services industry to perform a wide variety of tasks, such as fraud detection, market surveillance, portfolio optimization, loan solvency prediction, direct marketing, and many others. This breadth of use cases has created a need for lines of business to quickly generate high-quality, performant models with little to no code. Doing so shortens the long cycle of taking a use case from concept to production and generates business value. In this post, we explore how to use Amazon SageMaker Autopilot for some common use cases in the financial services industry.

Autopilot automatically builds pipelines and trains and tunes the best ML models for classification or regression tasks on tabular data, while allowing you to maintain full control and visibility. Autopilot enables automatic creation of ML models without requiring any ML experience: it analyzes the dataset, processes the data into features, and trains multiple optimized ML models.

Data scientists in financial services often work on tasks where the datasets are highly imbalanced (heavily skewed towards examples of one class). Examples of such tasks include credit card fraud (where a very small fraction of the transactions are actually fraudulent) or bankruptcy (only a few corporations file for bankruptcy). We demonstrate how Autopilot automatically handles class imbalance without requiring any additional inputs from the user.

Autopilot recently added the ability to tune models using the Area Under the Curve (AUC) metric, specifically the area under the Receiver Operating Characteristic (ROC) curve, as the objective metric in addition to F1, which is the default objective for binary classification tasks. In this post, we show how using AUC as the model evaluation metric for highly imbalanced data allows Autopilot to generate high-quality models.

Our first use case is to detect credit card fraud based on various anonymized attributes. The dataset is highly imbalanced, with over 99% of the transactions being non-fraudulent. Our second use case is to predict bankruptcy of Polish companies [2]. Here, bankruptcy is similarly a binary response variable (will bankrupt = 1, will not bankrupt = 0), with 96% of the companies not becoming bankrupt.

Prerequisites

To reproduce these steps in your own environment, you must complete the following prerequisites:

Credit card fraud detection

In fraud detection tasks, companies are interested in maintaining a very low false positive rate while correctly identifying as many fraudulent transactions as possible. A false positive can lead to a company canceling or placing a hold on a customer's card over a legitimate transaction, which leads to a poor customer experience. As a result, accuracy is not the best metric to consider for this problem; better metrics are the AUC and the F1 score.

The following code shows data for a credit card fraud task:

import pandas as pd

# Load the credit card transactions dataset
fraud_df = pd.read_csv('creditcard.csv')
fraud_df.head(5)

Class 0 and class 1 correspond to No Fraud and Fraud, respectively. As we can see, all columns other than Amount are anonymized. A key differentiator of Autopilot is its ability to process raw data directly, without the need for data processing on the part of data scientists. For example, Autopilot automatically converts categorical features into numerical values, handles missing values (as we show in the second example), and performs simple text preprocessing.

Using the AWS boto3 API or the AWS Command Line Interface (AWS CLI), we upload the data to Amazon S3 in CSV format:

import boto3

s3 = boto3.client('s3')
# Upload the local CSV file to S3 under the given object key
s3.upload_file(file_name, bucket, object_name)

fraud_df = pd.read_csv('<your S3 file location>')

Now, we select all columns except Class as features and Class as target:

X = fraud_df.drop(columns=['Class'])  # all feature columns
y = fraud_df['Class']                 # target column
print(y.value_counts())
0    284315
1       492

The binary label column Class is highly imbalanced, which is a typical occurrence in financial use cases. We can verify how well Autopilot handles this highly imbalanced data.

In the following code, we demonstrate how to configure Autopilot in a Jupyter notebook. We provide train and test files and set TargetAttributeName to Class, the target column (the column we want to predict):

import boto3
import sagemaker
from sklearn.model_selection import train_test_split

auto_ml_job_name = 'automl-creditcard-fraud'
sm = boto3.client('sagemaker')
session = sagemaker.Session()

prefix = 'sagemaker/' + auto_ml_job_name
bucket = session.default_bucket()

# Split the data into train and test sets (the split parameters here are an
# assumption; the original post doesn't show this step)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

training_data = pd.DataFrame(X_train)
training_data['Class'] = list(y_train)
test_data = pd.DataFrame(X_test)

train_file = 'train_data.csv'
training_data.to_csv(train_file, index=False, header=True)
train_data_s3_path = session.upload_data(path=train_file, key_prefix=prefix + "/train")
print('Train data uploaded to: ' + train_data_s3_path)

test_file = 'test_data.csv'
test_data.to_csv(test_file, index=False, header=False)
test_data_s3_path = session.upload_data(path=test_file, key_prefix=prefix + "/test")
print('Test data uploaded to: ' + test_data_s3_path)
input_data_config = [{
      'DataSource': {
        'S3DataSource': {
          'S3DataType': 'S3Prefix',
          'S3Uri': 's3://{}/{}/train'.format(bucket,prefix)
        }
      },
      'TargetAttributeName': 'Class'
    }
  ]

Next, we create the Autopilot job. For this post, we set ProblemType='BinaryClassification' and job_objective='AUC'. If you don’t set these fields, Autopilot automatically determines the type of supervised learning problem by analyzing the data and uses the default metric for that problem type. The default metric for binary classification is F1. We explicitly set these parameters because we want to optimize AUC.

from sagemaker.automl.automl import AutoML
from time import gmtime, strftime, sleep
from sagemaker import get_execution_role

timestamp_suffix = strftime('%d-%H-%M-%S', gmtime())
base_job_name = 'automl-card-fraud' 

target_attribute_name = 'Class'
role = get_execution_role()
automl = AutoML(role=role,
                target_attribute_name=target_attribute_name,
                base_job_name=base_job_name,
                sagemaker_session=session,
                problem_type='BinaryClassification',
                job_objective={'MetricName': 'AUC'},
                max_candidates=100)                
 

For more information about the parameters for job configuration, see create-auto-ml-job.

After the Autopilot job is created, we call the fit() function to run it:

automl.fit(train_file, job_name=base_job_name, wait=False, logs=False)
describe_response = automl.describe_auto_ml_job()
print (describe_response)
job_run_status = describe_response['AutoMLJobStatus']
    
while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    describe_response = automl.describe_auto_ml_job()
    job_run_status = describe_response['AutoMLJobStatus']
    print (job_run_status)
    sleep(30)
print ('completed')

When the job is complete, we can select the best candidate based on the AUC objective metric:

best_candidate = automl.describe_auto_ml_job()['BestCandidate']
best_candidate_name = best_candidate['CandidateName']
print("CandidateName: " + best_candidate_name)
print("FinalAutoMLJobObjectiveMetricName: " + best_candidate['FinalAutoMLJobObjectiveMetric']['MetricName'])
print("FinalAutoMLJobObjectiveMetricValue: " + str(best_candidate['FinalAutoMLJobObjectiveMetric']['Value']))
CandidateName: tuning-job-1-7e8f6c9dffe840a0bf-009-636d28c2
FinalAutoMLJobObjectiveMetricName: validation:auc
FinalAutoMLJobObjectiveMetricValue: 0.9890000224113464
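
In addition to the best candidate, we can inspect the other candidates the job explored. The following is a minimal sketch using the SageMaker Python SDK's list_candidates method; the sort_by and sort_order values shown follow the underlying ListCandidatesForAutoMLJob API, and exact support may vary by SDK version.

candidates = automl.list_candidates(sort_by='FinalObjectiveMetricValue',
                                    sort_order='Descending',
                                    max_results=10)
for candidate in candidates:
    # Each candidate is a dictionary describing one trained pipeline
    print(candidate['CandidateName'],
          candidate['FinalAutoMLJobObjectiveMetric']['Value'])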

We now create the Autopilot model object using the model artifacts from the Autopilot job in Amazon S3, and the inference container from the best candidate of the tuning job. In addition to the predicted label, we're interested in the probability of the prediction; we use this probability later to plot the ROC and precision-recall curves.

model_name = 'automl-cardfraud-model-' + timestamp_suffix
inference_response_keys = ['predicted_label', 'probability']
model = automl.create_model(name=model_name,
                            candidate=best_candidate,
                            inference_response_keys=inference_response_keys)

After the model is created, we can generate inferences for the test set using the following code. During inference time, Autopilot orchestrates deployment of the inference pipeline, including feature engineering and the ML algorithm on the inference machine.

s3_transform_output_path = 's3://{}/{}/inference-results/'.format(bucket, prefix)
output_path = s3_transform_output_path + best_candidate['CandidateName'] + '/'
transformer = model.transformer(instance_count=1,
                                instance_type='ml.m5.xlarge',
                                assemble_with='Line',
                                output_path=output_path)
transformer.transform(data=test_data_s3_path, split_type='Line', content_type='text/csv', wait=False)

# Name of the batch transform job just started (the attribute spelling can
# vary across SDK versions)
transform_job_name = transformer.latest_transform_job.name

describe_response = sm.describe_transform_job(TransformJobName=transform_job_name)
job_run_status = describe_response['TransformJobStatus']
print(job_run_status)

while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    describe_response = sm.describe_transform_job(TransformJobName=transform_job_name)
    job_run_status = describe_response['TransformJobStatus']
    print(job_run_status)
    sleep(30)
print('transform job completed with status: ' + job_run_status)

Finally, we read the inference and predicted data into a dataframe:

import io
from urllib.parse import urlparse

def get_csv_from_s3(s3uri, file_name):
    # Parse the S3 URI into bucket name and key prefix
    parsed_url = urlparse(s3uri)
    bucket_name = parsed_url.netloc
    prefix = parsed_url.path[1:].strip('/')
    s3 = boto3.resource('s3')
    obj = s3.Object(bucket_name, '{}/{}'.format(prefix, file_name))
    return obj.get()['Body'].read().decode('utf-8')

pred_csv = get_csv_from_s3(transformer.output_path, '{}.out'.format(test_file))
data_auc = pd.read_csv(io.StringIO(pred_csv), header=None)
data_auc.columns = ['label', 'proba']

Model metrics

Common metrics to compare classifiers are the ROC curve and the precision-recall curve. The ROC curve is a plot of the true positive rate against the false positive rate for various thresholds. The higher the prediction quality of the classification model, the more the ROC curve is skewed toward the top left.

The precision-recall curve demonstrates the trade-off between precision and recall, with the best models having a precision-recall curve that is flat initially and drops steeply as the recall approaches 1. The higher the precision and recall, the more the curve is skewed towards the upper right.

To optimize for the F1 score, we simply repeat the steps from earlier, setting job_objective={'MetricName': 'F1'} and rerunning the Autopilot job; we store the resulting predictions in data_f1. Because the steps are identical, we don't repeat them in this section. Note that F1 is the default objective for binary classification problems. The following code plots the ROC curves:

import matplotlib.pyplot as plt
from sklearn import metrics

colors = ['blue', 'green']
model_names = ['Objective : AUC', 'Objective : F1']
models = [data_auc, data_f1]

for i in range(len(models)):
    fpr, tpr, _ = metrics.roc_curve(y_test, models[i]['proba'])
    auc_score = metrics.auc(fpr, tpr)
    plt.plot(fpr, tpr, label='Autopilot {} (AUC = {:.2f})'.format(model_names[i], auc_score), color=colors[i])

plt.xlim([-0.1, 1.1])
plt.ylim([-0.1, 1.1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.legend(loc='lower right')
plt.title('ROC Curve')

The following plot shows the results.

In the preceding ROC plot, the Autopilot models achieve a high AUC when optimizing for either objective metric. We also didn't select any specific model or tune any hyperparameters; Autopilot did all that heavy lifting for us.

Finally, we plot the precision-recall curves for the trained Autopilot model:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import f1_score, precision_score, recall_score

colors = ['blue', 'green']
model_names = ['Objective : AUC', 'Objective : F1']
models = [data_auc, data_f1]

print('model ', 'F1 ', 'precision ', 'recall ')
for i in range(len(models)):
    precision, recall, _ = precision_recall_curve(y_test, models[i]['proba'])
    print(model_names[i],
          f1_score(y_test, np.array(models[i]['label'])),
          precision_score(y_test, models[i]['label']),
          recall_score(y_test, models[i]['label']))
    plt.plot(recall, precision, color=colors[i], label=model_names[i])

plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend(loc='upper right')
plt.show()

model            F1      precision  recall
Objective : AUC  0.8164  0.872      0.7676
Objective : F1   0.7968  0.8947     0.7183

The following plot shows the results.

As we can see from the plot, Autopilot models provide good precision and recall, because the graph is heavily skewed toward the top-right corner.

Autopilot outputs

In addition to handling the heavy lifting of building and training the models, Autopilot provides visibility into the steps taken to build the models by generating two notebooks: CandidateDefinitionNotebook and DataExplorationNotebook.

You can use the candidate definition notebook to interactively walk through the steps Autopilot took to arrive at the best candidate. You can also use this notebook to override various runtime parameters, such as parallelism, hardware used, algorithms explored, feature engineering scripts, hyperparameter tuning ranges, and more.

You can download the notebook from the following Amazon S3 location:

automl.describe_auto_ml_job()['AutoMLJobArtifacts']['CandidateDefinitionNotebookLocation']

The notebook also outlines the various feature engineering steps taken to build the models. The models are indexed by their model type and the feature engineering pipeline. For example, as shown in the Tuning Job Result Overview, the winning model corresponds to the pipeline dpp1-xgboost:

best_candidate_name = best_candidate['CandidateName']
print(best_candidate)
print(describe_response)

If we search the output for ModelDataUrl, we can see that Autopilot used dpp1-xgboost: 'ModelDataUrl': 's3://sagemaker-us-east-1-<ACCOUNT-NUM>/automl-card-fraud-7/tuning/automl-car-dpp1-xgb/tuning-job-1-7e8f6c9dffe840a0bf-009-636d28c2/output/model.tar.gz'.

dpp1-xgboost is a data transformation strategy that transforms numeric features using RobustImputer. It merges all the generated features and applies RobustPCA followed by RobustStandardScaler. The transformed data is used to tune an XGBoost model.
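
For intuition, the following sketch approximates this strategy with open-source stand-ins (SimpleImputer, PCA, StandardScaler, and XGBClassifier) in place of SageMaker's internal RobustImputer, RobustPCA, and RobustStandardScaler implementations; it illustrates the shape of the pipeline, not the exact code Autopilot generates.

# Conceptual approximation of the dpp1-xgboost data transformation strategy
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

pipeline = Pipeline(steps=[
    ('impute', SimpleImputer(strategy='median')),  # stand-in for RobustImputer
    ('pca', PCA(n_components=0.95)),               # stand-in for RobustPCA
    ('scale', StandardScaler()),                   # stand-in for RobustStandardScaler
    ('xgb', XGBClassifier(objective='binary:logistic')),
])
pipeline.fit(X_train, y_train)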

From the candidate definition notebook, we can also see that Autopilot automatically applied up-weighting to the minority class using scale_pos_weight. This improves prediction quality for imbalanced datasets where the model doesn’t see many examples of the minority class during training. You can change the scale_pos_weight to a different value:

STATIC_HYPERPARAMETERS = {
    'xgboost': {
        'objective': 'binary:logistic',
        'scale_pos_weight': 568.6114285714285,
    },
}
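
For reference, a common heuristic for scale_pos_weight in XGBoost is the ratio of negative to positive training examples. The following sketch shows that computation on our training split; the exact value Autopilot chose depends on its own internal split, so it won't match 568.61 exactly.

# Negative-to-positive ratio of the training labels, the usual
# starting point for scale_pos_weight on imbalanced data
neg = (y_train == 0).sum()
pos = (y_train == 1).sum()
print(neg / pos)  # on the order of the 568.61 that Autopilot used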

The data exploration notebook generates a report that provides insights about the input dataset, such as the missing values or the data types for the different features:

automl.describe_auto_ml_job()['AutoMLJobArtifacts']['DataExplorationNotebookLocation'] 

Having described in detail the use of Autopilot to detect credit card fraud, we now briefly discuss a second task: predicting the bankruptcy of companies.

Predicting bankruptcy of Polish companies

For this post, we explore the various economic attributes in the Polish companies bankruptcy data dataset. There are 64 features and a target attribute class. We rename the column class to bankrupt (not bankrupt = 0, bankrupt = 1) for clarity. As noted before, this dataset is also highly imbalanced, with 96% of the data in the non-bankrupt category.

We followed the same process for running and configuring Autopilot as in the credit card fraud use case. However, unlike the credit card fraud dataset, this dataset contains missing values. Because Autopilot automatically handles missing values, we simply pass the raw data to Autopilot.

We don’t repeat the code steps in this section; we merely show the ROC and precision-recall curves. Autopilot again yields high-quality models, as evidenced by the AUC and the ROC and precision-recall curves. For bankruptcy prediction, false negatives (failing to flag companies that later go bankrupt) can lead to poor investment decisions, and false positives (predicting that solvent companies will go bankrupt) can lead to missed opportunities.

To boost model performance, Autopilot also automatically up-weights the minority class label, penalizing the model for misclassifying the minority class during training. The following plot shows the precision-recall curve.

The following plot shows the ROC curve.

As we can see from these plots, for bankruptcy, the AUC objective is slightly better than F1. Autopilot can generate accurate predictions for a complex event like bankruptcy without any specialized manual feature-engineering steps.

Cleaning up

The Autopilot job creates many underlying artifacts, such as dataset splits, preprocessing scripts, and preprocessed data. To avoid incurring storage costs, delete these resources by uncommenting and running the following code:

#s3 = boto3.resource('s3')
#bucket = s3.Bucket(bucket)
 
#job_outputs_prefix = '{}/output/{}'.format(prefix,auto_ml_job_name)
#bucket.objects.filter(Prefix=job_outputs_prefix).delete()

Conclusion

In this post, we demonstrated how to create ML models without any prior knowledge of algorithms using Autopilot. For imbalanced data, which is common in financial services use cases, we showed that using objective metrics such as AUC and F1 along with automatic minority class up-weighting can lead to high-quality models. Autopilot provides the flexibility of AutoML with the control and detail of a do-it-yourself approach by unveiling the underlying metadata and the code used to preprocess the data and train the models. Importantly, Autopilot works on datasets of all sizes, ranging from a few MBs to hundreds of GBs, without you having to set up the underlying infrastructure. Finally, note that Amazon SageMaker Studio provides a UI for you to build, train, and deploy models using Autopilot with little to no code. For more information about tuning, training, and deploying Autopilot models, see Create a machine learning model automatically with Amazon SageMaker Autopilot.

References

[1] Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.

[2] Zieba, M., Tomczak, S. K., & Tomczak, J. M. (2016). Ensemble Boosted Trees with Synthetic Features Generation in Application to Bankruptcy Prediction. Expert Systems with Applications.


About the Authors

Sumon Samanta is a Senior Specialist Architect for Global Financial Services at AWS. Previously, he worked as a Quantitative Developer at several investment banks to develop pricing and risk systems.

Stefan Natu is a Sr. Machine Learning Specialist at Amazon Web Services. He is focused on helping financial services customers build end-to-end machine learning solutions on AWS. In his spare time, he enjoys reading machine learning blogs, playing the guitar, and exploring the food scene in New York City.

Ilya Epshteyn is a solutions architect with AWS. He helps customers to innovate on AWS by building highly available, scalable, and secure architectures. He enjoys spending time outdoors and building Lego creations with his kids.

Miroslav Miladinovic is a Software Development Manager at Amazon SageMaker.

Jean Baptiste Faddoul is an Applied Science Manager working on SageMaker Autopilot and Automatic Model Tuning.

Yotam Elor is a Senior Applied Scientist at Amazon SageMaker. He works on SageMaker Autopilot, AWS's AutoML solution.

How to train procedurally generated game-like environments at scale with Amazon SageMaker RL

Gym is a toolkit for developing and comparing reinforcement learning algorithms. Procgen Benchmark is a suite of 16 procedurally generated Gym environments designed to benchmark both sample efficiency and generalization in reinforcement learning. These environments are associated with the paper Leveraging Procedural Generation to Benchmark Reinforcement Learning. Compared to Gym Retro, these environments have the following benefits:

  • Faster – Gym Retro environments are already fast, but Procgen environments can run over four times faster.
  • Non-deterministic – Gym Retro environments are always the same, so you can memorize a sequence of actions that gets the highest reward. Procgen environments are randomized so this isn’t possible.
  • Customizable – If you install from source, you can perform experiments where you change the environments, or build your own environments. The environment-specific code for each environment is often less than 300 lines. This is almost impossible with Gym Retro.

This post demonstrates how to use the Amazon SageMaker reinforcement learning starter kit for the NeurIPS 2020 – Procgen competition hosted on AIcrowd. The competition was held from June to November 2020, and the results can be found here, but you can still try out the solution on your own. Our solution allows participants using AIcrowd's existing neurips2020-procgen-starter-kit to get started with SageMaker seamlessly, without making any algorithmic changes. It also helps you reduce the time and effort required to build sample-efficient reinforcement learning solutions using homogeneous and heterogeneous scaling.

Finally, our solution utilizes Spot Instances to reduce cost. The cost savings with Spot Instances is approximately 70% for GPU instances such as ml.p3.2x and ml.p3.8x when training with a popular state-of-the-art reinforcement learning algorithm, Proximal Policy Optimization (PPO), and a multi-layer convolutional neural network as the agent's policy.

Architecture

As part of the solution, we use SageMaker reinforcement learning, which relies on Ray and RLlib, the same libraries used in the starter kit. SageMaker supports distributed reinforcement learning in a single SageMaker ML instance with just a few lines of configuration by using the Ray RLlib library.

A typical SageMaker reinforcement learning job for an actor-critic algorithm uses GPU instances to learn a policy network and CPU instances to collect experiences for faster training at optimized costs. SageMaker allows you to achieve this by spinning up two jobs within the same Amazon VPC, and the communications between the instances are taken care of automatically.

Cost

You can contact AIcrowd to get credits to use any AWS service.

You’re responsible for the cost of the AWS services used while running this solution, and you should set up a budget alert for when you’ve used 90% of your allotted credits. For more information, see Amazon SageMaker Pricing.

As of September 1, 2020, SageMaker training costs (excluding notebook instances) are as follows:

  • c5.4xlarge – $0.952 per hour (16 vCPU)
  • g4dn.4xlarge – $1.686 per hour (1 GPU, 16 vCPU)
  • p3.2xlarge – $4.284 per hour (1 GPU, 8 vCPU)

Launching the solution

To launch the solution, complete the following steps:

  1. While signed in to your AWS account, choose the following link to create the AWS CloudFormation stack for the Region you want to run your notebook:

You’re redirected to the AWS CloudFormation console to create your stack.

  2. Acknowledge the use of the instance type for your SageMaker notebook and training instance.

Make sure that your AWS account has the limits for required instances. If you need to increase the limits for the instances you want to use, contact AWS Support.

  3. As the final parameter, provide the name of the S3 bucket for the solution.

The default name is neurips-2020. You should provide a unique name to make sure there are no conflicts with your existing S3 buckets. An S3 bucket name is globally unique, and the namespace is shared by all AWS accounts. This means that after a bucket is created, the name of that bucket can’t be used by another AWS account in any AWS Region until the bucket is deleted.

  4. Choose Create stack.

You can monitor the progress of your stack on the Events tab or by refreshing your screen. If you encounter any error during stack creation (for example, because your S3 bucket name isn't unique), delete the stack and launch it again. When stack creation is complete, go to the SageMaker console. Your notebook should already be created, and its status should read InService.

You’re now ready to start training!

  5. On the SageMaker console, choose Notebook instances.
  6. Locate the rl-procgen-neurips instance and choose Open Jupyter or Open JupyterLab.
  7. Choose the notebook 1_train.ipynb.

You can use the cells following the training to run evaluations, do rollouts, and visualize your outcome.

Configuring algorithm parameters and the agent’s neural network model

To configure your RLlib algorithm parameters, go to your notebook folder and open source/train-sagemaker-distributed-{}.py. A subset of algorithm parameters is provided for PPO; for the full set of algorithm-specific parameters, see Proximal Policy Optimization (PPO). For the baselines provided in the starter kit, refer to the experiments{}.yaml files and copy any additional parameters to the RLlib configuration parameters in source/train-sagemaker-distributed-{}.py.

To check whether your model is using the correct parameters, go to the S3 bucket and navigate to the JSON file with the parameters. For example, {Amazon SageMaker training job} >output>intermediate>training>{PPO_procgen_env_wrapper_}>param.json.
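
If you prefer to inspect this file programmatically rather than through the console, a sketch like the following works; the bucket and key are placeholders for your job's actual output location, and the keys printed are examples of Ray configuration values.

import json
import boto3

s3 = boto3.client('s3')
# Placeholder bucket and key; substitute your training job's output prefix
obj = s3.get_object(Bucket='<your-bucket>',
                    Key='<training-job>/output/intermediate/training/<PPO_procgen_env_wrapper_...>/param.json')
params = json.loads(obj['Body'].read())
print(params.get('num_workers'), params.get('train_batch_size'))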

To add a custom model, create a file inside the models/ directory and name it models/my_vision_network.py.

For a working implementation of how to add a custom model, see the GitHub repo. You can set the custom_model field in the experiment .yaml file to my_vision_network to use that model.

Make sure that the model is registered. If you get an error that your model isn’t registered, go to train-sagemaker.py or train-sagemaker-distributed.py and edit def register_algorithms_and_preprocessors(self) by adding the following code:

ModelCatalog.register_custom_model("impala_cnn_tf", ImpalaCNN)

Distributed training with multiple instances

SageMaker supports distributed reinforcement learning in a single SageMaker ML instance with just a few lines of configuration by using the Ray RLlib library.

In homogeneous scaling, you use multiple instances with the same type (typically CPU instances) for a single SageMaker job. A single CPU core is reserved for the driver, and you can use the remaining as rollout workers, which generate experiences through environmental simulations. The number of available CPU cores increases with multiple instances. Homogeneous scaling is beneficial when experience collection is the bottleneck of the training workflow; for example, when your environment is computationally heavy.

With more rollout workers, neural network updates can often become the bottleneck. In this case, you could use heterogeneous scaling, in which you use different instance types together. A typical choice is to use GPU instances to perform network optimization and CPU instances to collect experiences for faster training at optimized costs. SageMaker allows you to achieve this by spinning up two jobs within the same Amazon VPC, and the communications between the instances are taken care of automatically.

To run distributed training with multiple instances, use 2_train-homo-distributed-cpu.ipynb / 3_train-homo-distributed-gpu.ipynb and train-hetero-distributed.ipynb for homogeneous and heterogeneous scaling, respectively. The configurable parameters for distributed training are stored in source/train-sagemaker-distributed.py. You don’t have to configure ray_num_cpus or ray_num_gpus.

Make sure you scale num_workers and train_batch_size to reflect the number of instances in the notebook. For example, if you set train_instance_count = 5 for a p3.2xlarge instance, the maximum number of workers is 39. See the following code:

"num_workers": 8*5 -1, # adjust based on total number of CPUs available in the cluster, e.g., p3.2xlarge has 8 CPUs and 1 CPU is reserved for resource allocation
  "num_gpus": 0.2, # adjust based on number of GPUs available in a single node, e.g., p3.2xlarge has 1 GPU
  "num_gpus_per_worker": 0.1, # adjust based on number of GPUs, e.g., p3.2x large (1 GPU - num_gpus) / num_workers = 0.1
  "rollout_fragment_length": 140,
  "train_batch_size": 64 * (8*5 -1),

To use a Spot Instance, you need to set the flag train_use_spot_instances = True in the final cell of train-homo-distributed.ipynb or train-hetero-distributed.ipynb. You can also use the MaxWaitTimeInSeconds parameter to control the total duration of your training job (actual training time plus waiting time).
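
For reference, the relevant estimator settings look something like the following sketch. Parameter names follow the SageMaker Python SDK v1 conventions used by the starter kit (SDK v2 renames them to use_spot_instances and max_wait), and the entry point, toolkit version, and time limits shown are assumptions.

from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

estimator = RLEstimator(entry_point='train-sagemaker-distributed.py',  # placeholder entry point
                        source_dir='source',
                        toolkit=RLToolkit.RAY,
                        toolkit_version='0.8.5',           # assumed Ray version
                        framework=RLFramework.TENSORFLOW,
                        role=role,
                        train_instance_type='ml.p3.2xlarge',
                        train_instance_count=1,
                        train_use_spot_instances=True,     # request Spot capacity
                        train_max_run=3600,                # cap on actual training time (seconds)
                        train_max_wait=7200)               # MaxWaitTimeInSeconds: training plus waiting
estimator.fit(wait=False)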

Summary

We compared our starter kit with three different GPU instances (ml.g4dn.4x, ml.p3.2x, and ml.p3.8x) using single and multiple instances. On all GPU instances, Spot Instance training provided a 70% cost reduction. This means that you spend less than $1 with the starter kit hyperparameters for the competition's benchmarking solution (8 million steps) using an ml.p3.2x instance.

Our starter kit allows you to run multiple instances of ml.p3.2x with 1 GPU versus a single instance of ml.p3.8x with 4 GPUs. We observed that running a single instance of ml.p3.8x with 4 GPUs is more cost-effective than running five instances of ml.p3.2x (=5 GPUs) due to communication overhead. The single instance training with ml.p3.8x converges in 20 minutes, helping you iterate faster to meet the competition deadline.

Finally, we observed that ml.g4dn.4x instances provide an additional 40% cost reduction on top of the 70% reduction from Spot Instances. However, training takes longer: 45 minutes with ml.p3.8x versus 70 minutes with ml.g4dn.4x.

To get started with this solution, sign in to your AWS account and choose the quick create to launch the CloudFormation stack for the Region you want to run your notebook.


About the Authors

Jonathan Chung is an Applied Scientist at AWS. He works on applying deep learning to various applications, including games and document analysis. He enjoys cooking and visiting historical cities around the world.

Anna Luo is an Applied Scientist at AWS. She works on utilizing reinforcement learning techniques for different domains, including supply chain and recommender systems. Her current personal goal is to master snowboarding.

Sahika Genc is a Principal Applied Scientist in the AWS AI team. Her current research focus is deep reinforcement learning (RL) for smart automation and robotics. Previously, she was a senior research scientist in the Artificial Intelligence and Learning Laboratory at the General Electric (GE) Global Research Center, where she led science teams on healthcare analytics for patient monitoring.

AWS Announces the global expansion of AWS CCI Solutions

We’re excited to announce the global availability of AWS Contact Center Intelligence (AWS CCI) solutions, powered by AWS AI Services and made available through the AWS Partner Network. AWS CCI solutions enable you to leverage AWS machine learning (ML) capabilities with your current contact center provider to gain greater efficiencies and deliver increasingly tailored customer experiences, with no ML expertise required.

AWS CCI solutions use a combination of AWS AI-powered services for text-to-speech, translation, intelligent search, conversational AI, transcription, and language comprehension capabilities. We’re delighted to announce the addition of AWS Technology Partners: Salesforce, Avaya, Talkdesk, 8×8, Clarabridge, Clevy, XappAI, and Voiceworx. We are also adding new AWS Consulting Partners: Inawisdom, Cation Consulting, HCL Technologies, Wipro, First Derivatives, Servion, and Lucy in the Cloud/Micropole for customers who require a custom solution or seek additional assistance with AWS CCI. These new partners provide customers across the globe more opportunities to benefit from AWS ML-powered contact center intelligence solutions to enhance self-service, analyze calls in real time to assist agents, and learn from all contact center interactions with post-call analytics.

Around the world, the volume of interactions in contact centers continues to increase. Companies see multiple opportunities to leverage AI technology to improve the customer experience. This can include 24/7 self-serve virtual agents that can provide timely and accurate answers to customer queries, call analytics and agent assist to improve agent productivity, or call analytics to generate further improvements in their operations. However, piecing together the various technologies to build an ML-driven intelligent contact center unique to the goals and needs of each business can be a significant undertaking. You want the benefits that intelligent contact center technologies bring, but the resources, time and cost to implement are often too high to overcome. AWS CCI provides a simple and fast route to deploy AWS ML solutions no matter which contact center provider you use.

AWS CCI customer success stories

Multiple customers already benefit from an improved customer experience and reduced operational costs as a result of using AWS CCI solutions through AWS Partners. Here are some examples of AWS CCI customer stories.

Maximus is a leading pure-play provider in the administration of government health and human services programs, and is the largest provider of contact center services to the government. Tom Romeo, the General Manager at Maximus Federal, says, “At Maximus, we are constantly looking for new ways to innovate and improve the Citizen Journey and contact center experience. With AWS Partner SuccessKPI, we were able to add AWS CCI into our Genesys Cloud environment in a matter of hours and deliver a 360-degree view of the citizen experience. This program allowed us to deliver increased capacity, automated quality review, and agent compliance and performance improvements for government agencies.”

Magellan Health is a large managed health care company focused on special populations, complete pharmacy benefits, and other specialty areas. Brian Lichtle, the Senior Director of Software Engineering at Magellan Rx, says, “We chose Amazon Kendra, a service within AWS CCI, to build a secure and scalable agent assist application. This helped call center agents, and the customers they serve, quickly uncover the information they need. Since implementing CCI and Amazon Kendra, early results show an average reduction in call times of about 9-15 seconds, which saves more than 4.4k hours on over 2.2 million calls per calendar year.”

Cation Consulting is an AWS Consulting Partner focused on delivering robust, conversational AI experiences to customers. Alan Kiernan, the co-founder and CTO at Cation Consulting, says, “At Cation Consulting, we provide customers with conversational AI and self-service experiences that allow them to significantly reduce customer support costs while improving the customer experience. AWS Contact Center Intelligence enables us to move quickly and scale seamlessly with customers such as Ryanair, the largest airline in Europe. The Ryanair chatbot has handled millions of customer enquiries per year as a trusted extension of Ryanair’s customer care team. We are excited to leverage Amazon Lex’s recent expansion into European languages and design virtual agents who can resolve customer issues quickly and improve customer service ratings.”

New AWS CCI language support and partner additions

In addition to our new partners, AWS CCI continues to expand its global capabilities with new language support. AWS CCI has three pre-configured solutions available through participating APN Partners, focused on the contact center workflow: Self-Service, Live Call Analytics and Agent Assist, and Post-Call Analytics.

The Self-Service solution uses ML-driven chatbots and Interactive Voice Response (IVR) to address and deflect the most common tasks and queries so that the contact center workforce can focus on resolving interactions that need a human touch. It utilizes the conversational interface of Amazon Lex and the text-to-speech voices of Amazon Polly to create a dynamic virtual agent in multiple languages, such as French, German, Italian, and Spanish. Adding Amazon Kendra can boost the ability of these virtual agents to answer questions by finding the best answers from internal knowledge bases.

The Live Call Analytics & Agent Assist and Post-Call Analytics solutions use Amazon Transcribe to perform real-time or post-call speech transcription with Amazon Comprehend to automatically analyze the interaction, detect call sentiment, and identify key words and phrases in the conversation using natural language processing (NLP) to increase agent productivity. These key words can then be utilized by the intelligent search capabilities of Amazon Kendra to help agents find timely and relevant information to resolve live call issues more quickly. Transcribing live calls is now available in German, Italian, Japanese, Korean, and Portuguese. Amazon Translate can also be used to translate calls into an agent’s preferred language, and supports a total of 71 languages and variants.

“At Amazon, we want to meet the customer wherever they are in their contact center journey. With AWS CCI, we wanted to make it easy for customers who use different contact centers providers to add AI and achieve new levels of operational efficiency.” says Vasi Philomin, GM of AWS Language Services, AI. “Having a global partner network is critical to enabling our customers to realize the benefits of cloud-based machine learning services and removing the need to hire specialized developers to build and maintain these systems.”

Talkdesk is a cloud contact center for innovative enterprises, combining enterprise performance with consumer simplicity resulting in higher customer satisfaction, productivity and cost savings. Tiago Paiva, chief executive officer at Talkdesk, shares, “Combining Talkdesk cloud innovations with powerful AI and machine learning services from AWS extends the capabilities and choices available to Talkdesk customers. We are excited to add new, out-of-the-box options through AWS Contact Center Intelligence solutions to help the Talkdesk user base rise above their market peers through superior customer service.”

8×8 is a leading contact center provider. Manu Mukerji, the Vice President of Engineering at 8×8, Inc., says, “By partnering with AWS, we can deliver to businesses and organizations superior bi-directional integration with AWS CCI, providing a best-in-class experience for customers. The 8×8 integration with AWS CCI makes it easy for customers to leverage AI capabilities even if they have no AI experience. The 8×8 Virtual Agent is the only fully managed and customizable solution in the market that works seamlessly for both unified communications and contact center use cases, enhancing contact center efficiency for reduced wait times and faster time to resolution.”

Pat Higbie, Co-founder and CEO of XAPP AI, an AWS Technology Partner, says, “Amazon Lex, Amazon Kendra and Amazon Polly provide a powerful combination of AI services that enables contact centers to transcend the limitations of traditional chatbots and IVR to transform their operations with truly conversational self-service that improves the customer experience and delivers dramatic ROI. And, AWS CCI solutions can be integrated with all contact center brands to bring the value of AWS AI services to any enterprise quickly.”

We are excited to have all these new partners join the AWS CCI program.

Getting started

There are multiple ways to get started with AWS CCI. To find a participating partner, see the AWS CCI partner page for more information and contact details.

To learn more, please join us for any or all of the following sessions hosted by AWS and our AWS CCI partners.

re:Invent sessions

Learn how you can leverage AWS CCI solutions to improve the customer experience and reduce cost with AI. Explore how AWS CCI solutions can be built easily through an expanding network of partners to provide self-service interactions, live and post-call analytics, and agent assist on existing contact center systems. AWS Partner SuccessKPI shares how it uses CCI solutions to improve the customer experience and tackle challenging business problems such as reducing call volume, improving agent effectiveness, and automating quality management in enterprise contact centers for customers like Maximus.

Numerous stakeholders including content designers, developers, and business owners collaborate to create a bot. In this session, hear how Dropbox used the Amazon Lex interface to build a chatbot as a support offering. The session covers how the Amazon Lex console allows you to easily create flows and manage them, and it details the decoupling that should exist between the bot author and developer for an optimal collaborative model. Finally, this session provides insights into conversational interface (CI) and conversational design (CD), language localization, and deployment practices.

Answering customer questions is essential to the customer support experience. Powered by ML, Amazon Kendra is an enterprise search service that can add Q&A capabilities to your virtual agents or boost call center agent productivity with live call agent assistance. In this session, you hear how Magellan RX Management augmented the call center experience using Amazon Kendra to help agents find accurate information faster.

In this session, learn how to train custom language models in Amazon Transcribe that supercharge speech recognition accuracy. Octopus Energy, a UK-based utility company, shares how it leverages domain-specific data to train a custom language model that is fine-tuned for its business needs and specific use case.

Partner sessions

  • How to boost the return on your contact center investments with AI
    January 26 at 10:00 am PST – REGISTER HERE
    Presented by Acqueon and AWS

With AI technologies maturing, enterprises are embracing them to delight customers and improve the operational productivity of their contact centers. In this educational webinar, AI expert Chris Featherstone, Global Business Development Leader for AWS CCI and industry veteran Nicolas de Kouchkovsky, CMO at Acqueon, discuss how to integrate AI into your contact center software stack. They will provide an update on industry adoption and share the art of the possible without having to overhaul your technology investments.

  • Gain Control of your CX with a 360 CCI Power View: A step by step guide
    January 27, 2021 at 1PM EST/10AM PST – REGISTER HERE
    Presented by SuccessKPI and AWS

Managing customer experience requires tackling a complex set of metrics across agents, queues, geographies, customer types, and channels. Mix in the data from speech analytics, chatbots, and post call surveys, and the picture gets blurry very quickly. In this informative webinar, we explore the factors that make customer experience management such a quagmire and provide a series of recommendations and steps to help put you in control of your customer experience.

  • Add Intelligence to your existing contact center with AWS Contact Center Intelligence and Talkdesk
    February 24, 2021 at 9am BRT, 9am MXT, and 9am PST – REGISTER HERE
    Presented by Talkdesk and AWS at AWS Innovate – AI/ML Edition

Learn how your organization can leverage AWS Contact Center Intelligence (CCI) solutions and AWS Partner, Talkdesk, to improve customer experience and reduce cost with AI. We will explore how AWS CCI solutions can be built easily to provide self-service interactions, live and post-call analytics and agent assist on existing contact center systems. Talkdesk will also share how they improve customer experience and tackle challenging business problems such as improving agent effectiveness, and automating quality management in enterprise contact centers.


About the Author

Eron Kelly is the worldwide leader of Product Marketing for a broad portfolio of AWS services that cover Compute, Storage, Networking, Contact Centers, End User Computing, and Business Applications. In this capacity, his team leads all aspects of product marketing, including messaging, positioning, launches, web strategy and execution, service adoption, and field enablement. Prior to AWS, he led sales and marketing teams at Microsoft and Procter & Gamble, and was a Captain in the Air Force. Outside of work, Mr. Kelly is very active raising a family of four kids. He is a member of the Board of Trustees at Eastside Catholic School in Sammamish, WA, and spent the last 10 years coaching youth lacrosse.

Esther Lee is a Product Manager for AWS Language AI Services. She is passionate about the intersection of technology and education. Out of the office, Esther enjoys long walks along the beach, dinners with friends and friendly rounds of Mahjong.

Hosting a private PyPI server for Amazon SageMaker Studio notebooks in a VPC

Amazon SageMaker Studio notebooks provide a full-featured integrated development environment (IDE) for flexible machine learning (ML) experimentation and development. Built-in security measures keep this environment secure while supporting versatile, collaborative development. In some cases, such as to protect sensitive data or meet regulatory requirements, security protocols require that public internet access be disabled in the development environment.

Typically, developers have access to the public internet and can install any new libraries they want to import. You can install Python packages from the public Python Package Index (PyPI), a Python software repository, using standard tools such as pip. You can find hundreds of thousands of packages there, including common packages such as NumPy, Pandas, Matplotlib, Pytest, Requests, Django, and BeautifulSoup.

In a development environment with internet access disabled, you can instead mirror packages and host your own PyPI server hosted in your own Amazon Virtual Private Cloud (Amazon VPC). A VPC is a logically isolated virtual network into which you can launch resources, such as Amazon Elastic Compute Cloud (Amazon EC2) instances and SageMaker Studio domains. You have fine-grained access control over its network connectivity. You can specify an IP address range for the VPC and associate security groups to control its inbound and outbound traffic. You can also add subnets that use a subset of IP addresses within the VPC, and choose whether each subnet is open to the public internet or is private.

When you use a local PyPI server with this architecture and install Python libraries from your SageMaker Studio notebook, you connect to your private server instead of a public package index, and all traffic remains within a single secured VPC and private subnet.

SageMaker Studio recently launched VPC integration to meet these security needs. You can now launch Studio notebooks within a private VPC, disabling internet access. To install Python packages within this secure environment, you can configure an EC2 instance in your VPC that acts as a PyPI server for your notebooks. This enables you to maintain productivity and ease of package installation while working within a private environment that isn’t accessible from the public internet.

Solution overview

This solution creates a private PyPI server on an EC2 instance, and connects it to a SageMaker Studio notebook through network configuration including a VPC, private subnet, security group, and elastic network interface. The following diagram illustrates this architecture.

You complete the following steps to implement this solution:

  1. Launch an EC2 instance within a VPC, subnet, and security group.
  2. Configure the instance to function as a private PyPI server.
  3. Create a VPC endpoint and add security group rules.
  4. Create a VPC-only SageMaker Studio domain, user, and notebook with the necessary permissions and networking.
  5. Install a Python package from the PyPI server onto the SageMaker Studio notebook.

Prerequisites

This is an intermediate-level solution with the following prerequisites:

  • An AWS account
  • Sufficient level of access to create Amazon SageMaker, Amazon EC2, and Amazon VPC resources
  • Familiarity with creating and modifying AWS resources on the AWS Management Console
  • Basic command-line experience, such as SSHing onto an EC2 instance, installing packages, and editing files using vim or another command-line text editor

Launching an EC2 instance

For this post, we launch a new EC2 instance in the us-east-2 Region. For the full list of available Regions supporting SageMaker Studio, see Supported Regions and Quotas.

  1. On the Amazon EC2 console, launch a new instance in a Region supporting SageMaker Studio.
  2. Choose an Amazon Linux 2 AMI.
  3. Choose a t2.medium instance (or larger t2, if preferred).
  4. On the Step 3: Configure Instance Details page, for Network, choose your VPC.
  5. For Subnet, choose your subnet.

You can use the default VPC and subnet, use other existing resources, or create new ones. Make sure to note the VPC and subnet you select for later reference.

  6. Leave all other settings as-is.
  7. Use default storage and tag settings.
  8. On the Step 6: Configure Security Group page, for Assign a security group, select Create a new security group.
  9. For Security group name, enter studio-SG.
  10. For Type, choose SSH on port range 22.
  11. For Source, choose My IP.

This allows you to SSH onto the instance from your current internet network.

  12. Create a new key pair, studio-host.
  13. Launch the instance.

For more information about launching an instance, see Tutorial: Getting started with Amazon EC2 Linux instances.
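
If you prefer to script the launch instead of using the console, a boto3 sketch follows. The AMI, subnet, and security group IDs are placeholders, and this scripted flow assumes the studio-SG security group and studio-host key pair already exist.

import boto3

ec2 = boto3.client('ec2', region_name='us-east-2')
# Placeholder IDs; replace with an Amazon Linux 2 AMI for your Region
# and your own subnet and security group
response = ec2.run_instances(ImageId='ami-xxxxxxxx',
                             InstanceType='t2.medium',
                             KeyName='studio-host',
                             MinCount=1,
                             MaxCount=1,
                             SubnetId='subnet-xxxxxxxx',
                             SecurityGroupIds=['sg-xxxxxxxx'])
print(response['Instances'][0]['InstanceId'])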

Configuring the instance as a PyPI server

To configure your instance, complete the following steps:

  1. Open a terminal window and navigate to the directory containing your .pem file.
  2. Change the key permissions and SSH onto your instance, substituting in the public IP address and Region:
    chmod 400 studio-host.pem
    ssh -i "studio-host.pem" ec2-user@ec2-x-x-x-x.{region}.compute.amazonaws.com

If needed, you can find the SSH command by selecting your instance on the console, choosing Connect, and navigating to the SSH Client tab.

  3. Install pip, which you use to install Python packages, and bandersnatch, which you use to mirror packages from the public PyPI server onto your instance. For this post, we use the package AWS Data Wrangler, an AWS Professional Services open-source library that integrates Pandas DataFrames with AWS services:
    sudo yum install python3-pip
    sudo pip3 install multidict==4.7.6
    sudo pip3 install yarl==1.6.0
    sudo pip3 install bandersnatch

You now configure bandersnatch to specify packages and their versions to mirror.

  4. Open a config file:
    sudo vim /etc/bandersnatch.conf

  5. Enter the following file contents:
    [mirror]
    directory = /pypi
    master = https://pypi.org
    timeout = 10
    workers = 3
    hash-index = false
    stop-on-error = false
    json = false
    
    [plugins]
    enabled =
        whitelist_project
        allowlist_release
    
    [whitelist]
    packages =
        awswrangler==1.10.0
        pyarrow==2.0.0
        SQLAlchemy==1.3.10
        s3fs==0.4.2
        numpy==1.18.4
        sqlalchemy-redshift==0.7.9
        boto3==1.15.10
        pandas==1.1.0
        psycopg2-binary==2.8.0
        pymysql==0.9.3
        botocore==1.18.10
        fsspec==0.7.4
        s3transfer==0.3.2
        jmespath==0.9.4
        pytz==2019.3
        python-dateutil==2.8.1
        urllib3==1.25.8
        six==1.14.0
    

  6. Mirror the libraries and list the directory contents to verify that the libraries have been copied onto the instance:
    sudo /usr/local/bin/bandersnatch mirror
    ls /pypi/web/simple/

You must configure pip so that when you run pip to install packages, it searches your private PyPI server instead of the public index. The pip config file already exists; you add two more lines to it.

  7. Open the file:
    sudo vim /etc/pip.conf

  8. Ensure your pip config file reads as follows, adding the last two lines:
    [global] 
    disable_pip_version_check = 1 
    format = columns 
    index-url = http://localhost/simple 
    trusted-host = localhost

  9. Install and configure nginx so that the instance can function as a private web server:
    sudo amazon-linux-extras install nginx1
    sudo vim /etc/nginx/nginx.conf

  10. Update the server section of the nginx config file to change the server_name to localhost, listen on the private IP address, and add the root and index locations. The server section of the nginx config file should read as follows:
    server {
            listen x.x.x.x:80;
            listen       80;
            listen       [::]:80;
            server_name localhost;
            root         /usr/share/nginx/html;
    
            # Load configuration files for the default server block.
            include /etc/nginx/default.d/*.conf;
    
            location / { root /pypi/web/; index index.html index.htm index.php; }
    
            error_page 404 /404.html;
                location = /40x.html {
            }
    
            error_page 500 502 503 504 /50x.html;
                location = /50x.html {
            }
        }
    

  11. Start the server and install the package locally to test it out:
    sudo service nginx start
    pip3 install --user awswrangler

Note that the packages are collected from localhost, not the public package index.

You now have a private PyPI server ready for use.

Creating a VPC endpoint

VPC endpoints allow resources within a VPC to access AWS services. For this solution, you create an endpoint for the SageMaker API. You can extend this solution by adding more endpoints for other services you need to access from your notebook.

There are two types of VPC endpoints:

  • Interface endpoints – Elastic network interfaces within a subnet that serve as entry points for traffic destined to a supported AWS service, such as SageMaker
  • Gateway endpoints – Only supported for Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB
  1. On the Amazon VPC console, choose Endpoints.
  2. Choose Create Endpoint.
  3. Create the SageMaker API endpoint com.amazonaws.{region}.sagemaker.api.
  4. Make sure you choose the same VPC, subnet, and security group used by your EC2 instance.
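
Equivalently, you can create the interface endpoint with boto3, as in the following sketch; the VPC, subnet, and security group IDs are placeholders for the values used by your EC2 instance.

import boto3

ec2 = boto3.client('ec2', region_name='us-east-2')
# Placeholder IDs; use the same VPC, subnet, and security group as the instance
response = ec2.create_vpc_endpoint(VpcEndpointType='Interface',
                                   VpcId='vpc-xxxxxxxx',
                                   ServiceName='com.amazonaws.us-east-2.sagemaker.api',
                                   SubnetIds=['subnet-xxxxxxxx'],
                                   SecurityGroupIds=['sg-xxxxxxxx'],
                                   PrivateDnsEnabled=True)
print(response['VpcEndpoint']['VpcEndpointId'])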

When finished, your endpoint is listed as shown in the following screenshot.

For more information about VPC endpoints, including the distinction between interface endpoints and gateway endpoints, see VPC endpoints.

Editing your security group rules

Edit your security group to add an inbound rule allowing all traffic from within the security group. This allows the Studio notebook to communicate with the EC2 instance because they both reside within this security group.

You can search for the security group name on the Amazon EC2 console, and you receive a suggested ID.

After you add the rule, the security group has two inbound rules: one allowing SSH on port 22 from your IP to connect to the EC2 instance, and another allowing all traffic from within the security group.
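
The same rule can be added programmatically. The following boto3 sketch uses a placeholder group ID; a rule whose source is the group itself allows all traffic between members of that group.

import boto3

ec2 = boto3.client('ec2', region_name='us-east-2')
# Placeholder group ID for studio-SG
ec2.authorize_security_group_ingress(
    GroupId='sg-xxxxxxxx',
    IpPermissions=[{
        'IpProtocol': '-1',  # all protocols and port ranges
        'UserIdGroupPairs': [{'GroupId': 'sg-xxxxxxxx'}],  # source: the group itself
    }])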

For more information about security groups, see Security groups for your VPC.

Creating VPC-only SageMaker Studio resources

All SageMaker Studio resources reside within a domain, with a maximum of one domain per Region in an AWS account. A domain contains one or more users, and as a user you can open a Studio notebook. For more information about creating a domain, see CreateDomain.

With the recent release of VPC support for Studio, you can choose from two networking options: public internet only and VPC only. For more information, see Connect SageMaker Studio Notebooks to Resources in a VPC and Securing Amazon SageMaker Studio connectivity using a private VPC. For this post, we create a VPC-only domain.
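
Although we use the console in the steps that follow, a VPC-only domain can also be created programmatically; a minimal CLI sketch (the domain name, IDs, and role ARN are placeholders) looks like this:

# Create a Studio domain in VPC-only mode with IAM authentication
aws sagemaker create-domain \
    --domain-name vpc-only-domain \
    --auth-mode IAM \
    --app-network-access-type VpcOnly \
    --vpc-id vpc-0123456789abcdef0 \
    --subnet-ids subnet-0123456789abcdef0 \
    --default-user-settings ExecutionRole=arn:aws:iam::111122223333:role/AmazonSageMaker-ExecutionRole-Example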

  1. On the SageMaker Studio console, select Standard setup.

This allows for detailed configuration.

  2. For Authentication method, select AWS Identity and Access Management (IAM).
  3. Under Permissions, choose Create a new role.
  4. Use the default settings.
  5. Choose Create role.

This creates a new SageMaker execution role.

  6. In the Network and Storage section, configure your VPC and subnet to match those of the EC2 instance.
  7. For Network Access for Studio, select VPC Only.
  8. For Security group(s), choose the same security group as used for the EC2 instance.
  9. Choose Submit.

Wait approximately a minute to see the banner notification that SageMaker Studio is ready.

You now create a Studio user within the domain.

  1. Choose Add user.
  2. Give the user a name (for example, studio-user).
  3. Choose the role you just created, AmazonSageMaker-ExecutionRole-<timestamp when the role was created>.
  4. Choose Submit.
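
If you created the domain from the CLI, the user profile can be added the same way (a sketch; the domain ID and role ARN are placeholders):

# Create the studio-user profile in the domain
aws sagemaker create-user-profile \
    --domain-id d-0123456789ab \
    --user-profile-name studio-user \
    --user-settings ExecutionRole=arn:aws:iam::111122223333:role/AmazonSageMaker-ExecutionRole-Example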

This concludes the initial SageMaker Studio resource creation. You now have a Studio domain and user ready for use and can proceed with creating and using a notebook.

Installing a Python package onto the SageMaker Studio notebook

To start using the PyPI server from the SageMaker Studio notebook, complete the following steps:

  1. On the SageMaker Studio Control Panel, choose Open Studio next to the user name.
  2. Wait for your Studio environment to load.

You can now see the Studio UI. For more information, see the Amazon SageMaker Studio UI Overview.

  3. Use the default SageMaker JumpStart Data Science image and create a new Python 3 notebook.
  4. Wait a few minutes for the image to launch and your notebook to be available.

If you try to run a command before the notebook is available, you get the message: Note: The kernel is still starting. Please execute this cell again after the kernel is started. After your image has launched, you see it listed under Kernel Sessions, along with items for Running Instances and Running Apps. The kernel runs within the app, and the app runs on the instance.

Now you’re ready to configure your notebook. The first step is pip configuration, so that when you install a package using pip, your notebook searches for the package on the private PyPI server instead of through the public internet at pypi.org.

  5. Run the following command in a notebook cell, substituting your EC2 instance’s private IP address:
    !printf '[global]\nindex-url = http://x.x.x.x/simple\ntrusted-host = x.x.x.x' | sudo tee /etc/pip.conf

  6. To check that the file was successfully written, run the following command:
    !head /etc/pip.conf

Now you’re ready to install Python packages from your server.

  7. To see that AWS Data Wrangler isn’t installed by default, try to import it; the import fails with a ModuleNotFoundError:
    import awswrangler

  8. Install the package and append its install location to your Python path:
    !pip install awswrangler
    import sys
    sys.path.append('/home/sagemaker-user/.local/lib/python3.7/site-packages')

The library was installed from your private server’s index, as you specified in the pip config file, http://{EC2-IP}/simple.

  9. Now that the package has been installed, you can import the package smoothly:
    import awswrangler

Now your notebook is ready for development, including installation of the Python libraries of your choice! Moreover, your PyPI server remains operational and available even when you delete your notebooks or use multiple notebooks. Your PyPI server is separated from your development environment, giving you freedom to manage your notebook resources in the way that best suits your needs.

Cleaning up

To clean up your resources, complete the following steps:

  1. Shut down the running instance in the SageMaker Studio notebook.
  2. Delete any remaining apps for the user on the SageMaker Studio console, including the default app.
  3. Delete the SageMaker Studio user.
  4. Delete Studio in the SageMaker Studio Control Panel.
  5. Stop the EC2 instance.
  6. Terminate the EC2 instance.
  7. Delete the IAM role, VPC endpoint, studio-SG security group, and Amazon Elastic File System (Amazon EFS) file system.
  8. Delete the rules in the inbound and outbound NFS security groups.
  9. Delete the security groups.

Conclusion

This post demonstrated how to get started with SageMaker Studio in VPC-only mode, while retaining the ability to install Python packages by hosting a private PyPI server. Now you can move forward with your ML development in notebooks residing within this secure environment.

We invite you to explore other exciting applications of SageMaker Studio, including Amazon SageMaker Experiments and scheduling notebooks on SageMaker ephemeral instances.


About the Author

Julia Kroll is a Data & Machine Learning Engineer for AWS Professional Services. She works with enterprise and public sector customers to build data lake, analytics, and machine learning solutions.

Artificial intelligence and machine learning continues at AWS re:Invent

A fresh new year is here, and we wish you all a wonderful 2021. We signed off last year at AWS re:Invent on the artificial intelligence (AI) and machine learning (ML) track with the first-ever machine learning keynote and over 50 AI/ML-focused technical sessions covering industries, use cases, applications, and more. You can access all the content for the AI/ML track on the AWS re:Invent website. But the exciting news is we’re not done yet. We’re kicking off 2021 by bringing you even more content for AI and ML through a set of new sessions that you can stream live starting January 12, 2021. Each session will be offered multiple times, so you can find the time that works best for your location and schedule.

And of course, AWS re:Invent is free. Register now if you have not already and build your schedule from the complete agenda.

Here are a few sample sessions that will stream live starting next week.

Customers using AI/ML solutions from AWS

A day in the life of a machine learning data scientist at J P Morgan Chase (AIM319)

Thursday, January 14 – 8 AM to 8:30 AM PST

Thursday, January 14 – 4 PM to 4:30 PM PST

Friday, January 15 – 12 AM to 12:30 AM PST

Learn how data scientists at J P Morgan Chase use custom ML solutions built on top of Amazon SageMaker to gather intelligent insights, while adhering to secure control policies and regulatory requirements.

Streamlining media content with PBS (AIM318)

Wednesday, January 13 – 3 PM to 3:30 PM PST

Wednesday, January 13 – 11 PM to 11:30 PM PST

Thursday, January 14 – 7 AM to 7:30 AM PST

Enhancing the viewer experience by streamlining operational tasks to review, search, and analyze image and video content is a critical factor for the media and entertainment industry. Learn how PBS uses Amazon Rekognition to build relevant features such as deep content search, brand safety, and automated ad insertion to get more out of their content.

Fraud detection with AWS and Coinbase (AIM320)

Thursday, January 14 – 10:15 AM to 10:45 AM PST

Thursday, January 14 – 6:15 PM to 6:45 PM PST

Friday, January 15 – 2:15 AM to 2:45 AM PST

Among many use cases, ML helps mitigate a universally expensive problem: fraud. Join AWS and Coinbase to learn how to detect fraud faster using sample datasets and architectures, and help save millions of dollars for your organization.

Autonomous vehicle solutions with Lyft (AIM315)

Wednesday, January 13 – 2 PM to 2:30 PM PST

Wednesday, January 13 – 10 PM to 10:30 PM PST

Thursday, January 14 – 6 AM to 6:30 AM PST

In this session, we discuss how computer vision models are labeled and trained at Lyft using Amazon SageMaker Ground Truth for visual perception tasks that are critical for autonomous driving systems.

Modernize your contact center with AWS Contact Center Intelligence (CCI) (AIM214)

Tuesday, January 12 – 1:15 PM to 1:45 PM PST

Tuesday, January 12 – 9:15 PM to 9:45 PM PST

Wednesday, January 13 – 5:15 AM to 5:45 AM PST

Improve the customer experience with reduced costs using AWS Contact Center Intelligence (CCI) solutions. You will hear from SuccessKPI, an AWS partner, on how they use CCI solutions to solve business problems such as improving agent effectiveness and automating quality management in enterprise contact centers.

Machine learning concepts with AWS

Consistent and portable environments with containers (AIM317)

Wednesday, January 13 – 8:45 AM to 9:15 AM PST

Wednesday, January 13 – 4:45 PM to 5:15 PM PST
Thursday, January 14 – 12:45 AM to 1:15 AM PST

Learn how to build consistent and portable ML environments using containers with AWS services such as Amazon SageMaker and Amazon Elastic Kubernetes Service (Amazon EKS) across multiple deployment clusters. This session will help you build these environments with ease and at scale in the midst of the ever-growing list of open-source frameworks and tools.

Achieve real-time inference at scale with Deep Java Library (AIM410)

Thursday, January 14 – 3:30 PM to 4 PM PST

Thursday, January 14 – 11:30 PM to 12 AM PST

Friday, January 15 – 7:30 AM to 8 AM PST

Deep Java Library (DJL) from AWS helps you build ML applications without needing to learn a new language. Learn how to use DJL and deploy models including BERT in the DJL model zoo to achieve real-time inference at scale.

Don’t miss out on all the action. We look forward to seeing you on the artificial intelligence and machine learning track. Please see the re:Invent agenda for more details and to build your schedule.


About the Author

Shyam Srinivasan is on the AWS Machine Learning marketing team. He cares about making the world a better place through technology and loves being part of this journey. In his spare time, Shyam likes to run long distances, travel around the world, and experience new cultures with family and friends.
