MUSIQ: Assessing Image Aesthetic and Technical Quality with Multi-scale Transformers

Understanding the aesthetic and technical quality of images is important for providing a better user visual experience. Image quality assessment (IQA) uses models to build a bridge between an image and a user’s subjective perception of its quality. In the deep learning era, many IQA approaches, such as NIMA, have achieved success by leveraging the power of convolutional neural networks (CNNs). However, CNN-based IQA models are often constrained by the fixed-size input requirement in batch training, i.e., the input images need to be resized or cropped to a fixed shape. This preprocessing is problematic for IQA because images can have very different aspect ratios and resolutions. Resizing and cropping can impact image composition or introduce distortions, thus changing the quality of the image.

In CNN-based models, images need to be resized or cropped to a fixed shape for batch training. However, such preprocessing can alter the image aspect ratio and composition, thus impacting image quality. Original image used under CC BY 2.0 license.

In “MUSIQ: Multi-scale Image Quality Transformer”, published at ICCV 2021, we propose a patch-based multi-scale image quality transformer (MUSIQ) to bypass the CNN constraints on fixed input size and predict the image quality effectively on native-resolution images. The MUSIQ model supports the processing of full-size image inputs with varying aspect ratios and resolutions and allows multi-scale feature extraction to capture image quality at different granularities. To support positional encoding in the multi-scale representation, we propose a novel hash-based 2D spatial embedding combined with an embedding that captures the image scaling. We apply MUSIQ on four large-scale IQA datasets, demonstrating consistent state-of-the-art results across three technical quality datasets (PaQ-2-PiQ, KonIQ-10k, and SPAQ) and comparable performance to that of state-of-the-art models on the aesthetic quality dataset AVA.

The patch-based MUSIQ model can process the full-size image and extract multi-scale features, which better aligns with a person’s typical visual response.

In the following figure, we show a sample of images, their MUSIQ scores, and, in brackets, their mean opinion scores (MOS) from multiple human raters. Scores range from 0 to 100, with 100 being the highest perceived quality. As the figure shows, MUSIQ predicts high scores for images with high aesthetic and technical quality, and low scores for images that are not aesthetically pleasing (low aesthetic quality) or that contain visible distortions (low technical quality).

High quality: 76.10 [74.36]   69.29 [70.92]
Low aesthetic quality: 55.37 [53.18]   32.50 [35.47]
Low technical quality: 14.93 [14.38]   15.24 [11.86]
Predicted MUSIQ score (and ground truth) on images from the KonIQ-10k dataset. Top: MUSIQ predicts high scores for high quality images. Middle: MUSIQ predicts low scores for images with low aesthetic quality, such as images with poor composition or lighting. Bottom: MUSIQ predicts low scores for images with low technical quality, such as images with visible distortion artifacts (e.g., blurry, noisy).

The Multi-scale Image Quality Transformer

MUSIQ tackles the challenge of learning IQA on full-size images. Unlike CNN-based models, which are often constrained to a fixed resolution, MUSIQ can handle inputs with arbitrary aspect ratios and resolutions.

To accomplish this, we first make a multi-scale representation of the input image, containing the native resolution image and its resized variants. To preserve the image composition, we maintain its aspect ratio during resizing. After obtaining the pyramid of images, we then partition the images at different scales into fixed-size patches that are fed into the model.

Illustration of the multi-scale image representation in MUSIQ.
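
To make the idea concrete, here is a minimal Python sketch of how such an aspect-ratio-preserving multi-scale representation and its fixed-size patches could be constructed; the target sizes and patch size are illustrative choices, not the values used in the paper.

import numpy as np
from PIL import Image

def multiscale_patches(path, longer_sides=(384, 224), patch=32):
    # Build a multi-scale representation: the native-resolution image plus
    # aspect-ratio-preserving resizes, then split every scale into fixed-size patches.
    img = Image.open(path).convert("RGB")
    scales = [img]  # native resolution first
    for target in longer_sides:  # illustrative target sizes
        w, h = img.size
        ratio = target / max(w, h)  # keep the aspect ratio to preserve composition
        scales.append(img.resize((max(1, round(w * ratio)), max(1, round(h * ratio)))))
    patches = []
    for scale_idx, im in enumerate(scales):
        arr = np.asarray(im)
        h, w = arr.shape[:2]
        for i in range(0, h - patch + 1, patch):
            for j in range(0, w - patch + 1, patch):
                patches.append((scale_idx, i // patch, j // patch, arr[i:i + patch, j:j + patch]))
    return patches  # each entry: (scale index, row, column, pixel patch)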

Since patches are from images of varying resolutions, we need to effectively encode the multi-aspect-ratio multi-scale input into a sequence of tokens, capturing pixel, spatial, and scale information. To achieve this, we design three encoding components in MUSIQ, including: 1) a patch encoding module to encode patches extracted from the multi-scale representation; 2) a novel hash-based spatial embedding module to encode the 2D spatial position for each patch; and 3) a learnable scale embedding to encode different scales. In this way, we can effectively encode the multi-scale input as a sequence of tokens, serving as the input to the Transformer encoder.
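
As a rough illustration of the hash-based spatial embedding idea (not the exact formulation from the paper), the sketch below maps each patch's grid coordinates, whatever the source resolution, into a fixed G × G table of embeddings, so that patches at similar relative positions share nearby entries; the grid size, embedding dimension, and rounding scheme here are assumptions.

import numpy as np

G, D = 10, 64                              # hash grid size and embedding dimension (illustrative)
spatial_table = np.random.randn(G, G, D)   # stands in for a learnable embedding table

def spatial_embedding(row, col, rows_total, cols_total):
    # Hash a patch position from an arbitrary-sized grid into the fixed G x G table.
    ti = min(G - 1, int(row / rows_total * G))
    tj = min(G - 1, int(col / cols_total * G))
    return spatial_table[ti, tj]

# Patches at similar relative positions map to nearby entries regardless of resolution.
emb = spatial_embedding(row=3, col=7, rows_total=12, cols_total=20)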

To predict the final image quality score, we use the standard approach of prepending an additional learnable “classification token” (CLS). The CLS token state at the output of the Transformer encoder serves as the final image representation. We then add a fully connected layer on top to predict the image quality score. The figure below provides an overview of the MUSIQ model.

Overview of MUSIQ. The multi-scale multi-resolution input will be encoded by three components: the scale embedding (SCE), the hash-based 2D spatial embedding (HSE), and the multi-scale patch embedding (MPE).

Since MUSIQ only changes the input encoding, it is compatible with any Transformer variants. To demonstrate the effectiveness of the proposed method, in our experiments we use the classic Transformer with a relatively lightweight setting so that the model size is comparable to ResNet-50.

Benchmark and Evaluation

To evaluate MUSIQ, we run experiments on multiple large-scale IQA datasets. On each dataset, we report the Spearman’s rank correlation coefficient (SRCC) and Pearson linear correlation coefficient (PLCC) between our model’s predictions and the human evaluators’ mean opinion scores. SRCC and PLCC are correlation metrics ranging from -1 to 1. Higher PLCC and SRCC mean better alignment between model predictions and human evaluation. The graph below shows that MUSIQ outperforms other methods on PaQ-2-PiQ, KonIQ-10k, and SPAQ.

Performance comparison of MUSIQ and previous state-of-the-art (SOTA) methods on four large-scale IQA datasets. On each dataset we compare the Spearman’s rank correlation coefficient (SRCC) and Pearson linear correlation coefficient (PLCC) of model prediction and ground truth.
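
For reference, both correlation metrics can be computed directly with SciPy; the arrays below are placeholder predictions and mean opinion scores, shown only to illustrate the calculation.

import numpy as np
from scipy.stats import spearmanr, pearsonr

predictions = np.array([76.1, 69.3, 55.4, 32.5, 14.9, 15.2])  # placeholder model scores
mos = np.array([74.4, 70.9, 53.2, 35.5, 14.4, 11.9])          # placeholder human mean opinion scores

srcc, _ = spearmanr(predictions, mos)  # rank correlation
plcc, _ = pearsonr(predictions, mos)   # linear correlation
print(f"SRCC={srcc:.3f}, PLCC={plcc:.3f}")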

Notably, the PaQ-2-PiQ test set is entirely composed of large pictures having at least one dimension exceeding 640 pixels. This is very challenging for traditional deep learning approaches, which require resizing. MUSIQ can outperform previous methods by a large margin on the full-size test set, which verifies its robustness and effectiveness.

It is also worth mentioning that previous CNN-based methods often required sampling as many as 20 crops for each image during testing. This kind of multi-crop ensemble is a way to mitigate the fixed shape constraint in the CNN models. But since each crop is only a sub-view of the whole image, the ensemble is still an approximation. Moreover, multi-crop ensembles add inference cost for every crop and, because the crops are sampled randomly, can introduce randomness into the result. In contrast, because MUSIQ takes the full-size image as input, it can directly learn the best aggregation of information across the full image and only needs to run inference once.

To further verify that the MUSIQ model captures different information at different scales, we visualize the attention weights on each image at different scales.

Attention visualization from the output tokens to the multi-scale representation, including the original resolution image and two proportionally resized images. Brighter areas indicate higher attention, which means that those areas are more important for the model output. Images for illustration are taken from the AVA dataset.

We observe that MUSIQ tends to focus on more detailed areas in the full, high-resolution images and on more global areas on the resized ones. For example, for the flower photo above, the model’s attention on the original image focuses on the petal details, and the attention shifts to the buds at lower resolutions. This shows that the model learns to capture image quality at different granularities.

Conclusion

We propose a multi-scale image quality transformer (MUSIQ), which can handle full-size image input with varying resolutions and aspect ratios. By transforming the input image to a multi-scale representation with both global and local views, the model can capture the image quality at different granularities. Although MUSIQ is designed for IQA, it can be applied to other scenarios where task labels are sensitive to image resolution and aspect ratio. The MUSIQ model and checkpoints are available at our GitHub repository.

Acknowledgements

This work is made possible through a collaboration spanning several teams across Google. We’d like to acknowledge contributions from Qifei Wang, Yilin Wang and Peyman Milanfar.

Read More

Get in Touch With New Mobile Gaming Controls on GeForce NOW

GeForce NOW expands touch control support to 13 more games this GFN Thursday. That means it’s easier than ever to take PC gaming on the go using mobile devices and tablets. The new “Mobile Touch Controls” row in the GeForce NOW app is the easiest way for members to find which games put the action right at their fingertips.

For a new way to play, members can soon experience these enhanced mobile games and more streaming on the newly announced Razer Edge 5G handheld gaming device.

And since GFN Thursday means more games every week, get ready for eight new titles in the GeForce NOW library, including A Plague Tale: Requiem.

Plus, the latest GeForce NOW Android app update is rolling out now, adding Adaptive VSync support in select games to reduce frame stuttering and screen tearing.

Victory at Your Fingertips

Gamers on the go, rejoice! Enhanced mobile touch controls are now available for more than a dozen additional GeForce NOW games when playing on mobile devices and tablets.

Mobile Touch Row GeForce NOW
Make your gaming mobile with the new row of touch-control titles on the cloud.

These games join Fortnite and Genshin Impact as touch-enabled titles in the GeForce NOW library, removing the need to bring a controller when away from your battlestation.

Here’s the full list of games with touch-control support streaming on GeForce NOW on mobile devices and tablets:

Mobile and Tablet

Tablet Only

To get right into gaming, use the new “Mobile Touch Controls” row in the GeForce NOW app to find your next adventure.

The Razer Edge of Glory

Announced last week at RazerCon, the new Razer Edge 5G handheld device launches in January 2023 with the GeForce NOW app installed right out of the box.

Razer Edge GeForce NOW
Stunning visuals, console-quality control and 1,400+ games through GeForce NOW.

The Razer Edge 5G is a dedicated 5G console, featuring a 6.8-inch AMOLED touchscreen display that pushes up to a 144Hz refresh rate at 1080p — perfect for GeForce NOW RTX 3080 members who can stream at ultra-low latency and 120 frames per second.

The Razer Edge 5G is powered by the latest Snapdragon G3x Gen 1 Gaming Platform and runs on Verizon 5G Ultra Wideband. With a beautiful screen and full connectivity, gamers will have another great way to stream their PC gaming libraries from Steam, Epic, Ubisoft, Origin and more using GeForce NOW. Members can reserve the upcoming Razer Edge 5G ahead of its January 2023 release.

Razer’s new handheld joins a giant list of devices that support GeForce NOW, including PCs, Macs, Chromebooks, iOS Safari, Android mobile and TV devices, and NVIDIA SHIELD TV.

Members can also stream their PC libraries on the Logitech G Cloud handheld and Cloud Gaming Chromebooks from Asus, Acer and Lenovo, all available beginning this week.

Oh, Look – More Games!

That’s not all — every GFN Thursday brings a new pack of games.

A Plague Tale Requiem on GeForce NOW
A heart-wrenching tale continues.

Start a new adventure with the newly released A Plague Tale: Requiem, part of eight new titles streaming this week. 

  • A Plague Tale: Requiem (New release on Steam and Epic Games)
  • Batora – Lost Haven (New release on Steam, Oct. 20)
  • Warhammer 40,000: Shootas, Blood & Teef (New release on Steam and Epic Games, Oct. 20)
  • The Tenants (New release on Steam, Oct. 20)
  • FAITH: The Unholy Trinity (New release on Steam, Oct. 21)
  • Evoland Legendary Edition (Free on Epic Games, Oct. 20-27)
  • Commandos 3 – HD Remaster (Steam and Epic Games)
  • Monster Outbreak (Steam and Epic Games)

How are you making your gaming mobile? Let us know what device you’d take on a trip with you on Twitter or in the comments below.

The post Get in Touch With New Mobile Gaming Controls on GeForce NOW appeared first on NVIDIA Blog.

Read More

Digital transformation with Google Cloud

We’ve partnered with Google Cloud over the last few years to apply our AI research for making a positive impact on core solutions used by their customers. Here, we introduce a few of these projects, including optimising document understanding, enhancing the value of wind energy, and offering easier use of AlphaFold.

Read More

How Tarteel Uses AI to Help Arabic Learners Perfect Their Pronunciation

There are some 1.8 billion Muslims, but only 16% or so of them speak Arabic, the language of the Quran.

This is in part due to the fact that many Muslims struggle to find qualified instructors to give them feedback on their Quran recitation.

Enter today’s guest and his company Tarteel, a member of the NVIDIA Inception program for startups.

Tarteel was founded with the mission of strengthening the relationship Muslims have with the Quran.

The company is accomplishing this with a fusion of Islamic principles and cutting-edge technology.

AI Podcast host Noah Kravitz spoke with Tarteel CEO Anas Abou Allaban to learn more.

You Might Also Like

Artem Cherkasov and Olexandr Isayev on Democratizing Drug Discovery With NVIDIA GPUs

It may seem intuitive that AI and deep learning can speed up workflows — including novel drug discovery, a typically yearslong and several-billion-dollar endeavor. However, there is a dearth of recent research reviewing how accelerated computing can impact the process. Professors Artem Cherkasov and Olexandr Isayev discuss how GPUs can help democratize drug discovery.

Lending a Helping Hand: Jules Anh Tuan Nguyen on Building a Neuroprosthetic

Is it possible to manipulate things with your mind? Possibly. University of Minnesota postdoctoral researcher Jules Anh Tuan Nguyen discusses allowing amputees to control their prosthetic limbs with their thoughts, using neural decoders and deep learning.

Wild Things: 3D Reconstructions of Endangered Species With NVIDIA’s Sifei Liu

Studying endangered species can be difficult, as they’re elusive, and the act of observing them can disrupt their lives. Sifei Liu, a senior research scientist at NVIDIA, discusses how scientists can avoid these pitfalls by studying AI-generated 3D representations of these endangered species.

Subscribe to the AI Podcast: Now Available on Amazon Music

You can now listen to the AI Podcast through Amazon Music.

Also get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out our listener survey.

The post How Tarteel Uses AI to Help Arabic Learners Perfect Their Pronunciation appeared first on NVIDIA Blog.

Read More

Detect fraudulent transactions using machine learning with Amazon SageMaker

Businesses can lose billions of dollars each year due to malicious users and fraudulent transactions. As more and more business operations move online, fraud and abuses in online systems are also on the rise. To combat online fraud, many businesses have been using rule-based fraud detection systems.

However, traditional fraud detection systems rely on a set of rules and filters hand-crafted by human specialists. The filters can often be brittle and the rules may not capture the full spectrum of fraudulent signals. Furthermore, while fraudulent behaviors are ever-evolving, the static nature of predefined rules and filters makes it difficult to maintain and improve traditional fraud detection systems effectively.

In this post, we show you how to build a dynamic, self-improving, and maintainable credit card fraud detection system with machine learning (ML) using Amazon SageMaker.

Alternatively, if you’re looking for a fully managed service to build customized fraud detection models without writing code, we recommend checking out Amazon Fraud Detector. Amazon Fraud Detector enables customers with no ML experience to automate building fraud detection models customized for their data, leveraging more than 20 years of fraud detection expertise from AWS and Amazon.com.

Solution overview

This solution builds the core of a credit card fraud detection system using SageMaker. We start by training an unsupervised anomaly detection model using the algorithm Random Cut Forest (RCF). Then we train two supervised classification models using the algorithm XGBoost, one as a baseline model and the other for making predictions, using different strategies to address the extreme class imbalance in data. Lastly, we train an optimal XGBoost model with hyperparameter optimization (HPO) to further improve the model performance.

For the sample dataset, we use the public, anonymized credit card transactions dataset that was originally released as part of a research collaboration of Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles). In the walkthrough, we also discuss how you can customize the solution to use your own data.

The outputs of the solution are as follows:

  • An unsupervised SageMaker RCF model. The model outputs an anomaly score for each transaction. A low score value indicates that the transaction is considered normal (non-fraudulent). A high value indicates that the transaction is fraudulent. The definitions of low and high depend on the application, but common practice suggests that scores beyond three standard deviations from the mean score are considered anomalous.
  • A supervised SageMaker XGBoost model trained using its built-in weighting schema to address the highly unbalanced data issue.
  • A supervised SageMaker XGBoost model trained using the Synthetic Minority Over-sampling Technique (SMOTE).
  • A trained SageMaker XGBoost model with HPO.
  • Predictions of the probability for each transaction being fraudulent. If the estimated probability of a transaction is over a threshold, it’s classified as fraudulent.

To demonstrate how you can use this solution in your existing business infrastructures, we also include an example of making REST API calls to the deployed model endpoint, using AWS Lambda to trigger both the RCF and XGBoost models.

The following diagram illustrates the solution architecture.

Architecture diagram

Prerequisites

To try out the solution in your own account, make sure that you have the following in place:

When the Studio instance is ready, you can launch Studio and access JumpStart. JumpStart solutions are not available in SageMaker notebook instances, and you can’t access them through SageMaker APIs or the AWS Command Line Interface (AWS CLI).

Launch the solution

To launch the solution, complete the following steps:

  1. Open JumpStart by using the JumpStart launcher in the Get Started section or by choosing the JumpStart icon in the left sidebar.
  2. Under Solutions, choose Detect Malicious Users and Transactions to open the solution in another Studio tab.
    Find the solution
  3. On the solution tab, choose Launch to launch the solution.
    Launch the solution
    The solution resources are provisioned and another tab opens showing the deployment progress. When the deployment is finished, an Open Notebook button appears.
  4. Choose Open Notebook to open the solution notebook in Studio.
    Open notebook

Investigate and process the data

The default dataset contains only numerical features, because the original features have been transformed using Principal Component Analysis (PCA) to protect user privacy. As a result, the dataset contains 28 PCA components, V1–V28, and two features that haven’t been transformed, Amount and Time. Amount refers to the transaction amount, and Time is the seconds elapsed between any transaction in the data and the first transaction.

The Class column corresponds to whether or not a transaction is fraudulent.

Sample data

We can see that the majority is non-fraudulent, because out of the total 284,807 examples, only 492 (0.173%) are fraudulent. This is a case of extreme class imbalance, which is common in fraud detection scenarios.

Data class imbalance

We then prepare our data for loading and training. We split the data into a train set and a test set, using the former to train and the latter to evaluate the performance of our model. It’s important to split the data before applying any techniques to alleviate the class imbalance. Otherwise, we might leak information from the test set into the train set and hurt the model’s performance.

If you want to bring in your own training data, make sure that it’s tabular data in CSV format, upload the data to an Amazon Simple Storage Service (Amazon S3) bucket, and edit the S3 object path in the notebook code.

Data path in S3

If your data includes categorical columns with non-numerical values, you need to one-hot encode these values (using, for example, sklearn’s OneHotEncoder) because the XGBoost algorithm only supports numerical data.
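
For example, with a recent version of scikit-learn, a hypothetical categorical column could be one-hot encoded as follows before training; the column name and values here are made up for illustration.

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical frame with one categorical column; XGBoost expects numerical inputs only.
df = pd.DataFrame({"amount": [12.5, 300.0, 7.8],
                   "merchant_category": ["grocery", "travel", "grocery"]})

encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
encoded = encoder.fit_transform(df[["merchant_category"]])
encoded_df = pd.DataFrame(encoded, columns=encoder.get_feature_names_out())

# Replace the categorical column with its one-hot columns.
numeric_df = pd.concat([df.drop(columns="merchant_category"), encoded_df], axis=1)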

Train an unsupervised Random Cut Forest model

In a fraud detection scenario, we commonly have very few labeled examples, and labeling fraud can take a lot of time and effort. Therefore, we also want to extract information from the unlabeled data at hand. We do this using an anomaly detection algorithm, taking advantage of the high data imbalance that is common in fraud detection datasets.

Anomaly detection is a form of unsupervised learning where we try to identify anomalous examples based solely on their feature characteristics. Random Cut Forest is a state-of-the-art anomaly detection algorithm that is both accurate and scalable. With each data example, RCF associates an anomaly score.

We use the SageMaker built-in RCF algorithm to train an anomaly detection model on our training dataset, then make predictions on our test dataset.
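
A minimal sketch of this step with the SageMaker Python SDK is shown below; the hyperparameter values, instance types, and the placeholder feature file are illustrative, and the code assumes it runs inside SageMaker with an appropriate execution role.

import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes execution inside SageMaker

train_features = np.load("train_features.npy")  # placeholder: unlabeled transaction features

rcf = RandomCutForest(role=role, instance_count=1, instance_type="ml.m4.xlarge",
                      num_samples_per_tree=512, num_trees=50, sagemaker_session=session)
rcf.fit(rcf.record_set(train_features))

# Deploy and score a few examples; each returned record carries an anomaly score.
predictor = rcf.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")
scores = [r.label["score"].float32_tensor.values[0] for r in predictor.predict(train_features[:100])]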

First, we examine and plot the predicted anomaly scores for positive (fraudulent) and negative (non-fraudulent) examples separately, because the numbers of positive and negative examples differ significantly. We expect the positive (fraudulent) examples to have relatively high anomaly scores, and the negative (non-fraudulent) ones to have low anomaly scores. From the histograms, we can see the following patterns:

  • Almost half of the positive examples (left histogram) have anomaly scores higher than 0.9, whereas most of the negative examples (right histogram) have anomaly scores lower than 0.85.
  • The unsupervised RCF algorithm has a limited ability to separate fraudulent from non-fraudulent examples accurately, because no label information is used. We address this issue by collecting label information and using a supervised learning algorithm in later steps.

Predicted anomaly scores

Then, we assume a more real-world scenario where we classify each test example as either positive (fraudulent) or negative (non-fraudulent) based on its anomaly score. We plot the score histogram for all test examples as follows, choosing a cutoff score of 1.0 (based on the pattern shown in the histogram) for classification. Specifically, if an example’s anomaly score is less than or equal to 1.0, it’s classified as negative (non-fraudulent). Otherwise, the example is classified as positive (fraudulent).

Histogram of scores for test samples

Lastly, we compare the classification result with the ground truth labels and compute the evaluation metrics. Because our dataset is imbalanced, we use the evaluation metrics balanced accuracy, Cohen’s Kappa score, F1 score, and ROC AUC, because they take into account the frequency of each class in the data. For all of these metrics, a larger value indicates a better predictive performance. Note that in this step we can’t compute the ROC AUC yet, because there is no estimated probability for positive and negative classes from the RCF model on each example. We compute this metric in later steps using supervised learning algorithms.
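
The three metrics we can compute at this stage are available in scikit-learn; the labels and anomaly scores below are placeholders standing in for the test set.

import numpy as np
from sklearn.metrics import balanced_accuracy_score, cohen_kappa_score, f1_score

y_true = np.array([0, 0, 0, 1, 0, 1, 0, 0])                          # placeholder ground truth
anomaly_scores = np.array([0.6, 0.8, 0.7, 1.3, 0.9, 1.1, 0.5, 1.2])  # placeholder RCF scores

y_pred = (anomaly_scores > 1.0).astype(int)  # cutoff of 1.0, as described above
print("Balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
print("Cohen's Kappa:", cohen_kappa_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
# ROC AUC is skipped here because RCF outputs scores rather than class probabilities.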

Metric              RCF
Balanced accuracy   0.560023
Cohen’s Kappa       0.003917
F1                  0.007082
ROC AUC             –

From this step, we can see that the unsupervised model can already achieve some separation between the classes, with higher anomaly scores correlated with fraudulent examples.

Train an XGBoost model with the built-in weighting schema

After we’ve gathered an adequate amount of labeled training data, we can use a supervised learning algorithm to discover relationships between the features and the classes. We choose the XGBoost algorithm because it has a proven track record, is highly scalable, and can deal with missing data. We need to handle the data imbalance this time; otherwise, the majority class (the non-fraudulent, or negative, examples) will dominate the learning.

We train and deploy our first supervised model using the SageMaker built-in XGBoost algorithm container. This is our baseline model. To handle the data imbalance, we use the hyperparameter scale_pos_weight, which scales the weights of the positive class examples against the negative class examples. Because the dataset is highly skewed, we set this hyperparameter to a conservative value: sqrt(num_nonfraud/num_fraud).

We train and deploy the model as follows (a code sketch follows the list):

  1. Retrieve the SageMaker XGBoost container URI.
  2. Set the hyperparameters we want to use for the model training, including the one we mentioned that handles data imbalance, scale_pos_weight.
  3. Create an XGBoost estimator and train it with our train dataset.
  4. Deploy the trained XGBoost model to a SageMaker managed endpoint.
  5. Evaluate this baseline model with our test dataset.
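
A condensed sketch of steps 1–4 with the SageMaker Python SDK follows; the container version, instance types, S3 paths, and num_round are illustrative assumptions, while the class counts come from the sample dataset described earlier.

import numpy as np
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# 1. Retrieve the built-in XGBoost container.
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

# 2. Configure hyperparameters, including scale_pos_weight for the class imbalance.
num_fraud, num_nonfraud = 492, 284315
xgb = Estimator(image_uri=container, role=role, instance_count=1, instance_type="ml.m5.xlarge",
                output_path="s3://your-bucket/fraud-detection/output",  # placeholder bucket
                sagemaker_session=session)
xgb.set_hyperparameters(objective="binary:logistic", num_round=100,
                        scale_pos_weight=float(np.sqrt(num_nonfraud / num_fraud)))

# 3. Train on CSV data with the label in the first column, then 4. deploy an endpoint.
xgb.fit({"train": TrainingInput("s3://your-bucket/fraud-detection/train.csv", content_type="text/csv")})
predictor = xgb.deploy(initial_instance_count=1, instance_type="ml.m5.large")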

Then we evaluate our model with the same four metrics as mentioned in the last step. This time we can also calculate the ROC AUC metric.

Metric              RCF        XGBoost
Balanced accuracy   0.560023   0.847685
Cohen’s Kappa       0.003917   0.743801
F1                  0.007082   0.744186
ROC AUC             –          0.983515

We can see that the supervised XGBoost model with the weighting schema (using the hyperparameter scale_pos_weight) achieves significantly better performance than the unsupervised RCF model. There is still room to improve the performance, however. In particular, raising the Cohen’s Kappa score above 0.8 would be generally very favorable.

Apart from single-value metrics, it’s also useful to look at metrics that indicate performance per class. For example, the confusion matrix, per-class precision, recall, and F1-score can provide more information about our model’s performance.

XGBoost model's confusion matrix

class       precision   recall   f1-score   support
non-fraud   1.00        1.00     1.00       28435
fraud       0.80        0.70     0.74       46

Keep sending test traffic to the endpoint via Lambda

To demonstrate how to use our models in a production system, we built a REST API with Amazon API Gateway and a Lambda function. Client applications send HTTP inference requests to the REST API, which triggers the Lambda function; the function in turn invokes the RCF and XGBoost model endpoints and returns the predictions from the models. You can read the Lambda function code and monitor the invocations on the Lambda console.
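
The solution ships its own Lambda code; the simplified sketch below only illustrates the pattern of invoking both endpoints from a handler. The endpoint names, the event format, and the parsing of the RCF response (assumed to be the default JSON output of the built-in algorithm) are assumptions, not the solution's actual implementation.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

RCF_ENDPOINT = "fraud-detection-rcf-endpoint"  # placeholder endpoint names
XGB_ENDPOINT = "fraud-detection-xgb-endpoint"

def lambda_handler(event, context):
    # One CSV-formatted transaction row is expected in the request body.
    payload = event["body"]

    rcf_resp = runtime.invoke_endpoint(EndpointName=RCF_ENDPOINT, ContentType="text/csv", Body=payload)
    xgb_resp = runtime.invoke_endpoint(EndpointName=XGB_ENDPOINT, ContentType="text/csv", Body=payload)

    anomaly_score = json.loads(rcf_resp["Body"].read())["scores"][0]["score"]
    fraud_probability = float(xgb_resp["Body"].read())

    return {"statusCode": 200,
            "body": json.dumps({"anomaly_score": anomaly_score,
                                "fraud_probability": fraud_probability})}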

We also created a Python script that makes HTTP inference requests to the REST API, with our test data as input data. To see how this was done, check the generate_endpoint_traffic.py file in the solution’s source code. The prediction outputs are logged to an S3 bucket through an Amazon Kinesis Data Firehose delivery stream. You can find the destination S3 bucket name on the Kinesis Data Firehose console, and check the prediction results in the S3 bucket.

Train an XGBoost model with the over-sampling technique SMOTE

Now that we have a baseline model using XGBoost, we can see if sampling techniques that are designed specifically for imbalanced problems can improve the performance of the model. We use the Synthetic Minority Over-sampling Technique (SMOTE), which oversamples the minority class by interpolating new data points between existing ones.
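
Applying SMOTE itself is a one-liner with the imbalanced-learn library; the tiny synthetic dataset below is only a stand-in for the real training split.

from collections import Counter
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))                               # placeholder features
y_train = np.r_[np.zeros(990, dtype=int), np.ones(10, dtype=int)]  # highly imbalanced labels

smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
print("Before:", Counter(y_train))      # {0: 990, 1: 10}
print("After:", Counter(y_resampled))   # minority class oversampled to parity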

The steps are as follows:

  1. Use SMOTE to oversample the minority class (the fraudulent class) of our train dataset. SMOTE raises the minority class from about 0.17% of the training data to 50%. Note that this is a case of extreme oversampling of the minority class. An alternative would be to use a smaller resampling ratio, such as having one minority class sample for every sqrt(non_fraud/fraud) majority samples, or using more advanced resampling techniques. For more over-sampling options, refer to Compare over-sampling samplers.
  2. Define the hyperparameters for training the second XGBoost so that scale_pos_weight is removed and the other hyperparameters remain the same as when training the baseline XGBoost model. We don’t need to handle data imbalance with this hyperparameter anymore, because we’ve already done that with SMOTE.
  3. Train the second XGBoost model with the new hyperparameters on the SMOTE processed train dataset.
  4. Deploy the new XGBoost model to a SageMaker managed endpoint.
  5. Evaluate the new model with the test dataset.

When evaluating the new model, we can see that with SMOTE, XGBoost achieves a better performance on balanced accuracy, but not on Cohen’s Kappa and F1 scores. The reason for this is that SMOTE has oversampled the fraud class so much that it’s increased its overlap in feature space with the non-fraud cases. Because Cohen’s Kappa gives more weight to false positives than balanced accuracy does, the metric drops significantly, as does the precision and F1 score for fraud cases.

Metric              RCF        XGBoost    XGBoost SMOTE
Balanced accuracy   0.560023   0.847685   0.912657
Cohen’s Kappa       0.003917   0.743801   0.716463
F1                  0.007082   0.744186   0.716981
ROC AUC             –          0.983515   0.967497

However, we can bring back the balance between metrics by adjusting the classification threshold. So far, we’ve been using 0.5 as the threshold to label whether or not a data point is fraudulent. After experimenting with different thresholds from 0.1 to 0.9, we can see that Cohen’s Kappa keeps increasing along with the threshold, without a significant loss in balanced accuracy.
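
A sketch of such a threshold sweep, using placeholder predicted probabilities and labels rather than the real test set:

import numpy as np
from sklearn.metrics import balanced_accuracy_score, cohen_kappa_score

fraud_probabilities = np.array([0.05, 0.2, 0.65, 0.9, 0.4, 0.85])  # placeholder model outputs
y_test = np.array([0, 0, 1, 1, 0, 1])                              # placeholder labels

for threshold in np.arange(0.1, 1.0, 0.1):
    y_pred = (fraud_probabilities >= threshold).astype(int)
    print(f"threshold={threshold:.1f}  "
          f"balanced_acc={balanced_accuracy_score(y_test, y_pred):.3f}  "
          f"kappa={cohen_kappa_score(y_test, y_pred):.3f}")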

Experiment different thresholds to bring back the balance between metrics

This adds a useful calibration to our model. We can use a low threshold if not missing any fraudulent cases (false negatives) is our priority, or we can increase the threshold to minimize the number of false positives.

Train an optimal XGBoost model with HPO

In this step, we demonstrate how to improve model performance by training our third XGBoost model with hyperparameter optimization. When building complex ML systems, manually exploring all possible combinations of hyperparameter values is impractical. The HPO feature in SageMaker can accelerate your productivity by trying many variations of a model on your behalf. It automatically looks for the best model by focusing on the most promising combinations of hyperparameter values within the ranges that you specify.

The HPO process needs a validation dataset, so we first further split our training data into training and validation datasets using stratified sampling. To tackle the data imbalance problem, we use XGBoost’s weighting schema again, setting the scale_pos_weight hyperparameter to sqrt(num_nonfraud/num_fraud).

We create an XGBoost estimator using the SageMaker built-in XGBoost algorithm container, and specify the objective evaluation metric and the hyperparameter ranges within which we’d like to experiment. With these we then create a HyperparameterTuner and kick off the HPO tuning job, which trains multiple models in parallel, looking for optimal hyperparameter combinations.
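
A sketch of this step with the SageMaker Python SDK is below; it assumes xgb is the estimator from the baseline step, and the hyperparameter ranges, objective metric, job counts, and S3 paths are illustrative choices.

from sagemaker.inputs import TrainingInput
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

train_input = TrainingInput("s3://your-bucket/fraud-detection/train.csv", content_type="text/csv")
validation_input = TrainingInput("s3://your-bucket/fraud-detection/validation.csv", content_type="text/csv")

hyperparameter_ranges = {
    "eta": ContinuousParameter(0.1, 0.5),
    "max_depth": IntegerParameter(3, 10),
    "min_child_weight": ContinuousParameter(1, 10),
    "subsample": ContinuousParameter(0.5, 1.0),
}

# `xgb` is the baseline XGBoost estimator (scale_pos_weight already set).
tuner = HyperparameterTuner(estimator=xgb, objective_metric_name="validation:auc",
                            objective_type="Maximize",
                            hyperparameter_ranges=hyperparameter_ranges,
                            max_jobs=10, max_parallel_jobs=2)
tuner.fit({"train": train_input, "validation": validation_input})

# Deploy the best model found by the tuning job.
best_predictor = tuner.deploy(initial_instance_count=1, instance_type="ml.m5.large")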

When the tuning job is complete, we can see its analytics report and inspect each model’s hyperparameters, training job information, and its performance against the objective evaluation metric.

List of each model's information from the tuning job

Then we deploy the best model and evaluate it with our test dataset.

Evaluate and compare all model performance on the same test data

Now we have the evaluation results from all four models: RCF, XGBoost baseline, XGBoost with SMOTE, and XGBoost with HPO. Let’s compare their performance.

Metric              RCF        XGBoost    XGBoost with SMOTE   XGBoost with HPO
Balanced accuracy   0.560023   0.847685   0.912657             0.902156
Cohen’s Kappa       0.003917   0.743801   0.716463             0.880778
F1                  0.007082   0.744186   0.716981             0.880952
ROC AUC             –          0.983515   0.967497             0.981564

We can see that XGBoost with HPO achieves even better performance than that with the SMOTE method. In particular, Cohen’s Kappa scores and F1 are over 0.8, indicating an optimal model performance.

Clean up

When you’re finished with this solution, make sure that you delete all unwanted AWS resources to avoid incurring unintended charges. In the Delete solution section on your solution tab, choose Delete all resources to delete resources automatically created when launching this solution.

Clean up by deleting the solution

Alternatively, you can use AWS CloudFormation to delete all standard resources automatically created by the solution and notebook. To use this approach, on the AWS CloudFormation console, find the CloudFormation stack whose description contains fraud-detection-using-machine-learning, and delete it. This is a parent stack, and choosing to delete this stack will automatically delete the nested stacks.

Clean up through CloudFormation

With either approach, you still need to manually delete any extra resources that you may have created in this notebook. Some examples include extra S3 buckets (in addition to the solution’s default bucket), extra SageMaker endpoints (using a custom name), and extra Amazon Elastic Container Registry (Amazon ECR) repositories.

Conclusion

In this post, we showed you how to build the core of a dynamic, self-improving, and maintainable credit card fraud detection system using ML with SageMaker. We built, trained, and deployed an unsupervised RCF anomaly detection model, a supervised XGBoost model as the baseline, another supervised XGBoost model with SMOTE to tackle the data imbalance problem, and a final XGBoost model optimized with HPO. We discussed how to handle data imbalance and use your own data in the solution. We also included an example REST API implementation with API Gateway and Lambda to demonstrate how to use the system in your existing business infrastructure.

To try it out yourself, open SageMaker Studio and launch the JumpStart solution. To learn more about the solution, check out its GitHub repository.


About the Authors

Xiaoli Shen is a Solutions Architect and Machine Learning Technical Field Community (TFC) member at Amazon Web Services. She’s focused on helping customers architect on the cloud and leverage AWS services to derive business value. Prior to joining AWS, she was a tech lead and senior full-stack engineer building data-intensive distributed systems on the cloud.

Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A journal.

Vedant Jain is a Sr. AI/ML Specialist Solutions Architect, helping customers derive value out of the Machine Learning ecosystem at AWS. Prior to joining AWS, Vedant has held ML/Data Science Specialty positions at various companies such as Databricks, Hortonworks (now Cloudera) & JP Morgan Chase. Outside of his work, Vedant is passionate about making music, using Science to lead a meaningful life & exploring delicious vegetarian cuisine from around the world.

Read More

Do Modern ImageNet Classifiers Accurately Predict Perceptual Similarity?

The task of determining the similarity between images is an open problem in computer vision and is crucial for evaluating the realism of machine-generated images. Though there are a number of straightforward methods of estimating image similarity (e.g., low-level metrics that measure pixel differences, such as FSIM and SSIM), in many cases, the measured similarity differences do not match the differences perceived by a person. However, more recent work has demonstrated that intermediate representations of neural network classifiers, such as AlexNet, VGG and SqueezeNet trained on ImageNet, exhibit perceptual similarity as an emergent property. That is, Euclidean distances between encoded representations of images by ImageNet-trained models correlate much better with a person’s judgment of differences between images than estimating perceptual similarity directly from image pixels.

Two sets of sample images from the BAPPS dataset. Trained networks agree more with human judgements as compared to low-level metrics (PSNR, SSIM, FSIM). Image source: Zhang et al. (2018).

In “Do better ImageNet classifiers assess perceptual similarity better?” published in Transactions on Machine Learning Research, we contribute an extensive experimental study on the relationship between the accuracy of ImageNet classifiers and their emergent ability to capture perceptual similarity. To evaluate this emergent ability, we follow previous work in measuring perceptual scores (PS), which roughly capture how well a model’s judgments of image similarity agree with human preferences on the BAPPS dataset. While prior work studied the first generation of ImageNet classifiers, such as AlexNet, SqueezeNet and VGG, we significantly increase the scope of the analysis, incorporating modern classifiers, such as ResNets and Vision Transformers (ViTs), across a wide range of hyper-parameters.

Relationship Between Accuracy and Perceptual Similarity

It is well established that features learned via training on ImageNet transfer well to a number of downstream tasks, making ImageNet pre-training a standard recipe. Further, better accuracy on ImageNet usually implies better performance on a diverse set of downstream tasks, such as robustness to common corruptions, out-of-distribution generalization and transfer learning on smaller classification datasets. Contrary to prevailing evidence that suggests models with high validation accuracies on ImageNet are likely to transfer better to other tasks, surprisingly, we find that representations from underfit ImageNet models with modest validation accuracies achieve the best perceptual scores.

Plot of perceptual scores (PS) on the 64 × 64 BAPPS dataset (y-axis) against the ImageNet 64 × 64 validation accuracies (x-axis). Each blue dot represents an ImageNet classifier. Better ImageNet classifiers achieve better PS up to a certain point (dark blue), beyond which improving the accuracy lowers the PS. The best PS are attained by classifiers with moderate accuracy (20.0–40.0).

We study the variation of perceptual scores as a function of neural network hyperparameters: width, depth, number of training steps, weight decay, label smoothing and dropout. For each hyperparameter, there exists an optimal accuracy up to which improving accuracy improves PS. This optimum is fairly low and is attained quite early in the hyperparameter sweep. Beyond this point, improved classifier accuracy corresponds to worse PS.

As illustration, we present the variation of PS with respect to two hyperparameters: training steps in ResNets and width in ViTs. The PS of ResNet-50 and ResNet-200 peak very early at the first few epochs of training. After the peak, PS of better classifiers decrease more drastically. ResNets are trained with a learning rate schedule that causes a stepwise increase in accuracy as a function of training steps. Interestingly, after the peak, they also exhibit a step-wise decrease in PS that matches this step-wise accuracy increase.

Early-stopped ResNets attain the best PS across different depths of 6, 50 and 200.

ViTs consist of a stack of transformer blocks applied to the input image. The width of a ViT model is the number of output neurons of a single transformer block. Increasing its width is an effective way to improve its accuracy. Here, we vary the width of two ViT variants, B/8 and L/4 (i.e., Base and Large ViT models with patch sizes 8 and 4, respectively), and evaluate both the accuracy and PS. Similar to our observations with early-stopped ResNets, narrower ViTs with lower accuracies perform better than the default widths. Surprisingly, the optimal widths of ViT-B/8 and ViT-L/4 are 6% and 12% of their default widths, respectively. For a more comprehensive list of experiments involving other hyperparameters such as width, depth, number of training steps, weight decay, label smoothing and dropout across both ResNets and ViTs, check out our paper.

Narrow ViTs attain the best PS.

Scaling Down Models Improves Perceptual Scores

Our results prescribe a simple strategy to improve an architecture’s PS: scale down the model to reduce its accuracy until it attains the optimal perceptual score. The table below summarizes the improvements in PS obtained by scaling down each model across every hyperparameter. Except for ViT-L/4, early stopping yields the highest improvement in PS, regardless of architecture. In addition, early stopping is the most efficient strategy as there is no need for an expensive grid search.

Model   Default   Width   Depth   Weight Decay   Central Crop   Train Steps   Best
ResNet-6 69.1 +0.4 +0.3 0.0 +0.5 69.6
ResNet-50 68.2 +0.4 +0.7 +0.7 +1.5 69.7
ResNet-200 67.6 +0.2 +1.3 +1.2 +1.9 69.5
ViT B/8 67.6 +1.1 +1.0 +1.3 +0.9 +1.1 68.9
ViT L/4 67.9 +0.4 +0.4 -0.1 -1.1 +0.5 68.4
Perceptual Score improves by scaling down ImageNet models. Each value denotes the improvement obtained by scaling down a model across a given hyperparameter over the model with default hyperparameters.

Global Perceptual Functions

In prior work, the perceptual similarity function was computed using Euclidean distances across the spatial dimensions of the image. This assumes a direct correspondence between pixels, which may not hold for warped, translated or rotated images. Instead, we adopt two perceptual functions that rely on global representations of images, namely the style-loss function from the Neural Style Transfer work that captures stylistic similarity between two images, and a normalized mean pool distance function. The style-loss function compares the inter-channel cross-correlation matrix between two images while the mean pool function compares the spatially averaged global representations.
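
A rough NumPy sketch of the two ideas (the exact normalizations in the paper may differ), given feature maps of shape (H, W, C) extracted from some intermediate layer:

import numpy as np

def style_distance(feat_a, feat_b):
    # Compare inter-channel cross-correlation (Gram) matrices of two feature maps.
    def gram(f):
        flat = f.reshape(-1, f.shape[-1])     # (H*W, C)
        return flat.T @ flat / flat.shape[0]  # (C, C) channel correlations
    return np.linalg.norm(gram(feat_a) - gram(feat_b))

def mean_pool_distance(feat_a, feat_b):
    # Compare spatially averaged global representations, normalized to unit length.
    def pooled(f):
        v = f.mean(axis=(0, 1))
        return v / (np.linalg.norm(v) + 1e-8)
    return np.linalg.norm(pooled(feat_a) - pooled(feat_b))

# Placeholder feature maps standing in for intermediate classifier activations.
a, b = np.random.rand(16, 16, 64), np.random.rand(16, 16, 64)
print(style_distance(a, b), mean_pool_distance(a, b))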

Global perceptual functions consistently improve PS across both networks trained with default hyperparameters (top) and ResNet-200 as a function of train epochs (bottom).

We probe a number of hypotheses to explain the relationship between accuracy and PS and come away with a few additional insights. For example, the accuracy of models without commonly used skip-connections also inversely correlates with PS, and layers close to the input on average have lower PS than layers close to the output. For further exploration involving distortion sensitivity, ImageNet class granularity, and spatial frequency sensitivity, check out our paper.

Conclusion

In this paper, we explore the question of whether improving classification accuracy yields better perceptual metrics. We study the relationship between accuracy and PS on ResNets and ViTs across many different hyperparameters and observe that PS exhibits an inverse-U relationship with accuracy, where accuracy correlates with PS up to a certain point, and then exhibits an inverse-correlation. Finally, in our paper, we discuss in detail a number of explanations for the observed relationship between accuracy and PS, involving skip connections, global similarity functions, distortion sensitivity, layerwise perceptual scores, spatial frequency sensitivity and ImageNet class granularity. While the exact explanation for the observed tradeoff between ImageNet accuracy and perceptual similarity is a mystery, we are excited that our paper opens the door for further research in this area.

Acknowledgements

This is joint work with Neil Houlsby and Nal Kalchbrenner. We would additionally like to thank Basil Mustafa, Kevin Swersky, Simon Kornblith, Johannes Balle, Mike Mozer, Mohammad Norouzi and Jascha Sohl-Dickstein for useful discussions.

Read More

Implement RStudio on your AWS environment and access your data lake using AWS Lake Formation permissions

R is a popular analytic programming language used by data scientists and analysts to perform data processing, conduct statistical analyses, create data visualizations, and build machine learning (ML) models. RStudio, the integrated development environment for R, provides open-source tools and enterprise-ready professional software for teams to develop and share their work across their organization. Building, securing, scaling, and maintaining RStudio yourself is, however, tedious and cumbersome.

Implementing the RStudio environment on AWS provides elasticity and scalability that you don’t have when deploying on premises, and eliminates the need to manage that infrastructure. You can select the desired compute and memory based on processing requirements and can also scale up or down to work with analytical and ML workloads of different sizes without an upfront investment. This lets you quickly experiment with new data sources and code, and roll out new analytics processes and ML models to the rest of the organization. You can also seamlessly integrate your data lake resources to make them available to developers and data scientists, and secure the data by using row-level and column-level access controls from AWS Lake Formation.

This post presents two ways to easily deploy and run RStudio on AWS to access data stored in your data lake:

  • Fully managed on Amazon SageMaker
  • Self-hosted on Amazon Elastic Compute Cloud (Amazon EC2)
    • You can choose to deploy the open-source version of RStudio using an EC2-hosted approach that we also describe in this post. The self-hosted option requires the administrator to create an EC2 instance and install RStudio manually or by using an AWS CloudFormation template. There is also less flexibility for implementing user-access controls with this option, because all users have the same access level in this type of implementation.

RStudio on Amazon SageMaker

You can launch RStudio Workbench with a simple click from SageMaker. With SageMaker, customers don’t have to bear the operational overhead of building, installing, securing, scaling, and maintaining RStudio; they don’t have to pay for the continuously running RStudio Server (when using a t3.medium instance); and they only pay for RSession compute when they use it. RStudio users have the flexibility to dynamically scale compute by switching instances on the fly. Running RStudio on SageMaker requires an administrator to establish a SageMaker domain and associated user profiles. You also need an appropriate RStudio license.

Within SageMaker, you can grant access at the RStudio administrator and RStudio user level, with differing permissions. Only user profiles granted one of these two roles can access RStudio in SageMaker. For more information about administrator tasks for setting up RStudio on SageMaker, refer to Get started with RStudio on Amazon SageMaker. That post also shows the process of selecting EC2 instances for each session, and how the administrator can restrict EC2 instance options for RStudio users.

Fig1: Architecture Diagram showing the interaction of various AWS Services

Use Lake Formation row-level and column-level security access

In addition to allowing your team to launch RStudio sessions on SageMaker, you can also secure the data lake by using row-level and column-level access controls from Lake Formation. For more information, refer to Effective data lakes using AWS Lake Formation, Part 4: Implementing cell-level and row-level security.

Through Lake Formation security controls, you can make sure that each person has the right access to the data in the data lake. Consider the following two user profiles in the SageMaker domain, each with a different execution role:

User Profile Execution Role
rstudiouser-fullaccess AmazonSageMaker-ExecutionRole-FullAccess
rstudiouser-limitedaccess AmazonSageMaker-ExecutionRole-LimitedAccess

The following screenshot shows the rstudiouser-limitedaccess profile details.

Fig 2:  Profile details of rstudiouser-limitedaccess role

The following screenshot shows the rstudiouser-fullaccess profile details.

Fig 3:  Profile details of rstudiouser-fullaccess role

The dataset used for this post is a COVID-19 public dataset. The following screenshot shows an example of the data:

Fig4:  COVID-19 Public dataset

After you create the user profile and assign it to the appropriate role, you can access Lake Formation to crawl the data with AWS Glue, create the metadata and table, and grant access to the table data. For the AmazonSageMaker-ExecutionRole-FullAccess role, you grant access to all of the columns in the table, and for AmazonSageMaker-ExecutionRole-LimitedAccess, you grant access using the data filter USA_Filter. We use this filter to provide row-level and cell-level column permissions (see the Resource column in the following screenshot).

Fig5:  AWS Lake Formation Permissions for AmazonSageMaker-ExecutionRole -Full/Limited Access roles

As shown in the following screenshot, the second role has limited access. Users associated with this role can only access the continent, date, total_cases, total_deaths, new_cases, new_deaths, and iso_code columns.

Fig6:  AWS Lake Formation Column-level permissions for AmazonSageMaker-ExecutionRole-Limited Access role

With role permissions attached to each user profile, we can see how Lake Formation enforces the appropriate row-level and column-level permissions. You can open the RStudio Workbench from the Launch app drop-down menu in the created user list, and choose RStudio.

In the following screenshot, we launch the app as the rstudiouser-limitedaccess user.

Fig7: Launching RStudio session for rstudiouser-limitedaccess user from Amazon SageMaker Console

You can see the RStudio Workbench home page and a list of sessions, projects, and published content.

Fig8: R Studio Workbench session for rstudiouser-limitedaccess user

Choose a session name to start the session in SageMaker. Install Paws (see guidance later in this post) so that you can access the appropriate AWS services. Now you can run a query to pull all of the fields from the dataset via Amazon Athena, using the command SELECT * FROM "databasename.tablename", and store the query output in an Amazon Simple Storage Service (Amazon S3) bucket.

Fig9: Athena Query execution in R Studio session

The following screenshot shows the output files in the S3 bucket.

Fig10: Athena Query execution results in Amazon S3 Bucket

The following screenshot shows the data in these output files using Amazon S3 Select.

Fig11: Reviewing the output data using Amazon S3 Select

Only USA data and columns continent, date, total_cases, total_deaths, new_cases, new_deaths, and iso_code are shown in the result for the rstudiouser-limitedaccess user.

Let’s repeat the same steps for the rstudiouser-fullaccess user.

Fig12: Launching RStudio session for rstudiouser-fullaccess user from Amazon SageMaker Console

You can see the RStudio Workbench home page and a list of sessions, projects, and published content.

Fig13: R Studio Workbench session for rstudiouser-fullaccess user

Let’s run the same query, SELECT * FROM "databasename.tablename", using Athena.

Fig14: Athena Query execution in R Studio session

The following screenshot shows the output files in the S3 bucket.

Fig15: Athena Query execution results in Amazon S3 Bucket

The following screenshot shows the data in these output files using Amazon S3 Select.

Fig16: Reviewing the output data using Amazon S3 Select

As shown in this example, the rstudiouser-fullaccess user has access to all the columns and rows in the dataset.

Self-Hosted on Amazon EC2

If you want to start experimenting with RStudio’s open-source version on AWS, you can install RStudio on an EC2 instance. The CloudFormation template provided in this post provisions the EC2 instance and installs RStudio using a user data script. You can run the template multiple times to provision multiple RStudio instances as needed, and you can use it in any AWS Region. After you deploy the CloudFormation template, it provides you with a URL to access RStudio from a web browser. Amazon EC2 enables you to scale up or down to handle changes in data size and the compute capacity needed to run your analytics.

Create a key pair for secure access

AWS uses public-key cryptography to secure the login information for your EC2 instance. You specify the name of the key pair in the KeyPair parameter when you launch the CloudFormation template. Then you can use the same key to log in to the provisioned EC2 instance later if needed.

Before you run the CloudFormation template, make sure that you have the Amazon EC2 key pair in the AWS account that you’re planning to use. If not, then refer to Create a key pair using Amazon EC2 for instructions to create one.

Launch the CloudFormation template

Sign in to the CloudFormation console in the us-east-1 Region and choose Launch Stack.

Launch stack button

You must enter several parameters into the CloudFormation template:

  • InitialUser and InitialPassword – The user name and password that you use to log in to the RStudio session. The default values are rstudio and Rstudio@123, respectively.
  • InstanceType – The EC2 instance type on which to deploy the RStudio server. The template currently accepts all instances in the t2, m4, c4, r4, g2, p2, and g3 instance families, and can incorporate other instance families easily. The default value is t2.micro.
  • KeyPair – The key pair you use to log in to the EC2 instance.
  • VpcId and SubnetId – The Amazon Virtual Private Cloud (Amazon VPC) and subnet in which to launch the instance.

After you enter these parameters, deploy the CloudFormation template. When it’s complete, the following resources are available:

  • An EC2 instance with RStudio installed on it.
  • An IAM role with necessary permissions to connect to other AWS services.
  • A security group with rules to open up port 8787 for the RStudio Server.

Log in to RStudio

Now you’re ready to use RStudio! Go to the Outputs tab for the CloudFormation stack and copy the RStudio URL value (it’s in the format http://ec2-XX-XX-XXX-XX.compute-1.amazonaws.com:8787/). Enter that URL in a web browser. This opens your RStudio session, which you can log into using the same user name and password that you provided while running the CloudFormation template.

Access AWS services from RStudio

After you access the RStudio session, you should install the R Package for AWS (Paws). This lets you connect to many AWS services, including the services and resources in your data lake. To install Paws, enter and run the following R code:

install.packages("paws")

To use an AWS service, create a client and access the service’s operations from that client. When accessing AWS APIs, you must provide your credentials and Region. Paws searches for the credentials and Region using the AWS authentication chain:

  • Explicitly provided access key, secret key, session token, profile, or Region
  • R environment variables
  • Operating system environment variables
  • AWS shared credentials and configuration files in .aws/credentials and .aws/config
  • Container IAM role
  • Instance IAM role

Because you’re running on an EC2 instance with an attached IAM role, Paws automatically uses your IAM role credentials to authenticate AWS API requests.

# To interact with Amazon S3, first create an S3 client, then list the objects in your bucket
s3 <- paws::s3(config = list(region = 'us-east-1'))
s3$list_objects(Bucket = "rstudio-XXXXXXXXXX")

# Next, interactively query data from your data lake using Amazon Athena
athena <- paws::athena(config = list(region = 'us-east-1'))
athena$start_query_execution(
  QueryString = 'SELECT * FROM "databasename"."tablename" limit 10;',
  QueryExecutionContext = list(Database = "databasename", Catalog = "catalogname"),
  ResultConfiguration = list(OutputLocation = "S3 Bucket", # replace with an s3:// output location
                             EncryptionConfiguration = list(EncryptionOption = "SSE_S3")),
  WorkGroup = "workgroup name")
$QueryExecutionId
[1] "17ccec8a-d196-4b4c-b31c-314fab8939f3"

For production environments, we recommend using the scalable RStudio solution outlined in this blog.

Conclusion

You learned how to deploy your RStudio environment on AWS. We demonstrated the advantages of using RStudio on Amazon SageMaker and how you can get started. You also learned how to quickly begin experimenting with the open-source version of RStudio through a self-hosted installation on Amazon EC2. We also demonstrated how to integrate RStudio into your data lake architecture and implement fine-grained access control on a data lake table using the row-level and cell-level security features of Lake Formation.

In our next post, we will demonstrate how to containerize R scripts and run them using AWS Lambda.


About the authors

Venkata Kampana is a Senior Solutions Architect in the AWS Health and Human Services team and is based in Sacramento, CA. In that role, he helps public sector customers achieve their mission objectives with well-architected solutions on AWS.

Dr. Dawn Heisey-Grove is the public health analytics leader for Amazon Web Services’ state and local government team. In this role, she’s responsible for helping state and local public health agencies think creatively about how to address their analytics challenges and achieve their long-term goals. She’s spent her career finding new ways to use existing or new data to support public health surveillance and research.

Read More

Design patterns for serial inference on Amazon SageMaker

As machine learning (ML) goes mainstream and gains wider adoption, ML-powered applications are becoming increasingly common to solve a range of complex business problems. The solution to these complex business problems often requires using multiple ML models. These models can be sequentially combined to perform various tasks, such as preprocessing, data transformation, model selection, inference generation, inference consolidation, and post-processing. Organizations need flexible options to orchestrate these complex ML workflows. Serial inference pipelines are one such design pattern to arrange these workflows into a series of steps, with each step enriching or further processing the output generated by the previous steps and passing the output to the next step in the pipeline.

Additionally, these serial inference pipelines should provide the following:

  • Flexible and customized implementation (dependencies, algorithms, business logic, and so on)
  • Repeatable and consistent for production implementation
  • Reduced undifferentiated heavy lifting by minimizing infrastructure management

In this post, we look at some common use cases for serial inference pipelines and walk through some implementation options for each of these use cases using Amazon SageMaker. We also discuss considerations for each of these implementation options.

The following table summarizes the different use cases for serial inference, along with their primary implementation considerations and recommended implementation options, each of which is discussed in this post.

  • Use case: Serial inference pipeline (with preprocessing and postprocessing steps included)
    Description: Inference pipeline needs to preprocess incoming data before invoking a trained model for generating inferences, and then postprocess generated inferences, so that they can be easily consumed by downstream applications
    Primary considerations: Ease of implementation
    Overall implementation complexity: Low
    Recommended implementation option: Inference container using the SageMaker Inference Toolkit
    Sample code artifacts and notebooks: Deploy a Trained PyTorch Model

  • Use case: Serial inference pipeline (with preprocessing and postprocessing steps included)
    Description: Inference pipeline needs to preprocess incoming data before invoking a trained model for generating inferences, and then postprocess generated inferences, so that they can be easily consumed by downstream applications
    Primary considerations: Decoupling, simplified deployment, and upgrades
    Overall implementation complexity: Medium
    Recommended implementation option: SageMaker inference pipeline
    Sample code artifacts and notebooks: Inference Pipeline with Custom Containers and xgBoost

  • Use case: Serial model ensemble
    Description: Inference pipeline needs to host and arrange multiple models sequentially, so that each model enhances the inference generated by the previous one, before generating the final inference
    Primary considerations: Decoupling, simplified deployment and upgrades, flexibility in model framework selection
    Overall implementation complexity: Medium
    Recommended implementation option: SageMaker inference pipeline
    Sample code artifacts and notebooks: Inference Pipeline with Scikit-learn and Linear Learner

  • Use case: Serial inference pipeline (with targeted model invocation from a group)
    Description: Inference pipeline needs to invoke a specific customized model from a group of deployed models, based on request characteristics or for cost-optimization, in addition to preprocessing and postprocessing tasks
    Primary considerations: Cost-optimization and customization
    Overall implementation complexity: High
    Recommended implementation option: SageMaker inference pipeline with multi-model endpoints (MMEs)
    Sample code artifacts and notebooks: Amazon SageMaker Multi-Model Endpoints using Linear Learner

In the following sections, we discuss each use case in more detail.

Serial inference pipeline using inference containers

Serial inference pipeline use cases require incoming data to be preprocessed before a pre-trained ML model is invoked to generate inferences. Additionally, in some cases, the generated inferences may need to be processed further so that they can be easily consumed by downstream applications. This is a common scenario when a streaming data source must be processed in real time before a model can be applied to it, but the same pattern can manifest for batch inference as well.

SageMaker provides an option to customize inference containers and use them to build a serial inference pipeline. Inference containers use the SageMaker Inference Toolkit and are built on SageMaker Multi Model Server (MMS), which provides a flexible mechanism to serve ML models. The following diagram illustrates a reference pattern of how to implement a serial inference pipeline using inference containers.

Diagram: Serial inference pipeline implemented with a customized SageMaker inference container

SageMaker MMS expects a Python script that implements the following functions to load the model, preprocess input data, get predictions from the model, and postprocess the output data:

  • input_fn() – Responsible for deserializing and preprocessing the input data
  • model_fn() – Responsible for loading the trained model from artifacts in Amazon Simple Storage Service (Amazon S3)
  • predict_fn() – Responsible for generating inferences from the model
  • output_fn() – Responsible for serializing and postprocessing the output data (inferences)

For detailed steps to customize an inference container, refer to Adapting Your Own Inference Container.
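
To make the handler contract concrete, the following is a minimal sketch of such a script, assuming a scikit-learn model serialized with joblib and JSON request/response payloads; the file name, model name, and serialization choices are illustrative assumptions, not artifacts from this post.

# Hypothetical inference.py handler for a SageMaker inference container.
import json
import os

import joblib
import numpy as np


def model_fn(model_dir):
    # Load the trained model artifact that SageMaker extracted from model.tar.gz
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def input_fn(request_body, request_content_type):
    # Deserialize and preprocess the incoming request
    if request_content_type == "application/json":
        features = json.loads(request_body)["instances"]
        return np.asarray(features, dtype=float)
    raise ValueError(f"Unsupported content type: {request_content_type}")


def predict_fn(input_data, model):
    # Generate inferences from the loaded model
    return model.predict(input_data)


def output_fn(prediction, response_content_type):
    # Postprocess and serialize the inferences for downstream consumers
    return json.dumps({"predictions": prediction.tolist()})

In the SageMaker framework containers, default implementations of these functions already exist, so you typically only override the ones whose behavior needs to change.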

Inference containers are an ideal design pattern for serial inference pipeline use cases with the following primary considerations:

  • High cohesion – The processing logic and corresponding model drive single business functionality and need to be co-located
  • Low overall latency – Minimal elapsed time between when an inference request is made and the response is received

In a serial inference pipeline, the processing logic and model are encapsulated within the same container, so most of the invocation calls stay within that container. This helps reduce the overall number of hops, resulting in better overall latency and responsiveness of the pipeline.

Also, for use cases where ease of implementation is an important criterion, inference containers can help, because the various processing steps of the pipeline are co-located within the same container.

Serial inference pipeline using a SageMaker inference pipeline

Another variation of the serial inference pipeline use case requires clearer decoupling between the various steps in the pipeline (such as data preprocessing, inference generation, data postprocessing, and formatting and serialization). This could be due to a variety of reasons:

  • Decoupling – Various steps of the pipeline have a clearly defined purpose and need to be run on separate containers due to the underlying dependencies involved. This also helps keep the pipeline well structured.
  • Frameworks – Various steps of the pipeline use specific fit-for-purpose frameworks (such as scikit or Spark ML) and therefore need to be run on separate containers.
  • Resource Isolation – Various steps of the pipeline have varying resource consumption requirements and therefore need to be run on separate containers for more flexibility and control.

Furthermore, for slightly more complex serial inference pipelines, multiple steps may be involved to process a request and generate an inference. Therefore, from an operational standpoint, it may be beneficial to host these steps on separate containers for better functional isolation, and facilitate easier upgrades and enhancements (change one step without impacting other models or processing steps).

If your use case aligns with some of these considerations, a SageMaker inference pipeline provides an easy and flexible option to build a serial inference pipeline. The following diagram illustrates a reference pattern of how to implement a serial inference pipeline using multiple steps hosted on dedicated containers using a SageMaker inference pipeline.

Diagram: Serial inference pipeline with steps hosted on dedicated containers using a SageMaker inference pipeline

A SageMaker inference pipeline consists of a linear sequence of 2–15 containers that process requests for inferences on data. The inference pipeline provides the option to use pre-trained SageMaker built-in algorithms or custom algorithms packaged in Docker containers. The containers are hosted on the same underlying instance, which helps reduce the overall latency and minimize cost.

The following code snippet shows how multiple processing steps and models can be combined to create a serial inference pipeline.

We start by building and specifying Spark ML and XGBoost-based models that we intend to use as part of the pipeline:

from sagemaker.model import Model
from sagemaker.pipeline_model import PipelineModel
from sagemaker.sparkml.model import SparkMLModel

# s3_model_bucket, s3_model_key_prefix, training_image, and the trained xgb_model
# estimator are assumed to be defined earlier in the notebook
sparkml_data = 's3://{}/{}/{}'.format(s3_model_bucket, s3_model_key_prefix, 'model.tar.gz')
sparkml_model = SparkMLModel(model_data=sparkml_data)
xgb_model = Model(model_data=xgb_model.model_data, image=training_image)

The models are then arranged sequentially within the pipeline model definition:

# timestamp_prefix and the SageMaker execution role are assumed to be defined earlier
model_name = 'serial-inference-' + timestamp_prefix
endpoint_name = 'serial-inference-ep-' + timestamp_prefix
sm_model = PipelineModel(name=model_name, role=role, models=[sparkml_model, xgb_model])

The inference pipeline is then deployed behind an endpoint for real-time inference by specifying the type and number of host ML instances:

sm_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)

The entire assembled inference pipeline can be considered a SageMaker model that you can use to make either real-time predictions or process batch transforms directly, without any external preprocessing. Within an inference pipeline model, SageMaker handles invocations as a sequence of HTTP requests originating from an external application. The first container in the pipeline handles the initial request, performs some processing, and then dispatches the intermediate response as a request to the second container in the pipeline. This happens for each container in the pipeline, and finally returns the final response to the calling client application.
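
To illustrate this request flow from the client side, the following is a minimal sketch of how an application might invoke the deployed pipeline endpoint through the SageMaker runtime API; the endpoint name, content type, and CSV payload are illustrative assumptions.

import boto3

# Hypothetical client-side invocation of the pipeline endpoint deployed above
runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

response = runtime.invoke_endpoint(
    EndpointName="serial-inference-ep-example",  # replace with your endpoint name
    ContentType="text/csv",                      # format expected by the first container (assumption)
    Body="4.6,3.1,1.5,0.2",                      # a single CSV feature row (illustrative)
)

# The response body carries the output of the last container in the pipeline
print(response["Body"].read().decode("utf-8"))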

SageMaker inference pipelines are fully managed. When the pipeline is deployed, SageMaker installs and runs all the defined containers on each of the Amazon Elastic Compute Cloud (Amazon EC2) instances provisioned as part of the endpoint or batch transform job. Furthermore, because the containers are co-located and hosted on the same EC2 instance, the overall pipeline latency is reduced.

Serial model ensemble using a SageMaker inference pipeline

An ensemble model is an approach in ML where multiple ML models are combined and used as part of the inference process to generate final inferences. The motivations for ensemble models could include improving accuracy, reducing model sensitivity to specific input features, and reducing single model bias, among others. In this post, we focus on the use cases related to a serial model ensemble, where multiple ML models are sequentially combined as part of a serial inference pipeline.

Let’s consider a specific example related to a serial model ensemble where we need to group a user’s uploaded images based on certain themes or topics. This pipeline could consist of three ML models:

  • Model 1 – Accepts an image as input and evaluates image quality based on image resolution, orientation, and more. This model then attempts to upscale the image quality and sends the processed images that meet a certain quality threshold to the next model (Model 2).
  • Model 2 – Accepts images validated through Model 1 and performs image recognition to identify objects, places, people, text, and other custom actions and concepts in images. The output from Model 2 that contains identified objects is sent to Model 3.
  • Model 3 – Accepts the output from Model 2 and performs natural language processing (NLP) tasks such as topic modeling for grouping images together based on themes. For example, images could be grouped based on location or people identified. The output (groupings) is sent back to the client application.

The following diagram illustrates a reference pattern of how to implement multiple ML models hosted on a serial model ensemble using a SageMaker inference pipeline.

Diagram: Serial model ensemble implemented with a SageMaker inference pipeline
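
As a sketch of how this three-model ensemble could be assembled with the SageMaker Python SDK (v2 parameter names), consider the following; the container image URIs, model artifact locations, and instance type are hypothetical placeholders rather than artifacts from this post.

import sagemaker
from sagemaker.model import Model
from sagemaker.pipeline_model import PipelineModel

role = sagemaker.get_execution_role()  # works inside a SageMaker notebook or Studio environment

# Hypothetical image URIs and S3 artifact locations for the three ensemble stages
image_quality_model = Model(
    image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/image-quality:latest",
    model_data="s3://example-bucket/models/image-quality/model.tar.gz",
    role=role,
)
image_recognition_model = Model(
    image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/image-recognition:latest",
    model_data="s3://example-bucket/models/image-recognition/model.tar.gz",
    role=role,
)
topic_grouping_model = Model(
    image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/topic-grouping:latest",
    model_data="s3://example-bucket/models/topic-grouping/model.tar.gz",
    role=role,
)

# Requests flow through the containers in the order listed here (Model 1 -> Model 2 -> Model 3)
ensemble = PipelineModel(
    name="image-grouping-ensemble",
    role=role,
    models=[image_quality_model, image_recognition_model, topic_grouping_model],
)
ensemble.deploy(initial_instance_count=1, instance_type="ml.c5.xlarge")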

As discussed earlier, the SageMaker inference pipeline is managed, which enables you to focus on the ML model selection and development, while reducing the undifferentiated heavy lifting associated with building the serial ensemble pipeline.

Additionally, some of the considerations discussed earlier around decoupling, algorithm and framework choice for model development, and deployment are relevant here as well. For instance, because each model is hosted on a separate container, you have flexibility in selecting the ML framework that best fits each model and your overall use case. Furthermore, from a decoupling and operational standpoint, you can continue to upgrade or modify individual steps much more easily, without affecting other models.

The SageMaker inference pipeline is also integrated with the SageMaker model registry for model cataloging, versioning, metadata management, and governed deployment to production environments to support consistent operational best practices. The SageMaker inference pipeline is also integrated with Amazon CloudWatch to enable monitoring the multi-container models in inference pipelines. You can also get visibility into real-time metrics to better understand invocations and latency for each container in the pipeline, which helps with troubleshooting and resource optimization.

Serial inference pipeline (with targeted model invocation from a group) using a SageMaker inference pipeline

SageMaker multi-model endpoints (MMEs) provide a cost-effective solution to deploy a large number of ML models behind a single endpoint. The motivations for using multi-model endpoints could include invoking a specific customized model based on request characteristics (such as origin, geographic location, or user personalization) or simply hosting multiple models behind the same endpoint to optimize cost.

When you deploy multiple models on a single multi-model enabled endpoint, all models share the compute resources and the model serving container. The SageMaker inference pipeline can be deployed on an MME, where one of the containers in the pipeline can dynamically serve requests based on the specific model being invoked. From a pipeline perspective, the models have identical preprocessing requirements and expect the same feature set, but are trained to align to a specific behavior. The following diagram illustrates a reference pattern of how this integrated pipeline would work.

Diagram: Serial inference pipeline with targeted model invocation using SageMaker multi-model endpoints

With MMEs, the inference request that originates from the client application should specify the target model that needs to be invoked. The first container in the pipeline handles the initial request, performs some processing, and then dispatches the intermediate response as a request to the second container in the pipeline, which hosts multiple models. Based on the target model specified in the inference request, the model is invoked to generate an inference. The generated inference is sent to the next container in the pipeline for further processing. This happens for each subsequent container in the pipeline, and finally SageMaker returns the final response to the calling client application.
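
As a minimal sketch of how a client specifies the target model, the SageMaker runtime API exposes a TargetModel parameter on invoke_endpoint; the endpoint name, artifact name, and payload below are illustrative assumptions.

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

# Hypothetical pipeline endpoint whose MME-enabled container hosts many models
response = runtime.invoke_endpoint(
    EndpointName="serial-inference-mme-ep",    # replace with your endpoint name
    TargetModel="customer-segment-a.tar.gz",   # model artifact (relative to the MME S3 prefix) to invoke
    ContentType="text/csv",
    Body="34,2,1,0.7",                         # illustrative feature row
)

print(response["Body"].read().decode("utf-8"))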

Multiple model artifacts are persisted in an S3 bucket. When a specific model is invoked, SageMaker dynamically loads it onto the container hosting the endpoint. If the model is already loaded in the container’s memory, invocation is faster because SageMaker doesn’t need to download the model from Amazon S3. If instance memory utilization is high and a new model is invoked and therefore needs to be loaded, unused models are unloaded from memory. The unloaded models remain in the instance’s storage volume, however, and can be loaded into the container’s memory later without being downloaded from the S3 bucket again.

One of the key considerations when using MMEs is understanding model invocation latency behavior. As discussed earlier, models are dynamically loaded into the memory of the container on the instance hosting the endpoint when they are invoked. Therefore, a model invocation may take longer the first time the model is invoked. When the model is already in the container’s memory, subsequent invocations are faster. If the instance’s memory utilization is high and a new model needs to be loaded, unused models are unloaded. If the instance’s storage volume is full, unused models are deleted from the storage volume. SageMaker fully manages the loading and unloading of the models, without you having to take any specific actions. However, it’s important to understand this behavior because it affects model invocation latency and therefore overall end-to-end latency.

Pipeline hosting options

SageMaker provides multiple instance type options to select from for deploying ML models and building out inference pipelines, based on your use case, throughput, and cost requirements. For example, you can choose CPU- or GPU-optimized instances to build serial inference pipelines, on a single container or across multiple containers. However, some workloads require the flexibility to run models on both CPU-based and GPU-based instances within the same pipeline.

You can now use NVIDIA Triton Inference Server to serve models for inference on SageMaker for heterogeneous compute requirements. Check out Deploy fast and scalable AI with NVIDIA Triton Inference Server in Amazon SageMaker for additional details.

Conclusion

As organizations discover and build new solutions powered by ML, the tools required for orchestrating these pipelines should be flexible enough to support a given use case, while simplifying and reducing ongoing operational overhead. SageMaker provides multiple options to design and build these serial inference workflows, based on your requirements.

We look forward to hearing from you about what use cases you’re building using serial inference pipelines. If you have questions or feedback, please share them in the comments.


About the authors

Rahul Sharma is a Senior Solutions Architect at AWS Data Lab, helping AWS customers design and build AI/ML solutions. Prior to joining AWS, Rahul has spent several years in the finance and insurance sector, helping customers build data and analytical platforms.

Anand Prakash is a Senior Solutions Architect at AWS Data Lab. Anand focuses on helping customers design and build AI/ML, data analytics, and database solutions to accelerate their path to production.

Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including the NLP and computer vision domains. He helps customers achieve high-performance model inference on SageMaker.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and making machine learning more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

Read More

NVIDIA, Oracle CEOs in Fireside Chat Light Pathways to Enterprise AI

Speeding adoption of enterprise AI and accelerated computing, Oracle CEO Safra Catz and NVIDIA founder and CEO Jensen Huang discussed their companies’ expanding collaboration in a fireside chat live streamed today from Oracle CloudWorld in Las Vegas.

Oracle and NVIDIA announced plans to bring NVIDIA’s full accelerated computing stack to Oracle Cloud Infrastructure (OCI). It includes NVIDIA AI Enterprise, NVIDIA RAPIDS for Apache Spark and NVIDIA Clara for healthcare.

In addition, OCI will deploy tens of thousands more NVIDIA GPUs to its cloud service, including A100 and upcoming H100 accelerators.

“I’m unbelievably excited to announce our renewed partnership and the expanded capabilities our cloud has,” said Catz to a live and online audience of several thousand customers and developers.

“We’re thrilled you’re bringing your AI solutions to OCI,” she told Huang.

The Power of Two

The combination of Oracle’s heritage in data and its powerful infrastructure with NVIDIA’s expertise in AI will give users traction in facing the tough challenges ahead, Huang said.

“Industries around the world need big benefits from our industry to find ways to do more without needing to spend more or consume more energy,” he said.

Panorama of the crowd at Oracle CloudWorld in Las Vegas.

AI and GPU-accelerated computing are delivering these benefits at a time when traditional methods of increasing performance are slowing, he added.

“Data that you harness to find patterns and relationships can automate the way you work and the products and services you deliver — the next ten years will be some of the most exciting times in our industry,” Huang said.

“I’m confident all workloads will be accelerated for better performance, to drive costs out and for energy efficiency,” he added.

The capability of today’s software and hardware, coming to the cloud, “is something we’ve dreamed about since our early days,” said Catz, who joined Oracle in 1999 and has been its CEO since 2014.

Benefits for Healthcare and Every Industry

“One of the most critical areas is saving lives,” she added, pointing to the two companies’ work in healthcare.

A revolution in digital biology is transforming healthcare from a science-driven industry to one powered by both science and engineering, and NVIDIA Clara provides a platform for that work, used by healthcare experts around the world, Huang said.

“We can now use AI to understand the language of proteins and chemicals, all the way to gene screening and quantum chemistry — amazing breakthroughs are happening now,” he said.

AI promises similar advances for every business. The automotive industry, for example, is becoming a tech industry as it discovers its smartphone moment, he said.

“We see this all over with big breakthroughs in natural language processing and large language models that can encode human knowledge to apply to all kinds of skills they were never trained to do,” he said.

The post NVIDIA, Oracle CEOs in Fireside Chat Light Pathways to Enterprise AI appeared first on NVIDIA Blog.

Read More