AWS Deep Learning AMIs: New framework-specific DLAMIs for production complement the original multi-framework DLAMIs

Since its launch in November 2017, the AWS Deep Learning Amazon Machine Image (DLAMI) has been the preferred method for running deep learning frameworks on Amazon Elastic Compute Cloud (Amazon EC2). For deep learning practitioners and learners who want to accelerate deep learning in the cloud, the DLAMI comes pre-installed with AWS-optimized deep learning (DL) frameworks and their dependencies so you can get started right away with conducting research, developing machine learning (ML) applications, or educating yourself about deep learning. DLAMIs also make it easy to get going on instance types based on AWS-built processors such as Inferentia, Trainium, and Graviton, with all the necessary dependencies pre-installed.

The original DLAMI contained several popular frameworks such as PyTorch, TensorFlow, and MXNet, all in one bundle that AWS tested and supported on AWS instances. Although the multi-framework DLAMI lets developers explore various frameworks in a single image, some use cases require a smaller DLAMI that contains only a single framework. To support these use cases, we recently released DLAMIs that each contain a single framework. These framework-specific DLAMIs are smaller and less complex, which makes them better suited to production environments.

In this post, we describe the components of the framework-specific DLAMIs and compare the use cases of the framework-specific and multi-framework DLAMIs.

All the DLAMIs contain similar libraries. The PyTorch DLAMI and the TensorFlow DLAMI each contain all the drivers necessary to run the framework on AWS instances such as P3, P4, Trainium, and Graviton. The following table compares the DLAMIs and their components. More information can be found in the release notes.

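Component    | Framework-specific PyTorch 1.9.0 | Framework-specific TensorFlow 2.5.0 | Multi-framework (AL2 – v50)
-------------|----------------------------------|-------------------------------------|----------------------------
PyTorch      | 1.9.0                            | N/A                                 | 1.4.0 & 1.8.1
TensorFlow   | N/A                              | 2.5.0                               | 2.4.2, 2.3.3 & 1.15.5
NVIDIA CUDA  | 11.1.1                           | 11.2.2                              | 10.x, 11.x
NVIDIA cuDNN | 8.0.5                            | 8.1.1                               | N/A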
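
To confirm that a running instance matches the framework versions listed in the table, a short check along the following lines may help. This is a minimal sketch, not an official AWS tool: the distribution names `torch` and `tensorflow` are assumptions about how the frameworks are packaged on the image, and the script should be run inside whichever Python environment the DLAMI provides for the framework.

```python
from importlib import metadata

# Framework versions taken from the comparison table. The distribution
# names ("torch", "tensorflow") are assumptions, not stated in the post.
EXPECTED = {"torch": "1.9.0", "tensorflow": "2.5.0"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each distribution name to (expected, installed, matches)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        # Drop a local build tag such as "+cu111" before comparing.
        base = have.split("+")[0] if have else None
        report[name] = (want, have, base == want)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in compare_versions(EXPECTED).items():
        shown = have or "not installed"
        print(f"{name}: expected {want}, installed {shown}, match={ok}")
```

On a framework-specific DLAMI, only one of the two frameworks would be expected to report as installed.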

Eliminating other frameworks and their associated components makes each framework-specific DLAMI approximately 60% smaller (about 45 GB versus 110 GB). As described in the following section, this reduction in complexity and size has advantages for certain use cases.
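
The size figure can be checked with a line of arithmetic. The sketch below uses only the approximate sizes quoted above; the function name is illustrative.

```python
def percent_smaller(new_gb, old_gb):
    """Percentage by which new_gb is smaller than old_gb."""
    return (old_gb - new_gb) / old_gb * 100


# Approximate image sizes quoted in the text.
reduction = percent_smaller(45, 110)
print(f"{reduction:.1f}% smaller")  # prints "59.1% smaller"
```

The result, roughly 59%, is consistent with the "approximately 60%" stated in the text.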

DLAMI use cases

The multi-framework DLAMI has, until now, been the default for AWS developers doing deep learning on Amazon EC2, because it lets developers explore and compare different frameworks within a single AMI. It remains a great solution for use cases focused on research, development, and education. It comes preinstalled with the deep learning infrastructure for TensorFlow, PyTorch, and MXNet, so developers don’t have to spend any time installing the libraries and components specific to each framework, and they can experiment with the latest versions of the most popular frameworks. This one-stop shop means that you can focus on your deep learning tasks instead of MLOps and driver configuration.

Some examples of use cases for the multi-framework DLAMI include:

  • Medical research – Research scientists want to develop models that detect malignant tumors and want to compare performance between deep learning frameworks to achieve the highest performance metrics possible
  • Deep learning college course – College students learning to train deep learning models can choose from the multiple frameworks installed on the DLAMI in a Jupyter environment
  • Developing a model for a mobile app – Developers use the multi-framework DLAMI to develop multiple models for their voice assistant mobile app using a combination of deep learning frameworks

When deploying to a production environment, however, developers may require only a single framework and its related dependencies. The lightweight, framework-specific DLAMIs provide a more streamlined image that minimizes dependencies. In addition to having a smaller footprint, they reduce the attack surface and, because they include fewer libraries, offer more consistent compatibility across versions. Their lower complexity also makes them more reliable as developers upgrade versions in production environments.

Some examples of use cases for framework-specific DLAMIs include:

  • Deploying an ML-based credit underwriting model – A finance startup wants to deploy a highly reliable and available inference endpoint that scales out faster during demand spikes
  • Batch processing of video – A film company creates a command line application that uses deep learning to interpolate pixels and increase the resolution of low-resolution digital video files
  • Training a framework-specific model – A mobile app startup needs to train a model using TensorFlow because their app development stack requires a TensorFlow Lite compiled model

Conclusion

DLAMIs have become the go-to image for deep learning on Amazon EC2. Framework-specific DLAMIs now build on that success by providing images optimized for production use cases. Like the multi-framework DLAMIs, the single-framework images remove the heavy lifting that developers would otherwise face when building and maintaining deep learning applications. With the launch of the new, lightweight framework-specific DLAMIs, developers have more choices for accelerated deep learning on Amazon EC2.

Get started with single-framework DLAMIs today by following this tutorial and selecting a framework-specific Deep Learning AMI in the Launch Wizard.
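
For scripted launches, the image can also be looked up programmatically instead of through the Launch Wizard. The following is a minimal sketch, assuming the boto3 SDK and valid AWS credentials for the live lookup. The AMI name pattern and the sample image IDs are illustrative assumptions; check the release notes for the exact image name in your Region.

```python
def build_image_filters(name_pattern):
    """Build the Filters argument for the EC2 DescribeImages call."""
    return [
        {"Name": "name", "Values": [name_pattern]},
        {"Name": "state", "Values": ["available"]},
    ]


def newest_image(images):
    """Return the most recently created image, or None for an empty list.

    CreationDate is an ISO 8601 UTC string, so string order is time order.
    """
    return max(images, key=lambda img: img["CreationDate"], default=None)


def find_latest_dlami(name_pattern, region):
    """Query EC2 for Amazon-owned AMIs matching name_pattern.

    Requires boto3 and AWS credentials; not exercised in the demo below.
    """
    import boto3  # imported lazily so the helpers above work without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_images(
        Owners=["amazon"], Filters=build_image_filters(name_pattern)
    )
    return newest_image(response["Images"])


if __name__ == "__main__":
    # Made-up records shaped like DescribeImages output, for illustration.
    sample = [
        {"ImageId": "ami-00000000000000001",
         "CreationDate": "2021-07-01T00:00:00.000Z"},
        {"ImageId": "ami-00000000000000002",
         "CreationDate": "2021-09-01T00:00:00.000Z"},
    ]
    print(newest_image(sample)["ImageId"])  # prints "ami-00000000000000002"
```

With credentials configured, a call such as `find_latest_dlami("Deep Learning AMI GPU PyTorch 1.9.0 (Amazon Linux 2) *", "us-east-1")` would return the newest matching image record, assuming that name pattern matches a published image.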


About the Authors

Francisco Calderon is a Data Scientist in the Amazon ML Solutions Lab. As a member of the ML Solutions Lab, he helps solve critical business problems for AWS customers using deep learning. In his spare time, Francisco likes to play music and guitar, play soccer with his daughters, and enjoy time with his family.

Corey Barrett is a Data Scientist in the Amazon ML Solutions Lab. As a member of the ML Solutions Lab, he uses machine learning and deep learning to solve critical business problems for AWS customers. Outside of work, you can find him enjoying the outdoors, sipping on scotch, and spending time with his family.
