Streamline custom environment provisioning for Amazon SageMaker Studio: An automated CI/CD pipeline approach

Attaching a custom Docker image to an Amazon SageMaker Studio domain involves several steps. First, you need to build and push the image to Amazon Elastic Container Registry (Amazon ECR). You also need to make sure that the Amazon SageMaker domain execution role has the necessary permissions to pull the image from Amazon ECR. After the image is pushed to Amazon ECR, you create a SageMaker custom image on the AWS Management Console. Lastly, you update the SageMaker domain configuration to specify the custom image Amazon Resource Name (ARN). This multi-step process needs to be followed manually every time end-users create new custom Docker images to make them available in SageMaker Studio.
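The manual process maps to a handful of Docker and AWS CLI calls. The following sketch illustrates them with hypothetical placeholder values (account ID, Region, image name, role, and domain ID); it prints each command rather than executing it, so you can review the sequence before running it for real:

```shell
#!/usr/bin/env bash
# Sketch of the manual steps. Account ID, Region, image, role, and domain
# names are hypothetical placeholders -- substitute your own values.
ACCOUNT=123456789012
REGION=us-east-1
IMAGE=my-custom-image
ECR_URI="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/${IMAGE}"

run() { echo "+ $*"; }  # prints each command; swap echo for real execution

# 1. Build the image locally and push it to Amazon ECR
run docker build -t "${ECR_URI}:latest" .
run docker push "${ECR_URI}:latest"

# 2. Register a SageMaker custom image and an image version
run aws sagemaker create-image --image-name "${IMAGE}" \
  --role-arn "arn:aws:iam::${ACCOUNT}:role/SageMakerExecutionRole"
run aws sagemaker create-image-version --image-name "${IMAGE}" \
  --base-image "${ECR_URI}:latest"

# 3. Attach the custom image to the Studio domain
run aws sagemaker update-domain --domain-id d-xxxxxxxxxxxx \
  --default-user-settings file://custom-image-settings.json
```

The pipeline described in the rest of this post automates exactly this sequence.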

In this post, we explain how to automate this process. This approach allows you to update the SageMaker configuration without writing additional infrastructure code, provision custom images, and attach them to SageMaker domains. By adopting this automation, you can deploy consistent and standardized analytics environments across your organization, leading to increased team productivity and mitigating security risks associated with using one-time images.

The solution described in this post is geared towards machine learning (ML) engineers and platform teams who are often responsible for managing and standardizing custom environments at scale across an organization. For individual data scientists seeking a self-service experience, we recommend that you use the native Docker support in SageMaker Studio, as described in Accelerate ML workflows with Amazon SageMaker Studio Local Mode and Docker support. This feature allows data scientists to build, test, and deploy custom Docker containers directly within the SageMaker Studio integrated development environment (IDE), enabling you to iteratively experiment with your analytics environments seamlessly within the familiar SageMaker Studio interface.

Solution overview

The following diagram illustrates the solution architecture.

Solution Architecture

We deploy a pipeline using AWS CodePipeline, which automates custom Docker image creation and attachment of the image to a SageMaker domain. The pipeline first checks out the code base from the GitHub repo and creates custom Docker images based on the configuration declared in the config files. After successfully creating and pushing the Docker images to Amazon ECR, the pipeline validates them by scanning for security vulnerabilities. If no critical or high-severity vulnerabilities are found, the pipeline continues to the manual approval stage before deployment. After manual approval is complete, the pipeline deploys the SageMaker domain and attaches the custom images to the domain automatically.
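The severity gate in the validation stage can be sketched as follows. The repository and tag names are hypothetical, and a sample findings document stands in for a live `aws ecr describe-image-scan-findings` call; the real pipeline stage applies the same counting logic:

```shell
#!/usr/bin/env bash
# Sample scan findings standing in for a live call such as:
#   aws ecr describe-image-scan-findings \
#     --repository-name research-platform-ecr --image-id imageTag=jlab
findings='{"imageScanFindings":{"findingSeverityCounts":{"MEDIUM":3,"LOW":7}}}'

# Count CRITICAL and HIGH findings; any such finding fails the stage
blocking=$(echo "$findings" | python3 -c '
import json, sys
counts = json.load(sys.stdin)["imageScanFindings"].get("findingSeverityCounts", {})
print(counts.get("CRITICAL", 0) + counts.get("HIGH", 0))
')

if [ "$blocking" -gt 0 ]; then
  echo "Scan gate: FAIL (${blocking} critical/high findings)"
  exit 1
fi
echo "Scan gate: PASS"
```

With the sample document above, only medium and low findings are present, so the gate passes and the pipeline proceeds to manual approval.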

Prerequisites

The prerequisites for implementing the solution described in this post include:

  - An AWS account with permissions to create the resources described in this post
  - The AWS CLI installed and configured with credentials
  - Node.js and npm installed on your workstation
  - The AWS CDK v2 CLI installed
  - A GitHub account, to fork the repo and create the AWS CodeStar connection
  - An existing VPC with at least two subnets (or permissions to create one)

Deploy the solution

Complete the following steps to implement the solution:

  1. Log in to your AWS account using the AWS CLI in a shell terminal (for more details, see Authenticating with short-term credentials for the AWS CLI).
  2. Run the following command to make sure you have successfully logged in to your AWS account:
aws sts get-caller-identity
  3. Fork the GitHub repo to your GitHub account.
  4. Clone the forked repo to your local workstation using the following command:
git clone <clone_url_of_forked_repo>
  5. Log in to the console and create an AWS CodeStar connection to the GitHub repo you forked in the previous step. For instructions, see Create a connection to GitHub (console).
  6. Copy the ARN for the connection you created.
  7. Go to the terminal and run the following command to cd into the repository directory:
cd streamline-sagemaker-custom-images-cicd
  8. Run the following command to install all libraries from npm:
npm install
  9. Run the following commands in the terminal to export the input parameters for the deployment. The AWS CDK stack uses these values to deploy components such as CodePipeline, AWS CodeBuild, the Amazon ECR repository, and so on. Set VPC_ID to an existing VPC; if you don’t have one, create a VPC with at least two subnets and use that.
export AWS_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=<YOUR_AWS_REGION>
export VPC_ID=<VPC_ID_TO_DEPLOY>
export CODESTAR_CONNECTION_ARN=<CODE_STAR_CONNECTION_ARN_CREATED_IN_ABOVE_STEP>
export REPOSITORY_OWNER=<YOUR_GITHUB_LOGIN_ID>
  10. Run the following command to deploy the AWS infrastructure using the AWS CDK v2, and make sure to wait for the template to succeed:
cdk deploy PipelineStack --require-approval never
  11. On the CodePipeline console, choose Pipelines in the navigation pane.
  12. Choose the link for the pipeline named sagemaker-custom-image-pipeline.

Sagemaker custom image pipeline

  13. Follow the progress of the pipeline on the console and provide approval in the manual approval stage to deploy the SageMaker infrastructure. The pipeline takes approximately 5-8 minutes to build the image and move to the manual approval stage.
  14. Wait for the pipeline to complete the deployment stage.

The pipeline creates infrastructure resources in your AWS account with a SageMaker domain and a SageMaker custom image. It also attaches the custom image to the SageMaker domain.

  15. On the SageMaker console, choose Domains under Admin configurations in the navigation pane.

  16. Open the domain named team-ds and navigate to the Environment tab.

You should be able to see one custom image that is attached.
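You can also confirm the attachment from the AWS CLI instead of the console. The domain ID below is a placeholder (look yours up with `aws sagemaker list-domains`), and the exact query path can differ depending on which app type the image is attached to (for example, JupyterLab or KernelGateway); this sketch prints the command for review rather than executing it:

```shell
#!/usr/bin/env bash
run() { echo "+ $*"; }  # prints the command; swap echo for real execution

# Placeholder domain ID; the CustomImages list may live under
# JupyterLabAppSettings or KernelGatewayAppSettings depending on your setup.
run aws sagemaker describe-domain --domain-id d-xxxxxxxxxxxx \
  --query "DefaultUserSettings.KernelGatewayAppSettings.CustomImages"
```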

How custom images are deployed and attached

CodePipeline has a stage called BuildCustomImages that contains the automated steps to create a SageMaker custom image using the SageMaker Custom Image CLI and push it to the ECR repository created in the AWS account. The AWS CDK stack at the deployment stage has the required steps to create a SageMaker domain and attach a custom image to the domain. The parameters to create the SageMaker domain, custom image, and so on are configured in JSON format and used in the SageMaker stack under the lib directory. Refer to the sagemakerConfig section in environments/config.json for declarative parameters.
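As a rough sketch of that configuration file, the structure looks something like the following. Field names beyond those shown elsewhere in this post (such as domainName) are illustrative, so check environments/config.json in the repository for the authoritative schema:

```json
{
  "images": [
    {
      "repositoryName": "research-platform-ecr",
      "tags": ["jlab"]
    }
  ],
  "sagemakerConfig": {
    "domainName": "team-ds",
    "customImages": ["jlab"]
  }
}
```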

Add more custom images

Now you can add your own custom Docker image to attach to the SageMaker domain created by the pipeline. For the custom images you create, refer to Dockerfile specifications for the Docker image requirements.

  1. cd into the images directory in the repository in the terminal:
cd images
  1. Create a new directory (for example, custom) under the images directory:
mkdir custom
  1. Add your own Dockerfile to this directory. For testing, you can use the following Dockerfile config:
FROM public.ecr.aws/amazonlinux/amazonlinux:2
ARG NB_USER="sagemaker-user"
ARG NB_UID="1000"
ARG NB_GID="100"
RUN yum update -y && \
    yum install python3 python3-pip shadow-utils -y && \
    yum clean all
RUN yum install --assumeyes python3 shadow-utils && \
    useradd --create-home --shell /bin/bash --gid "${NB_GID}" --uid ${NB_UID} ${NB_USER} && \
    yum clean all && \
    python3 -m pip install jupyterlab
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install --upgrade urllib3==1.26.6
USER ${NB_UID}
CMD jupyter lab --ip 0.0.0.0 --port 8888 \
  --ServerApp.base_url="/jupyterlab/default" \
  --ServerApp.token='' \
  --ServerApp.allow_origin='*'
  4. Update the images section in the config.json file under the environments directory to add the name of the new image directory you created:
"images": [
      {
        "repositoryName": "research-platform-ecr",
        "tags": [
          "jlab",
          "custom" << Add here
        ]
      }
    ]
  5. Update the same image name in customImages under the created SageMaker domain configuration:
"customImages": [
    "jlab",
    "custom" << Add here
],
  6. Commit and push the changes to the GitHub repository.
  7. CodePipeline is triggered when you push. Follow the progress of the pipeline and provide manual approval for deployment.

After deployment is completed successfully, you should be able to see that the custom image you have added is attached to the domain configuration (as shown in the following screenshot).

Custom Image 2

Clean up

To clean up your resources, open the AWS CloudFormation console and delete the stacks SagemakerImageStack and PipelineStack, in that order. If you encounter errors such as “S3 Bucket is not empty” or “ECR Repository has images,” you can manually delete the S3 bucket and ECR repository that were created, then retry deleting the CloudFormation stacks.
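If you prefer the CLI for the manual cleanup, the following sketch shows the relevant commands. The bucket and repository names are hypothetical placeholders (find the real ones on the S3 and Amazon ECR consoles), and the script prints each command so you can review it before running it for real:

```shell
#!/usr/bin/env bash
# Placeholder names -- replace with the bucket and repository created in
# your account.
BUCKET=pipelinestack-artifact-bucket
REPO=research-platform-ecr

run() { echo "+ $*"; }  # prints each command; swap echo for real execution

# Empty and delete the S3 bucket ('rb --force' removes objects too)
run aws s3 rb "s3://${BUCKET}" --force

# Delete the ECR repository along with any images it still holds
run aws ecr delete-repository --repository-name "${REPO}" --force

# Then retry deleting the CloudFormation stacks, in order
run aws cloudformation delete-stack --stack-name SagemakerImageStack
run aws cloudformation delete-stack --stack-name PipelineStack
```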

Conclusion

In this post, we showed how to create an automated continuous integration and delivery (CI/CD) pipeline solution to build, scan, and deploy custom Docker images to SageMaker Studio domains. You can use this solution to promote consistency of the analytical environments for data science teams across your enterprise. This approach helps you achieve ML governance, scalability, and standardization.


About the Authors

Muni Annachi, a Senior DevOps Consultant at AWS, boasts over a decade of expertise in architecting and implementing software systems and cloud platforms. He specializes in guiding non-profit organizations to adopt DevOps CI/CD architectures, adhering to AWS best practices and the AWS Well-Architected Framework. Beyond his professional endeavors, Muni is an avid sports enthusiast and tries his luck in the kitchen.

Ajay Raghunathan is a Machine Learning Engineer at AWS. His current work focuses on architecting and implementing ML solutions at scale. He is a technology enthusiast and a builder with a core area of interest in AI/ML, data analytics, serverless, and DevOps. Outside of work, he enjoys spending time with family, traveling, and playing football.

Arun Dyasani is a Senior Cloud Application Architect at AWS. His current work focuses on designing and implementing innovative software solutions. His role centers on crafting robust architectures for complex applications, leveraging his deep knowledge and experience in developing large-scale systems.

Shweta Singh is a Senior Product Manager in the Amazon SageMaker Machine Learning platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles in Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Masters of Science in Financial Engineering, both from New York University.

Jenna Eun is a Principal Practice Manager for the Health and Advanced Compute team at AWS Professional Services. Her team focuses on designing and delivering data, ML, and advanced computing solutions for the public sector, including federal, state and local governments, academic medical centers, nonprofit healthcare organizations, and research institutions.

Meenakshi Ponn Shankaran is a Principal Domain Architect at AWS in the Data & ML Professional Services Org. He has extensive expertise in designing and building large-scale data lakes, handling petabytes of data. Currently, he focuses on delivering technical leadership to AWS US Public Sector clients, guiding them in using innovative AWS services to meet their strategic objectives and unlock the full potential of their data.
