Build an end-to-end MLOps pipeline using Amazon SageMaker Pipelines, GitHub, and GitHub Actions

Machine learning (ML) models do not operate in isolation. To deliver value, they must integrate into existing production systems and infrastructure, which necessitates considering the entire ML lifecycle during design and development. ML operations, known as MLOps, focus on streamlining, automating, and monitoring ML models throughout their lifecycle. Building a robust MLOps pipeline demands cross-functional collaboration. Data scientists, ML engineers, IT staff, and DevOps teams must work together to operationalize models from research to deployment and maintenance. With the right processes and tools, MLOps enables organizations to reliably and efficiently adopt ML across their teams.

Although the requirements of continuous integration and continuous delivery (CI/CD) pipelines can be unique and reflect each organization’s needs, scaling MLOps practices across teams can be simplified by using managed orchestrations and tools that can accelerate the development process and remove the undifferentiated heavy lifting.

Amazon SageMaker MLOps is a suite of features that includes Amazon SageMaker Projects (CI/CD), Amazon SageMaker Pipelines, and Amazon SageMaker Model Registry.

SageMaker Pipelines allows for straightforward creation and management of ML workflows, while also offering storage and reuse capabilities for workflow steps. The SageMaker Model Registry centralizes model tracking, simplifying model deployment. SageMaker Projects introduces CI/CD practices to ML, including environment parity, version control, testing, and automation. This allows for a quick establishment of CI/CD in your ML environment, facilitating effective scalability throughout your enterprise.

The built-in project templates provided by Amazon SageMaker include integration with some third-party tools, such as Jenkins for orchestration and GitHub for source control, and several use AWS native CI/CD tools such as AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild. In many scenarios, however, customers want to integrate SageMaker Pipelines with other existing CI/CD tools and therefore need to create their own custom project templates.

In this post, we show you a step-by-step implementation to achieve the following:

  • Create a custom SageMaker MLOps project template that integrates with GitHub and GitHub Actions
  • Make your custom project templates available in Amazon SageMaker Studio for your data science team with one-click provisioning

Solution overview

In this post, we construct the following architecture. We create an automated model build pipeline that includes steps for data preparation, model training, model evaluation, and registration of the trained model in the SageMaker Model Registry. The resulting trained ML model is then deployed from the SageMaker Model Registry to staging and production environments upon manual approval.
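To make the ordering of the build stages concrete, the following is a minimal sketch that models them as a small dependency graph using only the Python standard library. It is not SageMaker Pipelines code, and the step names are illustrative assumptions rather than identifiers from the code sample.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps it depends on.
# Step names are illustrative, not identifiers from the sample repository.
BUILD_PIPELINE = {
    "prepare_data": set(),
    "train_model": {"prepare_data"},
    "evaluate_model": {"train_model"},
    "register_model": {"evaluate_model"},
}


def execution_order(dag):
    """Return one valid execution order for the steps in the graph."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(BUILD_PIPELINE)))
```

Because each step depends only on the one before it, there is exactly one valid order here. A real pipeline can branch, for example by registering the model only when evaluation meets a threshold.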

Solution Overview

Let’s delve into the elements of this architecture to understand the complete configuration.

GitHub and GitHub Actions

GitHub is a web-based platform that provides version control and source code management using Git. It enables teams to collaborate on software development projects, track changes, and manage code repositories. GitHub serves as a centralized location to store, version, and manage your ML code base. This ensures that your ML code base and pipelines are versioned, documented, and accessible by team members.

GitHub Actions is a powerful automation tool within the GitHub ecosystem. It allows you to create custom workflows that automate your software development lifecycle processes, such as building, testing, and deploying code. You can create event-driven workflows triggered by specific events, like when code is pushed to a repository or a pull request is created. When implementing MLOps, you can use GitHub Actions to automate various stages of the ML pipeline, such as:

  • Data validation and preprocessing
  • Model training and evaluation
  • Model deployment and monitoring
  • CI/CD for ML models

With GitHub Actions, you can streamline your ML workflows and ensure that your models are consistently built, tested, and deployed, leading to more efficient and reliable ML deployments.

In the following sections, we start by setting up the prerequisites relating to some of the components that we use as part of this architecture:

  • AWS CloudFormation – AWS CloudFormation initiates the model deployment and establishes the SageMaker endpoints after the model deployment pipeline is activated by the approval of the trained model.
  • AWS CodeStar connection – We use an AWS CodeStar connection to link the GitHub repository with AWS resources such as SageMaker Studio.
  • Amazon EventBridge – Amazon EventBridge keeps track of all modifications to the model registry. It also maintains a rule that prompts the Lambda function to deploy the model pipeline when the status of the model package version changes from PendingManualApproval to Approved within the model registry.
  • AWS Lambda – We use an AWS Lambda function to initiate the model deployment workflow in GitHub Actions after a model version is approved in the model registry.
  • Amazon SageMaker – We configure the following SageMaker components:
    • Pipeline – This component consists of a directed acyclic graph (DAG) that helps us build the automated ML workflow for the stages of data preparation, model training, and model evaluation. The final step registers the trained model in the model registry, described below.
    • Endpoint – This component sets up two HTTPS real-time endpoints for inference, one for staging and one for production. The hosting configuration can be adjusted, for instance, for batch transform or asynchronous inference. The staging endpoint is created when the model deployment pipeline is activated by the approval of the trained model in the SageMaker Model Registry. We use this endpoint to validate that the deployed model provides predictions that satisfy our accuracy standards. When the model is ready for production, the production endpoint is deployed after a manual approval stage in the GitHub Actions workflow.
    • Code repository – This creates a Git repository as a resource in your SageMaker account. When you create the SageMaker project, you provide the details of your existing GitHub repository, and SageMaker creates an association with that repository. This links the GitHub repository to SageMaker so that you can interact with it (pull and push) from SageMaker.
    • Model registry – This monitors the various versions of the model and the corresponding artifacts, which includes lineage and metadata. A collection known as a model package group is created, housing related versions of the model. Moreover, the model registry oversees the approval status of the model version, ensuring its readiness for subsequent deployment.
  • AWS Secrets Manager – We create a secret in AWS Secrets Manager to securely store the GitHub personal access token.
  • AWS Service Catalog – We use the AWS Service Catalog for the implementation of SageMaker projects, which include components like a SageMaker code repository, Lambda function, EventBridge rule, artifact S3 bucket, etc., all implemented via CloudFormation. This allows your organization to use project templates repeatedly, allocate projects to each user, and streamline operations.
  • Amazon S3 – We use an Amazon Simple Storage Service (Amazon S3) bucket to keep the model artifacts produced by the pipeline.
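To illustrate how the EventBridge rule described above can select only approval events, the following sketch builds an event pattern for SageMaker model package state changes and checks sample events against it. The matcher is a deliberately simplified stand-in for EventBridge's own matching (exact values only), and the group name my-project-models is a hypothetical placeholder. The rule in the provided CloudFormation template may differ in its details.

```python
def approval_event_pattern(model_package_group):
    """Event pattern that selects approvals in one model package group."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [model_package_group],
            "ModelApprovalStatus": ["Approved"],
        },
    }


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event,
    and the event value must be one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


if __name__ == "__main__":
    pattern = approval_event_pattern("my-project-models")  # hypothetical name
    sample = {
        "source": "aws.sagemaker",
        "detail-type": "SageMaker Model Package State Change",
        "detail": {
            "ModelPackageGroupName": "my-project-models",
            "ModelApprovalStatus": "Approved",
        },
    }
    print(matches(pattern, sample))
```

With a pattern like this, a model version that is still in PendingManualApproval does not invoke the Lambda function, so the deployment workflow starts only after approval.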

Prerequisites

Before implementing the solution, complete the setup steps described in the following sections.

Set up an AWS CodeStar connection

If you don’t already have an AWS CodeStar connection to your GitHub account, refer to Create a connection to GitHub for instructions to create one. Your AWS CodeStar connection ARN will look like this:

arn:aws:codestar-connections:us-west-2:account_id:connection/aEXAMPLE-8aad-4d5d-8878-dfcab0bc441f

In this example, aEXAMPLE-8aad-4d5d-8878-dfcab0bc441f is the unique ID for this connection. We use this ID when we create our SageMaker project later in this example.
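If you prefer to extract the unique ID programmatically rather than copy it by hand, a small helper like the following can do it. This is a minimal sketch. The function name is our own, and accepting the codeconnections service name alongside codestar-connections is an assumption about newer ARNs rather than something this example requires.

```python
def connection_id_from_arn(arn):
    """Return the unique connection ID from a connection ARN."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    # Assumption: newer connections may use the "codeconnections" service name.
    services = {"codestar-connections", "codeconnections"}
    if len(parts) != 6 or parts[0] != "arn" or parts[2] not in services:
        raise ValueError(f"Not a connection ARN: {arn!r}")
    resource_type, _, connection_id = parts[5].partition("/")
    if resource_type != "connection" or not connection_id:
        raise ValueError(f"Unexpected resource in ARN: {arn!r}")
    return connection_id


if __name__ == "__main__":
    example = (
        "arn:aws:codestar-connections:us-west-2:account_id:"
        "connection/aEXAMPLE-8aad-4d5d-8878-dfcab0bc441f"
    )
    print(connection_id_from_arn(example))
```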

Set up secret access keys for your GitHub token

To securely store your GitHub personal access token, you need to create a secret in Secrets Manager. If you don’t have a personal access token for GitHub, refer to Managing your personal access tokens for instructions to create one.

You can create either a classic or fine-grained access token. However, make sure that the token has access to the repository’s contents and actions (workflows, runs, and artifacts).

Complete the following steps to store your token in Secrets Manager:

  1. On the Secrets Manager console, choose Store a new secret.
  2. Select Other type of secret for Choose secret type.
  3. Provide a name for your secret in the Key field and add your personal access token to the corresponding Value field.
  4. Choose Next, enter a name for your secret, and choose Next again.
  5. Choose Store to save your secret.

By storing your GitHub personal access token in Secrets Manager, you can securely access it within your MLOps pipeline while ensuring its confidentiality.
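As an illustration of how code in the pipeline might read the token back, the following sketch parses the key-value secret returned by Secrets Manager. It assumes the secret was stored as a key-value pair as in the steps above, which Secrets Manager returns as a JSON string. The function name and the secret and key names in the usage comment are hypothetical, and the Lambda function in the code sample may retrieve the token differently.

```python
import json


def read_github_token(secrets_client, secret_name, key):
    """Fetch a key-value secret and return the value stored under `key`.

    `secrets_client` is expected to be a boto3 Secrets Manager client,
    or any object that exposes a compatible get_secret_value method.
    """
    response = secrets_client.get_secret_value(SecretId=secret_name)
    secret = json.loads(response["SecretString"])
    return secret[key]


# Usage with boto3 (requires AWS credentials; names are placeholders):
#   import boto3
#   client = boto3.client("secretsmanager")
#   token = read_github_token(client, "my-github-secret", "my-github-token-key")
```

Passing the client in as an argument keeps the function easy to test with a stub and avoids creating a new client on every call.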

Create an IAM user for GitHub Actions

To allow GitHub Actions to deploy SageMaker endpoints in your AWS environment, you need to create an AWS Identity and Access Management (IAM) user and grant it the necessary permissions. For instructions, refer to Creating an IAM user in your AWS account. Use the iam/GithubActionsMLOpsExecutionPolicy.json file (provided in the code sample) to provide sufficient permissions for this user to deploy your endpoints.

After you create the IAM user, generate an access key. You will use this key, which consists of both an access key ID and a secret access key, in the subsequent step when configuring your GitHub secrets.

Set up your GitHub account

The following are the steps to prepare your GitHub account to run this example.

Clone the GitHub repository

You can reuse an existing GitHub repo for this example. However, it’s easier if you create a new repository. This repository is going to contain all the source code for both SageMaker pipeline builds and deployments.

Copy the contents of the seed code directory into the root of your GitHub repository. For instance, the .github directory should be under the root of your GitHub repository.

Create a GitHub secret containing your IAM user access key

In this step, we store the access key details of the newly created user in our GitHub secret.

  1. On the GitHub website, navigate to your repository and choose Settings.
  2. In the Security section, choose Secrets and variables, then choose Actions.
  3. Choose New repository secret.
  4. For Name, enter AWS_ACCESS_KEY_ID.
  5. For Secret, enter the access key ID associated with the IAM user you created earlier.
  6. Choose Add secret.
  7. Repeat the same procedure for AWS_SECRET_ACCESS_KEY, using the secret access key of the same IAM user as the value.

Configure your GitHub environments

To create a manual approval step in our deployment pipelines, we use a GitHub environment. Complete the following steps:

  1. Navigate to the Settings, Environments menu of your GitHub repository and create a new environment called production.
  2. For Environment protection rules, select Required reviewers.
  3. Add the desired GitHub user names as reviewers. For this example, you can choose your own user name.

Note that the environment feature is not available in some types of GitHub plans. For more information, refer to Using environments for deployment.

Deploy the Lambda function

In the following steps, we compress lambda_function.py into a .zip file, which is then uploaded to an S3 bucket.

The relevant code sample for this can be found in the following GitHub repo. Specifically, the lambda_function.py is located in the lambda_functions/lambda_github_workflow_trigger directory.

It’s recommended to create a fork of the code sample and clone that instead. This will give you the freedom to modify the code and experiment with different aspects of the sample.

  1. After you obtain a copy of the code, navigate to the appropriate directory and use the zip command to compress lambda_function.py. Alternatively, Windows and macOS users can use their native file manager (File Explorer or Finder, respectively) to generate the .zip file.
cd lambda_functions/lambda_github_workflow_trigger
zip lambda-github-workflow-trigger.zip lambda_function.py
  1. Upload the lambda-github-workflow-trigger.zip to an S3 bucket.

This bucket will later be accessed by Service Catalog. You can choose any bucket that you have access to, as long as Service Catalog is able to retrieve data from it in subsequent steps.

From this step onwards, we require the AWS CLI v2 to be installed and configured. An alternative would be to utilize AWS CloudShell, which comes with all necessary tools pre-installed, eliminating the need for any additional configurations.

  1. To upload the file to the S3 bucket, use the following command:
aws s3 cp lambda-github-workflow-trigger.zip s3://your-bucket/
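If the zip command is not available on your system, the archive can also be created with Python's standard library. The following is a minimal sketch. The function name is our own, and the upload shown in the trailing comment assumes boto3 is installed and that your-bucket is replaced with your bucket name.

```python
import zipfile
from pathlib import Path


def zip_lambda_source(source, archive):
    """Create a .zip archive containing a single source file at its root."""
    source, archive = Path(source), Path(archive)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps only the file name, so the handler sits at the zip root.
        zf.write(source, arcname=source.name)
    return archive


# Usage, run from lambda_functions/lambda_github_workflow_trigger:
#   archive = zip_lambda_source("lambda_function.py",
#                               "lambda-github-workflow-trigger.zip")
#
# Optional upload with boto3 instead of the AWS CLI:
#   import boto3
#   boto3.client("s3").upload_file(str(archive), "your-bucket", archive.name)
```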

Now we construct a Lambda layer for the dependencies related to the lambda_function we just uploaded.

  1. Set up a Python virtual environment and get the dependencies installed:
mkdir lambda_layer
cd lambda_layer
python3 -m venv .env
source .env/bin/activate
pip install pygithub
deactivate
  1. Generate the .zip file with the following commands:
mv .env/lib/python3.9/site-packages/ python
zip -r layer.zip python
  1. Publish the layer to AWS:
aws lambda publish-layer-version --layer-name python39-github-arm64 \
  --description "Python3.9 pygithub" \
  --license-info "MIT" \
  --zip-file fileb://layer.zip \
  --compatible-runtimes python3.9 \
  --compatible-architectures "arm64"
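The mv and zip commands above assume the virtual environment was created with Python 3.9. As a hedged alternative, the following sketch builds the same layer layout (all packages under a top-level python/ directory) with the standard library and derives the site-packages path from the interpreter that runs it. The function names are our own, the path layout applies to POSIX virtual environments, and the interpreter version should still match the Lambda runtime you publish for.

```python
import sys
import zipfile
from pathlib import Path


def site_packages_dir(venv):
    """Locate site-packages in a POSIX virtual environment created by the
    interpreter running this script (Windows uses a different layout)."""
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return Path(venv) / "lib" / version / "site-packages"


def build_layer_zip(site_packages, archive):
    """Zip every file under site_packages beneath a top-level python/ folder."""
    site_packages, archive = Path(site_packages), Path(archive)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(site_packages.rglob("*")):
            if path.is_file():
                arcname = Path("python") / path.relative_to(site_packages)
                zf.write(path, arcname=arcname.as_posix())
    return archive


# Usage, run from the lambda_layer directory after pip install:
#   build_layer_zip(site_packages_dir(".env"), "layer.zip")
```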

With this layer published, all your Lambda functions can now reference it to meet their dependencies. For a more detailed understanding of Lambda layers, refer to Working with Lambda layers.
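For context on what the Lambda function does with this dependency, the following sketch shows the kind of request that starts a GitHub Actions workflow run. The function in the code sample presumably uses PyGithub, given the layer we just built. This sketch instead uses the GitHub REST API workflow dispatch endpoint through the standard library, so it is an illustration of the idea rather than the sample's implementation. It assumes the workflow declares a workflow_dispatch trigger, and the function name, default ref, and argument values are our own.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, token, ref="main"):
    """Build (but do not send) a request that triggers a workflow run."""
    url = (
        f"{GITHUB_API}/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Sending the request (needs a valid token and network access):
#   request = build_dispatch_request("my-org", "my-repo", "deploy.yml", token)
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

Separating request construction from sending keeps the token handling and URL format easy to inspect before anything is sent to GitHub.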

Create a custom project template in SageMaker

After completing the steps above, we have all the CI/CD pipeline resources and components. Next, we demonstrate how to make these resources available as a custom project in SageMaker Studio, accessible through one-click deployment.

As discussed earlier, when the SageMaker-provided templates don’t meet your needs (for example, you want more complex orchestration in CodePipeline with multiple stages or custom approval steps, or you want to integrate with a third-party tool such as GitHub and GitHub Actions, as demonstrated in this post), you can create your own templates. We recommend starting with the SageMaker-provided templates to understand how to organize your code and resources, and then building on top of them. For more details, refer to Create Custom Project Templates.

Note that you can also automate this step and use CloudFormation to deploy the Service Catalog portfolio and product as code. In this post, however, we show the console deployment for a better learning experience.
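As a rough idea of what that automation could look like, the following sketch generates a CloudFormation template (as JSON) that mirrors the console steps in this section: a portfolio, a product tagged for Studio visibility, a launch role constraint, and access for the Studio role. It is a starting point under stated assumptions, not a template from the code sample. The parameter names LaunchRoleArn and StudioRoleArn, the provider and owner value, and the template_url argument are all hypothetical.

```python
import json


def service_catalog_template(template_url):
    """Return a CloudFormation template (as a dict) for the portfolio setup."""
    portfolio = {"Ref": "Portfolio"}
    product = {"Ref": "Product"}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Hypothetical parameter names; pass the role ARNs at deploy time.
            "LaunchRoleArn": {"Type": "String"},
            "StudioRoleArn": {"Type": "String"},
        },
        "Resources": {
            "Portfolio": {
                "Type": "AWS::ServiceCatalog::Portfolio",
                "Properties": {
                    "DisplayName": "SageMaker Organization Templates",
                    "ProviderName": "ML platform team",  # placeholder
                },
            },
            "Product": {
                "Type": "AWS::ServiceCatalog::CloudFormationProduct",
                "Properties": {
                    "Name": "build-deploy-github",
                    "Owner": "ML platform team",  # placeholder
                    "ProvisioningArtifactParameters": [
                        {"Name": "1.0", "Info": {"LoadTemplateFromURL": template_url}}
                    ],
                    "Tags": [
                        {"Key": "sagemaker:studio-visibility", "Value": "true"}
                    ],
                },
            },
            "Association": {
                "Type": "AWS::ServiceCatalog::PortfolioProductAssociation",
                "Properties": {"PortfolioId": portfolio, "ProductId": product},
            },
            "LaunchConstraint": {
                "Type": "AWS::ServiceCatalog::LaunchRoleConstraint",
                "DependsOn": "Association",
                "Properties": {
                    "PortfolioId": portfolio,
                    "ProductId": product,
                    "RoleArn": {"Ref": "LaunchRoleArn"},
                },
            },
            "StudioAccess": {
                "Type": "AWS::ServiceCatalog::PortfolioPrincipalAssociation",
                "Properties": {
                    "PortfolioId": portfolio,
                    "PrincipalARN": {"Ref": "StudioRoleArn"},
                    "PrincipalType": "IAM",
                },
            },
        },
    }


if __name__ == "__main__":
    url = "https://your-bucket.s3.amazonaws.com/template.yml"  # placeholder
    print(json.dumps(service_catalog_template(url), indent=2))
```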

At this stage, we use the provided CloudFormation template to create a Service Catalog portfolio that helps us create custom projects in SageMaker.

You can create a new domain or reuse your SageMaker domain for the following steps. If you don’t have a domain, refer to Onboard to Amazon SageMaker Domain using Quick setup for setup instructions.

After you enable administrator access to the SageMaker templates, complete the following steps:

  1. On the Service Catalog console, under Administration in the navigation pane, choose Portfolios.
  2. Choose Create a new portfolio.
  3. Name the portfolio “SageMaker Organization Templates”.
  4. Download the template.yml file to your computer.

This CloudFormation template provisions all the CI/CD resources we need as configuration and infrastructure as code. You can study the template in more detail to see what resources are deployed as part of it. This template has been customized to integrate with GitHub and GitHub Actions.

  1. In the template.yml file, change the S3Bucket value to your bucket where you have uploaded the Lambda .zip file:
GitHubWorkflowTriggerLambda:
  ...
  Code:
    S3Bucket: <your-bucket>
    S3Key: lambda-github-workflow-trigger.zip
  ...
  1. Choose the new portfolio.
  2. Choose Upload a new product.
  3. For Product name, enter a name for your template. We use the name build-deploy-github.
  4. For Description, enter a description.
  5. For Owner, enter your name.
  6. Under Version details, for Method, choose Use a template file.
  7. Choose Upload a template.
  8. Upload the template you downloaded.
  9. For Version title, enter 1.0.
  10. Choose Review.
  11. Review your settings and choose Create product.
  12. Choose Refresh to list the new product.
  13. Choose the product you just created.
  14. On the Tags tab, add the following tag to the product:
    • Key – sagemaker:studio-visibility
    • Value – true

Back in the portfolio details, you should see something similar to the following screenshot (with different IDs).

Service Catalog Portfolio

  1. On the Constraints tab, choose Create constraint.
  2. For Product, choose build-deploy-github (the product you just created).
  3. For Constraint type, choose Launch.
  4. Under Launch constraint, for Method, choose Select IAM role.
  5. Choose AmazonSageMakerServiceCatalogProductsLaunchRole.
  6. Choose Create.
  7. On the Groups, roles, and users tab, choose Add groups, roles, users.
  8. On the Roles tab, select the role you used when configuring your SageMaker Studio domain. You can find this role in the settings of your SageMaker domain.

Service Catalog Launch Constraint

  1. Choose Add access.

Deploy the project from SageMaker Studio

In the previous sections, you prepared the custom MLOps project environment. Now, let’s create a project using this template:

  1. On the SageMaker console, navigate to the domain in which you want to create this project.
  2. On the Launch menu, choose Studio.

You’ll be redirected to the SageMaker Studio environment.

  1. In SageMaker Studio, in the navigation pane under Deployments, choose Projects.
  2. Choose Create project.
  3. At the top of the list of templates, choose Organization templates.

If you completed all the previous steps successfully, you should see a new custom project template named build-deploy-github.

  1. Select that template and choose Select Project Template.
  2. Enter an optional description.
  3. For GitHub Repository Owner Name, enter the owner of your GitHub repository. For example, if your repository is at https://github.com/pooyavahidi/my-repo, the owner would be pooyavahidi.
  4. For GitHub Repository Name, enter the name of the repository into which you copied the seed code. It would be just the name of the repo. For example, in https://github.com/pooyavahidi/my-repo, the repo is my-repo.
  5. For Codestar connection unique ID, enter the unique ID of the AWS CodeStar connection that you created.
  6. For Name of the secret in the Secrets Manager which stores GitHub token, enter the name of the secret in Secrets Manager where you created and stored the GitHub token.
  7. For GitHub workflow file for deployment, enter the name of the GitHub workflow file (at .github/workflows/deploy.yml) where you have the deployment instructions. For this example, you can keep it as default, which is deploy.yml.
  8. Choose Create project.

SageMaker Studio Project

  1. After creating your project, make sure you update the AWS_REGION and SAGEMAKER_PROJECT_NAME environment variables in your GitHub workflow files accordingly. Workflow files are in your GitHub repo (copied from the seed code), inside the .github/workflows directory. Make sure you update both build.yml and deploy.yml files.
...
env:
  AWS_REGION: <region>   
  SAGEMAKER_PROJECT_NAME: <your project name>
...
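If you would rather script this change than edit both files by hand, the following sketch rewrites the two values in a workflow file's text. It is a minimal example that assumes each variable appears on its own line in the form shown above. The function name and the values in the usage comment are our own.

```python
import re


def set_workflow_env(text, region, project_name):
    """Return workflow text with both environment values replaced."""
    values = {"AWS_REGION": region, "SAGEMAKER_PROJECT_NAME": project_name}
    for name, value in values.items():
        # Keep the original indentation and key, replace the rest of the line.
        pattern = re.compile(rf"^([ \t]*{name}:).*$", flags=re.MULTILINE)
        text, count = pattern.subn(lambda m, v=value: f"{m.group(1)} {v}", text)
        if count == 0:
            raise ValueError(f"{name} not found in workflow file")
    return text


# Usage, run from the root of your repository (values are placeholders):
#   from pathlib import Path
#   for name in ("build.yml", "deploy.yml"):
#       path = Path(".github/workflows") / name
#       path.write_text(set_workflow_env(path.read_text(), "us-west-2", "my-project"))
```

Raising an error when a variable is missing makes it harder to push a workflow file that was silently left unchanged.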

Now your environment is ready to go! You can run the pipelines directly, make changes, and push those changes to your GitHub repository to trigger the automated build pipeline and see how all the steps of build and deploy are automated.

Clean up

To clean up the resources, complete the following steps:

  • Delete the CloudFormation stacks used for the SageMaker project and SageMaker endpoints.
  • Delete the SageMaker domain.
  • Delete the Service Catalog resources.
  • Delete the AWS CodeStar connection to the GitHub repository.
  • Delete the IAM user that you created for GitHub Actions.
  • Delete the secret in Secrets Manager that stores the GitHub personal access token.

Summary

In this post, we walked through the process of using a custom SageMaker MLOps project template to automatically construct and organize a CI/CD pipeline. This pipeline effectively integrates your existing CI/CD mechanisms with SageMaker capabilities for data manipulation, model training, model approval, and model deployment. In our scenario, we focused on integrating GitHub Actions with SageMaker projects and pipelines. For a comprehensive understanding of the implementation details, visit the GitHub repository. Feel free to experiment with this and don’t hesitate to leave any queries you might have in the comments section.


About the Authors

Dr. Romina Sharifpour is a Senior Machine Learning and Artificial Intelligence Solutions Architect at Amazon Web Services (AWS). She has spent over 10 years leading the design and implementation of innovative end-to-end solutions enabled by advancements in ML and AI. Romina’s areas of interest are natural language processing, large language models, and MLOps.

Pooya Vahidi is a Senior Solutions Architect at AWS, passionate about computer science, artificial intelligence, and cloud computing. As an AI professional, he is an active member of the AWS AI/ML Area-of-Depth team. With a background spanning over two decades of expertise in leading the architecture and engineering of large-scale solutions, he helps customers on their transformative journeys through cloud and AI/ML technologies.
