Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools

Amazon SageMaker Studio offers a broad set of fully managed integrated development environments (IDEs) for machine learning (ML) development, including JupyterLab, Code Editor based on Code-OSS (Visual Studio Code Open Source), and RStudio. It provides access to the most comprehensive set of tools for each step of ML development, from preparing data to building, training, deploying, and managing ML models. You can launch a fully managed JupyterLab environment with the pre-configured SageMaker Distribution in seconds to work with your notebooks, code, and data. The flexible and extensible interface of SageMaker Studio allows you to effortlessly configure and arrange ML workflows, and you can use the AI-powered inline coding companion to quickly author, debug, explain, and test code.

In this post, we take a closer look at the updated SageMaker Studio and its JupyterLab IDE, designed to boost the productivity of ML developers. We introduce the concept of Spaces and explain how JupyterLab Spaces enable flexible customization of compute, storage, and runtime resources to improve your ML workflow efficiency. We also discuss our shift to a localized execution model in JupyterLab, resulting in a quicker, more stable, and responsive coding experience. Additionally, we cover the seamless integration of generative AI tools like Amazon CodeWhisperer and Jupyter AI within SageMaker Studio JupyterLab Spaces, illustrating how they empower developers to use AI for coding assistance and innovative problem-solving.

Introducing Spaces in SageMaker Studio

The new SageMaker Studio web-based interface acts as a command center for launching your preferred IDE and accessing your Amazon SageMaker tools to build, train, tune, and deploy models. In addition to JupyterLab and RStudio, SageMaker Studio now includes a fully managed Code Editor based on Code-OSS (Visual Studio Code Open Source). Both JupyterLab and Code Editor can be launched using a flexible workspace called Spaces.

A Space is a configuration representation of a SageMaker IDE, such as JupyterLab or Code Editor, designed to persist regardless of whether an application (IDE) associated with the Space is actively running or not. A Space represents a combination of a compute instance, storage, and other runtime configurations. With Spaces, you can create and scale the compute and storage for your IDE up and down as you go, customize runtime environments, and pause and resume coding anytime from anywhere. You can spin up multiple such Spaces, each configured with a different combination of compute, storage, and runtimes.

When a Space is created, it is equipped with an Amazon Elastic Block Store (Amazon EBS) volume, which is used to store users’ files, data, caches, and other artifacts. The volume is attached to an ML compute instance whenever the Space is run. The EBS volume ensures that user files, data, cache, and session states are consistently restored whenever the Space is restarted. Importantly, this EBS volume remains persistent, whether the Space is in a running or stopped state. It will continue to persist until the Space is deleted.

Additionally, we have introduced the bring-your-own file system feature for users who wish to share environments and artifacts across different Spaces, users, or even domains. This enables you to optionally equip your Spaces with your own Amazon Elastic File System (Amazon EFS) mount, facilitating the sharing of resources across various workspaces.

Creating a Space

Creating and launching a new Space is now quick and straightforward. It takes just a few seconds to set up a new Space with fast launch instances and less than 60 seconds to run a Space. Spaces are equipped with predefined settings for compute and storage, managed by administrators. SageMaker Studio administrators can establish domain-level presets for compute, storage, and runtime configurations. This setup enables you to quickly launch a new space with minimal effort, requiring only a few clicks. You also have the option to modify a Space’s compute, storage, or runtime configurations for further customization.

It’s important to note that creating a Space requires updating the SageMaker domain execution role with a policy like the following example. You need to grant your users permissions for private spaces and user profiles necessary to access these private spaces. For detailed instructions, refer to Give your users access to private spaces.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateApp",
        "sagemaker:DeleteApp"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:app/*",
      "Condition": {
        "Null": {
          "sagemaker:OwnerUserProfileArn": "true"
        }
      }
    },
    {
      "Sid": "SMStudioCreatePresignedDomainUrlForUserProfile",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreatePresignedDomainUrl"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:user-profile/${sagemaker:DomainId}/${sagemaker:UserProfileName}"
    },
    {
      "Sid": "SMStudioAppPermissionsListAndDescribe",
      "Effect": "Allow",
      "Action": [
        "sagemaker:ListApps",
        "sagemaker:ListDomains",
        "sagemaker:ListUserProfiles",
        "sagemaker:ListSpaces",
        "sagemaker:DescribeApp",
        "sagemaker:DescribeDomain",
        "sagemaker:DescribeUserProfile",
        "sagemaker:DescribeSpace"
      ],
      "Resource": "*"
    },
    {
      "Sid": "SMStudioAppPermissionsTagOnCreate",
      "Effect": "Allow",
      "Action": [
        "sagemaker:AddTags"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:*/*",
      "Condition": {
        "Null": {
          "sagemaker:TaggingAction": "false"
        }
      }
    },
    {
      "Sid": "SMStudioRestrictSharedSpacesWithoutOwners",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateSpace",
        "sagemaker:UpdateSpace",
        "sagemaker:DeleteSpace"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:space/${sagemaker:DomainId}/*",
      "Condition": {
        "Null": {
          "sagemaker:OwnerUserProfileArn": "true"
        }
      }
    },
    {
      "Sid": "SMStudioRestrictSpacesToOwnerUserProfile",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateSpace",
        "sagemaker:UpdateSpace",
        "sagemaker:DeleteSpace"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:space/${sagemaker:DomainId}/*",
      "Condition": {
        "ArnLike": {
          "sagemaker:OwnerUserProfileArn": "arn:aws:sagemaker:$AWS Region:$111122223333:user-profile/${sagemaker:DomainId}/${sagemaker:UserProfileName}"
        },
        "StringEquals": {
          "sagemaker:SpaceSharingType": [
            "Private",
            "Shared"
          ]
        }
      }
    },
    {
      "Sid": "SMStudioRestrictCreatePrivateSpaceAppsToOwnerUserProfile",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateApp",
        "sagemaker:DeleteApp"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:app/${sagemaker:DomainId}/*",
      "Condition": {
        "ArnLike": {
          "sagemaker:OwnerUserProfileArn": "arn:aws:sagemaker:${aws:Region}:${aws:PrincipalAccount}:user-profile/${sagemaker:DomainId}/${sagemaker:UserProfileName}"
        },
        "StringEquals": {
          "sagemaker:SpaceSharingType": [
            "Private"
          ]
        }
      }
    }
  ]
}

To create a space, complete the following steps:

  1. In SageMaker Studio, choose JupyterLab on the Applications menu.
  2. Choose Create JupyterLab space.
  3. For Name, enter a name for your Space.
  4. Choose Create space.
  5. Choose Run space to launch your new Space with default presets or update the configuration based on your requirements.
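
If you prefer to script this instead of using the UI, a Space can also be created and run with the AWS CLI. The following is a minimal sketch only; the domain ID, user profile name, Space name, volume size, and instance type are placeholder values, and depending on your domain defaults you may also need to specify a SageMaker image in --resource-spec:

# create a private JupyterLab Space (all IDs and names are placeholders)
aws sagemaker create-space \
    --domain-id d-xxxxxxxxxxxx \
    --space-name my-jupyterlab-space \
    --ownership-settings "OwnerUserProfileName=my-user-profile" \
    --space-sharing-settings "SharingType=Private" \
    --space-settings '{"AppType": "JupyterLab", "SpaceStorageSettings": {"EbsStorageSettings": {"EbsVolumeSizeInGb": 5}}}'

# run the Space by creating a JupyterLab app on it
aws sagemaker create-app \
    --domain-id d-xxxxxxxxxxxx \
    --space-name my-jupyterlab-space \
    --app-type JupyterLab \
    --app-name default \
    --resource-spec '{"InstanceType": "ml.t3.medium"}'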

Reconfiguring a Space

Spaces are designed for users to seamlessly transition between different compute types as needed. You can begin by creating a new Space with a specific configuration, primarily consisting of compute and storage. If you need to switch to a different compute type with a higher or lower vCPU count, more or less memory, or a GPU-based instance at any point in your workflow, you can do so with ease. After you stop the Space, you can modify its settings using either the UI or API via the updated SageMaker Studio interface and then restart the Space. SageMaker Studio automatically handles the provisioning of your existing Space to the new configuration, requiring no extra effort on your part.

Complete the following steps to edit an existing space:

  1. On the space details page, choose Stop space.
  2. Reconfigure the compute, storage, or runtime.
  3. Choose Run space to relaunch the space.

Your workspace will be updated with the new storage and compute instance type you requested.
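
The same stop, reconfigure, and relaunch cycle can be scripted. The following hedged sketch uses placeholder IDs and names; it assumes that stopping a Space from the CLI corresponds to deleting its running app, and the new instance type takes effect when the app is recreated:

# stop the running JupyterLab app for the Space
aws sagemaker delete-app \
    --domain-id d-xxxxxxxxxxxx \
    --space-name my-jupyterlab-space \
    --app-type JupyterLab \
    --app-name default

# relaunch the Space on a larger instance type
aws sagemaker create-app \
    --domain-id d-xxxxxxxxxxxx \
    --space-name my-jupyterlab-space \
    --app-type JupyterLab \
    --app-name default \
    --resource-spec '{"InstanceType": "ml.m5.xlarge"}'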

The new SageMaker Studio JupyterLab architecture

The SageMaker Studio team continues to invent and simplify its developer experience with the release of a new fully managed SageMaker Studio JupyterLab experience. The new SageMaker Studio JupyterLab experience combines the best of both worlds: the scalability and flexibility of SageMaker Studio Classic (see the appendix at the end of this post) with the stability and familiarity of the open source JupyterLab. To grasp the design of this new JupyterLab experience, let’s delve into the following architecture diagram. This will help us better understand the integration and features of this new JupyterLab Spaces platform.

In summary, we have transitioned to a localized architecture. In this new setup, the Jupyter server and kernel processes run together in a single Docker container hosted on the same ML compute instance. These ML instances are provisioned when a Space is run, and they are attached to the EBS volume that was created with the Space.

This new architecture brings several benefits; we discuss some of these in the following sections.

Reduced latency and increased stability

SageMaker Studio has transitioned to a local run model, moving away from the previous split model where code was stored on an EFS mount and run remotely on an ML instance via remote Kernel Gateway. In the earlier setup, Kernel Gateway, a headless web server, enabled kernel operations over remote communication with Jupyter kernels through HTTPS/WSS. User actions like running code, managing notebooks, or running terminal commands were processed by a Kernel Gateway app on a remote ML instance, with Kernel Gateway facilitating these operations over ZeroMQ (ZMQ) within a Docker container. The following diagram illustrates this architecture.

The updated JupyterLab architecture runs all kernel operations directly on the local instance. This local Jupyter Server approach typically provides improved performance and straightforward architecture. It minimizes latency and network complexity, simplifies the architecture for easier debugging and maintenance, enhances resource utilization, and accommodates more flexible messaging patterns for a variety of complex workloads.

In essence, this upgrade brings running notebooks and code much closer to the kernels, significantly reducing latency and boosting stability.

Improved control over provisioned storage

SageMaker Studio Classic originally used Amazon EFS to provide persistent, shared file storage for user home directories within the SageMaker Studio environment. This setup enables you to centrally store notebooks, scripts, and other project files, accessible across all your SageMaker Studio sessions and instances.

With the latest update to SageMaker Studio, there is a shift from Amazon EFS-based storage to an Amazon EBS-based solution. The EBS volumes, provisioned with SageMaker Studio Spaces, are GP3 volumes designed to deliver a consistent baseline performance of 3,000 IOPS, independent of the volume size. This new Amazon EBS storage offers higher performance for I/O-intensive tasks such as model training, data processing, high-performance computing, and data visualization. This transition also gives SageMaker Studio administrators greater insight into and control over storage usage by user profiles within a domain or across SageMaker. You can now set default (DefaultEbsVolumeSizeInGb) and maximum (MaximumEbsVolumeSizeInGb) storage sizes for JupyterLab Spaces within each user profile.

In addition to improved performance, you can flexibly resize the storage volume attached to your Space’s ML compute instance by editing your Space settings through either the UI or an API action in the SageMaker Studio interface, without requiring any administrator action. However, note that you can only change EBS volume sizes in one direction: after you increase a Space’s EBS volume size, you will not be able to lower it back down.
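
For example, after stopping the Space, the volume size could be increased with the UpdateSpace API. The following is a minimal CLI sketch with placeholder values for the domain ID, Space name, and new size:

aws sagemaker update-space \
    --domain-id d-xxxxxxxxxxxx \
    --space-name my-jupyterlab-space \
    --space-settings '{"SpaceStorageSettings": {"EbsStorageSettings": {"EbsVolumeSizeInGb": 50}}}'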

SageMaker Studio now offers elevated control of provisioned storage for administrators:

  • SageMaker Studio administrators can manage the EBS volume sizes for user profiles. These JupyterLab EBS volumes can vary from a minimum of 5 GB to a maximum of 16 TB. The following code snippet shows how to create or update a user profile with default and maximum space settings:
    aws --region $REGION sagemaker create-user-profile \
    --domain-id $DOMAIN_ID \
    --user-profile-name $USER_PROFILE_NAME \
    --user-settings '{
        "SpaceStorageSettings": {
            "DefaultEbsStorageSettings": {
                "DefaultEbsVolumeSizeInGb": 5,
                "MaximumEbsVolumeSizeInGb": 100
            }
        }
    }'


    # alternatively, update an existing user profile
    aws --region $REGION sagemaker update-user-profile \
    --domain-id $DOMAIN_ID \
    --user-profile-name $USER_PROFILE_NAME \
    --user-settings '{
        "SpaceStorageSettings": {
            "DefaultEbsStorageSettings": {
                "DefaultEbsVolumeSizeInGb": 25,
                "MaximumEbsVolumeSizeInGb": 100
            }
        }
    }'

  • SageMaker Studio now offers an enhanced auto-tagging feature for Amazon EBS resources, automatically labeling volumes created by users with domain, user, and Space information. This advancement simplifies cost allocation analysis for storage resources, aiding administrators in managing and attributing costs more effectively. It’s also important to note that these EBS volumes are hosted within the service account, so you won’t have direct visibility. Nonetheless, storage usage and associated costs are directly linked to the domain ARN, user profile ARN, and Space ARN, facilitating straightforward cost allocation.
  • Administrators can also control encryption of a Space’s EBS volumes, at rest, using customer managed keys (CMK).
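
As a hedged illustration of the last point, a customer managed key can be supplied when a domain is created; the key ARN, role ARN, and network identifiers below are placeholders only:

aws sagemaker create-domain \
    --domain-name myDomain \
    --auth-mode IAM \
    --vpc-id vpc-xxxxxxxx \
    --subnet-ids subnet-xxxxxxxx \
    --default-user-settings '{"ExecutionRole": "arn:aws:iam::111122223333:role/SageMakerExecutionRole"}' \
    --kms-key-id arn:aws:kms:us-west-2:111122223333:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx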

Shared tenancy with bring-your-own EFS file system

ML workflows are typically collaborative, requiring efficient sharing of data and code among team members. The new SageMaker Studio enhances this collaborative aspect by enabling you to share data, code, and other artifacts via a shared bring-your-own EFS file system. This EFS drive can be set up independently of SageMaker or could be an existing Amazon EFS resource. After it’s provisioned, it can be seamlessly mounted onto SageMaker Studio user profiles. This feature is not restricted to user profiles within a single domain—it can extend across domains, as long as they are within the same Region.

The following example code shows you how to create a domain and attach an existing EFS volume to it using its associated fs-id. EFS volumes can be attached to a domain at the root or prefix level, as the following commands demonstrate:

# create a domain and attach an existing EFS volume at the root level
aws sagemaker create-domain --domain-name "myDomain" \
 --vpc-id {VPC_ID} --subnet-ids {SUBNET_IDS} --auth-mode IAM \
 --default-user-settings \
 "CustomFileSystemConfigs=[{EFSFileSystemConfig={FileSystemId="fs-12345678"}}]"

# create a domain and attach an existing EFS volume at a file system prefix level
aws sagemaker create-domain --domain-name "myDomain" \
 --vpc-id {VPC_ID} --subnet-ids {SUBNET_IDS} --auth-mode IAM \
 --default-user-settings \
 "CustomFileSystemConfigs=[{EFSFileSystemConfig={FileSystemId="fs-12345678", FileSystemPath="/my/custom/path"}}]"

# update an existing domain with your own EFS file system
aws sagemaker update-domain --region us-west-2 --domain-id d-xxxxx \
    --default-user-settings \
    "CustomFileSystemConfigs=[{EFSFileSystemConfig={FileSystemId="fs-12345678"}}]"

When an EFS mount is made available in a domain and its related user profiles, you can choose to attach it to a new space. This can be done using either the SageMaker Studio UI or an API action, as shown in the following example. It’s important to note that when a space is created with an EFS file system that’s provisioned at the domain level, the space inherits its properties. This means that if the file system is provisioned at a root or prefix level within the domain, these settings will automatically apply to the space created by the domain users.

# attach a preconfigured EFS file system to a space
aws sagemaker create-space \
--space-name byofs-space --domain-id "myDomain" \
--ownership-settings "OwnerUserProfileName={USER_PROFILE_NAME}" \
--space-sharing-settings "SharingType=Private" \
--space-settings \
"AppType=JupyterLab,CustomFileSystems=[{EFSFileSystem={FileSystemId="fs-12345678"}}]"

After mounting it to a Space, you can find your files under the admin-provisioned mount point, in the directory path /mnt/custom-file-system/efs/fs-12345678.

EFS mounts make it straightforward to share artifacts between a user’s Spaces, between multiple users, or across domains, which is ideal for collaborative workloads. With this feature, you can do the following:

  • Share data – EFS mounts are ideal for storing large datasets crucial for data science experiments. Dataset owners can load these mounts with training, validation, and test datasets, making them accessible to user profiles within a domain or across multiple domains. SageMaker Studio admins can also integrate existing application EFS mounts while maintaining compliance with organizational security policies. This is done through flexible prefix-level mounting. For example, if production and test data are stored on the same EFS mount (such as fs-12345678:/data/prod and fs-12345678:/data/test), mounting /data/test onto the SageMaker domain’s user profiles grants users access only to the test dataset. This setup allows for analysis or model training while keeping production data secure and inaccessible.
  • Share Code – EFS mounts facilitate the quick sharing of code artifacts between user profiles. In scenarios where users need to rapidly share code samples or collaborate on a common code base without the complexities of frequent git push/pull commands, shared EFS mounts are highly beneficial. They offer a convenient way to share work-in-progress code artifacts within a team or across different teams in SageMaker Studio.
  • Share development environments – Shared EFS mounts can also serve as a means to quickly disseminate sandbox environments among users and teams. EFS mounts provide a solid alternative for sharing Python environments like conda or virtualenv across multiple workspaces. This approach circumvents the need for distributing requirements.txt or environment.yml files, which can often lead to the repetitive task of creating or recreating environments across different user profiles.
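
For example, a team could keep a shared conda environment on the mounted file system and register it as a kernel from any Space that mounts the same path. This is a sketch only; the environment name and package list are examples, and the mount path follows the directory layout shown earlier:

# from one Space: create a shared environment on the EFS mount
conda create --yes --prefix /mnt/custom-file-system/efs/fs-12345678/envs/team-env python=3.10 ipykernel pandas

# from any other Space that mounts the same file system: expose it as a Jupyter kernel
/mnt/custom-file-system/efs/fs-12345678/envs/team-env/bin/python -m ipykernel install --user --name team-env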

These features significantly enhance the collaborative capabilities within SageMaker Studio, making it effortless for teams to work together efficiently on complex ML projects. Additionally, Code Editor based on Code-OSS (Visual Studio Code Open Source) shares the same architectural principles as the aforementioned JupyterLab experience. This alignment brings several advantages, such as reduced latency, enhanced stability, and improved administrative control, and enables user access to shared workspaces, similar to those offered in JupyterLab Spaces.

Generative AI-powered tools on JupyterLab Spaces

Generative AI, a rapidly evolving field in artificial intelligence, uses algorithms to create new content like text, images, and code from extensive existing data. This technology has revolutionized coding by automating routine tasks, generating complex code structures, and offering intelligent suggestions, thereby streamlining development and fostering creativity and problem-solving in programming. As an indispensable tool for developers, generative AI enhances productivity and drives innovation in the tech industry. SageMaker Studio enhances this developer experience with pre-installed tools like Amazon CodeWhisperer and Jupyter AI, using generative AI to accelerate the development lifecycle.

Amazon CodeWhisperer

Amazon CodeWhisperer is a programming assistant that enhances developer productivity through real-time code recommendations and solutions. As an AWS managed AI service, it’s seamlessly integrated into the SageMaker Studio JupyterLab IDE. This integration makes Amazon CodeWhisperer a fluid and valuable addition to a developer’s workflow.

Amazon CodeWhisperer excels in increasing developer efficiency by automating common coding tasks, suggesting more effective coding patterns, and decreasing debugging time. It serves as an essential tool for both beginner and seasoned coders, providing insights into best practices, accelerating the development process, and improving the overall quality of code. To start using Amazon CodeWhisperer, make sure that the Resume Auto-Suggestions feature is activated. You can manually invoke code suggestions using keyboard shortcuts.

Alternatively, write a comment describing your intended code function and begin coding; Amazon CodeWhisperer will start providing suggestions.

Note that although Amazon CodeWhisperer is pre-installed, you must have the codewhisperer:GenerateRecommendations permission as part of the execution role to receive code recommendations. For additional details, refer to Using CodeWhisperer with Amazon SageMaker Studio. When you use Amazon CodeWhisperer, AWS may, for service improvement purposes, store data about your usage and content. To opt out of Amazon CodeWhisperer data sharing, choose Settings on the top menu, open the Settings Editor, and disable Share usage data with Amazon CodeWhisperer in the Amazon CodeWhisperer settings.
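
A minimal policy statement granting that permission to the execution role might look like the following sketch:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCodeWhispererInStudio",
      "Effect": "Allow",
      "Action": "codewhisperer:GenerateRecommendations",
      "Resource": "*"
    }
  ]
}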

Jupyter AI

Jupyter AI is an open source tool that brings generative AI to Jupyter notebooks, offering a robust and user-friendly platform for exploring generative AI models. It enhances productivity in JupyterLab and Jupyter Notebooks by providing features like the %%ai magic for creating a generative AI playground inside notebooks, a native chat UI in JupyterLab for interacting with AI as a conversational assistant, and support for a wide array of large language model (LLM) providers like AI21, Anthropic, Cohere, and Hugging Face or managed services like Amazon Bedrock and SageMaker endpoints. This integration offers more efficient and innovative methods for data analysis, ML, and coding tasks. For example, you can interact with a domain-aware LLM using the Jupyternaut chat interface for help with processes and workflows or generate example code through CodeLlama, hosted on SageMaker endpoints. This makes it a valuable tool for developers and data scientists.

Jupyter AI provides an extensive selection of language models ready for use right out of the box. Additionally, custom models are also supported via SageMaker endpoints, offering flexibility and a broad range of options for users. It also offers support for embedding models, enabling you to perform inline comparisons and tests and even build or test ad hoc Retrieval Augmented Generation (RAG) apps.

Jupyter AI can act as your chat assistant, helping you with code samples, providing you with answers to questions, and much more.

You can use Jupyter AI’s %%ai magic to generate sample code inside your notebook, as shown in the following screenshot.
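
For reference, a notebook cell using the magic might look like the following sketch. The provider and model ID are examples; the exact identifiers depend on your Jupyter AI version and the models enabled in your account. First load the extension:

%load_ext jupyter_ai_magics

Then, in a separate cell:

%%ai bedrock:anthropic.claude-v2
Write a Python function that reads a CSV file from Amazon S3 into a pandas DataFrame.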

JupyterLab 4.0

The JupyterLab team has released version 4.0, featuring significant improvements in performance, functionality, and user experience. Detailed information about this release is available in the official JupyterLab Documentation.

This version, now standard in SageMaker Studio JupyterLab, introduces optimized performance for handling large notebooks and faster operations, thanks to improvements like CSS rule optimization and the adoption of CodeMirror 6 and MathJax 3. Key enhancements include an upgraded text editor with better accessibility and customization, a new extension manager for easy installation of Python extensions, and improved document search capabilities with advanced features. Additionally, version 4.0 brings UI improvements, accessibility enhancements, and updates to development tools, and certain features have been backported to JupyterLab 3.6.

Conclusion

The advancements in SageMaker Studio, particularly with the new JupyterLab experience, mark a significant leap forward in ML development. The updated SageMaker Studio UI, with its integration of JupyterLab, Code Editor, and RStudio, offers an unparalleled, streamlined environment for ML developers. The introduction of JupyterLab Spaces provides flexibility and ease in customizing compute and storage resources, enhancing the overall efficiency of ML workflows. The shift from a remote kernel architecture to a localized model in JupyterLab greatly increases stability while decreasing startup latency. This results in a quicker, more stable, and responsive coding experience. Moreover, the integration of generative AI tools like Amazon CodeWhisperer and Jupyter AI in JupyterLab further empowers developers, enabling you to use AI for coding assistance and innovative problem-solving. The enhanced control over provisioned storage and the ability to share code and data effortlessly through self-managed EFS mounts greatly facilitate collaborative projects. Lastly, the release of JupyterLab 4.0 within SageMaker Studio underscores these improvements, offering optimized performance, better accessibility, and a more user-friendly interface, thereby solidifying JupyterLab’s role as a cornerstone of efficient and effective ML development in the modern tech landscape.

Give SageMaker Studio JupyterLab Spaces a try using our quick onboard feature, which allows you to spin up a new domain for single users within minutes. Share your thoughts in the comments section!

Appendix: SageMaker Studio Classic’s kernel gateway architecture

A SageMaker Studio Classic domain is a logical aggregation of an EFS volume, a list of users authorized to access the domain, and configurations related to security, application, networking, and more. In the SageMaker Studio Classic architecture, each user within the SageMaker domain has a distinct user profile. This profile encompasses specific details like the user’s role and their POSIX user ID on the EFS volume, among other unique data. Users access their individual user profile through a dedicated Jupyter Server app, connected via HTTPS/WSS in their web browser. SageMaker Studio Classic uses a remote kernel architecture with a combination of Jupyter Server and Kernel Gateway app types, enabling notebook servers to interact with kernels on remote hosts. This means that the Jupyter kernels operate not on the notebook server’s host, but within Docker containers on separate hosts. In essence, your notebook is stored in the EFS home directory and runs code remotely on a different Amazon Elastic Compute Cloud (Amazon EC2) instance, which houses a pre-built Docker container equipped with ML libraries such as PyTorch, TensorFlow, Scikit-Learn, and more.

The remote kernel architecture in SageMaker Studio offers notable benefits in terms of scalability and flexibility. However, it has its limitations, including a maximum of four apps per instance type and potential bottlenecks due to numerous HTTPS/WSS connections to a common EC2 instance type. These limitations could negatively affect the user experience.

The following architecture diagram depicts the SageMaker Studio Classic architecture. It illustrates the user’s process of connecting to a Kernel Gateway app via a Jupyter Server app, using their preferred web browser.


About the authors

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Kunal Jha is a Senior Product Manager at AWS. He is focused on building Amazon SageMaker Studio as the best-in-class  choice for end-to-end ML development. In his spare time, Kunal enjoys skiing and exploring the Pacific Northwest. You can find him on LinkedIn.

Majisha Namath Parambath is a Senior Software Engineer at Amazon SageMaker. She has been at Amazon for over 8 years and is currently working on improving the Amazon SageMaker Studio end-to-end experience.

Bharat Nandamuri is a Senior Software Engineer working on Amazon SageMaker Studio. He is passionate about building high scale backend services with focus on Engineering for ML systems. Outside of work, he enjoys playing chess, hiking and watching movies.

Derek Lause is a Software Engineer at AWS. He is committed to deliver value to customers through Amazon SageMaker Studio and Notebook Instances. In his spare time, Derek enjoys spending time with family and friends and hiking. You can find Derek on LinkedIn.

Read More

How AWS Prototyping enabled ICL-Group to build computer vision models on Amazon SageMaker

This is a customer post jointly authored by ICL and AWS employees.

ICL is a multi-national manufacturing and mining corporation based in Israel that manufactures products based on unique minerals and fulfills humanity’s essential needs, primarily in three markets: agriculture, food, and engineered materials. Their mining sites use industrial equipment that has to be monitored because machinery failures can result in loss of revenue or even environmental damages. Due to the extremely harsh conditions (low and high temperatures, vibrations, salt water, dust), attaching sensors to these mining machines for remote monitoring is difficult. Therefore, most machines are manually or visually monitored continuously by on-site workers. These workers frequently check camera pictures to monitor the state of a machine. Although this approach has worked in the past, it doesn’t scale and incurs relatively high costs.

To overcome this business challenge, ICL decided to develop in-house capabilities to use machine learning (ML) for computer vision (CV) to automatically monitor their mining machines. As a traditional mining company, the availability of internal resources with data science, CV, or ML skills was limited.

In this post, we discuss the following:

  • How ICL developed the in-house capabilities to build and maintain CV solutions that allow automatic monitoring of mining equipment to improve efficiency and reduce waste
  • A deep dive into a solution for mining screeners that was developed with the support of the AWS Prototyping program

Using the approach described in this post, ICL was able to develop a framework on AWS using Amazon SageMaker to build other use cases based on extracted vision from about 30 cameras, with the potential of scaling to thousands of such cameras on their production sites.

Building in-house capabilities through AWS Prototyping

Building and maintaining ML solutions for business-critical workloads requires sufficiently skilled staff. Outsourcing such activities is often not possible because internal know-how about business process needs to be combined with technical solution building. Therefore, ICL approached AWS for support in their journey to build a CV solution to monitor their mining equipment and acquire the necessary skills.

AWS Prototyping is an investment program where AWS embeds specialists into customer development teams to build mission-critical use cases. During such an engagement, the customer development team is enabled on the underlying AWS technologies while building the use case over the course of 3–6 weeks and getting hands-on help. Besides a corresponding use case, all the customer needs are 3–7 developers that can spend more than 80% of their working time building the aforementioned use case. During this time, the AWS specialists are fully assigned to the customer’s team and collaborate with them remotely or on-site.

ICL’s computer vision use case

For the prototyping engagement, ICL selected the use case for monitoring their mining screeners. A screener is a large industrial mining machine where minerals dissolved in water are processed. The water flows in several lanes from the top of the machine to the bottom. The influx is monitored for each of the lanes individually. When the influx runs out of the lane, it’s called overflow, which indicates that the machine is overloaded. Overflowing influx are minerals that are not processed by the screener and are lost. This needs to be avoided by regulating the influx. Without an ML solution, the overflow needs to be monitored by humans and it potentially takes time until the overflow is observed and handled.

The following images show the input and outputs of the CV models. The raw camera picture (left) is processed using a semantic segmentation model (middle) to detect the different lanes. Then the model (right) estimates the coverage (white) and overflow (red).

Although the prototyping engagement focused on a single type of machine, the general approach to use cameras and automatically process their images while using CV is applicable to a wider range of mining equipment. This allows ICL to extrapolate the know-how gained during the prototyping engagement to other locations, camera types, and machines, and also maintain the ML models without requiring support from any third party.

During the engagement, the AWS specialists and the ICL development team would meet every day and codevelop the solution step by step. ICL data scientists would either work independently on their assigned tasks or receive hands-on, pair-programming support from AWS ML specialists. This approach ensured that ICL data scientists not only gained experience systematically developing ML models using SageMaker, but also learned to embed these models into applications and automate the whole lifecycle of such models, including automated retraining and model monitoring. After 4 weeks of this collaboration, ICL was able to move the model into production within 8 weeks without requiring further support, and has built models for other use cases since then. The technical approach of this engagement is described in the next section.

Monitoring mining screeners using CV models with SageMaker

SageMaker is a fully managed platform that addresses the complete lifecycle of an ML model: it provides services and features that support teams working on ML models from labeling their data in Amazon SageMaker Ground Truth to training and optimizing the model, as well as hosting ML models for production use. Prior to the engagement, ICL had installed the cameras and obtained pictures as shown in the previous images (left-most image) and stored them in an Amazon Simple Storage Service (Amazon S3) bucket. Before models can be trained, it’s necessary to generate training data. The joint ICL-AWS team addressed this in three steps:

  1. Label the data using a semantic segmentation labeling job in SageMaker Ground Truth, as shown in the following image.
  2. Preprocess the labeled images using image augmentation techniques to increase the number of data samples.
  3. Split the labeled images into training, test, and validation sets, so that the performance and accuracy of the model can be measured adequately during the training process.

To achieve production scale for ML workloads, automating these steps is crucial to maintain the quality of the training input. Therefore, whenever new images are labeled using SageMaker Ground Truth, the preprocessing and splitting steps are run automatically and the resulting datasets are stored in Amazon S3, as shown in the model training workflow in the following diagram. Similarly, the model deployment workflow uses assets from SageMaker to update endpoints automatically whenever an updated model is available.

ICL is using several approaches to move ML models into production. Some involve their current AI platform called KNIME, which allows them to quickly deploy models developed in the development environment into production by industrializing them into products. Several combinations of KNIME and AWS services were analyzed; the preceding architecture was the most suitable for ICL’s environment.

The SageMaker semantic segmentation built-in algorithm is used to train models for screener grid area segmentation. By choosing this built-in algorithm over a self-built container, ICL doesn’t have to deal with the undifferentiated heavy lifting of maintaining a Convolutional Neural Network (CNN) while being able to use such a CNN for their use case. After experimenting with different configurations and parameters, ICL used a Fully Convolutional Network (FCN) algorithm with a pyramid scene parsing network (PSPNet) to train the model. This allowed ICL to finalize the model building within 1 week of the prototyping engagement.
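
A training job with the built-in algorithm might be configured as in the following sketch. This is not ICL’s actual setup; the role, bucket paths, and hyperparameter values are placeholders for illustration:

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder execution role

# container image for the SageMaker built-in semantic segmentation algorithm
training_image = image_uris.retrieve("semantic-segmentation", session.boto_region_name)

estimator = Estimator(
    image_uri=training_image,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://my-bucket/screener-models/",  # placeholder output location
    sagemaker_session=session,
)

# example hyperparameters; the algorithm also supports "psp" and "deeplab" variants
estimator.set_hyperparameters(
    algorithm="fcn",
    backbone="resnet-50",
    num_classes=4,             # number of segmentation classes (example value)
    num_training_samples=500,  # size of the training set (example value)
    epochs=30,
)

# channel names expected by the built-in semantic segmentation algorithm
estimator.fit({
    "train": "s3://my-bucket/train/",
    "validation": "s3://my-bucket/validation/",
    "train_annotation": "s3://my-bucket/train_annotation/",
    "validation_annotation": "s3://my-bucket/validation_annotation/",
})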

After a model has been trained, it has to be deployed to be usable for the screener monitoring. In line with the model training, this process is fully automated and orchestrated using AWS Step Functions and AWS Lambda. After the model is successfully deployed on the SageMaker endpoint, incoming pictures from the cameras are resized to fit the model’s input format and then fed into the endpoint for predictions using Lambda functions. The result of the semantic segmentation prediction as well as the overflow detection are then stored in Amazon DynamoDB and Amazon S3 for downstream analysis. If overflow is detected, Amazon Simple Notification Service (Amazon SNS) or Lambda functions can be used to automatically mitigate the overflow and control the corresponding lanes on the affected screener. The following diagram illustrates this architecture.
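
The inference step described above could be sketched as a Lambda handler like the following. The endpoint name, bucket, table, and event shape are assumptions for illustration and not ICL’s actual implementation:

import base64
import boto3

runtime = boto3.client("sagemaker-runtime")
s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")

ENDPOINT_NAME = "screener-segmentation-endpoint"  # placeholder endpoint name
RESULTS_BUCKET = "my-screener-results-bucket"     # placeholder bucket
RESULTS_TABLE = "ScreenerPredictions"             # placeholder DynamoDB table

def handler(event, context):
    """Send a resized camera image to the segmentation endpoint and persist the result."""
    image_bytes = base64.b64decode(event["image_base64"])  # assumed event shape

    # request a PNG segmentation mask from the endpoint
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="image/jpeg",
        Accept="image/png",
        Body=image_bytes,
    )
    segmentation_mask = response["Body"].read()

    # store the mask in S3 and a pointer to it in DynamoDB for downstream analysis
    key = f"predictions/{event['camera_id']}/{event['timestamp']}.png"
    s3.put_object(Bucket=RESULTS_BUCKET, Key=key, Body=segmentation_mask)
    dynamodb.Table(RESULTS_TABLE).put_item(
        Item={"camera_id": event["camera_id"], "timestamp": event["timestamp"], "mask_s3_key": key}
    )
    return {"mask_s3_key": key}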

Conclusion

This post described how ICL, an Israeli mining company, developed their own computer vision approach for automated monitoring of mining equipment using cameras. We first showed how to address such a challenge from an organizational point of view that is focused on enablement, then we provided a detailed look into how the model was built using AWS. Although the challenge of monitoring may be unique to ICL, the general approach to build a prototype alongside AWS specialists can be applied to similar challenges, particularly for organizations that don’t have the necessary AWS knowledge.

If you want to learn how to build a production-scale prototype of your use case, reach out to your AWS account team to discuss a prototyping engagement.


About the Authors

Markus Bestehorn leads the customer engineering and prototyping teams in Germany, Austria, Switzerland, and Israel for AWS. He has a PhD degree in computer science and is specialized in building complex machine learning and IoT solutions.

David Abekasis leads the data science team at ICL Group with a passion to educate others on data analysis and machine learning while helping solve business challenges. He has an MSc in Data Science and an MBA. He was fortunate to research spatial and time series data in the precision agriculture domain.

Ion Kleopas is a Sr. Machine Learning Prototyping Architect with an MSc in Data Science and Big Data. He helps AWS customers build innovative AI/ML solutions by enabling their technical teams on AWS technologies through the co-development of prototypes for challenging machine learning use cases, paving their path to production.

Miron Perel is a Principal Machine Learning Business Development Manager with Amazon Web Services. Miron advises Generative AI companies building their next generation models.

Read More

Automate PDF pre-labeling for Amazon Comprehend

Amazon Comprehend is a natural-language processing (NLP) service that provides pre-trained and custom APIs to derive insights from textual data. Amazon Comprehend customers can train custom named entity recognition (NER) models to extract entities of interest, such as location, person name, and date, that are unique to their business.

To train a custom model, you first prepare training data by manually annotating entities in documents. This can be done with the Comprehend Semi-Structured Documents Annotation Tool, which creates an Amazon SageMaker Ground Truth job with a custom template, allowing annotators to draw bounding boxes around the entities directly on the PDF documents. However, for companies with existing tabular entity data in ERP systems like SAP, manual annotation can be repetitive and time-consuming.

To reduce the effort of preparing training data, we built a pre-labeling tool using AWS Step Functions that automatically pre-annotates documents by using existing tabular entity data. This significantly decreases the manual work needed to train accurate custom entity recognition models in Amazon Comprehend.

In this post, we walk you through the steps of setting up the pre-labeling tool and show examples of how it automatically annotates documents from a public dataset of sample bank statements in PDF format. The full code is available on the GitHub repo.

Solution overview

In this section, we discuss the inputs and outputs of the pre-labeling tool and provide an overview of the solution architecture.

Inputs and outputs

As input, the pre-labeling tool takes PDF documents that contain text to be annotated. For the demo, we use simulated bank statements like the following example.

The tool also takes a manifest file that maps PDF documents to the entities that we want to extract from these documents. An entity consists of two things: the expected_text to extract from the document (for example, AnyCompany Bank) and the corresponding entity_type (for example, bank_name). Later in this post, we show how to construct this manifest file from a CSV document like the following example.

The pre-labeling tool uses the manifest file to automatically annotate the documents with their corresponding entities. We can then use these annotations directly to train an Amazon Comprehend model.

Alternatively, you can create a SageMaker Ground Truth labeling job for human review and editing, as shown in the following screenshot.

When the review is complete, you can use the annotated data to train an Amazon Comprehend custom entity recognizer model.

Architecture

The pre-labeling tool consists of multiple AWS Lambda functions orchestrated by a Step Functions state machine. It has two versions that use different techniques to generate pre-annotations.

The first technique is fuzzy matching. This requires a pre-manifest file with expected entities. The tool uses the fuzzy matching algorithm to generate pre-annotations by comparing text similarity.

Fuzzy matching looks for strings in the document that are similar (but not necessarily identical) to the expected entities listed in the pre-manifest file. It first calculates text similarity scores between the expected text and words in the document, then it matches all pairs above a threshold. Therefore, even if there are no exact matches, fuzzy matching can find variants like abbreviations and misspellings. This allows the tool to pre-label documents without requiring the entities to appear verbatim. For example, if 'AnyCompany Bank' is listed as an expected entity, Fuzzy Matching will annotate occurrences of 'Any Companys Bank'. This provides more flexibility than strict string matching and enables the pre-labeling tool to automatically label more entities.
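
Conceptually, the matching step can be illustrated with a few lines of Python. This sketch uses the standard library’s difflib and is not the tool’s actual implementation; the threshold and window size are example values:

from difflib import SequenceMatcher

def fuzzy_score(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_fuzzy_matches(expected_text, document_words, window, threshold=0.8):
    """Slide a window of consecutive words over the document text and keep the
    spans whose similarity to the expected text exceeds the threshold."""
    matches = []
    for i in range(len(document_words) - window + 1):
        candidate = " ".join(document_words[i : i + window])
        score = fuzzy_score(expected_text, candidate)
        if score >= threshold:
            matches.append((candidate, round(score, 2), i))
    return matches

words = "Statement issued by Any Companys Bank for JANE DOE".split()
print(find_fuzzy_matches("AnyCompany Bank", words, window=3))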

The following diagram illustrates the architecture of this Step Functions state machine.

The second technique requires a pre-trained Amazon Comprehend entity recognizer model. The tool generates pre-annotations using the Amazon Comprehend model, following the workflow shown in the following diagram.

The following diagram illustrates the full architecture.

In the following sections, we walk through the steps to implement the solution.

Deploy the pre-labeling tool

Clone the repository to your local machine:

git clone https://github.com/aws-samples/amazon-comprehend-automated-pdf-prelabeling-tool.git

This repository has been built on top of the Comprehend Semi-Structured Documents Annotation Tool and extends its functionalities by enabling you to start a SageMaker Ground Truth labeling job with pre-annotations already displayed on the SageMaker Ground Truth UI.

The pre-labeling tool includes both the Comprehend Semi-Structured Documents Annotation Tool resources as well as some resources specific to the pre-labeling tool. You can deploy the solution with AWS Serverless Application Model (AWS SAM), an open source framework that you can use to define serverless application infrastructure code.

If you have previously deployed the Comprehend Semi-Structured Documents Annotation Tool, refer to the FAQ section in Pre_labeling_tool/README.md for instructions on how to deploy only the resources specific to the pre-labeling tool.

If you haven’t deployed the tool before and are starting fresh, do the following to deploy the whole solution.

Change the current directory to the annotation tool folder:

cd amazon-comprehend-semi-structured-documents-annotation-tools

Build and deploy the solution:

make ready-and-deploy-guided

Create the pre-manifest file

Before you can use the pre-labeling tool, you need to prepare your data. The main inputs are PDF documents and a pre-manifest file. The pre-manifest file contains the location of each PDF document under 'pdf' and the location of a JSON file with expected entities to label under 'expected_entities'.

The notebook generate_premanifest_file.ipynb shows how to create this file. In the demo, the pre-manifest file looks like the following:

[
  {
    'pdf': 's3://<bucket>/data_aws_idp_workshop_data/bank_stmt_0.pdf',
    'expected_entities': 's3://<bucket>/prelabeling-inputs/expected-entities/example-demo/fuzzymatching_version/file_bank_stmt_0.json'
  },
  ...
]

Each JSON file listed in the pre-manifest file (under expected_entities) contains a list of dictionaries, one for each expected entity. The dictionaries have the following keys:

  • ‘expected_texts’ – A list of possible text strings matching the entity.
  • ‘entity_type’ – The corresponding entity type.
  • ‘ignore_list’ (optional) – A list of words that should be ignored in the match. This parameter can be used to prevent fuzzy matching from matching specific combinations of words that you know are wrong, for example to ignore certain numbers or email addresses when looking for names.

For example, the expected_entities of the PDF shown previously looks like the following:

[
  {
    'expected_texts': ['AnyCompany Bank'],
    'entity_type': 'bank_name',
    'ignore_list': []
  },
  {
    'expected_texts': ['JANE DOE'],
    'entity_type': 'customer_name',
    'ignore_list': ['JANE.DOE@example_mail.com']
  },
  {
    'expected_texts': ['003884257406'],
    'entity_type': 'checking_number',
    'ignore_list': []
  },
 ...
]

Run the pre-labeling tool

With the pre-manifest file that you created in the previous step, start running the pre-labeling tool. For more details, refer to the notebook start_step_functions.ipynb.

To start the pre-labeling tool, provide an event with the following keys:

  • Premanifest – Maps each PDF document to its expected_entities file. This should contain the Amazon Simple Storage Service (Amazon S3) bucket (under bucket) and the key (under key) of the file.
  • Prefix – Used to create the execution_id, which names the S3 folder for output storage and the SageMaker Ground Truth labeling job name.
  • entity_types – Displayed in the UI for annotators to label. These should include all entity types in the expected entities files.
  • work_team_name (optional) – Used for creating the SageMaker Ground Truth labeling job. It corresponds to the private workforce to use. If it’s not provided, only a manifest file will be created instead of a SageMaker Ground Truth labeling job. You can use the manifest file to create a SageMaker Ground Truth labeling job later on. Note that as of this writing, you can’t provide an external workforce when creating the labeling job from the notebook. However, you can clone the created job and assign it to an external workforce on the SageMaker Ground Truth console.
  • comprehend_parameters (optional) – Parameters to directly train an Amazon Comprehend custom entity recognizer model. If omitted, this step will be skipped.
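
Put together, an event might look like the following sketch. The bucket, key, entity types, and work team name are placeholders, and the exact key names and casing are defined in the start_step_functions.ipynb notebook:

event = {
    "premanifest": {"bucket": "my-input-bucket", "key": "prelabeling-inputs/pre-manifest.json"},
    "prefix": "demo-prelabeling",
    "entity_types": ["bank_name", "customer_name", "checking_number"],
    "work_team_name": "my-private-workteam",  # optional
}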

To start the state machine, run the following Python code:

import json

import boto3

stepfunctions_client = boto3.client('stepfunctions')
response = stepfunctions_client.start_execution(
    stateMachineArn=fuzzymatching_prelabeling_step_functions_arn,
    input=json.dumps(<event-dict>)
)

This will start a run of the state machine. You can monitor the progress of the state machine on the Step Functions console. The following diagram illustrates the state machine workflow.

When the state machine is complete, do the following:

  • Inspect the following outputs saved in the prelabeling/ folder of the comprehend-semi-structured-docs S3 bucket:
    • Individual annotation files for each page of the documents (one per page per document) in temp_individual_manifests/
    • A manifest for the SageMaker Ground Truth labeling job in consolidated_manifest/consolidated_manifest.manifest
    • A manifest that can be used to train a custom Amazon Comprehend model in consolidated_manifest/consolidated_manifest_comprehend.manifest
  • On the SageMaker console, open the SageMaker Ground Truth labeling job that was created to review the annotations
  • Inspect and test the custom Amazon Comprehend model that was trained

As mentioned previously, the tool can only create SageMaker Ground Truth labeling jobs for private workforces. To outsource the human labeling effort, you can clone the labeling job on the SageMaker Ground Truth console and attach any workforce to the new job.

Clean up

To avoid incurring additional charges, delete the resources that you created and delete the stack that you deployed with the following command:

make delete

Conclusion

The pre-labeling tool provides a powerful way for companies to use existing tabular data to accelerate the process of training custom entity recognition models in Amazon Comprehend. By automatically pre-annotating PDF documents, it significantly reduces the manual effort required in the labeling process.

The tool has two versions: fuzzy matching and Amazon Comprehend-based, giving flexibility on how to generate the initial annotations. After documents are pre-labeled, you can quickly review them in a SageMaker Ground Truth labeling job or even skip the review and directly train an Amazon Comprehend custom model.

The pre-labeling tool enables you to quickly unlock the value of your historical entity data and use it in creating custom models tailored to your specific domain. By speeding up what is typically the most labor-intensive part of the process, it makes custom entity recognition with Amazon Comprehend more accessible than ever.

For more information about how to label PDF documents using a SageMaker Ground Truth labeling job, see Custom document annotation for extracting named entities in documents using Amazon Comprehend and Use Amazon SageMaker Ground Truth to Label Data.


About the authors

Oskar Schnaack is an Applied Scientist at the Generative AI Innovation Center. He is passionate about diving into the science behind machine learning to make it accessible for customers. Outside of work, Oskar enjoys cycling and keeping up with trends in information theory.

Romain Besombes is a Deep Learning Architect at the Generative AI Innovation Center. He is passionate about building innovative architectures to address customers’ business problems with machine learning.

Read More

Improve your Stable Diffusion prompts with Retrieval Augmented Generation

Text-to-image generation is a rapidly growing field of artificial intelligence with applications in a variety of areas, such as media and entertainment, gaming, ecommerce product visualization, advertising and marketing, architectural design and visualization, artistic creations, and medical imaging.

Stable Diffusion is a text-to-image model that empowers you to create high-quality images within seconds. In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models in Amazon SageMaker JumpStart, a machine learning (ML) hub offering models, algorithms, and solutions. The evolution continued in April 2023 with the introduction of Amazon Bedrock, a fully managed service offering access to cutting-edge foundation models, including Stable Diffusion, through a convenient API.

As an ever-increasing number of customers embark on their text-to-image endeavors, a common hurdle arises—how to craft prompts that wield the power to yield high-quality, purpose-driven images. This challenge often demands considerable time and resources as users embark on an iterative journey of experimentation to discover the prompts that align with their visions.

Retrieval Augmented Generation (RAG) is a process in which a language model retrieves contextual documents from an external data source and uses this information to generate more accurate and informative text. This technique is particularly useful for knowledge-intensive natural language processing (NLP) tasks. We now extend its transformative touch to the world of text-to-image generation. In this post, we demonstrate how to harness the power of RAG to enhance the prompts sent to your Stable Diffusion models. You can create your own AI assistant for prompt generation in minutes with large language models (LLMs) on Amazon Bedrock, as well as on SageMaker JumpStart.

Approaches to crafting text-to-image prompts

Creating a prompt for a text-to-image model may seem straightforward at first glance, but it’s a deceptively complex task. It’s more than just typing a few words and expecting the model to conjure an image that aligns with your mental image. Effective prompts should provide clear instructions while leaving room for creativity. They must balance specificity and ambiguity, and they should be tailored to the particular model being used. To address the challenge of prompt engineering, the industry has explored various approaches:

  • Prompt libraries – Some companies curate libraries of pre-written prompts that you can access and customize. These libraries contain a wide range of prompts tailored to various use cases, allowing you to choose or adapt prompts that align with your specific needs.
  • Prompt templates and guidelines – Many companies and organizations provide users with a set of predefined prompt templates and guidelines. These templates offer structured formats for writing prompts, making it straightforward to craft effective instructions.
  • Community and user contributions – Crowdsourced platforms and user communities often play a significant role in improving prompts. Users can share their fine-tuned models, successful prompts, tips, and best practices with the community, helping others learn and refine their prompt-writing skills.
  • Model fine-tuning – Companies may fine-tune their text-to-image models to better understand and respond to specific types of prompts. Fine-tuning can improve model performance for particular domains or use cases.

These industry approaches collectively aim to make the process of crafting effective text-to-image prompts more accessible, user-friendly, and efficient, ultimately enhancing the usability and versatility of text-to-image generation models for a wide range of applications.

Using RAG for prompt design

In this section, we delve into how RAG techniques can serve as a game changer in prompt engineering, working in harmony with these existing approaches. By seamlessly integrating RAG into the process, we can streamline and enhance the efficiency of prompt design.

Semantic search in a prompt database

Imagine a company that has accumulated a vast repository of prompts in its prompt library or has created a large number of prompt templates, each designed for specific use cases and objectives. Traditionally, users seeking inspiration for their text-to-image prompts would manually browse through these libraries, often sifting through extensive lists of options. This process can be time-consuming and inefficient. By embedding prompts from the prompt library using text embedding models, companies can build a semantic search engine. Here’s how it works:

  • Embedding prompts – The company uses text embeddings to convert each prompt in its library into a numerical representation. These embeddings capture the semantic meaning and context of the prompts.
  • User query – When users provide their own prompts or describe their desired image, the system can analyze and embed their input as well.
  • Semantic search – Using the embeddings, the system performs a semantic search. It retrieves the most relevant prompts from the library based on the user’s query, considering both the user’s input and historical data in the prompt library.

By implementing semantic search in their prompt libraries, companies empower their employees to access a vast reservoir of prompts effortlessly. This approach not only accelerates prompt creation but also encourages creativity and consistency in text-to-image generation.
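
The following is a minimal sketch of such a semantic search flow, assuming Amazon Titan Embeddings on Amazon Bedrock for the embedding step and FAISS for the vector index (consistent with the demo resources described later in this post). The model ID, the small in-memory prompt library, and the helper function are illustrative assumptions, not the exact demo code.

import json

import boto3
import faiss
import numpy as np

# Bedrock runtime client (assumes Bedrock model access is enabled in your Region)
bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    # embed a prompt with Amazon Titan Embeddings (model ID assumed)
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    vector = json.loads(response["body"].read())["embedding"]
    return np.array(vector, dtype="float32")

# 1) embed every prompt in the library (a tiny in-memory example library)
prompt_library = [
    "cute cartoon of a dog having a sandwich at the dinner table",
    "a cartoon illustration of a punk dog, anime style, white background",
    "a cartoon of a boy and his dog walking down a forest lane",
]
embeddings = np.stack([embed(p) for p in prompt_library])

# 2) build a FAISS index over the library embeddings
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# 3) embed the user's query and retrieve the top-k most similar prompts
user_query = "a cartoon of a little dog"
_, neighbors = index.search(embed(user_query).reshape(1, -1), 3)
retrieved_prompts = [prompt_library[i] for i in neighbors[0]]
print(retrieved_prompts)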

Prompt generation from semantic search

Although semantic search streamlines the process of finding relevant prompts, RAG takes it a step further by using these search results to generate optimized prompts. Here’s how it works:

  • Semantic search results – After retrieving the most relevant prompts from the library, the system presents these prompts to the user, alongside the user’s original input.
  • Text generation model – The user can select a prompt from the search results or provide further context on their preferences. The system feeds both the selected prompt and the user’s input into an LLM.
  • Optimized prompt – The LLM, with its understanding of language nuances, crafts an optimized prompt that combines elements from the selected prompt and the user’s input. This new prompt is tailored to the user’s requirements and is designed to yield the desired image output.

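The following is a rough sketch of this generation step, assuming Anthropic Claude 2 on Amazon Bedrock (again matching the demo resources listed later in this post). The model ID, prompt template, and helper function are illustrative assumptions rather than the exact demo code.

import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def optimize_prompt(user_input: str, retrieved_prompts: list) -> str:
    # ask the LLM to merge the user's idea with the retrieved example prompts
    instruction = (
        "You write prompts for a text-to-image model.\n"
        f"User request: {user_input}\n"
        "Example prompts retrieved from our prompt library:\n"
        + "\n".join(f"- {p}" for p in retrieved_prompts)
        + "\nCombine the user's request with the style of the examples and "
        "return a single, detailed image generation prompt."
    )
    # Claude 2 on Bedrock expects the Human/Assistant prompt format (model ID assumed)
    body = json.dumps({
        "prompt": f"\n\nHuman: {instruction}\n\nAssistant:",
        "max_tokens_to_sample": 200,
        "temperature": 0.7,
    })
    response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    return json.loads(response["body"].read())["completion"].strip()

optimized_prompt = optimize_prompt(
    "a cartoon of a little dog",
    ["a cartoon of a boy and his dog walking down a forest lane"],
)
print(optimized_prompt)
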
The combination of semantic search and prompt generation not only simplifies the process of finding prompts but also ensures that the prompts generated are highly relevant and effective. It empowers you to fine-tune and customize your prompts, ultimately leading to improved text-to-image generation results. The following are examples of images generated from Stable Diffusion XL using the prompts from semantic search and prompt generation.

Original prompt: a cartoon of a little dog

Prompts from semantic search:

  • cute cartoon of a dog having a sandwich at the dinner table
  • a cartoon illustration of a punk dog, anime style, white background
  • a cartoon of a boy and his dog walking down a forest lane

Optimized prompt by LLM: A cartoon scene of a boy happily walking hand in hand down a forest lane with his cute pet dog, in animation style.

RAG-based prompt design applications across diverse industries

Before we explore the application of our suggested RAG architecture, let’s start with an industry in which an image generation model is most applicable. In AdTech, speed and creativity are critical. RAG-based prompt generation can add instant value by generating prompt suggestions to create many images quickly for an advertisement campaign. Human decision-makers can go through the auto-generated images to select the candidate image for the campaign. This feature can be a standalone application or embedded into popular software tools and platforms currently available.

Another industry where the Stable Diffusion model can enhance productivity is media and entertainment. The RAG architecture can assist in use cases such as avatar creation. Starting from a simple prompt, RAG can add much more color and character to the avatar ideas, generating many candidate prompts and providing more creative directions. From the resulting images, you can find the perfect fit for the given application. The approach increases productivity by automatically generating many prompt suggestions, and the variety it produces is an immediate benefit of the solution.

Solution overview

Empowering customers to construct their own RAG-based AI assistant for prompt design on AWS is a testament to the versatility of modern technology. AWS provides a plethora of options and services to facilitate this endeavor. The following reference architecture diagram illustrates a RAG application for prompt design on AWS.

When it comes to selecting the right LLMs for your AI assistant, AWS offers a spectrum of choices to cater to your specific requirements.

Firstly, you can opt for LLMs available through SageMaker JumpStart, utilizing dedicated instances. These instances support a variety of models, including Falcon, Llama 2, Bloom Z, and Flan-T5, or you can explore proprietary models such as Cohere’s Command and Multilingual Embedding, or Jurassic-2 from AI21 Labs.

If you prefer a more simplified approach, AWS offers LLMs on Amazon Bedrock, featuring models like Amazon Titan and Anthropic Claude. These models are accessible through straightforward API calls, allowing you to harness their power effortlessly. This flexibility and diversity of options ensures that you have the freedom to choose the LLM that best aligns with your prompt design goals, whether you prefer openly available models or the robust capabilities of proprietary models.

When it comes to building the essential vector database, AWS provides a multitude of options through their native services. You can opt for Amazon OpenSearch Service, Amazon Aurora, or Amazon Relational Database Service (Amazon RDS) for PostgreSQL, each offering robust features to suit your specific needs. Alternatively, you can explore products from AWS partners like Pinecone, Weaviate, Elastic, Milvus, or Chroma, which provide specialized solutions for efficient vector storage and retrieval.

To help you get started to construct a RAG-based AI assistant for prompt design, we’ve put together a comprehensive demonstration in our GitHub repository. This demonstration uses the following resources:

  • Image generation: Stable Diffusion XL on Amazon Bedrock
  • Text embedding: Amazon Titan on Amazon Bedrock
  • Text generation: Claude 2 on Amazon Bedrock
  • Vector database: FAISS, an open source library for efficient similarity search
  • Prompt library: Prompt examples from DiffusionDB, the first large-scale prompt gallery dataset for text-to-image generative models

Additionally, we’ve incorporated LangChain for the LLM implementation and Streamlit for the web application component, providing a seamless and user-friendly experience.
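
To close the loop, the optimized prompt can be sent to Stable Diffusion XL on Amazon Bedrock to generate the image. The following is a minimal sketch; the model ID and request body fields follow the Stability AI request format on Bedrock and should be treated as assumptions to verify against the demo repository.

import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# generate an image from the optimized prompt (model ID and body fields assumed)
response = bedrock.invoke_model(
    modelId="stability.stable-diffusion-xl-v1",
    body=json.dumps({
        "text_prompts": [{"text": "A cartoon scene of a boy happily walking hand in "
                                  "hand down a forest lane with his cute pet dog, "
                                  "in animation style."}],
        "cfg_scale": 7,
        "steps": 30,
    }),
)
result = json.loads(response["body"].read())
image_bytes = base64.b64decode(result["artifacts"][0]["base64"])

with open("generated_image.png", "wb") as f:
    f.write(image_bytes)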

Prerequisites

You need to have the following to run this demo application:

  • An AWS account
  • Basic understanding of how to navigate Amazon SageMaker Studio
  • Basic understanding of how to download a repo from GitHub
  • Basic knowledge of running a command on a terminal

Run the demo application

You can download all the necessary code with instructions from the GitHub repo. After the application is deployed, you will see a page like the following screenshot.

With this demonstration, we aim to make the implementation process accessible and comprehensible, providing you with a hands-on experience to kickstart your journey into the world of RAG and prompt design on AWS.

Clean up

After you try out the app, clean up your resources by stopping the application.

Conclusion

RAG has emerged as a game-changing paradigm in the world of prompt design, revitalizing Stable Diffusion’s text-to-image capabilities. By harmonizing RAG techniques with existing approaches and using the robust resources of AWS, we’ve uncovered a pathway to streamlined creativity and accelerated learning.

For additional resources, visit the following:


About the authors

James Yi is a Senior AI/ML Partner Solutions Architect in the Emerging Technologies team at Amazon Web Services. He is passionate about working with enterprise customers and partners to design, deploy and scale AI/ML applications to derive their business values. Outside of work, he enjoys playing soccer, traveling and spending time with his family.

Rumi Olsen is a Solutions Architect in the AWS Partner Program. She specializes in serverless and machine learning solutions in her current role, and has a background in natural language processing technologies. She spends most of her spare time with her daughter exploring the nature of the Pacific Northwest.

Read More

Streamlining ETL data processing at Talent.com with Amazon SageMaker

Streamlining ETL data processing at Talent.com with Amazon SageMaker

This post is co-authored by Anatoly Khomenko, Machine Learning Engineer, and Abdenour Bezzouh, Chief Technology Officer at Talent.com.

Established in 2011, Talent.com aggregates paid job listings from their clients and public job listings, and has created a unified, easily searchable platform. Covering over 30 million job listings across more than 75 countries and spanning various languages, industries, and distribution channels, Talent.com caters to the diverse needs of job seekers, effectively connecting millions of job seekers with job opportunities.

Talent.com’s mission is to facilitate global workforce connections. To achieve this, Talent.com aggregates job listings from various sources on the web, offering job seekers access to an extensive pool of over 30 million job opportunities tailored to their skills and experiences. In line with this mission, Talent.com collaborated with AWS to develop a cutting-edge job recommendation engine driven by deep learning, aimed at assisting users in advancing their careers.

To ensure the effective operation of this job recommendation engine, it is crucial to implement a large-scale data processing pipeline responsible for extracting and refining features from Talent.com’s aggregated job listings. This pipeline is able to process 5 million daily records in less than 1 hour, and allows for processing multiple days of records in parallel. In addition, this solution allows for a quick deployment to production. The primary data source for this pipeline consists of JSON Lines files stored in Amazon Simple Storage Service (Amazon S3) and partitioned by date. Tens of thousands of new JSON Lines files arrive each day as incremental updates.

The primary objective of this data processing pipeline is to facilitate the creation of features necessary for training and deploying the job recommendation engine on Talent.com. It’s worth noting that this pipeline must support incremental updates and cater to the intricate feature extraction requirements necessary for the training and deployment modules essential for the job recommendation system. Our pipeline belongs to the general ETL (extract, transform, and load) process family that combines data from multiple sources into a large, central repository.

For further insights into how Talent.com and AWS collaboratively built cutting-edge natural language processing and deep learning model training techniques, utilizing Amazon SageMaker to craft a job recommendation system, refer to From text to dream job: Building an NLP-based job recommender at Talent.com with Amazon SageMaker. The system includes feature engineering, deep learning model architecture design, hyperparameter optimization, and model evaluation, where all modules are run using Python.

This post shows how we used SageMaker to build a large-scale data processing pipeline for preparing features for the job recommendation engine at Talent.com. The resulting solution enables a Data Scientist to prototype feature extraction in a SageMaker notebook using Python libraries, such as Scikit-Learn or PyTorch, and then quickly deploy the same code into the data processing pipeline performing feature extraction at scale. The solution does not require porting the feature extraction code to PySpark, as would be required when using AWS Glue as the ETL solution. Our solution can be developed and deployed end-to-end by a Data Scientist using only SageMaker, and does not require knowledge of other ETL solutions, such as AWS Batch. This can significantly shorten the time needed to deploy the machine learning (ML) pipeline to production. The pipeline is operated through Python and seamlessly integrates with feature extraction workflows, rendering it adaptable to a wide range of data analytics applications.

Solution overview

Overview for ETL pipeline using SageMaker Processing

The pipeline is comprised of three primary phases:

  1. Utilize an Amazon SageMaker Processing job to handle raw JSONL files associated with a specified day. Multiple days of data can be processed by separate Processing jobs simultaneously.
  2. Employ AWS Glue for data crawling after processing multiple days of data.
  3. Load processed features for a specified date range using SQL from an Amazon Athena table, then train and deploy the job recommender model.

Process raw JSONL files

We process raw JSONL files for a specified day using a SageMaker Processing job. The job implements feature extraction and data compaction, and saves processed features into Parquet files with 1 million records per file. We take advantage of CPU parallelization to perform feature extraction for each raw JSONL file in parallel. The processing result of each JSONL file is saved into a separate Parquet file inside a temporary directory. After all of the JSONL files have been processed, we compact the thousands of small Parquet files into several files with 1 million records per file. The compacted Parquet files are then uploaded to Amazon S3 as the output of the processing job. The data compaction ensures efficient crawling and SQL queries in the next stages of the pipeline.

The following is the sample code to schedule a SageMaker Processing job for a specified day, for example 2020-01-01, using the SageMaker SDK. The job reads raw JSONL files from Amazon S3 (for example from s3://bucket/raw-data/2020/01/01) and saves the compacted Parquet files into Amazon S3 (for example to s3://bucket/processed/table-name/day_partition=2020-01-01/).

### install dependencies 
%pip install sagemaker pyarrow s3fs awswrangler

import sagemaker
import boto3

from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn
from sagemaker import get_execution_role
from sagemaker.processing import ProcessingInput, ProcessingOutput

region = boto3.session.Session().region_name
role = get_execution_role()
bucket = sagemaker.Session().default_bucket()

### we use instance with 16 CPUs and 128 GiB memory
### note that the script will NOT load the entire data into memory during compaction
### depending on the size of individual jsonl files, larger instance may be needed
instance = "ml.r5.4xlarge"
n_jobs = 8  ### we use 8 process workers
date = "2020-01-01" ### process data for one day

est_cls = SKLearn
framework_version_str = "0.20.0"

### schedule processing job
script_processor = FrameworkProcessor(
    role=role,
    instance_count=1,
    instance_type=instance,
    estimator_cls=est_cls,
    framework_version=framework_version_str,
    volume_size_in_gb=500,
)

script_processor.run(
    code="processing_script.py", ### name of the main processing script
    source_dir="../src/etl/", ### location of source code directory

    ### our processing script loads raw jsonl files directly from S3
    ### this avoids long start-up times of the processing jobs,
    ### since raw data does not need to be copied into instance
    inputs=[], ### processing job input is empty

    outputs=[
        ProcessingOutput(destination="s3://bucket/processed/table-name/",
                         source="/opt/ml/processing/output"),
    ],
    arguments=[
        ### directory with job's output
        "--output", "/opt/ml/processing/output",

        ### temporary directory inside instance
        "--tmp_output", "/opt/ml/tmp_output",

        "--n_jobs", str(n_jobs), ### number of process workers
        "--date", date, ### date to process

        ### location with raw jsonl files in S3
        "--path", "s3://bucket/raw-data/",
    ],
    wait=False
)

The following is a code outline for the main script (processing_script.py) that runs inside the SageMaker Processing job:

import argparse
import concurrent.futures
import os
from pathlib import Path

import pyarrow.dataset as ds
import s3fs

### function to process a raw jsonl file and save extracted features into a parquet file
from process_data import process_jsonl

### parse the command line arguments passed by the Processing job
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", type=str, required=True)
    parser.add_argument("--tmp_output", type=str, required=True)
    parser.add_argument("--n_jobs", type=int, default=8)
    parser.add_argument("--date", type=str, required=True)
    parser.add_argument("--path", type=str, required=True)
    return parser.parse_args()

args = parse_args()

### we use s3fs to crawl S3 input path for raw jsonl files
fs = s3fs.S3FileSystem()
### we assume raw jsonl files are stored in S3 directories partitioned by date
### for example: s3://bucket/raw-data/2020/01/01/
jsons = fs.find(os.path.join(args.path, *args.date.split('-')))

### temporary directory location inside the Processing job instance
tmp_out = os.path.join(args.tmp_output, f"day_partition={args.date}")

### directory location with job's output
out_dir = os.path.join(args.output, f"day_partition={args.date}")

### process individual jsonl files in parallel using n_jobs process workers
futures=[]
with concurrent.futures.ProcessPoolExecutor(max_workers=args.n_jobs) as executor:
    for file in jsons:
        inp_file = Path(file)
        out_file = os.path.join(tmp_out, inp_file.stem + ".snappy.parquet")
        ### process_jsonl function reads raw jsonl file from S3 location (inp_file)
        ### and saves result into parquet file (out_file) inside temporary directory
        futures.append(executor.submit(process_jsonl, file, out_file))

    ### wait until all jsonl files are processed
    for future in concurrent.futures.as_completed(futures):
        result = future.result()

### compact parquet files
dataset = ds.dataset(tmp_out)

if len(dataset.schema) > 0:
    ### save compacted parquet files with 1MM records per file
    ds.write_dataset(dataset, out_dir, format="parquet", 
                     max_rows_per_file=1024 * 1024)

Scalability is a key feature of our pipeline. First, multiple SageMaker Processing jobs can be used to process data for several days simultaneously. Second, we avoid loading the entire processed or raw data into memory at once, while processing each specified day of data. This enables the processing of data using instance types that can’t accommodate a full day’s worth of data in primary memory. The only requirement is that the instance type should be capable of loading N raw JSONL or processed Parquet files into memory simultaneously, with N being the number of process workers in use.

Crawl processed data using AWS Glue

After all the raw data for multiple days has been processed, we can create an Athena table from the entire dataset by using an AWS Glue crawler. We use the AWS SDK for pandas (awswrangler) library to create the table using the following snippet:

import awswrangler as wr

### crawl processed data in S3
res = wr.s3.store_parquet_metadata(
    path='s3://bucket/processed/table-name/',
    database="database_name",
    table="table_name",
    dataset=True,
    mode="overwrite",
    sampling=1.0,
    path_suffix='.parquet',
)

### print table schema
print(res[0])

Load processed features for training

Processed features for a specified date range can now be loaded from the Athena table using SQL, and these features can then be used for training the job recommender model. For example, the following snippet loads one month of processed features into a DataFrame using the awswrangler library:

import awswrangler as wr

query = """
    SELECT * 
    FROM table_name
    WHERE day_partition BETWEEN '2020-01-01' AND '2020-02-01' 
"""

### load 1 month of data from database_name.table_name into a DataFrame
df = wr.athena.read_sql_query(query, database='database_name')

Additionally, the use of SQL for loading processed features for training can be extended to accommodate various other use cases. For instance, we can apply a similar pipeline to maintain two separate Athena tables: one for storing user impressions and another for storing user clicks on these impressions. Using SQL join statements, we can retrieve impressions that users either clicked on or didn’t click on and then pass these impressions to a model training job.
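
For example, the following sketch labels each impression with a click indicator by joining the two tables in Athena; the table and column names here are illustrative, not the actual schema used at Talent.com.

import awswrangler as wr

### join impressions with clicks to label each impression
### (table and column names are illustrative)
query = """
    SELECT i.*,
           CASE WHEN c.impression_id IS NOT NULL THEN 1 ELSE 0 END AS clicked
    FROM impressions i
    LEFT JOIN clicks c
        ON i.impression_id = c.impression_id
    WHERE i.day_partition BETWEEN '2020-01-01' AND '2020-02-01'
"""

### load the labeled impressions into a DataFrame for model training
df_train = wr.athena.read_sql_query(query, database="database_name")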

Solution benefits

Implementing the proposed solution brings several advantages to our existing workflow, including:

  • Simplified implementation – The solution enables feature extraction to be implemented in Python using popular ML libraries, and does not require the code to be ported to PySpark. This streamlines feature extraction, because the same code a Data Scientist develops in a notebook is executed by the pipeline.
  • Quick path to production – The solution can be developed and deployed by a Data Scientist to perform feature extraction at scale, enabling them to develop an ML recommender model against this data. At the same time, the same solution can be deployed to production by an ML Engineer with few modifications.
  • Reusability – The solution provides a reusable pattern for feature extraction at scale, and can be easily adapted for other use cases beyond building recommender models.
  • Efficiency – The solution offers good performance: processing a single day of Talent.com’s data took less than 1 hour.
  • Incremental updates – The solution also supports incremental updates. New daily data can be processed with a SageMaker Processing job, and the S3 location containing the processed data can be recrawled to update the Athena table. We can also use a cron job to update today’s data several times per day (for example, every 3 hours).

We used this ETL pipeline to help Talent.com process 50,000 files per day containing 5 million records, and created training data using features extracted from 90 days of raw data from Talent.com (a total of 450 million records across 900,000 files). Our pipeline helped Talent.com build and deploy the recommendation system into production within only 2 weeks. The solution performed all ML processes, including ETL, on Amazon SageMaker without using other AWS services. The job recommendation system drove an 8.6% increase in clickthrough rate in online A/B testing against a previous XGBoost-based solution, helping connect millions of Talent.com’s users to better jobs.

Conclusion

This post outlines the ETL pipeline we developed for feature processing for training and deploying a job recommender model at Talent.com. Our pipeline uses SageMaker Processing jobs for efficient data processing and feature extraction at a large scale. Feature extraction code is implemented in Python, enabling the use of popular ML libraries to perform feature extraction at scale, without the need to port the code to PySpark.

We encourage readers to explore using the pipeline presented in this post as a template for their own use cases where feature extraction at scale is required. The pipeline can be used by a Data Scientist to build an ML model, and the same pipeline can then be adopted by an ML Engineer to run in production. This can significantly reduce the time needed to productize the ML solution end-to-end, as was the case with Talent.com. Readers can refer to the tutorial for setting up and running SageMaker Processing jobs. We also refer readers to the post From text to dream job: Building an NLP-based job recommender at Talent.com with Amazon SageMaker, where we discuss deep learning model training techniques utilizing Amazon SageMaker to build Talent.com’s job recommendation system.


About the authors

Dmitriy Bespalov is a Senior Applied Scientist at the Amazon Machine Learning Solutions Lab, where he helps AWS customers across different industries accelerate their AI and cloud adoption.

Yi Xiang is an Applied Scientist II at the Amazon Machine Learning Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.

Tong Wang is a Senior Applied Scientist at the Amazon Machine Learning Solutions Lab, where he helps AWS customers across different industries accelerate their AI and cloud adoption.

Anatoly Khomenko is a Senior Machine Learning Engineer at Talent.com with a passion for natural language processing and matching good people to good jobs.

Abdenour Bezzouh is an executive with more than 25 years of experience building and delivering technology solutions that scale to millions of customers. Abdenour held the position of Chief Technology Officer (CTO) at Talent.com when the AWS team designed and executed this particular solution for Talent.com.

Yanjun Qi is a Senior Applied Science Manager at the Amazon Machine Learning Solutions Lab. She innovates and applies machine learning to help AWS customers speed up their AI and cloud adoption.

Read More

FunSearch: Making new discoveries in mathematical sciences using Large Language Models

FunSearch: Making new discoveries in mathematical sciences using Large Language Models

In a paper published in Nature, we introduce FunSearch, a method for searching for “functions” written in computer code, and find new solutions in mathematics and computer science. FunSearch works by pairing a pre-trained LLM, whose goal is to provide creative solutions in the form of computer code, with an automated “evaluator”, which guards against hallucinations and incorrect ideas.

Read More

‘Forza Horizon’ Races Over to GeForce NOW

‘Forza Horizon’ Races Over to GeForce NOW

This GFN Thursday is burning rubber with the latest Forza Horizon games from Microsoft Studios. Check them out on PC Game Pass.

Plus, give the gift of cloud gaming with the latest membership bundle, which includes a free, three-month PC Game Pass subscription with the purchase of a six-month GeForce NOW Ultimate membership.

It’s all part of an exciting week, with 13 new games joining the GeForce NOW library.

Zoom, Zoom

Jump into the driver’s seat in Forza Horizon 4 and Forza Horizon 5 from Playground Games and Microsoft Studios. Explore the critically acclaimed open-world racing games, featuring dynamic weather and seasons that can make or break even the most seasoned drivers.

Forza Horizon 4 on GeForce NOW
For-za cloud.

Race across beautiful, historical Great Britain in Forza Horizon 4. Ride solo or team up online with players from around the globe in a shared, open world. Collect, modify and drive over 450 cars from the Horizon car roster — plus, race, stunt, create and explore to become a Horizon Superstar.

Forza Horizon 5 on GeForce NOW
The ultimate “Horizon” adventure plays best on the ultimate cloud gaming service.

Clutch in, shift gears and head over to the vibrant open world of Mexico in Forza Horizon 5. Jump-start the week with limitless driving action in hundreds of the world’s greatest cars. Join a campaign with hundreds of challenges across varied terrains and climates, or head online for multiplayer action. Members can enjoy both titles on Steam, and Forza Horizon 5 on PC Game Pass. Visit this Knowledgebase article for further details.

Stream every turn at GeForce quality on nearly any device and max out image resolution thanks to the cloud. Ultimate members can get in gear at up to 4K resolution and 120 frames per second for the most realistic driving experience.

The Ultimate Adventure

Minecraft Dungeons on GeForce NOW
What a blockhead.

Minecraft Dungeons from Mojang Studios and Xbox Game Studios is an immensely popular title that’s amassed over 25 million players and brings the thrill of classic dungeon crawlers to a whole new level.

Brave the dungeons alone or team up with a squad. Up to four players can battle together online or in couch co-op, making it a great game for group gatherings. Fight through action-packed, treasure-stuffed, wildly varied levels — all part of an epic quest to save the villagers and take down the evil Arch-Illager, preventing his army from controlling the Overworld.

Stream it on an Ultimate and Priority account for longer gaming sessions and faster access to GeForce RTX-powered servers. Venture forth across devices and play it on the big screen with NVIDIA SHIELD TV or on Samsung and LG smart TVs for the ultimate couch co-op experience.

Games, Games, Games

Pioneers of Pagonia on GeForce NOW
Be a pioneer of the cloud.

Time for some new games. Explore, discover and reunite the fantastical islands of Pagonia in Pioneers of Pagonia from Envision Entertainment. Build over 40 types of buildings, use more than 70 types of goods, manage widely branched production chains and get creative to establish a thriving economy.

Don’t miss the 13 newly supported games joining the GeForce NOW library this week:

  • Stellaris Nexus (New release on Steam, Dec. 12)
  • Tin Hearts (New release on Xbox, available PC Game Pass, Dec. 12)
  • Pioneers of Pagonia (New release on Steam, Dec. 13)
  • House Flipper 2 (New release on Steam, Dec. 14)
  • Soulslinger: Envoy of Death (New release on Steam, Dec. 14)
  • Escape the Backrooms (Steam)
  • Flashback 2 (Steam)
  • Forza Horizon 4 (Steam)
  • Forza Horizon 5 (Steam, Xbox, and available on PC Game Pass)
  • The Front (Steam)
  • Minecraft Dungeons (Steam, Xbox and available on PC Game Pass)
  • Primal Carnage: Extinction (Steam)
  • Universe Sandbox (Steam)

What are you planning to play this weekend? Let us know on Twitter or in the comments below.

Read More

Understanding GPU Memory 1: Visualizing All Allocations over Time

Understanding GPU Memory 1: Visualizing All Allocations over Time

During your time with PyTorch on GPUs, you may be familiar with this common error message:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU 0 has a total capacity of 79.32 GiB of which 401.56 MiB is free.

In this series, we show how to use memory tooling, including the Memory Snapshot, the Memory Profiler, and the Reference Cycle Detector to debug out of memory errors and improve memory usage.

Memory Timeline

The Memory Snapshot tool provides a fine-grained GPU memory visualization for debugging GPU OOMs. Captured memory snapshots will show memory events including allocations, frees and OOMs, along with their stack traces.

In a snapshot, each tensor’s memory allocation is color coded separately. The x axis is over time, and the y axis is the amount of GPU memory in MB. The snapshot is interactive, so we can observe the stack trace for any allocation by mousing over.

In this snapshot, there are 3 peaks showing the memory allocations over 3 training iterations. When looking at the peaks, it is easy to see the rise of memory in the forward pass and the fall during the backward pass as the gradients are computed. It is also possible to see that the program has the same pattern of memory use from iteration to iteration. One thing that stands out is the many tiny spikes in memory; by mousing over them, we see that they are buffers used temporarily by convolution operators.

Capturing Memory Snapshots

The API to capture memory snapshots is fairly simple and available in torch.cuda.memory:

  • Start: torch.cuda.memory._record_memory_history(max_entries=100000)
  • Save: torch.cuda.memory._dump_snapshot(file_name)
  • Stop: torch.cuda.memory._record_memory_history(enabled=None)

Code Snippet (for full code sample, see Appendix A):

   # Start recording memory snapshot history, initialized with a buffer
   # capacity of 100,000 memory events, via the `max_entries` field.
   torch.cuda.memory._record_memory_history(
       max_entries=MAX_NUM_OF_MEM_EVENTS_PER_SNAPSHOT
   )

   # Run your PyTorch Model.
   # At any point in time, save a snapshot to file for later.
   for _ in range(5):
       pred = model(inputs)
       loss_fn(pred, labels).backward()
       optimizer.step()
       optimizer.zero_grad(set_to_none=True)

   # In this sample, we save the snapshot after running 5 iterations.
   #   - Save as many snapshots as you'd like.
   #   - Snapshots will save last `max_entries` number of memory events
   #     (100,000 in this example).
   try:
       torch.cuda.memory._dump_snapshot(f"{file_prefix}.pickle")
   except Exception as e:
       logger.error(f"Failed to capture memory snapshot {e}")

   # Stop recording memory snapshot history.
   torch.cuda.memory._record_memory_history(enabled=None)

To visualize the snapshot file, we have a tool hosted at https://pytorch.org/memory_viz. There, you can drag and drop your saved snapshot file and it will plot each allocation over time.

Memory Timeline

Alternatively, you can generate an HTML report from a .pickle file by using the script at pytorch/torch/cuda/_memory_viz.py. Here is an example:

python torch/cuda/_memory_viz.py trace_plot snapshot.pickle -o snapshot.html

Debugging CUDA OOMs

Let’s look at how we can use the memory snapshot tool to answer:

  1. Why did a CUDA OOM happen?
  2. Where is the GPU Memory being used?

ResNet50 with a bug

We’ve taken a look at a properly working model in the first snapshot. Now, let’s take a look at a training example with a bug; see the following snapshot:

Memory Timeline

Notice how the second iteration uses far more memory than the first iteration. If this model were much larger, it could have hit a CUDA OOM in the second iteration without providing much insight into why.

Memory Timeline

When examining this snapshot further, we can clearly see that several tensors are staying alive from the first iteration to the second and later iterations. If we mouse over one of these tensors, it would show a stack trace suggesting that these were gradient tensors.

And indeed, if we go to the code, we can see that it doesn’t clear the gradient tensors, when it could have cleared them before the next forward pass.

Before:

        for _ in range(num_iters):
          pred = model(inputs)
          loss_fn(pred, labels).backward()
          optimizer.step()

After:

        for _ in range(num_iters):
          pred = model(inputs)
          loss_fn(pred, labels).backward()
          optimizer.step()
          # Add this line to clear grad tensors
          optimizer.zero_grad(set_to_none=True)

We can simply add an optimizer.zero_grad(set_to_none=True) instruction to clear the gradient tensors from iteration to iteration (more details about why we need to zero the gradients here: https://pytorch.org/tutorials/recipes/recipes/zeroing_out_gradients.html).

This is a simplification of a bug we’ve found in more complicated programs using this tool. We encourage you to try out the Memory Snapshot on your GPU memory problems and let us know how it goes.

ResNet50 after bug fix

After applying the fix, the snapshot shows that the gradients are now being cleared from iteration to iteration.

Memory Timeline

We now have the snapshot of a properly working ResNet50 model. Try out the code yourself (see code sample in Appendix A).

But you may be wondering, why is there still an increase in memory after the first iteration? To answer this, let’s visit the Memory Profiler in the next section.

Categorized Memory Usage

The Memory Profiler is an added feature of the PyTorch Profiler that categorizes memory usage over time. We still rely on the Memory Snapshot for stack traces for deep dives into memory allocations.

To generate a memory timeline, here is a code snippet (full code sample in Appendix B):

   # Initialize the profiler context with record_shapes, profile_memory,
   # and with_stack set to True.
   with torch.profiler.profile(
       activities=[
           torch.profiler.ProfilerActivity.CPU,
           torch.profiler.ProfilerActivity.CUDA,
       ],
       schedule=torch.profiler.schedule(wait=0, warmup=0, active=6, repeat=1),
       record_shapes=True,
       profile_memory=True,
       with_stack=True,
       on_trace_ready=trace_handler,
   ) as prof:
       # Run the PyTorch Model inside the profile context.
       for _ in range(5):
           prof.step()
           with record_function("## forward ##"):
               pred = model(inputs)

           with record_function("## backward ##"):
               loss_fn(pred, labels).backward()

           with record_function("## optimizer ##"):
               optimizer.step()
               optimizer.zero_grad(set_to_none=True)

   # Construct the memory timeline HTML plot.
   prof.export_memory_timeline(f"{file_prefix}.html", device="cuda:0")

For further reference, see https://pytorch.org/docs/main/profiler.html.

The Memory Profiler automatically generates categories based on the graph of tensor operations recorded during profiling.

Memory Timeline

In this Memory Timeline collected using the Memory Profiler, we have the same training example as before. We can observe the gradients in blue are now being cleared from iteration to iteration. We can also notice that the optimizer state in yellow is allocated after the first iteration, and is kept constant for the rest of the job.

This optimizer state is the reason behind the increase of GPU memory from the first iteration to the second. Try out the code yourself (see code sample in Appendix B). The Memory Profiler helps to improve training memory understanding so that model authors can figure out which categories are using the most GPU memory.

Where can I find these tools?

We hope that these tools will greatly improve your ability to debug CUDA OOMs and to understand your memory usage by category.

The Memory Snapshot and the Memory Profiler are available in the v2.1 release of PyTorch as experimental features.

Feedback

We look forward to hearing from you about any enhancements, bugs, or memory stories that our tools helped to solve! As always, please feel free to open new issues on PyTorch’s GitHub page.

We are also open to contributions from the OSS community; feel free to tag Aaron Shi and Zachary DeVito in any GitHub PRs for reviews.

Acknowledgements

Really appreciate the content reviewers, Mark Saroufim, Gregory Chanan, and Adnan Aziz for reviewing this post and improving its readability.

Appendix

Appendix A – ResNet50 Memory Snapshot Code Example

# (c) Meta Platforms, Inc. and affiliates. 
import logging
import socket
from datetime import datetime, timedelta

import torch

from torchvision import models

logging.basicConfig(
   format="%(levelname)s:%(asctime)s %(message)s",
   level=logging.INFO,
   datefmt="%Y-%m-%d %H:%M:%S",
)
logger: logging.Logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)

TIME_FORMAT_STR: str = "%b_%d_%H_%M_%S"

# Keep a max of 100,000 alloc/free events in the recorded history
# leading up to the snapshot.
MAX_NUM_OF_MEM_EVENTS_PER_SNAPSHOT: int = 100000

def start_record_memory_history() -> None:
   if not torch.cuda.is_available():
       logger.info("CUDA unavailable. Not recording memory history")
       return

   logger.info("Starting snapshot record_memory_history")
   torch.cuda.memory._record_memory_history(
       max_entries=MAX_NUM_OF_MEM_EVENTS_PER_SNAPSHOT
   )

def stop_record_memory_history() -> None:
   if not torch.cuda.is_available():
       logger.info("CUDA unavailable. Not recording memory history")
       return

   logger.info("Stopping snapshot record_memory_history")
   torch.cuda.memory._record_memory_history(enabled=None)

def export_memory_snapshot() -> None:
   if not torch.cuda.is_available():
       logger.info("CUDA unavailable. Not exporting memory snapshot")
       return

   # Prefix for file names.
   host_name = socket.gethostname()
   timestamp = datetime.now().strftime(TIME_FORMAT_STR)
   file_prefix = f"{host_name}_{timestamp}"

   try:
       logger.info(f"Saving snapshot to local file: {file_prefix}.pickle")
       torch.cuda.memory._dump_snapshot(f"{file_prefix}.pickle")
   except Exception as e:
       logger.error(f"Failed to capture memory snapshot {e}")
       return

# Simple Resnet50 example to demonstrate how to capture memory visuals.
def run_resnet50(num_iters=5, device="cuda:0"):
   model = models.resnet50().to(device=device)
   inputs = torch.randn(1, 3, 224, 224, device=device)
   labels = torch.rand_like(model(inputs))
   optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
   loss_fn = torch.nn.CrossEntropyLoss()

   # Start recording memory snapshot history
   start_record_memory_history()

   for _ in range(num_iters):
       pred = model(inputs)
       loss_fn(pred, labels).backward()
       optimizer.step()
       optimizer.zero_grad(set_to_none=True)

   # Create the memory snapshot file
   export_memory_snapshot()

   # Stop recording memory snapshot history
   stop_record_memory_history()

if __name__ == "__main__":
    # Run the resnet50 model
    run_resnet50()

Appendix B – ResNet50 Memory Profiler Code Example

# (c) Meta Platforms, Inc. and affiliates. 
import logging
import socket
from datetime import datetime, timedelta

import torch

from torch.autograd.profiler import record_function
from torchvision import models

logging.basicConfig(
   format="%(levelname)s:%(asctime)s %(message)s",
   level=logging.INFO,
   datefmt="%Y-%m-%d %H:%M:%S",
)
logger: logging.Logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)

TIME_FORMAT_STR: str = "%b_%d_%H_%M_%S"

def trace_handler(prof: torch.profiler.profile):
   # Prefix for file names.
   host_name = socket.gethostname()
   timestamp = datetime.now().strftime(TIME_FORMAT_STR)
   file_prefix = f"{host_name}_{timestamp}"

   # Construct the trace file.
   prof.export_chrome_trace(f"{file_prefix}.json.gz")

   # Construct the memory timeline file.
   prof.export_memory_timeline(f"{file_prefix}.html", device="cuda:0")

def run_resnet50(num_iters=5, device="cuda:0"):
   model = models.resnet50().to(device=device)
   inputs = torch.randn(1, 3, 224, 224, device=device)
   labels = torch.rand_like(model(inputs))
   optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
   loss_fn = torch.nn.CrossEntropyLoss()

   with torch.profiler.profile(
       activities=[
           torch.profiler.ProfilerActivity.CPU,
           torch.profiler.ProfilerActivity.CUDA,
       ],
       schedule=torch.profiler.schedule(wait=0, warmup=0, active=6, repeat=1),
       record_shapes=True,
       profile_memory=True,
       with_stack=True,
       on_trace_ready=trace_handler,
   ) as prof:
       for _ in range(num_iters):
           prof.step()
           with record_function("## forward ##"):
               pred = model(inputs)

           with record_function("## backward ##"):
               loss_fn(pred, labels).backward()

           with record_function("## optimizer ##"):
               optimizer.step()
               optimizer.zero_grad(set_to_none=True)

if __name__ == "__main__":
    # Warm up
    run_resnet50()
    # Run the resnet50 model
    run_resnet50()

Read More

Superalignment Fast Grants

We’re launching $10M in grants to support technical research towards the alignment and safety of superhuman AI systems, including weak-to-strong generalization, interpretability, scalable oversight, and more.

OpenAI Blog