Organize machine learning development using shared spaces in SageMaker Studio for real-time collaboration

Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). It provides a single, web-based visual interface where you can perform all ML development steps, including preparing data and building, training, and deploying models.

Within an Amazon SageMaker Domain, users can provision a personal Amazon SageMaker Studio IDE application, which runs a free JupyterServer with built-in integrations to examine Amazon SageMaker Experiments, orchestrate Amazon SageMaker Pipelines, and much more. Users pay only for the flexible compute on their notebook kernels. These personal applications automatically mount each user's private Amazon Elastic File System (Amazon EFS) home directory, so users can keep code, data, and other files isolated from other users. Amazon SageMaker Studio already supports sharing notebooks between private applications, but because that sharing mechanism is asynchronous, it can slow down iteration.

Now, with shared spaces in Amazon SageMaker Studio, users can organize collaborative ML endeavors and initiatives by creating a shared IDE application that each user accesses with their own Amazon SageMaker user profile. Data workers collaborating in a shared space get access to an Amazon SageMaker Studio environment where they can access, read, edit, and share their notebooks in real time, which gives them the quickest path to start iterating with their peers on new ideas. Data workers can even edit the same notebook concurrently using real-time collaboration capabilities. The notebook marks each co-editing user with a different cursor labeled with their user profile name.

Shared spaces in SageMaker Studio automatically tag resources created within the scope of a space, such as training jobs, processing jobs, experiments, pipelines, and model registry entries, with the space's sagemaker:space-arn. The space uses this tag to filter those resources in the Amazon SageMaker Studio user interface (UI), so users see only the SageMaker experiments, pipelines, and other resources that are pertinent to their ML endeavor.
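Because the tag is an ordinary resource tag, you can also look up a space's resources outside the Studio UI. The following is a minimal sketch, assuming the AWS CLI v2 is configured with permissions for sagemaker:DescribeSpace and tag:GetResources. The function names are hypothetical, and the lookup only covers resource types that the Resource Groups Tagging API supports.

```shell
# Hypothetical helper: print the ARN of a shared space.
space_arn() {
  local region="$1" domain_id="$2" space_name="$3"
  aws --region "$region" sagemaker describe-space \
    --domain-id "$domain_id" \
    --space-name "$space_name" \
    --query SpaceArn --output text
}

# Hypothetical helper: list ARNs of resources whose sagemaker:space-arn tag
# equals the given space ARN (Resource Groups Tagging API).
list_space_resources() {
  local region="$1" arn="$2"
  aws --region "$region" resourcegroupstaggingapi get-resources \
    --tag-filters "Key=sagemaker:space-arn,Values=${arn}" \
    --query 'ResourceTagMappingList[].ResourceARN' --output text
}
```

For example, list_space_resources <REGION> "$(space_arn <REGION> <DOMAIN-ID> <SPACE-NAME>)" prints the tagged training jobs, pipelines, and other resources for one space.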

Solution overview

Because shared spaces tag resources automatically, administrators can monitor the costs associated with an ML endeavor and plan budgets using tools such as AWS Budgets and AWS Cost Explorer. As an administrator, you only need to activate sagemaker:space-arn as a cost allocation tag.
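From the CLI, a minimal sketch of this step might look like the following. It assumes the AWS CLI v2 with the Cost Explorer update-cost-allocation-tags-status and list-cost-allocation-tags commands, run with billing permissions (typically in the management account if you use AWS Organizations). The function names are hypothetical. A tag key generally becomes available for activation only after it appears in your billing data, which can take up to 24 hours.

```shell
# Hypothetical helper: activate sagemaker:space-arn as a cost allocation tag.
activate_space_cost_tag() {
  aws ce update-cost-allocation-tags-status \
    --cost-allocation-tags-status "TagKey=sagemaker:space-arn,Status=Active"
}

# Hypothetical helper: print the tag's current status (Active or Inactive).
space_cost_tag_status() {
  aws ce list-cost-allocation-tags \
    --tag-keys sagemaker:space-arn \
    --query 'CostAllocationTags[].Status' --output text
}
```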


Once that’s complete, you can use AWS Cost Explorer to identify how much individual ML projects are costing your organization.
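The same breakdown is available from the Cost Explorer API. This is a minimal sketch, assuming the tag is already active and the AWS CLI v2 is configured; the function name is hypothetical, and UnblendedCost with MONTHLY granularity is just one reasonable choice. Group keys typically take the form sagemaker:space-arn$<tag value>.

```shell
# Hypothetical helper: unblended cost per shared space for a date range.
# $1 = start date, $2 = end date (YYYY-MM-DD; the end date is exclusive).
space_costs() {
  local start="$1" end="$2"
  aws ce get-cost-and-usage \
    --time-period "Start=${start},End=${end}" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --group-by Type=TAG,Key=sagemaker:space-arn \
    --query 'ResultsByTime[].Groups[].[Keys[0],Metrics.UnblendedCost.Amount]' \
    --output text
}
```

Each output line pairs a space's tag key with its cost for one month, which you can feed into a budget or report.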


Get started with shared spaces in Amazon SageMaker Studio

In this section, we walk through the typical workflow for creating and using shared spaces in Amazon SageMaker Studio.

Create a shared space in Amazon SageMaker Studio

You can use the Amazon SageMaker console or the AWS Command Line Interface (AWS CLI) to add support for spaces to an existing Domain. For the most up-to-date information, see Create a shared space. Shared spaces work only with the JupyterLab 3 SageMaker Studio image and only in SageMaker Domains that use AWS Identity and Access Management (AWS IAM) authentication.
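Before enabling spaces, you can confirm the Domain's authentication mode. This is a minimal sketch, assuming the AWS CLI v2 and the AuthMode field that describe-domain returns (IAM or SSO); the function names are hypothetical.

```shell
# Hypothetical helper: print the Domain's authentication mode (IAM or SSO).
domain_auth_mode() {
  local region="$1" domain_id="$2"
  aws --region "$region" sagemaker describe-domain \
    --domain-id "$domain_id" \
    --query AuthMode --output text
}

# Hypothetical helper: succeed only if the Domain uses IAM authentication,
# the mode that shared spaces require.
check_domain_supports_spaces() {
  local mode
  mode="$(domain_auth_mode "$1" "$2")" || return 1
  if [ "$mode" = "IAM" ]; then
    echo "Domain $2 uses IAM authentication"
  else
    echo "Domain $2 uses $mode authentication; shared spaces require IAM" >&2
    return 1
  fi
}
```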

Console creation

To create a space within a designated Amazon SageMaker Domain, you’ll first need to set a designated space default execution role. From the Domain details page, select the Domain settings tab and select Edit. Then you can set a space default execution role, which only needs to be completed once per Domain, as shown in the following diagram:

Next, you can go to the Space management tab within your domain and select the Create button, as shown in the following diagram:


AWS CLI creation

You can also set a default Domain space execution role from the AWS CLI. To find the JupyterLab 3 image ARN for your Region, see Setting a default JupyterLab version.

aws --region <REGION> \
sagemaker update-domain \
--domain-id <DOMAIN-ID> \
--default-space-settings "ExecutionRole=<YOUR-SAGEMAKER-EXECUTION-ROLE-ARN>"
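To check that the setting took effect, you can read it back. This is a minimal sketch, assuming describe-domain reports the value under DefaultSpaceSettings.ExecutionRole; the function name is hypothetical.

```shell
# Hypothetical helper: print the Domain's default space execution role,
# or fail if none is set.
default_space_role() {
  local region="$1" domain_id="$2" role
  role="$(aws --region "$region" sagemaker describe-domain \
    --domain-id "$domain_id" \
    --query 'DefaultSpaceSettings.ExecutionRole' --output text)" || return 1
  # With --output text, the CLI prints "None" when the field is absent.
  if [ -z "$role" ] || [ "$role" = "None" ]; then
    echo "No default space execution role set on $domain_id" >&2
    return 1
  fi
  echo "$role"
}
```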

Once that’s been completed for your Domain, you can create a shared space from the CLI.

aws --region <REGION> \
sagemaker create-space \
--domain-id <DOMAIN-ID> \
--space-name <SPACE-NAME>
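The command above relies on the Domain defaults. If you want the space itself to reference the JupyterLab 3 image, one option is to pass --space-settings with JupyterServerAppSettings.DefaultResourceSpec. This is a minimal sketch under that assumption: the image ARN is a placeholder you look up for your Region, the function names are hypothetical, and printf is used only to avoid hand-escaping JSON quotes. The second helper reads the space status, which should eventually report InService.

```shell
# Hypothetical helper: create a space that pins a JupyterServer image.
# $4 is the Region-specific JupyterLab 3 image ARN (not filled in here).
create_space_with_image() {
  local region="$1" domain_id="$2" space_name="$3" image_arn="$4" settings
  settings="$(printf '{"JupyterServerAppSettings":{"DefaultResourceSpec":{"SageMakerImageArn":"%s","InstanceType":"system"}}}' "$image_arn")"
  aws --region "$region" sagemaker create-space \
    --domain-id "$domain_id" \
    --space-name "$space_name" \
    --space-settings "$settings"
}

# Hypothetical helper: print the space status (for example, InService).
space_status() {
  local region="$1" domain_id="$2" space_name="$3"
  aws --region "$region" sagemaker describe-space \
    --domain-id "$domain_id" \
    --space-name "$space_name" \
    --query Status --output text
}
```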

Launch a shared space in Amazon SageMaker Studio

Users can launch a shared space by selecting the Launch button next to their user profile within the AWS Console for their Amazon SageMaker Domain.

Under the Collaborative section, select Spaces, and then select which space to launch:

Alternatively, users can generate a pre-signed URL to launch a space through the AWS CLI:

aws sagemaker create-presigned-domain-url \
--region <REGION> \
--domain-id <DOMAIN-ID> \
--space-name <SPACE-NAME> \
--user-profile-name <USER-PROFILE-NAME>
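For scripting, it can help to list the available spaces first and print only the URL. This is a minimal sketch, assuming the AWS CLI v2; the function names are hypothetical, and the 60-second expiry is an arbitrary choice within the range that --expires-in-seconds accepts (5 to 300 seconds).

```shell
# Hypothetical helper: list the spaces in a Domain with their status.
list_spaces() {
  local region="$1" domain_id="$2"
  aws --region "$region" sagemaker list-spaces \
    --domain-id-equals "$domain_id" \
    --query 'Spaces[].[SpaceName,Status]' --output text
}

# Hypothetical helper: print only the pre-signed URL for a space.
space_launch_url() {
  local region="$1" domain_id="$2" space_name="$3" user_profile="$4"
  aws sagemaker create-presigned-domain-url \
    --region "$region" \
    --domain-id "$domain_id" \
    --space-name "$space_name" \
    --user-profile-name "$user_profile" \
    --expires-in-seconds 60 \
    --query AuthorizedUrl --output text
}
```

Open the printed URL in a browser before it expires; each collaborator generates their own URL with their own user profile name.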

Real time collaboration

Once the Amazon SageMaker Studio shared space IDE has loaded, users can select the Collaborators tab in the left panel to see which users are actively working in the space and on which notebook. If more than one person is working on the same notebook, you'll see a cursor labeled with the other user's profile name where they are editing:

In the following screenshot, you can see the different user experiences for someone editing and viewing the same notebook:

Conclusion

In this post, we showed you how shared spaces add a real-time collaborative IDE experience to Amazon SageMaker Studio. Automated tagging helps users scope and filter their Amazon SageMaker resources, such as experiments, pipelines, and model registry entries, to maximize productivity. Additionally, administrators can use these applied tags to monitor the costs associated with a given space and set appropriate budgets using AWS Cost Explorer and AWS Budgets.

Accelerate your team’s collaboration today by setting up shared spaces in Amazon SageMaker Studio for your specific machine learning endeavors!


About the authors

Sean Morgan is an AI/ML Solutions Architect at AWS. He has experience in the semiconductor and academic research fields, and uses his experience to help customers reach their goals on AWS. In his free time, Sean is an active open-source contributor/maintainer and is the special interest group lead for TensorFlow Add-ons.

Han Zhang is a Senior Software Engineer at Amazon Web Services. She is part of the launch team for Amazon SageMaker Notebooks and Amazon SageMaker Studio, and has been focusing on building secure machine learning environments for customers. In her spare time, she enjoys hiking and skiing in the Pacific Northwest.

Arkaprava De is a Senior Software Engineer at AWS. He has been at Amazon for over 7 years and is currently working on improving the Amazon SageMaker Studio IDE experience. You can find him on LinkedIn.

Kunal Jha is a Senior Product Manager at AWS. He is focused on building Amazon SageMaker Studio as the IDE of choice for all ML development steps. In his spare time, Kunal enjoys skiing and exploring the Pacific Northwest. You can find him on LinkedIn.
