Amazon AWS – Page 110

Accelerating time-to-insight with MongoDB time series collections and Amazon SageMaker Canvas

December 18, 2023

by Igor Alekseev Amazon AWS

This is a guest post co-written with Babu Srinivasan from MongoDB.

As industries evolve in today’s fast-paced business landscape, the inability to have real-time forecasts poses significant challenges for industries heavily reliant on accurate and timely insights. The absence of real-time forecasts in various industries presents pressing business challenges that can significantly impact decision-making and operational efficiency. Without real-time insights, businesses struggle to adapt to dynamic market conditions, accurately anticipate customer demand, optimize inventory levels, and make proactive strategic decisions. Industries such as Finance, Retail, Supply Chain Management, and Logistics face the risk of missed opportunities, increased costs, inefficient resource allocation, and the inability to meet customer expectations. By exploring these challenges, organizations can recognize the importance of real-time forecasting and explore innovative solutions to overcome these hurdles, enabling them to stay competitive, make informed decisions, and thrive in today’s fast-paced business environment.

By harnessing the transformative potential of MongoDB’s native time series data capabilities and integrating it with the power of Amazon SageMaker Canvas, organizations can overcome these challenges and unlock new levels of agility. MongoDB’s robust time series data management allows for the storage and retrieval of large volumes of time-series data in real-time, while advanced machine learning algorithms and predictive capabilities provide accurate and dynamic forecasting models with SageMaker Canvas.

In this post, we will explore the potential of using MongoDB’s time series data and SageMaker Canvas as a comprehensive solution.

MongoDB Atlas

MongoDB Atlas is a fully managed developer data platform that simplifies the deployment and scaling of MongoDB databases in the cloud. It is a document based storage that provides a fully managed database, with built-in full-text and vector Search, support for Geospatial queries, Charts and native support for efficient time series storage and querying capabilities. MongoDB Atlas offers automatic sharding, horizontal scalability, and flexible indexing for high-volume data ingestion. Among all, the native time series capabilities is a standout feature, making it ideal for a managing high volume of time-series data, such as business critical application data, telemetry, server logs and more. With efficient querying, aggregation, and analytics, businesses can extract valuable insights from time-stamped data. By using these capabilities, businesses can efficiently store, manage, and analyze time-series data, enabling data-driven decisions and gaining a competitive edge.

Amazon SageMaker Canvas

Amazon SageMaker Canvas is a visual machine learning (ML) service that enables business analysts and data scientists to build and deploy custom ML models without requiring any ML experience or having to write a single line of code. SageMaker Canvas supports a number of use cases, including time-series forecasting, which empowers businesses to forecast future demand, sales, resource requirements, and other time-series data accurately. The service uses deep learning techniques to handle complex data patterns and enables businesses to generate accurate forecasts even with minimal historical data. By using Amazon SageMaker Canvas capabilities, businesses can make informed decisions, optimize inventory levels, improve operational efficiency, and enhance customer satisfaction.

The SageMaker Canvas UI lets you seamlessly integrate data sources from the cloud or on-premises, merge datasets effortlessly, train precise models, and make predictions with emerging data—all without coding. If you need an automated workflow or direct ML model integration into apps, Canvas forecasting functions are accessible through APIs.

Solution overview

Users persist their transactional time series data in MongoDB Atlas. Through Atlas Data Federation, data is extracted into Amazon S3 bucket. Amazon SageMaker Canvas access the data to build models and create forecasts. The results of the forecasting are stored in an S3 bucket. Using the MongoDB Data Federation services, the forecasts are presented visually through MongoDB Charts.

The following diagram outlines the proposed solution architecture.

Prerequisites

For this solution we use MongoDB Atlas to store time series data, Amazon SageMaker Canvas to train a model and produce forecasts, and Amazon S3 to store data extracted from MongoDB Atlas.

Make sure you have the following prerequisites:

Create an S3 bucket

Configure MongoDB Atlas cluster

Create a free MongoDB Atlas cluster by following the instructions in Create a Cluster. Setup the Database access and Network access.

Populate a time series collection in MongoDB Atlas

For the purposes of this demonstration, you can use a sample data set from from Kaggle and upload the same to MongoDB Atlas with the MongoDB tools , preferably MongoDB Compass.

The following code shows a sample data set for a time series collection:

{
"store": "1 1",
"timestamp": { "2010-02-05T00:00:00.000Z"},
"temperature": "42.31",
"target_value": 2.572,
"IsHoliday": false
}

The following screenshot shows the sample time series data in MongoDB Atlas:

Create an S3 Bucket

Create an S3 bucket in AWS , where the time series data need to be stored and analyzed. Note we have two folders. sales-train-data is used to store data extracted from MongoDB Atlas, while sales-forecast-output contains predictions from Canvas.

Create the Data Federation

Setup the Data Federation in Atlas and register the S3 bucket created previously as part of the data source. Notice the three different database/collections are created in the data federation for Atlas cluster, S3 bucket for MongoDB Atlas data and S3 bucket to store the Canvas results.

The following screenshots shows the setup of the data federation.

Setup the Atlas application service

Create the MongoDB Application Services to deploy the functions to transfer the data from MongoDB Atlas cluster to S3 bucket using the $out aggregation.

Verify the Datasource Configuration

The Application services create a new Altas Service Name that needs to be referred as the data services in the following function. Verify that the Atlas Service Name is created and note it for future reference.

Create the function

Setup the Atlas Application services to create the trigger and functions. The triggers need to be scheduled to write the data to S3 at a period frequency based on the business need for training the models.

The following script shows the function to write to the S3 bucket:

exports = function () {

   const service = context.services.get("");
   const db = service.db("")
   const events = db.collection("");

   const pipeline = [
    {
            "$out": {
               "s3": {
                  "bucket": "<S3_bucket_name>",
                  "region": "<AWS_Region>",
                   "filename": {$concat: ["<S3path>/<filename>_",{"$toString":  new Date(Date.now())}]},
                  "format": {
                        "name": "json",
                        "maxFileSize": "10GB"
                  }
               }
            }
      }
   ];

   return events.aggregate(pipeline);
};

Sample function

The function can be run through the Run tab and the errors can be debugged using the log features in the Application Services. In addition, the errors can be debugged using the Logs menu in the left pane.

The following screenshot shows the execution of the function along with the output:

Create dataset in Amazon SageMaker Canvas

The following steps assume that you have created a SageMaker domain and user profile. If you have not already done so, make sure that you configure the SageMaker domain and user profile. In the user profile, update your S3 bucket to be custom and supply your bucket name.

When complete, navigate to SageMaker Canvas, select your domain and profile, and select Canvas.

Create a dataset supplying the data source.

Select the dataset source as S3

Select the data location from the S3 bucket and select Create dataset.

Review the schema and click Create dataset

Upon successful import, the dataset will appear in the list as shown in the following screenshot.

Train the model

Next, we will use Canvas to set up to train the model. Select the dataset and click Create.

Create a model name, select Predictive analysis, and select Create.

Select target column

Next, click Configure time series model and select item_id as the Item ID column.

Select tm for the time stamp column

To specify the amount of time that you want to forecast, choose 8 weeks.

Now you are ready to preview the model or launch the build process.

After you preview the model or launch the build, your model will be created and can take up to four hours. You can leave the screen and return to see the model training status.

When the model is ready, select the model and click on the latest version

Review the model metrics and column impact and if you are satisfied with the model performance, click Predict.

Next, choose Batch prediction, and click Select dataset.

Select your dataset, and click Choose dataset.

Next, click Start Predictions.

Observe a job created or observe the job progress in SageMaker under Inference, Batch transform jobs.

When the job completes, select the job, and note the S3 path where Canvas stored the predictions.

Visualize forecast data in Atlas Charts

To visualize forecast data, create the MongoDB Atlas charts based on the Federated data (amazon-forecast-data) for P10, P50, and P90 forecasts as shown in the following chart.

Clean up

Delete the MongoDB Atlas cluster
Delete Atlas Data Federation Configuration
Delete Atlas Application Service App
Delete the S3 Bucket
Delete Amazon SageMaker Canvas dataset and models
Delete the Atlas Charts
Log out of Amazon SageMaker Canvas

Conclusion

In this post we extracted time series data from MongoDB time series collection. This is a special collection optimized for storage and querying speed of time series data. We used Amazon SageMaker Canvas to train models and generate predictions and we visualized the predictions in Atlas Charts.

For more information, refer to the following resources.

Try out MongoDB Atlas
Try out MongoDB Atlas Time Series
Try out Amazon SageMaker Canvas
Try out MongoDB Charts

About the authors

Igor Alekseev is a Senior Partner Solution Architect at AWS in Data and Analytics domain. In his role Igor is working with strategic partners helping them build complex, AWS-optimized architectures. Prior joining AWS, as a Data/Solution Architect he implemented many projects in Big Data domain, including several data lakes in Hadoop ecosystem. As a Data Engineer he was involved in applying AI/ML to fraud detection and office automation.

Babu Srinivasan is a Senior Partner Solutions Architect at MongoDB. In his current role, he is working with AWS to build the technical integrations and reference architectures for the AWS and MongoDB solutions. He has more than two decades of experience in Database and Cloud technologies . He is passionate about providing technical solutions to customers working with multiple Global System Integrators(GSIs) across multiple geographies.

Use Amazon DocumentDB to build no-code machine learning solutions in Amazon SageMaker Canvas

December 15, 2023

by Adeleke Coker Amazon AWS

We are excited to announce the launch of Amazon DocumentDB (with MongoDB compatibility) integration with Amazon SageMaker Canvas, allowing Amazon DocumentDB customers to build and use generative AI and machine learning (ML) solutions without writing code. Amazon DocumentDB is a fully managed native JSON document database that makes it straightforward and cost-effective to operate critical document workloads at virtually any scale without managing infrastructure. Amazon SageMaker Canvas is a no-code ML workspace offering ready-to-use models, including foundation models, and the ability to prepare data and build and deploy custom models.

In this post, we discuss how to bring data stored in Amazon DocumentDB into SageMaker Canvas and use that data to build ML models for predictive analytics. Without creating and maintaining data pipelines, you will be able to power ML models with your unstructured data stored in Amazon DocumentDB.

Solution overview

Let’s assume the role of a business analyst for a food delivery company. Your mobile app stores information about restaurants in Amazon DocumentDB because of its scalability and flexible schema capabilities. You want to gather insights on this data and build an ML model to predict how new restaurants will be rated, but find it challenging to perform analytics on unstructured data. You encounter bottlenecks because you need to rely on data engineering and data science teams to accomplish these goals.

This new integration solves these problems by making it simple to bring Amazon DocumentDB data into SageMaker Canvas and immediately start preparing and analyzing data for ML. Additionally, SageMaker Canvas removes the dependency on ML expertise to build high-quality models and generate predictions.

We demonstrate how to use Amazon DocumentDB data to build ML models in SageMaker Canvas in the following steps:

Create an Amazon DocumentDB connector in SageMaker Canvas.
Analyze data using generative AI.
Prepare data for machine learning.
Build a model and generate predictions.

Prerequisites

To implement this solution, complete the following prerequisites:

Have AWS Cloud admin access with an AWS Identity and Access Management (IAM) user with permissions required to complete the integration.
Complete the environment setup using AWS CloudFormation through either of the following options:
1. Deploy a CloudFormation template into a new VPC – This option builds a new AWS environment that consists of the VPC, private subnets, security groups, IAM execution roles, Amazon Cloud9, required VPC endpoints, and SageMaker domain. It then deploys Amazon DocumentDB into this new VPC. Download the template or quick launch the CloudFormation stack by choosing Launch Stack:
2. Deploy a CloudFormation template into an existing VPC – This option creates the required VPC endpoints, IAM execution roles, and SageMaker domain in an existing VPC with private subnets. Download the template or quick launch the CloudFormation stack by choosing Launch Stack:

Note that if you’re creating a new SageMaker domain, you must configure the domain to be in a private VPC without internet access to be able to add the connector to Amazon DocumentDB. To learn more, refer to Configure Amazon SageMaker Canvas in a VPC without internet access.

Follow the tutorial to load sample restaurant data into Amazon DocumentDB.
Add access to Amazon Bedrock and the Anthropic Claude model within it. For more information, see Add model access.

Create an Amazon DocumentDB connector in SageMaker Canvas

After you create your SageMaker domain, complete the following steps:

On the Amazon DocumentDB console, choose No-code machine learning in the navigation pane.
Under Choose a domain and profile¸ choose your SageMaker domain and user profile.
Choose Launch Canvas to launch SageMaker Canvas in a new tab.

When SageMaker Canvas finishes loading, you will land on the Data flows tab.

Choose Create to create a new data flow.
Enter a name for your data flow and choose Create.
Add a new Amazon DocumentDB connection by choosing Import data, then choose Tabular for Dataset type.
On the Import data page, for Data Source, choose DocumentDB and Add Connection.
Enter a connection name such as demo and choose your desired Amazon DocumentDB cluster.

Note that SageMaker Canvas will prepopulate the drop-down menu with clusters in the same VPC as your SageMaker domain.

Enter a user name, password, and database name.
Finally, select your read preference.

To protect the performance of primary instances, SageMaker Canvas defaults to Secondary, meaning that it will only read from secondary instances. When read preference is Secondary preferred, SageMaker Canvas reads from available secondary instances, but will read from the primary instance if a secondary instance is not available. For more information on how to configure an Amazon DocumentDB connection, see the Connect to a database stored in AWS.

Choose Add connection.

If the connection is successful, you will see collections in your Amazon DocumentDB database shown as tables.

Drag your table of choice to the blank canvas. For this post, we add our restaurant data.

The first 100 rows are displayed as a preview.

To start analyzing and preparing your data, choose Import data.
Enter a dataset name and choose Import data.

Analyze data using generative AI

Next, we want to get some insights on our data and look for patterns. SageMaker Canvas provides a natural language interface to analyze and prepare data. When the Data tab loads, you can start chatting with your data with the following steps:

Choose Chat for data prep.
Gather insights about your data by asking questions like the samples shown in the following screenshots.

To learn more about how to use natural language to explore and prepare data, refer to Use natural language to explore and prepare data with a new capability of Amazon SageMaker Canvas.

Let’s get a deeper sense of our data quality by using the SageMaker Canvas Data Quality and Insights Report, which automatically evaluates data quality and detects abnormalities.

On the Analyses tab, choose Data Quality and Insights Report.
Choose rating as the target column and Regression as the problem type, then choose Create.

This will simulate model training and provide insights on how we can improve our data for machine learning. The complete report is generated in a few minutes.

Our report shows that 2.47% of rows in our target have missing values—we’ll address that in the next step. Additionally, the analysis shows that the address line 2, name, and type_of_food features have the most prediction power in our data. This indicates basic restaurant information like location and cuisine may have an outsized impact on ratings.

Prepare data for machine learning

SageMaker Canvas offers over 300 built-in transformations to prepare your imported data. For more information on transformation features of SageMaker Canvas, refer to Prepare data with advanced transformations. Let’s add some transformations to get our data ready for training an ML model.

Navigate back to the Data flow page by choosing the name of your data flow at the top of the page.
Choose the plus sign next to Data types and choose Add transform.
Choose Add step.
Let’s rename the address line 2 column to cities.
1. Choose Manage columns.
2. Choose Rename column for Transform.
3. Choose address line 2 for Input column, enter cities for New name, and choose Add.
Additionally, lets drop some unnecessary columns.
1. Add a new transform.
2. For Transform, choose Drop column.
3. For Columns to drop, choose URL and restaurant_id.
4. Choose Add.
  [
Our rating feature column has some missing values, so let’s fill in those rows with the average value of this column.
1. Add a new transform.
2. For Transform, choose Impute.
3. For Column type, choose Numeric.
4. For Input columns, choose the rating column.
5. For Imputing strategy, choose Mean.
6. For Output column, enter rating_avg_filled.
7. Choose Add.
We can drop the rating column because we have a new column with filled values.
Because type_of_food is categorical in nature, we’ll want to numerically encode it. Let’s encode this feature using the one-hot encoding technique.
1. Add a new transform.
2. For Transform, choose One-hot encode.
3. For Input columns, choose type_of_food.
4. For Invalid handling strategy¸ choose Keep.
5. For Output style¸ choose Columns.
6. For Output column, enter encoded.
7. Choose Add.

Build a model and generate predictions

Now that we have transformed our data, let’s train a numeric ML model to predict the ratings for restaurants.

Choose Create model.
For Dataset name, enter a name for the dataset export.
Choose Export and wait for the transformed data to be exported.
Choose the Create model link at the bottom left corner of the page.

You can also select the dataset from the Data Wrangler feature on the left of the page.

Enter a model name.
Choose Predictive analysis, then choose Create.
Choose rating_avg_filled as the target column.

SageMaker Canvas automatically selects a suitable model type.

Choose Preview model to ensure there are no data quality issues.
Choose Quick build to build the model.

The model creation will take approximately 2–15 minutes to complete.

You can view the model status after the model finishes training. Our model has an RSME of 0.422, which means the model often predicts the rating of a restaurant within +/- 0.422 of the actual value, a solid approximation for the rating scale of 1–6.

Finally, you can generate sample predictions by navigating to the Predict tab.

Clean up

To avoid incurring future charges, delete the resources you created while following this post. SageMaker Canvas bills you for the duration of the session, and we recommend logging out of SageMaker Canvas when you’re not using it. Refer to Logging out of Amazon SageMaker Canvas for more details.

Conclusion

In this post, we discussed how you can use SageMaker Canvas for generative AI and ML with data stored in Amazon DocumentDB. In our example, we showed how an analyst can quickly build a high-quality ML model using a sample restaurant dataset.

We showed the steps to implement the solution, from importing data from Amazon DocumentDB to building an ML model in SageMaker Canvas. The entire process was completed through a visual interface without writing a single line of code.

To start your low-code/no-code ML journey, refer to Amazon SageMaker Canvas.

About the authors

Adeleke Coker is a Global Solutions Architect with AWS. He works with customers globally to provide guidance and technical assistance in deploying production workloads at scale on AWS. In his spare time, he enjoys learning, reading, gaming and watching sport events.

Gururaj S Bayari is a Senior DocumentDB Specialist Solutions Architect at AWS. He enjoys helping customers adopt Amazon’s purpose-built databases. He helps customers design, evaluate, and optimize their internet scale and high performance workloads powered by NoSQL and/or Relational databases.

Tim Pusateri is a Senior Product Manager at AWS where he works on Amazon SageMaker Canvas. His goal is to help customers quickly derive value from AI/ML. Outside of work, he loves to be outdoors, play guitar, see live music, and spend time with family and friends.

Pratik Das is a Product Manager at AWS. He enjoys working with customers looking to build resilient workloads and strong data foundations in the cloud. He brings expertise working with enterprises on modernization, analytical and data transformation initiatives.

Varma Gottumukkala is a Senior Database Specialist Solutions Architect at AWS based out of Dallas Fort Worth. Varma works with the customers on their database strategy and architect their workloads using AWS purpose built databases. Before joining AWS, he worked extensively with relational databases, NOSQL databases and multiple programming languages for the last 22 years.

Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools

December 14, 2023

by Pranav Murthy Amazon AWS

Amazon SageMaker Studio offers a broad set of fully managed integrated development environments (IDEs) for machine learning (ML) development, including JupyterLab, Code Editor based on Code-OSS (Visual Studio Code Open Source), and RStudio. It provides access to the most comprehensive set of tools for each step of ML development, from preparing data to building, training, deploying, and managing ML models. You can launch fully managed JuptyerLab with pre-configured SageMaker Distribution in seconds to work with your notebooks, code, and data. The flexible and extensible interface of SageMaker Studio allows you to effortlessly configure and arrange ML workflows, and you can use the AI-powered inline coding companion to quickly author, debug, explain, and test code.

In this post, we take a closer look at the updated SageMaker Studio and its JupyterLab IDE, designed to boost the productivity of ML developers. We introduce the concept of Spaces and explain how JupyterLab Spaces enable flexible customization of compute, storage, and runtime resources to improve your ML workflow efficiency. We also discuss our shift to a localized execution model in JupyterLab, resulting in a quicker, more stable, and responsive coding experience. Additionally, we cover the seamless integration of generative AI tools like Amazon CodeWhisperer and Jupyter AI within SageMaker Studio JupyterLab Spaces, illustrating how they empower developers to use AI for coding assistance and innovative problem-solving.

Introducing Spaces in SageMaker Studio

The new SageMaker Studio web-based interface acts as a command center for launching your preferred IDE and accessing your Amazon SageMaker tools to build, train, tune, and deploy models. In addition to JupyterLab and RStudio, SageMaker Studio now includes a fully managed Code Editor based on Code-OSS (Visual Studio Code Open Source). Both JupyterLab and Code Editor can be launched using a flexible workspace called Spaces.

A Space is a configuration representation of a SageMaker IDE, such as JupyterLab or Code Editor, designed to persist regardless of whether an application (IDE) associated with the Space is actively running or not. A Space represents a combination of a compute instance, storage, and other runtime configurations. With Spaces, you can create and scale the compute and storage for your IDE up and down as you go, customize runtime environments, and pause and resume coding anytime from anywhere. You can spin up multiple such Spaces, each configured with a different combination of compute, storage, and runtimes.

When a Space is created, it is equipped with an Amazon Elastic Block Store (Amazon EBS) volume, which is used to store users’ files, data, caches, and other artifacts. It’s attached to a ML compute instance whenever a Space is run. The EBS volume ensures that user files, data, cache, and session states are consistently restored whenever the Space is restarted. Importantly, this EBS volume remains persistent, whether the Space is in a running or stopped state. It will continue to persist until the Space is deleted.

Additionally, we have introduced the bring-your-own file system feature for users who wish to share environments and artifacts across different Spaces, users, or even domains. This enables you to optionally equip your Spaces with your own Amazon Elastic File System (Amazon EFS) mount, facilitating the sharing of resources across various workspaces.

Creating a Space

Creating and launching a new Space is now quick and straightforward. It takes just a few seconds to set up a new Space with fast launch instances and less than 60 seconds to run a Space. Spaces are equipped with predefined settings for compute and storage, managed by administrators. SageMaker Studio administrators can establish domain-level presets for compute, storage, and runtime configurations. This setup enables you to quickly launch a new space with minimal effort, requiring only a few clicks. You also have the option to modify a Space’s compute, storage, or runtime configurations for further customization.

It’s important to note that creating a Space requires updating the SageMaker domain execution role with a policy like the following example. You need to grant your users permissions for private spaces and user profiles necessary to access these private spaces. For detailed instructions, refer to Give your users access to private spaces.

{
  "Version": "2012-10-17",
  "Statement": [
    {

      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateApp",
        "sagemaker:DeleteApp"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:app/*",
      "Condition": {
        "Null": {
          "sagemaker:OwnerUserProfileArn": "true"
        }
      }
    },
    {
      "Sid": "SMStudioCreatePresignedDomainUrlForUserProfile",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreatePresignedDomainUrl"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:user-profile/${sagemaker:DomainId}/${sagemaker:UserProfileName}"
    },
    {
      "Sid": "SMStudioAppPermissionsListAndDescribe",
      "Effect": "Allow",
      "Action": [
        "sagemaker:ListApps",
        "sagemaker:ListDomains",
        "sagemaker:ListUserProfiles",
        "sagemaker:ListSpaces",
        "sagemaker:DescribeApp",
        "sagemaker:DescribeDomain",
        "sagemaker:DescribeUserProfile",
        "sagemaker:DescribeSpace"
      ],
      "Resource": "*"
    },
    {
      "Sid": "SMStudioAppPermissionsTagOnCreate",
      "Effect": "Allow",
      "Action": [
        "sagemaker:AddTags"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:*/*",
      "Condition": {
        "Null": {
          "sagemaker:TaggingAction": "false"
        }
      }
    },
    {
      "Sid": "SMStudioRestrictSharedSpacesWithoutOwners",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateSpace",
        "sagemaker:UpdateSpace",
        "sagemaker:DeleteSpace"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:space/${sagemaker:DomainId}/*",
      "Condition": {
        "Null": {
          "sagemaker:OwnerUserProfileArn": "true"
        }
      }
    },
    {
      "Sid": "SMStudioRestrictSpacesToOwnerUserProfile",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateSpace",
        "sagemaker:UpdateSpace",
        "sagemaker:DeleteSpace"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:space/${sagemaker:DomainId}/*",
      "Condition": {
        "ArnLike": {
          "sagemaker:OwnerUserProfileArn": "arn:aws:sagemaker:$AWS Region:$111122223333:user-profile/${sagemaker:DomainId}/${sagemaker:UserProfileName}"
        },
        "StringEquals": {
          "sagemaker:SpaceSharingType": [
            "Private",
            "Shared"
          ]
        }
      }
    },
    {
      "Sid": "SMStudioRestrictCreatePrivateSpaceAppsToOwnerUserProfile",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateApp",
        "sagemaker:DeleteApp"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:app/${sagemaker:DomainId}/*",
      "Condition": {
        "ArnLike": {
          "sagemaker:OwnerUserProfileArn": "arn:aws:sagemaker:${aws:Region}:${aws:PrincipalAccount}:user-profile/${sagemaker:DomainId}/${sagemaker:UserProfileName}"
        },
        "StringEquals": {
          "sagemaker:SpaceSharingType": [
            "Private"
          ]
        }
      }
    },
  ]
}

To create a space, complete the following steps:

In SageMaker Studio, choose JupyterLab on the Applications menu.
Choose Create JupyterLab space.
For Name, enter a name for your Space.
Choose Create space.
Choose Run space to launch your new Space with default presets or update the configuration based on your requirements.

Reconfiguring a Space

Spaces are designed for users to seamlessly transition between different compute types as needed. You can begin by creating a new Space with a specific configuration, primarily consisting of compute and storage. If you need to switch to a different compute type with a higher or lower vCPU count, more or less memory, or a GPU-based instance at any point in your workflow, you can do so with ease. After you stop the Space, you can modify its settings using either the UI or API via the updated SageMaker Studio interface and then restart the Space. SageMaker Studio automatically handles the provisioning of your existing Space to the new configuration, requiring no extra effort on your part.

Complete the following steps to edit an existing space:

On the space details page, choose Stop space.
Reconfigure the compute, storage, or runtime.
Choose Run space to relaunch the space.

Your workspace will be updated with the new storage and compute instance type you requested.

The new SageMaker Studio JupyterLab architecture

The SageMaker Studio team continues to invent and simplify its developer experience with the release of a new fully managed SageMaker Studio JupyterLab experience. The new SageMaker Studio JupyterLab experience combines the best of both worlds: the scalability and flexibility of SageMaker Studio Classic (see the appendix at the end of this post) with the stability and familiarity of the open source JupyterLab. To grasp the design of this new JupyterLab experience, let’s delve into the following architecture diagram. This will help us better understand the integration and features of this new JupyterLab Spaces platform.

In summary, we have transitioned towards a localized architecture. In this new setup, Jupyter server and kernel processes operate alongside in a single Docker container, hosted on the same ML compute instance. These ML instances are provisioned when a Space is running, and linked with an EBS volume that is created when the Space was initially created.

This new architecture brings several benefits; we discuss some of these in the following sections.

Reduced latency and increased stability

SageMaker Studio has transitioned to a local run model, moving away from the previous split model where code was stored on an EFS mount and run remotely on an ML instance via remote Kernel Gateway. In the earlier setup, Kernel Gateway, a headless web server, enabled kernel operations over remote communication with Jupyter kernels through HTTPS/WSS. User actions like running code, managing notebooks, or running terminal commands were processed by a Kernel Gateway app on a remote ML instance, with Kernel Gateway facilitating these operations over ZeroMQ (ZMQ) within a Docker container. The following diagram illustrates this architecture.

The updated JupyterLab architecture runs all kernel operations directly on the local instance. This local Jupyter Server approach typically provides improved performance and straightforward architecture. It minimizes latency and network complexity, simplifies the architecture for easier debugging and maintenance, enhances resource utilization, and accommodates more flexible messaging patterns for a variety of complex workloads.

In essence, this upgrade brings running notebooks and code much closer to the kernels, significantly reducing latency and boosting stability.

Improved control over provisioned storage

SageMaker Studio Classic originally used Amazon EFS to provide persistent, shared file storage for user home directories within the SageMaker Studio environment. This setup enables you to centrally store notebooks, scripts, and other project files, accessible across all your SageMaker Studio sessions and instances.

With the latest update to SageMaker Studio, there is a shift from Amazon EFS-based storage to an Amazon EBS-based solution. The EBS volumes, provisioned with SageMaker Studio Spaces, are GP3 volumes designed to deliver a consistent baseline performance of 3,000 IOPS, independent of the volume size. This new Amazon EBS storage offers higher performance for I/O-intensive tasks such as model training, data processing, high-performance computing, and data visualization. This transition also gives SageMaker Studio administrators greater insight into and control over storage usage by user profiles within a domain or across SageMaker. You can now set default (DefaultEbsVolumeSizeInGb) and maximum (MaximumEbsVolumeSizeInGb) storage sizes for JupyterLab Spaces within each user profile.

In addition to improved performance, you have the ability to flexibly resize the storage volume attached to your Space’s ML compute instance by editing your Space setting either using the UI or API action from your SageMaker Studio interface, without requiring any administration action. However, note that you can only edit EBS volume sizes in one direction—after you increase the Space’s EBS volume size, you will not be able to lower it back down.

SageMaker Studio now offers elevated control of provisioned storage for administrators:

SageMaker Studio administrators can manage the EBS volume sizes for user profiles. These JupyterLab EBS volumes can vary from a minimum of 5 GB to a maximum of 16 TB. The following code snippet shows how to create or update a user profile with default and maximum space settings:

aws --region $REGION sagemaker create-user-profile 
--domain-id $DOMAIN_ID 
--user-profile-name $USER_PROFILE_NAME 
--user-settings '{
    "SpaceStorageSettings": {
        "DefaultEbsStorageSettings":{
            "DefaultEbsVolumeSizeInGb":5,
            "MaximumEbsVolumeSizeInGb":100
        }
    }
}'


# alternatively to update an existing user profile
aws --region $REGION sagemaker update-user-profile 
--domain-id $DOMAIN_ID 
--user-profile-name $USER_PROFILE_NAME 
--user-settings '{
    "SpaceStorageSettings": {
        "DefaultEbsStorageSettings":{
            "DefaultEbsVolumeSizeInGb":25,
            "MaximumEbsVolumeSizeInGb":100 
        }
    }
}'

SageMaker Studio now offers an enhanced auto-tagging feature for Amazon EBS resources, automatically labeling volumes created by users with domain, user, and Space information. This advancement simplifies cost allocation analysis for storage resources, aiding administrators in managing and attributing costs more effectively. It’s also important to note that these EBS volumes are hosted within the service account, so you won’t have direct visibility. Nonetheless, storage usage and associated costs are directly linked to the domain ARN, user profile ARN, and Space ARN, facilitating straightforward cost allocation.
Administrators can also control encryption of a Space’s EBS volumes, at rest, using customer managed keys (CMK).

Shared tenancy with bring-your-own EFS file system

ML workflows are typically collaborative, requiring efficient sharing of data and code among team members. The new SageMaker Studio enhances this collaborative aspect by enabling you to share data, code, and other artifacts via a shared bring-your-own EFS file system. This EFS drive can be set up independently of SageMaker or could be an existing Amazon EFS resource. After it’s provisioned, it can be seamlessly mounted onto SageMaker Studio user profiles. This feature is not restricted to user profiles within a single domain—it can extend across domains, as long as they are within the same Region.

The following example code shows you how to create a domain and attach an existing EFS volume to it using its associated fs-id. EFS volumes can be attached to a domain at the root or prefix level, as the following commands demonstrate:

# create a domain with and attach an existing EFS volume at root level
aws sagemaker create-domain --domain-name "myDomain" 
 --vpc-id {VPC_ID} --subnet-ids {SUNBET_IDS} --auth-mode IAM 
 --default-user-settings 
 "CustomFileSystemConfigs=[{EFSFileSystemConfig={FileSystemId="fs-12345678"}}]"
 
# create a domain with and attach an existing EFS volume at file system prefix leve
aws sagemaker create-domain --domain-name "myDomain" 
 --vpc-id {VPC_ID} --subnet-ids {SUNBET_IDS} --auth-mode IAM 
 --default-user-settings 
 "CustomFileSystemConfigs=[{EFSFileSystemConfig={FileSystemId="fs-12345678", FileSystemPath="/my/custom/path"}}]"

# update an existing domain with your own EFS
aws sagemaker update-domain --region us-west-2 --domain-id d-xxxxx 
    --default-user-settings 
    "CustomFileSystemConfigs=[{EFSFileSystemConfig={FileSystemId="fs-12345678"}}]"

When an EFS mount is made available in a domain and its related user profiles, you can choose to attach it to a new space. This can be done using either the SageMaker Studio UI or an API action, as shown in the following example. It’s important to note that when a space is created with an EFS file system that’s provisioned at the domain level, the space inherits its properties. This means that if the file system is provisioned at a root or prefix level within the domain, these settings will automatically apply to the space created by the domain users.

# attach an a preconfigured EFS to a space
aws sagemaker create-space 
--space-name byofs-space --domain-id "myDomain" 
--ownership-settings "OwnerUserProfileName={USER_PROFILE_NAME}" 
--space-sharing-settings "SharingType=Private" 
--space-settings 
"AppType=JupyterLab,CustomFileSystems=[{EFSFileSystem={FileSystemId="fs-12345678"}}]")

After mounting it to a Space, you can locate all your files located above the admin-provisioned mount point. These files can be found in the directory path /mnt/custom-file-system/efs/fs-12345678.

EFS mounts make is straightforward to share artifacts between a user’s Space or between multiple users or across domains, making it ideal for collaborative workloads. With this feature, you can do the following:

Share data – EFS mounts are ideal for storing large datasets crucial for data science experiments. Dataset owners can load these mounts with training, validation, and test datasets, making them accessible to user profiles within a domain or across multiple domains. SageMaker Studio admins can also integrate existing application EFS mounts while maintaining compliance with organizational security policies. This is done through flexible prefix-level mounting. For example, if production and test data are stored on the same EFS mount (such as fs-12345678:/data/prod and fs-12345678:/data/test), mounting /data/test onto the SageMaker domain’s user profiles grants users access only to the test dataset. This setup allows for analysis or model training while keeping production data secure and inaccessible.
Share Code – EFS mounts facilitate the quick sharing of code artifacts between user profiles. In scenarios where users need to rapidly share code samples or collaborate on a common code base without the complexities of frequent git push/pull commands, shared EFS mounts are highly beneficial. They offer a convenient way to share work-in-progress code artifacts within a team or across different teams in SageMaker Studio.
Share development environments – Shared EFS mounts can also serve as a means to quickly disseminate sandbox environments among users and teams. EFS mounts provide a solid alternative for sharing Python environments like conda or virtualenv across multiple workspaces. This approach circumvents the need for distributing requirements.txt or environment.yml files, which can often lead to the repetitive task of creating or recreating environments across different user profiles.

These features significantly enhance the collaborative capabilities within SageMaker Studio, making it effortless for teams to work together efficiently on complex ML projects. Additionally, Code Editor based on Code-OSS (Visual Studio Code Open Source) shares the same architectural principles as the aforementioned JupyterLab experience This alignment brings several advantages, such as reduced latency, enhanced stability, and improved administrative control, and enables user access to shared workspaces, similar to those offered in JupyterLab Spaces.

Generative AI-powered tools on JupyterLab Spaces

Generative AI, a rapidly evolving field in artificial intelligence, uses algorithms to create new content like text, images, and code from extensive existing data. This technology has revolutionized coding by automating routine tasks, generating complex code structures, and offering intelligent suggestions, thereby streamlining development and fostering creativity and problem-solving in programming. As an indispensable tool for developers, generative AI enhances productivity and drives innovation in the tech industry. SageMaker Studio enhances this developer experience with pre-installed tools like Amazon CodeWhisperer and Jupyter AI, using generative AI to accelerate the development lifecycle.

Amazon CodeWhisperer

Amazon CodeWhisperer is a programming assistant that enhances developer productivity through real-time code recommendations and solutions. As an AWS managed AI service, it’s seamlessly integrated into the SageMaker Studio JupyterLab IDE. This integration makes Amazon CodeWhisperer a fluid and valuable addition to a developer’s workflow.

Amazon CodeWhisperer excels in increasing developer efficiency by automating common coding tasks, suggesting more effective coding patterns, and decreasing debugging time. It serves as an essential tool for both beginner and seasoned coders, providing insights into best practices, accelerating the development process, and improving the overall quality of code. To start using Amazon CodeWhisperer, make sure that the Resume Auto-Suggestions feature is activated. You can manually invoke code suggestions using keyboard shortcuts.

Alternatively, write a comment describing your intended code function and begin coding; Amazon CodeWhisperer will start providing suggestions.

Note that although Amazon CodeWhisperer is pre-installed, you must have the codewhisperer:GenerateRecommendations permission as part of the execution role to receive code recommendations. For additional details, refer to Using CodeWhisperer with Amazon SageMaker Studio. When you use Amazon CodeWhisperer, AWS may, for service improvement purposes, store data about your usage and content. To opt out of the Amazon CodeWhisperer data sharing policy, you can navigate to the Setting option from the top menu then navigate to Settings Editor and disable Share usage data with Amazon CodeWhisperer from the Amazon CodeWhisperer settings menu.

Jupyter AI

Jupyter AI is an open source tool that brings generative AI to Jupyter notebooks, offering a robust and user-friendly platform for exploring generative AI models. It enhances productivity in JupyterLab and Jupyter Notebooks by providing features like the %%ai magic for creating a generative AI playground inside notebooks, a native chat UI in JupyterLab for interacting with AI as a conversational assistant, and support for a wide array of large language model (LLM) providers like AI21, Anthropic, Cohere, and Hugging Face or managed services like Amazon Bedrock and SageMaker endpoints. This integration offers more efficient and innovative methods for data analysis, ML, and coding tasks. For example, you can interact with a domain-aware LLM using the Jupyternaut chat interface for help with processes and workflows or generate example code through CodeLlama, hosted on SageMaker endpoints. This makes it a valuable tool for developers and data scientists.

Jupyter AI provides an extensive selection of language models ready for use right out of the box. Additionally, custom models are also supported via SageMaker endpoints, offering flexibility and a broad range of options for users. It also offers support for embedding models, enabling you to perform inline comparisons and tests and even build or test ad hoc Retrieval Augmented Generation (RAG) apps.

Jupyter AI can act as your chat assistant, helping you with code samples, providing you with answers to questions, and much more.

You can use Jupyter AI’s %%ai magic to generate sample code inside your notebook, as shown in the following screenshot.

JupyterLab 4.0

The JupyterLab team has released version 4.0, featuring significant improvements in performance, functionality, and user experience. Detailed information about this release is available in the official JupyterLab Documentation.

This version, now standard in SageMaker Studio JupyterLab, introduces optimized performance for handling large notebooks and faster operations, thanks to improvements like CSS rule optimization and the adoption of CodeMirror 6 and MathJax 3. Key enhancements include an upgraded text editor with better accessibility and customization, a new extension manager for easy installation of Python extensions, and improved document search capabilities with advanced features. Additionally, version 4.0 brings UI improvements, accessibility enhancements, and updates to development tools, and certain features have been backported to JupyterLab 3.6.

Conclusion

The advancements in SageMaker Studio, particularly with the new JupyterLab experience, mark a significant leap forward in ML development. The updated SageMaker Studio UI, with its integration of JupyterLab, Code Editor, and RStudio, offers an unparalleled, streamlined environment for ML developers. The introduction of JupyterLab Spaces provides flexibility and ease in customizing compute and storage resources, enhancing the overall efficiency of ML workflows. The shift from a remote kernel architecture to a localized model in JupyterLab greatly increases stability while decreasing startup latency. This results in a quicker, more stable, and responsive coding experience. Moreover, the integration of generative AI tools like Amazon CodeWhisperer and Jupyter AI in JupyterLab further empowers developers, enabling you to use AI for coding assistance and innovative problem-solving. The enhanced control over provisioned storage and the ability to share code and data effortlessly through self-managed EFS mounts greatly facilitate collaborative projects. Lastly, the release of JupyterLab 4.0 within SageMaker Studio underscores these improvements, offering optimized performance, better accessibility, and a more user-friendly interface, thereby solidifying JupyterLab’s role as a cornerstone of efficient and effective ML development in the modern tech landscape.

Give SageMaker Studio JupyterLab Spaces a try using our quick onboard feature, which allows you to spin up a new domain for single users within minutes. Share your thoughts in the comments section!

Appendix: SageMaker Studio Classic’s kernel gateway architecture

A SageMaker Classic domain is a logical aggregation of an EFS volume, a list of users authorized to access the domain, and configurations related to security, application, networking, and more. In the SageMaker Studio Classic architecture of SageMaker, each user within the SageMaker domain has a distinct user profile. This profile encompasses specific details like the user’s role and their Posix user ID in the EFS volume, among other unique data. Users access their individual user profile through a dedicated Jupyter Server app, connected via HTTPS/WSS in their web browser. SageMaker Studio Classic uses a remote kernel architecture using a combination of Jupyter Server and Kernel Gateway app types, enabling notebook servers to interact with kernels on remote hosts. This means that the Jupyter kernels operate not on the notebook server’s host, but within Docker containers on separate hosts. In essence, your notebook is stored in the EFS home directory, and runs code remotely on a different Amazon Elastic Compute Cloud (Amazon EC2) instance, which houses a pre-built Docker container equipped with ML libraries such as PyTorch, TensorFlow, Scikit-Learn, and more.

The remote kernel architecture in SageMaker Studio offers notable benefits in terms of scalability and flexibility. However, it has its limitations, including a maximum of four apps per instance type and potential bottlenecks due to numerous HTTPS/WSS connections to a common EC2 instance type. These limitations could negatively affect the user experience.

The following architecture diagram depicts the SageMaker Studio Classic architecture. It illustrates the user’s process of connecting to a Kernel Gateway app via a Jupyter Server app, using their preferred web browser.

About the authors

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Kunal Jha is a Senior Product Manager at AWS. He is focused on building Amazon SageMaker Studio as the best-in-class choice for end-to-end ML development. In his spare time, Kunal enjoys skiing and exploring the Pacific Northwest. You can find him on LinkedIn.

Majisha Namath Parambath is a Senior Software Engineer at Amazon SageMaker. She has been at Amazon for over 8 years and is currently working on improving the Amazon SageMaker Studio end-to-end experience.

Bharat Nandamuri is a Senior Software Engineer working on Amazon SageMaker Studio. He is passionate about building high scale backend services with focus on Engineering for ML systems. Outside of work, he enjoys playing chess, hiking and watching movies.

Derek Lause is a Software Engineer at AWS. He is committed to deliver value to customers through Amazon SageMaker Studio and Notebook Instances. In his spare time, Derek enjoys spending time with family and friends and hiking. You can find Derek on LinkedIn.

How AWS Prototyping enabled ICL-Group to build computer vision models on Amazon SageMaker

December 14, 2023

by Dr. Markus Bestehorn Amazon AWS

This is a customer post jointly authored by ICL and AWS employees.

ICL is a multi-national manufacturing and mining corporation based in Israel that manufactures products based on unique minerals and fulfills humanity’s essential needs, primarily in three markets: agriculture, food, and engineered materials. Their mining sites use industrial equipment that has to be monitored because machinery failures can result in loss of revenue or even environmental damages. Due to the extremely harsh conditions (low and high temperatures, vibrations, salt water, dust), attaching sensors to these mining machines for remote monitoring is difficult. Therefore, most machines are manually or visually monitored continuously by on-site workers. These workers frequently check camera pictures to monitor the state of a machine. Although this approach has worked in the past, it doesn’t scale and incurs relatively high costs.

To overcome this business challenge, ICL decided to develop in-house capabilities to use machine learning (ML) for computer vision (CV) to automatically monitor their mining machines. As a traditional mining company, the availability of internal resources with data science, CV, or ML skills was limited.

In this post, we discuss the following:

How ICL developed the in-house capabilities to build and maintain CV solutions that allow automatic monitoring of mining equipment to improve efficiency and reduce waste
A deep dive into a solution for mining screeners that was developed with the support of the AWS Prototyping program

Using the approach described in this post, ICL was able to develop a framework on AWS using Amazon SageMaker to build other use cases based on extracted vision from about 30 cameras, with the potential of scaling to thousands of such cameras on their production sites.

Building in-house capabilities through AWS Prototyping

Building and maintaining ML solutions for business-critical workloads requires sufficiently skilled staff. Outsourcing such activities is often not possible because internal know-how about business process needs to be combined with technical solution building. Therefore, ICL approached AWS for support in their journey to build a CV solution to monitor their mining equipment and acquire the necessary skills.

AWS Prototyping is an investment program where AWS embeds specialists into customer development teams to build mission-critical use cases. During such an engagement, the customer development team is enabled on the underlying AWS technologies while building the use case over the course of 3–6 weeks and getting hands-on help. Besides a corresponding use case, all the customer needs are 3–7 developers that can spend more than 80% of their working time building the aforementioned use case. During this time, the AWS specialists are fully assigned to the customer’s team and collaborate with them remotely or on-site.

ICL’s computer vision use case

For the prototyping engagement, ICL selected the use case for monitoring their mining screeners. A screener is a large industrial mining machine where minerals dissolved in water are processed. The water flows in several lanes from the top of the machine to the bottom. The influx is monitored for each of the lanes individually. When the influx runs out of the lane, it’s called overflow, which indicates that the machine is overloaded. Overflowing influx are minerals that are not processed by the screener and are lost. This needs to be avoided by regulating the influx. Without an ML solution, the overflow needs to be monitored by humans and it potentially takes time until the overflow is observed and handled.

The following images show the input and outputs of the CV models. The raw camera picture (left) is processed using a semantic segmentation model (middle) to detect the different lanes. Then the model (right) estimates the coverage (white) and overflow (red).

Although the prototyping engagement focused on a single type of machine, the general approach to use cameras and automatically process their images while using CV is applicable to a wider range of mining equipment. This allows ICL to extrapolate the know-how gained during the prototyping engagement to other locations, camera types, and machines, and also maintain the ML models without requiring support from any third party.

During the engagement, the AWS specialists and the ICL development team would meet every day and codevelop the solution step by step. ICL data scientists would either work independently on their assigned tasks or receive hands-on, pair-programming support from AWS ML specialists. This approach ensures that ICL data scientists not only gained experience to systematically develop ML models using SageMaker, but also to embed these models into applications as well as automate the whole lifecycle of such models, including automated retraining or model monitoring. After 4 weeks of this collaboration, ICL was able to move this model into production without requiring further support within 8 weeks, and has built models for other use cases since then. The technical approach of this engagement is described in the next section.

Monitoring mining screeners using CV models with SageMaker

SageMaker is a fully managed platform that addresses the complete lifecycle of an ML model: it provides services and features that support teams working on ML models from labeling their data in Amazon SageMaker Ground Truth to training and optimizing the model, as well as hosting ML models for production use. Prior to the engagement, ICL had installed the cameras and obtained pictures as shown in the previous images (left-most image) and stored them in an Amazon Simple Storage Service (Amazon S3) bucket. Before models can be trained, it’s necessary to generate training data. The joint ICL-AWS team addressed this in three steps:

Label the data using a semantic segmentation labeling job in SageMaker Ground Truth, as shown in the following image.
Preprocess the labeled images using image augmentation techniques to increase the number of data samples.
Split the labeled images into training, test, and validation sets, so that the performance and accuracy of the model can be measured adequately during the training process.

To achieve production scale for ML workloads, automating these steps is crucial to maintain the quality of the training input. Therefore, whenever new images are labeled using SageMaker Ground Truth, the preprocessing and splitting steps are run automatically and the resulting datasets are stored in Amazon S3, as shown model training workflow in the following diagram. Similarly, the model deployment workflow uses assets from SageMaker to update endpoints automatically whenever an updated model is available.

ICL is using several approaches to implement ML models into production. Some involve their current AI platform called KNIME, which allows them to quickly deploy models developed in the development environment into production by industrializing them into products. Several combinations of using KNIME and AWS services were analyzed; the preceding architecture was the most suitable to ICL’ s environment.

The SageMaker semantic segmentation built-in algorithm is used to train models for screener grid area segmentation. By choosing this built-in algorithm over a self-built container, ICL doesn’t have to deal with the undifferentiated heavy lifting of maintaining a Convolutional Neural Network (CNN) while being able to use such a CNN for their use case. After experimenting with different configurations and parameters, ICL used a Fully Convolutional Network (FCN) algorithm with a pyramid scene parsing network (PSPNet) to train the model. This allowed ICL to finalize the model building within 1 week of the prototyping engagement.

After a model has been trained, it has to be deployed to be usable for the screener monitoring. In line with the model training, this process is fully automated and orchestrated using AWS Step Functions and AWS Lambda. After the model is successfully deployed on the SageMaker endpoint, incoming pictures from the cameras are resized to fit the model’s input format and then fed into the endpoint for predictions using Lambda functions. The result of the semantic segmentation prediction as well as the overflow detection are then stored in Amazon DynamoDB and Amazon S3 for downstream analysis. If overflow is detected, Amazon Simple Notification Service (Amazon SNS) or Lambda functions can be used to automatically mitigate the overflow and control the corresponding lanes on the affected screener. The following diagram illustrates this architecture.

Conclusion

This post described how ICL, an Israeli mining company, developed their own computer vision approach for automated monitoring of mining equipment using cameras. We first showed how to address such a challenge from an organizational point of view that is focused on enablement, then we provided a detailed look into how the model was built using AWS. Although the challenge of monitoring may be unique to ICL, the general approach to build a prototype alongside AWS specialists can be applied to similar challenges, particularly for organizations that don’t have the necessary AWS knowledge.

If you want to learn how to build a production-scale prototype of your use case, reach out to your AWS account team to discuss a prototyping engagement.

About the Authors

Markus Bestehorn leads the customer engineering and prototyping teams in Germany, Austria, Switzerland, and Israel for AWS. He has a PhD degree in computer science and is specialized in building complex machine learning and IoT solutions.

David Abekasis leads the data science team at ICL Group with a passion to educate others on data analysis and machine learning while helping solve business challenges. He has an MSc in Data Science and an MBA. He was fortunate to research spatial and time series data in the precision agriculture domain.

Ion Kleopas is a Sr. Machine Learning Prototyping Architect with an MSc in Data Science and Big Data. He helps AWS customers build innovative AI/ML solutions by enabling their technical teams on AWS technologies through the co-development of prototypes for challenging machine learning use cases, paving their path to production.

Miron Perel is a Principal Machine Learning Business Development Manager with Amazon Web Services. Miron advises Generative AI companies building their next generation models.

Automate PDF pre-labeling for Amazon Comprehend

December 14, 2023

by Oskar Schnaack Amazon AWS

Amazon Comprehend is a natural-language processing (NLP) service that provides pre-trained and custom APIs to derive insights from textual data. Amazon Comprehend customers can train custom named entity recognition (NER) models to extract entities of interest, such as location, person name, and date, that are unique to their business.

To train a custom model, you first prepare training data by manually annotating entities in documents. This can be done with the Comprehend Semi-Structured Documents Annotation Tool, which creates an Amazon SageMaker Ground Truth job with a custom template, allowing annotators to draw bounding boxes around the entities directly on the PDF documents. However, for companies with existing tabular entity data in ERP systems like SAP, manual annotation can be repetitive and time-consuming.

To reduce the effort of preparing training data, we built a pre-labeling tool using AWS Step Functions that automatically pre-annotates documents by using existing tabular entity data. This significantly decreases the manual work needed to train accurate custom entity recognition models in Amazon Comprehend.

In this post, we walk you through the steps of setting up the pre-labeling tool and show examples of how it automatically annotates documents from a public dataset of sample bank statements in PDF format. The full code is available on the GitHub repo.

Solution overview

In this section, we discuss the inputs and outputs of the pre-labeling tool and provide an overview of the solution architecture.

Inputs and outputs

As input, the pre-labeling tool takes PDF documents that contain text to be annotated. For the demo, we use simulated bank statements like the following example.

The tool also takes a manifest file that maps PDF documents with the entities that we want to extract from these documents. Entities consists of two things: the expected_text to extract from the document (for example, AnyCompany Bank) and the corresponding entity_type (for example, bank_name). Later in this post, we show how to construct this manifest file from a CSV document like the following example.

The pre-labeling tool uses the manifest file to automatically annotate the documents with their corresponding entities. We can then use these annotations directly to train an Amazon Comprehend model.

Alternatively, you can create a SageMaker Ground Truth labeling job for human review and editing, as shown in the following screenshot.

When the review is complete, you can use the annotated data to train an Amazon Comprehend custom entity recognizer model.

Architecture

The pre-labeling tool consists of multiple AWS Lambda functions orchestrated by a Step Functions state machine. It has two versions that use different techniques to generate pre-annotations.

The first technique is fuzzy matching. This requires a pre-manifest file with expected entities. The tool uses the fuzzy matching algorithm to generate pre-annotations by comparing text similarity.

Fuzzy matching looks for strings in the document that are similar (but not necessarily identical) to the expected entities listed in the pre-manifest file. It first calculates text similarity scores between the expected text and words in the document, then it matches all pairs above a threshold. Therefore, even if there are no exact matches, fuzzy matching can find variants like abbreviations and misspellings. This allows the tool to pre-label documents without requiring the entities to appear verbatim. For example, if 'AnyCompany Bank' is listed as an expected entity, Fuzzy Matching will annotate occurrences of 'Any Companys Bank'. This provides more flexibility than strict string matching and enables the pre-labeling tool to automatically label more entities.

The following diagram illustrates the architecture of this Step Functions state machine.

The second technique requires a pre-trained Amazon Comprehend entity recognizer model. The tool generates pre-annotations using the Amazon Comprehend model, following the workflow shown in the following diagram.

The following diagram illustrates the full architecture.

In the following sections, we walk through the steps to implement the solution.

Deploy the pre-labeling tool

Clone the repository to your local machine:

git clone https://github.com/aws-samples/amazon-comprehend-automated-pdf-prelabeling-tool.git

This repository has been built on top of the Comprehend Semi-Structured Documents Annotation Tool and extends its functionalities by enabling you to start a SageMaker Ground Truth labeling job with pre-annotations already displayed on the SageMaker Ground Truth UI.

The pre-labeling tool includes both the Comprehend Semi-Structured Documents Annotation Tool resources as well as some resources specific to the pre-labeling tool. You can deploy the solution with AWS Serverless Application Model (AWS SAM), an open source framework that you can use to define serverless application infrastructure code.

If you have previously deployed the Comprehend Semi-Structured Documents Annotation Tool, refer to the FAQ section in Pre_labeling_tool/README.md for instructions on how to deploy only the resources specific to the pre-labeling tool.

If you haven’t deployed the tool before and are starting fresh, do the following to deploy the whole solution.

Change the current directory to the annotation tool folder:

cd amazon-comprehend-semi-structured-documents-annotation-tools

Build and deploy the solution:

make ready-and-deploy-guided

Create the pre-manifest file

Before you can use the pre-labeling tool, you need to prepare your data. The main inputs are PDF documents and a pre-manifest file. The pre-manifest file contains the location of each PDF document under 'pdf' and the location of a JSON file with expected entities to label under 'expected_entities'.

The notebook generate_premanifest_file.ipynb shows how to create this file. In the demo, the pre-manifest file shows the following code:

[
  {
    'pdf': 's3://<bucket>/data_aws_idp_workshop_data/bank_stmt_0.pdf',
    'expected_entities': 's3://<bucket>/prelabeling-inputs/expected-entities/example-demo/fuzzymatching_version/file_bank_stmt_0.json'
  },
  ...
]

Each JSON file listed in the pre-manifest file (under expected_entities) contains a list of dictionaries, one for each expected entity. The dictionaries have the following keys:

‘expected_texts’ – A list of possible text strings matching the entity.
‘entity_type’ – The corresponding entity type.
‘ignore_list’ (optional) – The list of words that should be ignored in the match. These parameters should be used to prevent fuzzy matching from matching specific combinations of words that you know are wrong. This can be useful if you want to ignore some numbers or email addresses when looking at names.

For example, the expected_entities of the PDF shown previously looks like the following:

[
  {
    'expected_texts': ['AnyCompany Bank'],
    'entity_type': 'bank_name',
    'ignore_list': []
  },
  {
    'expected_texts': ['JANE DOE'],
    'entity_type': 'customer_name',
    'ignore_list': ['JANE.DOE@example_mail.com']
  },
  {
    'expected_texts': ['003884257406'],
    'entity_type': 'checking_number',
    'ignore_list': []
  },
 ...
]

Run the pre-labeling tool

With the pre-manifest file that you created in the previous step, start running the pre-labeling tool. For more details, refer to the notebook start_step_functions.ipynb.

To start the pre-labeling tool, provide an event with the following keys:

Premanifest – Maps each PDF document to its expected_entities file. This should contain the Amazon Simple Storage Service (Amazon S3) bucket (under bucket) and the key (under key) of the file.
Prefix – Used to create the execution_id, which names the S3 folder for output storage and the SageMaker Ground Truth labeling job name.
entity_types – Displayed in the UI for annotators to label. These should include all entity types in the expected entities files.
work_team_name (optional) – Used for creating the SageMaker Ground Truth labeling job. It corresponds to the private workforce to use. If it’s not provided, only a manifest file will be created instead of a SageMaker Ground Truth labeling job. You can use the manifest file to create a SageMaker Ground Truth labeling job later on. Note that as of this writing, you can’t provide an external workforce when creating the labeling job from the notebook. However, you can clone the created job and assign it to an external workforce on the SageMaker Ground Truth console.
comprehend_parameters (optional) – Parameters to directly train an Amazon Comprehend custom entity recognizer model. If omitted, this step will be skipped.

To start the state machine, run the following Python code:

import boto3
stepfunctions_client = boto3.client('stepfunctions')

response = stepfunctions_client.start_execution(
stateMachineArn=fuzzymatching_prelabeling_step_functions_arn,
input=json.dumps(<event-dict>)
)

This will start a run of the state machine. You can monitor the progress of the state machine on the Step Functions console. The following diagram illustrates the state machine workflow.

When the state machine is complete, do the following:

Inspect the following outputs saved in the prelabeling/ folder of the comprehend-semi-structured-docs S3 bucket:
- Individual annotation files for each page of the documents (one per page per document) in temp_individual_manifests/
- A manifest for the SageMaker Ground Truth labeling job in consolidated_manifest/consolidated_manifest.manifest
- A manifest that can be used to train a custom Amazon Comprehend model in consolidated_manifest/consolidated_manifest_comprehend.manifest
On the SageMaker console, open the SageMaker Ground Truth labeling job that was created to review the annotations
Inspect and test the custom Amazon Comprehend model that was trained

As mentioned previously, the tool can only create SageMaker Ground Truth labeling jobs for private workforces. To outsource the human labeling effort, you can clone the labeling job on the SageMaker Ground Truth console and attach any workforce to the new job.

Clean up

To avoid incurring additional charges, delete the resources that you created and delete the stack that you deployed with the following command:

make delete

Conclusion

The pre-labeling tool provides a powerful way for companies to use existing tabular data to accelerate the process of training custom entity recognition models in Amazon Comprehend. By automatically pre-annotating PDF documents, it significantly reduces the manual effort required in the labeling process.

The tool has two versions: fuzzy matching and Amazon Comprehend-based, giving flexibility on how to generate the initial annotations. After documents are pre-labeled, you can quickly review them in a SageMaker Ground Truth labeling job or even skip the review and directly train an Amazon Comprehend custom model.

The pre-labeling tool enables you to quickly unlock the value of your historical entity data and use it in creating custom models tailored to your specific domain. By speeding up what is typically the most labor-intensive part of the process, it makes custom entity recognition with Amazon Comprehend more accessible than ever.

For more information about how to label PDF documents using a SageMaker Ground Truth labeling job, see Custom document annotation for extracting named entities in documents using Amazon Comprehend and Use Amazon SageMaker Ground Truth to Label Data.

About the authors

Oskar Schnaack is an Applied Scientist at the Generative AI Innovation Center. He is passionate about diving into the science behind machine learning to make it accessible for customers. Outside of work, Oskar enjoys cycling and keeping up with trends in information theory.

Romain Besombes is a Deep Learning Architect at the Generative AI Innovation Center. He is passionate about building innovative architectures to address customers’ business problems with machine learning.

Improve your Stable Diffusion prompts with Retrieval Augmented Generation

December 14, 2023

by James Yi Amazon AWS

Text-to-image generation is a rapidly growing field of artificial intelligence with applications in a variety of areas, such as media and entertainment, gaming, ecommerce product visualization, advertising and marketing, architectural design and visualization, artistic creations, and medical imaging.

Stable Diffusion is a text-to-image model that empowers you to create high-quality images within seconds. In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models in Amazon SageMaker JumpStart, a machine learning (ML) hub offering models, algorithms, and solutions. The evolution continued in April 2023 with the introduction of Amazon Bedrock, a fully managed service offering access to cutting-edge foundation models, including Stable Diffusion, through a convenient API.

As an ever-increasing number of customers embark on their text-to-image endeavors, a common hurdle arises—how to craft prompts that wield the power to yield high-quality, purpose-driven images. This challenge often demands considerable time and resources as users embark on an iterative journey of experimentation to discover the prompts that align with their visions.

Retrieval Augmented Generation (RAG) is a process in which a language model retrieves contextual documents from an external data source and uses this information to generate more accurate and informative text. This technique is particularly useful for knowledge-intensive natural language processing (NLP) tasks. We now extend its transformative touch to the world of text-to-image generation. In this post, we demonstrate how to harness the power of RAG to enhance the prompts sent to your Stable Diffusion models. You can create your own AI assistant for prompt generation in minutes with large language models (LLMs) on Amazon Bedrock, as well as on SageMaker JumpStart.

Approaches to crafting text-to-image prompts

Creating a prompt for a text-to-image model may seem straightforward at first glance, but it’s a deceptively complex task. It’s more than just typing a few words and expecting the model to conjure an image that aligns with your mental image. Effective prompts should provide clear instructions while leaving room for creativity. They must balance specificity and ambiguity, and they should be tailored to the particular model being used. To address the challenge of prompt engineering, the industry has explored various approaches:

Prompt libraries – Some companies curate libraries of pre-written prompts that you can access and customize. These libraries contain a wide range of prompts tailored to various use cases, allowing you to choose or adapt prompts that align with your specific needs.
Prompt templates and guidelines – Many companies and organizations provide users with a set of predefined prompt templates and guidelines. These templates offer structured formats for writing prompts, making it straightforward to craft effective instructions.
Community and user contributions – Crowdsourced platforms and user communities often play a significant role in improving prompts. Users can share their fine-tuned models, successful prompts, tips, and best practices with the community, helping others learn and refine their prompt-writing skills.
Model fine-tuning – Companies may fine-tune their text-to-image models to better understand and respond to specific types of prompts. Fine-tuning can improve model performance for particular domains or use cases.

These industry approaches collectively aim to make the process of crafting effective text-to-image prompts more accessible, user-friendly, and efficient, ultimately enhancing the usability and versatility of text-to-image generation models for a wide range of applications.

Using RAG for prompt design

In this section, we delve into how RAG techniques can serve as a game changer in prompt engineering, working in harmony with these existing approaches. By seamlessly integrating RAG into the process, we can streamline and enhance the efficiency of prompt design.

Semantic search in a prompt database

Imagine a company that has accumulated a vast repository of prompts in its prompt library or has created a large number of prompt templates, each designed for specific use cases and objectives. Traditionally, users seeking inspiration for their text-to-image prompts would manually browse through these libraries, often sifting through extensive lists of options. This process can be time-consuming and inefficient. By embedding prompts from the prompt library using text embedding models, companies can build a semantic search engine. Here’s how it works:

Embedding prompts – The company uses text embeddings to convert each prompt in its library into a numerical representation. These embeddings capture the semantic meaning and context of the prompts.
User query – When users provide their own prompts or describe their desired image, the system can analyze and embed their input as well.
Semantic search – Using the embeddings, the system performs a semantic search. It retrieves the most relevant prompts from the library based on the user’s query, considering both the user’s input and historical data in the prompt library.

By implementing semantic search in their prompt libraries, companies empower their employees to access a vast reservoir of prompts effortlessly. This approach not only accelerates prompt creation but also encourages creativity and consistency in text-to-image generation.y

Prompt generation from semantic search

Although semantic search streamlines the process of finding relevant prompts, RAG takes it a step further by using these search results to generate optimized prompts. Here’s how it works:

Semantic search results – After retrieving the most relevant prompts from the library, the system presents these prompts to the user, alongside the user’s original input.
Text generation model – The user can select a prompt from the search results or provide further context on their preferences. The system feeds both the selected prompt and the user’s input into an LLM.
Optimized prompt – The LLM, with its understanding of language nuances, crafts an optimized prompt that combines elements from the selected prompt and the user’s input. This new prompt is tailored to the user’s requirements and is designed to yield the desired image output.

The combination of semantic search and prompt generation not only simplifies the process of finding prompts but also ensures that the prompts generated are highly relevant and effective. It empowers you to fine-tune and customize your prompts, ultimately leading to improved text-to-image generation results. The following are examples of images generated from Stable Diffusion XL using the prompts from semantic search and prompt generation.

Original Prompt

Prompts from Semantic Search

Optimized Prompt by LLM

a cartoon of a little dog

cute cartoon of a dog having a sandwich at the dinner table
a cartoon illustration of a punk dog, anime style, white background
a cartoon of a boy and his dog walking down a forest lane

A cartoon scene of a boy happily walking hand in hand down a forest lane with his cute pet dog, in animation style.

RAG-based prompt design applications across diverse industries

Before we explore the application of our suggested RAG architecture, let’s start with an industry in which an image generation model is most applicable. In AdTech, speed and creativity are critical. RAG-based prompt generation can add instant value by generating prompt suggestions to create many images quickly for an advertisement campaign. Human decision-makers can go through the auto-generated images to select the candidate image for the campaign. This feature can be a standalone application or embedded into popular software tools and platforms currently available.

Another industry where the Stable Diffusion model can enhance productivity is media and entertainment. The RAG architecture can assist in use cases of avatar creation, for example. Starting from a simple prompt, RAG can add much more color and characteristics to the avatar ideas. It can generate many candidate prompts and provide more creative ideas. From these generated images, you can find the perfect fit for the given application. It increases the productivity by automatically generating many prompt suggestions. The variation it can come up with is the immediate benefit of the solution.

Solution overview

Empowering customers to construct their own RAG-based AI assistant for prompt design on AWS is a testament to the versatility of modern technology. AWS provides a plethora of options and services to facilitate this endeavor. The following reference architecture diagram illustrates a RAG application for prompt design on AWS.

When it comes to selecting the right LLMs for your AI assistant, AWS offers a spectrum of choices to cater to your specific requirements.

Firstly, you can opt for LLMs available through SageMaker JumpStart, utilizing dedicated instances. These instances support a variety of models, including Falcon, Llama 2, Bloom Z, and Flan-T5, or you can explore proprietary models such as Cohere’s Command and Multilingual Embedding, or Jurassic-2 from AI21 Labs.

If you prefer a more simplified approach, AWS offers LLMs on Amazon Bedrock, featuring models like Amazon Titan and Anthropic Claude. These models are easily accessible through straightforward API calls, allowing you to harness their power effortlessly. The flexibility and diversity of options ensure that you have the freedom to choose the LLM that best aligns with your prompt design goals, whether you’re seeking an innovation with open containers or the robust capabilities of proprietary models.

When it comes to building the essential vector database, AWS provides a multitude of options through their native services. You can opt for Amazon OpenSearch Service, Amazon Aurora, or Amazon Relational Database Service (Amazon RDS) for PostgreSQL, each offering robust features to suit your specific needs. Alternatively, you can explore products from AWS partners like Pinecone, Weaviate, Elastic, Milvus, or Chroma, which provide specialized solutions for efficient vector storage and retrieval.

To help you get started to construct a RAG-based AI assistant for prompt design, we’ve put together a comprehensive demonstration in our GitHub repository. This demonstration uses the following resources:

Image generation: Stable Diffusion XL on Amazon Bedrock
Text embedding: Amazon Titan on Amazon Bedrock
Text generation: Claude 2 on Amazon Bedrock
Vector database: FAISS, an open source library for efficient similarity search
Prompt library: Prompt examples from DiffusionDB, the first large-scale prompt gallery dataset for text-to-image generative models

Additionally, we’ve incorporated LangChain for LLM implementation and Streamit for the web application component, providing a seamless and user-friendly experience.

Prerequisites

You need to have the following to run this demo application:

An AWS account
Basic understanding of how to navigate Amazon SageMaker Studio
Basic understanding of how to download a repo from GitHub
Basic knowledge of running a command on a terminal

Run the demo application

You can download all the necessary code with instructions from the GitHub repo. After the application is deployed, you will see a page like the following screenshot.

With this demonstration, we aim to make the implementation process accessible and comprehensible, providing you with a hands-on experience to kickstart your journey into the world of RAG and prompt design on AWS.

Clean up

After you try out the app, clean up your resources by stopping the application.

Conclusion

RAG has emerged as a game-changing paradigm in the world of prompt design, revitalizing Stable Diffusion’s text-to-image capabilities. By harmonizing RAG techniques with existing approaches and using the robust resources of AWS, we’ve uncovered a pathway to streamlined creativity and accelerated learning.

For additional resources, visit the following:

About the authors

James Yi is a Senior AI/ML Partner Solutions Architect in the Emerging Technologies team at Amazon Web Services. He is passionate about working with enterprise customers and partners to design, deploy and scale AI/ML applications to derive their business values. Outside of work, he enjoys playing soccer, traveling and spending time with his family.

Rumi Olsen is a Solutions Architect in the AWS Partner Program. She specializes in serverless and machine learning solutions in her current role, and has a background in natural language processing technologies. She spends most of her spare time with her daughter exploring the nature of Pacific Northwest.

Streamlining ETL data processing at Talent.com with Amazon SageMaker

December 14, 2023

by Dmitriy Bespalov Amazon AWS

This post is co-authored by Anatoly Khomenko, Machine Learning Engineer, and Abdenour Bezzouh, Chief Technology Officer at Talent.com.

Established in 2011, Talent.com aggregates paid job listings from their clients and public job listings, and has created a unified, easily searchable platform. Covering over 30 million job listings across more than 75 countries and spanning various languages, industries, and distribution channels, Talent.com caters to the diverse needs of job seekers, effectively connecting millions of job seekers with job opportunities.

Talent.com’s mission is to facilitate global workforce connections. To achieve this, Talent.com aggregates job listings from various sources on the web, offering job seekers access to an extensive pool of over 30 million job opportunities tailored to their skills and experiences. In line with this mission, Talent.com collaborated with AWS to develop a cutting-edge job recommendation engine driven by deep learning, aimed at assisting users in advancing their careers.

To ensure the effective operation of this job recommendation engine, it is crucial to implement a large-scale data processing pipeline responsible for extracting and refining features from Talent.com’s aggregated job listings. This pipeline is able to process 5 million daily records in less than 1 hour, and allows for processing multiple days of records in parallel. In addition, this solution allows for a quick deployment to production. The primary source of data for this pipeline is the JSON Lines format, stored in Amazon Simple Storage Service (Amazon S3) and partitioned by date. Each day, this results in the generation of tens of thousands of JSON Lines files, with incremental updates occurring daily.

The primary objective of this data processing pipeline is to facilitate the creation of features necessary for training and deploying the job recommendation engine on Talent.com. It’s worth noting that this pipeline must support incremental updates and cater to the intricate feature extraction requirements necessary for the training and deployment modules essential for the job recommendation system. Our pipeline belongs to the general ETL (extract, transform, and load) process family that combines data from multiple sources into a large, central repository.

For further insights into how Talent.com and AWS collaboratively built cutting-edge natural language processing and deep learning model training techniques, utilizing Amazon SageMaker to craft a job recommendation system, refer to From text to dream job: Building an NLP-based job recommender at Talent.com with Amazon SageMaker. The system includes feature engineering, deep learning model architecture design, hyperparameter optimization, and model evaluation, where all modules are run using Python.

This post shows how we used SageMaker to build a large-scale data processing pipeline for preparing features for the job recommendation engine at Talent.com. The resulting solution enables a Data Scientist to ideate feature extraction in a SageMaker notebook using Python libraries, such as Scikit-Learn or PyTorch, and then to quickly deploy the same code into the data processing pipeline performing feature extraction at scale. The solution does not require porting the feature extraction code to use PySpark, as required when using AWS Glue as the ETL solution. Our solution can be developed and deployed solely by a Data Scientist end-to-end using only a SageMaker, and does not require knowledge of other ETL solutions, such as AWS Batch. This can significantly shorten the time needed to deploy the Machine Learning (ML) pipeline to production. The pipeline is operated through Python and seamlessly integrates with feature extraction workflows, rendering it adaptable to a wide range of data analytics applications.

Solution overview

The pipeline is comprised of three primary phases:

Utilize an Amazon SageMaker Processing job to handle raw JSONL files associated with a specified day. Multiple days of data can be processed by separate Processing jobs simultaneously.
Employ AWS Glue for data crawling after processing multiple days of data.
Load processed features for a specified date range using SQL from an Amazon Athena table, then train and deploy the job recommender model.

Process raw JSONL files

We process raw JSONL files for a specified day using a SageMaker Processing job. The job implements feature extraction and data compaction, and saves processed features into Parquet files with 1 million records per file. We take advantage of CPU parallelization to perform feature extraction for each raw JSONL file in parallel. Processing results of each JSONL file is saved into a separate Parquet file inside a temporary directory. After all of the JSONL files have been processed, we perform compaction of thousands of small Parquet files into several files with 1 million records per file. The compacted Parquet files are then uploaded into Amazon S3 as the output of the processing job. The data compaction ensures efficient crawling and SQL queries in the next stages of the pipeline.

The following is the sample code to schedule a SageMaker Processing job for a specified day, for example 2020-01-01, using the SageMaker SDK. The job reads raw JSONL files from Amazon S3 (for example from s3://bucket/raw-data/2020/01/01) and saves the compacted Parquet files into Amazon S3 (for example to s3://bucket/processed/table-name/day_partition=2020-01-01/).

### install dependencies 
%pip install sagemaker pyarrow s3fs awswrangler

import sagemaker
import boto3

from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn
from sagemaker import get_execution_role
from sagemaker.processing import ProcessingInput, ProcessingOutput

region = boto3.session.Session().region_name
role = get_execution_role()
bucket = sagemaker.Session().default_bucket()

### we use instance with 16 CPUs and 128 GiB memory
### note that the script will NOT load the entire data into memory during compaction
### depending on the size of individual jsonl files, larger instance may be needed
instance = "ml.r5.4xlarge"
n_jobs = 8  ### we use 8 process workers
date = "2020-01-01" ### process data for one day

est_cls = SKLearn
framework_version_str = "0.20.0"

### schedule processing job
script_processor = FrameworkProcessor(
    role=role,
    instance_count=1,
    instance_type=instance,
    estimator_cls=est_cls,
    framework_version=framework_version_str,
    volume_size_in_gb=500,
)

script_processor.run(
    code="processing_script.py", ### name of the main processing script
    source_dir="../src/etl/", ### location of source code directory

    ### our processing script loads raw jsonl files directly from S3
    ### this avoids long start-up times of the processing jobs,
    ### since raw data does not need to be copied into instance
    inputs=[], ### processing job input is empty

    outputs=[
        ProcessingOutput(destination="s3://bucket/processed/table-name/",
                         source="/opt/ml/processing/output"),
    ],
    arguments=[
        ### directory with job's output
        "--output", "/opt/ml/processing/output",

        ### temporary directory inside instance
        "--tmp_output", "/opt/ml/tmp_output",

        "--n_jobs", str(n_jobs), ### number of process workers
        "--date", date, ### date to process

        ### location with raw jsonl files in S3
        "--path", "s3://bucket/raw-data/",
    ],
    wait=False
)

The following code outline for the main script (processing_script.py) that runs the SageMaker Processing job is as follows:

import concurrent
import pyarrow.dataset as ds
import os
import s3fs
from pathlib import Path

### function to process raw jsonl file and save extracted features into parquet file  
from process_data import process_jsonl

### parse command line arguments
args = parse_args()

### we use s3fs to crawl S3 input path for raw jsonl files
fs = s3fs.S3FileSystem()
### we assume raw jsonl files are stored in S3 directories partitioned by date
### for example: s3://bucket/raw-data/2020/01/01/
jsons = fs.find(os.path.join(args.path, *args.date.split('-')))

### temporary directory location inside the Processing job instance
tmp_out = os.path.join(args.tmp_output, f"day_partition={args.date}")

### directory location with job's output
out_dir = os.path.join(args.output, f"day_partition={args.date}")

### process individual jsonl files in parallel using n_jobs process workers
futures=[]
with concurrent.futures.ProcessPoolExecutor(max_workers=args.n_jobs) as executor:
    for file in jsons:
        inp_file = Path(file)
        out_file = os.path.join(tmp_out, inp_file.stem + ".snappy.parquet")
        ### process_jsonl function reads raw jsonl file from S3 location (inp_file)
        ### and saves result into parquet file (out_file) inside temporary directory
        futures.append(executor.submit(process_jsonl, file, out_file))

    ### wait until all jsonl files are processed
    for future in concurrent.futures.as_completed(futures):
        result = future.result()

### compact parquet files
dataset = ds.dataset(tmp_out)

if len(dataset.schema) > 0:
    ### save compacted parquet files with 1MM records per file
    ds.write_dataset(dataset, out_dir, format="parquet", 
                     max_rows_per_file=1024 * 1024)

Scalability is a key feature of our pipeline. First, multiple SageMaker Processing jobs can be used to process data for several days simultaneously. Second, we avoid loading the entire processed or raw data into memory at once, while processing each specified day of data. This enables the processing of data using instance types that can’t accommodate a full day’s worth of data in primary memory. The only requirement is that the instance type should be capable of loading N raw JSONL or processed Parquet files into memory simultaneously, with N being the number of process workers in use.

Crawl processed data using AWS Glue

After all the raw data for multiple days has been processed, we can create an Athena table from the entire dataset by using an AWS Glue crawler. We use the AWS SDK for pandas (awswrangler) library to create the table using the following snippet:

import awswrangler as wr

### crawl processed data in S3
res = wr.s3.store_parquet_metadata(
    path='s3://bucket/processed/table-name/',
    database="database_name",
    table="table_name",
    dataset=True,
    mode="overwrite",
    sampling=1.0,
    path_suffix='.parquet',
)

### print table schema
print(res[0])

Load processed features for training

Processed features for a specified date range can now be loaded from the Athena table using SQL, and these features can then be used for training the job recommender model. For example, the following snippet loads one month of processed features into a DataFrame using the awswrangler library:

import awswrangler as wr

query = """
    SELECT * 
    FROM table_name
    WHERE day_partition BETWEN '2020-01-01' AND '2020-02-01' 
"""

### load 1 month of data from database_name.table_name into a DataFrame
df = wr.athena.read_sql_query(query, database='database_name')

Additionally, the use of SQL for loading processed features for training can be extended to accommodate various other use cases. For instance, we can apply a similar pipeline to maintain two separate Athena tables: one for storing user impressions and another for storing user clicks on these impressions. Using SQL join statements, we can retrieve impressions that users either clicked on or didn’t click on and then pass these impressions to a model training job.

Solution benefits

Implementing the proposed solution brings several advantages to our existing workflow, including:

Simplified implementation – The solution enables feature extraction to be implemented in Python using popular ML libraries. And, it does not require the code to be ported into PySpark. This streamlines feature extraction as the same code developed by a Data Scientist in a notebook will be executed by this pipeline.
Quick path-to-production – The solution can be developed and deployed by a Data Scientist to perform feature extraction at scale, enabling them to develop an ML recommender model against this data. At the same time, the same solution can be deployed to production by an ML Engineer with little modifications needed.
Reusability – The solution provides a reusable pattern for feature extraction at scale, and can be easily adapted for other use cases beyond building recommender models.
Efficiency – The solution offers good performance: processing a single day of the Talent.com’s data took less than 1 hour.
Incremental updates – The solution also supports incremental updates. New daily data can be processed with a SageMaker Processing job, and the S3 location containing the processed data can be recrawled to update the Athena table. We can also use a cron job to update today’s data several times per day (for example, every 3 hours).

We used this ETL pipeline to help Talent.com process 50,000 files per day containing 5 million records, and created training data using features extracted from 90 days of raw data from Talent.com—a total of 450 million records across 900,000 files. Our pipeline helped Talent.com build and deploy the recommendation system into production within only 2 weeks. The solution performed all ML processes including ETL on Amazon SageMaker without utilizing other AWS service. The job recommendation system drove an 8.6% increase in clickthrough rate in online A/B testing against a previous XGBoost-based solution, helping connect millions of Talent.com’s users to better jobs.

Conclusion

This post outlines the ETL pipeline we developed for feature processing for training and deploying a job recommender model at Talent.com. Our pipeline uses SageMaker Processing jobs for efficient data processing and feature extraction at a large scale. Feature extraction code is implemented in Python enabling the use of popular ML libraries to perform feature extraction at scale, without the need to port the code to use PySpark.

We encourage the readers to explore the possibility of using the pipeline presented in this blog as a template for their use-cases where feature extraction at scale is required. The pipeline can be leveraged by a Data Scientist to build an ML model, and the same pipeline can then be adopted by an ML Engineer to run in production. This can significantly reduce the time needed to productize the ML solution end-to-end, as was the case with Talent.com. The readers can refer to the tutorial for setting up and running SageMaker Processing jobs. We also refer the readers to view the post From text to dream job: Building an NLP-based job recommender at Talent.com with Amazon SageMaker, where we discuss deep learning model training techniques utilizing Amazon SageMaker to build Talent.com’s job recommendation system.

About the authors

Dmitriy Bespalov is a Senior Applied Scientist at the Amazon Machine Learning Solutions Lab, where he helps AWS customers across different industries accelerate their AI and cloud adoption.

Yi Xiang is a Applied Scientist II at the Amazon Machine Learning Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.

Tong Wang is a Senior Applied Scientist at the Amazon Machine Learning Solutions Lab, where he helps AWS customers across different industries accelerate their AI and cloud adoption.

Anatoly Khomenko is a Senior Machine Learning Engineer at Talent.com with a passion for natural language processing matching good people to good jobs.

Abdenour Bezzouh is an executive with more than 25 years experience building and delivering technology solutions that scale to millions of customers. Abdenour held the position of Chief Technology Officer (CTO) at Talent.com when the AWS team designed and executed this particular solution for Talent.com.

Yanjun Qi is a Senior Applied Science Manager at the Amazon Machine Learning Solution Lab. She innovates and applies machine learning to help AWS customers speed up their AI and cloud adoption.

At NeurIPS, what’s old is new again

December 13, 2023

by Amazon AWS

Amazon Scholar and NeurIPS advisory board member Richard Zemel on what robustness and responsible AI have in common, what AI can still learn from neuroscience, and the emerging topics that interest him most.Read More

Create summaries of recordings using generative AI with Amazon Bedrock and Amazon Transcribe

December 13, 2023

by Rob Barnes Amazon AWS

Meeting notes are a crucial part of collaboration, yet they often fall through the cracks. Between leading discussions, listening closely, and typing notes, it’s easy for key information to slip away unrecorded. Even when notes are captured, they can be disorganized or illegible, rendering them useless.

In this post, we explore how to use Amazon Transcribe and Amazon Bedrock to automatically generate clean, concise summaries of video or audio recordings. Whether it’s an internal team meeting, conference session, or earnings call, this approach can help you distill hours of content down to salient points.

We walk through a solution to transcribe a project team meeting and summarize the key takeaways with Amazon Bedrock. We also discuss how you can customize this solution for other common scenarios like course lectures, interviews, and sales calls. Read on to simplify and automate your note-taking process.

Solution overview

By combining Amazon Transcribe and Amazon Bedrock, you can save time, capture insights, and enhance collaboration. Amazon Transcribe is an automatic speech recognition (ASR) service that makes it straightforward to add speech-to-text capability to applications. It uses advanced deep learning technologies to accurately transcribe audio into text. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon with a single API, along with a broad set of capabilities you need to build generative AI applications. With Amazon Bedrock, you can easily experiment with a variety of top FMs, and privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG).

The solution presented in this post is orchestrated using an AWS Step Functions state machine that is triggered when you upload a recording to the designated Amazon Simple Storage Service (Amazon S3) bucket. Step Functions lets you create serverless workflows to orchestrate and connect components across AWS services. It handles the underlying complexity so you can focus on application logic. It’s useful for coordinating tasks, distributed processing, ETL (extract, transform, and load), and business process automation.

The following diagram illustrates the high-level solution architecture.

The solution workflow includes the following steps:

A user stores a recording in the S3 asset bucket.
This action triggers the Step Functions transcription and summarization state machine.
As part of the state machine, an AWS Lambda function is triggered, which transcribes the recording using Amazon Transcribe and stores the transcription in the asset bucket.
A second Lambda function retrieves the transcription and generates a summary using the Anthropic Claude model in Amazon Bedrock.
Lastly, a final Lambda function uses Amazon Simple Notification Service (Amazon SNS) to send a summary of the recording to the recipient.

This solution is supported in Regions where Anthropic Claude on Amazon Bedrock is available.

The state machine orchestrates the steps to perform the specific tasks. The following diagram illustrates the detailed process.

Prerequisites

Amazon Bedrock users need to request access to models before they are available for use. This is a one-time action. For this solution, you’ll need to enable access to the Anthropic Claude (not Anthropic Claude Instant) model in Amazon Bedrock. For more information, refer to Model access.

Deploy solution resources

The solution is deployed using an AWS CloudFormation template, found on the GitHub repo, to automatically provision the necessary resources in your AWS account. The template requires the following parameters:

Email address used to send summary – The summary will be sent to this address. You must acknowledge the initial Amazon SNS confirmation email before receiving additional notifications.
Summary instructions – These are the instructions given to the Amazon Bedrock model to generate the summary.

Run the solution

After you deploy the solution using AWS CloudFormation, complete the following steps:

Acknowledge the Amazon SNS email confirmation that you should receive a few moments after creating the CloudFormation stack.
On the AWS CloudFormation console, navigate to stack you just created.
On the stack’s Outputs tab, and look for the value associated with AssetBucketName; it will look something like summary-generator-assetbucket-xxxxxxxxxxxxx.
On the Amazon S3 console, navigate to your asset bucket.

This is where you’ll upload your recordings. Valid file formats are MP3, MP4, WAV, FLAC, AMR, OGG, and WebM.

Upload your recording to the recordings folder.

Uploading recordings will automatically trigger the Step Functions state machine. For this example, we use a sample team meeting recording in the sample-recording directory of the GitHub repository.

On the Step Functions console, navigate to the summary-generator state machine.
Choose the name of the state machine run with the status Running.

Here, you can watch the progress of the state machine as it processes the recording.

After it reaches its Success state, you should receive an emailed summary of the recording.

Alternatively, you can navigate to the S3 assets bucket and view the transcript there in the transcripts folder.

Review the summary

You will get the recording summary emailed to the address you provided when you created the CloudFormation stack. If you don’t receive the email in a few moments, make sure that you acknowledged the Amazon SNS confirmation email that you should have received after you created the stack and then upload the recording again, which will trigger the summary process.

This solution includes a mock team meeting recording that you can use to test the solution. The summary will look similar to the following example. Because of the nature of generative AI, however, your output will look a bit different, but the content should be close.

Here are the key points from the standup:

Joe finished reviewing the current state for task EDU1 and created a new task to develop the future state. That new task is in the backlog to be prioritized. He’s now starting EDU2 but is blocked on resource selection.

Rob created a tagging strategy for SLG1 based on best practices, but may need to coordinate with other teams who have created their own strategies, to align on a uniform approach. A new task was created to coordinate tagging strategies.

Rob has made progress debugging for SLG2 but may need additional help. This task will be moved to Sprint 2 to allow time to get extra resources.

Next Steps:

Joe to continue working on EDU2 as able until resource selection is decided

New task to be prioritized to coordinate tagging strategies across teams

SLG2 moved to Sprint 2

Standups moving to Mondays starting next week

Expand the solution

Now that you have a working solution, here are some potential ideas to customize the solution for your specific use cases:

Try altering the process to fit your available source content and desired outputs:
- For situations where transcripts are available, create an alternate Step Functions workflow to ingest existing text-based or PDF-based transcriptions.
- Instead of using Amazon SNS to notify recipients via email, you can use it to send the output to a different endpoint, such as a team collaboration site, or to the team’s chat channel.
Try changing the summary instructions CloudFormation stack parameter provided to Amazon Bedrock to produce outputs specific to your use case (this is the generative AI prompt):
- When summarizing a company’s earnings call, you could have the model focus on potential promising opportunities, areas of concern, and things that you should continue to monitor.
- If you are using this to summarize a course lecture, the model could identify upcoming assignments, summarize key concepts, list facts, and filter out any small talk from the recording.
For the same recording, create different summaries for different audiences:
- Engineers’ summaries focus on design decisions, technical challenges, and upcoming deliverables.
- Project managers’ summaries focus on timelines, costs, deliverables, and action items.
- Project sponsors get a brief update on project status and escalations.
- For longer recordings, try generating summaries for different levels of interest and time commitment. For example, create a single sentence, single paragraph, single page, or in-depth summary. In addition to the prompt, you may want to adjust the max_tokens_to_sample parameter to accommodate different content lengths.

Clean up

To clean up the solution, delete the CloudFormation stack that you created earlier. Note that deleting the stack will not delete the asset bucket. If you no longer need the recordings or transcripts, you can delete this bucket separately. Amazon Transcribe will automatically delete transcription jobs after 90 days, but you can delete these manually before then.

Conclusion

In this post, we explored how to use Amazon Transcribe and Amazon Bedrock to automatically generate clean, concise summaries of video or audio recordings. We encourage you to continue evaluating Amazon Bedrock, Amazon Transcribe, and other AWS AI services, like Amazon Textract, Amazon Translate, and Amazon Rekognition, to see how they can help meet your business objectives.

About the Authors

Rob Barnes is a principal consultant for AWS Professional Services. He works with our customers to address security and compliance requirements at scale in complex, multi-account AWS environments through automation.

Jason Stehle is a Senior Solutions Architect at AWS, based in the New England area. He works with customers to align AWS capabilities with their greatest business challenges. Outside of work, he spends his time building things and watching comic book movies with his family.

Fine-tune Llama 2 using QLoRA and Deploy it on Amazon SageMaker with AWS Inferentia2

December 13, 2023

by Wei Teh Amazon AWS

In this post, we showcase fine-tuning a Llama 2 model using a Parameter-Efficient Fine-Tuning (PEFT) method and deploy the fine-tuned model on AWS Inferentia2. We use the AWS Neuron software development kit (SDK) to access the AWS Inferentia2 device and benefit from its high performance. We then use a large model inference container powered by Deep Java Library (DJLServing) as our model serving solution.

Solution overview

Efficient Fine-tuning Llama2 using QLoRa

The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Llama 2 was pre-trained on 2 trillion tokens of data from publicly available sources. AWS customers sometimes choose to fine-tune Llama 2 models using customers’ own data to achieve better performance for downstream tasks. However, due to Llama 2 model’s large number of parameters, full fine-tuning could be prohibitively expensive and time consuming. Parameter-Efficient Fine-Tuning (PEFT) approach can address this problem by only fine-tune a small number of extra model parameters while freezing most parameters of the pre-trained model. For more information on PEFT, one can read this post. In this post, we use QLoRa to fine-tune a Llama 2 7B model.

Deploy a fine-tuned Model on Inf2 using Amazon SageMaker

AWS Inferentia2 is purpose-built machine learning (ML) accelerator designed for inference workloads and delivers high-performance at up to 40% lower cost for generative AI and LLM workloads over other inference optimized instances on AWS. In this post, we use Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instance, featuring AWS Inferentia2, the second generation Inferentia2 accelerators, each containing two NeuronCores-v2. Each NeuronCore-v2 is an independent, heterogenous compute-unit, with four main engines: Tensor, Vector, Scalar, and GPSIMD engines. It includes an on-chip software-managed SRAM memory for maximizing data locality. Since several blogs on Inf2 has been published, the reader can refer to this post and our documentation for more information on Inf2.

To deploy models on Inf2, we need AWS Neuron SDK as the software layer running on top of the Inf2 hardware. AWS Neuron is the SDK used to run deep learning workloads on AWS Inferentia and AWS Trainium based instances. It enables end-to-end ML development lifecycle to build new models, train and optimize these models, and deploy them for production. AWS Neuron includes a deep learning compiler, runtime, and tools that are natively integrated with popular frameworks like TensorFlow and PyTorch. In this blog, we are going to use transformers-neuronx, which is part of the AWS Neuron SDK for transformer decoder inference workflows. It supports a range of popular models, including Llama 2.

To deploy models on Amazon SageMaker, we usually use a container that contains the required libraries, such as Neuron SDK and transformers-neuronx as well as the model serving component. Amazon SageMaker maintains deep learning containers (DLCs) with popular open source libraries for hosting large models. In this post, we use the Large Model Inference Container for Neuron. This container has everything you need to deploy your Llama 2 model on Inf2. For resources to get started with LMI on Amazon SageMaker, please refer to many of our existing posts (blog 1, blog 2, blog 3) on this topic. In short, you can run the container without writing any additional code. You can use the default handler for a seamless user experience and pass in one of the supported model names and any load time configurable parameters. This compiles and serve an LLM on an Inf2 instance. For example, to deploy OpenAssistant/llama2-13b-orca-8k-3319, you can provide the follow configuration (as serving.properties file). In serving.properties, we specify the model type as llama2-13b-orca-8k-3319, the batch size as 4, the tensor parallel degree as 2, and that is it. For the full list of configurable parameters, refer to All DJL configuration options.

# Engine to use: MXNet, PyTorch, TensorFlow, ONNX, PaddlePaddle, DeepSpeed, etc.
engine = Python 
# default handler for model serving
option.entryPoint = djl_python.transformers_neuronx
# The Hugging Face ID of a model or the s3 url of the model artifacts. 
option.model_id = meta-llama/Llama-2-7b-chat-hf
#the dynamic batch size, default is 1.
option.batch_size=4
# This option specifies number of tensor parallel partitions performed on the model.
option.tensor_parallel_degree=2
# The input sequence length
option.n_positions=512
#Enable iteration level batching using one of "auto", "scheduler", "lmi-dist"
option.rolling_batch=auto
# The data type to which you plan to cast the model default
option.dtype=fp16
# worker load model timeout
option.model_loading_timeout=1500

Alternatively, you can write your own model handler file as shown in this example, but that requires implementing the model loading and inference methods to serve as a bridge between the DJLServing APIs.

Prerequisites

The following list outlines the prerequisites for deploying the model described in this blog post. You can implement either from the AWS Management Console or using the latest version of the AWS Command Line Interface (AWS CLI).

Walkthrough

In the following section, we’ll walkthrough the code in two parts:

Fine-tuning a Llama2-7b model, and upload the model artifacts to a specified Amazon S3 bucket location.
Deploy the model into an Inferentia2 using DJL serving container hosted in Amazon SageMaker.

The complete code samples with instructions can be found in this GitHub repository.

Part 1: Fine-tune a Llama2-7b model using PEFT

We are going to use the recently introduced method in the paper QLoRA: Quantization-aware Low-Rank Adapter Tuning for Language Generation by Tim Dettmers et al. QLoRA is a new technique to reduce the memory footprint of large language models during fine-tuning, without sacrificing performance.

Note: The fine-tuning of llama2-7b model shown in the following was tested on an Amazon SageMaker Studio Notebook with Python 2.0 GPU Optimized Kernel using a ml.g5.2xlarge instance type. As a best practice, we recommend using an Amazon SageMaker Studio Integrated Development Environment (IDE) launched in your own Amazon Virtual Private Cloud (Amazon VPC). This allows you to control, monitor, and inspect network traffic within and outside your VPC using standard AWS networking and security capabilities. For more information, see Securing Amazon SageMaker Studio connectivity using a private VPC.

Quantize the base model

We first load a quantized model with 4-bit quantization using Huggingface transformers library as follows:

# The base pretrained model for fine-tuning
model_name = "NousResearch/Llama-2-7b-chat-hf"

# The instruction dataset to use
dataset_name = "mlabonne/guanaco-llama2-1k"

#Activate 4-bit precision base model loading
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
load_in_4bit=use_4bit,
bnb_4bit_quant_type=bnb_4bit_quant_type,
bnb_4bit_compute_dtype=compute_dtype,
bnb_4bit_use_double_quant=use_nested_quant,
)

# Load base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=bnb_config,
device_map=device_map
)
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

Load training dataset

Next, we load the dataset to feed the model for fine-tuning step shown as followed:

# Load dataset (you can process it here)
dataset = load_dataset(dataset_name, split="train")

Attach an adapter layer

Here we attach a small, trainable adapter layer, configured as LoraConfig defined in the Hugging Face’s peft library.

# include linear layers to apply LoRA to.
modules = find_all_linear_names(model)

## Setting up LoRA configuration
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

peft_config = LoraConfig(
lora_alpha=lora_alpha,
lora_dropout=lora_dropout,
r=lora_r,
bias="none",
task_type="CAUSAL_LM",
target_modules=modules)

Train a model

Using the LoRA configuration shown above, we’ll fine-tune the Llama2 model along with hyper-parameters. A code snippet for training the model is shown in the following:

# Set training parameters
training_arguments = TrainingArguments(...)

trainer = SFTTrainer(
model=model,
train_dataset=dataset,
peft_config=peft_config, # LoRA config
dataset_text_field="text",
max_seq_length=max_seq_length,
tokenizer=tokenizer,
args=training_arguments,
packing=packing,
)

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)

Merge model weight

The fine-tuned model executed above created a new model containing the trained LoRA adapter weights. In the following code snippet, we’ll merge the adapter with the base model so that we could use the fine-tuned model for inference.

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
model_name,
low_cpu_mem_usage=True,
return_dict=True,
torch_dtype=torch.float16,
device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

save_dir = "merged_model"
model.save_pretrained(save_dir, safe_serialization=True, max_shard_size="2GB")

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
tokenizer.save_pretrained(save_dir)

Upload model weight to Amazon S3

In the final step of part 1, we’ll save the merged model weights to a specified Amazon S3 location. The model weight will be used by a model serving container in Amazon SageMaker to host the model using an Inferentia2 instance.

model_data_s3_location = "s3://<bucket_name>/<prefix>/"
!cd {save_dir} && aws s3 cp —recursive . {model_data_s3_location}

Part 2: Host QLoRA model for inference with AWS Inf2 using SageMaker LMI Container

In this section, we’ll walk through the steps of deploying a QLoRA fine-tuned model into an Amazon SageMaker hosting environment. We’ll use a DJL serving container from SageMaker DLC, which integrates with the transformers-neuronx library to host this model. The setup facilitates the loading of models onto AWS Inferentia2 accelerators, parallelizes the model across multiple NeuronCores, and enables serving via HTTP endpoints.

Prepare model artifacts

DJL supports many deep learning optimization libraries, including DeepSpeed, FasterTransformer and more. For model specific configurations, we provide a serving.properties with key parameters, such as tensor_parallel_degree and model_id to define the model loading options. The model_id could be a Hugging Face model ID, or an Amazon S3 path where the model weights are stored. In our example, we provide the Amazon S3 location of our fine-tuned model. The following code snippet shows the properties used for the model serving:

%%writefile serving.properties
engine=Python
option.entryPoint=djl_python.transformers_neuronx
option.model_id=<model data s3 location>
option.batch_size=4
option.neuron_optimize_level=2
option.tensor_parallel_degree=8
option.n_positions=512
option.rolling_batch=auto
option.dtype=fp16
option.model_loading_timeout=1500

Please refer to this documentation for more information about the configurable options available via serving.properties. Please note that we use option.n_position=512 in this blog for faster AWS Neuron compilation. If you want to try larger input token length, then we recommend the reader to pre-compile the model ahead of time (see AOT Pre-Compile Model on EC2). Otherwise, you might run into timeout error if the compilation time is too much.

After the serving.properties file is defined, we’ll package the file into a tar.gz format, as follows:

%%sh
mkdir mymodel
mv serving.properties mymodel/
tar czvf mymodel.tar.gz mymodel/
rm -rf mymodel

Then, we’ll upload the tar.gz to an Amazon S3 bucket location:

s3_code_prefix = "large-model-lmi/code"
bucket = sess.default_bucket()  # bucket to house artifacts
code_artifact = sess.upload_data("mymodel.tar.gz", bucket, s3_code_prefix)
print(f"S3 Code or Model tar ball uploaded to --- > {code_artifact}")

Create an Amazon SageMaker model endpoint

To use an Inf2 instance for serving, we use an Amazon SageMaker LMI container with DJL neuronX support. Please refer to this post for more information about using a DJL NeuronX container for inference. The following code shows how to deploy a model using Amazon SageMaker Python SDK:

# Retrieves the DJL-neuronx docker image URI
image_uri = image_uris.retrieve(
framework="djl-neuronx",
region=sess.boto_session.region_name,
version="0.24.0"
)

# Define inf2 instance type to use for serving
instance_type = "ml.inf2.48xlarge"

endpoint_name = sagemaker.utils.name_from_base("lmi-model")

# Deploy the model for inference
model.deploy(initial_instance_count=1,
instance_type=instance_type,
container_startup_health_check_timeout=1500,
volume_size=256,
endpoint_name=endpoint_name)

# our requests and responses will be in json format so we specify the serializer and the deserializer
predictor = sagemaker.Predictor(
endpoint_name=endpoint_name,
sagemaker_session=sess,
serializer=serializers.JSONSerializer(),
)

Test model endpoint

After the model is deployed successfully, we can validate the endpoint by sending a sample request to the predictor:

prompt="What is machine learning?"
input_data = f"<s>[INST] <<SYS>>nAs a data scientistn<</SYS>>n{prompt} [/INST]"

response = predictor.predict(
{"inputs": input_data, "parameters": {"max_new_tokens":300, "do_sample":"True"}}
)

print(json.loads(response)['generated_text'])

The sample output is shown as follows:

In the context of data analysis, Machine Learning (ML) refers to a statistical technique capable of extracting predictive power from a dataset with an increasing complexity and accuracy by iteratively narrowing down the scope of a statistic.

Machine Learning is not a new statistical technique, but rather a combination of existing techniques. Furthermore, it has not been designed to be used with a specific dataset or to produce a specific outcome. Rather, it was designed to be flexible enough to adapt to any dataset and to make predictions about any outcome.

Clean up

If you decide that you no longer want to keep the SageMaker endpoint running, you can delete it using AWS SDK for Python (boto3), AWS CLI or Amazon SageMaker Console. Additionally, you can also shutdown the Amazon SageMaker Studio Resources that are no longer required.

Conclusion

In this post, we showed you how to fine-tune a Llama2-7b model using LoRA adaptor with 4-bit quantization using a single GPU instance. Then we deployed the model to an Inf2 instance hosted in Amazon SageMaker using a DJL serving container. Finally, we validated the Amazon SageMaker model endpoint with a text generation prediction using the SageMaker Python SDK. Go ahead and give it a try, we love to hear your feedback. Stay tuned for updates on more capabilities and new innovations with AWS Inferentia.

For more examples about AWS Neuron, see aws-neuron-samples.

About the Authors

Wei Teh is a Senior AI/ML Specialist Solutions Architect at AWS. He is passionate about helping customers advance their AWS journey, focusing on Amazon Machine Learning services and machine learning-based solutions. Outside of work, he enjoys outdoor activities like camping, fishing, and hiking with his family.

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.