Build machine learning-ready datasets from the Amazon SageMaker offline Feature Store using the Amazon SageMaker Python SDK

Amazon SageMaker Feature Store is a purpose-built service to store and retrieve feature data for use by machine learning (ML) models. Feature Store provides an online store capable of low-latency, high-throughput reads and writes, and an offline store that provides bulk access to all historical record data. Feature Store handles the synchronization of data between the online and offline stores.

Because model development is an iterative process, customers will frequently query the offline store and build various datasets for model training. Currently, there are several ways to access features in the offline store, including running SQL queries with Amazon Athena or using Spark SQL in Apache Spark. However, these patterns require writing ad hoc (and sometimes complex) SQL statements, which isn’t always suitable for the data scientist persona.

Feature Store recently extended the SageMaker Python SDK to make it easier to create datasets from the offline store. With this release, you can use a new set of methods in the SDK to create datasets without writing SQL queries. These new methods support common operations such as time travel, filtering duplicate records, and joining multiple feature groups while ensuring point-in-time accuracy.

In this post, we demonstrate how to use the SageMaker Python SDK to build ML-ready datasets without writing any SQL statements.

Solution overview

To demonstrate the new functionality, we work with two datasets: leads and web marketing metrics. These datasets can be used to build a model that predicts if a lead will convert into a sale given marketing activities and metrics captured for that lead.

The leads data contains information on prospective customers who are identified using Lead_ProspectID. The features for a lead (for example, LeadSource) can be updated over time, which results in a new record for that lead. Lead_EventTime represents the time at which each record was created. The following screenshot shows an example of this data.

The web marketing metrics data tracks the engagement metrics for a lead, where each lead is identified using Web_ProspectID. Web_EventTime represents the time at which the record was created. Unlike the leads feature group, there is only one record per lead in this feature group. The following screenshot shows an example of this data.

We walk through the key parts of the sagemaker-feature-store-offline-sdk.ipynb notebook, which demonstrates the following steps:

  1. Create a dataset from a feature group.
  2. Join multiple feature groups.
  3. Create a point-in-time join between a feature group and a dataset based on a set of events at specific timestamps.
  4. Retrieve feature history within a specific time range.
  5. Retrieve features as of a specific timestamp.

Prerequisites

To follow along, clone the GitHub repository that contains the notebook for this post:

git clone https://github.com/aws-samples/amazon-sagemaker-feature-store-offline-queries.git

We assume a feature group for the leads data has been created using the existing FeatureGroup.create method, and can be referenced using the variable base_fg. For more information on feature groups, refer to Create Feature Groups.

Create a dataset from a feature group

To create a dataset using the SageMaker SDK, we use the new FeatureStore class, which contains the create_dataset method. This method accepts a base feature group that may be joined with other feature groups or DataFrames. We start by providing the leads feature group as the base and an Amazon Simple Storage Service (Amazon S3) path to store the dataset:

from sagemaker.feature_store.feature_store import FeatureStore

feature_store = FeatureStore(sagemaker_session=feature_store_session)
ds1_builder = feature_store.create_dataset(
    base=base_fg,
    output_path=f"s3://{s3_bucket_name}/dataset_query_results",
)

The create_dataset method returns a DatasetBuilder object, which can be used to generate a dataset from one or multiple feature groups (which we demonstrate in the next section). To create a simple dataset consisting of only the leads features, we invoke the to_csv_file method. This runs a query in Athena to retrieve the features from the offline store, and saves the results to the specified S3 path.

csv, query = ds1_builder.to_csv_file()
# Show S3 location of CSV file
print(f'CSV file: {csv}')

Join multiple feature groups

With the SageMaker SDK, you can easily join multiple feature groups to build a dataset. You can also perform join operations between an existing Pandas DataFrame and one or more feature groups. The base feature group is an important concept for joins: it is the feature group that has other feature groups or the Pandas DataFrame joined to it.

While creating the dataset using the create_dataset function, we use the with_feature_group method, which performs an inner join between the base feature group and another feature group using the record identifier and the target feature name in the base feature group. In our example, the base feature group is the leads feature group, and the target feature group is the web marketing feature group. The with_feature_group method accepts the following arguments:

  • feature_group – This is the feature group we are joining with. In our code sample, the target feature group is created by using the web marketing dataset.
  • target_feature_name_in_base – The name of the feature in the base feature group that we’re using as a key in the join. We use Lead_ProspectID as the record identifier for the base feature group.
  • included_feature_names – This is the list of feature names from the feature group being joined (the target feature group). We use this field to specify the features that we want to include in the dataset.

The following code shows an example of creating a dataset by joining the base feature group with the target feature group:

join_builder = feature_store.create_dataset(
    base=base_fg,
    output_path=f"s3://{s3_bucket_name}/dataset_query_results",
).with_feature_group(
    feature_group=target_fg,
    target_feature_name_in_base="Lead_ProspectID",
    included_feature_names=[
        "Web_ProspectID", "LastCampaignActivity", "PageViewsPerVisit",
        "TotalTimeOnWebsite", "TotalWebVisits", "AttendedMarketingEvent",
        "OrganicSearch", "ViewedAdvertisement",
    ],
)
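To make the join semantics concrete, the following is a minimal sketch in plain Python. It doesn't call the SageMaker SDK or Athena. The sample rows and the inner_join helper are hypothetical, and the sketch assumes that the web marketing data is keyed by Web_ProspectID. It only mirrors the roles that target_feature_name_in_base and included_feature_names play in the description above.

```python
# Hypothetical rows standing in for the two feature groups (not real data).
leads = [
    {"Lead_ProspectID": "a1", "LeadSource": "Google"},
    {"Lead_ProspectID": "b2", "LeadSource": "Referral"},
    {"Lead_ProspectID": "c3", "LeadSource": "Direct"},
]
web_metrics = [
    {"Web_ProspectID": "a1", "TotalWebVisits": 7, "PageViewsPerVisit": 2.5},
    {"Web_ProspectID": "b2", "TotalWebVisits": 3, "PageViewsPerVisit": 1.0},
]


def inner_join(base_rows, target_rows, base_key, target_key, included):
    """Keep base rows that have a matching target row; copy only `included` target fields."""
    target_by_id = {}
    for row in target_rows:
        target_by_id.setdefault(row[target_key], []).append(row)

    joined = []
    for base_row in base_rows:
        # Base rows without a match (here "c3") are dropped: inner-join behavior.
        for target_row in target_by_id.get(base_row[base_key], []):
            merged = dict(base_row)
            merged.update({name: target_row[name] for name in included})
            joined.append(merged)
    return joined


joined_rows = inner_join(
    leads,
    web_metrics,
    base_key="Lead_ProspectID",
    target_key="Web_ProspectID",
    included=["Web_ProspectID", "TotalWebVisits"],
)
for row in joined_rows:
    print(row)
```

In this sketch, lead c3 has no web marketing record, so it doesn't appear in the result, and PageViewsPerVisit is omitted because it isn't in the included list.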

You can extend the join operations to include multiple feature groups by adding the with_feature_group method at the end of the preceding code example and defining the required arguments for the new feature group. You can also perform join operations with an existing DataFrame by defining the base to be your existing Pandas DataFrame and joining it with the feature groups of interest. The following code sample shows how to create a dataset with an existing Pandas DataFrame and an existing feature group:

ds2_builder = feature_store.create_dataset(
    base=new_records_df2,  # Pandas DataFrame
    event_time_identifier_feature_name="Lead_EventTime",
    record_identifier_feature_name="Lead_ProspectID",
    output_path=f"s3://{s3_bucket_name}/dataset_query_results",
).with_feature_group(base_fg, "Lead_ProspectID", ["LeadSource"])

For more examples on these various configurations, refer to Create a Dataset from your Feature Groups.

Create a point-in-time join

One of the most powerful capabilities of this enhancement is the ability to perform point-in-time joins simply, without writing complex SQL code. When building ML models, data scientists need to avoid data leakage (or target leakage), which is accidentally using data during model training that wouldn’t be available at the time of prediction. For instance, if we’re trying to predict credit card fraud, we should exclude transactions that arrive after the fraudulent charge we’re trying to predict; otherwise, the model could learn from this post-fraud information and generalize poorly at prediction time.

Retrieval of point-in-time accurate feature data requires you to supply an entity DataFrame that provides a set of record IDs (or primary keys) and corresponding event times that serve as the cutoff time for each event. This retrieval mechanism is sometimes referred to as row-level time travel, because it allows a different time constraint to be applied for each row key. To perform point-in-time joins with the SageMaker SDK, we use the DatasetBuilder class and provide the entity DataFrame as the base argument of the create_dataset method.

In the following code, we create a simple entity DataFrame with two records. We set the event times, used to indicate the cutoff time, near the middle of the time series data (mid-January 2023):

import pandas as pd

# Create the events (entity table) DataFrame to pass timestamps for the point-in-time join
events = [['2023-01-20T00:00:00Z', record_id1],
          ['2023-01-15T00:00:00Z', record_id2]]
df_events = pd.DataFrame(events, columns=['Event_Time', 'Lead_ProspectID'])

When we use the point_in_time_accurate_join functionality with the create_dataset call, the internal query excludes all records with timestamps later than the cutoff times supplied, returning the latest feature values that would have been available at the time of the event:

# Create a DatasetBuilder using the point_in_time_accurate_join method
pit_builder = (
    feature_store.create_dataset(
        base=df_events,
        event_time_identifier_feature_name='Event_Time',
        record_identifier_feature_name='Lead_ProspectID',
        output_path=f"s3://{s3_bucket_name}/{s3_prefix}/dataset_query_results",
    )
    .with_feature_group(base_fg, "Lead_ProspectID")
    .point_in_time_accurate_join()
    .with_number_of_recent_records_by_record_identifier(1)
)

Notice that there are only two records in the DataFrame returned by the point-in-time join. This is because we only submitted two record IDs in the entity DataFrame, one for each Lead_ProspectID we want to retrieve. The point-in-time criterion specifies that a record’s event time (stored in the Lead_EventTime field) must contain a value that is less than the cutoff time.

Additionally, we instruct the query to retrieve only the latest record that meets this criterion because we have applied the with_number_of_recent_records_by_record_identifier method. When used in conjunction with the point_in_time_accurate_join method, this allows the caller to specify how many records to return from those that meet the point-in-time join criterion.
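The selection logic described above can be sketched in plain Python. This is an illustration only, not the query that the SDK generates. The feature history, the parse and point_in_time_rows helpers, and the record IDs are hypothetical. The sketch also assumes that a record whose event time equals the cutoff is kept; the sample data avoids such ties.

```python
from datetime import datetime


def parse(ts):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into a timezone-aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


# Hypothetical feature history: lead "a1" has three records, lead "b2" has two.
history = [
    {"Lead_ProspectID": "a1", "Lead_EventTime": "2023-01-05T00:00:00Z", "LeadSource": "Direct"},
    {"Lead_ProspectID": "a1", "Lead_EventTime": "2023-01-18T00:00:00Z", "LeadSource": "Google"},
    {"Lead_ProspectID": "a1", "Lead_EventTime": "2023-01-25T00:00:00Z", "LeadSource": "Referral"},
    {"Lead_ProspectID": "b2", "Lead_EventTime": "2023-01-10T00:00:00Z", "LeadSource": "Google"},
    {"Lead_ProspectID": "b2", "Lead_EventTime": "2023-01-16T00:00:00Z", "LeadSource": "Direct"},
]

# Entity rows: (cutoff time, record ID), one cutoff per row.
events = [("2023-01-20T00:00:00Z", "a1"), ("2023-01-15T00:00:00Z", "b2")]


def point_in_time_rows(events, history, recent=1):
    """For each entity row, return the `recent` latest records at or before its cutoff."""
    results = []
    for cutoff, record_id in events:
        eligible = [
            row for row in history
            if row["Lead_ProspectID"] == record_id
            and parse(row["Lead_EventTime"]) <= parse(cutoff)  # assumption: ties are kept
        ]
        eligible.sort(key=lambda row: parse(row["Lead_EventTime"]), reverse=True)
        for row in eligible[:recent]:
            results.append({"Event_Time": cutoff, **row})
    return results


pit_rows = point_in_time_rows(events, history)
for row in pit_rows:
    print(row)
```

Each lead gets its own cutoff, so the record written on January 25 for lead a1 and the record written on January 16 for lead b2 are both excluded, and only the latest remaining record per lead is returned.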

Compare point-in-time join results with Athena query results

To verify the output returned by the SageMaker SDK point_in_time_accurate_join function, we compare it to the result of an Athena query. First, we create a standard Athena query using a SELECT statement tied to the specific table created by the Feature Store runtime. This table name can be found by referencing the table_name field after instantiating the athena_query from the FeatureGroup API:

SELECT * FROM "sagemaker_featurestore"."off_sdk_fg_lead_1682348629" 
WHERE "off_sdk_fg_lead_1682348629"."Lead_ProspectID" = '5e84c78f-6438-4d91-aa96-b492f7e91029'

The Athena query doesn’t contain any point-in-time join semantics, so it returns all records that match the specified record_id (Lead_ProspectID).

Next, we use the Pandas library to sort the Athena results by event times for easy comparison. The records with timestamps later than the event times specified in the entity DataFrame (for example, 2023-01-15T00:00:00Z) submitted to the point_in_time_accurate_join don’t show up in the point-in-time results. Because we additionally specified that we only want a single record from the preceding create_dataset code, we only get the latest record prior to the cutoff time. By comparing the SageMaker SDK results with the Athena query results, we see that the point-in-time join function returned the proper records.

Therefore, we have confidence that we can use the SageMaker SDK to perform row-level time travel and avoid target leakage. Furthermore, this capability works across multiple feature groups that may be refreshed on completely different timelines.

Retrieve feature history within a specific time range

You can also specify a time window when joining feature groups to form a dataset. The time window is defined using with_event_time_range, which accepts two inputs, starting_timestamp and ending_timestamp, and returns a dataset builder object. In our code sample, we set the retrieval time window to 1 full day, from 2022-07-01 00:00:00 until 2022-07-02 00:00:00.

The following code shows how to create a dataset with the specified event time window while joining the base feature group with the target feature group:

from datetime import datetime

# Set up the event time window in seconds of Unix epoch time
# Start at 07/01/2022 and set the time window to one day
start_ts = 1656633600
time_window = 86400
# Use hard-coded timestamps from the dataset, then add the time window
datetime_start = datetime.fromtimestamp(start_ts)
datetime_end = datetime.fromtimestamp(start_ts + time_window)
print(f'Setting retrieval time window: {datetime_start} until {datetime_end}')
time_window_builder = (
    feature_store.create_dataset(
        base=base_fg,
        output_path=f"s3://{s3_bucket_name}/dataset_query_results",
    )
    .with_feature_group(
        feature_group=target_fg,
        target_feature_name_in_base="Lead_ProspectID",
        included_feature_names=[
            "Web_ProspectID", "LastCampaignActivity", "PageViewsPerVisit",
            "TotalTimeOnWebsite", "TotalWebVisits", "AttendedMarketingEvent",
            "OrganicSearch", "ViewedAdvertisement",
        ],
    )
    .with_event_time_range(starting_timestamp=datetime_start, ending_timestamp=datetime_end)
)

We also confirm the reduction in dataset size when using with_event_time_range by exporting the result to a Pandas DataFrame with the to_dataframe() method and displaying the data. Notice that the result set has only a fraction of the original 10,020 records, because it only retrieves records whose event_time falls within the 1-day time period.
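One detail worth checking when building the window is the time zone: datetime.fromtimestamp returns local time unless a time zone is passed, so the printed window can differ between machines. The following standard-library sketch pins the conversion to UTC and applies the window to a few hypothetical records. It doesn't call the SDK, and it assumes that both window bounds are inclusive, which the sample data doesn't depend on.

```python
from datetime import datetime, timezone

start_ts = 1656633600  # 2022-07-01 00:00:00 UTC, as in the walkthrough
time_window = 86400    # one day, in seconds

# Passing tz=timezone.utc avoids the local-time interpretation that
# datetime.fromtimestamp() applies by default.
window_start = datetime.fromtimestamp(start_ts, tz=timezone.utc)
window_end = datetime.fromtimestamp(start_ts + time_window, tz=timezone.utc)
print(f"Window: {window_start.isoformat()} until {window_end.isoformat()}")

# Hypothetical records with event times in epoch seconds.
records = [
    {"Lead_ProspectID": "a1", "event_time": 1656600000},  # before the window
    {"Lead_ProspectID": "b2", "event_time": 1656650000},  # inside the window
    {"Lead_ProspectID": "c3", "event_time": 1656700000},  # inside the window
    {"Lead_ProspectID": "d4", "event_time": 1656800000},  # after the window
]


def in_window(record):
    """Return True if the record's event time falls within the window (bounds assumed inclusive)."""
    event_time = datetime.fromtimestamp(record["event_time"], tz=timezone.utc)
    return window_start <= event_time <= window_end


windowed = [record for record in records if in_window(record)]
print([record["Lead_ProspectID"] for record in windowed])
```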

Retrieve features as of a specific timestamp

The DatasetBuilder as_of method retrieves features from a dataset that meet a timestamp-based constraint, which the caller provides as an argument to the function. This mechanism is useful for scenarios such as rerunning experiments on previously collected data, backtesting time series models, or building a dataset from a previous state of the offline store for data auditing purposes. This functionality is sometimes referred to as time travel because it essentially rolls back the data store to an earlier date and time. This time constraint is also referred to as the cutoff timestamp.

In our sample code, we first create the cutoff timestamp by reading the write_time value for the last record written to the Feature Store, the one written with put_record. Then we provide this cutoff timestamp to the DatasetBuilder as an argument to the as_of method:

# Create a dataset using the as-of timestamp
print(f'using cut-off time: {asof_cutoff_datetime}')
as_of_builder = (
    feature_store.create_dataset(
        base=base_fg,
        output_path=f"s3://{s3_bucket_name}/{s3_prefix}/dataset_query_results",
    )
    .with_feature_group(
        feature_group=target_fg,
        target_feature_name_in_base='Lead_ProspectID',
        included_feature_names=['Web_ProspectID', 'Web_EventTime', 'TotalWebVisits'],
    )
    .as_of(asof_cutoff_datetime)
)

It’s important to note that the as_of method applies the time constraint to the internal write_time field, which is automatically generated by Feature Store. The write_time field represents the actual timestamp when the record is written to the data store. This is different from other methods like point_in_time_accurate_join and with_event_time_range, which use the client-provided event_time field as a comparator.
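The difference between the two comparators matters most for late-arriving or backfilled records. The following sketch uses hypothetical rows and plain Python filters (not the SDK) to show how the same cutoff gives different results depending on whether it is applied to write_time or to event_time. The field names id, event_time, and write_time are illustrative.

```python
from datetime import datetime, timezone


def utc(year, month, day):
    """Build a timezone-aware UTC datetime at midnight."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical offline-store rows. event_time is supplied by the client;
# write_time is when the row was written to the store.
rows = [
    {"id": "a1", "event_time": utc(2023, 1, 5), "write_time": utc(2023, 1, 6), "TotalWebVisits": 2},
    # Backfilled record: its event_time is early, but it was written after the cutoff.
    {"id": "a1", "event_time": utc(2023, 1, 8), "write_time": utc(2023, 2, 1), "TotalWebVisits": 4},
    {"id": "b2", "event_time": utc(2023, 1, 7), "write_time": utc(2023, 1, 7), "TotalWebVisits": 9},
]

cutoff = utc(2023, 1, 15)

# as_of-style constraint: what the store physically contained at the cutoff.
by_write_time = [row for row in rows if row["write_time"] <= cutoff]

# event_time-style constraint: what had happened by the cutoff, regardless of when it was written.
by_event_time = [row for row in rows if row["event_time"] <= cutoff]

print(len(by_write_time), len(by_event_time))
```

The backfilled record is excluded by the write_time filter and included by the event_time filter, which is why a write_time cutoff is the one that reproduces an earlier state of the store.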

Clean up

Be sure to delete all the resources created as part of this example to avoid incurring ongoing charges. This includes the feature groups and the S3 bucket containing the offline store data.

SageMaker Python SDK experience vs. writing SQL

The new methods in the SageMaker Python SDK allow you to quickly create datasets and move to the training step sooner in the ML lifecycle. To show the time and effort that can be saved, let’s examine a use case where we need to join two feature groups while retrieving the features within a specified time frame. The following figure compares the Python queries on the offline Feature Store vs. the SQL used to create the dataset behind a Python query.

As you can see, the same operation of joining two feature groups requires you to create a long, complex SQL query, whereas it can be accomplished using just the with_feature_group and with_event_time_range methods in the SageMaker Python SDK.

Conclusion

The new offline store methods in the SageMaker Python SDK allow you to query your offline features without having to write complex SQL statements. This provides a seamless experience for customers who are accustomed to writing Python code during model development. For more information about feature groups, refer to Create a Dataset From Your Feature Groups and Feature Store APIs: Feature Group.

The full example in this post can be found in the GitHub repository. Give it a try and let us know your feedback in the comments.


About the Authors

Paul Hargis has focused his efforts on machine learning at several companies, including AWS, Amazon, and Hortonworks. He enjoys building technology solutions and teaching people how to leverage them. Paul likes to help customers expand their machine learning initiatives to solve real-world problems. Prior to his role at AWS, he was lead architect for Amazon Exports and Expansions, helping amazon.com improve the experience for international shoppers.

Mecit Gungor is an AI/ML Specialist Solution Architect at AWS helping customers design and build AI/ML solutions at scale. He covers a wide range of AI/ML use cases for Telecommunication customers and currently focuses on Generative AI, LLMs, and training and inference optimization. He can often be found hiking in the wilderness or playing board games with his friends in his free time.

Tony Chen is a Machine Learning Solutions Architect at AWS, helping customers design scalable and robust machine learning capabilities in the cloud. As a former data scientist and data engineer, he leverages his experience to help tackle some of the most challenging problems organizations face with operationalizing machine learning.

Sovik Kumar Nath is an AI/ML solution architect with AWS. He has extensive experience in end-to-end designs and solutions for machine learning; business analytics within financial, operational, and marketing analytics; healthcare; supply chain; and IoT. Outside work, Sovik enjoys traveling and watching movies.

Use Amazon SageMaker Canvas to build machine learning models using Parquet data from Amazon Athena and AWS Lake Formation

Data is the foundation for machine learning (ML) algorithms. One of the most common formats for storing large amounts of data is Apache Parquet due to its compact and highly efficient format. This means that business analysts who want to extract insights from the large volumes of data in their data warehouse must frequently use data stored in Parquet.

To simplify access to Parquet files, Amazon SageMaker Canvas has added data import capabilities from over 40 data sources, including Amazon Athena, which supports Apache Parquet.

Canvas provides connectors to AWS data sources such as Amazon Simple Storage Service (Amazon S3), Athena, and Amazon Redshift. In this post, we describe how to query Parquet files with Athena using AWS Lake Formation and use the output in Canvas to train a model.

Solution overview

Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open table and file formats. Many teams are turning to Athena to enable interactive querying and analyze their data in the respective data stores without creating multiple data copies.

Athena allows applications to use standard SQL to query massive amounts of data on an S3 data lake. Athena supports various data formats, including:

  • CSV
  • TSV
  • JSON
  • text files
  • Open-source columnar formats, such as ORC and Parquet
  • Compressed data in Snappy, Zlib, LZO, and GZIP formats

Parquet files organize the data into columns and use efficient data compression and encoding schemes for fast data storage and retrieval. You can reduce the import time in Canvas by using Parquet files for bulk data imports and with specific columns.

Lake Formation is an integrated data lake service that makes it easy for you to ingest, clean, catalog, transform, and secure your data and make it available for analysis and ML. Lake Formation automatically manages access to the registered data in Amazon S3 through services including AWS Glue, Athena, Amazon Redshift, Amazon QuickSight, and Amazon EMR using Zeppelin notebooks with Apache Spark to ensure compliance with your defined policies.

In this post, we show you how to import Parquet data to Canvas from Athena, where Lake Formation enables data governance.

To illustrate, we use the operations data of a consumer electronics business. We create a model to estimate the demand for electronic products using their historical time series data.

This solution is illustrated in four steps:

  1. Set up the Lake Formation database.
  2. Grant Lake Formation access permissions to Canvas.
  3. Import the Parquet data to Canvas using Athena.
  4. Use the imported Parquet data to build ML models with Canvas.

The following diagram illustrates the solution architecture.

Set up the Lake Formation database

The steps listed here form a one-time setup to show you the data lake hosting the Parquet data, which can be consumed by your analysts to gain insights using Canvas. Either cloud engineers or administrators can best perform these prerequisites. Analysts can go directly to Canvas and import the data from Athena.

The data used in this post consist of two datasets sourced from Amazon S3. These datasets have been generated synthetically for this post.

  • Consumer Electronics Target Time Series (TTS) – The historical data of the quantity to forecast is called the Target Time Series (TTS). In this case, it’s the demand for an item.
  • Consumer Electronics Related Time Series (RTS) – Other historical data that is known at exactly the same time as every sales transaction is called the Related Time Series (RTS). In our use case, it’s the price of an item. An RTS dataset includes time series data that isn’t included in a TTS dataset and might improve the accuracy of your predictor.

To set up the database, complete the following steps:

  1. Upload data to Amazon S3 as Parquet files from these two folders:
    1. ce-rts – Contains Consumer Electronics Related Time Series (RTS).
    2. ce-tts – Contains Consumer Electronics Target Time Series (TTS).
  2. Create a data lake with Lake Formation.
  3. On the Lake Formation console, create a database called consumer-electronics.
  4. Create two tables for the consumer electronics dataset with the names ce-rts-Parquet and ce-tts-Parquet with the data sourced from the S3 bucket.

We use the database we created in this step in a later step to import the Parquet data into Canvas using Athena.

Grant Lake Formation access permissions to Canvas

This is a one-time setup to be done by either cloud engineers or administrators.

  1. Grant data lake permissions for Canvas to access the consumer-electronics Parquet data.
  2. In the SageMaker Studio domain, view the Canvas user’s details.
  3. Copy the execution role name.
  4. Make sure the execution role has enough permissions to access the following services:
    • Canvas.
    • The S3 bucket where Parquet data is stored.
    • Athena to connect from Canvas.
    • AWS Glue to access the Parquet data using the Athena connector.
  5. In Lake Formation, choose Data Lake permissions in the navigation pane.
  6. Choose Grant.
  7. For Principals, select IAM users and roles to provide Canvas access to your data artifacts.
  8. Specify your SageMaker Studio domain user’s execution role.
  9. Specify the database and tables.
  10. Choose Grant.

You can grant granular actions on the tables, columns, and data. This option provides granular access configuration of your sensitive data by the segregation of roles you have defined.

After you set up the required environment for the Canvas and Athena integration, proceed to the next step to import the data into Canvas using Athena.

Import data using Athena

Complete the following steps to import the Lake Formation-managed Parquet files:

  1. In Canvas, choose Datasets in the navigation pane.
  2. Choose + Import to import the Parquet datasets managed by Lake Formation.
  3. Choose Athena as the data source.
  4. Choose the consumer-electronics dataset in Parquet format from the Athena data catalog and table details in the menu.
  5. Import the two datasets. Drag and drop the data source to select the first one.

When you drag and drop the dataset, the data preview appears in the bottom frame of the page.

  6. Choose Import data.
  7. Enter consumer-electronics-rts as the name for the dataset you’re importing.

Data import takes time based on the data size. The dataset in this example is small, so the import takes a few seconds. When the data import is completed, the status turns from Processing to Ready.

  8. Repeat the import process for the second dataset (ce-tts).

When the ce-tts Parquet data is imported, the Datasets page shows both datasets.

The imported datasets contain targeted and related time series data. The RTS dataset can help deep learning models improve forecast accuracy.

Let’s join the datasets to prepare for our analysis.

  1. Select the datasets.
  2. Choose Join data.
  3. Select and drag both the datasets to the center pane, which applies an inner join.
  4. Choose the Join icon to see the join conditions applied and to make sure the inner join is applied and the right columns are joined.
  5. Choose Save & close to apply the join condition.
  6. Provide a name for the joined dataset.
  7. Choose Import data.

Joined data is imported and created as a new dataset. The joined dataset source is shown as Join.

Use the Parquet data to build ML models with Canvas

The Parquet data from Lake Formation is now available on Canvas. Now you can run your ML analysis on the data.

  1. After successfully importing the data, choose Create a custom model in Ready-to-use models in Canvas.
  2. Enter a name for the model.
  3. Select your problem type (for this post, Predictive analysis).
  4. Choose Create.
  5. Select the consumer-electronic-joined dataset to train the model to predict the demand for electronic items.
  6. Select demand as the target column to forecast demand for consumer electronic items.

Based on the data provided to Canvas, the Model type is automatically derived as Time series forecasting and provides a Configure time series model option.

  7. Choose the Configure time series model link to provide time series model options.
  8. Enter forecasting configurations as shown in the following screenshot.
  9. Exclude the group column because no logical grouping is executed for the dataset.

For building the model, Canvas offers two build options. Choose the option as per your preference. Quick build generally takes around 15–20 minutes, whereas Standard takes around 4 hours.

    • Quick build – Builds a model in a fraction of the time compared to a standard build; potential accuracy is exchanged for speed
    • Standard build – Builds the best model from an optimized process powered by AutoML; speed is exchanged for greatest accuracy

  10. For this post, we choose Quick build for illustrative purposes.

When the quick build is completed, the model evaluation metrics are presented in the Analyze section.

  11. Choose Predict to run a single prediction or batch prediction.

Clean up

Log out from Canvas to avoid future charges.

Conclusion

Enterprises have data in data lakes in various formats, including the highly efficient Parquet format. Canvas supports more than 40 data sources, including Athena, from which you can easily pull data in various formats from data lakes. To learn more, refer to Import data from over 40 data sources for no-code machine learning with Amazon SageMaker Canvas.

In this post, we took Lake Formation-managed Parquet files and imported them into Canvas using Athena. The Canvas ML model forecasted the demand of consumer electronics using historical demand and price data. Thanks to a user-friendly interface and vivid visualizations, we completed this without writing a single line of code. Canvas now allows business analysts to use Parquet files from data engineering teams and build ML models, conduct analysis, and extract insights independently of data science teams.

To learn more about Canvas, refer to Predict types of machine failures with no-code machine learning using Canvas. Refer to Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts for more information on creating ML models with a no-code solution.


About the authors

Gopi Mudiyala is a Senior Technical Account Manager at AWS. He helps customers in the Financial Services industry with their operations in AWS. As a machine learning enthusiast, Gopi works to help customers succeed in their ML journey. In his spare time, he likes to play badminton, spend time with family, and travel.

Hariharan Suresh is a Senior Solutions Architect at AWS. He is passionate about databases, machine learning, and designing innovative solutions. Prior to joining AWS, Hariharan was a product architect, core banking implementation specialist, and developer, and worked with BFSI organizations for over 11 years. Outside of technology, he enjoys paragliding and cycling.

Amazon SageMaker Automatic Model Tuning now automatically chooses tuning configurations to improve usability and cost efficiency

Amazon SageMaker Automatic Model Tuning has introduced Autotune, a new feature to automatically choose hyperparameters on your behalf. This provides a faster and more efficient way to find hyperparameter ranges, and can significantly improve budget and time management for your automatic model tuning jobs.

In this post, we discuss this new capability and some of the benefits it brings.

Hyperparameter overview

When training any machine learning (ML) model, you are generally dealing with three types of data: input data (also called the training data), model parameters, and hyperparameters. You use the input data to train your model, which in effect learns your model parameters. During the training process, your ML algorithms are trying to find the optimal model parameters based on data while meeting the goals of your objective function. For example, when a neural network is trained, the weight of the network nodes is learned from the training, and indicates how much impact it has on the final prediction. These weights are the model parameters.

Hyperparameters, on the other hand, are parameters of a learning algorithm and not the model itself. The number of hidden layers and the number of nodes are some of the examples of hyperparameters you can set for a neural network. The difference between model parameters and hyperparameters is that model parameters are learned during the training process, whereas hyperparameters are set prior to the training and remain constant during the training process.
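The distinction can be illustrated with a minimal sketch: a one-weight linear model trained with gradient descent in plain Python. The data and the specific values are hypothetical. Here, learning_rate and epochs are hyperparameters fixed before training, and weight is the model parameter learned from the data.

```python
# Hypothetical training data generated from y = 3 * x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

# Hyperparameters: chosen before training and constant while it runs.
learning_rate = 0.05
epochs = 200

# Model parameter: starts at an arbitrary value and is learned from the data.
weight = 0.0

for _ in range(epochs):
    # Gradient of the mean squared error with respect to the weight.
    gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    weight -= learning_rate * gradient

print(f"learned weight: {weight:.4f}")
```

Changing learning_rate or epochs changes how training proceeds, but neither value is updated by the training loop itself; only weight is.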

Pain points

SageMaker automatic model tuning, also called hyperparameter tuning, runs many training jobs on your dataset using a range of hyperparameters that you specify. It can accelerate your productivity by trying many variations of a model. It looks for the best model automatically by focusing on the most promising combinations of hyperparameter values within the ranges that you specify. However, to get good results, you must choose the right ranges to explore.

But how do you know what the right range is to begin with? With hyperparameter tuning jobs, we are assuming that the optimal set of hyperparameters lies within the range that we specified. What happens if the chosen range is not right, and the optimal hyperparameter actually falls outside of the range?
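
The following sketch illustrates the problem under a made-up assumption: a hypothetical loss surface whose best learning rate is 0.3. A random search restricted to the range 0.001–0.1 can never reach it, no matter how many trials it runs, whereas a search over a wider range can. The function names and values are illustrative only:

```python
import random


def validation_loss(lr):
    # Hypothetical loss surface with its minimum at lr = 0.3
    return (lr - 0.3) ** 2


def random_search(low, high, trials, seed=0):
    # Sample candidate values uniformly from [low, high] and keep the best one
    rng = random.Random(seed)
    candidates = [rng.uniform(low, high) for _ in range(trials)]
    best = min(candidates, key=validation_loss)
    return best, validation_loss(best)


narrow_best, narrow_loss = random_search(0.001, 0.1, trials=50)
wide_best, wide_loss = random_search(0.001, 1.0, trials=50)

print(f"range [0.001, 0.1]: best lr = {narrow_best:.3f}, loss = {narrow_loss:.4f}")
print(f"range [0.001, 1.0]: best lr = {wide_best:.3f}, loss = {wide_loss:.4f}")
```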

Choosing the right hyperparameters requires experience with the ML technique you are using and understanding how its hyperparameters behave. It’s important to understand the hyperparameter implications because every hyperparameter that you choose to tune has the potential to increase the number of trials required for a successful tuning job. You need to strike an optimal trade-off between resources allocated to the tuning job and achieving the goals you’ve set.

The SageMaker Automatic Model Tuning team is constantly innovating on behalf of our customers to optimize their ML workloads. AWS recently announced support for new completion criteria for hyperparameter optimization. The max runtime criterion is a budget control that bounds cost and runtime. The desired target metric, improvement monitoring, and convergence detection criteria monitor the performance of the model and help stop tuning early if the models don’t improve after a defined number of training jobs. Autotune is a new feature of automatic model tuning that helps save you time and reduce wasted resources on finding optimal hyperparameter ranges.

Benefits of Autotune and how automatic model tuning alleviates those pain points

Autotune is a new configuration in the CreateHyperParameterTuningJob API and in the HyperparameterTuner class of the SageMaker Python SDK that removes the need to specify the hyperparameter ranges, tuning strategy, objective metrics, or number of jobs that were previously required as part of the job definition. Autotune automatically chooses the optimal configurations for your tuning job, helps prevent wasted resources, and accelerates productivity.

The following example showcases how many of the parameters are not necessary when using Autotune.

The following code creates a hyperparameter tuner using the SageMaker Python SDK without Autotune:

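from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)

estimator = PyTorch(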
    entry_point="mnist.py",
    instance_type="ml.p4d.24xlarge",
    hyperparameters={
        "epochs": 1, "backend": "gloo"
    },
)

tuner = HyperparameterTuner(
    estimator, 
    objective_metric_name='validation:rmse',
    objective_type='Minimize',
    hyperparameter_ranges = {
        "lr": ContinuousParameter(0.001, 0.1),
        "batch-size": CategoricalParameter([32, 64, 128, 256, 512])
    },
    metric_definitions=[{...}],
    max_jobs=10,
    strategy="Random"
)

tuner.fit(...)

The following code creates the same tuner with Autotune enabled. The hyperparameter ranges, metric definitions, job count, and strategy are no longer specified:

estimator = PyTorch(
    entry_point="mnist.py",
    instance_type="ml.p4d.24xlarge",
    hyperparameters={
        "epochs": 1, "backend": "gloo", "lr": 0.01, "batch-size": 32
    },
)
tuner = HyperparameterTuner(
    estimator, 
    objective_metric_name='validation:rmse',
    objective_type='Minimize', 
    autotune=True
)

If you are using the API directly, the equivalent call is as follows:

create_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name,
    HyperParameterTuningJobConfig=tuning_job_config,
    TrainingJobDefinition=training_job_definition,
    Autotune={'Mode': 'Enabled'},
)

The code example illustrates some of the key benefits of Autotune:

  • A key choice for a tuning job is which hyperparameters to tune and their ranges. Autotune makes this choice for you based on a list of hyperparameters that you provide. Using the previous example, the hyperparameters that Autotune can choose to be tunable are lr and batch-size.
  • Autotune will automatically select the hyperparameter ranges on your behalf. Autotune uses best practices as well as internal benchmarks for selecting the appropriate ranges.
  • Autotune automatically selects the strategy on how to choose the combinations of hyperparameter values to use for the training job.
  • Early stopping is enabled by default when using Autotune. When using early stopping, SageMaker stops training jobs launched by the hyperparameter tuning job when they are unlikely to perform better than previously completed training jobs to avoid additional resource utilization.
  • Maximum expected resources to be consumed by the tuning job (parallel jobs, max runtime, and so on) will be calculated and set in the tuning job record as soon as the tuning job is created. Such reserved resources will not increase during the course of the tuning job; this will maintain an upper bound of cost and duration of the tuning job that is easily predictable by the user. A max runtime of 48 hours will be used by default.
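
To give a feel for the early stopping behavior mentioned in the list above, the following is a simplified sketch of a median stopping rule: a running job is stopped when its objective metric is worse than the median of the running averages of earlier jobs at the same epoch. This is an illustration only, not SageMaker’s implementation; the function name should_stop_early and the metric histories are hypothetical:

```python
from statistics import median


def should_stop_early(current_value, epoch, previous_jobs, minimize=True):
    """Median stopping rule sketch.

    previous_jobs holds one list of per-epoch objective values for each
    earlier training job; epoch is 1-based.
    """
    running_averages = [
        sum(history[:epoch]) / epoch
        for history in previous_jobs
        if len(history) >= epoch
    ]
    if not running_averages:
        return False  # nothing to compare against yet
    threshold = median(running_averages)
    if minimize:
        return current_value > threshold
    return current_value < threshold


# Hypothetical validation losses from three earlier training jobs
previous = [
    [0.90, 0.70, 0.55],
    [0.80, 0.60, 0.50],
    [1.00, 0.95, 0.90],
]

print(should_stop_early(0.99, epoch=2, previous_jobs=previous))  # worse than the median
print(should_stop_early(0.65, epoch=2, previous_jobs=previous))  # better than the median
```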

You can override any settings chosen automatically by Autotune. As an example, if you specify your own hyperparameter ranges, those will be used alongside the inferred ranges. Any user-specified hyperparameter range will take precedence over the same named inferred ranges:

estimator = PyTorch(
    ...
    hyperparameters={
        "epochs": 100, "backend": "gloo", "lr": 0.01, "beta1": 0.8
    },
)

tuner = HyperparameterTuner(
    ...
    hyperparameter_ranges = {
        "lr": ContinuousParameter(0.001, 0.01) # takes precedence over inferred "lr"
    },
)

Autotune generates a set of settings as part of the tuning job. Any customer-specified setting that has the same name as an Autotune-selected setting overrides it. Any customer-provided setting whose name doesn’t match an Autotune-selected setting is added alongside the Autotune-selected settings.
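
The precedence rule can be pictured as a dictionary merge. The following sketch is only a mental model; the inferred ranges shown are invented for illustration and are not values Autotune is known to choose:

```python
# Hypothetical ranges that Autotune might infer (illustrative values only)
inferred_ranges = {
    "lr": ("Continuous", 0.0001, 0.1),
    "beta1": ("Continuous", 0.5, 0.99),
}

# Ranges supplied by the user in the tuner definition
user_ranges = {
    "lr": ("Continuous", 0.001, 0.01),     # same name: overrides the inferred range
    "momentum": ("Continuous", 0.0, 0.9),  # new name: added to the inferred ranges
}


def merge_ranges(inferred, user):
    # Start from the inferred settings, then let user settings win on name clashes
    merged = dict(inferred)
    merged.update(user)
    return merged


effective = merge_ranges(inferred_ranges, user_ranges)
for name, spec in sorted(effective.items()):
    print(name, spec)
```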

Inspecting parameters chosen by Autotune

Autotune reduces the time you would normally have spent in deciding on the initial set of hyperparameters to tune. But how do you get insights into what hyperparameter values Autotune ended up choosing? You can get information about decisions made for you in the description of the running tuning job (in the response of the DescribeHyperParameterTuningJob operation). After you submit a request to create a tuning job, the request is processed, and all missing fields are set by Autotune. All set fields are reported in the DescribeHyperParameterTuningJob operation.

Alternatively, you can inspect HyperparameterTuner class fields to see the settings chosen by Autotune.

The following is an XGBoost example of how you can use the DescribeHyperParameterTuningJob operation to inspect the hyperparameters chosen by Autotune.

First, we create a tuning job with automatic model tuning:

hyperparameters = {
    "objective": "reg:squarederror",
    "num_round": "50",
    "verbosity": "2",
    "max_depth": "5",  # overlap with ranges is ok when Autotune is enabled
}
estimator = XGBoost(hyperparameters=hyperparameters, ...)

hp_tuner = HyperparameterTuner(estimator, autotune=True)
hp_tuner.fit(wait=False)

After the tuning job is created successfully, we can discover what settings Autotune chose. For example, we can describe the tuning job using the job name stored in hp_tuner:

import boto3 
sm = boto3.client('sagemaker')

response = sm.describe_hyper_parameter_tuning_job(
   HyperParameterTuningJobName=hp_tuner.latest_tuning_job.name
)

print(response)

Then we can inspect the generated response to review the settings chosen by Autotune on our behalf.
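
Because the full response is large, it can help to pull out only the tuning settings. The following is a minimal sketch of such a helper. The function summarize_tuning_config is hypothetical, and mock_response is a hand-written stand-in whose values are invented for illustration; only the key names follow the DescribeHyperParameterTuningJob response structure:

```python
def summarize_tuning_config(response):
    """Collect the tuning settings reported in a describe response."""
    config = response["HyperParameterTuningJobConfig"]
    ranges = config.get("ParameterRanges", {})
    chosen_ranges = {}
    for kind in ("IntegerParameterRanges", "ContinuousParameterRanges"):
        for item in ranges.get(kind, []):
            chosen_ranges[item["Name"]] = (item["MinValue"], item["MaxValue"])
    for item in ranges.get("CategoricalParameterRanges", []):
        chosen_ranges[item["Name"]] = tuple(item["Values"])
    return {
        "strategy": config.get("Strategy"),
        "early_stopping": config.get("TrainingJobEarlyStoppingType"),
        "resource_limits": config.get("ResourceLimits", {}),
        "ranges": chosen_ranges,
    }


# Hand-written stand-in for a real response; every value is illustrative.
mock_response = {
    "HyperParameterTuningJobConfig": {
        "Strategy": "Bayesian",
        "TrainingJobEarlyStoppingType": "Auto",
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": 100,
            "MaxParallelTrainingJobs": 5,
            "MaxRuntimeInSeconds": 172800,
        },
        "ParameterRanges": {
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "2", "MaxValue": "10"}
            ],
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.5"}
            ],
        },
    }
}

summary = summarize_tuning_config(mock_response)
print("strategy:", summary["strategy"])
for name, chosen in summary["ranges"].items():
    print(name, chosen)
```

In practice, you would pass the dictionary returned by describe_hyper_parameter_tuning_job instead of the mock.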

If the current tuning job settings are not satisfactory, you can stop the tuning job:

hp_tuner.stop()

Conclusion

SageMaker Automatic Model Tuning allows you to reduce the time to tune a model by automatically searching for the best hyperparameter configuration within the ranges that you specify. However, choosing the right hyperparameter ranges can be a time-consuming process and can have direct implications on your training cost and duration.

In this post, we discussed how you can now use Autotune, a new feature introduced as part of automatic model tuning, to automatically pick an initial set of hyperparameter ranges on your behalf. This can reduce the time it takes for you to get started with your model tuning process. Additionally, you can evaluate the ranges picked by Autotune and adjust them according to your needs.

We also showed how Autotune can automatically pick the optimal parameter settings on your behalf, such as the number of training jobs, the strategy to choose the hyperparameter combinations, and enabling early stopping by default. This can result in significantly optimized budget and time bounds that are easily predictable.

To learn more, refer to Perform Automatic Model Tuning with SageMaker.


About the Authors

Jas Singh is a Senior Solutions Architect helping public sector customers achieve their business outcomes through architecting and implementing innovative and resilient solutions at scale. Jas has over 20 years of experience in designing and implementing mission-critical applications and holds a master’s degree in computer science from Baylor University.

Gopi Mudiyala is a Senior Technical Account Manager at AWS. He helps customers in the Financial Services industry with their operations in AWS. As a machine learning enthusiast, Gopi works to help customers succeed in their ML journey. In his spare time, he likes to play badminton, spend time with family, and travel.

Raviteja Yelamanchili is an Enterprise Solutions Architect with Amazon Web Services based in New York. He works with large financial services enterprise customers to design and deploy highly secure, scalable, reliable, and cost-effective applications on the cloud. He brings over 11 years of risk management, technology consulting, data analytics, and machine learning experience. When he is not helping customers, he enjoys traveling and playing PS5.

Iaroslav Shcherbatyi is a Machine Learning Engineer at AWS. He works mainly on improvements to the Amazon SageMaker platform and on helping customers best use its features. In his spare time, he likes to go to the gym, do outdoor sports such as ice skating or hiking, and catch up on new AI research.

Train a Large Language Model on a single Amazon SageMaker GPU with Hugging Face and LoRA

This post is co-written with Philipp Schmid from Hugging Face.

We have all heard about the progress being made in the field of large language models (LLMs) and the ever-growing number of problem sets where LLMs are providing valuable insights. Large models, when trained over massive datasets and several tasks, are also able to generalize well over tasks that they aren’t trained specifically for. Such models are called foundation models, a term first popularized by the Stanford Institute for Human-Centered Artificial Intelligence. Even though these foundation models are able to generalize well, especially with the help of prompt engineering techniques, often the use case is so domain specific, or the task is so different, that the model needs further customization. One approach to improve performance of a large model for a specific domain or task is to further train the model with a smaller, task-specific dataset. Although this approach, known as fine-tuning, successfully improves the accuracy of LLMs, it requires modifying all of the model weights. Fine-tuning is much faster than pre-training thanks to the much smaller dataset size, but it still requires significant computing power and memory, and it results in a model that is the same size as the original.

To address these challenges, Hugging Face introduced the Parameter-Efficient Fine-Tuning library (PEFT). This library allows you to freeze most of the original model weights and replace or extend model layers by training an additional, much smaller, set of parameters. This makes training much less expensive in terms of required compute and memory.

In this post, we show you how to train the 7-billion-parameter BloomZ model using just a single graphics processing unit (GPU) on Amazon SageMaker, Amazon’s machine learning (ML) platform for preparing, building, training, and deploying high-quality ML models. BloomZ is a general-purpose natural language processing (NLP) model. We use PEFT to optimize this model for the specific task of summarizing messenger-like conversations. The single-GPU instance that we use is a low-cost example of the many instance types AWS provides. Training this model on a single GPU highlights AWS’s commitment to being the most cost-effective provider of AI/ML services.

The code for this walkthrough can be found on the Hugging Face notebooks GitHub repository under the sagemaker/24_train_bloom_peft_lora folder.

Prerequisites

In order to follow along, you should have the following prerequisites:

  • An AWS account.
  • A Jupyter notebook within Amazon SageMaker Studio or SageMaker notebook instances.
  • Access to the SageMaker ml.g5.2xlarge instance type, which contains a single NVIDIA A10G GPU. On the AWS Management Console, navigate to Service Quotas for SageMaker and request a 1-instance increase for the ml.g5.2xlarge for training job usage quota, and for the endpoint usage quota of the instance type you deploy to (ml.g5.4xlarge in this post).
  • After your requested quotas are applied to your account, you can use the default Studio Python 3 (Data Science) image with a ml.t3.medium instance to run the notebook code snippets. For the full list of available kernels, refer to Available Amazon SageMaker Kernels.

Set up a SageMaker session

Use the following code to set up your SageMaker session:

import sagemaker
import boto3
sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

Load and prepare the dataset

We use the samsum dataset, a collection of 16,000 messenger-like conversations with summaries. The conversations were created and written down by linguists fluent in English. The following is an example of the dataset:

{
  "id": "13818513",
  "summary": "Amanda baked cookies and will bring Jerry some tomorrow.",
  "dialogue": "Amanda: I baked cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)"
}

To train the model, you need to convert the inputs (text) to token IDs. This is done by a Hugging Face Transformers tokenizer. For more information, refer to Chapter 6 of the Hugging Face NLP Course.

Convert the inputs with the following code:

from transformers import AutoTokenizer

model_id="bigscience/bloomz-7b1"

# Load tokenizer of BLOOMZ
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.model_max_length = 2048 # overwrite wrong value

Before starting training, you need to process the data. Once it’s trained, the model will take a set of text messages as the input and generate a summary as the output. You need to format the data as a prompt (the messages) with a correct response (the summary). You also need to chunk examples into longer input sequences to optimize the model training. See the following code:

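from random import randint
from itertools import chain
from functools import partial

from datasets import load_dataset

# load the training split of the samsum dataset from the Hugging Face Hub
dataset = load_dataset("samsum", split="train")

# custom instruct prompt start
prompt_template = f"Summarize the chat dialogue:\n{{dialogue}}\n---\nSummary:\n{{summary}}{{eos_token}}"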

# template dataset to add prompt to each sample
def template_dataset(sample):
    sample["text"] = prompt_template.format(dialogue=sample["dialogue"],
                                            summary=sample["summary"],
                                            eos_token=tokenizer.eos_token)
    return sample


# apply prompt template per sample
dataset = dataset.map(template_dataset, remove_columns=list(dataset.features))

# randint is inclusive on both ends, so the upper bound is len(dataset) - 1
print(dataset[randint(0, len(dataset) - 1)]["text"])

# empty list to save remainder from batches to use in next batch
remainder = {"input_ids": [], "attention_mask": []}


def chunk(sample, chunk_length=2048):
    # define global remainder variable to save remainder from batches to use in next batch
    global remainder
    # Concatenate all texts and add remainder from previous batch
    concatenated_examples = {k: list(chain(*sample[k])) for k in sample.keys()}
    concatenated_examples = {k: remainder[k] + concatenated_examples[k] for k in concatenated_examples.keys()}
    # get total number of tokens for batch
    batch_total_length = len(concatenated_examples[list(sample.keys())[0]])

    # get the largest multiple of chunk_length that fits in the batch
    # (0 when the batch is shorter than chunk_length, so everything carries over)
    batch_chunk_length = (batch_total_length // chunk_length) * chunk_length

    # Split by chunks of max_len.
    result = {
        k: [t[i : i + chunk_length] for i in range(0, batch_chunk_length, chunk_length)]
        for k, t in concatenated_examples.items()
    }
    # add remainder to global variable for next batch
    remainder = {k: concatenated_examples[k][batch_chunk_length:] for k in concatenated_examples.keys()}
    # prepare labels
    result["labels"] = result["input_ids"].copy()
    return result


# tokenize and chunk dataset
lm_dataset = dataset.map(
    lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(dataset.features)
).map(
    partial(chunk, chunk_length=2048),
    batched=True,
)

# Print total number of samples
print(f"Total number of samples: {len(lm_dataset)}")
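
To see how the chunking carries leftover tokens from one batch to the next, the following toy sketch applies the same idea to short lists of integers with a chunk length of 4. It is a simplified variant for illustration: it passes the remainder explicitly instead of using a global variable, and the token IDs are made up:

```python
from itertools import chain


def chunk(sample, remainder, chunk_length=4):
    # Flatten each column and prepend tokens left over from the previous batch
    concatenated = {k: remainder[k] + list(chain(*sample[k])) for k in sample}
    total_length = len(concatenated["input_ids"])
    # Largest multiple of chunk_length that fits in this batch
    usable_length = (total_length // chunk_length) * chunk_length
    result = {
        k: [t[i : i + chunk_length] for i in range(0, usable_length, chunk_length)]
        for k, t in concatenated.items()
    }
    # Tokens that did not fill a whole chunk are carried to the next batch
    new_remainder = {k: t[usable_length:] for k, t in concatenated.items()}
    # For causal language modeling, the labels are the input IDs themselves
    result["labels"] = result["input_ids"].copy()
    return result, new_remainder


remainder = {"input_ids": []}
batch_1, remainder = chunk({"input_ids": [[1, 2, 3], [4, 5, 6, 7]]}, remainder)
batch_2, remainder = chunk({"input_ids": [[8, 9], [10]]}, remainder)

print(batch_1["input_ids"])
print(batch_2["input_ids"])
print(remainder["input_ids"])
```

The first batch yields one full chunk and leaves three tokens over; those tokens are prepended to the second batch, so no tokens are dropped between batches.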

Now you can use the FileSystem integration to upload the dataset to Amazon Simple Storage Service (Amazon S3):

# save train_dataset to s3
training_input_path = f's3://{sess.default_bucket()}/processed/samsum-sagemaker/train'
lm_dataset.save_to_disk(training_input_path)

print("uploaded data to:")
print(f"training dataset to: {training_input_path}")

Fine-tune BLOOMZ-7B with LoRA and bitsandbytes int-8 on SageMaker

The Hugging Face BLOOMZ-7B model card indicates its initial training was distributed over 8 nodes, each with 8 A100 80 GB GPUs and CPUs with 512 GB of memory. This computing configuration is not readily accessible, is cost-prohibitive for consumers, and requires expertise in distributed training performance optimization. SageMaker lowers the barriers to replicating this setup through its distributed training libraries; however, eight comparable on-demand ml.p4de.24xlarge instances would cost $376.88 per hour. Furthermore, the fully trained model consumes about 40 GB of memory, which exceeds the available memory of many individual consumer GPUs and requires additional strategies for large model inference. As a result, full fine-tuning of the model for your task over multiple model runs and deployment would require significant compute, memory, and storage costs on hardware that isn’t readily accessible to consumers.

Our goal is to find a way to adapt BLOOMZ-7B to our chat summarization use case in a more accessible and cost-effective way while maintaining accuracy. To enable our model to be fine-tuned on a SageMaker ml.g5.2xlarge instance with a single consumer-grade NVIDIA A10G GPU, we employ two techniques to reduce the compute and memory requirements for fine-tuning: LoRA and quantization.

LoRA (Low-Rank Adaptation) is a technique that significantly reduces the number of trainable model parameters and the associated compute needed for fine-tuning to a new task without a significant loss in predictive performance. It freezes your original model weights and instead optimizes smaller rank-decomposition weight matrices for your new task, and then injects these adapted weights back into the original model. Consequently, fewer weight gradient updates means less compute and GPU memory during fine-tuning. The intuition behind this approach is that the weight changes needed to adapt a pre-trained model to a new task have a low intrinsic rank, so they can be captured by a pair of much smaller matrices. To deepen your understanding of the LoRA technique, refer to the original paper LoRA: Low-Rank Adaptation of Large Language Models.
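
The following pure-Python sketch shows the core idea on a tiny 4x4 weight matrix. It is a toy illustration, not the PEFT implementation: the frozen weight W stays untouched, and only the two small matrices A and B are trained. The helper names and values are made up; the alpha / r scaling and the zero initialization of B follow the LoRA paper:

```python
def matmul(a, b):
    # Plain nested-list matrix multiplication
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W, A, B, alpha, r):
    # Effective weight = frozen W + (alpha / r) * (B @ A)
    scale = alpha / r
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


d_out, d_in, r, alpha = 4, 4, 1, 2
# Frozen pre-trained weight (identity matrix, for illustration)
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
A = [[0.1, 0.2, 0.3, 0.4]]        # r x d_in, trainable
B = [[0.0], [0.0], [0.0], [0.0]]  # d_out x r, trainable, starts at zero

W_start = lora_effective_weight(W, A, B, alpha, r)  # equals W before training
B[0][0] = 0.5                                       # pretend one training update
W_after = lora_effective_weight(W, A, B, alpha, r)

full_params = d_out * d_in        # parameters updated by full fine-tuning
lora_params = r * (d_in + d_out)  # parameters updated by LoRA
print(f"full fine-tuning updates {full_params} values, LoRA updates {lora_params}")
```

The savings grow with the matrix size: for a fixed rank, the LoRA parameter count scales with d_in + d_out rather than d_in * d_out.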

In addition to the LoRA technique, you use the bitsandbytes Hugging Face integration of the LLM.int8() method to quantize the frozen BloomZ model, that is, to reduce the precision of the weight and bias values by rounding them from float16 to int8. Because an int8 value takes 1 byte compared to 2 bytes for float16 and 4 bytes for float32, quantization reduces the memory needed for the BloomZ weights by about two times relative to float16 (about four times relative to float32), which enables you to fit the model on the A10G GPU instance without a significant loss in predictive performance. To deepen your understanding of how int8 quantization works, its implementation in the bitsandbytes library, and its integration with the Hugging Face Transformers library, see A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes.
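
As a rough illustration of what rounding to int8 means, the following sketch applies simple absolute-maximum (absmax) quantization to a handful of made-up weights. It is deliberately simplified: the LLM.int8() method additionally quantizes vector by vector and keeps outlier values in higher precision, which this sketch does not attempt:

```python
def quantize_absmax(values):
    # Map the largest magnitude to 127 so every value fits in a signed 8-bit integer
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    # Recover approximate floating-point values from the integers
    return [q * scale for q in quantized]


weights = [0.12, -0.5, 0.33, 1.27, -0.9]  # made-up example weights
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print(f"largest round-trip error: {max_error:.6f}")

# Approximate weight memory for a 7-billion-parameter model
for name, bytes_per_value in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{name}: {7e9 * bytes_per_value / 1e9:.0f} GB")
```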

Hugging Face has made LoRA and quantization accessible across a broad range of transformer models through the PEFT library and its integration with the bitsandbytes library. The create_peft_config() function in the prepared script run_clm.py illustrates their usage in preparing your model for training:

def create_peft_config(model):
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
        prepare_model_for_int8_training,
    )

    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8, # Lora attention dimension.
        lora_alpha=32, # the alpha parameter for Lora scaling.
        lora_dropout=0.05, # the dropout probability for Lora layers.
        target_modules=["query_key_value"],
    )

    # prepare int-8 model for training
    model = prepare_model_for_int8_training(model)
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    return model

With LoRA, the output from print_trainable_parameters() indicates we were able to reduce the number of trainable model parameters from 7 billion to 3.9 million. This means that only about 0.056% of the original model parameters need to be updated. This significant reduction in compute and memory requirements allows us to fit and train our model on the GPU without issues.
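
The 3.9 million figure can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes a hidden size of 4,096 and 30 transformer blocks for BLOOMZ-7B, and a fused query_key_value projection that maps the hidden size to three times the hidden size. These architecture values are assumptions not stated in this post, so verify them against the model configuration:

```python
hidden_size = 4096  # assumed hidden size of BLOOMZ-7B
num_layers = 30     # assumed number of transformer blocks
r = 8               # LoRA rank from the LoraConfig above

d_in = hidden_size
d_out = 3 * hidden_size  # fused query, key, and value projection

# Each adapted layer adds A (r x d_in) and B (d_out x r)
params_per_layer = r * (d_in + d_out)
trainable = params_per_layer * num_layers

total = 7_000_000_000  # approximate size of the base model
fraction_pct = 100 * trainable / total

print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of all parameters: {fraction_pct:.3f}%")
```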

To create a SageMaker training job, you will need a Hugging Face estimator. The estimator handles end-to-end SageMaker training and deployment tasks. SageMaker takes care of starting and managing all the required Amazon Elastic Compute Cloud (Amazon EC2) instances for you. Additionally, it provides the correct Hugging Face training container, uploads the provided scripts, and downloads the data from our S3 bucket into the container at the path /opt/ml/input/data. Then, it starts the training job. See the following code:

import time
# define Training Job Name 
job_name = f'huggingface-peft-{time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())}'

from sagemaker.huggingface import HuggingFace

# hyperparameters, which are passed into the training job
hyperparameters ={
  'model_id': model_id,                                # pre-trained model
  'dataset_path': '/opt/ml/input/data/training', # path where sagemaker will save training dataset
  'epochs': 3,                                         # number of training epochs
  'per_device_train_batch_size': 1,                    # batch size for training
  'lr': 2e-4,                                          # learning rate used during training
}

# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point          = 'run_clm.py',      # train script
    source_dir           = 'scripts',         # directory which includes all the files needed for training
    instance_type        = 'ml.g5.2xlarge', # instances type used for the training job
    instance_count       = 1,                 # the number of instances used for training
    base_job_name        = job_name,          # the name of the training job
    role                 = role,              # IAM role used in training job to access AWS resources, e.g. S3
    volume_size          = 300,               # the size of the EBS volume in GB
    transformers_version = '4.26',            # the transformers version used in the training job
    pytorch_version      = '1.13',            # the pytorch_version version used in the training job
    py_version           = 'py39',            # the python version used in the training job
    hyperparameters      =  hyperparameters
)

You can now start your training job using the .fit() method and passing the S3 path of the training dataset:

# define a data input dictionary with our uploaded s3 uris
data = {'training': training_input_path}

# starting the train job with our uploaded datasets as inputs
huggingface_estimator.fit(data, wait=True)

Using LoRA and quantization makes fine-tuning BLOOMZ-7B to our task affordable and efficient with SageMaker. When using SageMaker training jobs, you only pay for GPUs for the duration of model training. In our example, the SageMaker training job took 20,632 seconds, which is about 5.7 hours. The ml.g5.2xlarge instance we used costs $1.515 per hour for on-demand usage. As a result, the total cost for training our fine-tuned BLOOMZ-7B model was only $8.63. Comparatively, full fine-tuning of the model’s 7 billion weights would cost an estimated $600, or 6,900% more per training run, assuming linear GPU scaling on the original computing configuration outlined in the Hugging Face model card. In practice, this would further vary depending upon your training strategy, instance selection, and instance pricing.
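
The cost comparison is simple arithmetic, sketched below with the figures quoted above. Using the unrounded duration gives a slightly higher cost than rounding to 5.7 hours first, and the $600 estimate for full fine-tuning is taken as given rather than derived:

```python
training_seconds = 20_632   # duration of the SageMaker training job
hourly_rate = 1.515         # on-demand ml.g5.2xlarge price in USD per hour
full_finetune_cost = 600.0  # estimated cost of full fine-tuning in USD

hours = training_seconds / 3600
lora_cost = hours * hourly_rate
ratio = full_finetune_cost / lora_cost

print(f"training time: {hours:.2f} hours")
print(f"LoRA fine-tuning cost: ${lora_cost:.2f}")
print(f"full fine-tuning is roughly {ratio:.0f} times more expensive")
```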

We could also further reduce our training costs by using SageMaker managed Spot Instances. However, there is a possibility this would result in the total training time increasing due to Spot Instance interruptions. See Amazon SageMaker Pricing for instance pricing details.

Deploy the model to a SageMaker endpoint for inference

With LoRA, you previously adapted a smaller set of weights to your new task. You need a way to combine these task-specific weights with the pre-trained weights of the original model. In the run_clm.py script, the PEFT library merge_and_unload() method takes care of merging the base BLOOMZ-7B model with the updated adapter weights fine-tuned to your task, which makes them easier to deploy without introducing any additional inference latency compared to the original model.

In this section, we go through the steps to create a SageMaker model from the fine-tuned model artifact and deploy it to a SageMaker endpoint for inference. First, you can create a Hugging Face model using your new fine-tuned model artifact for deployment to a SageMaker endpoint. Because you previously trained the model with a SageMaker Hugging Face estimator, you can deploy the model immediately. You could instead upload the trained model to an S3 bucket and use it to create a model package later. See the following code:

from sagemaker.huggingface import HuggingFaceModel

# 1. create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=huggingface_estimator.model_data,
   #model_data="s3://hf-sagemaker-inference/model.tar.gz",  # Change to your model path
   role=role, 
   transformers_version="4.26", 
   pytorch_version="1.13", 
   py_version="py39",
   model_server_workers=1
)

You can deploy the model using the deploy() method of the Hugging Face model object, passing in the desired number and type of instances. In this example, we use an ml.g5.4xlarge instance, which belongs to the same G5 family and is equipped with the same single NVIDIA A10G GPU as the instance the model was fine-tuned on in the previous step:

# 2. deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type= "ml.g5.4xlarge"
)

It may take 5–10 minutes for the SageMaker endpoint to bring your instance online and download your model in order to be ready to accept inference requests.

When the endpoint is running, you can test it by sending a sample dialog from the dataset test split. First load the test split using the Hugging Face Datasets library. Next, select a random integer for index slicing a single test sample from the dataset array. Using string formatting, combine the test sample with a prompt template into a structured input to guide our model’s response. This structured input can then be combined with additional model input parameters into a formatted sample JSON payload. Finally, invoke the SageMaker endpoint with the formatted sample and print the model’s output summarizing the sample dialog. See the following code:

from random import randint
from datasets import load_dataset

# 1. Load dataset from the hub
test_dataset = load_dataset("samsum", split="test")

# 2. select a random test sample (randint is inclusive on both ends)
sample = test_dataset[randint(0, len(test_dataset) - 1)]

# 3. format the sample
prompt_template = f"Summarize the chat dialogue:\n{{dialogue}}\n---\nSummary:\n"

formatted_sample = {
  "inputs": prompt_template.format(dialogue=sample["dialogue"]),
  "parameters": {
    "do_sample": True, # sample output predicted probabilities
    "top_p": 0.9, # nucleus (top-p) sampling
    "temperature": 0.1, # increasing the likelihood of high probability words and decreasing the likelihood of low probability words
    "max_new_tokens": 100, # maximum number of tokens to generate
  }
}

# 4. Invoke the SageMaker endpoint with the formatted sample
res = predictor.predict(formatted_sample)


# 5. Print the model output
print(res[0]["generated_text"].split("Summary:")[-1])
# Sample model output: Kirsten and Alex are going bowling this Friday at 7 pm. They will meet up and then go together.

Now let’s compare the model summarized dialog output to the test sample summary:

print(sample["summary"])
# Sample reference summary: Kirsten reminds Alex that the youth group meets this Friday at 7 pm to go bowling.
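
The request and response handling can be tried without a live endpoint. The following sketch assumes, as the split on "Summary:" in the code above suggests, that the endpoint returns the prompt followed by the generated text. The helpers build_payload and extract_summary and the fake response are hypothetical stand-ins for illustration:

```python
prompt_template = "Summarize the chat dialogue:\n{dialogue}\n---\nSummary:\n"


def build_payload(dialogue, max_new_tokens=100):
    # Combine the prompt with the generation parameters into one request body
    return {
        "inputs": prompt_template.format(dialogue=dialogue),
        "parameters": {
            "do_sample": True,
            "top_p": 0.9,
            "temperature": 0.1,
            "max_new_tokens": max_new_tokens,
        },
    }


def extract_summary(response):
    # Keep only the text after the last "Summary:" marker
    return response[0]["generated_text"].split("Summary:")[-1].strip()


payload = build_payload("Amanda: I baked cookies. Do you want some?\r\nJerry: Sure!")

# Stand-in for predictor.predict(payload): echoes the prompt, then a made-up summary
fake_response = [{"generated_text": payload["inputs"] + "Amanda baked cookies for Jerry."}]

summary = extract_summary(fake_response)
print(summary)
```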

Clean up

Now that you’ve tested your model, make sure that you clean up the associated SageMaker resources to prevent continued charges:

predictor.delete_model()
predictor.delete_endpoint()

Summary

In this post, you used the Hugging Face Transformers, PEFT, and bitsandbytes libraries with SageMaker to fine-tune a BloomZ large language model on a single GPU for $8 and then deployed the model to a SageMaker endpoint for inference on a test sample. SageMaker offers multiple ways to use Hugging Face models; for more examples, check out the AWS Samples GitHub.

To continue using SageMaker to fine-tune foundation models, try out some of the techniques in the post Architect personalized generative AI SaaS applications on Amazon SageMaker. We also encourage you to learn more about Amazon Generative AI capabilities by exploring JumpStart, Amazon Titan models, and Amazon Bedrock.


About the Authors

Philipp Schmid is a Technical Lead at Hugging Face with the mission to democratize good machine learning through open source and open science. Philipp is passionate about productionizing cutting-edge and generative AI machine learning models. He loves to share his knowledge on AI and NLP at various meetups such as Data Science on AWS, and on his technical blog.

Robert Fisher is a Sr. Solutions Architect for Healthcare and Life Sciences customers. He works closely with customers to understand how AWS can help them solve problems, especially in the AI/ML space. Robert has many years of experience in software engineering across a range of industry verticals including medical devices, fintech, and consumer-facing applications.

Doug Kelly is an AWS Sr. Solutions Architect who serves as a trusted technical advisor to top machine learning startups in verticals ranging from machine learning platforms and autonomous vehicles to precision agriculture. He is a member of the AWS ML technical field community, where he specializes in supporting customers with MLOps and ML inference workloads.


Announcing the launch of new Hugging Face LLM Inference containers on Amazon SageMaker

This post is co-written with Philipp Schmid and Jeff Boudier from Hugging Face.

Today, as part of Amazon Web Services’ partnership with Hugging Face, we are excited to announce the release of a new Hugging Face Deep Learning Container (DLC) for inference with Large Language Models (LLMs). This new Hugging Face LLM DLC is powered by Text Generation Inference (TGI), an open source, purpose-built solution for deploying and serving Large Language Models. TGI enables high-performance text generation using Tensor Parallelism and dynamic batching for the most popular open-source LLMs, including StarCoder, BLOOM, GPT-NeoX, StableLM, Llama, and T5.

Large Language Models are growing in popularity but can be difficult to deploy

LLMs have emerged as the leading edge of artificial intelligence, captivating developers and enthusiasts alike with their ability to comprehend and generate human-like text across diverse domains. These powerful models, such as those based on the GPT and T5 architectures, have experienced an unprecedented surge in popularity for a broad set of applications, including language understanding, conversational experiences, and automated writing assistance. As a result, companies across industries are seizing the opportunity to unlock their potential and offer new LLM-powered experiences in their applications.

Hosting LLMs at scale presents a unique set of complex engineering challenges. To provide an ideal user experience, an LLM hosting service should provide adequate response times while scaling to a large number of concurrent users. Given the high resource requirements of large models, general-purpose inference frameworks may not provide the optimizations required to maximize the utilization of available resources and provide the best possible performance.

Some of these optimizations include:

  • Tensor parallelism to distribute the computation across multiple accelerators
  • Model quantization to reduce the memory footprint of the model
  • Dynamic batching of inference requests to improve throughput, and many others.

The Hugging Face LLM DLC provides these optimizations out of the box and makes it easier to host LLM models at scale.

Hugging Face’s Text Generation Inference simplifies LLM deployment

TGI is an open source, purpose-built solution for deploying Large Language Models (LLMs). It incorporates optimizations including tensor parallelism for faster multi-GPU inference, dynamic batching to boost overall throughput, and optimized transformers code using flash-attention for popular model architectures including BLOOM, T5, GPT-NeoX, StarCoder, and LLaMa.

With the new Hugging Face LLM Inference DLCs on Amazon SageMaker, AWS customers can benefit from the same technologies that power highly concurrent, low latency LLM experiences like HuggingChat, OpenAssistant, and Inference API for LLM models on the Hugging Face Hub, while enjoying SageMaker’s managed service capabilities, such as autoscaling, health checks, and model monitoring.

Get started with TGI on SageMaker Hosting

Let’s walk through a code example that deploys a GPT NeoX 20B parameter model on a SageMaker Endpoint. You can find our complete example notebook here.

First, make sure that the latest version of SageMaker SDK is installed:

%pip install "sagemaker>=2.161.0"

Then, we import the SageMaker Python SDK and instantiate a sagemaker_session to find the current region and execution role.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
import time

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
role = sagemaker.get_execution_role()

Next, we retrieve the LLM image URI. We use the helper function get_huggingface_llm_image_uri() to generate the appropriate image URI for Hugging Face Large Language Model (LLM) inference. The function takes a required parameter, backend, and several optional parameters. The backend parameter specifies the type of backend to use for the model; its value can be “lmi” or “huggingface”. “lmi” stands for the SageMaker Large Model Inference backend, and “huggingface” refers to the Hugging Face TGI backend, which is used in this tutorial.

image_uri = get_huggingface_llm_image_uri(
  backend="huggingface", # or lmi
  region=region
)

Now that we have the image URI, the next step is to configure the model object. We specify a unique name, the image_uri for the managed TGI container, and the execution role for the endpoint. Additionally, we specify a number of environment variables, including HF_MODEL_ID, which identifies the model from the Hugging Face Hub that will be deployed, and HF_TASK, which configures the inference task to be performed by the model.

You should also define SM_NUM_GPUS, which specifies the tensor parallelism degree of the model. Tensor parallelism can be used to split the model across multiple GPUs, which is necessary when working with LLMs that are too big for a single GPU. To learn more about tensor parallelism with inference, see our previous blog post. Here, you should set SM_NUM_GPUS to the number of available GPUs on your selected instance type. For example, in this tutorial, we set SM_NUM_GPUS to 4 because our selected instance type ml.g4dn.12xlarge has 4 available GPUs.

Note that you can optionally reduce the memory and computational footprint of the model by setting the HF_MODEL_QUANTIZE environment variable to “true”, but this lower weight precision could affect the quality of the output for some models.

model_name = "gpt-neox-20b-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

hub = {
    'HF_MODEL_ID':'EleutherAI/gpt-neox-20b',
    'HF_TASK':'text-generation',
    'SM_NUM_GPUS':'4',
    'HF_MODEL_QUANTIZE':'true'
}

model = HuggingFaceModel(
    name=model_name,
    env=hub,
    role=role,
    image_uri=image_uri
)
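As a rough sanity check on the SM_NUM_GPUS choice, the following sketch estimates the memory needed for the model weights alone and the smallest number of GPUs that could hold them. The helper names are hypothetical, the parameter count is the nominal 20 billion from the model name, and 16 GiB is the memory of each NVIDIA T4 GPU on ml.g4dn instances. The estimate deliberately ignores activations, attention caches, and framework overhead, so actual requirements are higher.

```python
import math


def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GiB (ignores activations and caches)."""
    return num_params * bytes_per_param / 2**30


def min_gpus_for_weights(num_params, bytes_per_param, gpu_memory_gib):
    """Smallest GPU count whose combined memory can hold the weights."""
    return math.ceil(weight_memory_gib(num_params, bytes_per_param) / gpu_memory_gib)


NUM_PARAMS = 20e9      # nominal size of GPT-NeoX 20B (assumption: rounded)
T4_MEMORY_GIB = 16     # per-GPU memory on ml.g4dn instances

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 16-bit weights
int8_gib = weight_memory_gib(NUM_PARAMS, 1)  # 8-bit quantized weights

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"8-bit weights:  {int8_gib:.1f} GiB")
print("GPUs needed for 16-bit weights:", min_gpus_for_weights(NUM_PARAMS, 2, T4_MEMORY_GIB))
```

Under these assumptions the weights do not fit on a single 16 GiB GPU even when quantized to 8 bits, which is why the model is sharded; using all 4 GPUs leaves headroom for everything this estimate leaves out.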

Next, we invoke the deploy method to deploy the model.

predictor = model.deploy(
  initial_instance_count=1,
  instance_type="ml.g4dn.12xlarge",
  endpoint_name=model_name
)

Once the model is deployed, we can invoke it to generate text. We pass an input prompt and run the predict method to generate a text response from the LLM running in the TGI container.

input_data = {
  "inputs": "The diamondback terrapin was the first reptile to",
  "parameters": {
    "do_sample": True,
    "max_new_tokens": 100,
    "temperature": 0.7,
    "watermark": True
  }
}

predictor.predict(input_data)

We receive the following auto-generated text response:

[{'generated_text': 'The diamondback terrapin was the first reptile to make the list, followed by the American alligator, the American crocodile, and the American box turtle. The polecat, a ferret-like animal, and the skunk rounded out the list, both having gained their slots because they have proven to be particularly dangerous to humans.\n\nCalifornians also seemed to appreciate the new list, judging by the comments left after the election.\n\n“This is fantastic,” one commenter declared.\n\n“California is a very'}]
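The response is a list with one dictionary whose generated_text value, in the sample above, begins with the original prompt. The following minimal sketch separates the newly generated continuation from the prompt. The helper name extract_completion is hypothetical, and the sample response is shortened from the output shown above.

```python
def extract_completion(response, prompt):
    """Return only the text generated after the prompt, if the prompt is echoed back."""
    text = response[0]["generated_text"]
    if text.startswith(prompt):
        return text[len(prompt):].lstrip()
    return text


prompt = "The diamondback terrapin was the first reptile to"
sample_response = [
    {"generated_text": "The diamondback terrapin was the first reptile to make the list"}
]

completion = extract_completion(sample_response, prompt)
print(completion)
```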

To mitigate the risk of potential exploitation of Generative AI capabilities by automated bots, the response is watermarked. Such watermarked responses can be easily detected by algorithms, promoting the responsible use of Generative AI.



Once we are done experimenting, we delete the endpoint and the model resources.

predictor.delete_model()
predictor.delete_endpoint()

Conclusion and next steps

Deploying Large Language Models using Hugging Face’s Text Generation Inference and SageMaker Hosting is a straightforward solution for hosting open source models like GPT-NeoX, Flan-T5-XXL, StarCoder or LLaMa. The state of the art LLMs are deployed within the secure managed SageMaker environment, and AWS customers can benefit from Large Language Models while keeping full control over their implementation, and without sending their data over to a third-party API.

In the tutorial, we demonstrated the deployment of GPT-NeoX using the new Hugging Face LLM Inference DLC, leveraging the power of 4 GPUs on a SageMaker ml.g4dn.12xlarge instance. With this approach, users can effortlessly harness the capabilities of state-of-the-art language models, enabling a wide range of applications and advancements in natural language processing.

As a next step, you can learn more about Hugging Face LLM Inference on SageMaker with the following resources:


About the authors


Philipp Schmid is a Technical Lead at Hugging Face with the mission to democratize good machine learning through open source and open science. Philipp is passionate about productionizing cutting-edge and generative AI machine learning models.

Jeff Boudier builds products at Hugging Face, the #1 open platform for AI builders. Previously Jeff was a co-founder of Stupeflix, acquired by GoPro, where he served as director of Product Management, Product Marketing,  Business Development and Corporate Development.

Robert Van Dusen is a Senior Product Manager with Amazon SageMaker. He leads deep learning model optimization for applications such as large model inference.

Qing Lan is a Software Development Engineer at AWS. He has worked on several challenging products at Amazon, including high-performance ML inference solutions and a high-performance logging system. Qing’s team successfully launched the first billion-parameter model in Amazon Advertising under very low latency requirements. Qing has in-depth knowledge of infrastructure optimization and deep learning acceleration.

Simon Zamarin is an AI/ML Solutions Architect whose main focus is helping customers extract value from their data assets. In his spare time, Simon enjoys spending time with family, reading sci-fi, and working on various DIY house projects.

Xin Yang is a Software Development Engineer at AWS. She has been working on deploying and optimizing deep learning inference systems. Her work spans both the realms of real-time inference and scalable offline inference solutions. In her spare time, Xin enjoys reading and hiking.

Gagan Singh is a Senior Technical Account Manager at AWS helping digital native startups maximize business success. He helps customers with adoption and optimization of real-time, multi-model ML inferencing endpoints using Amazon SageMaker. In his spare time, Gagan enjoys trekking in the Himalayas and listening to music.


Implement a multi-object tracking solution on a custom dataset with Amazon SageMaker

The demand for multi-object tracking (MOT) in video analysis has increased significantly in many industries, such as live sports, manufacturing, and traffic monitoring. For example, in live sports, MOT can track soccer players in real time to analyze physical performance such as real-time speed and moving distance.

Since its introduction in 2021, ByteTrack has remained one of the best-performing methods on various benchmark datasets among the latest model developments in MOT. In ByteTrack, the authors proposed a simple, effective, and generic data association method (referred to as BYTE) for matching detection boxes to tracklets. Rather than keeping only the high-score detection boxes, it also keeps the low-score detection boxes, which can help recover unmatched tracklets when occlusion, motion blur, or size changes occur. The BYTE association strategy can also be used in other Re-ID based trackers, such as FairMOT. The experiments showed improvements compared to the vanilla tracker algorithms. For example, when applying BYTE in data association, FairMOT achieved an improvement of 1.3% on MOTA, one of the main metrics in the MOT task (FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking).
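The two-stage idea behind BYTE can be sketched in a few lines. This is a simplified illustration, not the ByteTrack implementation: it uses greedy IoU matching in place of the optimal assignment step, omits motion prediction for the tracks, and all function names and threshold values are hypothetical choices for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def _greedy_match(tracks, detections, candidates, iou_thresh):
    """Pair each track with at most one candidate detection, best IoU first."""
    scored = sorted(
        ((iou(box, detections[j][0]), tid, j)
         for tid, box in tracks.items() for j in candidates),
        key=lambda item: item[0],
        reverse=True,
    )
    matches, used = {}, set()
    for score, tid, j in scored:
        if score < iou_thresh:
            break  # all remaining pairs overlap even less
        if tid not in matches and j not in used:
            matches[tid] = j
            used.add(j)
    return matches


def byte_associate(tracks, detections, high_thresh=0.5, low_thresh=0.1, iou_thresh=0.3):
    """tracks: {track_id: box}; detections: list of (box, score).

    Returns ({track_id: detection_index}, [unmatched_track_ids]).
    """
    high = [j for j, (_, s) in enumerate(detections) if s >= high_thresh]
    low = [j for j, (_, s) in enumerate(detections) if low_thresh <= s < high_thresh]
    # Stage 1: match tracks against high-score detections.
    matches = _greedy_match(tracks, detections, high, iou_thresh)
    # Stage 2: give still-unmatched tracks a second chance with low-score detections.
    leftover = {tid: box for tid, box in tracks.items() if tid not in matches}
    matches.update(_greedy_match(leftover, detections, low, iou_thresh))
    unmatched = [tid for tid in tracks if tid not in matches]
    return matches, unmatched


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [((0, 0, 10, 10), 0.9), ((21, 21, 31, 31), 0.2)]  # second box is low score
matches, unmatched = byte_associate(tracks, detections)
print(matches, unmatched)
```

In this toy input, the second detection has a low score (for example, a partly occluded object). A tracker that discarded it would lose track 2, whereas the second stage recovers the match.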

In the post Train and deploy a FairMOT model with Amazon SageMaker, we demonstrated how to train and deploy a FairMOT model with Amazon SageMaker on the MOT challenge datasets. When applying a MOT solution in real-world cases, you need to train or fine-tune a MOT model on a custom dataset. With Amazon SageMaker Ground Truth, you can effectively create labels on your own video dataset.

Following on the previous post, we have added the following contributions and modifications:

  • Generate labels for a custom video dataset using Ground Truth
  • Preprocess the Ground Truth generated label to be compatible with ByteTrack and other MOT solutions
  • Train the ByteTrack algorithm with a SageMaker training job (with the option to extend a pre-built container)
  • Deploy the trained model with various deployment options, including asynchronous inference

We also provide the code sample on GitHub, which uses SageMaker for labeling, building, training, and inference.

SageMaker is a fully managed service that provides every developer and data scientist with the ability to prepare, build, train, and deploy machine learning (ML) models quickly. SageMaker provides several built-in algorithms and container images that you can use to accelerate training and deployment of ML models. Additionally, custom algorithms such as ByteTrack can also be supported via custom-built Docker container images. For more information about deciding on the right level of engagement with containers, refer to Using Docker containers with SageMaker.

SageMaker provides plenty of options for model deployment, such as real-time inference, serverless inference, and asynchronous inference. In this post, we show how to deploy a tracking model with different deployment options, so that you can choose the deployment method that suits your own use case.

Overview of solution

Our solution consists of the following high-level steps:

  1. Label the dataset for tracking, with a bounding box on each object (for example, pedestrian, car, and so on). Set up the resources for ML code development and execution.
  2. Train a ByteTrack model and tune hyperparameters on a custom dataset.
  3. Deploy the trained ByteTrack model with different deployment options depending on your use case: real-time processing, asynchronous, or batch prediction.

The following diagram illustrates the architecture in each step.

Prerequisites

Before getting started, complete the following prerequisites:

  1. Create an AWS account or use an existing AWS account.
  2. We recommend running the source code in the us-east-1 Region.
  3. Make sure that you have a minimum of one GPU instance for the training job (for example, ml.p3.2xlarge for single-GPU training, or ml.p3.16xlarge for distributed training). Other types of GPU instances are also supported, with various performance differences.
  4. Make sure that you have a minimum of one GPU instance (for example, ml.p3.2xlarge) for the inference endpoint.
  5. Make sure that you have a minimum of one GPU instance (for example, ml.p3.2xlarge) for running batch prediction with processing jobs.

If this is your first time running SageMaker services on the aforementioned instance types, you may have to request a quota increase for the required instances.

Set up your resources

After you complete all the prerequisites, you’re ready to deploy the solution.

  1. Create a SageMaker notebook instance. For this task, we recommend using the ml.t3.medium instance type. While running the code, we use docker build to extend the SageMaker training image with the ByteTrack code (the docker build command runs locally within the notebook instance environment). Therefore, we recommend increasing the volume size to 100 GB (the default volume size is 5 GB) from the advanced configuration options. For your AWS Identity and Access Management (IAM) role, choose an existing role or create a new role, and attach the AmazonS3FullAccess, AmazonSNSFullAccess, AmazonSageMakerFullAccess, and AmazonElasticContainerRegistryPublicFullAccess policies to the role.
  2. Clone the GitHub repo to the /home/ec2-user/SageMaker folder on the notebook instance you created.
  3. Create a new Amazon Simple Storage Service (Amazon S3) bucket or use an existing bucket.

Label the dataset

In the data-preparation.ipynb notebook, we download an MOT16 test video file and split the video file into small video files with 200 frames. Then we upload those video files to the S3 bucket as the data source for labeling.
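The splitting step amounts to cutting the frame sequence into fixed-size ranges. The following minimal sketch computes those ranges only; the function name is hypothetical, the frame count in the example is illustrative rather than taken from the MOT16 video, and the actual reading and writing of video files is left out.

```python
def chunk_frame_ranges(total_frames, frames_per_chunk=200):
    """Return (start, end) frame index pairs, end exclusive, one pair per output clip."""
    if frames_per_chunk <= 0:
        raise ValueError("frames_per_chunk must be positive")
    return [
        (start, min(start + frames_per_chunk, total_frames))
        for start in range(0, total_frames, frames_per_chunk)
    ]


ranges = chunk_frame_ranges(450)  # illustrative frame count
print(ranges)
```

The last clip is shorter than 200 frames whenever the total is not a multiple of the chunk size.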

To label the dataset for the MOT task, refer to Getting started. When the labeling job is complete, we can access the following annotation directory at the job output location in the S3 bucket.

The manifests directory should contain an output folder if we finished labeling all the files. We can see the file output.manifest in the output folder. This manifest file contains information about the video and video tracking labels that you can use later to train and test a model.

Train a ByteTrack model and tune hyperparameters on the custom dataset

To train your ByteTrack model, we use the bytetrack-training.ipynb notebook. The notebook consists of the following steps:

  1. Initialize the SageMaker setting.
  2. Perform data preprocessing.
  3. Build and push the container image.
  4. Define a training job.
  5. Launch the training job.
  6. Tune hyperparameters.

In the data preprocessing step in particular, we convert the labeled dataset from the Ground Truth output format to the MOT17 format, and then convert the MOT17 format dataset to the MSCOCO format (as shown in the following figure) so that we can train a YOLOX model on the custom dataset. Because we keep both the MOT format dataset and the MSCOCO format dataset, you can also train other MOT algorithms that don't separate detection and tracking on the MOT format dataset. You can also swap the detector for another algorithm, such as YOLOv7, to reuse your existing object detection algorithm.
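The two conversions can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the input is assumed to follow the Ground Truth video object tracking label layout (a tracking-annotations list whose entries carry frame-no and per-object left, top, width, height, and object-id fields), the function names are hypothetical, and the trailing MOT columns and the COCO category ID are placeholder values.

```python
def ground_truth_to_mot(seq_label):
    """Convert one Ground Truth tracking label document to MOT-style ground truth rows.

    Row layout: frame, track id, left, top, width, height, conf, class, visibility.
    """
    track_ids = {}  # Ground Truth object-id -> integer track id
    rows = []
    for frame in seq_label["tracking-annotations"]:
        frame_no = int(frame["frame-no"]) + 1  # MOT frame indices start at 1
        for ann in frame["annotations"]:
            track_id = track_ids.setdefault(ann["object-id"], len(track_ids) + 1)
            rows.append(
                f"{frame_no},{track_id},{ann['left']},{ann['top']},"
                f"{ann['width']},{ann['height']},1,1,1"  # placeholder conf/class/visibility
            )
    return rows


def mot_row_to_coco(row, annotation_id, category_id=1):
    """Convert one MOT row to a COCO annotation; COCO boxes are [x, y, width, height]."""
    fields = row.split(",")
    frame_no = int(fields[0])
    left, top, width, height = (float(v) for v in fields[2:6])
    return {
        "id": annotation_id,
        "image_id": frame_no,  # assumption: one image per frame, keyed by frame number
        "category_id": category_id,
        "bbox": [left, top, width, height],
        "area": width * height,
        "iscrowd": 0,
    }


sample = {
    "tracking-annotations": [
        {"frame-no": "0", "annotations": [
            {"object-id": "abc", "left": 10, "top": 20, "width": 30, "height": 40}]},
        {"frame-no": "1", "annotations": [
            {"object-id": "abc", "left": 12, "top": 20, "width": 30, "height": 40},
            {"object-id": "def", "left": 100, "top": 50, "width": 20, "height": 60}]},
    ]
}

mot_rows = ground_truth_to_mot(sample)
coco_first = mot_row_to_coco(mot_rows[0], annotation_id=1)
print(mot_rows)
print(coco_first)
```

Keeping the intermediate MOT rows is what makes it possible to feed the same labels to trackers that expect the MOT layout, while the COCO annotations serve the detector.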

Deploy the trained ByteTrack model

After we train the YOLOX model, we deploy the trained model for inference. SageMaker provides several options for model deployment, such as real-time inference, asynchronous inference, serverless inference, and batch inference. In our post, we use the sample code for real-time inference, asynchronous inference, and batch inference. You can choose the suitable code from these options based on your own business requirements.

Because SageMaker batch transform requires the data to be partitioned and stored on Amazon S3 as input and the invocations are sent to the inference endpoints concurrently, it doesn’t meet the requirements in object tracking tasks where the targets need to be sent in a sequential manner. Therefore, we don’t use the SageMaker batch transform jobs to run the batch inference. In this example, we use SageMaker processing jobs to do batch inference.

The following table summarizes the configuration for our inference jobs.

Inference Type | Payload | Processing Time | Auto Scaling
Real-time | Up to 6 MB | Up to 1 minute | Minimum instance count is 1 or higher
Asynchronous | Up to 1 GB | Up to 15 minutes | Minimum instance count can be zero
Batch (with processing job) | No limit | No limit | Not supported
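The payload limits in this comparison translate into a simple selection rule. The sketch below is only an illustration: the function name is hypothetical, the limits are expressed in decimal bytes as an approximation of the 6 MB and 1 GB figures, and processing time and scaling needs, which also matter, are not considered.

```python
REALTIME_MAX_BYTES = 6_000_000       # approximately 6 MB
ASYNC_MAX_BYTES = 1_000_000_000      # approximately 1 GB


def choose_inference_option(payload_bytes):
    """Pick a deployment option from the payload size alone."""
    if payload_bytes <= REALTIME_MAX_BYTES:
        return "real-time"
    if payload_bytes <= ASYNC_MAX_BYTES:
        return "asynchronous"
    return "batch (processing job)"


for size in (1_000_000, 50_000_000, 2_000_000_000):
    print(size, "->", choose_inference_option(size))
```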

Deploy a real-time inference endpoint

To deploy a real-time inference endpoint, we can run the bytetrack-inference-yolox.ipynb notebook. We separate ByteTrack inference into object detection and tracking. In the inference endpoint, we only run the YOLOX model for object detection. In the notebook, we create a tracking object, receive the result of object detection from the inference endpoint, and update trackers.

We use the PyTorchModel class from the SageMaker Python SDK to create and deploy a ByteTrack model as follows:

from sagemaker.pytorch.model import PyTorchModel
 
pytorch_model = PyTorchModel(
    model_data=s3_model_uri,
    role=role,
    source_dir="sagemaker-serving/code",
    entry_point="inference.py",
    framework_version="1.7.1",
    py_version="py3",
)
 
endpoint_name = "<endpoint name>"  # replace with your own endpoint name
pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.2xlarge",
    endpoint_name=endpoint_name
)

After we deploy the model to an endpoint successfully, we can invoke the inference endpoint with the following code snippet:

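import json
import boto3

# Runtime client used to invoke the deployed endpoint
sm_runtime = boto3.client("sagemaker-runtime")

# Read one frame and send it to the endpoint as raw image bytes
with open(f"datasets/frame_{frame_id}.png", "rb") as f:
    payload = f.read()

response = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="application/x-image", Body=payload
)
outputs = json.loads(response["Body"].read().decode())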

We run the tracking task on the client side after accepting the detection result from the endpoint (see the following code). By drawing the tracking results in each frame and saving as a tracking video, you can confirm the tracking result on the tracking video.

aspect_ratio_thresh = 1.6
min_box_area = 10
tracker = BYTETracker(
        frame_rate=30,
        track_thresh=0.5,
        track_buffer=30,
        mot20=False,
        match_thresh=0.8
    )

online_targets = tracker.update(torch.as_tensor(outputs[0]), [height, width], (800, 1440))
online_tlwhs = []
online_ids = []
online_scores = []
for t in online_targets:
    tlwh = t.tlwh
    tid = t.track_id
    vertical = tlwh[2] / tlwh[3] > aspect_ratio_thresh
    if tlwh[2] * tlwh[3] > min_box_area and not vertical:
        online_tlwhs.append(tlwh)
        online_ids.append(tid)
        online_scores.append(t.score)
        results.append(
            f"{frame_id},{tid},{tlwh[0]:.2f},{tlwh[1]:.2f},{tlwh[2]:.2f},{tlwh[3]:.2f},{t.score:.2f},-1,-1,-1\n"
        )
online_im = plot_tracking(
    frame, online_tlwhs, online_ids, frame_id=frame_id + 1, fps=1. / timer.average_time
)

Deploy an asynchronous inference endpoint

SageMaker asynchronous inference is the ideal option for requests with large payload sizes (up to 1 GB), long processing times (up to 1 hour), and near-real-time latency requirements. For MOT tasks, it’s common for a video file to exceed 6 MB, which is the payload limit of a real-time endpoint. Therefore, we deploy an asynchronous inference endpoint. Refer to Asynchronous inference for more details on how to deploy an asynchronous endpoint. We can reuse the model created for the real-time endpoint; for this post, however, we put the tracking process into the inference script so that we get the final tracking result directly for the input video.

To use scripts related to ByteTrack on the endpoint, we need to put the tracking script and model into the same folder and compress the folder as the model.tar.gz file, and then upload it to the S3 bucket for model creation. The following diagram shows the structure of model.tar.gz.

We need to explicitly set the request size, response size, and response timeout as the environment variables, as shown in the following code. The name of the environment variable varies depending on the framework. For more details, refer to Create an Asynchronous Inference Endpoint.

pytorch_model = PyTorchModel(
    model_data=s3_model_uri,
    role=role,
    entry_point="inference.py",
    framework_version="1.7.1",
    sagemaker_session=sm_session,
    py_version="py3",
    env={
        'TS_MAX_REQUEST_SIZE': '1000000000', # default max request size is 6 MB for TorchServe; raise it to support the 1 GB input payload
        'TS_MAX_RESPONSE_SIZE': '1000000000',
        'TS_DEFAULT_RESPONSE_TIMEOUT': '900' # max timeout is 15mins (900 seconds)
    }
)

pytorch_model.create(
    instance_type="ml.p3.2xlarge",
)

When invoking the asynchronous endpoint, instead of sending the payload in the request, we send the Amazon S3 URL of the input video. When the model inference finishes processing the video, the results will be saved on the S3 output path. We can configure Amazon Simple Notification Service (Amazon SNS) topics so that when the results are ready, we can receive an SNS message as a notification.
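The following sketch shows the two pieces described here: the asynchronous section of an endpoint configuration (S3 output path plus optional SNS topics) and an invocation that passes an S3 URL instead of a payload. It is a minimal illustration under stated assumptions: the helper names, bucket, endpoint name, topic ARN, and the video/mp4 content type are placeholders, and a stand-in client replaces boto3.client("sagemaker-runtime") so that the example runs without AWS credentials.

```python
def build_async_inference_config(s3_output_path, success_topic_arn=None,
                                 error_topic_arn=None, max_concurrent_invocations=1):
    """Build the AsyncInferenceConfig section of an endpoint configuration request."""
    output_config = {"S3OutputPath": s3_output_path}
    notification = {}
    if success_topic_arn:
        notification["SuccessTopic"] = success_topic_arn
    if error_topic_arn:
        notification["ErrorTopic"] = error_topic_arn
    if notification:
        output_config["NotificationConfig"] = notification
    return {
        "OutputConfig": output_config,
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": max_concurrent_invocations},
    }


def invoke_async(sm_runtime, endpoint_name, input_s3_uri, content_type="video/mp4"):
    """Submit an S3-hosted input and return the S3 URI where the result will be written."""
    response = sm_runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_s3_uri,
        ContentType=content_type,
    )
    return response["OutputLocation"]


class DryRunClient:
    """Stand-in for the SageMaker runtime client; records the request it receives."""

    def invoke_endpoint_async(self, **kwargs):
        self.last_request = kwargs
        return {"OutputLocation": "s3://example-bucket/async-output/result.out"}


config = build_async_inference_config(
    "s3://example-bucket/async-output/",
    success_topic_arn="arn:aws:sns:us-east-1:111122223333:example-success",
)
client = DryRunClient()
output_uri = invoke_async(client, "example-endpoint", "s3://example-bucket/input/video.mp4")
print(config)
print(output_uri)
```

With a real client, the call returns immediately with the output location, and the result object appears there only after the endpoint has finished processing the video.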

Run batch inference with SageMaker processing

For video files bigger than 1 GB, we use a SageMaker processing job to do batch inference. We define a custom Docker container to run a SageMaker processing job (see the following code). We draw the tracking result on the input video. You can find the result video in the S3 bucket defined by s3_output.

from sagemaker.processing import ProcessingInput, ProcessingOutput
script_processor.run(
    code='./container-batch-inference/predict.py',
    inputs=[
        ProcessingInput(source=s3_input, destination="/opt/ml/processing/input"),
        ProcessingInput(source=s3_model_uri, destination="/opt/ml/processing/model"),
    ], 
    outputs=[
        ProcessingOutput(source='/opt/ml/processing/output', destination=s3_output),
    ]
)

Clean up

To avoid unnecessary costs, delete the resources you created as part of this solution, including the inference endpoint.

Conclusion

This post demonstrated how to implement a multi-object tracking solution on a custom dataset using one of the state-of-the-art algorithms on SageMaker. We also demonstrated three deployment options on SageMaker so that you can choose the optimal option for your own business scenario. If the use case requires low latency and needs a model to be deployed on an edge device, you can deploy the MOT solution at the edge with AWS Panorama.

For more information, refer to Multi Object Tracking using YOLOX + BYTE-TRACK and data analysis.


About the Authors

Gordon Wang is a Senior AI/ML Specialist TAM at AWS. He supports strategic customers with AI/ML best practices across many industries. He is passionate about computer vision, NLP, Generative AI and MLOps. In his spare time, he loves running and hiking.

Yanwei Cui, PhD, is a Senior Machine Learning Specialist Solutions Architect at AWS. He started machine learning research at IRISA (Research Institute of Computer Science and Random Systems), and has several years of experience building artificial intelligence powered industrial applications in computer vision, natural language processing and online user behavior prediction. At AWS, he shares the domain expertise and helps customers to unlock business potentials, and to drive actionable outcomes with machine learning at scale. Outside of work, he enjoys reading and traveling.

Melanie Li, PhD, is a Senior AI/ML Specialist TAM at AWS based in Sydney, Australia. She helps enterprise customers to build solutions leveraging the state-of-the-art AI/ML tools on AWS and provides guidance on architecting and implementing machine learning solutions with best practices. In her spare time, she loves to explore nature outdoors and spend time with family and friends.

Guang Yang is a Senior Applied Scientist at the Amazon ML Solutions Lab, where he works with customers across various verticals and applies creative problem solving to generate value for customers with state-of-the-art ML/AI solutions.
