Four approaches to manage Python packages in Amazon SageMaker Studio notebooks

This post presents and compares options and recommended practices on how to manage Python packages and virtual environments in Amazon SageMaker Studio notebooks. A public GitHub repo provides hands-on examples for each of the presented approaches.

Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity.

Studio notebooks are collaborative Jupyter notebooks that you can launch quickly because you don’t need to set up compute instances and file storage beforehand. When you open a notebook in Studio, you are prompted to set up your environment by choosing a SageMaker image, a kernel, an instance type, and, optionally, a lifecycle configuration script that runs on image startup.

For more details on Studio notebook concepts and other aspects of the architecture, refer to Dive deep into Amazon SageMaker Studio Notebooks architecture.

Studio notebooks are designed to support you in all phases of your ML development, for example, ideation, experimentation, and operationalization of an ML workflow. Studio comes with pre-built images that include the latest Amazon SageMaker Python SDK and, depending on the image type, other specific packages and resources, such as Spark, MXNet, or PyTorch framework libraries, and their required dependencies. Each image can host one or multiple kernels, which can be different virtual environments for development.

To ensure the best fit for your development process and phases, access to specific or latest ML frameworks, or to fulfil data access and governance requirements, you can customize the pre-built notebook environments or create new environments using your own images and kernels.

This post considers the following approaches for customizing Studio environments by managing packages and creating Python virtual environments in Studio notebooks:

  • Use a custom Studio KernelGateway app image
  • Use Studio notebook lifecycle configurations
  • Use the Studio Amazon Elastic File System (Amazon EFS) volume to persist Conda environments
  • Use pip install

Studio KernelGateway apps and notebooks kernels

One of the main differences between the Studio notebook architecture and SageMaker notebook instances is that Studio notebook kernels run in a Docker container, called a SageMaker image container, rather than being hosted directly on Amazon Elastic Compute Cloud (Amazon EC2) instances, as is the case with SageMaker notebook instances.

The following diagram shows the relations between KernelGateway, notebook kernels, and SageMaker images. (For more information, see Use Amazon SageMaker Studio Notebooks.)

Because of this difference, there are some specifics to how you create and manage virtual environments in Studio notebooks, such as the usage of Conda environments or the persistence of ML development environments between kernel restarts.

The following sections explain each of the four environment customization approaches in detail, provide hands-on examples, and recommend use cases for each option.

Prerequisites

To get started with the examples and try the customization approaches on your own, you need an active SageMaker domain and at least one user profile in the domain. If you don’t have a domain, refer to the instructions in Onboard to Amazon SageMaker Domain.

Studio KernelGateway custom app images

A Studio KernelGateway app image is a Docker container that identifies the kernels, language packages, and other dependencies required to run a Jupyter notebook in Studio. You use these images to create environments that you then run Jupyter notebooks on. Studio provides many built-in images for you to use.

If you need different functionality, specific frameworks, or library packages, you can bring your own custom images (BYOI) to Studio.

You can create app images and image versions, attach image versions to your domain, and make an app available for all domain users or for specific user profiles. You can manage app images via the SageMaker console, the AWS SDK for Python (Boto3), and the AWS Command Line Interface (AWS CLI). The custom image needs to be stored in an Amazon Elastic Container Registry (Amazon ECR) repository.

The main benefits of this approach are a high level of version control and reproducibility of an ML runtime environment and immediate availability of library packages because they’re installed in the image. You can implement comprehensive tests, governance, security guardrails, and CI/CD automation to produce custom app images. Having snapshots of development environments facilitates and enforces your organization’s guardrails and security practices.

The provided notebook implements an app image creation process for Conda-based environments. The notebook demonstrates how you can create multi-environment images so the users of the app can have a selection of kernels they can run their notebooks on.

Configure a custom app image

You must run this notebook on a SageMaker notebook instance so that you can use Docker locally and run Docker commands in the notebook. As an alternative to using notebook instances or shell scripts, you can use the Studio Image Build CLI to work with Docker in Studio. The Studio Image Build CLI lets you build SageMaker-compatible Docker images directly from your Studio environments by using AWS CodeBuild.

If you don’t have a SageMaker notebook instance, follow the instructions in Create an Amazon SageMaker Notebook Instance to get started.

You must also ensure that the execution role you use for a notebook instance has the required permissions for Amazon ECR and SageMaker domain operations:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:CompleteLayerUpload",
                "ecr:GetAuthorizationToken",
                "ecr:UploadLayerPart",
                "ecr:InitiateLayerUpload",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage",
                "ecr:CreateRepository",
                "ecr:ListImages"
            ],
            "Resource": "arn:aws:ecr:<REGION>:<ACCOUNT ID>:repository/<YOUR REPOSITORY NAME>"
        }
    ]
}

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:UpdateDomain"
            ],
            "Resource": "arn:aws:sagemaker:<REGION>:<ACCOUNT ID>:domain/<YOUR DOMAIN ID>"
        }
    ]
}

To create a custom image with two kernels, each with their own Conda virtual environment, the notebook implements the following steps:

  1. Define the Conda environments. The Conda environment must have a Jupyter kernel package installed, for example, ipykernel for Python kernel.
  2. Define a Dockerfile. Consider the custom SageMaker image specifications when creating your own image.
  3. Build a Docker image compatible with Studio and push the image into the ECR repository.
  4. Create a SageMaker image with the Docker image from the ECR repository and create an initial image version. Every time you update the image in Amazon ECR, a new image version must be created.
  5. Update an existing SageMaker domain to use this image. For this operation, the execution role needs the UpdateDomain permission. The image is immediately available to all user profiles of the domain. If you want to make the image available only for a specific user profile, you can use the UpdateUserProfile API call instead of UpdateDomain.
  6. Launch the custom image in Studio. Start a new notebook and choose the new image on the image selection drop-down menu.

Studio automatically recognizes the Conda environments in your image as corresponding kernels in the kernel selection drop-down menu in the Set up notebook environment widget.
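For reference, the following Boto3 sketch illustrates what steps 4 and 5 can look like when scripted. The image name, app image config name, kernel name, and ARNs are placeholders, and the kernel spec must match the kernels baked into your Docker image.

import boto3

sm = boto3.client('sagemaker')

# 1) Register the ECR image as a SageMaker image and create the first image version
sm.create_image(ImageName='custom-conda-image', RoleArn='<execution-role-arn>')
sm.create_image_version(
    ImageName='custom-conda-image',
    BaseImage='<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>',
)

# 2) Describe how Studio should launch kernels from the image
sm.create_app_image_config(
    AppImageConfigName='custom-conda-image-config',
    KernelGatewayImageConfig={
        # The kernel name must match a kernel spec installed inside the image
        'KernelSpecs': [{'Name': 'conda-env-custom-py', 'DisplayName': 'Custom Conda (Python 3)'}],
    },
)

# 3) Attach the image to the domain so it appears in the Studio image drop-down menu
sm.update_domain(
    DomainId='<domain-id>',
    DefaultUserSettings={
        'KernelGatewayAppSettings': {
            'CustomImages': [{
                'ImageName': 'custom-conda-image',
                'AppImageConfigName': 'custom-conda-image-config',
            }],
        }
    },
)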

Refer to these sample notebooks for more examples and use cases on custom app image implementation.

Clean up

To avoid charges, you must stop the active SageMaker notebook instances. For instructions, refer to Clean up.

Implement an automated image authoring process

As already mentioned, you can use the Studio Image Build CLI to implement an automated CI/CD process of app image creation and deployment with CodeBuild and sm-docker CLI. It abstracts the setup of your Docker build environments by automatically setting up the underlying services and workflow necessary for building Docker images.

Recommended use cases

The custom app image approach is a good fit for the following scenarios when using a Studio notebook environment:

  • Stable and controlled environments for production or sensitive development use
  • Environments without internet access, where you want to pre-package all needed resources and libraries into the image
  • High environment reuse ratio and low rate of changes in the environments
  • High scale of data science operations, dozens or hundreds of developers or teams who need access to standardized custom environments
  • Use of libraries that can’t be configured on the SageMaker first-party images
  • Requirements to use custom images for a different OS or different programming language
  • Centralized governance and environment development using automated CI/CD pipelines

Limitations of this approach

This approach requires a multi-step image creation process including tests, which might be overkill for smaller or very dynamic environments. Furthermore, consider the following limitations of the approach:

  • An upfront effort is needed to add new packages or create new versions of an image. As mitigation, you can customize an existing custom image with pip, even though those changes aren’t persistent.
  • Attaching a new custom image or adding a new version to the domain requires the UpdateDomain permission, which isn’t normally attached to the user profile execution role. We recommend using an automated pipeline with a dedicated execution role to perform this operation or give the permission to update a domain to a dedicated admin user or role.
  • A high manual effort for image authoring is involved. We recommend implementing an automated pipeline if you produce and update custom images frequently.
  • If you use Conda environments, you might encounter issues with them in a Docker environment. For an example, refer to Activating a Conda environment in your Dockerfile. Not all Conda commands may work in the notebook virtual environment. However, this Studio customization approach is not limited to Conda-based environments.
  • You can’t manually switch between Conda environments in the notebook; you must switch kernels in the notebook environment setup widget.

Also consider that there are default quotas of 30 custom images per domain and 5 images per user profile. These are soft limits and can be increased.

The next sections describe more lightweight approaches that may be a better fit for other use cases.

Studio notebook lifecycle configurations

Studio lifecycle configurations define a shell script that runs at each restart of the kernel gateway application and can install the required packages. The main benefit is that a data scientist can choose which script to run to customize the container with new packages. This option doesn’t require rebuilding the container and in most cases doesn’t require a custom image at all because you can customize the pre-built ones.

Set up a lifecycle configuration process

This process takes around 5 minutes to complete. The post demonstrates how to use the lifecycle configurations via the SageMaker console. The provided notebook shows how to implement the same programmatically using Boto3.

  1. On the SageMaker console, choose Lifecycle configurations in the navigation pane.
  2. On the Studio tab, choose Create configuration.

The first step to create the lifecycle configuration is to select the type.

  1. For this use case of installing dependencies each time a Jupyter kernel gateway app is created, choose Jupyter kernel gateway app and choose Next.
  2. For Name, enter a name for the configuration.
  3. In the Scripts section, define the script to run when the kernel starts. For this example, the following script installs the PyArrow library:
    #!/bin/bash
    # This script installs a single pip package on a SageMaker Studio kernel gateway app
    set -eux
    # PARAMETERS
    PACKAGE=pyarrow
    pip install --upgrade $PACKAGE

  4. Choose Create Configuration.

Now that the configuration has been created, it needs to be attached to a domain or user profile. When attached to the domain, all user profiles in that domain inherit it, whereas when attached to a user profile, it is scoped to that specific profile. For this walkthrough, we use the Studio domain route.

  1. Choose Domains in the navigation pane and open your existing domain.
  2. On the Environment tab, in the Lifecycle configurations for personal Studio apps section, choose Attach.
  3. For Source, select Existing configuration.
  4. Select the lifecycle configuration you created and choose Attach to domain.

Now that all the configuration is done, it’s time to test the script within Studio.

  1. Launch Studio and on the Launcher tab, locate the Notebooks and compute resources section, and choose Change environment to select the lifecycle configuration you created.
  2. For Start-up script, choose the lifecycle configuration you created, then choose Select.
  3. Choose Create notebook.

You can also set the Lifecycle configuration to be run by default in the Lifecycle configurations for personal Studio apps section of the Domain page.

Within the new notebook, the dependencies installed in the startup script will be available.
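If you prefer to script this setup instead of using the console, a minimal Boto3 sketch could look like the following; the configuration name, script content, and domain ID are placeholders.

import base64
import boto3

sm = boto3.client('sagemaker')

# The same script shown above, base64-encoded as the API requires
script = """#!/bin/bash
set -eux
pip install --upgrade pyarrow
"""

response = sm.create_studio_lifecycle_config(
    StudioLifecycleConfigName='install-pip-package',
    StudioLifecycleConfigContent=base64.b64encode(script.encode('utf-8')).decode('utf-8'),
    StudioLifecycleConfigAppType='KernelGateway',
)

# Attach the configuration at the domain level so all user profiles inherit it
sm.update_domain(
    DomainId='<domain-id>',
    DefaultUserSettings={
        'KernelGatewayAppSettings': {
            'LifecycleConfigArns': [response['StudioLifecycleConfigArn']],
        }
    },
)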

Recommended use cases

This approach is lightweight but also powerful because it allows you to control the setup of your notebook environment via shell scripts. The use cases that best fit this approach are the following:

  • Integrating package installations in the notebook lifecycle configuration that must run at each kernel start.
  • Environments without internet access. Use lifecycle configurations to set up an environment that accesses local or secure artifact and package repositories, such as AWS CodeArtifact.
  • If you already use lifecycle configurations, you can extend them to include package install.
  • Installation of a few additional packages on top of built-in or custom app images.
  • When you need a shorter time to market than with custom app images.

Limitations of this approach

The main limitations are the high effort needed to manage lifecycle configuration scripts at scale and the slow installation of packages. Depending on how many packages are installed and how large they are, the lifecycle script might even time out. There are also limited options for ad hoc script customization by users, such as data scientists or ML engineers, due to the permissions of the user profile execution role.

Refer to SageMaker Studio Lifecycle Configuration Samples for more samples and use cases.

Persist Conda environments to the Studio EFS volume

SageMaker domains and Studio use an EFS volume as a persistent storage layer. You can save your Conda environments on this EFS volume. These environments are persistent between kernel, app, or Studio restart. Studio automatically picks up all environments as KernelGateway kernels.

This is a straightforward process for a data scientist, but there is a 1-minute delay for the environment to appear in the list of selectable kernels. There also might be issues with using environments for kernel gateway apps that have different compute requirements, for example a CPU-based environment on a GPU-based app.

Refer to Custom Conda environments on SageMaker Studio for detailed instructions. The post’s GitHub repo also contains a notebook with the step-by-step guide.

Create persistent Conda environments on a Studio EFS volume

This walkthrough should take around 10 minutes.

  1. On Studio, choose Home in the navigation pane.
  2. Choose Open Launcher.
  3. Within the Launcher, locate the Notebooks and compute resources section.
  4. Check that the SageMaker image selected is a Conda-supported first-party kernel image such as “Data Science.”
  5. Choose Open image terminal to open a terminal window with a new kernel.

A message displays saying “Starting image terminal…” and after a few moments, the new terminal will open in a new tab.

  1. Within the terminal, run the following commands:
    mkdir -p ~/.conda/envs
    conda create --yes -p ~/.conda/envs/custom
    conda activate ~/.conda/envs/custom
    conda install -y ipykernel
    conda config --add envs_dirs ~/.conda/envs

These commands take about 3 minutes to run. They create a directory on the EFS volume to store the Conda environments, create the new Conda environment and activate it, install the ipykernel dependencies (without this dependency, this solution will not work), and finally create a Conda configuration file (.condarc) that contains the reference to the new Conda environment directory. Because this is a new Conda environment, no additional dependencies are installed. To install other dependencies, you can modify the conda install line, or wait for the preceding commands to finish and install any additional dependencies while inside the Conda environment.

  1. For this example, we install the NumPy library by running the following command in the terminal window:
    conda install -y numpy
    python -c "import numpy; print(numpy.version.version)"

Now that the Conda environment is created and the dependencies are installed, you can create a notebook that uses this Conda environment persisted on Amazon EFS.

  1. On the Studio Launcher, choose Create notebook.
  2. From the new notebook, choose the current kernel name (“Python 3 (Data Science)”) in the top right-hand corner.
  3. For Kernel, choose the newly created Conda environment, then choose Select.

If at first there is no option for the new Conda environment, this could be because it takes a few minutes to propagate.

Back within the notebook, the kernel name will have changed in the top right-hand corner, and within a cell you can test that the dependencies installed are available.

Recommended use cases

The following use cases are the best fit for this approach:

  • Environments without internet access, with all dependencies pre-installed in the persisted Conda environments
  • Ad hoc environments that need persistence between kernel sessions
  • Testing of custom SageMaker images in Studio before creating a Docker image and pushing to Amazon ECR

Limitations of this approach

Although this approach has practical uses, consider the following limitations:

  • There might be performance issues with Amazon EFS on many small files, which is very common when managing Python packages.
  • It may be challenging to share persistent environments between Studio user profiles.
  • It may be challenging to reuse persistent environments.
  • It may be challenging to address management at scale.
  • The approach works only with specific Conda-based first-party SageMaker images, for example “Data Science,” “Data Science 2.0,” and “Data Science 3.0.” For a list of all available images, refer to Available Amazon SageMaker Images.

Pip install

You can install packages directly into the default Conda environment or the default Python environment.

Create a setup.py or requirements.txt file with all required dependencies and run %pip install . or %pip install -r requirements.txt accordingly. You have to run this command every time you restart the kernel or recreate an app.
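For example, a notebook cell like the following (the pinned packages and the requirements.txt file are illustrative) installs the dependencies and verifies them in the current kernel:

# Example notebook cell; requirements.txt is a hypothetical file with pinned dependencies, such as:
#   pandas==2.0.3
#   pyarrow==12.0.1
%pip install -r requirements.txt

# Confirm the packages are importable in the same kernel
import pandas, pyarrow
print(pandas.__version__, pyarrow.__version__)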

This approach is recommended for ad hoc experimentation because these environments are not persistent.

For more details about using the pip install command and limitations, refer to Install External Libraries and Kernels in Amazon SageMaker Studio.

Recommended use cases

This approach is a standard way to install packages to customize your notebook environment. The recommended use cases are limited to non-production use for ad hoc experimentation in a notebook:

  • Ad hoc experimentation in Studio notebooks
  • Non-production and non-sensitive environments, such as sandbox environments
  • Environments with internet access

Limitations of this approach

The main limitations of this approach are:

  • Some enterprise environments block all egress and ingress internet connections, so you can’t use pip install to pull Python packages from public repositories; you need to configure access to a private package repository or an offline mode
  • Lower reproducibility of environments
  • Need to wait until packages are downloaded and installed
  • No persistence between image restarts

Conclusion

SageMaker Studio offers a broad range of possible customization of development environments. Each user role such as a data scientist; an ML, MLOps, or DevOps engineer; and an administrator can choose the most suitable approach based on their needs, place in the development cycle, and enterprise guardrails.

The following table summarizes the presented approaches along with their preferred use cases and main limitations.

Approach: Bring your own image
Persistence: Permanent, transferrable between user profiles and domains
Best fit use cases:
  • Need for a stable, reproducible, shareable, and centrally managed ML runtime
  • Reuse of the same image for Studio development and for SageMaker processing and training jobs
  • Enterprise ML runtime golden images with built-in security controls and guardrails
Limitations:
  • Multi-step manual authoring process, or needs an automated build and test pipeline

Approach: Lifecycle configurations
Persistence: Permanent, transferrable between user profiles and domains
Best fit use cases:
  • Need for a centrally managed, reproducible, and shareable environment
  • Need for installation of a few additional packages on top of an existing environment
Limitations:
  • Time limit for environment installation
  • Effort and challenges of managing at scale

Approach: Conda environments on the Studio EFS volume
Persistence: Permanent, not transferrable between user profiles or domains
Best fit use cases:
  • Fast experimentation in a notebook with a need for persistence, reuse, and reproducibility of environments
  • Single-user self-managed environments
Limitations:
  • Works only with some kernels
  • Performance issues with many small files
  • Can’t share environments between users

Approach: Pip install
Persistence: Transient, no persistence between image or Studio restarts, not transferrable between user profiles or domains
Best fit use cases:
  • Fast experimentation in a notebook
  • Single-user self-managed environments
  • Non-production environments
Limitations:
  • Low reproducibility of environments
  • Potentially long package download and installation times
  • No persistence

It’s still Day 1. Real-world virtual environment and Python package management is far more complex than these four approaches, but this post helps you with the first steps toward developing your own use case.

You can find more use cases, details, and hands-on examples in the following resources:


About the authors

Yevgeniy Ilyin is a Solutions Architect at Amazon Web Services (AWS). He has over 20 years of experience working at all levels of software development and solutions architecture and has used programming languages from COBOL and Assembler to .NET, Java, and Python. He develops and codes cloud native solutions with a focus on big data, analytics, and data engineering.

Alex Grace is a Solutions Architect at Amazon Web Services (AWS) who looks after Fintech Digital Native Businesses. Based in London, Alex works with a few of the UK’s leading Fintechs and enjoys supporting their use of AWS to solve business problems and fuel future growth. Previously, Alex has worked as a software developer and tech lead at Fintech startups in London and has more recently been specialising in AWS’ machine learning solutions.

AI/ML-driven actionable insights and themes for Amazon third-party sellers using AWS

The Amazon International Seller Growth (ISG) team runs the CSBA (Customer Service by Amazon) program that supports over 200,000 third-party Merchant Fulfilled Network (MFN) sellers. Amazon call centers facilitate hundreds of thousands of phone calls, chats, and emails going between the consumers and Amazon MFN sellers. The large volume of contacts creates a challenge for CSBA to extract key information from the transcripts that helps sellers promptly address customer needs and improve customer experience. Therefore, it’s critical to automatically discover insights from these transcripts, perform theme detection to analyze multiple customer conversations, and automatically present a set of themes that indicate the top reasons for customer contact, so that the customer problems are addressed in the right way and as soon as possible.

This post presents a solution that uses a workflow and AWS AI and machine learning (ML) services to provide actionable insights based on those transcripts. We use multiple AWS AI/ML services, such as Contact Lens for Amazon Connect and Amazon SageMaker, and utilize a combined architecture. This solution is tested with ISG using a small volume of data samples. In this post, we discuss the thought process, building this solution, and the outcome from the test. We believe the lessons learned and our journey presented here may help you on your own journey.

Operational landscape and business workflow

The following figure shows the recommended operational landscape with stakeholders and business workflow for ISG so that sellers can stay close to their customers anytime, anywhere. The consumer contacts Customer Service through a contact center platform and engages with the Customer Service Associate (CSA). Then the transcripts of contacts become available to CSBA to extract actionable insights through millions of customer contacts for the sellers, and the data is stored in the Seller Data Lake. Sellers use the Amazon Seller Central portal to access the analytics outcomes and take action to quickly and effectively address customer problems.

Solution overview

The following diagram shows the architecture reflecting the workflow operations into AI/ML and ETL (extract, transform, and load) services.

solution architecture

The workflow steps are as follows:

  1. We use Amazon Connect as a cloud contact center for consumer-CSA interactions. Contact Lens for Amazon Connect generates call and chat transcripts; derives contact summary, analytics, categorization of associate-customer interaction, and issue detection; and measures customer sentiments.
  2. Contact Lens then stores analytics data into an Amazon Simple Storage Service (Amazon S3) bucket for long-term retention.
  3. Amazon Kinesis Data Streams collects and transfers the high-throughput analytics data, processed by AWS Lambda, and injects and stores the data into an intermediate S3 bucket. At this stage, the data contains call and chat transcripts, sentiment scores, detected issues, and categories.
  4. This triggers the Lambda functions to ingest the data stream, extract the requested data fields, and trigger inference of custom ML analyses by AWS AI/ML services, on top of the Contact Lens results. In this analysis, Contact Lens provides accurate sentiment scores measuring customer satisfaction on consumer-CSA interactions. Contact Lens rules help us categorize known issues in the contact center. At this stage, ISG wanted to provide additional insights to the seller by detecting themes through discovering previously unknown issues in seller-specific calls, performed resolutions, and specific key phrases. Here, a non-deep learning model was trained and run on SageMaker, the details of which are explained in the following section.
  5. After the AI/ML-based analytics, all actionable insights are generated and then stored in the Seller Data Lake. The insights are shared on the Seller Central Portal for the international sellers to pinpoint the root cause and take prompt action.

In the following sections, we dive deeper into the AI/ML solution and its components.

Data labeling

In this section, we describe our approach for data labeling to identify the contact reason and resolution, and our methodology for keywords extraction for the sellers to perform root cause analysis.

Contact reason and resolution labeling

To detect the contact reason from transcripts by ML, we utilized seven Standardized Issue Codes (SICs) as the data labels from the sample data provided by the ISG team:

  • Contacted seller to request cancelation
  • Tracking shows delivered but shipment not received
  • Shipment undeliverable
  • Shipment not delivered past delivery date
  • Shipment in transit to customer
  • Request Return Mailing Label (RML)
  • Item non-returnable

The contact reason labels can be further extended by adding issues previously unknown to the seller; however, those issues had not been defined in the SICs. Unlike the contact reason, the contact resolution doesn’t have a label associated with the transcripts. The resolution categories were specified by the ISG team, and the resolutions needed to be labeled based on these categories. Therefore, we utilized Amazon SageMaker Ground Truth to create or update labels for each contact.

Ground Truth provides a data labeling service that makes it easy to label data, and gives you the option to use human annotators through Amazon Mechanical Turk, third-party vendors, or your own private workforce. For this solution, the ISG team defined four categories for contact resolution in over 140 transcript documents, which were labeled by Amazon Mechanical Turk contractors:

  • Full refund – 69 records
  • Contact seller – 52 records
  • Partial refund – 15 records
  • Other – 4 records

It took the contractors only a couple of hours to complete the multi-label text classification labeling of contact center resolutions for the 140 documents and have them reviewed by the customer. In the next step, we build the multi-class classification models, then predict the contact reason and resolution from the new call and chat transcripts coming from customer service.

Keywords for the root cause analysis

Another challenge is to extract keywords from the transcripts that can guide the MFN sellers toward specific actions. For this example, the seller needs to capture key information such as product information, critical timelines, problem details, and the refund offered by the CSA, which may not be obvious from the transcript alone. Here we built a custom key phrase extraction model in SageMaker using the RAKE (Rapid Automatic Keyword Extraction) algorithm, following the process shown in the following figure. RAKE is a domain-independent keyword extraction algorithm that determines key phrases by analyzing the frequency of word appearance and its co-occurrence with other words in the text.

keywords extraction process

After the standard document preprocessing, RAKE detects the most relevant key words and phrases from the transcript documents. The output is listed as follows:

[('im amazons chat helper .. im', 0.08224299065420558),
('jun 23 .. could', 0.041588785046728964), <== timeline
('original payment method please', 0.04112149532710279), <== resolution: refund
('amazon gift card balance', 0.04112149532710279), <== resolution: refund
('previous conversation .. let', 0.04018691588785046),
('faulty pieces would like', 0.036448598130841114), <== call reason: faulty piece
('nice day !:)..', 0.025233644859813078),
('dual fuel gas', 0.025233644859813078), <== call reason: product info
('customer service hub', 0.025233644859813078),
('5 business days', 0.025233644859813078), <== timeline
('try .. got', 0.02476635514018691),
('right place ..', 0.023364485981308407),
('item .. let', 0.023364485981308407),
('youd like help', 0.02242990654205607),
('think would help', 0.02242990654205607),
('search help pages', 0.02242990654205607),
('gbc1793w ). order', 0.02242990654205607), <== call reason: product info
('moment .. ok', 0.021962616822429903),
('charcoal combo grill', 0.021028037383177565), <== call reason: product info
('call back ..', 0.021028037383177565),
('yes please anything', 0.020093457943925228),
('chat ..', 0.014953271028037382),
('would love', 0.014018691588785043),
('looks like', 0.014018691588785043),
('bent pieces', 0.013084112149532708), <== call reason: faulty details

This method captured key phrases with high relevance scores on the critical information such as timeline (“June 23”), refund resolution (“Amazon gift card,” “in 5 business days”), product information (“charcoal combo grill,” “dual fuel gas,” “gbc1793w”) and problem details (“faulty piece,” “bent pieces”). These insights not only tell the seller that this customer has been taken care of by getting a refund, but also guide the seller to further investigate the gas grill product defect and avoid having similar issues for other customers.
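The custom model in this solution was built and run on SageMaker; as a local illustration of the technique, the open-source rake_nltk package produces similar ranked key phrases. The package choice and the transcript variable below are assumptions for demonstration purposes, not the exact implementation used:

# One-time NLTK resources needed by rake_nltk
import nltk
nltk.download('stopwords', quiet=True)
nltk.download('punkt', quiet=True)

from rake_nltk import Rake

# 'transcript' is a placeholder for a single preprocessed call or chat transcript
transcript = "Hi, I'm Amazon's chat helper. The dual fuel gas grill arrived with bent pieces ..."

rake = Rake(min_length=1, max_length=4)   # consider phrases of up to four words
rake.extract_keywords_from_text(transcript)

# Ranked (score, phrase) pairs, most relevant first
for score, phrase in rake.get_ranked_phrases_with_scores()[:10]:
    print(phrase, score)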

Text classification model training

Contact Lens generated transcripts, contact summaries, and sentiments for call and chat samples collected from ISG Customer Service. Throughout the testing, the transcription and sentiment scores were accurate as expected. Along with known issues, the ISG team also wants to detect unknown issues from transcripts to meet seller-specific needs, such as delivery problems, product defects, the resolutions provided by the contact, and issues or key phrases leading to a return or refund.

To address this challenge, we extended our tests through custom models on SageMaker. Based on the size of the dataset and samples, our experience pointed us to more conventional (non-deep learning), “bag-of-words”-based models on SageMaker.

We performed the contact reason classification modeling following the three steps on SageMaker as shown in the following figure.

text classification process

The steps are as follows:

  1. Preprocessing – We used the NLTK library to lower the cases; remove punctuation, tags, markups, and white space trailing; and filter single letters, numeric values, and customized lists of stop words.
  2. Vectorization – We used the TF-IDF (Term Frequency-Inverse Document Frequency) method to convert the processed document into a matrix of TF-IDF features. The method quantifies the importance and relevance of words and phrases in a document with a collection of documents (corpus), then generates the features in numeric values to represent how important a word is to the document in the corpus. For this solution, we tested with specifying 750 and 1,500 features.
  3. Multi-class classification – We generated a seven-class classification model using a vectorized feature list and SIC labels. We utilized 90% of the samples for training and 10% for validation.
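A minimal scikit-learn sketch of these preprocessing, vectorization, and classification steps follows; the input file, column names, and the logistic regression baseline are illustrative stand-ins for the SageMaker algorithms discussed next:

import re
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical input: one row per transcript with a 'text' column and an SIC 'label' column
df = pd.read_csv('transcripts_with_sic_labels.csv')

def preprocess(text):
    # Lowercase, strip markup, punctuation, and numbers, and drop single letters
    text = re.sub(r'<[^>]+>', ' ', text.lower())
    text = re.sub(r'[^a-z\s]', ' ', text)
    return ' '.join(tok for tok in text.split() if len(tok) > 1)

df['clean'] = df['text'].apply(preprocess)

# TF-IDF vectorization with a fixed vocabulary size (750 or 1,500 features in the experiments)
vectorizer = TfidfVectorizer(stop_words='english', max_features=750)
X = vectorizer.fit_transform(df['clean'])
y = df['label']

# 90/10 train/validation split and a simple multi-class baseline classifier
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=0, stratify=y)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print('Validation accuracy:', clf.score(X_val, y_val))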

We tested three algorithms aiming to obtain the best-performing model:

  • First, we used the SageMaker Linear Learner algorithm with default hyperparameters and performed 10 epochs, and reached 71% accuracy for the testing set.
  • Next, we used the SageMaker built-in XGBoost algorithm, and ran automatic hyperparameter optimization (HPO) tuning on four parameters (eta, alpha, min_child_weight, max_depth), which gave us 71% accuracy for the testing set.
  • Finally, we worked with AutoGluon’s AutoML framework on SageMaker, which performs automatic modeling and hyperparameter selection with multiple models ensembling and multiple layers stacking. The framework trained 13 models and generated the final ensemble model yielding 74% accuracy for the testing set. We also tested by increasing the number of TF-IDF vectorizer features to 1,500; with the AutoGluon model, the classification accuracy on testing set can be further improved to 82%.

For our model training through AutoGluon, we used the MultilabelPredictor method from the AutoGluon library. This predictor performs multi-label prediction for tabular data. We used the sample notebook from AWS samples on GitHub. We used the same notebook by starting with importing AutoGluon libraries and defining the class for MultilabelPredictor(). To save space, we don’t show those lines in the following code snippet; you can copy/paste that part from the sample notebook. We used the training data in the file train.csv in our S3 bucket (your_path_to_s3/train.csv), specified the column used for the label, and performed model training through MultilabelPredictor.

train_data = TabularDataset('your_path_to_s3/train.csv')
subsample_size = 106                                                    # the sample size for training
train_data = train_data.sample(n=subsample_size, random_state=0)
labels = ['label']                                                      # column to predict based on the others
problem_types = ['multiclass']                                          # type of each prediction problem
save_path = 'your_save_path_to_results'                                 # the path to your S3 bucket where results are stored
time_limit = 60                                                         # number of seconds to train the TabularPredictor for each label

multi_predictor = MultilabelPredictor(labels=labels, problem_types=problem_types, path=save_path)
multi_predictor.fit(train_data, time_limit=time_limit)

The following table lists the AI/ML services and models, and summarizes the accuracy.

Dataset         Transcripts   Features   Linear Learner   XGBoost with HPO   AutoGluon
Validation set  11            750        0.91             0.82               0.82
Validation set  11            1,500      0.82             0.82               0.91
Testing set     34            750        0.71             0.71               0.74
Testing set     34            1,500      0.65             0.65               0.82

The following charts summarize the accuracy for the sample sets based on the number of features.

Model accuracy charts: 750 features and 1,500 features

In the following charts, we observed that tree-based models with gradient boosting, such as LightGBM and XGBoost, as well as random forest, were better choices for this type of problem for both the 750-feature and 1,500-feature models. The neural network model ranked lower among the 13 models, which confirmed our expectation that deep learning might not be suitable for our case.

Model scores and time to train

Conclusion

With AWS AI/ML services, we can provide accurate and efficient contact reason and contact resolution detection and other actionable insights for Amazon International Seller Growth. MFN sellers can use these insights to better understand consumer problems, and take effective actions to resolve Amazon consumers’ issues, while also optimizing their process and costs.

You can tailor the solution for your contact center by developing your own custom model in SageMaker, and feeding the call and chat transcripts for training and inference. You could also apply this solution for general theme detection to analyze customer conversations in your contact center.


About the Authors

Yunfei Bai is a Senior Solutions Architect at AWS. With a background in AI/ML, data science, and analytics, Yunfei helps customers adopt AWS services to deliver business results. He designs AI/ML and data analytics solutions that overcome complex technical challenges and drive strategic objectives. Yunfei has a PhD in Electronic and Electrical Engineering. Outside of work, Yunfei enjoys reading and music.

Burak Gozluklu is a Principal ML Specialist Solutions Architect located in Boston, MA. Burak has over 15 years of industry experience in simulation modeling, data science, and ML technology. He helps global customers adopt AWS technologies, and especially AI/ML solutions, to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and a post-doc in system dynamics from MIT in Cambridge, MA. Burak is passionate about yoga and meditation.

Chelsea Cai is a Senior Product Manager at Amazon’s International Seller Growth (ISG) organization, where she works on the Customer Service by Amazon (CSBA) service, helping 3P sellers improve their customer service/CX through Amazon CS technology and worldwide organizations. In her spare time, she likes philosophy, psychology, swimming, hiking, good food, and spending time with her family and friends.

Abhishek Kumar is a Senior Product Manager at Amazon’s International Seller Growth (ISG) organization, where he develops software platforms and applications to help global 3P sellers manage their Amazon business. In his free time, Abhishek enjoys traveling, learning Italian, and exploring European cultures and cuisines with his extended Italian family.

Announcing the Yammer connector for Amazon Kendra

Yammer is a social networking platform designed for open and dynamic communications and collaboration within organizations. It allows you to build communities of interest, gather ideas and feedback, and keep everyone informed. It’s available via browser or mobile app, and provides a variety of common social networking features such as private and public communities, news feeds, groups of interest, instant messaging, and more. Each of these features creates a huge amount of unstructured data collected over time and stored in multiple repositories. Searching through these fragmented repositories poses an enormous challenge to users, which is where Amazon Kendra comes in.

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to pull together data across several structured and unstructured repositories to index and search on.

We’re excited to announce that you can now use the Amazon Kendra connector for Yammer to search information stored in Yammer. In this post, we show how to index information stored in Yammer and use Amazon Kendra intelligent search to find answers to your questions accurately and quickly. In addition, the ML-powered intelligent search can accurately find information from unstructured documents containing natural language narrative content, for which keyword search isn’t very effective.

Solution overview

With Amazon Kendra, you can configure multiple data sources to provide a central place to index and search across your document repository. For our solution, we demonstrate how to index a Yammer repository using the Amazon Kendra connector for Yammer. The solution consists of the following steps:

  1. Configure the Yammer app API connector on Azure and get the connection details.
  2. Create an Amazon Kendra index.
  3. Create a Yammer data source.
  4. Run a sample query to get information.

Prerequisites

To try out the Amazon Kendra connector for Yammer, you need the following:

Configure the Yammer app API connector and gather connection details

Before we set up the Yammer data source, we need a few details about your Yammer repository. Let’s gather those in advance.

  1. Log in to the Azure portal using your global admin user account and choose Next.
  2. Enter your password and choose Sign in.
  3. On the Azure welcome page, choose App registrations.

Alternatively, you can search for “App Registrations” in the search bar.

  1. Choose New registration.
  2. Enter a name for the app (for example, my-yammer-connector) and choose Register.
  3. Note down the tenant ID (you need it when setting up the data source for Amazon Kendra).
  4. Next to Client credentials, choose Add a certificate or secret.
  5. Enter a description (for example, Yammer Connector Client Credentials).
  6. Choose an expiration period (for this post, 6 months).
  7. Choose Add.
  8. Save the client ID and secret ID for AWS Secrets Manager configuration.
  9. In the navigation pane, choose API permissions.

This is where you can add or remove admin permissions.

  1. Choose Add a permission and choose Yammer for Request API permissions.
  2. Choose Delegated permissions and select user_impersonation.
  3. Choose Add permissions.

Now the Yammer connector application is configured in the Azure portal. Let’s switch over to the Amazon Kendra console to complete our setup.

Create an Amazon Kendra index

You can create an Amazon Kendra index or use an existing index. For this post, we create a new index called my-yammer-index. For instructions, refer to Creating an index.
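If you prefer to create the index programmatically, a minimal Boto3 sketch could look like the following (the index name and role ARN are placeholders):

import boto3

kendra = boto3.client('kendra')

# The role must allow Amazon Kendra to publish CloudWatch logs and metrics
response = kendra.create_index(
    Name='my-yammer-index',
    Edition='DEVELOPER_EDITION',
    RoleArn='<kendra-index-role-arn>',
)
index_id = response['Id']
print('Index ID:', index_id)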

Create a Yammer data source

Complete the following steps to create your data source:

  1. On the Amazon Kendra console, choose Data sources in the navigation pane.
  2. Under Microsoft Yammer connector, choose Add connector.
  3. For Data source name, enter a name (for example, my-yammer-datasource).
  4. Enter an optional description.
  5. Choose Next.

You have the choice of creating credentials in Secrets Manager in advance. For this post, we create a secret on-demand.

  1. Configure a Secrets Manager secret with the user name, password, client ID, and secret ID you collected earlier.
  2. Choose Save.
  3. For IAM role, choose Create a new role.
  4. For Role name, choose AmazonKendra-my-yammer-iam-role.
  5. Choose Next.
  6. In the Configure sync settings section, you can optionally configure the contents to sync, the communities to include, and the date to sync from.
  7. Choose Sync mode and Sync run schedule.

You can choose how you want to update your index when your data source content changes. Amazon Kendra provides three types of sync modes:

  • Full sync – Amazon Kendra will sync all contents in all entities, regardless of the previous sync status
  • New or modified content sync – Amazon Kendra will only sync new or modified content
  • New, modified, or deleted content sync – Amazon Kendra will only sync new, modified, or deleted content
  1. For this post, select Full sync.
  2. For Frequency, choose Run on demand.
  3. Choose Next.
  4. You can optionally set field mappings, which tell Amazon Kendra how to associate Yammer data fields with index fields.
  5. Choose Next.
  6. Review and choose Add data source.
  7. Choose Sync now.

The sync takes anywhere from minutes to hours, depending on the size of the repository Amazon Kendra is indexing.

Test the solution

Now that you have ingested the content from Yammer into your Amazon Kendra index, you can test some queries.

  1. On the Amazon Kendra console, navigate to your index and choose Search indexed content.
  2. Enter a sample search query and test out your search results (your query will vary based on the contents of your account).

The Yammer connector also crawls local identity information from Yammer. When a document is indexed into Amazon Kendra, a corresponding Access Control List (ACL) is ingested for most documents.

The ACL specifies which user names and group names are allowed or denied access to the document. Documents without an ACL are public documents. You can use this feature to narrow down your query by user.

You can use the user ID (email) to filter search results based on the user or their group access to documents. When you issue a query, Amazon Kendra checks the user and group information and runs the query. All the documents relevant to the query that the user has access to, including public documents, are returned.

  1. To use this feature, go back to the search results page.
  2. Expand Test query with user name or groups and choose Apply user name or groups.

For Yammer, we don’t import groups; we only import user names. User names are email IDs in this case.

  1. Enter the user ID (email) of your user and choose Apply.

The following screenshot shows the updated search results.

When fronting Amazon Kendra with an application, such as one built using Experience Builder, you can pass the user identity (in the form of the email ID) to Amazon Kendra to ensure that each user only sees content specific to their user ID. Alternatively, you can use AWS IAM Identity Center (successor to AWS Single Sign-On) to control the user context being passed to Amazon Kendra and limit queries by user.
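For example, when querying the index programmatically, you can pass that identity through the Query API’s UserContext parameter; a minimal Boto3 sketch follows (the index ID, query text, and email are placeholders):

import boto3

kendra = boto3.client('kendra')

# Query the index on behalf of a specific user; only documents that user (or the public) can access are returned
response = kendra.query(
    IndexId='<your-index-id>',
    QueryText='project status update',
    UserContext={'UserId': 'jane.doe@example.com'},
)

for item in response['ResultItems']:
    print(item['Type'], '-', item.get('DocumentTitle', {}).get('Text'))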

Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your Yammer account.

Limitations

This solution has the following limitations:

  • Only the export API is available to fetch all communities. API support for fetching event details, votes about polls, and update messages is not available as of this writing.
  • Deleted entities such as messages, attachments, communities, and users are not crawled in change log crawl mode. You need to run another full crawl to get the updated information on deletion of all the entities.
  • For communities, the following are not part of indexing:
    • Community insight details
    • Community information
    • Related communities for that community
    • Files uploaded directly into the community without any attachment to a message
  • Yammer has rate limits that govern the speed of ingestion. For more information, refer to Limits in Yammer.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Yammer, delete that data source.

Conclusion

With the Yammer connector for Amazon Kendra, organizations can tap into the repository of information stored in their account securely using intelligent search powered by Amazon Kendra.

To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data from Yammer, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.


About the authors

Senthil Ramachandran is an Enterprise Solutions Architect at AWS, supporting customers in the US Northeast. He is primarily focused on cloud adoption and digital transformation in the financial services industry. Senthil’s areas of interest are AI, especially deep learning and machine learning. He focuses on application automation with continuous learning and improving the human enterprise experience. Senthil enjoys watching autosport and soccer, and spending time with his family.

Training large language models on Amazon SageMaker: Best practices

Language models are statistical methods predicting the succession of tokens in sequences, using natural text. Large language models (LLMs) are neural network-based language models with hundreds of millions (BERT) to over a trillion parameters (MiCS), and whose size makes single-GPU training impractical. LLMs’ generative abilities make them popular for text synthesis, summarization, machine translation, and more.

The size of an LLM and its training data is a double-edged sword: it brings modeling quality, but entails infrastructure challenges. The model itself is often too big to fit in memory of a single GPU device or on the multiple devices of a multi-GPU instance. These factors require training an LLM over large clusters of accelerated machine learning (ML) instances. In the past few years, numerous customers have been using the AWS Cloud for LLM training.

In this post, we dive into tips and best practices for successful LLM training on Amazon SageMaker Training. SageMaker Training is a managed batch ML compute service that reduces the time and cost to train and tune models at scale without the need to manage infrastructure. Within one launch command, Amazon SageMaker launches a fully functional, ephemeral compute cluster running the task of your choice, and with enhanced ML features such as metastore, managed I/O, and distribution. The post covers all the phases of an LLM training workload and describes associated infrastructure features and best practices. Some of the best practices in this post refer specifically to ml.p4d.24xlarge instances, but most are applicable to any instance type. These best practices allow you to train LLMs on SageMaker at the scale of dozens to hundreds of billions of parameters.

Regarding the scope of this post, note the following:

  • We don’t cover neural network scientific design and associated optimizations. Amazon.Science features numerous scientific publications, including but not limited to LLMs.
  • Although this post focuses on LLMs, most of its best practices are relevant for any kind of large-model training, including computer vision and multi-modal models, such as Stable Diffusion.

Best practices

We discuss the following best practices in this post:

  • Compute – SageMaker Training is a great API to launch CPU dataset preparation jobs and thousand-scale GPU jobs.
  • Storage – We see data loading and checkpointing done in two ways, depending on skills and preferences: with an Amazon FSx Lustre file system, or Amazon Simple Storage Service (Amazon S3) only.
  • Parallelism – Your choice of distributed training library is crucial for appropriate use of the GPUs. We recommend using a cloud-optimized library, such as SageMaker sharded data parallelism, but self-managed and open-source libraries can also work.
  • Networking – Make sure EFA and NVIDIA GPUDirectRDMA are enabled, for fast inter-machine communication.
  • Resiliency – At scale, hardware failures can happen. We recommend checkpointing regularly. Every few hours is common.

Region selection

Instance type and desired capacity is a determining factor for Region selection. For the Regions supported by SageMaker and the Amazon Elastic Compute Cloud (Amazon EC2) instance types that are available in each Region, see Amazon SageMaker Pricing. In this post, we assume the training instance type to be a SageMaker-managed ml.p4d.24xlarge.

We recommend working with your AWS account team or contacting AWS Sales to determine the appropriate Region for your LLM workload.

Data preparation

LLM developers train their models on large datasets of naturally occurring text. Popular examples of such data sources include Common Crawl and The Pile. Naturally occurring text may contain biases, inaccuracies, grammatical errors, and syntax variations. An LLM’s eventual quality significantly depends on the selection and curation of the training data. LLM training data preparation is an active area of research and innovation in the LLM industry. The preparation of a natural language processing (NLP) dataset abounds with share-nothing parallelism opportunities. In other words, there are steps that can be applied to units of works—source files, paragraphs, sentences, words—without requiring inter-worker synchronization.

The SageMaker jobs APIs, namely SageMaker Training and SageMaker Processing, excel at this type of task. They enable developers to run an arbitrary Docker container over a fleet of multiple machines. In the case of the SageMaker Training API, the computing fleet can be heterogeneous. Numerous distributed computing frameworks have been used on SageMaker, including Dask, Ray, and PySpark, which has a dedicated AWS-managed container and SDK in SageMaker Processing.

When you launch a job with multiple machines, SageMaker Training and Processing run your code one time per machine. You don’t need to use a particular distributed computing framework to write a distributed application: you can write the code of your choice, which will run one time per machine, to realize share-nothing parallelism. You can also write or install the inter-node communication logic of your choice.
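As an illustration, the following SageMaker Processing sketch fans a preparation script out over a fleet of instances, with Amazon S3 key sharding providing the share-nothing split; the container URI, role, bucket, and script name are placeholders:

from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

# Each of the 16 instances runs the same script over a different shard of the S3 prefix
processor = ScriptProcessor(
    image_uri='<your-processing-container-uri>',
    command=['python3'],
    role='<your-execution-role-arn>',
    instance_count=16,
    instance_type='ml.c5.4xlarge',
)

processor.run(
    code='prepare_corpus.py',                       # hypothetical per-machine preparation script
    inputs=[ProcessingInput(
        source='s3://<your-bucket>/raw-text/',
        destination='/opt/ml/processing/input',
        s3_data_distribution_type='ShardedByS3Key', # each instance receives a disjoint subset of objects
    )],
    outputs=[ProcessingOutput(
        source='/opt/ml/processing/output',
        destination='s3://<your-bucket>/tokenized/',
    )],
)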

Data loading

There are multiple ways to store the training data and move it from its storage to the accelerated compute nodes. In this section, we discuss the options and best practices for data loading.

SageMaker storage and loading options

A typical LLM dataset size is in the hundreds of billions of text tokens, representing a few hundred gigabytes. SageMaker-managed clusters of ml.p4d.24xlarge instances offer several options for dataset storage and loading:

  • On-node NVMe SSD – ml.p4d.24xlarge instances are equipped with 8 TB of NVMe storage, available under /opt/ml/input/data/<channel> if you use SageMaker File mode, and at /tmp. If you’re seeking the simplicity and performance of a local read, you can copy your data to the NVMe SSD. The copy can either be done by SageMaker File mode, or by your own code, for example using multi-processed Boto3 or S5cmd.
  • FSx for Lustre – On-node NVMe SSDs are limited in size, and require ingestion from Amazon S3 at each job or warm cluster creation. If you’re looking to scale to larger datasets while maintaining low-latency random access, you can use FSx for Lustre. Lustre is an open-source parallel file system, popular in high-performance computing (HPC). FSx for Lustre uses distributed file storage (striping) and physically separates file metadata from file content to achieve high-performance read/writes.
  • SageMaker FastFile Mode – FastFile Mode (FFM) is a SageMaker-only feature that presents remote S3 objects in SageMaker-managed compute instances under a POSIX-compliant interface, and streams them only upon reading, using FUSE. FFM reads translate into S3 calls that stream remote files block by block. As a best practice to avoid errors related to Amazon S3 traffic, FFM developers should aim to keep the underlying number of S3 calls reasonable, for example by reading files sequentially and with a controlled amount of parallelism.
  • Self-managed data loading – Of course, you may also decide to implement your own, fully custom data loading logic, using proprietary or open-source code. Some reasons to use self-managed data loading are to facilitate a migration by reusing already-developed code, to implement custom error handling logic, or to have more control over underlying performance and sharding. Examples of libraries you may use for self-managed data loading include torchdata.datapipes (previously AWS PyTorch S3 Plugin) and WebDataset. The AWS Python SDK Boto3 may also be combined with Torch Dataset classes to create custom data loading code. Custom data loading classes also enable the creative use of SageMaker Training heterogeneous clusters, to finely adapt the CPU and GPU balance to a given workload.

For more information about those options and how to choose them, refer to Choose the best data source for your Amazon SageMaker training job.
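As a minimal sketch of how the File/FastFile and FSx for Lustre options above are selected through the SageMaker Python SDK, the following snippet assumes an existing estimator object and uses placeholder S3 and FSx identifiers (an FSx channel additionally requires the training job to run in the file system’s VPC).

from sagemaker.inputs import TrainingInput, FileSystemInput

# Stream objects from Amazon S3 on demand with FastFile mode
train_fastfile = TrainingInput(
    s3_data="s3://<bucket>/train/",
    input_mode="FastFile",       # use "File" to copy the data to local storage instead
)

# Or mount a pre-populated FSx for Lustre file system
train_fsx = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",   # placeholder
    file_system_type="FSxLustre",
    directory_path="/<mount-name>/train",
    file_system_access_mode="ro",
)

estimator.fit({"train": train_fastfile})     # or {"train": train_fsx}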

Best practices for large-scale interaction with Amazon S3

Amazon S3 is capable of handling LLM workloads, both for data reading and checkpointing. It supports a request rate of 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket. However, this rate is not necessarily available by default. Instead, as the request rate for a prefix grows, Amazon S3 automatically scales to handle the increased rate. For more information, refer to Why am I getting 503 Slow Down errors from Amazon S3 when the requests are within the supported request rate per prefix.

If you expect high-frequency Amazon S3 interaction, we recommend the following best practices:

  • Try to read and write from multiple S3 buckets and prefixes. For example, you can partition training data and checkpoints across different prefixes.
  • Check Amazon S3 metrics in Amazon CloudWatch to track request rates.
  • Try to minimize the amount of simultaneous PUT/GET:
    • Have fewer processes using Amazon S3 at the same time. For example, if eight processes per node need to checkpoint to Amazon S3, you can reduce PUT traffic by a factor of 8 by checkpointing hierarchically: first within-node, then from the node to Amazon S3 (a minimal sketch of this pattern follows this list).
    • Read multiple training records from a single file or S3 GET, instead of using an S3 GET for every training record.
    • If you use Amazon S3 via SageMaker FFM, SageMaker FFM makes S3 calls to fetch files chunk by chunk. To limit the Amazon S3 traffic generated by FFM, we encourage you to read files sequentially and limit the number of files opened in parallel.
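The following is a minimal sketch of the hierarchical checkpointing pattern mentioned above, assuming PyTorch distributed training with eight ranks per node; the bucket name, prefix, and paths are illustrative.

import os
import boto3
import torch
import torch.distributed as dist

def save_checkpoint_hierarchically(model, epoch, bucket="<checkpoint-bucket>", prefix="run-0001"):
    ckpt_dir = "/opt/ml/checkpoints"
    os.makedirs(ckpt_dir, exist_ok=True)

    # Every rank writes its shard locally: fast, and no Amazon S3 traffic
    local_path = f"{ckpt_dir}/ckpt-rank{dist.get_rank()}-ep{epoch}.pt"
    torch.save(model.state_dict(), local_path)
    dist.barrier()   # wait until all local shards are on disk

    # Only one process per node uploads, dividing PUT traffic by 8
    if int(os.environ.get("LOCAL_RANK", "0")) == 0:
        s3 = boto3.client("s3")
        for fname in os.listdir(ckpt_dir):
            if fname.endswith(f"ep{epoch}.pt"):
                s3.upload_file(f"{ckpt_dir}/{fname}", bucket, f"{prefix}/{fname}")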

If you have a Developer, Business, or Enterprise Support plan, you can open a technical support case about S3 503 Slow Down errors. Before doing so, make sure you have followed the best practices and gather the request IDs for the failed requests.

Training parallelism

LLMs commonly have dozens to hundreds of billions of parameters, making them too big to fit within a single NVIDIA GPU card. LLM practitioners have developed several open-source libraries facilitating the distributed computation of LLM training, including FSDP, DeepSpeed and Megatron. You can run those libraries in SageMaker Training, but you can also use SageMaker distributed training libraries, which have been optimized for the AWS Cloud and provide a simpler developer experience. Developers have two choices for distributed training of their LLM on SageMaker: distributed libraries or self-managed.

SageMaker distributed libraries

To provide you with improved distributed training performance and usability, SageMaker Training offers several proprietary extensions to scale TensorFlow and PyTorch training code. LLM training is often conducted in a 3D-parallelism fashion:

  • Data parallelism splits and feeds the training mini-batches to multiple identical replicas of the model, to increase processing speed
  • Pipeline parallelism attributes various layers of the model to different GPUs or even instances, in order to scale model size beyond a single GPU and a single server
  • Tensor parallelism splits a single layer into multiple GPUs, usually within the same server, to scale individual layers to sizes exceeding a single GPU

In the following example, a 6-layer model is trained on a cluster of k*3 servers with 8*k*3 GPUs (8 GPUs per server). Data parallelism degree is k, pipeline parallelism 6, and tensor parallelism 4. Each GPU in the cluster contains one-fourth of a model layer, and a full model is partitioned over three servers (24 GPUs in total).

diagram of a 3D-parallel neural network training

The following are specifically relevant for LLMs:

  • SageMaker distributed model parallel – This library uses graph partitioning to produce intelligent model partitioning optimized for speed or memory. SageMaker distributed model parallel exposes the latest large-model training optimizations, including data parallelism, pipeline parallelism, tensor parallelism, optimizer state sharding, activation checkpointing, and offloading. With the SageMaker distributed model parallel library, we documented a 175-billion parameter model training over 920 NVIDIA A100 GPUs. For more information, refer to Train 175+ billion parameter NLP models with model parallel additions and Hugging Face on Amazon SageMaker.
  • SageMaker sharded data parallel – In MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud, Zhang et al. introduce a low-communication model parallel strategy that partitions models over a data parallel group only, instead of the whole cluster. With MiCS, AWS scientists were able to achieve 176 teraflops per GPU (56.4% of the theoretical peak) for training a 210-layer 1.06-trillion-parameter model on EC2 P4de instances. MiCS is now available for SageMaker Training customers as SageMaker sharded data parallel.

SageMaker distributed training libraries provide high performance and a simpler developer experience. In particular, developers don’t need to write and maintain a custom parallel process launcher or use a framework-specific launch tool, because the parallel launcher is built into the job launch SDK.
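As a minimal sketch, the SageMaker distributed libraries are enabled through the distribution argument of the framework estimator, as shown below; the framework version, degree values, and parameter names are illustrative and should be checked against the current library documentation.

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    role="<sagemaker-execution-role-arn>",
    instance_type="ml.p4d.24xlarge",
    instance_count=16,
    framework_version="1.13",      # illustrative
    py_version="py39",
    distribution={
        "mpi": {"enabled": True, "processes_per_host": 8},
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                # illustrative parameters, for example the sharded data parallelism degree
                "parameters": {"sharded_data_parallel_degree": 16},
            }
        },
    },
)
estimator.fit({"train": "s3://<bucket>/train/"})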

Self-managed

With SageMaker Training, you have the freedom to use the framework and scientific paradigm of your choice. In particular, if you want to manage distributed training yourself, you have two options to write your custom code:

  • Use an AWS Deep Learning Container (DLC) – AWS develops and maintains DLCs, providing AWS-optimized Docker-based environments for top open-source ML frameworks. SageMaker Training has a unique integration that allows you to pull and run AWS DLCs with an external, user-defined entry point. For LLM training in particular, AWS DLCs for TensorFlow, PyTorch, Hugging Face, and MXNet are particularly relevant. Using a framework DLC allows you to use framework-native parallelism, such as PyTorch Distributed, without having to develop and manage your own Docker images. Additionally, our DLCs feature an MPI integration, which allows you to launch parallel code easily.
  • Write a custom SageMaker-compatible Docker image – You can bring your own (BYO) image (see Use Your Own Training Algorithms and Amazon SageMaker Custom Training containers), either starting from scratch or extending an existing DLC image. When using a custom image for LLM training on SageMaker, it’s particularly important to verify the following:
    • Your image contains EFA with appropriate settings (discussed more later in this post)
    • Your image contains an NVIDIA NCCL communication library, enabled with GPUDirectRDMA

Customers have been able to use a number of self-managed distributed training libraries, including DeepSpeed.
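For example, the following minimal sketch launches self-managed PyTorch code on an AWS DLC and lets SageMaker invoke the framework-native torchrun launcher on every node; the script name and versions are illustrative, and supported version combinations should be checked in the SageMaker SDK documentation.

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train_fsdp.py",      # your own PyTorch FSDP or DeepSpeed training code
    role="<sagemaker-execution-role-arn>",
    instance_type="ml.p4d.24xlarge",
    instance_count=8,
    framework_version="2.0",          # selects the matching AWS DLC
    py_version="py310",
    distribution={"torch_distributed": {"enabled": True}},   # runs torchrun on each node
)
estimator.fit({"train": "s3://<bucket>/train/"})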

Communications

Given the distributed nature of an LLM training job, inter-machine communication is critical to the feasibility, performance, and costs of the workload. In this section, we present key features for inter-machine communication and conclude with tips for installation and tuning.

Elastic Fabric Adapter

To accelerate ML applications and improve performance while benefiting from the flexibility, scalability, and elasticity of the cloud, you can take advantage of Elastic Fabric Adapter (EFA) with SageMaker. In our experience, using EFA is a requirement to get satisfactory multi-node LLM training performance.

An EFA device is a network interface attached to the SageMaker-managed EC2 instances for the duration of the training job. EFA is available on specific instance families, including the P4d. EFA networks are capable of achieving several hundred Gbps of throughput.

Together with EFA, AWS has introduced the Scalable Reliable Datagram (SRD), an Ethernet-based transport inspired by the InfiniBand Reliable Datagram and evolved with a relaxed packet ordering constraint. For more information about EFA and SRD, refer to In the search for performance, there’s more than one way to build a network, the video How EFA works and why we don’t use infiniband in the cloud, and the research paper A Cloud-Optimized Transport Protocol for Elastic and Scalable HPC from Shalev et al.

On compatible instances, you can add EFA integration to existing SageMaker Docker containers, or to custom containers used for training ML models with SageMaker jobs. For more information, refer to Run Training with EFA. EFA is exposed via the open-source Libfabric communication package. However, LLM developers rarely program against Libfabric directly, and usually rely instead on the NVIDIA Collective Communications Library (NCCL).

AWS-OFI-NCCL plugin

In distributed ML, EFA is most often used with the NVIDIA Collective Communications Library (NCCL). NCCL is an NVIDIA-developed open-source library implementing inter-GPU communication algorithms. Inter-GPU communication is a cornerstone of LLM training that catalyzes scalability and performance. It is so critical to DL training that the NCCL is often directly integrated as a communication backend in deep learning training libraries, so that LLM developers use it—sometimes without noticing—from their preferred Python DL development framework. To use the NCCL on EFA, LLM developers use the AWS-developed AWS OFI NCCL plugin, which maps NCCL calls to the Libfabric interface used by EFA. We recommend using the latest version of AWS OFI NCCL to benefit from recent improvements.

To verify that the NCCL uses EFA, you should set the environment variable NCCL_DEBUG to INFO, and check in the logs that EFA is loaded by the NCCL:

...
NCCL INFO NET/OFI Selected Provider is efa
NCCL INFO Using network AWS Libfabric
...

For more information about the NCCL and EFA configuration, refer to Test your EFA and NCCL configuration. You can further customize the NCCL with several environment variables. Note that, starting with NCCL 2.12, AWS contributed automated communication algorithm selection logic for EFA networks (NCCL_ALGO can be left unset).

NVIDIA GPUDirect RDMA over EFA

With the P4d instance type, we introduced GPUDirect RDMA (GDR) over EFA fabric. It enables network interface cards (NICs) to directly access GPU memory, making remote GPU-to-GPU communication across NVIDIA GPU-based EC2 instances faster, reducing orchestration overhead on CPUs and user applications. GDR is used under the hood by the NCCL, when feasible.

GDR usage appears in inter-GPU communication when the log level is set to INFO, as in the following code:


NCCL INFO Channel 00 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/1/GDRDMA
NCCL INFO Channel 00 : 1[101d0] -> 8[101c0] [send] via NET/AWS Libfabric/1/GDRDMA

Using EFA in AWS Deep Learning Containers

AWS maintains Deep Learning Containers (DLCs), many of which are built from AWS-managed Dockerfiles with EFA, AWS OFI NCCL, and NCCL already installed. The following GitHub repos offer examples with PyTorch and TensorFlow. You don’t need to install those libraries yourself.

Using EFA in your own SageMaker Training container

If you create your own SageMaker Training container and want to use the NCCL over EFA for accelerated inter-node communication, you need to install EFA, NCCL, and AWS OFI NCCL. For more information, refer to Run Training with EFA. Additionally, you should set the following environment variables in your container or in your entry point code (a sketch of passing them through the SageMaker Python SDK follows this list):

  • FI_PROVIDER="efa" specifies the fabric interface provider
  • NCCL_PROTO=simple instructs the NCCL to use a simple protocol for communication (currently, the EFA provider doesn’t support LL protocols; enabling them could lead to data corruption)
  • FI_EFA_USE_DEVICE_RDMA=1 uses the device’s RDMA functionality for one-sided and two-sided transfer
  • NCCL_LAUNCH_MODE="PARALLEL"
  • NCCL_NET_SHARED_COMMS="0"
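The following minimal sketch shows one way to pass these settings from the SageMaker Python SDK through the estimator’s environment argument, rather than baking them into the image; the image URI and cluster size are placeholders.

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/my-llm-training:latest",
    role="<sagemaker-execution-role-arn>",
    instance_type="ml.p4d.24xlarge",
    instance_count=16,
    environment={
        "FI_PROVIDER": "efa",
        "NCCL_PROTO": "simple",
        "FI_EFA_USE_DEVICE_RDMA": "1",
        "NCCL_LAUNCH_MODE": "PARALLEL",
        "NCCL_NET_SHARED_COMMS": "0",
        "NCCL_DEBUG": "INFO",    # optional: verify in the logs that EFA is selected
    },
)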

Orchestration

Managing the lifecycle and workload of dozens to hundreds of compute instances requires orchestration software. In this section, we offer best practices for LLM orchestration.

Within-job orchestration

Developers must write both server-side training code and client-side launcher code in most distributed frameworks. Training code runs on training machines, whereas client-side launcher code launches the distributed workload from a client machine. There is little standardization today, for example:

  • In PyTorch, developers can launch multi-machine tasks using torchrun, torchx, torch.distributed.launch (deprecation path), or torch.multiprocessing.spawn
  • DeepSpeed proposes its own deepspeed CLI launcher and also supports MPI launch
  • MPI is a popular parallel computing framework that has the benefit of being ML-agnostic and reasonably tenured, and therefore stable and documented, and is increasingly seen in distributed ML workloads

In a SageMaker Training cluster, the training container is launched one time on each machine. Consequently, you have three options:

  • Native launcher – You can use as an entry point the native launcher of a particular DL framework, for example a torchrun call, which will itself spawn multiple local processes and establish communication across instances.
  • SageMaker MPI integration – You can use SageMaker MPI integration, available in our AWS DLCs or self-installable via the sagemaker-training-toolkit, to directly run your entry point code N times per machine. This has the benefit of avoiding the use of intermediary, framework-specific launcher scripts in your own code.
  • SageMaker distributed libraries – If you use the SageMaker distributed libraries, you can focus on the training code and don’t have to write launcher code at all! SageMaker distributed launcher code is built into the SageMaker SDK.

Inter-job orchestration

LLM projects often consist of multiple jobs: parameter search, scaling experiments, recovery from errors, and more. In order to start, stop, and parallelize training tasks, it’s important to use a job orchestrator. SageMaker Training is a serverless ML job orchestrator that provisions transient compute instances immediately upon request. You pay only for what you use, and clusters get decommissioned as soon as your code ends. With SageMaker Training Warm Pools, you have the option to define a time-to-live on training clusters, in order to reuse the same infrastructure across jobs. This reduces iteration time and inter-job placement variability. SageMaker jobs can be launched from a variety of interfaces, including the AWS SDKs (such as Boto3 for Python) and the AWS CLI.

There is also a higher-level, SageMaker-specific Python SDK, implemented in the sagemaker Python library (the SageMaker Python SDK), but its use is optional.

Increasing quotas for training jobs with a large and long training cluster

SageMaker has default quotas on resources, designed to prevent unintentional usage and costs. To train an LLM using a big cluster of high-end instances running for a long time, you’ll likely need to increase the quotas in the following table.

Quota name                                      Default value
Longest run time for a training job             432,000 seconds
Number of instances across all training jobs    4
Maximum number of instances per training job    20
ml.p4d.24xlarge for training job usage          0
ml.p4d.24xlarge for training warm pool usage    0

See AWS service quotas to learn how to view your quota values and request a quota increase. On-Demand, Spot Instance, and training warm pool quotas are tracked and modified separately.

If you decide to keep the SageMaker Profiler activated, be aware that every training job launches a SageMaker Processing job, each consuming one ml.m5.2xlarge instance. Confirm that your SageMaker Processing quotas are high enough to accommodate the expected training job concurrency. For example, if you want to launch 50 Profiler-enabled training jobs running concurrently, you’ll need to raise the ml.m5.2xlarge for processing job usage limit to 50.

Additionally, to launch a long-running job, you’ll need to explicitly set the Estimator max_run parameter to the desired maximum duration for the training job in seconds, up to the quota value of the longest runtime for a training job.
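As a minimal sketch, both the maximum run time and the warm pool time-to-live appear as estimator arguments; the values below are illustrative and must stay within your approved quotas.

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",
    role="<sagemaker-execution-role-arn>",
    instance_type="ml.p4d.24xlarge",
    instance_count=64,
    max_run=14 * 24 * 3600,                  # allow up to 14 days, assuming the quota has been raised
    keep_alive_period_in_seconds=3600,       # keep the warm pool alive for 1 hour between jobs
)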

Monitoring and resiliency

Hardware failure is extremely rare at the scale of a single instance and becomes more and more frequent as the number of instances used simultaneously increases. At typical LLM scale—hundreds to thousands of GPUs used 24/7 for weeks to months—hardware failures are near-certain to happen. Therefore, an LLM workload must implement appropriate monitoring and resiliency mechanisms. First, it’s important to closely monitor LLM infrastructure, to limit the impact of failures and optimize the use of compute resources. SageMaker Training offers several features for this purpose:

  • Logs are automatically sent to CloudWatch Logs. Logs include your training script stdout and stderr. In MPI-based distributed training, all MPI workers send their logs to the leader process.
  • System resource utilization metrics, such as memory, CPU, and GPU usage, are automatically sent to CloudWatch.
  • You can define custom training metrics that will be sent to CloudWatch. The metrics are captured from logs based on regular expressions you set. Third-party experiment packages like the AWS Partner offering Weights & Biases can be used with SageMaker Training (for an example, see Optimizing CIFAR-10 Hyperparameters with W&B and SageMaker).
  • SageMaker Profiler allows you to inspect infrastructure usage and get optimization recommendations.
  • Amazon EventBridge and AWS Lambda allow you to create automated client logic reacting to events such as job failures, successes, S3 file uploads, and more.
  • SageMaker SSH Helper is a community-maintained open-source library that allows you to connect to training job hosts through SSH. It can be helpful to inspect and troubleshoot code runs on specific nodes.

In addition to monitoring, SageMaker also provides built-in capabilities for job resiliency:

  • Cluster health checks – Before your job starts, SageMaker runs GPU health checks and verifies NCCL communication on GPU instances, replacing any faulty instances if necessary in order to ensure your training script starts running on a healthy cluster of instances. Health checks are currently enabled for P and G GPU-based instance types.
  • Built-in retries and cluster update – You can configure SageMaker to automatically retry training jobs that fail with a SageMaker internal server error (ISE). As part of retrying a job, SageMaker will replace any instances that encountered unrecoverable GPU errors with fresh instances, reboot all healthy instances, and start the job again. This results in faster restarts and workload completion. Cluster update is currently enabled for P and G GPU-based instance types. You can add your own applicative retry mechanism around the client code that submits the job, to handle other types of launch errors, such as exceeding your account quota.
  • Automated checkpoint to Amazon S3 – This helps you checkpoint your progress and reload a past state on new jobs.

To benefit from node-level replacement, your code must raise an error; collectives may hang instead of erroring when a node fails. Therefore, for prompt remediation, set a timeout on your collectives and have the code throw an error when the timeout is reached.
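The following is a minimal sketch of this pattern for PyTorch with the NCCL backend; the 30-minute timeout is illustrative, and the environment variable is named TORCH_NCCL_ASYNC_ERROR_HANDLING in newer PyTorch releases.

import datetime
import os
import torch.distributed as dist

# Ask the NCCL backend to surface timeouts and asynchronous errors as exceptions
os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"

dist.init_process_group(
    backend="nccl",
    timeout=datetime.timedelta(minutes=30),   # collectives hanging longer than this raise an error
)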

Some customers set up a monitoring client that watches CloudWatch logs and metrics for abnormal patterns, such as no new logs being written or 0% GPU usage (both hinting at a hang) or a stalled convergence curve, and automatically stops and retries the job.

Deep dive on checkpointing

The SageMaker checkpoint feature copies everything you write to /opt/ml/checkpoints back to the Amazon S3 URI specified in the checkpoint_s3_uri SDK parameter. When a job starts or restarts, everything written at that URI is sent back to all the machines, at /opt/ml/checkpoints. This is convenient if you want all nodes to have access to all checkpoints, but at scale, when you have many machines or many historical checkpoints, it can lead to long download times and excessively high traffic on Amazon S3. Additionally, in tensor and pipeline parallelism, the workers need only a fraction of the checkpointed model, not all of it. If you face those limitations, we recommend the following options:

  • Checkpointing to FSx for Lustre – Thanks to high-performance random I/O, you can define the sharding and file attribution scheme of your choice
  • Self-managed Amazon S3 checkpointing – For examples of Python functions that can be used to save and read checkpoints in a non-blocking fashion, refer to Saving Checkpoints

We strongly suggest checkpointing your model every few hours, for example 1–3 hours, depending on associated overhead and costs.
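To illustrate the self-managed option above, the following minimal sketch saves a checkpoint locally and uploads it to Amazon S3 in a background thread so training is not blocked; the bucket, prefix, and single-uploader logic are illustrative.

import threading
import boto3
import torch

def checkpoint_async_to_s3(state_dict, epoch, bucket="<checkpoint-bucket>", prefix="run-0001"):
    local_path = f"/tmp/checkpoint-ep{epoch}.pt"
    torch.save(state_dict, local_path)        # the local save blocks only briefly

    def _upload():
        boto3.client("s3").upload_file(local_path, bucket, f"{prefix}/checkpoint-ep{epoch}.pt")

    threading.Thread(target=_upload, daemon=True).start()   # upload proceeds while training continues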

Front end and user management

User management is a key usability strength of SageMaker compared to legacy shared HPC infrastructure. SageMaker Training permissions are governed by several AWS Identity and Access Management (IAM) abstractions:

  • Principals—users and systems—are given permission to launch resources
  • Training jobs carry roles themselves, which allow them to have permissions of their own, for example regarding data access and service invocation

Additionally, in 2022 we added SageMaker Role Manager to facilitate the creation of persona-driven permissions.

Conclusion

With SageMaker Training, you can reduce costs and increase iteration speed on your large-model training workload. We have documented success stories in numerous posts and case studies.

If you’re looking to improve your LLM time-to-market while reducing your costs, take a look at the SageMaker Training API and let us know what you build!

Special thanks to Amr Ragab, Rashika Kheria, Zmnako Awrahman, Arun Nagarajan, Gal Oshri for their helpful reviews and teachings.


About the Authors

Anastasia Tzeveleka is a Machine Learning and AI Specialist Solutions Architect at AWS. She works with customers in EMEA and helps them architect machine learning solutions at scale using AWS services. She has worked on projects in different domains including Natural Language Processing (NLP), MLOps and Low Code No Code tools.

Gili Nachum is a senior AI/ML Specialist Solutions Architect who works as part of the EMEA Amazon Machine Learning team. Gili is passionate about the challenges of training deep learning models, and how machine learning is changing the world as we know it. In his spare time, Gili enjoys playing table tennis.

Olivier Cruchant is a Principal Machine Learning Specialist Solutions Architect at AWS, based in France. Olivier helps AWS customers – from small startups to large enterprises – develop and deploy production-grade machine learning applications. In his spare time, he enjoys reading research papers and exploring the wilderness with friends and family.

Bruno Pistone is an AI/ML Specialist Solutions Architect for AWS based in Milan. He works with customers of any size, helping them deeply understand their technical needs and design AI and machine learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His fields of expertise are end-to-end machine learning, machine learning industrialization, and MLOps. He enjoys spending time with his friends and exploring new places, as well as traveling to new destinations.


Index your Microsoft Exchange content using the Exchange connector for Amazon Kendra

Index your Microsoft Exchange content using the Exchange connector for Amazon Kendra

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.

Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to pull together data across several structured and unstructured repositories to index and search on.

One such unstructured data repository is Microsoft Exchange. Email conversations contain important messages exchanged between various parties over time. Users often attach documents containing valuable information in the context of that email. In addition to emails, an Exchange account gives access to other valuable sources of information like calendar entries, OneNote notebooks, and contacts.

We’re excited to announce that you can now use the Amazon Kendra connector for Microsoft Exchange to search information stored in your Exchange account. In this post, we show how to index information stored in Exchange and use the Amazon Kendra intelligent search function. In addition, the ML-powered intelligent search can accurately find information in unstructured documents with natural language narrative content, for which keyword search is not very effective.

Solution overview

With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index an Exchange repository or folder using the Amazon Kendra connector for Exchange. The solution consists of the following steps:

  1. Configure an app on Exchange and get the connection details.
  2. Store the details in AWS Secrets Manager.
  3. Create an Exchange data source via the Amazon Kendra console.
  4. Index the data in the Exchange repository.
  5. Run a sample query to test the solution.

Prerequisites

To try out the Amazon Kendra connector for Exchange, you need the following:

Configure an Exchange app and gather connection details

Before we set up the Exchange data source, we need a few details about your Exchange repository. Let’s gather those in advance.

  1. Log in to the Azure portal using your global admin user account and choose Next.
  2. Enter your password and choose Sign in.
  3. On the Azure welcome page, choose App registrations.
  4. Choose New registration.
  5. Enter a name for the app (for example, my-exchange-app) and choose Register.
  6. Note down the tenant ID (you need it when setting up the data source for Amazon Kendra).
  7. Under Client credentials, choose Add a certificate or secret.
  8. Choose New client secret.
  9. Enter a description (for example, my exchange secret).
  10. Choose an expiration period (for this post, 6 months).
  11. Choose Add.
  12. Note the secret ID and value to use later when setting up the data source.
  13. In the navigation pane, choose API permissions.

This is where you can add or remove admin permissions.

  1. For this post, leave the defaults as is.

Store Exchange credentials in Secrets Manager

To store your Exchange credentials in Secrets Manager, complete the following steps:

  1. On the Secrets Manager console, choose Store a new secret.
  2. Select Other type of secret.
  3. Create two key-value pairs for clientid and clientsecret and enter the values saved from Exchange.
  4. Choose Next.
  5. For Secret name, enter a name (for example, AmazonKendra-my-exchange-secret).
  6. Enter an optional description.
  7. Choose Next.
  8. In the Configure rotation section, keep all settings at their defaults and choose Next.
  9. On the Review page, choose Store.
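If you prefer to script this step, the following minimal sketch creates an equivalent secret with Boto3; the secret name and key names mirror the console steps above, and the values are placeholders.

import json
import boto3

secrets = boto3.client("secretsmanager")
secrets.create_secret(
    Name="AmazonKendra-my-exchange-secret",
    Description="Client credentials for the Amazon Kendra Exchange connector",
    SecretString=json.dumps({
        "clientid": "<client ID saved from Exchange>",
        "clientsecret": "<client secret value saved from Exchange>",
    }),
)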

Configure the Amazon Kendra connector for Exchange

To configure the Amazon Kendra connector, complete the following steps:

  1. On the Amazon Kendra console, choose Create an Index.
  2. For Index name, enter a name for the index (for example, my-exchange-index).
  3. Enter an optional description.
  4. For Role name, enter an IAM role name.
  5. Configure optional encryption settings and tags.
  6. Choose Next.
  7. For Specify provisioning, select Developer edition and choose Next.
  8. In the Configure user access control section, leave the settings at their defaults and choose Next.
  9. On the review page, choose Create.

This creates and propagates the IAM role and then creates the Amazon Kendra index, which can take up to 30 minutes.

Create an Exchange data source

Complete the following steps to create your data source:

  1. On the Amazon Kendra console, choose Data sources in the navigation pane.
  2. Under Microsoft Exchange, choose Add connector.
  3. For Data source name, enter a name (for example, my-exchange-data-source).
  4. Enter an optional description.
  5. Choose Next.
  6. For Tenant ID, choose the tenant ID you collected earlier.
  7. For AWS Secrets Manager secret, choose the secret you created earlier.
  8. For IAM role, choose Create a new role.
  9. For Role name, enter a name (for example, AmazonKendra-myexchange-datasource-role).
  10. Choose Next.
  11. For User email ID, you can enter a list of email IDs. To capture content from all users, leave the field blank.

We have kept the default selections, but you can fine-tune your selection of content as needed.

  1. For Sync mode, select Full sync (this is the first time and we need to import all content).
  2. For Frequency, choose Run on demand.
  3. Choose Next.
  4. Set any optional field mappings and choose Next.
  5. Choose Review and Create and choose Add data source.
  6. Choose Sync now.
  7. Wait for the sync to complete.

Test the solution

Now that you have ingested the content from your Exchange account into your Amazon Kendra index, you can test some queries.

  1. Go to your index and choose Search indexed content.
  2. Enter a sample search query and test out your search results (your query will vary based on the contents of your account).

The Exchange connector also crawls local identity information from Exchange. You can use this feature to narrow down your query by user.

  1. To use this feature, go back to the search results page.
  2. Expand Test query with user name or groups and choose Apply user name or groups.

For Microsoft Exchange, we don’t import groups; we import only user names, which in this case are email IDs.

  1. Enter the user ID (email) of your user and choose Apply.
  2. Rerun your search query.

This brings you a filtered set of results based on your criteria.

  1. Go back to the search page and enter the name of a user who doesn’t have access to this content, then choose Apply.
  2. Run the same query again.

When fronting Amazon Kendra with an application, such as one built using Experience Builder, you can pass the user identity (in the form of the email ID) to Amazon Kendra to ensure that each user only sees content specific to their user ID. Alternatively, you can use AWS IAM Identity Center (successor to AWS Single Sign-On) to control the user context passed to Amazon Kendra to limit queries by user.
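As a minimal sketch of what such an application can do, the following Boto3 call runs a query and passes the user identity; the index ID, query text, and email address are placeholders.

import boto3

kendra = boto3.client("kendra")
response = kendra.query(
    IndexId="<your-kendra-index-id>",
    QueryText="what is the project timeline",
    UserContext={"UserId": "user@yourdomain.com"},   # limits results to documents this user can access
)
for item in response["ResultItems"]:
    print(item["Type"], item.get("DocumentTitle", {}).get("Text"))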

Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your Exchange account.

Limitations

This solution has the following limitations:

  • Multiple domain emails are not supported.
  • Sticky notes are not supported.
  • Incremental updates are valid only for a specific period (7 days) before the client application needs to run a full synchronization again.
  • Exchange Online has rate limits that govern the speed of ingestion. For more information, refer to Exchange Online limits.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Exchange, delete that data source.

Conclusion

With the Microsoft Exchange connector for Amazon Kendra, organizations can tap into the repository of information stored in their account securely using intelligent search powered by Amazon Kendra.

To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data from Exchange, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.


About the author

Ashish Lagwankar is a Senior Enterprise Solutions Architect at AWS. His core interests include AI/ML, serverless, and container technologies. Ashish is based in the Boston, MA, area and enjoys reading, outdoors, and spending time with his family.


Achieve rapid time-to-value business outcomes with faster ML model training using Amazon SageMaker Canvas

Achieve rapid time-to-value business outcomes with faster ML model training using Amazon SageMaker Canvas

Machine learning (ML) can help companies make better business decisions through advanced analytics. Companies across industries apply ML to use cases such as predicting customer churn, demand forecasting, credit scoring, predicting late shipments, and improving manufacturing quality.

In this blog post, we’ll look at how Amazon SageMaker Canvas delivers faster and more accurate model training times enabling iterative prototyping and experimentation, which in turn speeds up the time it takes to generate better predictions.

Training machine learning models

SageMaker Canvas offers two methods to train ML models without writing code: Quick build and Standard build. Both methods deliver a fully trained ML model, including column impact for tabular data, with Quick build focusing on speed and experimentation and Standard build providing the highest levels of accuracy.

With both methods, SageMaker Canvas pre-processes the data, chooses the right algorithm, explores and optimizes the hyperparameter space, and generates the model. This process is abstracted from the user and done behind the scenes, allowing the user to focus on the data and the results rather than the technical aspects of model training.

Housing Regression Build

Faster model training times

Previously, quick build models took up to 20 minutes and standard build models took up to 4 hours to generate a fully trained model with feature importance. With new performance optimizations, you can now get a quick build model in less than 7 minutes and a standard build model in less than 2 hours, depending on the size of your dataset. We estimated these numbers by running benchmark tests on datasets ranging from 0.5 MB to 100 MB in size.

Under the hood, SageMaker Canvas uses multiple AutoML technologies to automatically build the best ML models for your data. Considering the heterogeneous characteristics of datasets, it’s difficult to know in advance which algorithm best fits a particular dataset. The newly introduced performance optimizations in SageMaker Canvas run several trials across different algorithms and train a series of models behind the scenes, before returning the best model for the given dataset.

The configurations across all these trials are run in parallel for each dataset to find the best configuration in terms of performance and latency. The configuration tests include objective metrics such as F1 scores and Precision, and tune algorithm hyperparameters to produce optimal scores for these metrics.

Improved and accelerated model training times now enable you to prototype and experiment rapidly, resulting in quicker time to value for generating predictions using SageMaker Canvas.

Housing Regression Analyze

Summary

Amazon SageMaker Canvas enables you to get a fully trained ML model in under 7 minutes, and helps generate accurate predictions for multiple machine learning problems. With faster model training times, you can focus on understanding your data and analyzing its impact, and achieve effective business outcomes.

This capability is available in all AWS Regions where SageMaker Canvas is supported. You can learn more on the SageMaker Canvas product page and in the documentation.


About the Authors

Ajjay Govindaram is a Senior Solutions Architect at AWS. He works with strategic customers who are using AI/ML to solve complex business problems. His experience lies in providing technical direction as well as design assistance for modest to large-scale AI/ML application deployments. His knowledge ranges from application architecture to big data, analytics, and machine learning. He enjoys listening to music while resting, experiencing the outdoors, and spending time with his loved ones.

Meenakshisundaram Thandavarayan is a Senior AI/ML specialist with AWS. He helps hi-tech strategic accounts on their AI and ML journey. He is very passionate about data-driven AI.

Hariharan Suresh is a Senior Solutions Architect at AWS. He is passionate about databases, machine learning, and designing innovative solutions. Prior to joining AWS, Hariharan was a product architect, core banking implementation specialist, and developer, and worked with BFSI organizations for over 11 years. Outside of technology, he enjoys paragliding and cycling.


Accelerate hyperparameter grid search for sentiment analysis with BERT models using Weights & Biases, Amazon EKS, and TorchElastic

Accelerate hyperparameter grid search for sentiment analysis with BERT models using Weights & Biases, Amazon EKS, and TorchElastic

Financial market participants are faced with an overload of information that influences their decisions, and sentiment analysis stands out as a useful tool to help separate out the relevant and meaningful facts and figures. However, the same piece of news can have a positive or negative impact on stock prices, which presents a challenge for this task. Sentiment analysis and other natural language processing (NLP) tasks often start out with pre-trained NLP models and implement fine-tuning of the hyperparameters to adjust the model to changes in the environment. Transformer-based language models such as BERT (Bidirectional Encoder Representations from Transformers) have the ability to capture words or sentences within a bigger context of data, and allow for the classification of the news sentiment given the current state of the world. To account for changes in the economic environment, the model needs to be fine-tuned once more when the data starts drifting or the model’s prediction accuracy starts to degrade.

Hyperparameter optimization is highly computationally demanding for deep learning models. The architectural complexity increases when a single model training run requires multiple GPUs. In this post, we use the Weights & Biases (W&B) Sweeps function and Amazon Elastic Kubernetes Service (Amazon EKS) to address these challenges. Amazon EKS is a highly available managed Kubernetes service that automatically scales instances based on load, and is well suited for running distributed training workloads.

In our solution, we implement a hyperparameter grid search on an EKS cluster for tuning a bert-base-cased model for classifying positive or negative sentiment for stock market data headlines. The code can be found on the GitHub repo.

Solution overview

In this post, we present an overview of the solution architecture and discuss its key components. More specifically, we discuss the following:

  • How to set up an EKS cluster with a scalable file system
  • How to train PyTorch models using TorchElastic
  • Why the W&B platform is the right choice for machine learning (ML) experimentation and hyperparameter grid search
  • A solution architecture integrating W&B with EKS and TorchElastic

Prerequisites

To follow along with the solution, you should have an understanding of PyTorch, distributed data parallel (DDP) training, and Kubernetes.

Set up an EKS cluster with a scalable file system

One way to get started with Amazon EKS is aws-do-eks, which is an open-source project offering easy-to-use and configurable scripts and tools to provision EKS clusters and run distributed training jobs. This project is built following the principles of the Do Framework: simplicity, intuitiveness, and productivity. A desired cluster can simply be configured using the eks.conf file and launched by running the eks-create.sh script. Detailed instructions are provided in the GitHub repository for aws-do-eks.

The following diagram illustrates the EKS cluster architecture.

Some helpful tips when creating an EKS cluster with aws-do-eks:

  • Make sure CLUSTER_REGION in eks.conf is the same as your default Region when you run aws configure.
  • Creating an EKS cluster can take up to 30 minutes. We recommend creating an aws-do-eks container like the GitHub repo suggests to ensure consistency and simplicity because the container has all the necessary tools such as kubectl, aws cli, eksctl, and so on. Then you can exec into the container and run ./eks-create.sh to launch the cluster.
  • Unless you specify Spot Instances in eks.conf, instances will be created on demand.
  • You can specify custom AMIs or specific zones for different instance types.
  • The ./eks-create.sh script will create the VPC, subnets, auto scaling groups, the EKS cluster, its nodes, and any other necessary resources. This will create one instance of each type. Then ./eks-scale.sh will scale your node groups to the desired sizes.
  • After the cluster is created, AWS Identity and Access Management (IAM) roles are generated with Amazon EKS related policies for each instance type. Policies may be needed to access Amazon Simple Storage Service (Amazon S3) or other services with these roles.
  • The following are common reasons why the ./eks-create.sh script might give an error:
    • Node groups fail to get created because of insufficient capacity. Check instance availability in the requested Region and your capacity limits.
    • A specific instance type may not be available or supported in a given zone.
    • The EKS cluster creation AWS CloudFormation stacks aren’t properly deleted. Check the active CloudFormation stacks to see if stack deletion has failed.

A scalable shared file system is needed so that multiple compute nodes in the EKS cluster can access it concurrently. In this post, we use Amazon Elastic File System (Amazon EFS) as a shared file system that is elastic and provides high throughput. The scripts in aws-do-eks/Container-Root/eks/deployment/csi/ provide instructions to mount Amazon EFS on an EKS cluster. After the cluster is created and the node groups are scaled to the desired number of instances, you can view the running pods with kubectl get pod -A. Here the aws-node-xxxx, kube-proxy-xxxx, and nvidia-device-plugin-daemonset-xxxx pods run on each of the three compute nodes, and we have one system node in the kube-system namespace.

Before proceeding to create and mount an EFS volume, make sure you are in the kube-system namespace. If not, you can change it with the following code:

kubectl config set-context --current --namespace=kube-system

Then view the running pods with kubectl get pod -A.

The efs-create.sh script will create the EFS volume and mount targets in each subnet and the persistent volume. Then a new EFS volume will be visible on the Amazon EFS console.

Next, run the ./deploy.sh script to get the EFS file system ID, deploy an EFS-CSI driver on each node group, and mount the EFS persistent volume using the efs-sc.yaml and efs-pv.yaml manifest files. You can validate whether a persistent volume is mounted by checking kubectl get pv. You can also run kubectl apply -f efs-share-test.yaml, which will spin up an efs-share-test pod in the default namespace. This is a test pod that writes “hello from EFS” in the /shared-efs/test.txt file. You can exec into a pod using kubectl exec -it <pod-name> -- bash. To move data from Amazon S3 to Amazon EFS, efs-data-prep-pod.yaml gives an example manifest file, assuming a data-prep.sh script exists in a Docker image that copies data from Amazon S3 to Amazon EFS.

If your model training needs higher throughput, Amazon FSx for Lustre might be a better option.

Train PyTorch models using TorchElastic

For deep learning models that train on amounts of data too large to fit in memory on a single GPU, DistributedDataParallel (PyTorch DDP) will enable the sharding of large training data into mini batches across multiple GPUs and instances, reducing training time.

TorchElastic is a PyTorch library developed with a native Kubernetes strategy supporting fault tolerance and elasticity. When training on Spot Instances, the training needs to be fault tolerant and able to resume from the epoch where the compute nodes left when the Spot Instances were last available. Elasticity allows for the seamless addition of new compute resources when available or removal of resources when they are needed elsewhere.

The following figure illustrates the architecture for DistributedDataParallel with TorchElastic. TorchElastic for Kubernetes consists of two components: TorchElastic Kubernetes Controller and the parameter server (etcd). The controller is responsible for monitoring and managing the training jobs, and the parameter server keeps track of the training job workers for distributed synchronization and peer discovery.

W&B platform for ML experimentation and hyperparameter grid search

W&B helps ML teams build better models faster. With just a few lines of code, you can instantly debug, compare, and reproduce your models—architecture, hyperparameters, git commits, model weights, GPU usage, datasets, and predictions—while collaborating with your teammates.

W&B Sweeps is a powerful tool to automate hyperparameter optimization. It allows developers to set up the hyperparameter search strategy, including grid search, random search, or Bayesian search, and it will automatically implement each training run.

To try W&B for free, sign up at Weights & Biases, or visit the W&B AWS Marketplace listing.

Integrate W&B with Amazon EKS and TorchElastic

The following figure illustrates the end-to-end process flow to orchestrate multiple DistributedDataParallel training runs on Amazon EKS with TorchElastic based on a W&B sweep config. Specifically, the steps involved are:

  1. Move data from Amazon S3 to Amazon EFS.
  2. Load and preprocess data with W&B.
  3. Build a Docker image with the training code and all necessary dependencies, then push the image to Amazon ECR.
  4. Deploy the TorchElastic controller.
  5. Create a W&B sweep config file containing all hyperparameters that need to be swept and their ranges.
  6. Create a yaml manifest template file that takes inputs from the sweep config file.
  7. Create a Python job controller script that creates N training manifest files, one for each training run, and submits the jobs to the EKS cluster.
  8. Visualize results on the W&B platform.

In the following sections, we walk through each step in more detail.

Move data from Amazon S3 to Amazon EFS

The first step is to move training, validation, and test data from Amazon S3 to Amazon EFS so all EKS compute nodes can access it. The s3_efs folder has the scripts to move data from Amazon S3 to Amazon EFS. Following the Do Framework, we need a basic Dockerfile that creates a container with a data-prep.sh script, build.sh script, and push.sh script to build the image and push it to Amazon ECR. After a Docker image is pushed to Amazon ECR, you can use the efs-data-prep-pod.yaml manifest file (see the following code), which you can run like kubectl apply -f efs-data-prep-pod.yaml to run the data-prep.sh script in a pod:

apiVersion: v1
kind: ConfigMap
metadata:
  name: efs-data-prep-map
data:
  S3_BUCKET: <S3 Bucket URI with data>
  MOUNT_PATH: /shared-efs
---
apiVersion: v1
kind: Pod
metadata:
  name: efs-data-prep-pod
spec:
  containers:
  - name: efs-data-prep-pod
    image: <Path to Docker image in ECR>
    envFrom:
    - configMapRef:
        name: efs-data-prep-map
    command: ["/bin/bash"]
    args: ["-c", "/data-prep.sh $(S3_BUCKET) $(MOUNT_PATH)"]
    volumeMounts:
    - name: efs-pvc
      mountPath: /shared-efs
  volumes:
  - name: efs-pvc
    persistentVolumeClaim:
      claimName: efs-claim
  restartPolicy: Never

Load and preprocess data with W&B

The process to submit a preprocessing job is very similar to the preceding step, with a few exceptions. Instead of a data-prep.sh script, you likely need to run a Python job to preprocess the data. The preprocess folder has the scripts to run a preprocessing job. The pre-process_data.py script accomplishes two tasks: it takes in the raw data in Amazon EFS and splits it into train and test files, then it adds the data to the W&B project.

Build a Docker image with training code

main.py demonstrates how to implement DistributedDataParallel training with TorchElastic. For compatibility with W&B, it’s standard practice to add WANDB_API_KEY as an environment variable and add wandb.login() at the very beginning of the code. In addition to the standard arguments (number of epochs, batch size, number of workers for the data loader), we need to pass in wandb_project name and sweep_id as well.

In the main.py code, the run() function stores the end-to-end pipeline for the following actions:

  • Initializing wandb on node 0 for logging results
  • Loading the pre-trained model and setting up the optimizer
  • Initializing custom training and validation data loaders
  • Loading and saving checkpoints at every epoch
  • Looping through the epochs and calling the training and validation functions
  • After training is done, running predictions on the specified test set

The training, validation, custom data loader, and collate functions don’t need to be changed to log results to W&B. For a distributed training setup, we need to add the following block of code to log on the node 0 process. Here, args are the parameters for the training function in addition to the sweep ID and W&B project name:

if local_rank == 0:
  wandb.init(config=args, project=args.wandb_project)
  args = wandb.config
  do_log = True
else:
  do_log = False

For more information on W&B and distributed training, refer to Log distributed training experiments.

In the main() function, you can call the run() function as shown in the following code. Here the wandb.agent is the orchestrator of the sweep, but because we’re running multiple training jobs on Amazon EKS in parallel, we need to specify count = 1:

wandb.require("service")
   wandb.setup()

   if args.sweep_id is not None:
       wandb.agent(args.sweep_id, lambda: run(args), project=args.wandb_project, count = 1)
   else:
       run(args=args)

The Dockerfile installs the necessary dependencies for PyTorch, HuggingFace, and W&B, and specifies a Python call to torch.distributed.run as an entry point.

Deploy a TorchElastic Controller

Before training, we need to deploy a TorchElastic Controller for Kubernetes, which manages a Kubernetes custom resource ElasticJob to run TorchElastic workloads on Kubernetes. We also deploy a pod running the etcd server by running the script deploy.sh. It is recommended to delete and restart the etcd server when restarting a fresh training job.

W&B sweep config

After setting up the cluster and the container, we set up multiple runs in parallel with slightly different parameters in order to improve our model performance. W&B Sweeps will automate this kind of exploration. We set up a configuration file where we define the search strategy, the metric to monitor, and the parameters to explore. The following code shows an example sweep config file:

method: bayes
metric:
  name: val_loss
  goal: minimize
parameters:
  learning_rate:
    min: 0.001
    max: 0.1
  optimizer:
    values: ["adam", "sgd"]

For more details on how to configure your sweeps, follow the W&B Sweeps Quickstart.
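As a minimal sketch, you can register the sweep configuration with W&B to obtain the sweep ID that is later passed to each training job; the project name is illustrative.

import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 0.001, "max": 0.1},
        "optimizer": {"values": ["adam", "sgd"]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="aws_eks_demo")
print(sweep_id)   # pass this value as --sweep_id to the training jobs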

Create a train.yaml template

The following code is an example of the train.yaml template that we need to create. The Python job controller will take this template and generate one training .yaml file for each run in the hyperparameter grid search. Some key points to note are:

  • The kubernetes.io/instance-type value takes in the name of the instance type of the EKS compute nodes.
  • The args section includes all parameters that the py code takes in as arguments, including number of epochs, batch size, number of data loader workers, sweep_id, wandb project name, checkpoint file location, data directory location, and so on.
  • The --nproc_per_node and nvidia.com/gpu values take in the number of GPUs you want to use for training. For example, in the following config, we have p3.8xlarge as the EKS compute nodes, which have 4 Nvidia Tesla V100 GPUs, and in each training run we use 2 GPUs. We can kick off six training runs in parallel that will exhaust all available 12 GPUs, thereby ensuring high GPU utilization.
apiVersion: elastic.pytorch.org/v1alpha1
kind: ElasticJob
metadata:
 name: wandb-finbert-baseline
 #namespace: elastic-job
spec:
 # Use "etcd-service:2379" if you already apply etcd.yaml
 rdzvEndpoint: etcd-service:2379
 minReplicas: 1
 maxReplicas: 128
 replicaSpecs:
   Worker:
     replicas: 1
     restartPolicy: ExitCode
     template:
       apiVersion: v1
       kind: Pod
       spec:
         nodeSelector:
           node.kubernetes.io/instance-type: p3.8xlarge
         containers:
         - name: elasticjob-worker
           image: <path to docker image in ECR>
           imagePullPolicy: Always
           env:
           - name: NCCL_DEBUG
             value: INFO
             #  - name: NCCL_SOCKET_IFNAME
             #    value: lo
             #  - name: FI_PROVIDER
             #    value: sockets
           args:
           - "--nproc_per_node=2"
           - "/workspace/examples/huggingface/main.py"
           - "--data=/shared-efs/wandb-finbert/"
           - "--epochs=1"
           - "--batch-size=16"
           - "--workers=6"
           - "--wandb_project=aws_eks_demo"
           - "--sweep_id=jba9d36p"
           - "--checkpoint-file=/shared-efs/wandb-finbert/job-z74e8ix8/run-baseline/checkpoint.tar"
           resources:
             limits:
               nvidia.com/gpu: 2
           volumeMounts:
           - name: efs-pvc
             mountPath: /shared-efs
           - name: dshm
             mountPath: /dev/shm
         volumes:
         - name: efs-pvc
           persistentVolumeClaim:
             claimName: efs-claim
         - name: dshm
           emptyDir:
             medium: Memory

Create a grid search job controller

The script run-grid.py is the key orchestrator that takes in a TorchElastic training .yaml template and W&B sweep config file, generates multiple training manifest files, and submits them.
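The following is a minimal sketch of what such a controller can look like (not the repository’s actual run-grid.py), assuming a train.yaml.template file with {run_name} and {sweep_id} placeholders and no other curly braces.

import subprocess

def submit_runs(sweep_id: str, num_runs: int):
    template = open("train.yaml.template").read()
    for i in range(num_runs):
        manifest = template.format(run_name=f"wandb-finbert-run-{i}", sweep_id=sweep_id)
        path = f"train-{i}.yaml"
        with open(path, "w") as f:
            f.write(manifest)
        # Submit the ElasticJob to the EKS cluster; W&B Sweeps assigns each run its hyperparameters
        subprocess.run(["kubectl", "apply", "-f", path], check=True)

submit_runs(sweep_id="jba9d36p", num_runs=6)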

Visualize the results

We set up an EKS cluster with three p3.8xlarge instances with 4 Tesla V100 GPUs each. We set up six parallel runs with 2 GPUs each, while varying learning rate and weight decay parameters for the Adam optimizer. Each individual training run would take roughly 25 minutes, so the entire hyperparameter grid could be swept in 25 minutes when operating in parallel as opposed to 150 minutes if operating sequentially. If desired, a single GPU can be used for each training round by changing the --nproc_per_node and nvidia.com/gpu values in the training .yaml template.

TorchElastic implements elasticity and fault tolerance. In this work, we are using On-Demand instances, but a cluster of Spot Instances can be generated with a few changes in the EKS config. If an instance becomes available at a later time and needs to be added to the training pool while the training is going on, we just need to update the training .yaml template and resubmit it. The rendezvous functionality of TorchElastic will assimilate the new instance in the training job dynamically.

After the grid search job controller is running, you can see all six Kubernetes jobs with kubectl get pod -A. There is one job per training run, and each job has one worker per node. To see the logs for a pod, tail them with kubectl logs -f <pod-name>; kubetail displays the logs of all pods for each training job simultaneously. When the grid controller starts, it prints a link to the W&B platform where you can view the progress of all jobs.
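
If you prefer to script the same checks, the following sketch uses the official Kubernetes Python client to list pods and stream a pod's logs; the pod name is a placeholder you would replace with one of your training workers.

# Optional scripted equivalent of kubectl get pod -A and kubectl logs -f,
# using the Kubernetes Python client (pip install kubernetes).
from kubernetes import client, config, watch

config.load_kube_config()                      # uses your local kubeconfig, like kubectl
v1 = client.CoreV1Api()

# List all pods across namespaces (equivalent to: kubectl get pod -A)
for pod in v1.list_pod_for_all_namespaces().items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)

# Stream logs for one training worker pod (pod name and namespace are placeholders)
w = watch.Watch()
for line in w.stream(v1.read_namespaced_pod_log,
                     name="wandb-finbert-run-0-worker-0",
                     namespace="default"):
    print(line)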

The following parallel coordinates graph visualizes all grid search runs with respect to test accuracy in one plot, including those that didn’t finish. We got the highest test accuracy with a learning rate of 9.1e-4 and weight decay of 8.5e-3.

The following dashboard visualizes all grid search runs together for all metrics.

Clean up

It’s important to spin down resources after model training in order to avoid costs associated with running idle instances. With each script that creates resources, the GitHub repo provides a matching script to delete them. To clean up our setup, we must delete the EFS file system before deleting the cluster because it’s associated with a subnet in the cluster’s VPC. To delete the EFS file system, run the following command (from inside the efs folder):

./efs-delete.sh

Note that this will not only delete the persistent volume, it will also delete the EFS file system, and all the data on the file system will be lost. When this step is complete, delete the cluster by using the following script in the eks folder:

./eks-delete.sh

This will delete all the existing pods, remove the cluster, and delete the VPC created in the beginning.

Conclusion

In this post, we showed how to use an EKS cluster with Weights & Biases to accelerate hyperparameter grid search for deep learning models. Weights & Biases and Amazon EKS enable you to orchestrate multiple training runs in parallel to reduce the time and cost of fine-tuning your deep learning model. We have published the GitHub repo, which gives you step-by-step instructions to create an EKS cluster, set up Weights & Biases and TorchElastic for distributed data parallel training, and kick off grid search runs on Amazon EKS with one click.


About the authors

Ankur Srivastava is a Sr. Solutions Architect in the ML Frameworks Team. He focuses on helping customers with self-managed distributed training and inference at scale on AWS. His experience includes industrial predictive maintenance, digital twins, and probabilistic design optimization. He completed his doctoral studies in Mechanical Engineering at Rice University and postdoctoral research at the Massachusetts Institute of Technology.

Thomas Chapelle is a Machine Learning Engineer at Weights and Biases. He is responsible for keeping the www.github.com/wandb/examples repository live and up to date. He also builds content on MLOps, applications of W&B to industries, and fun deep learning in general. Previously, he used deep learning to solve short-term forecasting for solar energy. He has a background in Urban Planning, Combinatorial Optimization, Transportation Economics, and Applied Math.

Scott Juang is the Director of Alliances at Weights & Biases. Prior to W&B, he led a number of strategic alliances at AWS and Cloudera. Scott studied Materials Engineering and has a passion for renewable energy.

Ilan Gleiser is a Principal Global Impact Computing Specialist at AWS leading the Circular Economy, Responsible AI and ESG businesses. He is an Expert Advisor of Digital Technologies for Circular Economy with United Nations. Prior to AWS, he led AI Enterprise Solutions at Wells Fargo. He spent 10 years as Head of Morgan Stanley’s Algorithmic Trading Division in San Francisco.

Ana Simoes is a Principal ML Specialist at AWS focusing on GTM strategy for startups in the emerging technology space. Ana has had several leadership roles at startups and large corporations such as Intel and eBay, leading ML inference and linguistics related products. Ana has a Masters in Computational Linguistics and an MBA from Haas/UC Berkeley, and has been a visiting scholar in Linguistics at Stanford. She has a technical background in AI and Natural Language Processing.

Read More

Search for answers accurately using Amazon Kendra S3 Connector with VPC support

Search for answers accurately using Amazon Kendra S3 Connector with VPC support

Amazon Kendra is an easy-to-use intelligent search service that allows you to integrate search capabilities with your applications so users can find information stored across data sources like Amazon Simple Storage Service (Amazon S3), OneDrive, and Google Drive; applications such as Salesforce, SharePoint, and ServiceNow; and relational databases like Amazon Relational Database Service (Amazon RDS). Using Amazon Kendra connectors enables you to synchronize data from multiple content repositories with your Amazon Kendra index. When end-users ask natural language questions, Amazon Kendra uses machine learning (ML) algorithms to understand the context and return the most relevant answers.

The Amazon Kendra S3 connector supports indexing documents and their associated metadata stored in an S3 bucket. You often want to make sure that applications running inside a VPC have access only to specific S3 buckets, and in many cases the connection must not traverse the internet to reach public endpoints. Many customers also own multiple S3 buckets, some of which are accessible through VPC endpoints for Amazon S3. In this post, we describe how to use the updated Amazon Kendra S3 connector with VPC support to connect through VPC endpoints.

This post provides the steps to help you create an enterprise search engine on AWS using Amazon Kendra by connecting documents stored in an S3 bucket that is only accessible from within a VPC. For more information, see enhancing enterprise search with Amazon Kendra. The post also demonstrates how to configure your connector for Amazon S3 and how your index syncs with your data source when the data source content changes.

Overview of solution

There are three main improvements to the Amazon Kendra S3 connector:

  1. VPC support – The connector now supports using your Amazon Virtual Private Cloud (Amazon VPC) networks. You can now securely connect to Amazon S3 through VPC endpoints for Amazon S3 by specifying the VPC connection, subnets, and security groups.
  2. Two sync modes – When you schedule a sync of an Amazon S3 data source to an Amazon Kendra index, you can now choose to run in Full sync mode or New, modified and deleted document sync mode. In Full sync mode, every time the synchronization runs, it scans objects in every folder under the root path it was configured to crawl and re-ingests all documents. This full refresh enables you to reset the index without having to delete and create a new data source. In the New, modified and deleted document sync mode, every time the sync job runs, it processes only objects that were added, modified, or deleted since the last crawl. Incremental crawls can reduce runtime and cost for datasets that regularly append new objects to existing data sources.
  3. Additional inclusion and exclusion patterns for documents – In addition to prefixes, we’re introducing patterns for inclusion or exclusion of documents from your index. The two supported pattern types are Unix-style globs and file types. You can now add a regular expression pattern to include specific folders or exclude folders, file types, or specific files from your data source. This is useful for shared data repositories that contain content belonging to different categories, classifications, and file types. The Boto3 sketch after this list illustrates how these connector options could look in code.
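
The following is a minimal Boto3 sketch of how the S3 connector options described above could be configured programmatically, rather than through the console steps later in this post. The index ID, role ARN, bucket name, subnets, security groups, and patterns are placeholders, and you should confirm in the Boto3 Amazon Kendra documentation which console options (such as the sync mode) are exposed through the API.

# Hypothetical Boto3 sketch of the S3 connector configuration described above.
# Index ID, role ARN, bucket, subnets, security groups, and patterns are placeholders.
import boto3

kendra = boto3.client("kendra")

response = kendra.create_data_source(
    IndexId="<your-index-id>",
    Name="aws_white_paper",
    Type="S3",
    RoleArn="arn:aws:iam::<account-id>:role/AmazonKendra-source-role",
    Configuration={
        "S3Configuration": {
            "BucketName": "kendrapost-<your-account-id>",
            "InclusionPatterns": ["*.pdf"],          # optional glob patterns
            "ExclusionPatterns": ["archive/*"],
        }
    },
    VpcConfiguration={                               # VPC support for the connector
        "SubnetIds": ["subnet-xxxxxxxx"],
        "SecurityGroupIds": ["sg-xxxxxxxx"],
    },
)

# Start an on-demand sync, equivalent to choosing Sync now in the console
kendra.start_data_source_sync_job(
    Id=response["Id"],
    IndexId="<your-index-id>",
)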

Prerequisites

For this walkthrough, you should have the following prerequisites:

Create and configure your document repository

Before you can create an index in Amazon Kendra, you need to load documents into an S3 bucket. This section contains instructions to create an S3 bucket, get the files, and load them into the bucket. After completing all the steps in this section, you have a data source that Amazon Kendra can use.

  1. On the AWS Management Console, in the Region list, choose US East (N. Virginia) or any Region of your choice that Amazon Kendra is available in.
  2. Choose Services.
  3. Under Storage, choose S3.
  4. On the Amazon S3 console, choose Create bucket.
  5. Under General configuration, provide the following information:
    • For Bucket name, enter kendrapost-{your account id}.
    • For Region, choose the same Region that you use to deploy your Amazon Kendra index (this post uses us-east-1).
    • Under Bucket settings, for Block Public Access, leave everything with the default values.
  6. Under Advanced settings, leave everything with the default values.
  7. Choose Create bucket.
  8. Download AWS_Whitepapers.zip and unzip the files.
  9. On the Amazon S3 console, select the bucket that you just created and choose Upload.
  10. Upload the folders Best Practices, Databases, General, and Machine Learning from the unzipped file.

Inside your bucket, you should now see four folders.

Add a data source

A data source is a location that stores the documents for indexing. You can synchronize data sources automatically with an Amazon Kendra index to make sure that searches correctly reflect new, updated, or deleted documents in the source repositories.

After completing all the steps in this section, you’ll have a data source linked to Amazon Kendra. For more information, see Adding documents from a data source.

Before continuing, make sure that the index creation is complete and the index shows as Active. For more information, see Creating an Index.

  1. On the Amazon Kendra console, navigate to your index (for this post, kendra-blog-index).
  2. On the kendra-blog-index page, choose Add data sources.
  3. Under Amazon S3, choose Add connector.

For more information about the different data sources that Amazon Kendra supports, see Adding documents from a data source.

  1. In the Specify data source details section, for Data source name, enter aws_white_paper.
  2. For Description, enter AWS White Paper documentation.
  3. Choose Next.

Now you create an AWS Identity and Access Management (IAM) role for Amazon Kendra.

  1. On the Define access and security page, in the IAM role section, choose Create a new role.
  2. For Role name, enter source-role (your role name is prefixed with AmazonKendra-).
  3. In the Configure VPC and security section, choose your VPC, and enter your Subnets and VPC security groups.

For more information on connecting your Amazon Kendra to your Amazon Virtual Private Cloud, see Configuring Amazon Kendra to use a VPC.

  1. Choose Next.
  2. In the Configure sync settings page, for Enter the data source location, enter the S3 bucket you created: kendrapost-{your account id}.
  3. Leave Metadata files prefix folder location blank.

By default, metadata files are stored in the same directory as the documents. If you want to place these files in a different folder, you can add a prefix. For more information, see Amazon S3 document metadata.

  1. For Select decryption key, leave it deselected.
  2. For Additional configuration, you can add a pattern to include or exclude certain folders or files. For this post, keep the default values.
  3. For Sync mode, choose New, modified, or deleted documents sync.
  4. For Frequency, choose Run on demand.

This step defines the frequency with which the data source is synchronized with the Amazon Kendra index.

  1. Choose Next.
  2. In the Set field mappings page, keep the default values.
  3. Choose Next.
  4. On the Review and create page, choose Add data source.
  5. Navigate back to your Kendra index.
  6. Choose your Data Source, then choose Sync now to synchronize the documents with the Amazon Kendra index.

The duration of this process depends on the number of documents that you index. For this use case, it may take 15 minutes, after which you should see a message that the sync was successful. In the Sync run history section, you can see that 40 documents were synchronized.

Your Amazon Kendra index is now ready for natural language queries. When you search your index, Amazon Kendra uses all the data and metadata provided to return the most accurate answers to your search query. On the Amazon Kendra console, choose Search indexed content. In the query field, start with a query such as “Which AWS service has 11 nines of durability?”

For more information about querying the index, see Querying an Index.
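
If you prefer to query the index programmatically, the following Boto3 sketch issues the same question; the index ID is a placeholder.

# Query the Amazon Kendra index programmatically (index ID is a placeholder)
import boto3

kendra = boto3.client("kendra")

result = kendra.query(
    IndexId="<your-index-id>",
    QueryText="Which AWS service has 11 nines of durability?",
)

# Print the result type and an excerpt for each returned item
for item in result["ResultItems"]:
    print(item["Type"], "-", item["DocumentExcerpt"]["Text"][:200])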

Synchronize data source changes to search the index

Your data source is set up to sync any new, modified or deleted data. Before you can synchronize your data source incrementally with an index in Amazon Kendra, you need to load new documents into an S3 bucket.

  1. On the Amazon S3 console, select the bucket that you just created and choose Upload.
  2. Upload the folders Security and Well_Architected from the unzipped file.

Now you can synchronize the new documents added to the S3 bucket:

  1. On the Amazon Kendra console, choose Data sources and then select your S3 data source.
  2. Choose Sync Now.

The duration of this process depends on the number of documents that you index. For this use case, it may take 15 minutes, after which you should see a message that the sync was successful.

In the Sync run history section, you can see that 20 documents were synchronized.

Re-index the data source

In a scenario where the data source has stale information, you can now re-index the data source without having to delete and create a new data source. To modify the sync mode and re-index the data source, complete the following steps:

  1. On the Amazon Kendra console, choose Data sources and then select your S3 data source.
  2. On the Actions menu, choose Edit.
  3. Choose Next to move to Step 3 – Configure sync settings page.
  4. For Sync mode, select Full Sync.

  5. For Frequency, choose Run on demand.
  6. Choose Next.
  7. In the Set field mappings page, keep the default values.
  8. Choose Next.
  9. On the Review and create page, choose Update.

Now you can synchronize the new documents added to the S3 bucket.

  1. On the Amazon Kendra console, choose Data sources and then select your S3 data source.
  2. Choose Sync Now.

In the Sync run history section, you can see under the Modified column that all documents were synchronized, irrespective of their previous sync status.

Clean up

To avoid incurring future charges and to clean out unused roles and policies, delete the resources you created:

  1. On the Amazon Kendra console, choose Indexes in the navigation pane.
  2. Select the index you created and on the Actions menu, choose Delete.
  3. To confirm deletion, enter Delete when prompted and choose Delete.

Wait until you get the confirmation message; the process can take up to 15 minutes.

  1. On the Amazon S3 console, delete the S3 bucket.
  2. On the IAM console, delete the corresponding IAM roles.

Conclusion

In this post, you learned how to use Amazon Kendra to deploy an enterprise search service using a secure connection to Amazon S3 that doesn’t require an internet gateway or Network Address Translation (NAT) device. You can also enable quicker syncs for your documents by using the New, modified, or deleted documents sync mode.

There are many additional features that we didn’t cover. For example:

  • You can enable user-based access control for your Amazon Kendra index, and restrict access to documents based on the access controls you have already configured.
  • You can map object attributes to Amazon Kendra index attributes, and enable them for faceting, search, and display in the search results.
  • You can quickly find information from webpages (HTML tables) using Amazon Kendra tabular search.

To learn more about Amazon Kendra, refer to the Amazon Kendra Developer Guide.


About the Authors

Maran Chandrasekaran is a Senior Solutions Architect at Amazon Web Services, working with our enterprise customers. Outside of work, he loves to travel.

Arjun Agrawal is a Software Engineer at AWS, currently working with the Amazon Kendra team on an enterprise search engine. He is passionate about new technology and solving real-world problems. Outside of work, he loves to hike and travel.

Read More

Virtual fashion styling with generative AI using Amazon SageMaker 

Virtual fashion styling with generative AI using Amazon SageMaker 

The fashion industry is a highly lucrative business, with an estimated value of $2.1 trillion by 2025, as reported by the World Bank. This field encompasses a diverse range of segments, such as the creation, manufacture, distribution, and sales of clothing, shoes, and accessories. The industry is in a constant state of change, with new styles and trends appearing frequently. Therefore, fashion companies must be flexible and able to adapt in order to maintain their relevance and achieve success in the market.

Generative artificial intelligence (AI) refers to AI algorithms designed to generate new content, such as images, text, audio, or video, based on a set of learned patterns and data. It can be utilized to generate new and innovative apparel designs while offering improved personalization and cost-effectiveness. AI-driven design tools can create unique apparel designs based on input parameters or styles specified by potential customers through text prompts. Furthermore, AI can be utilized to personalize designs to the customer’s preferences. For example, a customer could select from a variety of colors, patterns, and styles, and AI models would generate a one-of-a-kind design based on those selections. The adoption of AI in the fashion industry is currently hindered by various technical, feasibility, and cost challenges. However, these obstacles can now be mitigated by utilizing advanced generative AI methods such as natural language-based image semantic segmentation and diffusion for virtual styling.

This blog post details the implementation of generative AI-assisted fashion online styling using text prompts. Machine learning (ML) engineers can fine-tune and deploy text-to-semantic-segmentation and in-painting models based on pre-trained CLIPSeg and Stable Diffusion with Amazon SageMaker. This enables fashion designers and consumers to create virtual modeling images based on text prompts and choose their preferred styles.

Solution Architecture

Generative AI Solutions

The CLIPSeg model introduced a novel image semantic segmentation method that allows you to easily identify fashion items in pictures using simple text commands. It utilizes text and image encoders to encode textual and visual information into a multimodal embedding space, enabling highly accurate segmentation of target objects based on the prompt. The model has been trained on a vast amount of data with techniques such as zero-shot transfer, natural language supervision, and multimodal self-supervised contrastive learning. This means that you can use the pre-trained model made publicly available by Timo Lüddecke et al. without the need for customization.

CLIPSeg Architecture

CLIPSeg is a model that uses a text and image encoder to encode textual and visual information into a multimodal embedding space to perform semantic segmentation based on a text prompt. The architecture of CLIPSeg consists of two main components: a text encoder and an image encoder. The text encoder takes in the text prompt and converts it into a text embedding, while the image encoder takes in the image and converts it into an image embedding. Both embeddings are then concatenated and passed through a fully connected layer to produce the final segmentation mask.

In terms of data flow, the model is trained on a dataset of images and corresponding text prompts, where the text prompts describe the target object to be segmented. During the training process, the text encoder and image encoder are optimized to learn the mapping between the text prompts and the image to produce the final segmentation mask. Once the model is trained, it can take in a new text prompt and image and produce a segmentation mask for the object described in the prompt.

Stable Diffusion is a technique that allows fashion designers to generate highly realistic imagery in large quantities purely based on text descriptions without the need for lengthy and expensive customization. This is beneficial for designers who want to create vogue styles quickly, and manufacturers who want to produce personalized products at a lower cost.

The following diagram illustrates the Stable Diffusion architecture and data flow.

Stable Diffusion Architecture

Compared to traditional GAN-based methods, Stable Diffusion is a generative AI that is capable of producing more stable and photo-realistic images that match the distribution of the original image. The model can be conditioned on a wide range of purposes, such as text for text-to-image generation, bounding boxes for layout-to-image generation, masked images for in-painting, and lower-resolution images for super-resolution. Diffusion models have a wide range of business applications, and their practical uses continue to evolve. These models will greatly benefit various industries such as fashion, retail and e-commerce, entertainment, social media, marketing, and more.

Generate masks from text prompts using CLIPSeg

Vogue online styling is a service that enables customers to receive fashion advice and recommendations from AI through an online platform. It does this by selecting clothing and accessories that complement the customer’s appearance, fit within their budget, and match their personal preferences. With the utilization of generative AI, tasks can be accomplished with greater ease, leading to increased customer satisfaction and reduced expenses.

The solution can be deployed on an Amazon Elastic Compute Cloud (Amazon EC2) p3.2xlarge instance, which has a single V100 GPU with 16 GB of memory. Several techniques were employed to improve performance and reduce GPU memory usage, resulting in faster image generation. These include using fp16 and enabling memory-efficient attention to decrease bandwidth in the attention block.

We began by having the user upload a fashion image, followed by downloading and extracting the pre-trained CLIPSeg model. The image is then normalized and resized to comply with the size limit. Stable Diffusion V2 supports image resolution up to 768×768, while V1 supports up to 512×512. See the following code:

import torch
import requests
from io import BytesIO
from PIL import Image
from torchvision import transforms

from models.clipseg import CLIPDensePredT

# Helper to fetch an image from a URL and open it as a PIL image
def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")

# The original image (img_url points to the fashion image uploaded by the user)
image = download_image(img_url).resize((768, 768))

# Download the pre-trained CLIPSeg model and unzip the package
! wget https://owncloud.gwdg.de/index.php/s/ioHbRzFx6th32hn/download -O weights.zip
! unzip -d weights -j weights.zip

# Load the CLIP model. Available models = ['RN50', 'RN101', 'RN50x4',
# 'RN50x16', 'RN50x64', 'ViT-B/32', 'ViT-B/16', 'ViT-L/14', 'ViT-L/14@336px']
model = CLIPDensePredT(version='ViT-B/16', reduce_dim=64)
model.eval()

# non-strict, because we only stored decoder weights (not CLIP weights)
model.load_state_dict(torch.load('weights/rd64-uni.pth',
    map_location=torch.device('cuda')), strict=False)

# Image normalization and resizing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.Resize((768, 768)),
])
img = transform(image).unsqueeze(0)

With the use of the pre-trained CLIPSeg model, we are able to extract the target object from an image using a text prompt. This is done by inputting the text prompt into the text encoder, which converts it into a text embedding. The image is then input into the image encoder, which converts it into an image embedding. Both embeddings are then concatenated and passed through a fully connected layer to produce the final segmentation mask, which highlights the target object described in the text prompt. See the following code:

import matplotlib.pyplot as plt

# Text prompt
prompt = 'Get the dress only.'

# predict (repeat the prompt to match the batch of four repeated images)
mask_image_filename = 'the_mask_image.png'
with torch.no_grad():
    preds = model(img.repeat(4, 1, 1, 1), [prompt] * 4)[0]

# Convert the predicted logits to probabilities with the cumulative distribution
# function of the standard normal distribution (torch.special.ndtr) and save
# the mask image.
plt.imsave(mask_image_filename, torch.special.ndtr(preds[0][0]))

With the accurate mask image from semantic segmentation, we can use in-painting for content substitution. In-painting is the process of using a trained generative model to fill in missing parts of an image. By using the mask image to identify the target object, we can apply the in-painting technique to substitute the target object with something else, such as a different clothing item or accessory. The Stable Diffusion V2 model can be used for this purpose, because it is capable of producing high-resolution, photo-realistic images that match the distribution of the original image.
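
As an illustration of this step in isolation, the following is a minimal local sketch using the publicly available Stable Diffusion 2 in-painting checkpoint from the diffusers library. The file names, prompt, and 512×512 resolution are assumptions for illustration; the fine-tuned, SageMaker-hosted version of this flow is shown later in the post.

# Minimal local in-painting sketch (an illustrative assumption, not the
# SageMaker-hosted flow used later). It loads the original photo and the
# CLIPSeg mask from disk and substitutes the masked region.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("fashion_photo.jpg").convert("RGB").resize((512, 512))   # placeholder file name
mask_image = Image.open("the_mask_image.png").convert("RGB").resize((512, 512))  # mask saved by CLIPSeg above

result = pipe(
    prompt="A flowing red evening gown, photorealistic, highly detailed",
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("inpainted.png")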

Fine-tuning from pre-trained models using DreamBooth

Fine-tuning is a process in deep learning where a pre-trained model is further trained on a new task using a small amount of labelled data. Rather than training from scratch, the idea is to take a network that has already been trained on a large dataset for a similar task and further train it on a new dataset to make it more specialized for that particular task.

Fashion designers can also use a subject-driven, fine-tuned Stable Diffusion in-painting model to generate a specific class of style, such as casual long skirts for ladies. To do this, the first step is to provide a set of sample images in the target domain, roughly a dozen, with proper text labels such as the following, and to bind them to a unique identifier that references the design, style, color, and fabric. The text label plays a critical role in determining the results of the fine-tuned model. There are several ways to enhance fine-tuning through effective prompt engineering; here are a few examples.

Sample text prompts to describe some of the most common design elements of casual 
long skirts for ladies:

Design Style: A-line, wrap, maxi, mini, and pleated skirts are some of the most 
    popular styles for casual wear. A-line skirts are fitted at the waist and 
    flare out at the hem, creating a flattering silhouette. Wrap skirts have a
    wrap closure and can be tied at the waist for a customizable fit. Maxi skirts 
    are long and flowy, while mini skirts are short and flirty. Pleated skirts 
    have folds that add texture and movement to the garment.
Pattern: Casual skirts can feature a variety of patterns, including stripes, 
    florals, polka dots, and solids. These patterns can range from bold and graphic 
    to subtle and understated.
Colors: Casual skirts come in a range of colors, including neutral shades like black, 
    white, and gray, as well as brighter hues like pink, red, and blue. Some skirts 
    may also feature multiple colors in a single garment, such as a skirt with a bold 
    pattern that incorporates several shades.
Fabrics: Common fabrics used in casual skirts include cotton, denim, linen, and 
    rayon. These materials offer different levels of comfort and durability, making 
    it easy to find a skirt that suits your personal style and needs.

Using a small set of images to fine-tune Stable Diffusion may result in model overfitting. DreamBooth[5] addresses this by using a class-specific prior-preservation loss. It learns to bind a unique identifier with that specific subject in two steps. First, it fine-tunes the low-resolution model with the input images paired with a text prompt that contains a unique identifier and the name of the class the subject belongs to, such as “skirt”. In practice, this means having the model fit images and the images sampled from the visual prior of the non-fine-tuned class simultaneously. These prior-preserving images are sampled and labeled using the “class noun” prompt. Second, it will fine-tune the super-high-resolution components by pairing low-resolution and high-resolution images from the input images set, which allows the outputs of the fine-tuned model to maintain fidelity to small details.

Fine-tuning a pre-trained in-painting text encoder together with the UNet requires approximately 22 GB of VRAM for 512×512 images, or more for 768×768 resolution. Ideally, fine-tuning samples should be resized to match the desired output image resolution to avoid performance degradation. Fine-tuning the text encoder produces more accurate details, such as model faces. One option is to run on a single Amazon EC2 g5.2xlarge instance, now available in eight Regions, or to use Hugging Face Accelerate to run the fine-tuning code across a distributed configuration. For additional memory savings, you can choose a sliced version of attention that performs the computation in steps instead of all at once by modifying DreamBooth’s training script train_dreambooth_inpaint.py to add the pipeline’s enable_attention_slicing() function, as sketched below.
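
The following is a hedged sketch of what that one-line change looks like on a diffusers in-painting pipeline; where exactly the call goes inside train_dreambooth_inpaint.py depends on the version of the script you use.

# Hedged sketch: enabling sliced attention on a diffusers in-painting pipeline
# to trade a small amount of speed for lower peak GPU memory.
import torch
from diffusers import StableDiffusionInpaintPipeline

pipeline = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
)
pipeline.enable_attention_slicing()   # compute attention in slices instead of all at once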

Accelerate is a library that enables the same fine-tuning code to run across any distributed configuration. Hugging Face and Amazon introduced Hugging Face Deep Learning Containers (DLCs) to scale fine-tuning tasks across multiple GPUs and nodes. You can configure the launch configuration for Amazon SageMaker with a single CLI command.

# From your AWS account, install the SageMaker SDK for Accelerate
pip install "accelerate[sagemaker]" --upgrade

# Configure the launch configuration for Amazon SageMaker 
accelerate config

# List and verify the Accelerate configuration
accelerate env

# Make the necessary modification to the training script to save 
# output on S3, if needed
#  - torch.save('/opt/ml/model')
#  + accelerator.save('/opt/ml/model')

To launch a fine-tune job, verify Accelerate’s configuration using CLI and provide the necessary training arguments, then use the following shell script.

# Instance images — Custom images that represent the specific 
#          concept for DreamBooth training. You should collect 
#          high-quality images based on your use cases.
# Class images — Regularization images for prior-preservation 
#          loss to prevent overfitting. You should generate these 
#          images directly from the base pre-trained model. 
#          You can choose to generate them on your own or generate 
#          them on the fly when running the training script.
# 
# You can access train_dreambooth_inpaint.py from huggingface/diffusers 

export MODEL_NAME="stabilityai/stable-diffusion-2-inpainting"
export INSTANCE_DIR="/data/fashion/gowns/highres/"
export CLASS_DIR="/opt/data/fashion/generated_gowns/imgs"
export OUTPUT_DIR="/opt/model/diffuser/outputs/inpainting/"

accelerate launch train_dreambooth_inpaint.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="A supermodel poses in long summer travel skirt, photorealistic" \
  --class_prompt="A supermodel poses in skirt, photorealistic" \
  --resolution=512 \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_checkpointing \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800

The fine-tuned in-painting model allows for the generation of more specific images to the fashion class described by the text prompt. Because it has been fine-tuned with a set of high-resolution images and text prompts, the model can generate images that are more tailored to the class, such as formal evening gowns. It’s important to note that the more specific the class and the more data used for fine-tuning, the more accurate and realistic the output images will be.

%tree -d ./finetuned-stable-diffusion-v2-1-inpainting
finetuned-stable-diffusion-v2-1-inpainting
├── 512-inpainting-ema.ckpt
├── feature_extractor
├── code
│   ├── inference.py
│   └── requirements.txt
├── scheduler
├── text_encoder
├── tokenizer
├── unet
└── vae

Deploy a fine-tuned in-painting model using SageMaker for inference

With Amazon SageMaker, you can deploy the fine-tuned Stable Diffusion models for real-time inference. To upload the model to Amazon Simple Storage Service (Amazon S3) for deployment, a model.tar.gz archive must be created. Ensure the archive directly includes all files, not a folder that contains them. After eliminating the intermediate checkpoints, the DreamBooth fine-tuning output folder should appear as in the directory listing shown earlier.

The initial step in creating our inference handler is to create the inference.py file. This file serves as the central hub for loading the model and handling incoming inference requests. When the endpoint container starts, the model_fn() function is executed to load the model. When an inference request arrives, the predict_fn() function is called. Additionally, the decode_base64() function is used to convert a base64-encoded string, contained within the JSON payload, into a PIL image.

%%writefile code/inference.py
import base64
import torch
from PIL import Image
from io import BytesIO
from diffusers import EulerDiscreteScheduler, StableDiffusionInpaintPipeline

def decode_base64(base64_string):
    decoded_string = BytesIO(base64.b64decode(base64_string))
    img = Image.open(decoded_string)
    return img

def model_fn(model_dir):
    # Load stable diffusion and move it to the GPU
    scheduler = EulerDiscreteScheduler.from_pretrained(model_dir, subfolder="scheduler")
    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_dir, 
                                                   scheduler=scheduler,
                                                   revision="fp16",
                                                   torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    pipe.enable_xformers_memory_efficient_attention()
    #pipe.enable_attention_slicing()
    return pipe


def predict_fn(data, pipe):
    # get prompt & parameters
    prompt = data.pop("inputs", data) 
    # The input and mask images arrive as base64-encoded strings in the JSON payload
    input_img = data.pop("input_img", data)
    mask_img = data.pop("mask_img", data)
    # set valid HP for stable diffusion
    num_inference_steps = data.pop("num_inference_steps", 25)
    guidance_scale = data.pop("guidance_scale", 6.5)
    num_images_per_prompt = data.pop("num_images_per_prompt", 2)
    image_length = data.pop("image_length", 512)
    # run generation with parameters
    generated_images = pipe(
        prompt,
        image = decode_base64(input_img),
        mask_image = decode_base64(mask_img),
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        num_images_per_prompt=num_images_per_prompt,
        height=image_length,
        width=image_length,
    #)["images"] # for Stabel Diffusion v1.x
    ).images
    
    # create response
    encoded_images = []
    for image in generated_images:
        buffered = BytesIO()
        image.save(buffered, format="JPEG")
        encoded_images.append(base64.b64encode(buffered.getvalue()).decode())
        
    return {"generated_images": encoded_images}

To upload the model to an Amazon S3 bucket, first create the model.tar.gz archive. As noted earlier, the archive should consist of the files directly, not a folder that holds them. You can create and upload the archive as follows:

import tarfile
import os

# helper to create the model.tar.gz
def compress(tar_dir=None, output_file="model.tar.gz"):
    parent_dir = os.getcwd()
    os.chdir(tar_dir)
    with tarfile.open(os.path.join(parent_dir, output_file), "w:gz") as tar:
        for item in os.listdir('.'):
            print(item)
            tar.add(item, arcname=item)
    os.chdir(parent_dir)

# model_tar is the local directory that holds the fine-tuned model files
# (created earlier in the notebook)
compress(str(model_tar))

# After we created the model.tar.gz archive, we can upload it to Amazon S3.
# We use the SageMaker SDK to upload the model to our SageMaker session bucket
# (sess is the sagemaker.Session() created earlier).
from sagemaker.s3 import S3Uploader

# upload model.tar.gz to S3
s3_model_uri = S3Uploader.upload(local_path="model.tar.gz", 
        desired_s3_uri=f"s3://{sess.default_bucket()}/finetuned-stable-diffusion-v2-1-inpainting")

After the model archive is uploaded, we can deploy it on Amazon SageMaker using the HuggingFaceModel class for real-time inference. You can host the endpoint on a g4dn.xlarge instance, which is equipped with a single NVIDIA Tesla T4 GPU with 16 GB of VRAM. Autoscaling can be activated to handle varying traffic demands. For information on incorporating autoscaling into your endpoint, see Going Production: Auto-scaling Hugging Face Transformers with Amazon SageMaker.

from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=s3_model_uri,      # path to your model and script
   role=role,                    # iam role with permissions to create an Endpoint
   transformers_version="4.17",  # transformers version used
   pytorch_version="1.10",       # pytorch version used
   py_version='py38',            # python version used
)

# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"
    )

The huggingface_model.deploy() method returns a HuggingFacePredictor object that can be used to request inference. The endpoint requires a JSON with an inputs key, which represents the input prompt for the model to generate an image. You can also control the generation with parameters such as num_inference_steps, guidance_scale, and num_images_per_prompt. The predictor.predict() function returns a JSON with a generated_images key, which holds the generated images as base64-encoded strings. We added two helper functions, decode_base64_image and display_images, to decode the response and display the images, respectively. The former decodes a base64-encoded string and returns a PIL.Image object, and the latter displays a list of PIL.Image objects. See the following code:

import PIL
from io import BytesIO
from IPython.display import display
import base64
import matplotlib.pyplot as plt
import json

# Encoder to convert an image to json string
def encode_base64(file_name):
    with open(file_name, "rb") as image:
        image_string = base64.b64encode(bytearray(image.read())).decode()
    return image_string
    
# Decoder to convert a base64 string back to an image 
def decode_base64_image(base64_string):
    decoded_string = BytesIO(base64.b64decode(base64_string))
    img = PIL.Image.open(decoded_string)
    return img
    
# display PIL images as grid
def display_images(images=None,columns=3, width=100, height=100):
    plt.figure(figsize=(width, height))
    for i, image in enumerate(images):
        plt.subplot(int(len(images) / columns + 1), columns, i + 1)
        plt.axis('off')
        plt.imshow(image)
        
# Display images in a row/col grid
def image_grid(imgs, rows, cols):
    assert len(imgs) == rows*cols
    w, h = imgs[0].size
    grid = PIL.Image.new('RGB', size=(cols*w, rows*h))
    grid_w, grid_h = grid.size
    
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid

Let’s move forward with the in-painting task. It has been estimated that it will take roughly 15 seconds to produce three images, given the input image and the mask created using CLIPSeg with the text prompt discussed previously. See the following code:

num_images_per_prompt = 3
prompt = "A female super-model poses in a casual long vacation skirt, with full body length, bright colors, photorealistic, high quality, highly detailed, elegant, sharp focus"

# Convert image to string
input_image_filename = "./imgs/skirt-model-2.jpg"
encoded_input_image = encode_base64(input_image_filename)
encoded_mask_image = encode_base64("./imgs/skirt-model-2-mask.jpg")


# Set in-painting parameters
guidance_scale = 6.7
num_inference_steps = 45

# run prediction
response = predictor.predict(data={
  "inputs": prompt,
  "input_img": encoded_input_image,
  "mask_img": encoded_mask_image,
  "num_images_per_prompt" : num_images_per_prompt,
  "num_inference_steps": num_inference_steps,
  "guidance_scale": guidance_scale,
  "image_length": 768
  }
)

# decode images
decoded_images = [decode_base64_image(image) for image in response["generated_images"]]

# visualize generation
display_images(decoded_images, columns=num_images_per_prompt, width=100, height=100)

# insert initial image in the list so we can compare side by side
image = PIL.Image.open(input_image_filename).convert("RGB")
decoded_images.insert(0, image)
                       
# Display inpainting images in grid
image_grid(decoded_images, 1, num_images_per_prompt + 1)

The in-painted images can be displayed along with the original image for visual comparison. Additionally, the in-painting process can be constrained using various parameters such as guidance_scale, which controls the strength of the guidance image during the in-painting process. This allows the user to adjust the output image and achieve the desired results.
Inference Output

Amazon SageMaker JumpStart offers Stable Diffusion templates for various models, including text-to-image and upscaling. For more information, refer to SageMaker JumpStart now provides Stable Diffusion and Bloom models. Additional JumpStart templates will be available in the near future.

Limitations

Although CLIPSeg usually performs well at recognizing common objects, it struggles with more abstract or systematic tasks, such as counting the number of objects in an image, and with more complex tasks, such as predicting how close the nearest object, such as a handbag, is in a photo. Zero-shot CLIPSeg also struggles compared to task-specific models on very fine-grained classification, such as telling the difference between two similar designs, variants of a dress, or style classification. CLIPSeg also still generalizes poorly to images not covered in its pre-training dataset. Finally, it has been observed that CLIP’s zero-shot classifiers can be sensitive to wording or phrasing and sometimes require trial-and-error prompt engineering to perform well. Switching to a different semantic segmentation model for CLIPSeg’s backbone, such as BEiT, which boasts a 62.8% mIoU on the ADE20K dataset, could potentially improve results.

Fashion designs generated with Stable Diffusion tend to be limited to garment elements that are predictably placed in the wider context of the fashion model, and that conform to high-level embeddings you could reasonably expect to find in the hyperscale dataset used to train the pre-trained model. The real limit of generative AI is that the model can eventually produce purely imaginary and less authentic outputs. Therefore, the fashion designs generated by AI may not be as varied or unique as those created by human designers.

Conclusion

Generative AI provides the fashion sector an opportunity to transform their practices through better user experiences and cost-efficient business strategies. In this post, we showcase how to harness generative AI to enable fashion designers and consumers to create personalized fashion styles using virtual modeling. With the assistance of existing Amazon SageMaker Jumpstart templates and those to come, users can quickly embrace these advanced techniques without needing in-depth technical expertise, all while maintaining versatility and lowering expenses.

This innovative technology presents new chances for companies and professionals involved in content generation, across various industries. Generative AI provides ample capabilities for enhancing and creating content. Try out the recent additions to the Jumpstart templates in your SageMaker Studio, such as fine-tuning text-to-image and upscale capabilities.

We would like to thank Li Zhang, Karl Albertsen, Kristine Pearce, Nikhil Velpanur, Aaron Sengstacken, James Wu, and Neelam Koshiya for their support and valuable input that helped improve this work.


About the Authors

Alfred Shen is a Senior AI/ML Specialist at AWS. He has worked in Silicon Valley, holding technical and managerial positions in diverse sectors including healthcare, finance, and high-tech. He is a dedicated applied AI/ML researcher, concentrating on CV, NLP, and multimodality. His work has been showcased in publications such as EMNLP, ICLR, and Public Health.

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from the University of Illinois at Urbana-Champaign and was a postdoctoral researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design, and has published papers at EMNLP, ICLR, COLT, FOCS, and SODA conferences.

Read More