Build a multilingual automatic translation pipeline with Amazon Translate Active Custom Translation

Dive into Deep Learning (D2L.ai) is an open-source textbook that makes deep learning accessible to everyone. It features interactive Jupyter notebooks with self-contained code in PyTorch, JAX, TensorFlow, and MXNet, as well as real-world examples, exposition figures, and math. So far, D2L has been adopted by more than 400 universities around the world, such as the University of Cambridge, Stanford University, the Massachusetts Institute of Technology, Carnegie Mellon University, and Tsinghua University. This work is also made available in Chinese, Japanese, Korean, Portuguese, Turkish, and Vietnamese, with plans to launch Spanish and other languages.

It is a challenging endeavor to have an online book that is continuously kept up to date, written by multiple authors, and available in multiple languages. In this post, we present a solution that D2L.ai used to address this challenge by using the Active Custom Translation (ACT) feature of Amazon Translate and building a multilingual automatic translation pipeline.

We demonstrate how to use the AWS Management Console and Amazon Translate public API to deliver automatic machine batch translation, and analyze the translations between two language pairs: English and Chinese, and English and Spanish. We also recommend best practices when using Amazon Translate in this automatic translation pipeline to ensure translation quality and efficiency.

Solution overview

We built automatic translation pipelines for multiple languages using the ACT feature in Amazon Translate. ACT allows you to customize translation output on the fly by providing tailored translation examples in the form of parallel data. Parallel data consists of a collection of textual examples in a source language and the desired translations in one or more target languages. During translation, ACT automatically selects the most relevant segments from the parallel data and updates the translation model on the fly based on those segment pairs. This results in translations that better match the style and content of the parallel data.

The architecture contains multiple sub-pipelines; each sub-pipeline handles one language pair, such as English to Chinese or English to Spanish. Multiple translation sub-pipelines can be processed in parallel. In each sub-pipeline, we first build the parallel data in Amazon Translate using a high-quality dataset of tailored translation examples from the human-translated D2L books. Then we generate the customized machine translation output on the fly at run time, which achieves better quality and accuracy.

solution architecture

In the following sections, we demonstrate how to build each translation pipeline using Amazon Translate with ACT, along with Amazon SageMaker and Amazon Simple Storage Service (Amazon S3).

First, we put the source documents, reference documents, and parallel data training set in an S3 bucket. Then we build Jupyter notebooks in SageMaker to run the translation process using Amazon Translate public APIs.

Prerequisites

To follow the steps in this post, make sure you have an AWS account with the following:

  • Access to AWS Identity and Access Management (IAM) for role and policy configuration
  • Access to Amazon Translate, SageMaker, and Amazon S3
  • An S3 bucket to store the source documents, reference documents, parallel data dataset, and output of translation

Create an IAM role and policies for Amazon Translate with ACT

Our IAM role needs to contain a custom trust policy for Amazon Translate:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "Statement1",
        "Effect": "Allow",
        "Principal": {
            "Service": "translate.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
    }]
}

This role must also have a permissions policy that grants Amazon Translate read access to the input folder and subfolders in Amazon S3 that contain the source documents, and read/write access to the output S3 bucket and folder that contains the translated documents:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "s3:ListBucket",
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject"
        ],
        "Resource": [
            "arn:aws:s3:::YOUR-S3_BUCKET-NAME",
            "arn:aws:s3:::YOUR-S3_BUCKET-NAME/*"
        ]
    }]
}
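These policies can also be created programmatically. The following is a minimal boto3-style sketch, assuming the role name batch-translate-api-role used later in this post; the IAM client is passed in as a parameter so the helper can be exercised without live credentials, and the inline policy name is illustrative.

```python
import json

def translate_policies(bucket_name):
    """Build the trust and permissions policy documents shown above."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Statement1",
            "Effect": "Allow",
            "Principal": {"Service": "translate.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [f"arn:aws:s3:::{bucket_name}", f"arn:aws:s3:::{bucket_name}/*"],
        }],
    }
    return trust_policy, permissions_policy

def create_translate_role(iam_client, role_name, bucket_name):
    """Create the service role and attach the S3 permissions as an inline policy."""
    trust, perms = translate_policies(bucket_name)
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust),
    )
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=f"{role_name}-s3-access",   # illustrative policy name
        PolicyDocument=json.dumps(perms),
    )
    return role
```

With real credentials, pass `boto3.client('iam')` as the first argument.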

To run Jupyter notebooks in SageMaker for the translation jobs, we need to grant an inline permission policy to the SageMaker execution role. This role passes the Amazon Translate service role to SageMaker that allows the SageMaker notebooks to have access to the source and translated documents in the designated S3 buckets:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Action": ["iam:PassRole"],
        "Effect": "Allow",
        "Resource": [
            "arn:aws:iam::YOUR-AWS-ACCOUNT-ID:role/batch-translate-api-role"
        ]
    }]
}
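This inline policy can likewise be attached with the PutRolePolicy API. A minimal sketch under the same assumptions (client injected for testability, policy name illustrative):

```python
import json

def pass_role_policy(translate_role_arn):
    """Build the inline policy that lets SageMaker pass the Translate service role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Action": ["iam:PassRole"],
            "Effect": "Allow",
            "Resource": [translate_role_arn],
        }],
    }

def attach_pass_role(iam_client, sagemaker_execution_role, translate_role_arn):
    """Attach the PassRole policy to the SageMaker execution role."""
    iam_client.put_role_policy(
        RoleName=sagemaker_execution_role,
        PolicyName="allow-pass-translate-role",   # illustrative policy name
        PolicyDocument=json.dumps(pass_role_policy(translate_role_arn)),
    )
```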

Prepare parallel data training samples

The parallel data in ACT is created from an input file consisting of a list of textual example pairs, for instance, a source language segment (English) paired with its target language translation (Chinese). The input file can be in TMX, CSV, or TSV format. The following screenshot shows an example of a CSV input file. The first column is the source language data (in English), and the second column is the target language data (in Chinese). The following example is extracted from the D2L-en and D2L-zh books.

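Such a CSV can be assembled with Python's csv module. A minimal sketch, assuming a header row of language codes as the first line; the file name and example pair are illustrative:

```python
import csv

def write_parallel_data(path, pairs, src_code="en", tgt_code="zh"):
    """Write (source, target) example pairs to a CSV parallel data file.

    The first row holds the language codes, one column per language.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([src_code, tgt_code])
        for src, tgt in pairs:
            writer.writerow([src, tgt])

# Illustrative pair in the style of the D2L books
pairs = [("Dive into Deep Learning", "动手学深度学习")]
write_parallel_data("d2l_parallel_data_enzh.csv", pairs)
```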

Perform custom parallel data training in Amazon Translate

First, we set up the S3 bucket and folders as shown in the following screenshot. The source_data folder contains the source documents before the translation; the generated documents after the batch translation are put in the output folder. The ParallelData folder holds the parallel data input file prepared in the previous step.


After uploading the input files to the source_data folder, we can use the CreateParallelData API to run a parallel data creation job in Amazon Translate:

import boto3

translate_client = boto3.client('translate')

S3_BUCKET = "YOUR-S3_BUCKET-NAME"
pd_name = "pd-d2l-short_test_sentence_enzh_all"
pd_description = "Parallel Data for English to Chinese"
pd_fn = "d2l_short_test_sentence_enzh_all.csv"
response_t = translate_client.create_parallel_data(
    Name=pd_name,                    # pd_name is the parallel data name
    Description=pd_description,      # pd_description is the parallel data description
    ParallelDataConfig={
        'S3Uri': 's3://' + S3_BUCKET + '/ParallelData/' + pd_fn,  # S3_BUCKET is the S3 bucket name defined in the previous step
        'Format': 'CSV'
    },
)
print(pd_name, ": ", response_t['Status'], " created.")

To update existing parallel data with new training datasets, we can use the UpdateParallelData API:

S3_BUCKET = "YOUR-S3_BUCKET-NAME"
pd_name = "pd-d2l-short_test_sentence_enzh_all"
pd_description = "Parallel Data for English to Chinese"
pd_fn = "d2l_short_test_sentence_enzh_all.csv"
response_t = translate_client.update_parallel_data(
    Name=pd_name,                    # pd_name is the parallel data name
    Description=pd_description,      # pd_description is the parallel data description
    ParallelDataConfig={
        'S3Uri': 's3://' + S3_BUCKET + '/ParallelData/' + pd_fn,  # S3_BUCKET is the S3 bucket name defined in the previous step
        'Format': 'CSV'
    },
)
print(pd_name, ": ", response_t['Status'], " updated.")

We can check the training job progress on the Amazon Translate console. When the job is complete, the parallel data status shows as Active and is ready to use.
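You can also poll the creation job from the notebook using the GetParallelData API. The following is a minimal sketch; the client is passed in as a parameter so the loop can be tested without live AWS access, and the polling interval is illustrative:

```python
import time

def wait_for_parallel_data(translate_client, pd_name, poll_seconds=30, max_polls=120):
    """Poll GetParallelData until the parallel data leaves the CREATING state."""
    for _ in range(max_polls):
        props = translate_client.get_parallel_data(Name=pd_name)["ParallelDataProperties"]
        status = props["Status"]
        if status != "CREATING":
            return status          # typically ACTIVE on success, FAILED otherwise
        time.sleep(poll_seconds)
    raise TimeoutError(f"Parallel data {pd_name} still creating after polling")
```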


Run asynchronous batch translation using parallel data

In batch translation, multiple source documents are automatically translated into documents in target languages. The process involves uploading the source documents to the input folder of the S3 bucket, then applying the StartTextTranslationJob API of Amazon Translate to initiate an asynchronous translation job:

S3_BUCKET = "YOUR-S3_BUCKET-NAME"
ROLE_ARN = "THE_ROLE_DEFINED_IN_STEP_1"
src_fdr = "source_data"
output_fdr = "output"
src_lang = "en"
tgt_lang = "zh"
pd_name = "pd-d2l-short_test_sentence_enzh_all"
response = translate_client.start_text_translation_job(
    JobName='D2L_job',
    InputDataConfig={
        'S3Uri': 's3://' + S3_BUCKET + '/' + src_fdr + '/',      # src_fdr is the folder in the S3 bucket containing the source files
        'ContentType': 'text/html'
    },
    OutputDataConfig={
        'S3Uri': 's3://' + S3_BUCKET + '/' + output_fdr + '/',   # output_fdr is the folder in the S3 bucket containing the translated files
    },
    DataAccessRoleArn=ROLE_ARN,          # ROLE_ARN is the role defined in the previous step
    SourceLanguageCode=src_lang,         # src_lang is the source language, such as 'en'
    TargetLanguageCodes=[tgt_lang],      # tgt_lang is the target language, such as 'zh'
    ParallelDataNames=pd_name            # pd_name is the parallel data name defined in the previous step
)

We selected five source documents in English from the D2L book (D2L-en) for the bulk translation. On the Amazon Translate console, we can monitor the translation job progress. When the job status changes to Completed, we can find the translated documents in Chinese (D2L-zh) in the output folder of the S3 bucket.
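The response from start_text_translation_job includes a JobId, which you can poll from the notebook instead of watching the console. A minimal sketch; the client is injected so the loop can be exercised without AWS access, and the polling interval is illustrative:

```python
import time

def wait_for_translation_job(translate_client, job_id, poll_seconds=60, max_polls=240):
    """Poll DescribeTextTranslationJob until the batch job reaches a terminal state."""
    terminal = {"COMPLETED", "COMPLETED_WITH_ERROR", "FAILED", "STOPPED"}
    for _ in range(max_polls):
        props = translate_client.describe_text_translation_job(JobId=job_id)
        status = props["TextTranslationJobProperties"]["JobStatus"]
        if status in terminal:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish in time")
```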


Evaluate the translation quality

To demonstrate the effectiveness of the ACT feature in Amazon Translate, we also applied the traditional method of Amazon Translate real-time translation without parallel data to process the same documents, and compared the output with the batch translation output with ACT. We used the BLEU (BiLingual Evaluation Understudy) score to benchmark the translation quality between the two methods. The only way to accurately measure the quality of machine translation output is to have an expert review and grade the quality; however, BLEU provides an estimate of the relative quality improvement between the two outputs. A BLEU score is typically a number between 0 and 1; it calculates the similarity of the machine translation to the reference human translation. A higher score represents better quality in natural language understanding (NLU).
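To make the metric concrete, the following is a simplified sentence-level BLEU sketch in pure Python: clipped n-gram precisions combined by a geometric mean and scaled by a brevity penalty. It omits the smoothing that practical implementations apply; real evaluations typically use a library such as sacreBLEU or NLTK rather than a hand-rolled version.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty. Returns a value in [0, 1]."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_ngrams.values())
        if total == 0:
            return 0.0
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        if clipped == 0:   # no smoothing: any empty n-gram overlap zeroes the score
            return 0.0
        precisions.append(clipped / total)
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# A candidate identical to the reference scores 1.0
print(bleu("deep learning is fun", "deep learning is fun"))  # 1.0
```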

We have tested a set of documents in four pipelines: English into Chinese (en to zh), Chinese into English (zh to en), English into Spanish (en to es), and Spanish into English (es to en). The following figure shows that the translation with ACT produced a higher average BLEU score in all the translation pipelines.


We also observed that the more granular the parallel data pairs are, the better the translation performance. For example, we used the following parallel data input file with pairs of paragraphs, which contains 10 entries.


For the same content, we used the following parallel data input file with pairs of sentences and 16 entries.


We used both parallel data input files to construct two parallel data entities in Amazon Translate, then created two batch translation jobs with the same source document. The following figure compares the output translations. It shows that the output using parallel data with pairs of sentences outperformed the one using parallel data with pairs of paragraphs, for both English to Chinese and Chinese to English translation.


If you are interested in learning more about these benchmark analyses, refer to Auto Machine Translation and Synchronization for “Dive into Deep Learning”.

Clean up

To avoid recurring costs in the future, we recommend you clean up the resources you created:

  1. On the Amazon Translate console, select the parallel data you created and choose Delete. Alternatively, you can use the DeleteParallelData API or the AWS Command Line Interface (AWS CLI) delete-parallel-data command to delete the parallel data.
  2. Delete the S3 bucket used to host the source and reference documents, translated documents, and parallel data input files.
  3. Delete the IAM role and policy. For instructions, refer to Deleting roles or instance profiles and Deleting IAM policies.
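The cleanup steps above can also be scripted. A minimal sketch with injected clients (boto3's Translate client and S3 resource); note that a bucket must be emptied before it can be deleted:

```python
def cleanup(translate_client, s3_resource, pd_name, bucket_name):
    """Delete the parallel data, then empty and delete the S3 bucket."""
    translate_client.delete_parallel_data(Name=pd_name)
    bucket = s3_resource.Bucket(bucket_name)
    bucket.objects.all().delete()   # a bucket must be empty before deletion
    bucket.delete()
```

With real credentials, pass `boto3.client('translate')` and `boto3.resource('s3')`; the IAM role and policies still need to be removed separately.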

Conclusion

With this solution, we aim to reduce the workload of human translators by 80%, while maintaining the translation quality and supporting multiple languages. You can use this solution to improve your translation quality and efficiency. We are working on further improving the solution architecture and translation quality for other languages.

Your feedback is always welcome; please leave your thoughts and questions in the comments section.


About the authors

Yunfei Bai is a Senior Solutions Architect at AWS. With a background in AI/ML, data science, and analytics, Yunfei helps customers adopt AWS services to deliver business results. He designs AI/ML and data analytics solutions that overcome complex technical challenges and drive strategic objectives. Yunfei has a PhD in Electronic and Electrical Engineering. Outside of work, Yunfei enjoys reading and music.

Rachel Hu is an applied scientist at AWS Machine Learning University (MLU). She has been leading a few course designs, including ML Operations (MLOps) and Accelerator Computer Vision. Rachel is an AWS senior speaker and has spoken at top conferences including AWS re:Invent, NVIDIA GTC, KDD, and MLOps Summit. Before joining AWS, Rachel worked as a machine learning engineer building natural language processing models. Outside of work, she enjoys yoga, ultimate frisbee, reading, and traveling.

Watson Srivathsan is the Principal Product Manager for Amazon Translate, AWS’s natural language processing service. On weekends, you will find him exploring the outdoors in the Pacific Northwest.


Do Pass Go, Do Collect More Games: Xbox Game Pass Coming to GeForce NOW

Xbox Game Pass support is coming to GeForce NOW.

Members will soon be able to play supported PC games from the Xbox Game Pass catalog through NVIDIA’s cloud gaming servers. Learn more about how support for Game Pass and Microsoft Store will roll out in the coming months.

Plus, Age of Empires IV: Anniversary Edition is the first from the world’s most popular real-time strategy franchise to arrive on GeForce NOW.

A Game Pass-tic Partnership

Announced over the weekend, Game Pass members will soon be able to play supported PC games from the Game Pass catalog with GeForce NOW.

We’re working closely with Microsoft to enable members to play select PC titles from Microsoft Store, just as they can today on GeForce NOW with their Steam, Epic Games Store, Ubisoft Connect and GOG.com accounts. Members who are subscribed to PC Game Pass or Xbox Game Pass Ultimate will be able to stream these select PC titles from the Game Pass library — without downloads or additional purchases for instant gaming from the cloud.

With hundreds of PC titles available in the Game Pass catalog, Xbox and PC gamers together can look forward to future GFN Thursdays to see what’s next. PC games from Xbox Game Studios and Bethesda on Steam and Epic Games Store will continue to be released, giving members more ways to play their favorite Xbox titles.

And with the ability for GeForce NOW members to stream at high performance across devices, including PCs, Macs, mobile devices, smart TVs, gaming handheld devices and more, gamers everywhere will be able to take their Xbox PC games wherever they go, along with the over 1,600 titles in the GeForce NOW library.

For an even better experience, upgrade to an Ultimate or Priority membership to skip the waiting lines ahead of free members and get into gaming even faster.

Build Your Empire — and Library

Age of Empires IV on GeForce NOW
Siege the moment!

Conquer the lands in Microsoft’s award-winning Age of Empires franchise this week.

Age of Empires IV: Anniversary Edition takes the world’s most popular real-time strategy game to the next level with familiar and new ways for players to expand their empire. The Anniversary Edition brings all the latest updates, including new civilizations — the Ottomans and Malians — maps, languages, challenges and more. Choose the path to greatness and become a part of history through Campaign Story Mode with a tutorial designed for first-time players, or challenge the world in competitive or cooperative online matches that include ranked seasons.

Ultimate members can rule the kingdom in stunning 4K or ultrawide resolutions, and settle in with up to eight-hour streaming sessions.

What to Play This Week

Dordogne on GeForce NOW
Hand-painted nostalgia in the cloud this summer.

Take a look at the two new games available to stream this week:

  • Dordogne (New release on Steam)
  • Age of Empires IV: Anniversary Edition (Steam)

Before the weekend arrives, check out our question of the week. Let us know your answer on Twitter or in the comments below.


Bring SageMaker Autopilot into your MLOps processes using a custom SageMaker Project

Every organization has its own set of standards and practices that provide security and governance for their AWS environment. Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. SageMaker provides a set of templates for organizations that want to quickly get started with ML workflows and DevOps continuous integration and continuous delivery (CI/CD) pipelines.

The majority of enterprise customers already have a well-established MLOps practice with a standardized environment in place—for example, a standardized repository, infrastructure, and security guardrails—and want to extend their MLOps process to no-code and low-code AutoML tools as well. They also have a lot of processes that need to be adhered to before promoting a model to production. They’re looking for a quick and easy way to graduate from the initial phase to a repeatable, reliable, and eventually scalable operating phase, as outlined in the following diagram. For more information, refer to MLOps foundation roadmap for enterprises with Amazon SageMaker.

Although these companies have robust data science and MLOps teams to help them build reliable and scalable pipelines, they want to have their low-code AutoML tool users produce code and model artifacts in a manner that can be integrated with their standardized practices, adhering to their code repo structure and with appropriate validations, tests, steps, and approvals.

They are looking for a mechanism for the low-code tools to generate all the source code for each step of the AutoML tasks (preprocessing, training, and postprocessing) in a standardized repository structure that can provide their expert data scientists with the capability to view, validate, and modify the workflow per their needs and then generate a custom pipeline template that can be integrated into a standardized environment (where they have defined their code repository, code build tools, and processes).

This post showcases how to have a repeatable process with low-code tools like Amazon SageMaker Autopilot such that it can be seamlessly integrated into your environment, so you don’t have to orchestrate this end-to-end workflow on your own. We demonstrate how to integrate the code generated by low-code/no-code tools into your CI/CD and MLOps environment, while adhering to MLOps best practices.

Solution overview

To demonstrate the orchestrated workflow, we use the publicly available UCI Adult 1994 Census Income dataset to predict if a person has an annual income of greater than $50,000 per year. This is a binary classification problem; the options for the income target variable are either over $50,000 or under $50,000.

The following table summarizes the key components of the dataset.

Data Set Characteristics: Multivariate
Number of Instances: 48842
Area: Social
Attribute Characteristics: Categorical, Integer
Number of Attributes: 14
Date Donated: 1996-05-01
Associated Tasks: Classification
Missing Values? Yes
Number of Web Hits: 2749715

The following table summarizes the attribute information.

Column Name Description
Age Continuous
Workclass Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked
fnlwgt Continuous
education Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num Continuous
marital-status Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces
relationship Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
race White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black
sex Female, Male
capital-gain Continuous
capital-loss Continuous
hours-per-week Continuous
native-country United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
class Income class, either <=50K or >50K
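As a quick illustration of the prediction target, the following sketch lists the dataset's columns and binarizes the class label the way a downstream evaluation might; the rows shown are illustrative, not real records from the dataset:

```python
# Column names from the UCI Adult dataset, as listed in the table above
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "class",
]

def binarize_income(rows):
    """Map the income class to 1 (>50K) or 0 (<=50K) for each row dict."""
    return [1 if row["class"].strip() == ">50K" else 0 for row in rows]

# Illustrative rows (only the target column shown)
rows = [
    {"class": ">50K"},
    {"class": "<=50K"},
]
print(binarize_income(rows))  # [1, 0]
```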

In this post, we showcase how to use Amazon SageMaker Projects, a tool that helps organizations set up and standardize environments for MLOps with low-code AutoML tools like Autopilot and Amazon SageMaker Data Wrangler.

Autopilot eliminates the heavy lifting of building ML models. You simply provide a tabular dataset and select the target column to predict, and Autopilot automatically explores different solutions to find the best model. You can then directly deploy the model to production with just one click or iterate on the recommended solutions to further improve the model quality.

Data Wrangler provides an end-to-end solution to import, prepare, transform, featurize, and analyze data. You can integrate a Data Wrangler data preparation flow into your ML workflows to simplify and streamline data preprocessing and feature engineering using little to no coding. You can also add your own Python scripts and transformations to customize workflows. We use Data Wrangler to perform preprocessing on the dataset before submitting the data to Autopilot.

SageMaker Projects helps organizations set up and standardize environments for automating different steps involved in an ML lifecycle. Although notebooks are helpful for model building and experimentation, a team of data scientists and ML engineers sharing code need a more scalable way to maintain code consistency and strict version control.

To help you get started with common model building and deployment paradigms, SageMaker Projects offers a set of first-party templates (1P templates). The 1P templates generally focus on creating resources for model building and model training. The templates include projects that use AWS-native services for CI/CD, such as AWS CodeBuild and AWS CodePipeline. SageMaker Projects can support custom template offerings, where organizations use an AWS CloudFormation template to create the resources needed for an ML workflow.

Organizations may want to extend the 1P templates to support use cases beyond simply training and deploying models. Custom project templates are a way for you to create a standard workflow for ML projects. You can create several templates and use AWS Identity and Access Management (IAM) policies to manage access to those templates on Amazon SageMaker Studio, ensuring that each of your users accesses projects dedicated to their use cases.

To learn more about SageMaker Projects and creating custom project templates aligned with best practices, refer to Build Custom SageMaker Project Templates – Best Practices.

These custom templates are created as AWS Service Catalog products and provisioned as organization templates on the Studio UI, where data scientists can choose a template and have their ML workflow bootstrapped and preconfigured. Organizations use these templates to provision projects for each of their teams.

In this post, we showcase how to build a custom project template for an end-to-end MLOps workflow using SageMaker Projects, AWS Service Catalog, and Amazon SageMaker Pipelines, integrating Data Wrangler and Autopilot with humans in the loop to facilitate model training and deployment. The humans in the loop are the different personas involved in an MLOps practice working collaboratively for a successful ML build and deploy workflow.

The following diagram illustrates the end-to-end low-code/no-code automation workflow.

The workflow includes the following steps:

  1. The Ops team or the Platform team launches the CloudFormation template to set up the prerequisites required to provision the custom SageMaker template.
  2. When the template is available in SageMaker, the Data Science Lead uses the template to create a SageMaker project.
  3. The SageMaker project creation will launch an AWS Service Catalog product that adds two seed codes to the AWS CodeCommit repositories:
    • The seed code for the model building pipeline includes a pipeline that preprocesses the UCI Machine Learning Adult dataset using Data Wrangler, automatically creates an ML model with full visibility using Autopilot, evaluates the performance of a model using a processing step, and registers the model into a model registry based on the model performance.
    • The seed code for model deployment includes a CodeBuild step to find the latest model that has been approved in the model registry and create configuration files to deploy the CloudFormation templates as part of the CI/CD pipelines using CodePipeline. The CloudFormation template deploys the model to staging and production environments.
  4. The first seed code commit starts a CI/CD pipeline using CodePipeline that triggers a SageMaker pipeline, which is a series of interconnected steps encoded using a directed acyclic graph (DAG). In this case, the steps involved are data processing using a Data Wrangler flow, training the model using Autopilot, creating the model, evaluating the model, and if the evaluation is passed, registering the model.

For more details on creating SageMaker pipelines using Autopilot, refer to Launch Amazon SageMaker Autopilot experiments directly from within Amazon SageMaker Pipelines to easily automate MLOps workflows.

  5. After the model is registered, the model approver can either approve or reject the model in Studio.
  6. When the model is approved, a CodePipeline deployment pipeline integrated with the second seed code is triggered.
  7. This pipeline creates a scalable SageMaker serverless endpoint for the staging environment.
  8. An automated test step in the deployment pipeline runs tests against the staging endpoint.
  9. The test results are stored in Amazon Simple Storage Service (Amazon S3). The pipeline stops for a production deployment approver, who can review all the artifacts before approving.
  10. After approval, the model is deployed to production in the form of a scalable serverless endpoint. Production applications can now consume the endpoint for inference.
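The SageMaker pipeline triggered in step 4 is a directed acyclic graph of dependent steps. As an illustration of that ordering (pure Python, not the SageMaker Pipelines SDK), the following sketch encodes those dependencies and derives an execution order; the step names are illustrative:

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on, mirroring the DAG in step 4
steps = {
    "data_processing": set(),                 # Data Wrangler flow
    "autopilot_training": {"data_processing"},
    "create_model": {"autopilot_training"},
    "evaluate_model": {"create_model"},
    "register_model": {"evaluate_model"},     # runs only if evaluation passes
}

order = list(TopologicalSorter(steps).static_order())
print(order)
```

Because the dependencies form a single chain here, the only valid order is data processing, training, model creation, evaluation, and registration.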

The deployment steps consist of the following:

  1. Create the custom SageMaker project template for Autopilot and other resources using AWS CloudFormation. This is a one-time setup task.
  2. Create the SageMaker project using the custom template.

In the following sections, we proceed with each of these steps in more detail and explore the project details page.

Prerequisites

This walkthrough includes the following prerequisites:

Create solution resources with AWS CloudFormation

You can download and launch the CloudFormation template via the AWS CloudFormation console, the AWS Command Line Interface (AWS CLI), the SDK, or by simply choosing Launch Stack:

The CloudFormation template is also available in the AWS Samples GitHub Code repository. The repository contains the following:

  • A CloudFormation template to set up the custom SageMaker project template for Autopilot
  • Seed code with the ML code to set up SageMaker pipelines to automate the data processing and training steps
  • A project folder for the CloudFormation template used by AWS Service Catalog mapped to the custom SageMaker project template that will be created

The CloudFormation template takes several parameters as input.

The following are the AWS Service Catalog product information parameters:

  • Product Name – The name of the AWS Service Catalog product that the SageMaker project custom MLOps template will be associated with
  • Product Description – The description for the AWS Service Catalog product
  • Product Owner – The owner of the Service Catalog product
  • Product Distributor – The distributor of the Service Catalog product

The following are the AWS Service Catalog product support information parameters:

  • Product Support Description – A support description for this product
  • Product Support Email – An email address of the team supporting the AWS Service Catalog product
  • Product Support URL – A support URL for the AWS Service Catalog product

The following are the source code repository configuration parameters:

  • URL to the zipped version of your GitHub repository – Use the defaults if you’re not forking the AWS Samples repository.
  • Name and branch of your GitHub repository – These should match the root folder of the zip. Use the defaults if you’re not forking the AWS Samples repository.
  • StudioUserExecutionRole – Provide the ARN of the Studio user execution IAM role.

After you launch the CloudFormation stack from this template, you can monitor its status on the AWS CloudFormation console.

When the stack is complete, copy the value of the CodeStagingBucketName key on the Outputs tab of the CloudFormation stack and save it in a text editor to use later.

Create the SageMaker project using the new custom template

To create your SageMaker project, complete the following steps:

  1. Sign in to Studio. For more information, see Onboard to Amazon SageMaker Domain.
  2. In the Studio sidebar, choose the home icon.
  3. Choose Deployments from the menu, then choose Projects.
  4. Choose Create project.
  5. Choose Organization templates to view the new custom MLOps template.
  6. Choose Select project template.

  7. For Project details, enter a name and description for your project.
  8. For MLOpsS3Bucket, enter the name of the S3 bucket you saved earlier.
  9. Choose Create project.

A message appears indicating that SageMaker is provisioning and configuring the resources.

When the project is complete, you receive a success message, and your project is now listed on the Projects list.

Explore the project details

On the project details page, you can view various tabs associated with the project. Let’s dive deep into each of these tabs in detail.

Repositories

This tab lists the code repositories associated with this project. You can choose clone repo under Local path to clone the two seed code repositories created in CodeCommit by the SageMaker project. This option provides you with Git access to the code repositories from the SageMaker project itself.

When the clone of the repository is complete, the local path appears in the Local path column. You can choose the path to open the local folder that contains the repository code in Studio.

The folder will be accessible in the navigation pane. You can use the file browser icon to hide or show the folder list. You can make the code changes here or choose the Git icon to stage, commit, and push the change.

Pipelines

This tab lists the SageMaker ML pipelines that define steps to prepare data, train models, and deploy models. For information about SageMaker ML pipelines, see Create and Manage SageMaker Pipelines.

You can choose the pipeline that is currently running to see its latest status. In the following example, the DataProcessing step is performed by using a Data Wrangler data flow.

You can access the data flow from the local path of the code repository that we cloned earlier. Choose the file browser icon to show the path, which is listed in the pipelines folder of the model build repository.

In the pipelines folder, open the autopilot folder.

In the autopilot folder, open the preprocess.flow file.

It will take a moment to open the Data Wrangler flow.

In this example, three data transformations are performed between the source and destination. You can choose each transformation to see more details.

For instructions on how to include or remove transformations in Data Wrangler, refer to Transform Data.

For more information, refer to Unified data preparation and model training with Amazon SageMaker Data Wrangler and Amazon SageMaker Autopilot – Part 1.

When you’re done reviewing, choose the power icon and stop the Data Wrangler resources under Running Apps and Kernel Sessions.

Experiments

This tab lists the Autopilot experiments associated with the project. For more information about Autopilot, see Automate model development with Amazon SageMaker Autopilot.

Model groups

This tab lists groups of model versions that were created by pipeline runs in the project. When the pipeline run is complete, the model created from the last step of the pipeline will be accessible here.

You can choose the model group to access the latest version of the model.

The status of the model version in the following example is Pending. You can choose the model version and choose Update status to update the status.

Choose Approved and choose Update status to approve the model.

After the model status is approved, the model deploy CI/CD pipeline within CodePipeline will start.
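The console approval step above can also be done programmatically via the SageMaker API. A minimal sketch, where the model package ARN is whatever your model version reports:

```python
def approve_model_version(model_package_arn: str) -> None:
    """Approve a model version; this kicks off the model-deploy CI/CD
    pipeline, just like choosing Approved in the Studio UI."""
    import boto3  # AWS SDK for Python; needs configured credentials
    sagemaker = boto3.client("sagemaker")
    sagemaker.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
    )

# approve_model_version("arn:aws:sagemaker:...:model-package/<group>/<version>")
```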

You can open the deployed pipeline to see the different stages in the repo.

As shown in the preceding screenshot, this pipeline has four stages:

  • Source – In this stage, CodePipeline fetches the code from the CodeCommit repo and stages it in the S3 bucket.
  • Build – In this stage, CloudFormation templates are prepared for the deployment of the model code.
  • DeployStaging – This stage consists of three sub-stages:
    • DeployResourcesStaging – In the first sub-stage, the CloudFormation stack is deployed to create a serverless SageMaker endpoint in the staging environment.
    • TestStaging – In the second sub-stage, automated testing is performed using CodeBuild on the endpoint to check whether inference is happening as expected. The test results are available in the S3 bucket with the name sagemaker-project-<project ID of the SageMaker project>.

You can get the SageMaker project ID on the Settings tab of the SageMaker project. Within the S3 bucket, choose the project name folder (for example, sagemaker-MLOp-AutoP) and within that, open the TestArtifa/ folder. Choose the object file in this folder to see the test results.

You can access the testing script from the local path of the code repository that we cloned earlier. Choose the file browser icon to view the path. Note that this is the deploy repository. In that repo, open the test folder and choose the test.py Python code file.

You can make changes to this testing code as per your use case.

    • ApproveDeployment – In the third sub-stage, there is an additional approval process before the last stage of deploying to production. You can choose Review and approve it to proceed.

  • DeployProd – In this stage, the CloudFormation stack is deployed to create a serverless SageMaker endpoint for the production environment.
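As a rough sketch of the kind of smoke test TestStaging runs against the staging endpoint, the following invokes a SageMaker endpoint and returns its raw response. The endpoint name and payload format here are assumptions; the actual checks live in test.py:

```python
def invoke_staging_endpoint(endpoint_name: str, payload: str) -> str:
    """Send one request to a SageMaker endpoint and return the response body."""
    import boto3  # AWS SDK for Python; needs configured credentials
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    return response["Body"].read().decode("utf-8")

# prediction = invoke_staging_endpoint("my-staging-endpoint", "0.5,1.2,3.4")
```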

Endpoints

This tab lists the SageMaker endpoints that host deployed models for inference. When all the stages in the model deployment pipeline are complete, models are deployed to SageMaker endpoints and are accessible within the SageMaker project.

Settings

This is the last tab on the project page and lists settings for the project. This includes the name and description of the project, information about the project template and SourceModelPackageGroupName, and metadata about the project.

Clean up

To avoid additional infrastructure costs associated with the example in this post, be sure to delete CloudFormation stacks. Also, ensure that you delete the SageMaker endpoints, any running notebooks, and S3 buckets that were created during the setup.

Conclusion

This post described an easy-to-use ML pipeline approach to automate and standardize the training and deployment of ML models using SageMaker Projects, Data Wrangler, Autopilot, Pipelines, and Studio. This solution can help you perform AutoML tasks (preprocessing, training, and postprocessing) in a standardized repository structure that can provide your expert data scientists with the capability to view, validate, and modify the workflow as per their needs and then generate a custom pipeline template that can be integrated to a SageMaker project.

You can modify the pipelines with your preprocessing and pipeline steps for your use case and deploy our end-to-end workflow. Let us know in the comments how the custom template worked for your respective use case.


About the authors

 Vishal Naik is a Sr. Solutions Architect at Amazon Web Services (AWS). He is a builder who enjoys helping customers accomplish their business needs and solve complex challenges with AWS solutions and best practices. His core area of focus includes Machine Learning, DevOps, and Containers. In his spare time, Vishal loves making short films on time travel and alternate universe themes.

Shikhar Kwatra is an AI/ML specialist solutions architect at Amazon Web Services, working with a leading Global System Integrator. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.

Janisha Anand is a Senior Product Manager in the SageMaker Low/No Code ML team, which includes SageMaker Canvas and SageMaker Autopilot. She enjoys coffee, staying active, and spending time with her family.


Reconstructing indoor spaces with NeRF


When choosing a venue, we often find ourselves with questions like the following: Does this restaurant have the right vibe for a date? Is there good outdoor seating? Are there enough screens to watch the game? While photos and videos may partially answer questions like these, they are no substitute for feeling like you’re there, even when visiting in person isn’t an option.

Immersive experiences that are interactive, photorealistic, and multi-dimensional stand to bridge this gap and recreate the feel and vibe of a space, empowering users to naturally and intuitively find the information they need. To help with this, Google Maps launched Immersive View, which uses advances in machine learning (ML) and computer vision to fuse billions of Street View and aerial images to create a rich, digital model of the world. Beyond that, it layers helpful information on top, like the weather, traffic, and how busy a place is. Immersive View provides indoor views of restaurants, cafes, and other venues to give users a virtual up-close look that can help them confidently decide where to go.

Today we describe the work put into delivering these indoor views in Immersive View. We build on neural radiance fields (NeRF), a state-of-the-art approach for fusing photos to produce a realistic, multi-dimensional reconstruction within a neural network. We describe our pipeline for creation of NeRFs, which includes custom photo capture of the space using DSLR cameras, image processing and scene reproduction. We take advantage of Alphabet’s recent advances in the field to design a method matching or outperforming the prior state-of-the-art in visual fidelity. These models are then embedded as interactive 360° videos following curated flight paths, enabling them to be available on smartphones.

The reconstruction of The Seafood Bar in Amsterdam in Immersive View.

From photos to NeRFs

At the core of our work is NeRF, a recently-developed method for 3D reconstruction and novel view synthesis. Given a collection of photos describing a scene, NeRF distills these photos into a neural field, which can then be used to render photos from viewpoints not present in the original collection.
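At render time, NeRF produces a pixel by sampling the neural field along the pixel's ray and alpha-compositing the predicted densities and colors. A toy version of that quadrature, in plain Python for a single color channel:

```python
import math

def composite(sigmas, colors, deltas):
    """Alpha-composite samples along one ray: the volume-rendering
    quadrature NeRF uses to turn densities and colors into a pixel value.
    sigmas: predicted densities; colors: predicted values (one channel);
    deltas: distances between consecutive samples."""
    pixel, transmittance = 0.0, 1.0
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        pixel += transmittance * alpha * color  # contribution seen by the camera
        transmittance *= 1.0 - alpha            # light surviving past the segment
    return pixel
```

A dense, opaque sample early on the ray dominates the result, which is how solid surfaces emerge from the learned field.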

While NeRF largely solves the challenge of reconstruction, a user-facing product based on real-world data brings a wide variety of challenges to the table. For example, reconstruction quality and user experience should remain consistent across venues, from dimly-lit bars to sidewalk cafes to hotel restaurants. At the same time, privacy should be respected and any potentially personally identifiable information should be removed. Importantly, scenes should be captured consistently and efficiently, reliably resulting in high-quality reconstructions while minimizing the effort needed to capture the necessary photographs. Finally, the same natural experience should be available to all mobile users, regardless of the device on hand.

The Immersive View indoor reconstruction pipeline.

Capture & preprocessing

The first step to producing a high-quality NeRF is the careful capture of a scene: a dense collection of photos from which 3D geometry and color can be derived. To obtain the best possible reconstruction quality, every surface should be observed from multiple different directions. The more information a model has about an object’s surface, the better it will be at discovering the object’s shape and the way it interacts with light.

In addition, NeRF models place further assumptions on the camera and the scene itself. For example, most of the camera’s properties, such as white balance and aperture, are assumed to be fixed throughout the capture. Likewise, the scene itself is assumed to be frozen in time: lighting changes and movement should be avoided. This must be balanced with practical concerns, including the time needed for the capture, available lighting, equipment weight, and privacy. In partnership with professional photographers, we developed a strategy for quickly and reliably capturing venue photos with DSLR cameras in under an hour. This approach has been used for all of our NeRF reconstructions to date.

Once the capture is uploaded to our system, processing begins. As photos may inadvertently contain sensitive information, we automatically scan and blur personally identifiable content. We then apply a structure-from-motion pipeline to solve for each photo’s camera parameters: its position and orientation relative to other photos, along with lens properties like focal length. These parameters associate each pixel with a point and a direction in 3D space and constitute a key signal in the NeRF reconstruction process.
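For intuition, the following sketch shows how the recovered intrinsics (focal lengths and principal point, here for a simple pinhole model with no lens distortion) turn a pixel into a viewing direction in camera coordinates:

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Turn pixel (u, v) into a unit viewing direction in camera coordinates,
    given the focal lengths (fx, fy) and principal point (cx, cy) recovered
    by structure-from-motion (pinhole model, no distortion)."""
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

# The pixel at the principal point looks straight down the optical axis:
# pixel_to_ray(320, 240, 500.0, 500.0, 320.0, 240.0) -> (0.0, 0.0, 1.0)
```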

NeRF reconstruction

Unlike many ML models, a new NeRF model is trained from scratch on each captured location. To obtain the best possible reconstruction quality within a target compute budget, we incorporate features from a variety of published works on NeRF developed at Alphabet. Some of these include:

  • We build on mip-NeRF 360, one of the best-performing NeRF models to date. While more computationally intensive than Nvidia’s widely-used Instant NGP, we find that mip-NeRF 360 consistently produces fewer artifacts and higher reconstruction quality.
  • We incorporate the low-dimensional generative latent optimization (GLO) vectors introduced in NeRF in the Wild as an auxiliary input to the model’s radiance network. These are learned real-valued latent vectors that embed appearance information for each image. By assigning each image its own latent vector, the model can capture phenomena such as lighting changes without resorting to cloudy geometry, a common artifact in casual NeRF captures.
  • We also incorporate exposure conditioning as introduced in Block-NeRF. Unlike GLO vectors, which are uninterpretable model parameters, exposure is directly derived from a photo’s metadata and fed as an additional input to the model’s radiance network. This offers two major benefits: it opens up the possibility of varying ISO and provides a method for controlling an image’s brightness at inference time. We find both properties invaluable for capturing and reconstructing dimly-lit venues.

We train each NeRF model on TPU or GPU accelerators, which provide different trade-off points. As with all Google products, we continue to search for new ways to improve, from reducing compute requirements to improving reconstruction quality.

A side-by-side comparison of our method and a mip-NeRF 360 baseline.

A scalable user experience

Once a NeRF is trained, we have the ability to produce new photos of a scene from any viewpoint and camera lens we choose. Our goal is to deliver a meaningful and helpful user experience: not only the reconstructions themselves, but guided, interactive tours that give users the freedom to naturally explore spaces from the comfort of their smartphones.

To this end, we designed a controllable 360° video player that emulates flying through an indoor space along a predefined path, allowing the user to freely look around and travel forward or backwards. Because this is the first Google product exploring this new technology, 360° videos were chosen as the format to deliver the generated content for a few reasons.

On the technical side, real-time inference and baked representations are still resource intensive on a per-client basis (whether computed on device or in the cloud), and relying on them would limit the number of users able to access this experience. By using videos, we are able to scale the storage and delivery of videos to all users by taking advantage of the same video management and serving infrastructure used by YouTube. On the operations side, videos give us clearer editorial control over the exploration experience and are easier to inspect for quality in large volumes.

While we had considered capturing the space with a 360° camera directly, using a NeRF to reconstruct and render the space has several advantages. A virtual camera can fly anywhere in space, including over obstacles and through windows, and can use any desired camera lens. The camera path can also be edited post-hoc for smoothness and speed, unlike a live recording. A NeRF capture also does not require the use of specialized camera hardware.

Our 360° videos are rendered by ray casting through each pixel of a virtual, spherical camera and compositing the visible elements of the scene. Each video follows a smooth path defined by a sequence of keyframe photos taken by the photographer during capture. The position of the camera for each picture is computed during structure-from-motion, and the sequence of pictures is smoothly interpolated into a flight path.
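For a spherical (equirectangular) camera, each pixel maps to a direction on the sphere rather than through a pinhole. A minimal sketch of that mapping, with an assumed y-up coordinate convention:

```python
import math

def equirect_direction(u, v, width, height):
    """Viewing direction for pixel (u, v) of an equirectangular 360° frame:
    columns map to longitude, rows to latitude."""
    lon = (u / width - 0.5) * 2.0 * math.pi   # longitude in [-pi, pi)
    lat = (0.5 - v / height) * math.pi        # latitude in [-pi/2, pi/2]
    return (math.cos(lat) * math.sin(lon),    # x (right)
            math.sin(lat),                    # y (up)
            math.cos(lat) * math.cos(lon))    # z (forward)

# The frame's center pixel looks straight ahead:
# equirect_direction(512, 256, 1024, 512) -> (0.0, 0.0, 1.0)
```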

To keep speed consistent across different venues, we calibrate the scale of each one by capturing pairs of images taken 3 meters apart. With these known measurements in the space, we scale the generated model and render all videos at a natural velocity.
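The calibration reduces to a single scale factor: the known 3-meter separation divided by the same separation measured in reconstruction units. A sketch, averaging over several calibration pairs:

```python
def metric_scale(pair_distances_model_units, true_separation_m=3.0):
    """Scale factor from reconstruction units to meters, estimated from
    calibration image pairs captured a known 3 meters apart."""
    mean = sum(pair_distances_model_units) / len(pair_distances_model_units)
    return true_separation_m / mean

# A model whose calibration pairs average 1.5 units apart needs a 2x scale:
# metric_scale([1.4, 1.6, 1.5])
```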

The final experience is surfaced to the user within Immersive View: the user can seamlessly fly into restaurants and other indoor venues and discover the space by flying through the photorealistic 360° videos.

Open research questions

We believe that this feature is the first step of many in a journey towards universally accessible, AI-powered, immersive experiences. From a NeRF research perspective, more questions remain open. Some of these include:

  1. Enhancing reconstructions with scene segmentation, adding semantic information that could, for example, make scenes searchable and easier to navigate.
  2. Adapting NeRF to outdoor photo collections, in addition to indoor. In doing so, we’d bring similar experiences to every corner of the world and change how users experience the outdoors.
  3. Enabling real-time, interactive 3D exploration through neural-rendering on-device.

Reconstruction of an outdoor scene with a NeRF model trained on Street View panoramas.

As we continue to grow, we look forward to engaging with and contributing to the community to build the next generation of immersive experiences.

Acknowledgments

This work is a collaboration across multiple teams at Google. Contributors to the project include Jon Barron, Julius Beres, Daniel Duckworth, Roman Dudko, Magdalena Filak, Mike Harm, Peter Hedman, Claudio Martella, Ben Mildenhall, Cardin Moffett, Etienne Pot, Konstantinos Rematas, Yves Sallat, Marcos Seefelder, Lilyana Sirakovat, Sven Tresp and Peter Zhizhin.

Also, we’d like to extend our thanks to Luke Barrington, Daniel Filip, Tom Funkhouser, Charles Goran, Pramod Gupta, Mario Lučić, Isalo Montacute and Dan Thomasset for valuable feedback and suggestions.


Forged in Flames: Startup Fuses Generative AI, Computer Vision to Fight Wildfires


When California skies turned orange in the wake of devastating wildfires, a startup fused computer vision and generative AI to fight back.

“With the 2020 wildfires, it became very personal, so we asked fire officials how we could help,” said Emrah Gultekin, the Turkish-born CEO of Chooch, a Silicon Valley-based leader in computer vision.

California utilities and fire services, they learned, were swamped with as many as 2,000 false positives a week from an existing wildfire detection system. The wrong predictions came from fog, rain and smudges on the lenses of the camera network they used.

So, in a pilot project, Chooch linked its fire detection software to the camera network. It analyzed snapshots every 15 minutes, seeking signs of smoke or fire.

Generative AI Sharpens Computer Vision

Then, the team led by Hakan Gultekin — Emrah’s brother, a software wiz and Chooch’s CTO — had an idea.

They built a generative AI tool that automatically created descriptions of each image, helping reviewers discern when smoke is present. False positives dropped from 2,000 a week to eight.

Startup Chooch uses generative AI and computer vision to detect wildfires.
Chooch detects smoke and fire despite bad weather or dirty camera lenses.

“Fire chiefs were excited about launching the technology in their monitoring centers and what it could achieve,” said Michael Liou, the president of Chooch, who detailed the project in a recent webinar.

Chooch’s generative AI tool gives firefighters in California’s Kern County a dashboard on their smartphones and PCs, populated in real time with alerts, so they can detect wildfires fast.

In 2020, California experienced 9,900 wildfires that burned 4.3 million acres of forest and caused $19 billion in losses. Stopping one fire from spreading out of control would pay for the wildfire detection system for 50 years, the company estimates.

A Vision for Gen AI

Chooch’s CEO says it’s also the shape of things to come.

Emrah Gultekin, CEO of Chooch

“The fusion of large language models and computer vision will bring about even more powerful and accurate products that are easier to deploy,” said Gultekin.

For example, utilities can connect the software to drones and fixed cameras to detect corrosion on capacitors or vegetation encroaching on power lines.

The technology could see further validation as Chooch enters an $11 million Xprize challenge on detecting and fighting wildfires. Sponsors include PG&E and Lockheed Martin, which is building an AI lab to predict and respond to wildfires in a separate collaboration with NVIDIA.

Dashboards for PCs and smartphones can update firefighters with real-time alerts from Chooch’s software.

Chooch applies its technology to a host of challenges in manufacturing, retail and security.

For example, one manufacturer uses Chooch’s models to detect defects before products ship. Eliminating just 20% of the faults will pay for the system several times over.

Inception of a Partnership

Back in 2019, a potential customer in the U.S. government asked for support with edge deployments it planned on NVIDIA GPUs. Chooch joined NVIDIA Inception, a free program that nurtures cutting-edge startups.

Using NGC, NVIDIA’s hub for accelerated software, Hakan was able to port Chooch’s code to NVIDIA GPUs over a weekend. Now its products run on NVIDIA Jetson modules and “have been tested in the wild with full-motion video and multispectral data,” Emrah said.

Since then, the company rolled out support for GPUs in data centers and beyond. For example, the wildfire use case runs on NVIDIA A100 Tensor Core GPUs in the cloud.

Along the way, Chooch embraced software like Triton Inference Server and the NVIDIA DeepStream software development kit.

“The combination of DeepStream and Triton increased our capacity 8x to run more video streams on more AI models — that’s a huge win,” Emrah said.

A Wide Horizon

Now Chooch is expanding its horizons.

The company is a member of the partner ecosystems for NVIDIA Metropolis for intelligent video analytics and NVIDIA Clara Guardian, edge AI software for smart hospitals. Chooch also works with NVIDIA’s retail and telco teams.

The software is opening new doors and expanding the use cases it can address.

“It’s hard work because there’s so much uncharted territory, but that’s also what makes it exciting,” Emrah said.

Learn more about generative AI for enterprises, and explore NVIDIA’s solutions for power grid modernization.


Filmmaker Sara Dietschy Talks AI This Week ‘In the NVIDIA Studio’


Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

With over 900,000 subscribers on her YouTube channel, editor and filmmaker Sara Dietschy creates docuseries, reviews and vlogs that explore the intersection of technology and creativity. The Los Angeles-based creator shares her AI-powered workflow this week In the NVIDIA Studio, and it’s just peachy — a word that happens to rhyme with her last name.

Dietschy explained in a recent video how five AI tools helped save over 100 hours of work, powered by NVIDIA Studio technology.

“If you do any kind of 3D rendering on the go, a dedicated NVIDIA RTX GPU is nonnegotiable.” — Sara Dietschy

She shows a practical approach to how these tools, running on laptops powered by GeForce RTX 40 Series GPUs, tackle the otherwise manual work that can make nonlinear editing tedious. Using tools like AI Relighting, Video Text Editing and more in Davinci Resolve software, Dietschy saves time on every project — and for creators, time is money, she said.

The NVIDIA Studio team spoke with Dietschy about how she uses AI, how technology can simplify artists’ processes, and how the NVIDIA Studio platform supercharged her creativity and video-editing workflows.

Dietschy takes a break to sit down with the Studio team.

Studio team: What AI features do you use most commonly?

Dietschy: In DaVinci Resolve, there’s neural engine text-based editing, automatic subtitles, Magic Mask and Detect Scene Cuts — all AI-powered features I use daily. And the relighting feature in DaVinci Resolve is crazy good.

In addition, ChatGPT and Notion AI sped up copywriting for my website and social media posts, so I could focus on video editing.

Now you see Dietschy, now you don’t — Magic Mask in DaVinci Resolve.

Studio team: How do you use Adobe Premiere Pro? 

Dietschy: In the beta version, my entire video can be transcribed quickly, and Premiere Pro can even detect silence. Just click on the three dots in the text, hit delete and boom — AI conveniently edits out that awkward pause. No need for me to hop back and forth.

Plus, Auto Reframe and Unsharp Mask are popular AI features in Premiere Pro that are worth looking into.

AI detects pauses and jump cuts.

Studio team: What prompted the regular use of AI-powered tools and features? 

Dietschy: My biggest pet peeve is when a program offers really cool features but requires uploading everything to a web app or starting a completely new workflow. Once these features were made available directly in the apps I already use, things became so much more efficient, which is why I now use them on the daily.

Access numerous AI-powered features in DaVinci Resolve.

Studio team: For the non-technical people out there, why does GPU acceleration in creative apps matter?

Dietschy: For video editors, GPU acceleration — which is basically a graphics card making the features and effects in creative apps faster — especially in DaVinci Resolve, is everything. It speeds up scrubbing and playback, and crushes export times. This ASUS Zenbook Pro 14 OLED Studio laptop exported a recent hour-plus-long 4K video in less than 14 minutes. If you release new content every week, like me, time saved is gold.

NVIDIA GeForce RTX 4070 GPU-accelerated encode (NVENC) speeds up video exporting up to 5x.

Studio team: Would you recommend GeForce RTX GPUs to other video editors?

Dietschy: Absolutely. A big unlock for me was getting a desktop computer with a nice processor and an NVIDIA GPU. I was just amazed at how much smoother things went.

The ASUS Zenbook Pro 14 OLED NVIDIA Studio laptop.

Studio team: If you could go back to the beginning of your creative journey, what advice would you give yourself?

Dietschy: Don’t focus so much on quantity. Instead, take the time to add structure to your process, because being a “messy creative” only seems cool at first. Organization is already paying crazy dividends in better sleep and mental health.

For more AI insights, watch Dietschy’s video on the dozen-plus AI tools creators should use:

Find more on Sara Dietschy’s YouTube channel.

Influencer and creator Sara Dietschy.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.
