December 2022 – Page 9

RT-1: Robotics Transformer for Real-World Control at Scale

Posted by Keerthana Gopalakrishnan and Kanishka Rao, Google Research, Robotics at Google

Major recent advances in multiple subfields of machine learning (ML) research, such as computer vision and natural language processing, have been enabled by a shared common approach that leverages large, diverse datasets and expressive models that can absorb all of the data effectively. Although there have been various attempts to apply this approach to robotics, robots have not yet leveraged highly capable models as well as other subfields have.

Several factors contribute to this challenge. First, there’s the lack of large-scale and diverse robotic data, which limits a model’s ability to absorb a broad set of robotic experiences. Data collection is particularly expensive and challenging for robotics because dataset curation requires engineering-heavy autonomous operation, or demonstrations collected using human teleoperations. A second factor is the lack of expressive, scalable, and fast-enough-for-real-time-inference models that can learn from such datasets and generalize effectively.

To address these challenges, we propose the Robotics Transformer 1 (RT-1), a multi-task model that tokenizes robot inputs and output actions (e.g., camera images, task instructions, and motor commands) to enable efficient inference at runtime, which makes real-time control feasible. This model is trained on a large-scale, real-world robotics dataset of 130k episodes that cover 700+ tasks, collected using a fleet of 13 robots from Everyday Robots (EDR) over 17 months. We demonstrate that RT-1 can exhibit significantly improved zero-shot generalization to new tasks, environments and objects compared to prior techniques. Moreover, we carefully evaluate and ablate many of the design choices in the model and training set, analyzing the effects of tokenization, action representation, and dataset composition. Finally, we’re open-sourcing the RT-1 code, and hope it will provide a valuable resource for future research on scaling up robot learning.

RT-1 absorbs large amounts of data, including robot trajectories with multiple tasks, objects and environments, resulting in better performance and generalization.

Robotics Transformer (RT-1)

RT-1 is built on a transformer architecture that takes a short history of images from a robot’s camera along with task descriptions expressed in natural language as inputs and directly outputs tokenized actions.

RT-1’s architecture is similar to that of a contemporary decoder-only sequence model trained against a standard categorical cross-entropy objective with causal masking. Its key features include: image tokenization, action tokenization, and token compression, described below.

Image tokenization: We pass images through an EfficientNet-B3 model that is pre-trained on ImageNet, and then flatten the resulting 9×9×512 spatial feature map to 81 tokens. The image tokenizer is conditioned on natural language task instructions, and uses FiLM layers initialized to identity to extract task-relevant image features early on.

Action tokenization: The robot’s action dimensions are 7 variables for arm movement (x, y, z, roll, pitch, yaw, gripper opening), 3 variables for base movement (x, y, yaw), and an extra discrete variable to switch between three modes: controlling arm, controlling base, or terminating the episode. Each action dimension is discretized into 256 bins.

Token Compression: The model adaptively selects soft combinations of image tokens that can be compressed based on their impact towards learning with the element-wise attention module TokenLearner, resulting in over 2.4x inference speed-up.
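The action tokenization described above can be sketched in a few lines of Python. This is a minimal illustration, not the RT-1 implementation: the value range of [-1.0, 1.0], the uniform bin spacing, and the helper names `discretize` and `undiscretize` are assumptions made for the example. The post itself states only that each action dimension is discretized into 256 bins.

```python
NUM_BINS = 256  # from the post: each action dimension uses 256 bins


def discretize(value, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map a continuous value in [low, high] to an integer bin index.

    The [low, high] range and uniform spacing are assumptions for illustration.
    """
    clipped = min(max(value, low), high)
    fraction = (clipped - low) / (high - low)
    # Scale to [0, num_bins) and keep the upper edge inside the last bin.
    return min(int(fraction * num_bins), num_bins - 1)


def undiscretize(token, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map a bin index back to the center of its bin."""
    width = (high - low) / num_bins
    return low + (token + 0.5) * width


# Hypothetical 7-dimensional arm action: x, y, z, roll, pitch, yaw, gripper.
arm_action = [0.10, -0.25, 0.0, 1.0, -1.0, 0.5, 0.75]
arm_tokens = [discretize(v) for v in arm_action]
```

Each continuous dimension becomes one integer token, so predicting an action reduces to a classification over 256 classes per dimension, which fits the categorical cross-entropy objective mentioned earlier.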

RT-1’s architecture: The model takes a text instruction and set of images as inputs, encodes them as tokens via a pre-trained FiLM EfficientNet model and compresses them via TokenLearner. These are then fed into the Transformer, which outputs action tokens.

To build a system that could generalize to new tasks and show robustness to different distractors and backgrounds, we collected a large, diverse dataset of robot trajectories. We used 13 EDR robot manipulators, each with a 7-degree-of-freedom arm, a 2-fingered gripper, and a mobile base, to collect 130k episodes over 17 months. We used demonstrations provided by humans through remote teleoperation, and annotated each episode with a textual description of the instruction that the robot just performed. The set of high-level skills represented in the dataset includes picking and placing items, opening and closing drawers, getting items in and out of drawers, placing elongated items upright, knocking objects over, pulling napkins and opening jars. The resulting dataset includes 130k+ episodes that cover 700+ tasks using many different objects.

Experiments and Results

To better understand RT-1’s generalization abilities, we study its performance against three baselines: Gato, BC-Z and BC-Z XL (i.e., BC-Z with the same number of parameters as RT-1), across four categories:

Seen tasks performance: performance on tasks seen during training
Unseen tasks performance: performance on unseen tasks where the skill and object(s) were seen separately in the training set, but combined in novel ways
Robustness (distractors and backgrounds): performance with distractors (up to 9 distractors and occlusion) and performance with background changes (new kitchen, lighting, background scenes)
Long-horizon scenarios: execution of SayCan-type natural language instructions in a real kitchen

RT-1 outperforms baselines by large margins in all four categories, exhibiting impressive degrees of generalization and robustness.

Performance of RT-1 vs. baselines on evaluation scenarios.

Incorporating Heterogeneous Data Sources

To push RT-1 further, we train it on data gathered from another robot to test if (1) the model retains its performance on the original tasks when a new data source is presented and (2) if the model sees a boost in generalization with new and different data, both of which are desirable for a general robot learning model. Specifically, we use 209k episodes of indiscriminate grasping that were autonomously collected on a fixed-base Kuka arm for the QT-Opt project. We transform the data collected to match the action specs and bounds of our original dataset collected with EDR, and label every episode with the task instruction “pick anything” (the Kuka dataset doesn’t have object labels). Kuka data is then mixed with EDR data in a 1:2 ratio in every training batch to control for regression in original EDR skills.
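The 1:2 Kuka-to-EDR mixing in every training batch can be sketched as follows. This is an illustrative sketch under stated assumptions, not the training code used for RT-1: the function name `mixed_batch`, the batch size, the placeholder episode records, and sampling with replacement are all choices made for the example.

```python
import random


def mixed_batch(kuka_episodes, edr_episodes, batch_size, rng):
    """Sample one batch with a 1:2 Kuka-to-EDR ratio (hypothetical helper)."""
    num_kuka = batch_size // 3        # one part Kuka
    num_edr = batch_size - num_kuka   # two parts EDR (plus any remainder)
    # Sampling with replacement is an assumption made for simplicity.
    batch = rng.choices(kuka_episodes, k=num_kuka)
    batch += rng.choices(edr_episodes, k=num_edr)
    rng.shuffle(batch)
    return batch


# Placeholder records; real episodes would hold images, instructions and actions.
kuka = [("kuka", i) for i in range(100)]
edr = [("edr", i) for i in range(100)]
batch = mixed_batch(kuka, edr, batch_size=30, rng=random.Random(0))
```

Fixing the ratio per batch, rather than pooling both datasets and sampling uniformly, keeps the larger Kuka dataset from dominating the gradient updates.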

Training methodology when data has been collected from multiple robots.

Our results indicate that RT-1 is able to acquire new skills by observing other robots’ experiences. In particular, the 22% accuracy seen when training with EDR data alone jumps by almost 2x to 39% when RT-1 is trained on both bin-picking data from Kuka and existing EDR data from robot classrooms, where we collected most of the RT-1 data. When training RT-1 on bin-picking data from Kuka alone, and then evaluating it on bin-picking from the EDR robot, we see 0% accuracy. Mixing data from both robots, on the other hand, allows RT-1 to infer the actions of the EDR robot when faced with the states observed by Kuka, without explicit demonstrations of bin-picking on the EDR robot, and by taking advantage of experiences collected by Kuka. This presents an opportunity for future work to combine more multi-robot datasets to enhance robot capabilities.

Training Data	Classroom Eval	Bin-picking Eval
Kuka bin-picking data + EDR data	90%	39%
EDR only data	92%	22%
Kuka bin-picking only data	0%	0%

RT-1 accuracy evaluation using various training data.

Long-Horizon SayCan Tasks

RT-1’s high performance and generalization abilities can enable long-horizon, mobile manipulation tasks through SayCan. SayCan works by grounding language models in robotic affordances, and leveraging few-shot prompting to break down a long-horizon task expressed in natural language into a sequence of low-level skills.

SayCan tasks present an ideal evaluation setting to test various features:

Long-horizon task success falls exponentially with task length, so high manipulation success is important.
Mobile manipulation tasks require multiple handoffs between navigation and manipulation, so the robustness to variations in initial policy conditions (e.g., base position) is essential.
The number of possible high-level instructions increases combinatorially with skill-breadth of the manipulation primitive.
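The first point can be made concrete with a short calculation. The sketch below assumes that every step of a task succeeds independently and with the same probability, which is a simplification introduced here and not a claim from the evaluation; the function name `chain_success` is likewise hypothetical.

```python
def chain_success(per_skill_success, num_steps):
    """Probability that all steps succeed, assuming independent, equally reliable steps."""
    return per_skill_success ** num_steps


# Compare two per-skill success rates over tasks of increasing length.
for p in (0.9, 0.97):
    rates = [round(chain_success(p, n), 3) for n in (1, 5, 10)]
    print(p, rates)
```

Under these assumptions a skill that succeeds 90% of the time completes a ten-step task only about 35% of the time, which is why per-skill manipulation success matters so much for long-horizon tasks.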

We evaluate SayCan with RT-1 and two other baselines (SayCan with Gato and SayCan with BC-Z) in two real kitchens. Below, “Kitchen2” constitutes a much more challenging generalization scene than “Kitchen1”. The mock kitchen used to gather most of the training data was modeled after Kitchen1.

SayCan with RT-1 achieves a 67% execution success rate in Kitchen1, outperforming other baselines. Due to the generalization difficulty presented by the new unseen kitchen, the performance of SayCan with Gato and SayCan with BC-Z falls sharply, while RT-1 does not show a visible drop.

	Kitchen1 planning (%)	Kitchen1 execution (%)	Kitchen2 planning (%)	Kitchen2 execution (%)
Original SayCan	73	47	–	–
SayCan w/ Gato	87	33	87	0
SayCan w/ BC-Z	87	53	87	13
SayCan w/ RT-1	87	67	87	67

The following video shows a few example PaLM-SayCan-RT1 executions of long-horizon tasks in multiple real kitchens.

Conclusion

The RT-1 Robotics Transformer is a simple and scalable action-generation model for real-world robotics tasks. It tokenizes all inputs and outputs, and uses a pre-trained EfficientNet model with early language fusion, and a token learner for compression. RT-1 shows strong performance across hundreds of tasks, and extensive generalization abilities and robustness in real-world settings.

As we explore future directions for this work, we hope to scale the number of robot skills faster by developing methods that allow non-experts to train the robot with directed data collection and model prompting. We also look forward to improving robotics transformers’ reaction speeds and context retention with scalable attention and memory. To learn more, check out the paper, open-sourced RT-1 code, and the project website.

Acknowledgements

This work was done in collaboration with Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath, Igor Mordatch, Ofir Nachum, Carolina Parada, Jodilyn Peralta, Emily Perez, Karl Pertsch, Jornell Quiambao, Kanishka Rao, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Kevin Sayed, Jaspiar Singh, Sumedh Sontakke, Austin Stone, Clayton Tan, Huong Tran, Vincent Vanhoucke, Steve Vega, Quan Vuong, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, and Brianna Zitkovich.

reMARS revisited: Human-like reasoning for an AI

Learn what goes into Amazon’s effort to develop human-like reasoning for Alexa.

“I want machines to write as fluently as humans”

Amazon Machine Learning Fellow Jiao Sun works on strategies to control text generation.

Ferrari of Finance: Accelerated Computing Drives Milan Bank Forward

Banks require more than cash in the vault these days; they also need accelerated computing in the back room.

“The boost we’re getting with GPUs not only significantly improved our performance at the same cost, it helped us redefine our business and sharpen our focus on customers,” said Marco Airoldi, who’s been head of financial engineering for more than 20 years at Mediobanca, a Milan-based banking group that provides lending and investment services in Europe.

High performance computing is especially important for investment banks whose services involve computationally intensive transactions on baskets of securities and derivative products.

Thanks, in part, to its GPU-powered systems, Mediobanca is thriving amid the current market downturn.

“We can’t disclose numbers, but I can tell you with a good degree of confidence I don’t think we’ve had more than a dozen negative days in the last 250 trading days,” said Stefano Dova, a Ph.D. in finance and head of markets at Mediobanca.

That’s, in part, because Airoldi’s team enabled real-time risk management on GPUs early in the year.

“It’s a fundamental step forward,” said Dova, who plays his electric piano or clarinet to unwind at the end of a stressful day. “You can lose money on a daily basis in the current market volatility, but we’ve been very happy with the results we’ve had in the last six months.”

Sharing the Wealth

Now, Mediobanca is preparing to offer its customers the same computing capabilities it enjoys.

“Because the GPUs are so fast, we can offer clients the ability to build their own products and see their risk profiles in real time, so they can decide where and when to invest — you can only do this if you have the computational power for live pricing,” Dova said.

The service, now in final testing, puts customers at the center of the bank’s business. It uses automation made possible by the parallel computing capabilities of the bank’s infrastructure, Airoldi notes.

Next Stop: Machine Learning

Looking further ahead, Airoldi’s group is mapping the investment bank’s journey into AI.

It starts with sentiment analysis, powered by natural language processing. That will help the bank understand market trends more deeply, so it can make even better investment decisions.

“AI will give us useful ways to map customer and investor behaviors, and we will invest in the technology to develop more AI apps for finance,” said Dova.

Mediobanca’s headquarters is in central Milan, around the corner from the famed Teatro alla Scala.

Their work comes as banks of all sorts are starting to apply AI to dozens of use cases.

“AI is one of the most promising technologies in finance,” said Airoldi, who foresees using it for classical quantitative problems, too.

It’s All About the Math

In the last few years, the bank has added dozens of GPUs to its infrastructure. Each offers up to 100x the performance of a CPU, he said.

That means Mediobanca can do more with less. It reduces its total cost of ownership while accelerating workloads that create competitive advantages, such as the Monte Carlo simulations used to create and price advanced investment products.

Under the hood, great financial performance is based on excellence in math, said Airoldi, who earned his Ph.D. in theoretical condensed matter physics.

“The mathematical models and numeric methods of finance are closely related to those found in theoretical physics, so investment banking is a great job for a physicist,” he said.

When Airoldi needs a break from work, you might find him playing chess in the Piazza della Scala, across from the famed opera house, just around the corner from the bank’s headquarters.

The post Ferrari of Finance: Accelerated Computing Drives Milan Bank Forward appeared first on NVIDIA Blog.

Helping robots learn from each other

A look at how Transformers are helping robots become more useful.

Face All Fears With Creative Studio Fabian&Fred This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

The short film I Am Not Afraid! by creative studio Fabian&Fred embodies childlike wonder, curiosity and imagination this week In the NVIDIA Studio.

Plus, the NVIDIA Studio #WinterArtChallenge shows no signs of letting up, so learn more and check out featured artwork at the end of this blog. Keep those posts coming.

For inspiration, watch NVIDIA artist Jeremy Lightcap and Adobe Substance 3D expert Pierre Maheut create a winter-themed scene in NVIDIA Omniverse, a platform for creating and operating metaverse applications that enables artists to connect their favorite 3D design tools for a more seamless workflow.

Invoke Emotion

For almost a decade, Fabian&Fred co-founders Fabian Driehost and Frederic Schuld have focused on relatable narratives — stories understood by audiences of all ages — while not shying away from complex emotional and social topics.

The short film’s hero Vanja faces her fears.

One of their latest works, I Am Not Afraid!, features a little girl named Vanja who discovers that being brave means facing your own fears and that everyone, even the bigger personalities in this world, is scared now and again.

“Everybody knows how it feels to be afraid of the dark,” said Fabian&Fred.

The concept for the film started when director and Norwegian native Marita Mayor shared her childhood experiences with the team. These emotional moments had a profound artistic impact on the work’s visual-layer-based, flat, minimal style and an appropriate color system.

“We combined structures from nature, brush strokes used for texture, and a kid’s voice — all designed to ensure the feeling of fear was authentic,” said the team.

With the script in hand, pre-production work included various sketches, moodboards and photographs of urban neighborhoods, people, animals and plants to match the narrative tone.

Work began in the Adobe Creative Cloud suite of creative apps, starting with the creation of multiple characters in Adobe Photoshop. These characters were then prepared and rigged in Adobe Animate.

Animated characters were used in Premiere Pro to create an animatic to test out voices and sounds. With the new GeForce RTX 40 Series GPUs, studios like Fabian&Fred can deploy NVIDIA’s dual encoders to cut export times nearly in half, speeding up review cycles for teams.

3D assets were modeled in Blender, with Blender Cycles RTX-accelerated OptiX ray tracing in the viewport, ensuring interactive modeling with sharp graphical output.

Preliminary sketches in Adobe Photoshop.

In parallel, large, detailed backgrounds were created in Adobe Illustrator with the GPU-accelerated canvas. Fabian&Fred were able to smoothly and interactively pan across, and zoom in and out of, their complex vector graphics, thanks to their GeForce RTX 3090 GPU.

Stunning backgrounds detailed in Adobe Illustrator.

Fabian&Fred returned to Adobe Animate to stage all assets and backgrounds with a mix of frame-by-frame and rig animation techniques. Sound production was done in the digital audio app ProTools, and final composite work completed in Adobe After Effects with more than 45 RTX GPU-accelerated features and effects at the duo’s disposal.

Finally, Fabian&Fred color corrected I Am Not Afraid! using Blackmagic Design’s DaVinci Resolve RTX GPU-accelerated, AI-powered, auto-color-correct feature to improve hues and contrast with ease. They then applied some final manual touches.

The new GeForce RTX 40 Series GPUs speed up AI tools in DaVinci Resolve, including Object Select Mask, which rotoscopes or highlights parts of motion footage frame by frame 70% faster than the previous generation, thanks to close collaboration with Blackmagic Design.

“We have worked closely with NVIDIA for many years, and we look forward to continuing our collaboration to produce even more groundbreaking tools and performance for creators,” said Rohit Gupta, director of software development at Blackmagic Design.

“Each project in our portfolio has benefited from reliable GeForce RTX GPU performance, whether it’s 2D animation or a photogrammetry-based, real-time 3D project.” – Fabian&Fred

Virtually every stage in Fabian&Fred’s creative workflow was made faster and easier with their GeForce RTX GPU. And while these powerful graphics cards are well known for accelerating the most difficult and complex workflows, they’re a boon for efficiency in smaller projects, as well.

Reflecting on their shared experiences, Fabian&Fred agreed that teamwork and diversity are their strengths. “In our studio, we come together from multicultural roots and make unique films as a team, with different methods, but the films have a truth in their heart that works for many people.”

View more of Fabian&Fred’s work on their Instagram page.

The Weather Outside Is Frightful, the #WinterArtChallenge Is Delightful

Enter NVIDIA Studio’s #WinterArtChallenge, running through the end of the year, by sharing winter-themed art on Instagram, Twitter or Facebook for a chance to be featured on our social media channels.

Like @mtw75 with Santa Claus and his faithful elves preparing gifts for all the good little boys and girls this holiday season.

#WinterArtChallenge did that some time ago for the santafacturing medium article https://t.co/3IyTUTDa1p pic.twitter.com/5KVvtWf2dw

— mtw75 (@mtw75) December 1, 2022

Be sure to tag #WinterArtChallenge to join.

Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

The post Face All Fears With Creative Studio Fabian&Fred This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.

Q&A with Yufei Ding, assistant professor, computer science, University of California, Santa Barbara

This month, we’re spotlighting Yufei Ding, an assistant professor in computer science at the University of California, Santa Barbara (UCSB).

Image augmentation pipeline for Amazon Lookout for Vision

Amazon Lookout for Vision provides a machine learning (ML)-based anomaly detection service to identify normal images (i.e., images of objects without defects) vs anomalous images (i.e., images of objects with defects), types of anomalies (e.g., missing piece), and the location of these anomalies. Therefore, Lookout for Vision is popular among customers that look for automated solutions for industrial quality inspection (e.g., detecting abnormal products). However, customers’ datasets usually face two problems:

The number of images with anomalies could be very low and might not reach the minimum per anomaly/defect type imposed by Lookout for Vision (~20).
Normal images might not have enough diversity, which might result in the model failing when environmental conditions such as lighting change in production.

To overcome these problems, this post introduces an image augmentation pipeline that targets both problems: it generates synthetic anomalous images by removing objects from images, and it generates additional normal images by introducing controlled augmentation such as Gaussian noise, hue, saturation, and pixel value scaling. We use the imgaug library to introduce augmentation to generate additional anomalous and normal images for the second problem. We use Amazon SageMaker Ground Truth to generate object removal masks and the LaMa algorithm to remove objects for the first problem using image inpainting (object removal) techniques.

The rest of the post is organized as follows. In Section 3, we present the image augmentation pipeline for normal images. In Section 4, we present the image augmentation pipeline for abnormal images (aka synthetic defect generation). Section 5 illustrates the Lookout for Vision training results using the augmented dataset. Section 6 demonstrates how the Lookout for Vision model trained on synthetic data performs against real defects. In Section 7, we talk about cost estimation for this solution. All of the code we used for this post can be accessed here.

1. Solution overview

The following is the diagram of the proposed image augmentation pipeline for Lookout for Vision anomaly localization model training:

The diagram above starts by collecting a series of images (step 1). We augment the dataset by augmenting the normal images (step 3) and by using object removal algorithms (steps 2, 5-6). We then package the data in a format that can be consumed by Amazon Lookout for Vision (steps 7-8). Finally, in step 9, we use the packaged data to train a Lookout for Vision localization model.

This image augmentation pipeline gives customers flexibility to generate synthetic defects in the limited sample dataset, as well as add more quantity and variety to normal images. It would boost the performance of Lookout for Vision service, solving the lack of customer data issue and making the automated quality inspection process smoother.

2. Data preparation

From here to the end of the post, we use the public FICS-PCB: A Multi-Modal Image Dataset for Automated Printed Circuit Board Visual Inspection dataset licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License to illustrate the image augmentation pipeline and the consequent Lookout for Vision training and testing. This dataset is designed to support the evaluation of automated PCB visual inspection systems. It was collected at the SeCurity and AssuraNce (SCAN) lab at the University of Florida. It can be accessed here.

We start with the hypothesis that the customer only provides a single normal image of a PCB board (an s10 PCB sample) as the dataset. It can be seen as follows:

3. Image augmentation for normal images

The Lookout for Vision service requires at least 20 normal images and 20 anomalies per defect type. Since there is only one normal image from the sample data, we must generate more normal images using image augmentation techniques. From the ML standpoint, feeding multiple image transformations using different augmentation techniques can improve the accuracy and robustness of the model.
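As a quick illustration of the gap that augmentation has to close, the sketch below counts how many more images each category needs. It is a hypothetical helper written for this post's numbers, not part of the Lookout for Vision API: the function name `images_still_needed` and the defect type label are assumptions, and only the minimum of 20 comes from the text above.

```python
MIN_IMAGES = 20  # per the post: 20 normal images and 20 anomalies per defect type


def images_still_needed(normal_count, anomaly_counts):
    """Return how many more images each category needs to reach the minimum."""
    needed = {"normal": max(0, MIN_IMAGES - normal_count)}
    for defect_type, count in anomaly_counts.items():
        needed[defect_type] = max(0, MIN_IMAGES - count)
    return needed


# Starting point in this post: one normal image and no anomalous images.
# The defect type name below is illustrative.
shortfall = images_still_needed(1, {"missing component": 0})
```

With a single normal image, the shortfall is 19 normal images and 20 anomalous images for the one defect type, all of which must come from augmentation.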

We’ll use imgaug for image augmentation of normal images. Imgaug is an open-source Python package that lets you augment images in ML experiments.

First, we’ll install the imgaug library in an Amazon SageMaker notebook.

pip install imgaug

Next, we install the Python package named ‘IPyPlot’, which we use to display the images side by side.

pip install ipyplot

Then, we perform image augmentation of the original image using transformations including GammaContrast, SigmoidContrast, and LinearContrast, and adding Gaussian noise on the image.

import imageio
import imgaug as ia
import imgaug.augmenters as iaa
import ipyplot

# Load the single normal PCB image
input_img = imageio.imread('s10.png')

# Define the augmenters: additive Gaussian noise and three contrast transforms
noise = iaa.AdditiveGaussianNoise(loc=10, scale=40)
contrast = iaa.GammaContrast((0.5, 2.0))
contrast_sig = iaa.SigmoidContrast(gain=(5, 10), cutoff=(0.4, 0.6))
# Range written in (low, high) order; the original listed it as (0.6, 0.4)
contrast_lin = iaa.LinearContrast((0.4, 0.6))

# Apply each augmenter to the original image
input_noise = noise.augment_image(input_img)
input_contrast = contrast.augment_image(input_img)
sigmoid_contrast = contrast_sig.augment_image(input_img)
linear_contrast = contrast_lin.augment_image(input_img)

# Display the original next to the augmented versions
images_list = [input_img, input_contrast, sigmoid_contrast, linear_contrast, input_noise]
labels = ['Original', 'Gamma Contrast', 'SigmoidContrast', 'LinearContrast', 'Gaussian Noise Image']
ipyplot.plot_images(images_list, labels=labels, img_width=180)

Since we need at least 20 normal images, and the more the better, we generated 10 augmented images for each of the 4 transformations shown above as our normal image dataset. In the future, we plan to also transform the images to be positioned at different locations and different angles so that the trained model can be less sensitive to the placement of the object relative to the fixed camera.

4. Synthetic defect generation for augmentation of abnormal images

In this section, we present a synthetic defect generation pipeline to augment the number of images with anomalies in the dataset. Note that, as opposed to the previous section where we create new normal samples from existing normal samples, here, we create new anomaly images from normal samples. This is an attractive feature for customers whose datasets completely lack this kind of image; for example, an anomaly can be created by removing a component from the normal PCB board. This synthetic defect generation pipeline has three steps. First, we generate synthetic masks from source (normal) images using Amazon SageMaker Ground Truth. In this post, we target a specific defect type: missing component. This mask generation provides a mask image and a manifest file. Second, the manifest file must be modified and converted to an input file for a SageMaker endpoint. Third, the input file is passed to an Object Removal SageMaker endpoint responsible for removing the parts of the normal image indicated by the mask. This endpoint provides the resulting abnormal image.

4.1 Generate synthetic defect masks using Amazon SageMaker Ground Truth

Amazon SageMaker Ground Truth for data labeling

Amazon SageMaker Ground Truth is a data labeling service that makes it easy to label data and gives you the option to use human annotators through Amazon Mechanical Turk, third-party vendors, or your own private workforce. You can follow this tutorial to set up a labeling job.

In this section, we’ll show how we use Amazon SageMaker Ground Truth to mark specific “components” in normal images to be removed in the next step. Note that a key contribution of this post is that we don’t use Amazon SageMaker Ground Truth in its traditional way (that is, to label training images). Here, we use it to generate a mask for future removal in normal images. These removals in normal images will generate the synthetic defects.

For the purpose of this post, in our labeling job we’ll artificially remove up to three components from the PCB board: IC, resistor1, and resistor2. After entering the labeling job as a labeler, you can select the label name and draw a mask of any shape around the component that you want to remove from the image as a synthetic defect. Note that you can’t include ‘_’ in the label name for this experiment, since we use ‘_’ to separate different metadata in the defect name later in the code.

In the following picture, we draw a green mask around IC (Integrated Circuit), a blue mask around resistor 1, and an orange mask around resistor 2.

After we select the submit button, Amazon SageMaker Ground Truth will generate an output mask with white background and a manifest file as follows:

{"source-ref":"s3://pcbtest22/label/s10.png","s10-label-ref":"s3://pcbtest22/label/s10-label/annotations/consolidated-annotation/output/0_2022-09-08T18:01:51.334016.png","s10-label-ref-metadata":{"internal-color-map":{"0":{"class-name":"BACKGROUND","hex-color":"#ffffff","confidence":0},"1":{"class-name":"IC","hex-color":"#2ca02c","confidence":0},"2":{"class-name":"resistor_1","hex-color":"#1f77b4","confidence":0},"3":{"class-name":"resistor_2","hex-color":"#ff7f0e","confidence":0}},"type":"groundtruth/semantic-segmentation","human-annotated":"yes","creation-date":"2022-09-08T18:01:51.498525","job-name":"labeling-job/s10-label"}}

Note that so far we haven’t generated any abnormal images. We just marked the three components that will be artificially removed and whose removal will generate abnormal images. Later, we’ll use both (1) the mask image above, and (2) the information from the manifest file as inputs for the abnormal image generation pipeline. The next section shows how to prepare the input for the SageMaker endpoint.
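To show what information the manifest carries, the following sketch reads one manifest line and maps each labeled component to the RGB color it has in the mask image. It is an illustration only: the helper names `hex_to_rgb` and `label_colors` are hypothetical, the manifest line is a trimmed copy of the one shown above, and this is not necessarily how the pipeline in this post consumes the file.

```python
import json

# A trimmed copy of the manifest line shown above (some fields omitted).
manifest_line = json.dumps({
    "source-ref": "s3://pcbtest22/label/s10.png",
    "s10-label-ref-metadata": {
        "internal-color-map": {
            "0": {"class-name": "BACKGROUND", "hex-color": "#ffffff", "confidence": 0},
            "1": {"class-name": "IC", "hex-color": "#2ca02c", "confidence": 0},
            "2": {"class-name": "resistor_1", "hex-color": "#1f77b4", "confidence": 0},
            "3": {"class-name": "resistor_2", "hex-color": "#ff7f0e", "confidence": 0},
        },
        "type": "groundtruth/semantic-segmentation",
    },
})


def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def label_colors(line, metadata_key="s10-label-ref-metadata"):
    """Return {class name: RGB color} for every non-background label."""
    color_map = json.loads(line)[metadata_key]["internal-color-map"]
    return {
        entry["class-name"]: hex_to_rgb(entry["hex-color"])
        for entry in color_map.values()
        if entry["class-name"] != "BACKGROUND"
    }


colors = label_colors(manifest_line)
```

Pixels in the mask image that match one of these colors belong to the corresponding component, which is what makes it possible to select a single component for removal.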

4.2 Prepare Input for SageMaker endpoint

Transform Amazon SageMaker Ground Truth manifest as a SageMaker endpoint input file

First, we set up an Amazon Simple Storage Service (Amazon S3) bucket to store all of the input and output for the image augmentation pipeline. In the post, we use an S3 bucket named qualityinspection. Then we generate all of the augmented normal images and upload them to this S3 bucket.

import os
import shutil

import boto3
import imageio
import imgaug.augmenters as iaa
from PIL import Image

s3 = boto3.client('s3')

# make the image directory
dir_im = "images"
os.makedirs(dir_im, exist_ok=True)

# create augmented images from original image
input_img = imageio.imread('s10.png')

# define the augmenters once; each call to augment_image draws new random parameters
noise = iaa.AdditiveGaussianNoise(scale=0.2*255)
contrast = iaa.GammaContrast((0.5, 2))
contrast_sig = iaa.SigmoidContrast(gain=(5, 20), cutoff=(0.25, 0.75))
contrast_lin = iaa.LinearContrast((0.4, 1.6))

for i in range(10):
    input_noise = noise.augment_image(input_img)
    input_contrast = contrast.augment_image(input_img)
    sigmoid_contrast = contrast_sig.augment_image(input_img)
    linear_contrast = contrast_lin.augment_image(input_img)

    Image.fromarray(input_noise).save(f'{dir_im}/input_noise_{i}.png')
    # note: the gamma-contrast image is saved under the name contrast_sig_{i}.png
    Image.fromarray(input_contrast).save(f'{dir_im}/contrast_sig_{i}.png')
    Image.fromarray(sigmoid_contrast).save(f'{dir_im}/sigmoid_contrast_{i}.png')
    Image.fromarray(linear_contrast).save(f'{dir_im}/linear_contrast_{i}.png')

# move original image to image augmentation folder
shutil.move('s10.png', f'{dir_im}/s10.png')
# list all the images in the image directory
imlist = [file for file in os.listdir(dir_im) if file.endswith('.png')]

# upload augmented images to an s3 bucket
s3_bucket = 'qualityinspection'
for name in imlist:
    with open(f'{dir_im}/{name}', 'rb') as data:
        s3.upload_fileobj(data, s3_bucket, f'{dir_im}/{name}')

# get the image s3 locations
im_s3_list = [f's3://{s3_bucket}/{dir_im}/{name}' for name in imlist]

Next, we download the mask from Amazon SageMaker Ground Truth and upload it to a folder named ‘mask’ in that S3 bucket.

# download Ground Truth annotation mask image to local from the Ground Truth s3 folder
s3.download_file('pcbtest22', 'label/S10-label3/annotations/consolidated-annotation/output/0_2022-09-09T17:25:31.918770.png', 'mask.png')
# upload mask to mask folder
s3.upload_file('mask.png', 'qualityinspection', 'mask/mask.png')

After that, we download the manifest file from the Amazon SageMaker Ground Truth labeling job and read it as JSON Lines.

import json
#download output manifest to local
s3.download_file('pcbtest22', 'label/S10-label3/manifests/output/output.manifest', 'output.manifest')
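# read the manifest file (JSON Lines: one JSON record per line);
# with a single labeled image, json_line ends up holding that image's record
with open('output.manifest','rt') as the_new_file:
    lines=the_new_file.readlines()
    for line in lines:
        json_line = json.loads(line)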

Lastly, we generate an input dictionary that records the input images’ S3 locations, the mask location, the mask information, and the output bucket and project. We save it as a txt file, and then upload it to the ‘input’ folder of the target S3 bucket.

# create input dictionary
input_dat=dict()
input_dat['input-image-location']=im_s3_list
input_dat['mask-location']='s3://qualityinspection/mask/mask.png'
input_dat['mask-info']=json_line['S10-label3-ref-metadata']['internal-color-map']
input_dat['output-bucket']='qualityinspection'
input_dat['output-project']='synthetic_defect'

# Write the input as a txt file and upload it to s3 location
input_name='input.txt'
with open(input_name, 'w') as the_new_file:
    the_new_file.write(json.dumps(input_dat))
s3.upload_file('input.txt', 'qualityinspection', 'input/input.txt')

The following is a sample input file:

{"input-image-location": ["s3://qualityinspection/images/s10.png", ... "s3://qualityinspection/images/contrast_sig_1.png"], "mask-location": "s3://qualityinspection/mask/mask.png", "mask-info": {"0": {"class-name": "BACKGROUND", "hex-color": "#ffffff", "confidence": 0}, "1": {"class-name": "IC", "hex-color": "#2ca02c", "confidence": 0}, "2": {"class-name": "resistor1", "hex-color": "#1f77b4", "confidence": 0}, "3": {"class-name": "resistor2", "hex-color": "#ff7f0e", "confidence": 0}}, "output-bucket": "qualityinspection", "output-project": "synthetic_defect"}

4.3 Create Asynchronous SageMaker endpoint to generate synthetic defects with missing components

4.3.1 LaMa Model

To remove components from the original image, we’re using an open-source PyTorch model called LaMa from LaMa: Resolution-robust Large Mask Inpainting with Fourier Convolutions. It’s a resolution-robust large-mask inpainting model with Fourier convolutions developed by Samsung AI. The inputs for the model are an image and a black and white mask, and the output is an image with the objects inside the mask removed. We use Amazon SageMaker Ground Truth to create the original mask, and then transform it into a black and white mask as required. The LaMa model application is demonstrated as follows:

4.3.2 Introducing Amazon SageMaker Asynchronous inference

Amazon SageMaker Asynchronous Inference is a new inference option in Amazon SageMaker that queues incoming requests and processes them asynchronously. Asynchronous inference enables users to save on costs by autoscaling the instance count to zero when there are no requests to process. This means that you only pay when your endpoint is processing requests. The new asynchronous inference option is ideal for workloads where the request sizes are large (up to 1 GB) and inference processing times are on the order of minutes. The code to deploy and invoke the endpoint is here.
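Scaling to zero is configured through Application Auto Scaling rather than on the endpoint itself. The following is a minimal sketch, not the post's own code: it only builds the arguments for `register_scalable_target`. The helper name, the endpoint name `my-async-endpoint`, and the variant name `AllTraffic` are assumptions for illustration, and a scaling policy would still have to be attached separately.

```python
def scalable_target_params(endpoint_name, variant_name="AllTraffic", max_capacity=1):
    """Build arguments for Application Auto Scaling's register_scalable_target."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,  # asynchronous endpoints may scale in to zero instances
        "MaxCapacity": max_capacity,
    }

# Hypothetical endpoint name, used only for illustration.
params = scalable_target_params("my-async-endpoint")

# With AWS credentials configured, the call would look like this:
# import boto3
# boto3.client("application-autoscaling").register_scalable_target(**params)
```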

4.3.3 Endpoint deployment

To deploy the asynchronous endpoint, first we must get the IAM role and set up some environment variables.

from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorchModel
import boto3

role = get_execution_role()
env = dict()
env['TS_MAX_REQUEST_SIZE'] = '1000000000'
env['TS_MAX_RESPONSE_SIZE'] = '1000000000'
env['TS_DEFAULT_RESPONSE_TIMEOUT'] = '1000000'
env['DEFAULT_WORKERS_PER_MODEL'] = '1'

As we mentioned before, we’re using the open-source PyTorch model LaMa (LaMa: Resolution-robust Large Mask Inpainting with Fourier Convolutions), and the pre-trained model has been uploaded to s3://qualityinspection/model/big-lama.tar.gz. The image_uri points to a Docker container with the required framework and Python versions.

model = PyTorchModel(
    entry_point="./inference_defect_gen.py",
    role=role,
    source_dir = './',
    model_data='s3://qualityinspection/model/big-lama.tar.gz',
    image_uri = '763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker',
    framework_version="1.7.1",
    py_version="py3",
    env = env,
    model_server_workers=1
)

Then, we must specify additional asynchronous inference specific configuration parameters while creating the endpoint configuration.

from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
bucket = 'qualityinspection'
prefix = 'async-endpoint'
async_config = AsyncInferenceConfig(output_path=f"s3://{bucket}/{prefix}/output",max_concurrent_invocations_per_instance=10)

Next, we deploy the endpoint on a ml.g4dn.xlarge instance by running the following code:

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge',
    model_server_workers=1,
    async_inference_config=async_config
)

After approximately 6-8 minutes, the endpoint is created successfully, and it will show up in the SageMaker console.

4.3.4 Invoke the endpoint

Next, we use the input txt file we generated earlier as the input of the endpoint and invoke the endpoint using the following code:

import boto3
runtime= boto3.client('runtime.sagemaker')
response = runtime.invoke_endpoint_async(EndpointName='pytorch-inference-2022-09-16-02-04-37-888',
                                   InputLocation='s3://qualityinspection/input/input.txt')

The above command will finish execution immediately. However, the inference will continue for several minutes until it completes all of the tasks and returns all of the outputs in the S3 bucket.

4.3.5 Check the inference result of the endpoint

After you select the endpoint, you’ll see the Monitor section. Select ‘View logs’ to check the inference results in the console.

Two log records will show up in Log streams. The one named data-log shows the final inference result, while the other log record shows the details of the inference and is usually used for debugging.

If the inference request succeeds, then you’ll see the message “Inference request succeeded.” in the data-log, along with information such as the total model latency and total processing time. If the inference fails, then check the other log to debug. You can also check the result by polling the status of the inference request. Learn more about the Amazon SageMaker Asynchronous inference here.
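One way to poll for the result, sketched here under assumptions, is to wait until the output object appears at the `OutputLocation` returned by `invoke_endpoint_async`. The helper names, the timeout values, and the injected `object_exists` callable are illustrative choices, not part of the original code.

```python
import time
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")

def wait_for_output(object_exists, output_location, timeout_s=900, poll_s=15):
    """Poll until object_exists(bucket, key) is true or the timeout elapses."""
    bucket, key = split_s3_uri(output_location)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if object_exists(bucket, key):
            return True
        time.sleep(poll_s)
    return False

# With boto3, object_exists could wrap head_object, for example:
# def object_exists(bucket, key):
#     try:
#         s3.head_object(Bucket=bucket, Key=key)
#         return True
#     except botocore.exceptions.ClientError:
#         return False
# wait_for_output(object_exists, response["OutputLocation"])
```

Passing the existence check in as a function keeps the polling loop independent of AWS, so it can be exercised locally with a stub.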

4.3.6 Generating synthetic defects with missing components using the endpoint

We’ll complete four tasks in the endpoint:

1. Separate the masks for different defects by color filtering. The Lookout for Vision anomaly localization service requires one defect per image in the training dataset to optimize model performance, so each defect needs its own mask.
2. Split the train/test datasets to satisfy the following requirements:
  - at least 10 normal images and 10 anomalies for the train dataset
  - one defect per image in the train dataset
  - at least 10 normal images and 10 anomalies for the test dataset
  - multiple defects per image are allowed in the test dataset
3. Generate synthetic defects and upload them to the target S3 locations.

We generate one defect per image and more than 20 defects per class for the train dataset, as well as 1-3 defects per image and more than 20 defects per class for the test dataset.

The following is an example of the source image and its synthetic defect with three components missing: IC, resistor1, and resistor2.

original image

40_im_mask_IC_resistor1_resistor2.jpg (the defect name indicates the missing components)

4. Generate manifest files for the train/test datasets recording all of the above information.

Finally, we generate train/test manifests to record information such as the synthetic defect S3 location, mask S3 location, defect class, and mask color.
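The color filtering in the first task can be sketched as follows. This is an illustrative NumPy version, assuming the Ground Truth mask has been loaded as an RGB array; the function names are hypothetical and the endpoint's actual inference code may differ.

```python
import numpy as np

def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

def binary_mask_for_color(color_mask, hex_color):
    """Return a uint8 mask: 255 where color_mask equals hex_color, else 0."""
    rgb = np.array(hex_to_rgb(hex_color), dtype=color_mask.dtype)
    # Compare only the first three channels so RGBA inputs also work.
    match = np.all(color_mask[..., :3] == rgb, axis=-1)
    return match.astype(np.uint8) * 255

# Tiny 2x2 RGB example: one green (IC) pixel on a white background.
demo = np.full((2, 2, 3), 255, dtype=np.uint8)
demo[0, 1] = hex_to_rgb("#2ca02c")
ic_mask = binary_mask_for_color(demo, "#2ca02c")
print(ic_mask)
```

Running the filter once per entry of the color map yields one black and white mask per component, which is the form of mask the inpainting model takes.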

The following are sample json lines for an anomaly and a normal image in the manifest.

For anomaly:

{"source-ref": "s3://qualityinspection/synthetic_defect/anomaly/train/6_im_mask_IC.jpg", "auto-label": 11, "auto-label-metadata": {"class-name": "anomaly", "type": "groundtruth/image-classification"}, "anomaly-mask-ref": "s3://qualityinspection/synthetic_defect/mask/MixMask/mask_IC.png", "anomaly-mask-ref-metadata": {"internal-color-map": {"0": {"class-name": "IC", "hex-color": "#2ca02c", "confidence": 0}}, "type": "groundtruth/semantic-segmentation"}}

For normal image:

{"source-ref": "s3://qualityinspection/synthetic_defect/normal/train/25_im.jpg", "auto-label": 12, "auto-label-metadata": {"class-name": "normal", "type": "groundtruth/image-classification"}}
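A minimal sketch of how such lines could be assembled is shown below. The function name `manifest_line` is hypothetical, and the `auto-label` value is passed in rather than derived, because the samples above only show the values 11 and 12 without stating how they are assigned.

```python
import json

def manifest_line(image_uri, class_name, label_value, mask_uri=None, color_map=None):
    """Build one JSON line for a normal or anomalous image."""
    record = {
        "source-ref": image_uri,
        "auto-label": label_value,
        "auto-label-metadata": {
            "class-name": class_name,
            "type": "groundtruth/image-classification",
        },
    }
    # Only anomalous images carry a mask reference and its color map.
    if mask_uri is not None:
        record["anomaly-mask-ref"] = mask_uri
        record["anomaly-mask-ref-metadata"] = {
            "internal-color-map": color_map or {},
            "type": "groundtruth/semantic-segmentation",
        }
    return json.dumps(record)

normal_line = manifest_line(
    "s3://qualityinspection/synthetic_defect/normal/train/25_im.jpg", "normal", 12
)
anomaly_line = manifest_line(
    "s3://qualityinspection/synthetic_defect/anomaly/train/6_im_mask_IC.jpg",
    "anomaly",
    11,
    mask_uri="s3://qualityinspection/synthetic_defect/mask/MixMask/mask_IC.png",
    color_map={"0": {"class-name": "IC", "hex-color": "#2ca02c", "confidence": 0}},
)
```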

4.3.7 Amazon S3 folder structure

The input and output of the endpoint are stored in the target S3 bucket in the following structure:

5 Lookout for Vision model training and result

5.1 Set up a project, upload dataset, and start model training.

First, you can go to Lookout for Vision from the AWS Console and create a project.
Then, you can create a training dataset by choosing Import images labeled by SageMaker Ground Truth and give the Amazon S3 location of the train dataset manifest generated by the SageMaker endpoint.
Next, you can create a test dataset by choosing Import images labeled by SageMaker Ground Truth again, and give the Amazon S3 location of the test dataset manifest generated by the SageMaker endpoint.

After the train and test datasets are uploaded successfully, you can select the Train model button at the top right corner to trigger the anomaly localization model training.
In our experiment, the model took slightly longer than one hour to complete training. When the status shows training complete, you can select the model link to check the result.

5.2 Model training result

5.2.1 Model performance metrics

After selecting Model 1 as shown above, we can see that the model reaches 100% precision, 100% recall, and a 100% F1 score. We can also check the performance per label (missing component): all three labels’ F1 scores are above 93%, and the average IoUs are above 85%. This is a satisfying result for the small dataset that we demonstrated in the post.
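For reference, the F1 score and IoU can be computed from their standard definitions as in the sketch below. This illustrates the metrics themselves, not how Lookout for Vision computes them internally; the toy masks are made up for the example.

```python
import numpy as np

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, true).sum() / union)

# Toy example: the prediction covers two pixels, one of which is correct.
example_iou = iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
example_f1 = f1_score(1.0, 1.0)
```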

5.2.2 Visualization of synthetic defects detection in the test dataset.

As the following image shows, each image is detected as normal or anomaly, with a confidence score. If it’s an anomaly, then a mask is shown over the abnormal area in the image, with a different color for each defect type.

The following is an example of combined missing components (three defects in this case) in the test dataset:

Next you can compile and package the model as an AWS IoT Greengrass component following the instructions in this post, Identify the location of anomalies using Amazon Lookout for Vision at the edge without using a GPU, and run inferences on the model.

6. Test the Lookout for Vision model trained on synthetic data against real defects

To test if the model trained on the synthetic defect can perform well against real defects, we picked a dataset (aliens-dataset) from here to run an experiment.

First, we compare the generated synthetic defect and the real defect. The left image is a real defect with a missing head, and the right image is a generated defect with the head removed using an ML model.

Real defect

Synthetic defect

Second, we use the trial detections in Lookout for Vision to test the model against the real defect. You can either save the test images in the S3 bucket and import them from Amazon S3 or upload images from your computer. Then, select Detect anomalies to run the detection.

Finally, you can see the prediction result of the real defect. The model trained on synthetic defects can detect the real defect accurately in this experiment.

The model trained on synthetic defects may not always perform well on real defects, especially on circuit boards, which are much more complicated than this sample dataset. If you want to retrain the model with real defects, then you can select the orange button labeled Verify machine predictions in the upper right corner of the prediction result, and then check it as Correct or Incorrect.

Then you can add the verified image and label to the training dataset by selecting the orange button in the upper right corner to enhance model performance.

7. Cost estimation

This image augmentation pipeline for Lookout for Vision is very cost-effective. In the example shown above, Amazon SageMaker Ground Truth Labeling, Amazon SageMaker notebook, and SageMaker asynchronous endpoint deployment and inference only cost a few dollars. For Lookout for Vision service, you pay only for what you use. There are three components that determine your bill: charges for training the model (training hours), charges for detecting anomalies on the cloud (cloud inference hours), and/or charges for detecting anomalies on the edge (edge inference units). In our experiment, the Lookout for Vision model took slightly longer than one hour to complete training, and it cost $2.00 per training hour. Furthermore, you can use the trained model for inference on the cloud or on the edge with the price listed here.

8. Clean up

To avoid incurring unnecessary charges, use the Console to delete the endpoints and resources that you created while running the exercises in the post.

Open the SageMaker console and delete the following resources:
- The endpoint. Deleting the endpoint also deletes the ML compute instance or instances that support it.
  1. Under Inference, choose Endpoints.
  2. Choose the endpoint that you created in the example, choose Actions, and then choose Delete.
- The endpoint configuration.
  1. Under Inference, choose Endpoint configurations.
  2. Choose the endpoint configuration that you created in the example, choose Actions, and then choose Delete.
- The model.
  1. Under Inference, choose Models.
  2. Choose the model that you created in the example, choose Actions, and then choose Delete.
- The notebook instance. Before deleting the notebook instance, stop it.
  1. Under Notebook, choose Notebook instances.
  2. Choose the notebook instance that you created in the example, choose Actions, and then choose Stop. The notebook instance takes several minutes to stop. When the Status changes to Stopped, move on to the next step.
  3. Choose Actions, and then choose Delete.
Open the Amazon S3 console, and then delete the bucket that you created for storing model artifacts and the training dataset.
Open the Amazon CloudWatch console, and then delete all of the log groups that have names starting with /aws/sagemaker/.

You can also delete the endpoint from SageMaker notebook by running the following code:

import boto3
sm_boto3 = boto3.client("sagemaker")
sm_boto3.delete_endpoint(EndpointName='endpoint name')
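Deleting the endpoint from code leaves its endpoint configuration and model in place. The following sketch, with a hypothetical helper name, looks both up from the endpoint and deletes all three. It takes the client as an argument, so it works with the `sm_boto3` client created above.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint plus the endpoint configuration and models behind it."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    # Look everything up first, then delete in order: endpoint, config, models.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return config_name, model_names

# Usage, assuming sm_boto3 = boto3.client("sagemaker"):
# delete_endpoint_resources(sm_boto3, 'endpoint name')
```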

9. Conclusion

In this post, we demonstrated how to annotate synthetic defect masks using Amazon SageMaker Ground Truth, how to use different image augmentation techniques to transform one normal image into the desired number of normal images, how to prepare the input file for an asynchronous SageMaker endpoint, and how to create and invoke that endpoint. Finally, we demonstrated how to use the train/test manifests to train a Lookout for Vision anomaly localization model. This pipeline can be extended to other ML models that generate synthetic defects; all you need to do is customize the model and inference code in the SageMaker endpoint.

Start by exploring Lookout for Vision for automated quality inspection here.

About the Authors

Kara Yang is a Data Scientist at AWS Professional Services. She is passionate about helping customers achieve their business goals with AWS cloud services and has helped organizations build end to end AI/ML solutions across multiple industries such as manufacturing, automotive, environmental sustainability and aerospace.

Octavi Obiols-Sales is a computational scientist specialized in deep learning (DL) and machine learning certified as an associate solutions architect. With extensive knowledge in both the cloud and the edge, he helps to accelerate business outcomes through building end-to-end AI solutions. Octavi earned his PhD in computational science at the University of California, Irvine, where he pushed the state-of-the-art in DL+HPC algorithms.

Fabian Benitez-Quiroz is an IoT Edge Data Scientist in AWS Professional Services. He holds a PhD in Computer Vision and Pattern Recognition from The Ohio State University. Fabian is involved in helping customers run their Machine Learning models with low latency on IoT devices and in the cloud.

Manish Talreja is a Principal Product Manager for IoT Solutions at AWS. He is passionate about helping customers build innovative solutions using AWS IoT and ML services in the cloud and at the edge.

Yuxin Yang is an AI/ML architect at AWS, certified in the AWS Machine Learning Specialty. She enables customers to accelerate their outcomes through building end-to-end AI/ML solutions, including predictive maintenance, computer vision and reinforcement learning. Yuxin earned her MS from Stanford University, where she focused on deep learning and big data analytics.

Yingmao Timothy Li is a Data Scientist with AWS. He joined AWS 11 months ago and works with a broad range of services and machine learning technologies to build solutions for a diverse set of customers. He holds a Ph.D. in Electrical Engineering. In his spare time, he enjoys outdoor games, car racing, swimming, and flying a Piper Cub cross-country to explore the sky.

Amazon SageMaker JumpStart now offers Amazon Comprehend notebooks for custom classification and custom entity detection

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning (ML) to discover insights from text. Amazon Comprehend provides customized features, custom entity recognition, custom classification, and pre-trained APIs such as key phrase extraction, sentiment analysis, entity recognition, and more so you can easily integrate NLP into your applications.

We recently added Amazon Comprehend related notebooks in Amazon SageMaker JumpStart notebooks that can help you quickly get started using the Amazon Comprehend custom classifier and custom entity recognizer. You can use custom classification to organize documents into categories (classes) that you define. Custom entity recognition extends the capability of the Amazon Comprehend pre-trained entity detection API by helping you identify entity types that are unique to your domain or business that aren’t in the preset generic entity types.

In this post, we show you how to use JumpStart to build Amazon Comprehend custom classification and custom entity detection models as part of your enterprise NLP needs.

SageMaker JumpStart

The Amazon SageMaker Studio landing page provides the option to use JumpStart. JumpStart provides a quick way to get started by providing pre-trained models for a variety of problem types. You can train and tune these models. JumpStart also provides other resources like notebooks, blogs, and videos.

JumpStart notebooks are essentially sample code that you can use as a starting point to get started quickly. Currently, we provide you with over 40 notebooks that you can use as is or customize as needed. You can find your notebooks by using search or the tabbed view panel. After you find the notebook you want to use, you can import the notebook, customize it for your requirements, and select the infrastructure and environment to run the notebook on.

Get started with JumpStart notebooks

To get started with JumpStart, go to the Amazon SageMaker console and open Studio. Refer to Get Started with SageMaker Studio for instructions on how to get started with Studio. Then complete the following steps:

In Studio, go to the launch page of JumpStart and choose Go to SageMaker JumpStart.

You’re offered multiple ways to search. You may either use tabs on the top to get to what you want, or use the search box as shown in the following screenshot.

To find notebooks, we go to the Notebooks tab.

At the time of writing, JumpStart offers 47 notebooks. You can use filters to find Amazon Comprehend related notebooks.

On the Content Type drop-down menu, choose Notebook.

As you can see in the following screenshot, we currently have two Amazon Comprehend notebooks.

In the following sections, we explore both notebooks.

Amazon Comprehend Custom Classifier

In this notebook, we demonstrate how to use the custom classifier API to create a document classification model.

The custom classifier is a fully managed Amazon Comprehend feature that lets you build custom text classification models that are unique to your business, even if you have little or no ML expertise. The custom classifier builds on the existing capabilities of Amazon Comprehend, which are already trained on tens of millions of documents. It abstracts much of the complexity required to build an NLP classification model. The custom classifier automatically loads and inspects the training data, selects the right ML algorithms, trains your model, finds the optimal hyperparameters, tests the model, and provides model performance metrics. The Amazon Comprehend custom classifier also provides an easy-to-use console for the entire ML workflow, including labeling text using Amazon SageMaker Ground Truth, training and deploying a model, and visualizing the test results. With an Amazon Comprehend custom classifier, you can build the following models:

Multi-class classification model – In multi-class classification, each document can have one and only one class assigned to it. The individual classes are mutually exclusive. For example, a movie can be classed as a documentary or as science fiction, but not both at the same time.
Multi-label classification model – In multi-label classification, individual classes represent different categories, but these categories are somehow related and not mutually exclusive. As a result, each document has at least one class assigned to it, but can have more. For example, a movie can simply be an action movie, or it can be an action movie, a science fiction movie, and a comedy, all at the same time.
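To make the difference concrete, the sketch below writes training rows in the two-column CSV layout that the custom classifier accepts: labels in the first column, document text in the second, and multiple labels joined by a delimiter (`|` here). The helper name, class names, and sample sentences are invented for illustration. When the classifier is created, the mode is selected with `MULTI_CLASS` or `MULTI_LABEL`.

```python
import csv
import io

def to_training_csv(rows, delimiter="|"):
    """rows: iterable of (labels, text), where labels is a list of class names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for labels, text in rows:
        writer.writerow([delimiter.join(labels), text])
    return buffer.getvalue()

# Multi-class: exactly one label per document.
multi_class = to_training_csv([(["DOCUMENTARY"], "A film about deep-sea life.")])

# Multi-label: one or more labels per document.
multi_label = to_training_csv([(["ACTION", "COMEDY"], "Two spies bungle a heist.")])
```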

This notebook requires no ML expertise to train a model with the example dataset or with your own business specific dataset. You can use the API operations discussed in this notebook in your own applications.

Amazon Comprehend Custom Entity Recognizer

In this notebook, we demonstrate how to use the custom entity recognition API to create an entity recognition model.

Custom entity recognition extends the capabilities of Amazon Comprehend by helping you identify your specific entity types that aren’t in the preset generic entity types. This means that you can analyze documents and extract entities like product codes or business-specific entities that fit your particular needs.

Building an accurate custom entity recognizer on your own can be a complex process, requiring preparation of large sets of manually annotated training documents and selecting the right algorithms and parameters for model training. Amazon Comprehend helps reduce the complexity by providing automatic annotation and model development to create a custom entity recognition model.

The example notebook takes the training dataset in CSV format and runs inference against text input. Amazon Comprehend also supports an advanced use case that takes Ground Truth annotated data for training and allows you to directly run inference on PDFs and Word documents. For more information, refer to Build a custom entity recognizer for PDF documents using Amazon Comprehend.

Amazon Comprehend has lowered the annotation limits and allowed you to get more stable results, especially for few-shot subsamples. For more information about this improvement, refer to Amazon Comprehend announces lower annotation limits for custom entity recognition.

Use, customize, and deploy Amazon Comprehend JumpStart notebooks

After you select the Amazon Comprehend notebook you want to use, choose Import notebook. As you do that, you can see the notebook kernel starting.

Importing your notebook triggers selection of the notebook instance, kernel, and image that is used to run the notebook. After the default infrastructure is provisioned, you can change the selections as per your requirements.

Now, go over the outline of the notebook and carefully read the sections for prerequisites setup, data setup, training the model, running inference, and stopping the model. Feel free to customize the generated code per your needs.

Based on your requirements, you may want to customize the following sections:

Permissions – For a production application, we recommend restricting access policies to only those needed to run the application. Permissions can be restricted based on the use case, such as training or inference, and specific resource names, such as a full Amazon Simple Storage Service (Amazon S3) bucket name or an S3 bucket name pattern. You should also restrict access to the custom classifier or SageMaker operations to just those that your application needs.
Data and location – The example notebook provides you sample data and S3 locations. Based on your requirements, you may use your own data for training, validation, and testing, and use different S3 locations as needed. Similarly, when the model is created, you can choose to keep the model at different locations. Just make sure you have provided the right permissions to access S3 buckets.
Preprocessing steps – If you’re using different data for training and testing, you may want to adjust the preprocessing steps per your requirements.
Testing data – You can bring your own inference data for testing.
Clean up – Delete the resources launched by the notebook to avoid recurring charges.
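As an illustration of the Permissions point above, the following sketch builds an IAM policy document that limits S3 access to a single prefix in a single bucket. The function name, the bucket name, and the prefix are placeholders, and the action list is an assumption that should be trimmed to what your application actually needs.

```python
import json

def s3_training_policy(bucket, prefix):
    """Return an IAM policy document scoped to one bucket prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }

# Placeholder bucket and prefix names.
policy = s3_training_policy("my-comprehend-bucket", "training-data")
print(json.dumps(policy, indent=2))
```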

Conclusion

In this post, we showed you how to use JumpStart to learn and fast-track using Amazon Comprehend APIs by making it convenient to find and run Amazon Comprehend related notebooks from Studio while having the option to modify the code as needed. The notebooks use sample datasets with AWS product announcements and sample news articles. You may use this notebook to learn how to use Amazon Comprehend APIs in a Python notebook, or you may use it as a starting point and expand the code further for your unique requirements and production deployments.

You can start using JumpStart and take advantage of over 40 notebooks in various topics in all Regions where Studio is available at no additional cost.

About the Authors

Lana Zhang is a Sr. Solutions Architect at the AWS WWSO AI Services team with expertise in AI and ML for Content Moderation and Rekognition. She is passionate about promoting AWS AI services and helping customers transform their business solutions.

Meenakshisundaram Thandavarayan is a Senior AI/ML specialist with AWS. He helps hi-tech strategic accounts on their AI and ML journey. He is very passionate about data-driven AI.

Rachna Chadha is a Principal Solution Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.

Microsoft Soundscape – New Horizons with a Community-Driven Approach

For more than six years, Microsoft Research has been honored to develop the Soundscape research project, which was designed to deliver information about a person’s location and points of interest and has guided individuals to desired places and in unfamiliar spaces using augmented-reality and three-dimensional audio. While not a traditional turn-by-turn navigation mobile app, the Soundscape research project allowed us to explore ways that audio can enhance mobility and expand navigation experiences without the need to follow directions on a small display.

Beginning January 3, 2023, the Soundscape code will be available as open-source software, so that anyone can continue to build on, and find new ways to leverage, this novel feature set for the growing navigation opportunities in today’s world. As Microsoft Research continues to expand into new accessibility innovation areas, we hope the open-source software release of the Soundscape code supports the community in further developing confidence and utility of spatial audio navigation experiences.

Also on January 3, 2023, the Microsoft Soundscape iOS app will no longer be available for download from the App Store, although existing installations can continue to be used until the end of June 2023. We are grateful to all of those who have tried and found value in the Microsoft Soundscape app and appreciate all the feedback and stories you have shared with us over the years.

Through the Microsoft Soundscape journey, we were delighted to discover the many valuable experiences Soundscape enabled, from empowering mobility instructors, to understanding the role of audio in adaptive sports, to supporting blind or low-vision individuals to go places and do essential activities for their lives. By making the Soundscape code available as open-source software, we hope the interest and potential continues to grow. Documentation on how to build and use the system from the new GitHub Soundscape page will be shared on January 3, 2023.

Frequently asked questions on Soundscape

Q: What is changing for Microsoft Soundscape?
A: It is now time to transition the Soundscape research project to the next phase, where we will share it to allow for broader development. Soundscape code will be available on GitHub as open-source software on January 3, 2023.

Q: What will happen to the Microsoft Soundscape app on iOS?
A: As of January 3, 2023, the app will not be available for download. Existing installations can continue to be used until the end of June 2023.

Q: Will the Azure services that enable the Microsoft Soundscape app continue to be supported?
A: Yes, until the end of June 2023. Beyond that, entities can build new cloud-based services from our open-source release.

Q: Will user feedback on the Microsoft Soundscape app continue to work?
A: Yes, until the end of June 2023. We will focus on bug fixes and repairing service disruptions, but we will not address requests for new features or capabilities.

Q: Will the Soundscape open-source release run only on iOS, or will it also support Android?
A: The original Microsoft Soundscape app only supports iOS, and that is also true for the open-source release.

Q: Why has Microsoft Research decided to release Soundscape as open-source?
A: As we evolve our research portfolio, it is natural to end or transition some projects. We feel the community can benefit from the novel experiences we developed for the Soundscape research project, and that is why we are releasing the code as open-source software.

Q: What will happen to the Microsoft Soundscape Authoring app?
A: Use of the Microsoft Soundscape Authoring app will end on January 17, 2023.

Q: Are other Microsoft offerings implicated in this change for Soundscape or following a similar path at this time?
A: No, this change is specific to Soundscape. It has no impact on, or implications for, other Microsoft offerings.

The post Microsoft Soundscape – New Horizons with a Community-Driven Approach appeared first on Microsoft Research.

Copyright © 2019-2025 Vedere AI. All Rights Reserved.