Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools

Amazon SageMaker Studio offers a broad set of fully managed integrated development environments (IDEs) for machine learning (ML) development, including JupyterLab, Code Editor based on Code-OSS (Visual Studio Code Open Source), and RStudio. It provides access to the most comprehensive set of tools for each step of ML development, from preparing data to building, training, deploying, and managing ML models. You can launch a fully managed JupyterLab environment with the pre-configured SageMaker Distribution in seconds to work with your notebooks, code, and data. The flexible and extensible interface of SageMaker Studio allows you to effortlessly configure and arrange ML workflows, and you can use the AI-powered inline coding companion to quickly author, debug, explain, and test code.

In this post, we take a closer look at the updated SageMaker Studio and its JupyterLab IDE, designed to boost the productivity of ML developers. We introduce the concept of Spaces and explain how JupyterLab Spaces enable flexible customization of compute, storage, and runtime resources to improve your ML workflow efficiency. We also discuss our shift to a localized execution model in JupyterLab, resulting in a quicker, more stable, and responsive coding experience. Additionally, we cover the seamless integration of generative AI tools like Amazon CodeWhisperer and Jupyter AI within SageMaker Studio JupyterLab Spaces, illustrating how they empower developers to use AI for coding assistance and innovative problem-solving.

Introducing Spaces in SageMaker Studio

The new SageMaker Studio web-based interface acts as a command center for launching your preferred IDE and accessing your Amazon SageMaker tools to build, train, tune, and deploy models. In addition to JupyterLab and RStudio, SageMaker Studio now includes a fully managed Code Editor based on Code-OSS (Visual Studio Code Open Source). Both JupyterLab and Code Editor can be launched using a flexible workspace called Spaces.

A Space is a configuration representation of a SageMaker IDE, such as JupyterLab or Code Editor, designed to persist regardless of whether an application (IDE) associated with the Space is actively running or not. A Space represents a combination of a compute instance, storage, and other runtime configurations. With Spaces, you can create and scale the compute and storage for your IDE up and down as you go, customize runtime environments, and pause and resume coding anytime from anywhere. You can spin up multiple such Spaces, each configured with a different combination of compute, storage, and runtimes.

When a Space is created, it is equipped with an Amazon Elastic Block Store (Amazon EBS) volume, which is used to store users’ files, data, caches, and other artifacts. The volume is attached to an ML compute instance whenever the Space is run. The EBS volume ensures that user files, data, cache, and session states are consistently restored whenever the Space is restarted. Importantly, this EBS volume remains persistent, whether the Space is in a running or stopped state. It will continue to persist until the Space is deleted.

Additionally, we have introduced the bring-your-own file system feature for users who wish to share environments and artifacts across different Spaces, users, or even domains. This enables you to optionally equip your Spaces with your own Amazon Elastic File System (Amazon EFS) mount, facilitating the sharing of resources across various workspaces.

Creating a Space

Creating and launching a new Space is now quick and straightforward. It takes just a few seconds to set up a new Space with fast launch instances and less than 60 seconds to run a Space. Spaces are equipped with predefined settings for compute and storage, managed by administrators. SageMaker Studio administrators can establish domain-level presets for compute, storage, and runtime configurations. This setup enables you to quickly launch a new space with minimal effort, requiring only a few clicks. You also have the option to modify a Space’s compute, storage, or runtime configurations for further customization.

It’s important to note that creating a Space requires updating the SageMaker domain execution role with a policy like the following example. You need to grant your users permissions for private spaces and user profiles necessary to access these private spaces. For detailed instructions, refer to Give your users access to private spaces.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateApp",
        "sagemaker:DeleteApp"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:app/*",
      "Condition": {
        "Null": {
          "sagemaker:OwnerUserProfileArn": "true"
        }
      }
    },
    {
      "Sid": "SMStudioCreatePresignedDomainUrlForUserProfile",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreatePresignedDomainUrl"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:user-profile/${sagemaker:DomainId}/${sagemaker:UserProfileName}"
    },
    {
      "Sid": "SMStudioAppPermissionsListAndDescribe",
      "Effect": "Allow",
      "Action": [
        "sagemaker:ListApps",
        "sagemaker:ListDomains",
        "sagemaker:ListUserProfiles",
        "sagemaker:ListSpaces",
        "sagemaker:DescribeApp",
        "sagemaker:DescribeDomain",
        "sagemaker:DescribeUserProfile",
        "sagemaker:DescribeSpace"
      ],
      "Resource": "*"
    },
    {
      "Sid": "SMStudioAppPermissionsTagOnCreate",
      "Effect": "Allow",
      "Action": [
        "sagemaker:AddTags"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:*/*",
      "Condition": {
        "Null": {
          "sagemaker:TaggingAction": "false"
        }
      }
    },
    {
      "Sid": "SMStudioRestrictSharedSpacesWithoutOwners",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateSpace",
        "sagemaker:UpdateSpace",
        "sagemaker:DeleteSpace"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:space/${sagemaker:DomainId}/*",
      "Condition": {
        "Null": {
          "sagemaker:OwnerUserProfileArn": "true"
        }
      }
    },
    {
      "Sid": "SMStudioRestrictSpacesToOwnerUserProfile",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateSpace",
        "sagemaker:UpdateSpace",
        "sagemaker:DeleteSpace"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:space/${sagemaker:DomainId}/*",
      "Condition": {
        "ArnLike": {
          "sagemaker:OwnerUserProfileArn": "arn:aws:sagemaker:$AWS Region:$111122223333:user-profile/${sagemaker:DomainId}/${sagemaker:UserProfileName}"
        },
        "StringEquals": {
          "sagemaker:SpaceSharingType": [
            "Private",
            "Shared"
          ]
        }
      }
    },
    {
      "Sid": "SMStudioRestrictCreatePrivateSpaceAppsToOwnerUserProfile",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateApp",
        "sagemaker:DeleteApp"
      ],
      "Resource": "arn:aws:sagemaker:{{Region}}:{{AccountId}}:app/${sagemaker:DomainId}/*",
      "Condition": {
        "ArnLike": {
          "sagemaker:OwnerUserProfileArn": "arn:aws:sagemaker:${aws:Region}:${aws:PrincipalAccount}:user-profile/${sagemaker:DomainId}/${sagemaker:UserProfileName}"
        },
        "StringEquals": {
          "sagemaker:SpaceSharingType": [
            "Private"
          ]
        }
      }
    }
  ]
}

To create a space, complete the following steps:

  1. In SageMaker Studio, choose JupyterLab on the Applications menu.
  2. Choose Create JupyterLab space.
  3. For Name, enter a name for your Space.
  4. Choose Create space.
  5. Choose Run space to launch your new Space with default presets or update the configuration based on your requirements.
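
If you prefer to script this instead of using the UI, a Space can also be created and run with the AWS CLI. The following is a minimal sketch only; the domain ID, user profile name, Space name, volume size, and instance type are placeholder values, and depending on your domain defaults you may also need to specify a SageMaker image in --resource-spec:

# create a private JupyterLab Space (all IDs and names are placeholders)
aws sagemaker create-space \
    --domain-id d-xxxxxxxxxxxx \
    --space-name my-jupyterlab-space \
    --ownership-settings "OwnerUserProfileName=my-user-profile" \
    --space-sharing-settings "SharingType=Private" \
    --space-settings '{"AppType": "JupyterLab", "SpaceStorageSettings": {"EbsStorageSettings": {"EbsVolumeSizeInGb": 5}}}'

# run the Space by creating a JupyterLab app on it
aws sagemaker create-app \
    --domain-id d-xxxxxxxxxxxx \
    --space-name my-jupyterlab-space \
    --app-type JupyterLab \
    --app-name default \
    --resource-spec '{"InstanceType": "ml.t3.medium"}'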

Reconfiguring a Space

Spaces are designed for users to seamlessly transition between different compute types as needed. You can begin by creating a new Space with a specific configuration, primarily consisting of compute and storage. If you need to switch to a different compute type with a higher or lower vCPU count, more or less memory, or a GPU-based instance at any point in your workflow, you can do so with ease. After you stop the Space, you can modify its settings using either the UI or API via the updated SageMaker Studio interface and then restart the Space. SageMaker Studio automatically handles the provisioning of your existing Space to the new configuration, requiring no extra effort on your part.

Complete the following steps to edit an existing space:

  1. On the space details page, choose Stop space.
  2. Reconfigure the compute, storage, or runtime.
  3. Choose Run space to relaunch the space.

Your workspace will be updated with the new storage and compute instance type you requested.
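
The same stop, reconfigure, and relaunch cycle can be scripted. The following hedged sketch uses placeholder IDs and names; it assumes that stopping a Space from the CLI corresponds to deleting its running app, and the new instance type takes effect when the app is recreated:

# stop the running JupyterLab app for the Space
aws sagemaker delete-app \
    --domain-id d-xxxxxxxxxxxx \
    --space-name my-jupyterlab-space \
    --app-type JupyterLab \
    --app-name default

# relaunch the Space on a larger instance type
aws sagemaker create-app \
    --domain-id d-xxxxxxxxxxxx \
    --space-name my-jupyterlab-space \
    --app-type JupyterLab \
    --app-name default \
    --resource-spec '{"InstanceType": "ml.m5.xlarge"}'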

The new SageMaker Studio JupyterLab architecture

The SageMaker Studio team continues to invent and simplify its developer experience with the release of a new fully managed SageMaker Studio JupyterLab experience. The new SageMaker Studio JupyterLab experience combines the best of both worlds: the scalability and flexibility of SageMaker Studio Classic (see the appendix at the end of this post) with the stability and familiarity of the open source JupyterLab. To grasp the design of this new JupyterLab experience, let’s delve into the following architecture diagram. This will help us better understand the integration and features of this new JupyterLab Spaces platform.

In summary, we have transitioned to a localized architecture. In this new setup, the Jupyter server and kernel processes run together in a single Docker container hosted on the same ML compute instance. These ML instances are provisioned when a Space is run, and they are attached to the EBS volume that was created with the Space.

This new architecture brings several benefits; we discuss some of these in the following sections.

Reduced latency and increased stability

SageMaker Studio has transitioned to a local run model, moving away from the previous split model where code was stored on an EFS mount and run remotely on an ML instance via remote Kernel Gateway. In the earlier setup, Kernel Gateway, a headless web server, enabled kernel operations over remote communication with Jupyter kernels through HTTPS/WSS. User actions like running code, managing notebooks, or running terminal commands were processed by a Kernel Gateway app on a remote ML instance, with Kernel Gateway facilitating these operations over ZeroMQ (ZMQ) within a Docker container. The following diagram illustrates this architecture.

The updated JupyterLab architecture runs all kernel operations directly on the local instance. This local Jupyter Server approach typically provides improved performance and straightforward architecture. It minimizes latency and network complexity, simplifies the architecture for easier debugging and maintenance, enhances resource utilization, and accommodates more flexible messaging patterns for a variety of complex workloads.

In essence, this upgrade brings running notebooks and code much closer to the kernels, significantly reducing latency and boosting stability.

Improved control over provisioned storage

SageMaker Studio Classic originally used Amazon EFS to provide persistent, shared file storage for user home directories within the SageMaker Studio environment. This setup enables you to centrally store notebooks, scripts, and other project files, accessible across all your SageMaker Studio sessions and instances.

With the latest update to SageMaker Studio, there is a shift from Amazon EFS-based storage to an Amazon EBS-based solution. The EBS volumes, provisioned with SageMaker Studio Spaces, are GP3 volumes designed to deliver a consistent baseline performance of 3,000 IOPS, independent of the volume size. This new Amazon EBS storage offers higher performance for I/O-intensive tasks such as model training, data processing, high-performance computing, and data visualization. This transition also gives SageMaker Studio administrators greater insight into and control over storage usage by user profiles within a domain or across SageMaker. You can now set default (DefaultEbsVolumeSizeInGb) and maximum (MaximumEbsVolumeSizeInGb) storage sizes for JupyterLab Spaces within each user profile.

In addition to improved performance, you can flexibly resize the storage volume attached to your Space’s ML compute instance by editing your Space settings through either the UI or an API action in the SageMaker Studio interface, without requiring any administrator action. However, note that you can only change EBS volume sizes in one direction: after you increase a Space’s EBS volume size, you will not be able to lower it back down.
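
For example, after stopping the Space, the volume size could be increased with the UpdateSpace API. The following is a minimal CLI sketch with placeholder values for the domain ID, Space name, and new size:

aws sagemaker update-space \
    --domain-id d-xxxxxxxxxxxx \
    --space-name my-jupyterlab-space \
    --space-settings '{"SpaceStorageSettings": {"EbsStorageSettings": {"EbsVolumeSizeInGb": 50}}}'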

SageMaker Studio now offers elevated control of provisioned storage for administrators:

  • SageMaker Studio administrators can manage the EBS volume sizes for user profiles. These JupyterLab EBS volumes can vary from a minimum of 5 GB to a maximum of 16 TB. The following code snippet shows how to create or update a user profile with default and maximum space settings:
    aws --region $REGION sagemaker create-user-profile \
    --domain-id $DOMAIN_ID \
    --user-profile-name $USER_PROFILE_NAME \
    --user-settings '{
        "SpaceStorageSettings": {
            "DefaultEbsStorageSettings": {
                "DefaultEbsVolumeSizeInGb": 5,
                "MaximumEbsVolumeSizeInGb": 100
            }
        }
    }'


    # alternatively, update an existing user profile
    aws --region $REGION sagemaker update-user-profile \
    --domain-id $DOMAIN_ID \
    --user-profile-name $USER_PROFILE_NAME \
    --user-settings '{
        "SpaceStorageSettings": {
            "DefaultEbsStorageSettings": {
                "DefaultEbsVolumeSizeInGb": 25,
                "MaximumEbsVolumeSizeInGb": 100
            }
        }
    }'

  • SageMaker Studio now offers an enhanced auto-tagging feature for Amazon EBS resources, automatically labeling volumes created by users with domain, user, and Space information. This advancement simplifies cost allocation analysis for storage resources, aiding administrators in managing and attributing costs more effectively. It’s also important to note that these EBS volumes are hosted within the service account, so you won’t have direct visibility. Nonetheless, storage usage and associated costs are directly linked to the domain ARN, user profile ARN, and Space ARN, facilitating straightforward cost allocation.
  • Administrators can also control encryption of a Space’s EBS volumes, at rest, using customer managed keys (CMK).
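
As a hedged illustration of the last point, a customer managed key can be supplied when a domain is created; the key ARN, role ARN, and network identifiers below are placeholders only:

aws sagemaker create-domain \
    --domain-name myDomain \
    --auth-mode IAM \
    --vpc-id vpc-xxxxxxxx \
    --subnet-ids subnet-xxxxxxxx \
    --default-user-settings '{"ExecutionRole": "arn:aws:iam::111122223333:role/SageMakerExecutionRole"}' \
    --kms-key-id arn:aws:kms:us-west-2:111122223333:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx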

Shared tenancy with bring-your-own EFS file system

ML workflows are typically collaborative, requiring efficient sharing of data and code among team members. The new SageMaker Studio enhances this collaborative aspect by enabling you to share data, code, and other artifacts via a shared bring-your-own EFS file system. This EFS drive can be set up independently of SageMaker or could be an existing Amazon EFS resource. After it’s provisioned, it can be seamlessly mounted onto SageMaker Studio user profiles. This feature is not restricted to user profiles within a single domain—it can extend across domains, as long as they are within the same Region.

The following example code shows you how to create a domain and attach an existing EFS volume to it using its associated fs-id. EFS volumes can be attached to a domain at the root or prefix level, as the following commands demonstrate:

# create a domain and attach an existing EFS volume at the root level
aws sagemaker create-domain --domain-name "myDomain" \
 --vpc-id {VPC_ID} --subnet-ids {SUBNET_IDS} --auth-mode IAM \
 --default-user-settings \
 "CustomFileSystemConfigs=[{EFSFileSystemConfig={FileSystemId="fs-12345678"}}]"

# create a domain and attach an existing EFS volume at a file system prefix level
aws sagemaker create-domain --domain-name "myDomain" \
 --vpc-id {VPC_ID} --subnet-ids {SUBNET_IDS} --auth-mode IAM \
 --default-user-settings \
 "CustomFileSystemConfigs=[{EFSFileSystemConfig={FileSystemId="fs-12345678", FileSystemPath="/my/custom/path"}}]"

# update an existing domain with your own EFS file system
aws sagemaker update-domain --region us-west-2 --domain-id d-xxxxx \
    --default-user-settings \
    "CustomFileSystemConfigs=[{EFSFileSystemConfig={FileSystemId="fs-12345678"}}]"

When an EFS mount is made available in a domain and its related user profiles, you can choose to attach it to a new space. This can be done using either the SageMaker Studio UI or an API action, as shown in the following example. It’s important to note that when a space is created with an EFS file system that’s provisioned at the domain level, the space inherits its properties. This means that if the file system is provisioned at a root or prefix level within the domain, these settings will automatically apply to the space created by the domain users.

# attach a preconfigured EFS file system to a space
aws sagemaker create-space \
--space-name byofs-space --domain-id "myDomain" \
--ownership-settings "OwnerUserProfileName={USER_PROFILE_NAME}" \
--space-sharing-settings "SharingType=Private" \
--space-settings \
"AppType=JupyterLab,CustomFileSystems=[{EFSFileSystem={FileSystemId="fs-12345678"}}]"

After mounting it to a Space, you can find your files under the admin-provisioned mount point, in the directory path /mnt/custom-file-system/efs/fs-12345678.

EFS mounts make it straightforward to share artifacts between a user’s Spaces, between multiple users, or across domains, which is ideal for collaborative workloads. With this feature, you can do the following:

  • Share data – EFS mounts are ideal for storing large datasets crucial for data science experiments. Dataset owners can load these mounts with training, validation, and test datasets, making them accessible to user profiles within a domain or across multiple domains. SageMaker Studio admins can also integrate existing application EFS mounts while maintaining compliance with organizational security policies. This is done through flexible prefix-level mounting. For example, if production and test data are stored on the same EFS mount (such as fs-12345678:/data/prod and fs-12345678:/data/test), mounting /data/test onto the SageMaker domain’s user profiles grants users access only to the test dataset. This setup allows for analysis or model training while keeping production data secure and inaccessible.
  • Share Code – EFS mounts facilitate the quick sharing of code artifacts between user profiles. In scenarios where users need to rapidly share code samples or collaborate on a common code base without the complexities of frequent git push/pull commands, shared EFS mounts are highly beneficial. They offer a convenient way to share work-in-progress code artifacts within a team or across different teams in SageMaker Studio.
  • Share development environments – Shared EFS mounts can also serve as a means to quickly disseminate sandbox environments among users and teams. EFS mounts provide a solid alternative for sharing Python environments like conda or virtualenv across multiple workspaces. This approach circumvents the need for distributing requirements.txt or environment.yml files, which can often lead to the repetitive task of creating or recreating environments across different user profiles.
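
For example, a team could keep a shared conda environment on the mounted file system and register it as a kernel from any Space that mounts the same path. This is a sketch only; the environment name and package list are examples, and the mount path follows the directory layout shown earlier:

# from one Space: create a shared environment on the EFS mount
conda create --yes --prefix /mnt/custom-file-system/efs/fs-12345678/envs/team-env python=3.10 ipykernel pandas

# from any other Space that mounts the same file system: expose it as a Jupyter kernel
/mnt/custom-file-system/efs/fs-12345678/envs/team-env/bin/python -m ipykernel install --user --name team-env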

These features significantly enhance the collaborative capabilities within SageMaker Studio, making it effortless for teams to work together efficiently on complex ML projects. Additionally, Code Editor based on Code-OSS (Visual Studio Code Open Source) shares the same architectural principles as the aforementioned JupyterLab experience. This alignment brings several advantages, such as reduced latency, enhanced stability, and improved administrative control, and enables user access to shared workspaces, similar to those offered in JupyterLab Spaces.

Generative AI-powered tools on JupyterLab Spaces

Generative AI, a rapidly evolving field in artificial intelligence, uses algorithms to create new content like text, images, and code from extensive existing data. This technology has revolutionized coding by automating routine tasks, generating complex code structures, and offering intelligent suggestions, thereby streamlining development and fostering creativity and problem-solving in programming. As an indispensable tool for developers, generative AI enhances productivity and drives innovation in the tech industry. SageMaker Studio enhances this developer experience with pre-installed tools like Amazon CodeWhisperer and Jupyter AI, using generative AI to accelerate the development lifecycle.

Amazon CodeWhisperer

Amazon CodeWhisperer is a programming assistant that enhances developer productivity through real-time code recommendations and solutions. As an AWS managed AI service, it’s seamlessly integrated into the SageMaker Studio JupyterLab IDE. This integration makes Amazon CodeWhisperer a fluid and valuable addition to a developer’s workflow.

Amazon CodeWhisperer excels in increasing developer efficiency by automating common coding tasks, suggesting more effective coding patterns, and decreasing debugging time. It serves as an essential tool for both beginner and seasoned coders, providing insights into best practices, accelerating the development process, and improving the overall quality of code. To start using Amazon CodeWhisperer, make sure that the Resume Auto-Suggestions feature is activated. You can manually invoke code suggestions using keyboard shortcuts.

Alternatively, write a comment describing your intended code function and begin coding; Amazon CodeWhisperer will start providing suggestions.

Note that although Amazon CodeWhisperer is pre-installed, you must have the codewhisperer:GenerateRecommendations permission as part of the execution role to receive code recommendations. For additional details, refer to Using CodeWhisperer with Amazon SageMaker Studio. When you use Amazon CodeWhisperer, AWS may, for service improvement purposes, store data about your usage and content. To opt out of Amazon CodeWhisperer data sharing, choose Settings on the top menu, open the Settings Editor, and disable Share usage data with Amazon CodeWhisperer in the Amazon CodeWhisperer settings.
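
A minimal policy statement granting that permission to the execution role might look like the following sketch:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCodeWhispererInStudio",
      "Effect": "Allow",
      "Action": "codewhisperer:GenerateRecommendations",
      "Resource": "*"
    }
  ]
}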

Jupyter AI

Jupyter AI is an open source tool that brings generative AI to Jupyter notebooks, offering a robust and user-friendly platform for exploring generative AI models. It enhances productivity in JupyterLab and Jupyter Notebooks by providing features like the %%ai magic for creating a generative AI playground inside notebooks, a native chat UI in JupyterLab for interacting with AI as a conversational assistant, and support for a wide array of large language model (LLM) providers like AI21, Anthropic, Cohere, and Hugging Face or managed services like Amazon Bedrock and SageMaker endpoints. This integration offers more efficient and innovative methods for data analysis, ML, and coding tasks. For example, you can interact with a domain-aware LLM using the Jupyternaut chat interface for help with processes and workflows or generate example code through CodeLlama, hosted on SageMaker endpoints. This makes it a valuable tool for developers and data scientists.

Jupyter AI provides an extensive selection of language models ready for use right out of the box. Additionally, custom models are also supported via SageMaker endpoints, offering flexibility and a broad range of options for users. It also offers support for embedding models, enabling you to perform inline comparisons and tests and even build or test ad hoc Retrieval Augmented Generation (RAG) apps.

Jupyter AI can act as your chat assistant, helping you with code samples, providing you with answers to questions, and much more.

You can use Jupyter AI’s %%ai magic to generate sample code inside your notebook, as shown in the following screenshot.
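
For reference, a notebook cell using the magic might look like the following sketch. The provider and model ID are examples; the exact identifiers depend on your Jupyter AI version and the models enabled in your account. First load the extension:

%load_ext jupyter_ai_magics

Then, in a separate cell:

%%ai bedrock:anthropic.claude-v2
Write a Python function that reads a CSV file from Amazon S3 into a pandas DataFrame.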

JupyterLab 4.0

The JupyterLab team has released version 4.0, featuring significant improvements in performance, functionality, and user experience. Detailed information about this release is available in the official JupyterLab Documentation.

This version, now standard in SageMaker Studio JupyterLab, introduces optimized performance for handling large notebooks and faster operations, thanks to improvements like CSS rule optimization and the adoption of CodeMirror 6 and MathJax 3. Key enhancements include an upgraded text editor with better accessibility and customization, a new extension manager for easy installation of Python extensions, and improved document search capabilities with advanced features. Additionally, version 4.0 brings UI improvements, accessibility enhancements, and updates to development tools, and certain features have been backported to JupyterLab 3.6.

Conclusion

The advancements in SageMaker Studio, particularly with the new JupyterLab experience, mark a significant leap forward in ML development. The updated SageMaker Studio UI, with its integration of JupyterLab, Code Editor, and RStudio, offers an unparalleled, streamlined environment for ML developers. The introduction of JupyterLab Spaces provides flexibility and ease in customizing compute and storage resources, enhancing the overall efficiency of ML workflows. The shift from a remote kernel architecture to a localized model in JupyterLab greatly increases stability while decreasing startup latency. This results in a quicker, more stable, and responsive coding experience. Moreover, the integration of generative AI tools like Amazon CodeWhisperer and Jupyter AI in JupyterLab further empowers developers, enabling you to use AI for coding assistance and innovative problem-solving. The enhanced control over provisioned storage and the ability to share code and data effortlessly through self-managed EFS mounts greatly facilitate collaborative projects. Lastly, the release of JupyterLab 4.0 within SageMaker Studio underscores these improvements, offering optimized performance, better accessibility, and a more user-friendly interface, thereby solidifying JupyterLab’s role as a cornerstone of efficient and effective ML development in the modern tech landscape.

Give SageMaker Studio JupyterLab Spaces a try using our quick onboard feature, which allows you to spin up a new domain for single users within minutes. Share your thoughts in the comments section!

Appendix: SageMaker Studio Classic’s kernel gateway architecture

A SageMaker Studio Classic domain is a logical aggregation of an EFS volume, a list of users authorized to access the domain, and configurations related to security, application, networking, and more. In the SageMaker Studio Classic architecture, each user within the SageMaker domain has a distinct user profile. This profile encompasses specific details like the user’s role and their POSIX user ID on the EFS volume, among other unique data. Users access their individual user profile through a dedicated Jupyter Server app, connected via HTTPS/WSS in their web browser. SageMaker Studio Classic uses a remote kernel architecture with a combination of Jupyter Server and Kernel Gateway app types, enabling notebook servers to interact with kernels on remote hosts. This means that the Jupyter kernels operate not on the notebook server’s host, but within Docker containers on separate hosts. In essence, your notebook is stored in the EFS home directory and runs code remotely on a different Amazon Elastic Compute Cloud (Amazon EC2) instance, which houses a pre-built Docker container equipped with ML libraries such as PyTorch, TensorFlow, Scikit-Learn, and more.

The remote kernel architecture in SageMaker Studio offers notable benefits in terms of scalability and flexibility. However, it has its limitations, including a maximum of four apps per instance type and potential bottlenecks due to numerous HTTPS/WSS connections to a common EC2 instance type. These limitations could negatively affect the user experience.

The following architecture diagram depicts the SageMaker Studio Classic architecture. It illustrates the user’s process of connecting to a Kernel Gateway app via a Jupyter Server app, using their preferred web browser.


About the authors

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Kunal Jha is a Senior Product Manager at AWS. He is focused on building Amazon SageMaker Studio as the best-in-class  choice for end-to-end ML development. In his spare time, Kunal enjoys skiing and exploring the Pacific Northwest. You can find him on LinkedIn.

Majisha Namath Parambath is a Senior Software Engineer at Amazon SageMaker. She has been at Amazon for over 8 years and is currently working on improving the Amazon SageMaker Studio end-to-end experience.

Bharat Nandamuri is a Senior Software Engineer working on Amazon SageMaker Studio. He is passionate about building high scale backend services with focus on Engineering for ML systems. Outside of work, he enjoys playing chess, hiking and watching movies.

Derek Lause is a Software Engineer at AWS. He is committed to deliver value to customers through Amazon SageMaker Studio and Notebook Instances. In his spare time, Derek enjoys spending time with family and friends and hiking. You can find Derek on LinkedIn.

Read More

How AWS Prototyping enabled ICL-Group to build computer vision models on Amazon SageMaker

This is a customer post jointly authored by ICL and AWS employees.

ICL is a multi-national manufacturing and mining corporation based in Israel that manufactures products based on unique minerals and fulfills humanity’s essential needs, primarily in three markets: agriculture, food, and engineered materials. Their mining sites use industrial equipment that has to be monitored because machinery failures can result in loss of revenue or even environmental damages. Due to the extremely harsh conditions (low and high temperatures, vibrations, salt water, dust), attaching sensors to these mining machines for remote monitoring is difficult. Therefore, most machines are manually or visually monitored continuously by on-site workers. These workers frequently check camera pictures to monitor the state of a machine. Although this approach has worked in the past, it doesn’t scale and incurs relatively high costs.

To overcome this business challenge, ICL decided to develop in-house capabilities to use machine learning (ML) for computer vision (CV) to automatically monitor their mining machines. As a traditional mining company, the availability of internal resources with data science, CV, or ML skills was limited.

In this post, we discuss the following:

  • How ICL developed the in-house capabilities to build and maintain CV solutions that allow automatic monitoring of mining equipment to improve efficiency and reduce waste
  • A deep dive into a solution for mining screeners that was developed with the support of the AWS Prototyping program

Using the approach described in this post, ICL was able to develop a framework on AWS using Amazon SageMaker to build other use cases based on extracted vision from about 30 cameras, with the potential of scaling to thousands of such cameras on their production sites.

Building in-house capabilities through AWS Prototyping

Building and maintaining ML solutions for business-critical workloads requires sufficiently skilled staff. Outsourcing such activities is often not possible because internal know-how about business process needs to be combined with technical solution building. Therefore, ICL approached AWS for support in their journey to build a CV solution to monitor their mining equipment and acquire the necessary skills.

AWS Prototyping is an investment program where AWS embeds specialists into customer development teams to build mission-critical use cases. During such an engagement, the customer development team is enabled on the underlying AWS technologies while building the use case over the course of 3–6 weeks and getting hands-on help. Besides a corresponding use case, all the customer needs are 3–7 developers that can spend more than 80% of their working time building the aforementioned use case. During this time, the AWS specialists are fully assigned to the customer’s team and collaborate with them remotely or on-site.

ICL’s computer vision use case

For the prototyping engagement, ICL selected the use case for monitoring their mining screeners. A screener is a large industrial mining machine where minerals dissolved in water are processed. The water flows in several lanes from the top of the machine to the bottom. The influx is monitored for each of the lanes individually. When the influx runs out of the lane, it’s called overflow, which indicates that the machine is overloaded. Overflowing influx are minerals that are not processed by the screener and are lost. This needs to be avoided by regulating the influx. Without an ML solution, the overflow needs to be monitored by humans and it potentially takes time until the overflow is observed and handled.

The following images show the input and outputs of the CV models. The raw camera picture (left) is processed using a semantic segmentation model (middle) to detect the different lanes. Then the model (right) estimates the coverage (white) and overflow (red).

Although the prototyping engagement focused on a single type of machine, the general approach to use cameras and automatically process their images while using CV is applicable to a wider range of mining equipment. This allows ICL to extrapolate the know-how gained during the prototyping engagement to other locations, camera types, and machines, and also maintain the ML models without requiring support from any third party.

During the engagement, the AWS specialists and the ICL development team would meet every day and codevelop the solution step by step. ICL data scientists would either work independently on their assigned tasks or receive hands-on, pair-programming support from AWS ML specialists. This approach ensured that ICL data scientists not only gained experience systematically developing ML models using SageMaker, but also learned to embed these models into applications and automate the whole lifecycle of such models, including automated retraining and model monitoring. After 4 weeks of this collaboration, ICL was able to move the model into production within 8 weeks without requiring further support, and has built models for other use cases since then. The technical approach of this engagement is described in the next section.

Monitoring mining screeners using CV models with SageMaker

SageMaker is a fully managed platform that addresses the complete lifecycle of an ML model: it provides services and features that support teams working on ML models from labeling their data in Amazon SageMaker Ground Truth to training and optimizing the model, as well as hosting ML models for production use. Prior to the engagement, ICL had installed the cameras and obtained pictures as shown in the previous images (left-most image) and stored them in an Amazon Simple Storage Service (Amazon S3) bucket. Before models can be trained, it’s necessary to generate training data. The joint ICL-AWS team addressed this in three steps:

  1. Label the data using a semantic segmentation labeling job in SageMaker Ground Truth, as shown in the following image.
  2. Preprocess the labeled images using image augmentation techniques to increase the number of data samples.
  3. Split the labeled images into training, test, and validation sets, so that the performance and accuracy of the model can be measured adequately during the training process.

To achieve production scale for ML workloads, automating these steps is crucial to maintain the quality of the training input. Therefore, whenever new images are labeled using SageMaker Ground Truth, the preprocessing and splitting steps are run automatically and the resulting datasets are stored in Amazon S3, as shown in the model training workflow in the following diagram. Similarly, the model deployment workflow uses assets from SageMaker to update endpoints automatically whenever an updated model is available.

ICL is using several approaches to move ML models into production. Some involve their current AI platform called KNIME, which allows them to quickly deploy models developed in the development environment into production by industrializing them into products. Several combinations of KNIME and AWS services were analyzed; the preceding architecture was the most suitable for ICL’s environment.

The SageMaker semantic segmentation built-in algorithm is used to train models for screener grid area segmentation. By choosing this built-in algorithm over a self-built container, ICL doesn’t have to deal with the undifferentiated heavy lifting of maintaining a Convolutional Neural Network (CNN) while being able to use such a CNN for their use case. After experimenting with different configurations and parameters, ICL used a Fully Convolutional Network (FCN) algorithm with a pyramid scene parsing network (PSPNet) to train the model. This allowed ICL to finalize the model building within 1 week of the prototyping engagement.
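
A training job with the built-in algorithm might be configured as in the following sketch. This is not ICL’s actual setup; the role, bucket paths, and hyperparameter values are placeholders for illustration:

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder execution role

# container image for the SageMaker built-in semantic segmentation algorithm
training_image = image_uris.retrieve("semantic-segmentation", session.boto_region_name)

estimator = Estimator(
    image_uri=training_image,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://my-bucket/screener-models/",  # placeholder output location
    sagemaker_session=session,
)

# example hyperparameters; the algorithm also supports "psp" and "deeplab" variants
estimator.set_hyperparameters(
    algorithm="fcn",
    backbone="resnet-50",
    num_classes=4,             # number of segmentation classes (example value)
    num_training_samples=500,  # size of the training set (example value)
    epochs=30,
)

# channel names expected by the built-in semantic segmentation algorithm
estimator.fit({
    "train": "s3://my-bucket/train/",
    "validation": "s3://my-bucket/validation/",
    "train_annotation": "s3://my-bucket/train_annotation/",
    "validation_annotation": "s3://my-bucket/validation_annotation/",
})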

After a model has been trained, it has to be deployed to be usable for the screener monitoring. In line with the model training, this process is fully automated and orchestrated using AWS Step Functions and AWS Lambda. After the model is successfully deployed on the SageMaker endpoint, incoming pictures from the cameras are resized to fit the model’s input format and then fed into the endpoint for predictions using Lambda functions. The result of the semantic segmentation prediction as well as the overflow detection are then stored in Amazon DynamoDB and Amazon S3 for downstream analysis. If overflow is detected, Amazon Simple Notification Service (Amazon SNS) or Lambda functions can be used to automatically mitigate the overflow and control the corresponding lanes on the affected screener. The following diagram illustrates this architecture.
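
The inference step described above could be sketched as a Lambda handler like the following. The endpoint name, bucket, table, and event shape are assumptions for illustration and not ICL’s actual implementation:

import base64
import boto3

runtime = boto3.client("sagemaker-runtime")
s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")

ENDPOINT_NAME = "screener-segmentation-endpoint"  # placeholder endpoint name
RESULTS_BUCKET = "my-screener-results-bucket"     # placeholder bucket
RESULTS_TABLE = "ScreenerPredictions"             # placeholder DynamoDB table

def handler(event, context):
    """Send a resized camera image to the segmentation endpoint and persist the result."""
    image_bytes = base64.b64decode(event["image_base64"])  # assumed event shape

    # request a PNG segmentation mask from the endpoint
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="image/jpeg",
        Accept="image/png",
        Body=image_bytes,
    )
    segmentation_mask = response["Body"].read()

    # store the mask in S3 and a pointer to it in DynamoDB for downstream analysis
    key = f"predictions/{event['camera_id']}/{event['timestamp']}.png"
    s3.put_object(Bucket=RESULTS_BUCKET, Key=key, Body=segmentation_mask)
    dynamodb.Table(RESULTS_TABLE).put_item(
        Item={"camera_id": event["camera_id"], "timestamp": event["timestamp"], "mask_s3_key": key}
    )
    return {"mask_s3_key": key}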

Conclusion

This post described how ICL, an Israeli mining company, developed their own computer vision approach for automated monitoring of mining equipment using cameras. We first showed how to address such a challenge from an organizational point of view that is focused on enablement, then we provided a detailed look into how the model was built using AWS. Although the challenge of monitoring may be unique to ICL, the general approach to build a prototype alongside AWS specialists can be applied to similar challenges, particularly for organizations that don’t have the necessary AWS knowledge.

If you want to learn how to build a production-scale prototype of your use case, reach out to your AWS account team to discuss a prototyping engagement.


About the Authors

Markus Bestehorn leads the customer engineering and prototyping teams in Germany, Austria, Switzerland, and Israel for AWS. He has a PhD degree in computer science and is specialized in building complex machine learning and IoT solutions.

David Abekasis leads the data science team at ICL Group with a passion to educate others on data analysis and machine learning while helping solve business challenges. He has an MSc in Data Science and an MBA. He was fortunate to research spatial and time series data in the precision agriculture domain.

Ion Kleopas is a Sr. Machine Learning Prototyping Architect with an MSc in Data Science and Big Data. He helps AWS customers build innovative AI/ML solutions by enabling their technical teams on AWS technologies through the co-development of prototypes for challenging machine learning use cases, paving their path to production.

Miron Perel is a Principal Machine Learning Business Development Manager with Amazon Web Services. Miron advises Generative AI companies building their next generation models.

Read More

Automate PDF pre-labeling for Amazon Comprehend

Amazon Comprehend is a natural-language processing (NLP) service that provides pre-trained and custom APIs to derive insights from textual data. Amazon Comprehend customers can train custom named entity recognition (NER) models to extract entities of interest, such as location, person name, and date, that are unique to their business.

To train a custom model, you first prepare training data by manually annotating entities in documents. This can be done with the Comprehend Semi-Structured Documents Annotation Tool, which creates an Amazon SageMaker Ground Truth job with a custom template, allowing annotators to draw bounding boxes around the entities directly on the PDF documents. However, for companies with existing tabular entity data in ERP systems like SAP, manual annotation can be repetitive and time-consuming.

To reduce the effort of preparing training data, we built a pre-labeling tool using AWS Step Functions that automatically pre-annotates documents by using existing tabular entity data. This significantly decreases the manual work needed to train accurate custom entity recognition models in Amazon Comprehend.

In this post, we walk you through the steps of setting up the pre-labeling tool and show examples of how it automatically annotates documents from a public dataset of sample bank statements in PDF format. The full code is available on the GitHub repo.

Solution overview

In this section, we discuss the inputs and outputs of the pre-labeling tool and provide an overview of the solution architecture.

Inputs and outputs

As input, the pre-labeling tool takes PDF documents that contain text to be annotated. For the demo, we use simulated bank statements like the following example.

The tool also takes a manifest file that maps PDF documents to the entities that we want to extract from these documents. An entity consists of two things: the expected_text to extract from the document (for example, AnyCompany Bank) and the corresponding entity_type (for example, bank_name). Later in this post, we show how to construct this manifest file from a CSV document like the following example.

The pre-labeling tool uses the manifest file to automatically annotate the documents with their corresponding entities. We can then use these annotations directly to train an Amazon Comprehend model.

Alternatively, you can create a SageMaker Ground Truth labeling job for human review and editing, as shown in the following screenshot.

When the review is complete, you can use the annotated data to train an Amazon Comprehend custom entity recognizer model.

Architecture

The pre-labeling tool consists of multiple AWS Lambda functions orchestrated by a Step Functions state machine. It has two versions that use different techniques to generate pre-annotations.

The first technique is fuzzy matching. This requires a pre-manifest file with expected entities. The tool uses the fuzzy matching algorithm to generate pre-annotations by comparing text similarity.

Fuzzy matching looks for strings in the document that are similar (but not necessarily identical) to the expected entities listed in the pre-manifest file. It first calculates text similarity scores between the expected text and words in the document, then it matches all pairs above a threshold. Therefore, even if there are no exact matches, fuzzy matching can find variants like abbreviations and misspellings. This allows the tool to pre-label documents without requiring the entities to appear verbatim. For example, if 'AnyCompany Bank' is listed as an expected entity, Fuzzy Matching will annotate occurrences of 'Any Companys Bank'. This provides more flexibility than strict string matching and enables the pre-labeling tool to automatically label more entities.
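
Conceptually, the matching step can be illustrated with a few lines of Python. This sketch uses the standard library’s difflib and is not the tool’s actual implementation; the threshold and window size are example values:

from difflib import SequenceMatcher

def fuzzy_score(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_fuzzy_matches(expected_text, document_words, window, threshold=0.8):
    """Slide a window of consecutive words over the document text and keep the
    spans whose similarity to the expected text exceeds the threshold."""
    matches = []
    for i in range(len(document_words) - window + 1):
        candidate = " ".join(document_words[i : i + window])
        score = fuzzy_score(expected_text, candidate)
        if score >= threshold:
            matches.append((candidate, round(score, 2), i))
    return matches

words = "Statement issued by Any Companys Bank for JANE DOE".split()
print(find_fuzzy_matches("AnyCompany Bank", words, window=3))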

The following diagram illustrates the architecture of this Step Functions state machine.

The second technique requires a pre-trained Amazon Comprehend entity recognizer model. The tool generates pre-annotations using the Amazon Comprehend model, following the workflow shown in the following diagram.

The following diagram illustrates the full architecture.

In the following sections, we walk through the steps to implement the solution.

Deploy the pre-labeling tool

Clone the repository to your local machine:

git clone https://github.com/aws-samples/amazon-comprehend-automated-pdf-prelabeling-tool.git

This repository has been built on top of the Comprehend Semi-Structured Documents Annotation Tool and extends its functionalities by enabling you to start a SageMaker Ground Truth labeling job with pre-annotations already displayed on the SageMaker Ground Truth UI.

The pre-labeling tool includes both the Comprehend Semi-Structured Documents Annotation Tool resources as well as some resources specific to the pre-labeling tool. You can deploy the solution with AWS Serverless Application Model (AWS SAM), an open source framework that you can use to define serverless application infrastructure code.

If you have previously deployed the Comprehend Semi-Structured Documents Annotation Tool, refer to the FAQ section in Pre_labeling_tool/README.md for instructions on how to deploy only the resources specific to the pre-labeling tool.

If you haven’t deployed the tool before and are starting fresh, do the following to deploy the whole solution.

Change the current directory to the annotation tool folder:

cd amazon-comprehend-semi-structured-documents-annotation-tools

Build and deploy the solution:

make ready-and-deploy-guided

Create the pre-manifest file

Before you can use the pre-labeling tool, you need to prepare your data. The main inputs are PDF documents and a pre-manifest file. The pre-manifest file contains the location of each PDF document under 'pdf' and the location of a JSON file with expected entities to label under 'expected_entities'.

The notebook generate_premanifest_file.ipynb shows how to create this file. In the demo, the pre-manifest file looks like the following:

[
  {
    'pdf': 's3://<bucket>/data_aws_idp_workshop_data/bank_stmt_0.pdf',
    'expected_entities': 's3://<bucket>/prelabeling-inputs/expected-entities/example-demo/fuzzymatching_version/file_bank_stmt_0.json'
  },
  ...
]

Each JSON file listed in the pre-manifest file (under expected_entities) contains a list of dictionaries, one for each expected entity. The dictionaries have the following keys:

  • ‘expected_texts’ – A list of possible text strings matching the entity.
  • ‘entity_type’ – The corresponding entity type.
  • ‘ignore_list’ (optional) – A list of words that should be ignored in the match. This parameter can be used to prevent fuzzy matching from matching specific combinations of words that you know are wrong, for example to ignore certain numbers or email addresses when looking for names.

For example, the expected_entities of the PDF shown previously looks like the following:

[
  {
    'expected_texts': ['AnyCompany Bank'],
    'entity_type': 'bank_name',
    'ignore_list': []
  },
  {
    'expected_texts': ['JANE DOE'],
    'entity_type': 'customer_name',
    'ignore_list': ['JANE.DOE@example_mail.com']
  },
  {
    'expected_texts': ['003884257406'],
    'entity_type': 'checking_number',
    'ignore_list': []
  },
 ...
]

Run the pre-labeling tool

With the pre-manifest file that you created in the previous step, start running the pre-labeling tool. For more details, refer to the notebook start_step_functions.ipynb.

To start the pre-labeling tool, provide an event with the following keys:

  • Premanifest – Maps each PDF document to its expected_entities file. This should contain the Amazon Simple Storage Service (Amazon S3) bucket (under bucket) and the key (under key) of the file.
  • Prefix – Used to create the execution_id, which names the S3 folder for output storage and the SageMaker Ground Truth labeling job name.
  • entity_types – Displayed in the UI for annotators to label. These should include all entity types in the expected entities files.
  • work_team_name (optional) – Used for creating the SageMaker Ground Truth labeling job. It corresponds to the private workforce to use. If it’s not provided, only a manifest file will be created instead of a SageMaker Ground Truth labeling job. You can use the manifest file to create a SageMaker Ground Truth labeling job later on. Note that as of this writing, you can’t provide an external workforce when creating the labeling job from the notebook. However, you can clone the created job and assign it to an external workforce on the SageMaker Ground Truth console.
  • comprehend_parameters (optional) – Parameters to directly train an Amazon Comprehend custom entity recognizer model. If omitted, this step will be skipped.
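
Put together, an event might look like the following sketch. The bucket, key, entity types, and work team name are placeholders, and the exact key names and casing are defined in the start_step_functions.ipynb notebook:

event = {
    "premanifest": {"bucket": "my-input-bucket", "key": "prelabeling-inputs/pre-manifest.json"},
    "prefix": "demo-prelabeling",
    "entity_types": ["bank_name", "customer_name", "checking_number"],
    "work_team_name": "my-private-workteam",  # optional
}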

To start the state machine, run the following Python code:

import json

import boto3

stepfunctions_client = boto3.client('stepfunctions')
response = stepfunctions_client.start_execution(
    stateMachineArn=fuzzymatching_prelabeling_step_functions_arn,
    input=json.dumps(<event-dict>)
)

This will start a run of the state machine. You can monitor the progress of the state machine on the Step Functions console. The following diagram illustrates the state machine workflow.

When the state machine is complete, do the following:

  • Inspect the following outputs saved in the prelabeling/ folder of the comprehend-semi-structured-docs S3 bucket:
    • Individual annotation files for each page of the documents (one per page per document) in temp_individual_manifests/
    • A manifest for the SageMaker Ground Truth labeling job in consolidated_manifest/consolidated_manifest.manifest
    • A manifest that can be used to train a custom Amazon Comprehend model in consolidated_manifest/consolidated_manifest_comprehend.manifest
  • On the SageMaker console, open the SageMaker Ground Truth labeling job that was created to review the annotations
  • Inspect and test the custom Amazon Comprehend model that was trained

As mentioned previously, the tool can only create SageMaker Ground Truth labeling jobs for private workforces. To outsource the human labeling effort, you can clone the labeling job on the SageMaker Ground Truth console and attach any workforce to the new job.

Clean up

To avoid incurring additional charges, delete the resources that you created and delete the stack that you deployed with the following command:

make delete

Conclusion

The pre-labeling tool provides a powerful way for companies to use existing tabular data to accelerate the process of training custom entity recognition models in Amazon Comprehend. By automatically pre-annotating PDF documents, it significantly reduces the manual effort required in the labeling process.

The tool has two versions: fuzzy matching and Amazon Comprehend-based, giving flexibility on how to generate the initial annotations. After documents are pre-labeled, you can quickly review them in a SageMaker Ground Truth labeling job or even skip the review and directly train an Amazon Comprehend custom model.

The pre-labeling tool enables you to quickly unlock the value of your historical entity data and use it in creating custom models tailored to your specific domain. By speeding up what is typically the most labor-intensive part of the process, it makes custom entity recognition with Amazon Comprehend more accessible than ever.

For more information about how to label PDF documents using a SageMaker Ground Truth labeling job, see Custom document annotation for extracting named entities in documents using Amazon Comprehend and Use Amazon SageMaker Ground Truth to Label Data.


About the authors

Oskar Schnaack is an Applied Scientist at the Generative AI Innovation Center. He is passionate about diving into the science behind machine learning to make it accessible for customers. Outside of work, Oskar enjoys cycling and keeping up with trends in information theory.

Romain Besombes is a Deep Learning Architect at the Generative AI Innovation Center. He is passionate about building innovative architectures to address customers’ business problems with machine learning.

Read More

Improve your Stable Diffusion prompts with Retrieval Augmented Generation

Text-to-image generation is a rapidly growing field of artificial intelligence with applications in a variety of areas, such as media and entertainment, gaming, ecommerce product visualization, advertising and marketing, architectural design and visualization, artistic creations, and medical imaging.

Stable Diffusion is a text-to-image model that empowers you to create high-quality images within seconds. In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models in Amazon SageMaker JumpStart, a machine learning (ML) hub offering models, algorithms, and solutions. The evolution continued in April 2023 with the introduction of Amazon Bedrock, a fully managed service offering access to cutting-edge foundation models, including Stable Diffusion, through a convenient API.

As an ever-increasing number of customers embark on their text-to-image endeavors, a common hurdle arises—how to craft prompts that wield the power to yield high-quality, purpose-driven images. This challenge often demands considerable time and resources as users embark on an iterative journey of experimentation to discover the prompts that align with their visions.

Retrieval Augmented Generation (RAG) is a process in which a language model retrieves contextual documents from an external data source and uses this information to generate more accurate and informative text. This technique is particularly useful for knowledge-intensive natural language processing (NLP) tasks. We now extend its transformative touch to the world of text-to-image generation. In this post, we demonstrate how to harness the power of RAG to enhance the prompts sent to your Stable Diffusion models. You can create your own AI assistant for prompt generation in minutes with large language models (LLMs) on Amazon Bedrock, as well as on SageMaker JumpStart.

Approaches to crafting text-to-image prompts

Creating a prompt for a text-to-image model may seem straightforward at first glance, but it’s a deceptively complex task. It’s more than just typing a few words and expecting the model to conjure an image that aligns with your mental image. Effective prompts should provide clear instructions while leaving room for creativity. They must balance specificity and ambiguity, and they should be tailored to the particular model being used. To address the challenge of prompt engineering, the industry has explored various approaches:

  • Prompt libraries – Some companies curate libraries of pre-written prompts that you can access and customize. These libraries contain a wide range of prompts tailored to various use cases, allowing you to choose or adapt prompts that align with your specific needs.
  • Prompt templates and guidelines – Many companies and organizations provide users with a set of predefined prompt templates and guidelines. These templates offer structured formats for writing prompts, making it straightforward to craft effective instructions.
  • Community and user contributions – Crowdsourced platforms and user communities often play a significant role in improving prompts. Users can share their fine-tuned models, successful prompts, tips, and best practices with the community, helping others learn and refine their prompt-writing skills.
  • Model fine-tuning – Companies may fine-tune their text-to-image models to better understand and respond to specific types of prompts. Fine-tuning can improve model performance for particular domains or use cases.

These industry approaches collectively aim to make the process of crafting effective text-to-image prompts more accessible, user-friendly, and efficient, ultimately enhancing the usability and versatility of text-to-image generation models for a wide range of applications.

Using RAG for prompt design

In this section, we delve into how RAG techniques can serve as a game changer in prompt engineering, working in harmony with these existing approaches. By seamlessly integrating RAG into the process, we can streamline and enhance the efficiency of prompt design.

Semantic search in a prompt database

Imagine a company that has accumulated a vast repository of prompts in its prompt library or has created a large number of prompt templates, each designed for specific use cases and objectives. Traditionally, users seeking inspiration for their text-to-image prompts would manually browse through these libraries, often sifting through extensive lists of options. This process can be time-consuming and inefficient. By embedding prompts from the prompt library using text embedding models, companies can build a semantic search engine. Here’s how it works:

  • Embedding prompts – The company uses text embeddings to convert each prompt in its library into a numerical representation. These embeddings capture the semantic meaning and context of the prompts.
  • User query – When users provide their own prompts or describe their desired image, the system can analyze and embed their input as well.
  • Semantic search – Using the embeddings, the system performs a semantic search. It retrieves the most relevant prompts from the library based on the user’s query, considering both the user’s input and historical data in the prompt library.

By implementing semantic search in their prompt libraries, companies empower their employees to access a vast reservoir of prompts effortlessly. This approach not only accelerates prompt creation but also encourages creativity and consistency in text-to-image generation.
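
The following is a minimal sketch of such a semantic search flow, assuming Amazon Titan Embeddings on Amazon Bedrock for the embedding step and FAISS for the vector index (consistent with the demo resources described later in this post). The model ID, the small in-memory prompt library, and the helper function are illustrative assumptions, not the exact demo code.

import json

import boto3
import faiss
import numpy as np

# Bedrock runtime client (assumes Bedrock model access is enabled in your Region)
bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    # embed a prompt with Amazon Titan Embeddings (model ID assumed)
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    vector = json.loads(response["body"].read())["embedding"]
    return np.array(vector, dtype="float32")

# 1) embed every prompt in the library (a tiny in-memory example library)
prompt_library = [
    "cute cartoon of a dog having a sandwich at the dinner table",
    "a cartoon illustration of a punk dog, anime style, white background",
    "a cartoon of a boy and his dog walking down a forest lane",
]
embeddings = np.stack([embed(p) for p in prompt_library])

# 2) build a FAISS index over the library embeddings
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# 3) embed the user's query and retrieve the top-k most similar prompts
user_query = "a cartoon of a little dog"
_, neighbors = index.search(embed(user_query).reshape(1, -1), 3)
retrieved_prompts = [prompt_library[i] for i in neighbors[0]]
print(retrieved_prompts)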

Prompt generation from semantic search

Although semantic search streamlines the process of finding relevant prompts, RAG takes it a step further by using these search results to generate optimized prompts. Here’s how it works:

  • Semantic search results – After retrieving the most relevant prompts from the library, the system presents these prompts to the user, alongside the user’s original input.
  • Text generation model – The user can select a prompt from the search results or provide further context on their preferences. The system feeds both the selected prompt and the user’s input into an LLM.
  • Optimized prompt – The LLM, with its understanding of language nuances, crafts an optimized prompt that combines elements from the selected prompt and the user’s input. This new prompt is tailored to the user’s requirements and is designed to yield the desired image output.

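The following is a rough sketch of this generation step, assuming Anthropic Claude 2 on Amazon Bedrock (again matching the demo resources listed later in this post). The model ID, prompt template, and helper function are illustrative assumptions rather than the exact demo code.

import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def optimize_prompt(user_input: str, retrieved_prompts: list) -> str:
    # ask the LLM to merge the user's idea with the retrieved example prompts
    instruction = (
        "You write prompts for a text-to-image model.\n"
        f"User request: {user_input}\n"
        "Example prompts retrieved from our prompt library:\n"
        + "\n".join(f"- {p}" for p in retrieved_prompts)
        + "\nCombine the user's request with the style of the examples and "
        "return a single, detailed image generation prompt."
    )
    # Claude 2 on Bedrock expects the Human/Assistant prompt format (model ID assumed)
    body = json.dumps({
        "prompt": f"\n\nHuman: {instruction}\n\nAssistant:",
        "max_tokens_to_sample": 200,
        "temperature": 0.7,
    })
    response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    return json.loads(response["body"].read())["completion"].strip()

optimized_prompt = optimize_prompt(
    "a cartoon of a little dog",
    ["a cartoon of a boy and his dog walking down a forest lane"],
)
print(optimized_prompt)
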
The combination of semantic search and prompt generation not only simplifies the process of finding prompts but also ensures that the prompts generated are highly relevant and effective. It empowers you to fine-tune and customize your prompts, ultimately leading to improved text-to-image generation results. The following are examples of images generated from Stable Diffusion XL using the prompts from semantic search and prompt generation.

Original prompt: a cartoon of a little dog

Prompts from semantic search:

  • cute cartoon of a dog having a sandwich at the dinner table
  • a cartoon illustration of a punk dog, anime style, white background
  • a cartoon of a boy and his dog walking down a forest lane

Optimized prompt by LLM: A cartoon scene of a boy happily walking hand in hand down a forest lane with his cute pet dog, in animation style.

RAG-based prompt design applications across diverse industries

Before we explore the application of our suggested RAG architecture, let’s start with an industry in which an image generation model is most applicable. In AdTech, speed and creativity are critical. RAG-based prompt generation can add instant value by generating prompt suggestions to create many images quickly for an advertisement campaign. Human decision-makers can go through the auto-generated images to select the candidate image for the campaign. This feature can be a standalone application or embedded into popular software tools and platforms currently available.

Another industry where the Stable Diffusion model can enhance productivity is media and entertainment. The RAG architecture can assist in use cases such as avatar creation. Starting from a simple prompt, RAG can add much more color and character to the avatar ideas, generating many candidate prompts and providing more creative directions. From the resulting images, you can find the perfect fit for the given application. The approach increases productivity by automatically generating many prompt suggestions, and the variety it produces is an immediate benefit of the solution.

Solution overview

Empowering customers to construct their own RAG-based AI assistant for prompt design on AWS is a testament to the versatility of modern technology. AWS provides a plethora of options and services to facilitate this endeavor. The following reference architecture diagram illustrates a RAG application for prompt design on AWS.

When it comes to selecting the right LLMs for your AI assistant, AWS offers a spectrum of choices to cater to your specific requirements.

Firstly, you can opt for LLMs available through SageMaker JumpStart, utilizing dedicated instances. These instances support a variety of models, including Falcon, Llama 2, Bloom Z, and Flan-T5, or you can explore proprietary models such as Cohere’s Command and Multilingual Embedding, or Jurassic-2 from AI21 Labs.

If you prefer a more simplified approach, AWS offers LLMs on Amazon Bedrock, featuring models like Amazon Titan and Anthropic Claude. These models are accessible through straightforward API calls, allowing you to harness their power effortlessly. This flexibility and diversity of options ensures that you have the freedom to choose the LLM that best aligns with your prompt design goals, whether you prefer openly available models or the robust capabilities of proprietary models.

When it comes to building the essential vector database, AWS provides a multitude of options through their native services. You can opt for Amazon OpenSearch Service, Amazon Aurora, or Amazon Relational Database Service (Amazon RDS) for PostgreSQL, each offering robust features to suit your specific needs. Alternatively, you can explore products from AWS partners like Pinecone, Weaviate, Elastic, Milvus, or Chroma, which provide specialized solutions for efficient vector storage and retrieval.

To help you get started to construct a RAG-based AI assistant for prompt design, we’ve put together a comprehensive demonstration in our GitHub repository. This demonstration uses the following resources:

  • Image generation: Stable Diffusion XL on Amazon Bedrock
  • Text embedding: Amazon Titan on Amazon Bedrock
  • Text generation: Claude 2 on Amazon Bedrock
  • Vector database: FAISS, an open source library for efficient similarity search
  • Prompt library: Prompt examples from DiffusionDB, the first large-scale prompt gallery dataset for text-to-image generative models

Additionally, we’ve incorporated LangChain for the LLM implementation and Streamlit for the web application component, providing a seamless and user-friendly experience.
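
To close the loop, the optimized prompt can be sent to Stable Diffusion XL on Amazon Bedrock to generate the image. The following is a minimal sketch; the model ID and request body fields follow the Stability AI request format on Bedrock and should be treated as assumptions to verify against the demo repository.

import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# generate an image from the optimized prompt (model ID and body fields assumed)
response = bedrock.invoke_model(
    modelId="stability.stable-diffusion-xl-v1",
    body=json.dumps({
        "text_prompts": [{"text": "A cartoon scene of a boy happily walking hand in "
                                  "hand down a forest lane with his cute pet dog, "
                                  "in animation style."}],
        "cfg_scale": 7,
        "steps": 30,
    }),
)
result = json.loads(response["body"].read())
image_bytes = base64.b64decode(result["artifacts"][0]["base64"])

with open("generated_image.png", "wb") as f:
    f.write(image_bytes)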

Prerequisites

You need to have the following to run this demo application:

  • An AWS account
  • Basic understanding of how to navigate Amazon SageMaker Studio
  • Basic understanding of how to download a repo from GitHub
  • Basic knowledge of running a command on a terminal

Run the demo application

You can download all the necessary code with instructions from the GitHub repo. After the application is deployed, you will see a page like the following screenshot.

With this demonstration, we aim to make the implementation process accessible and comprehensible, providing you with a hands-on experience to kickstart your journey into the world of RAG and prompt design on AWS.

Clean up

After you try out the app, clean up your resources by stopping the application.

Conclusion

RAG has emerged as a game-changing paradigm in the world of prompt design, revitalizing Stable Diffusion’s text-to-image capabilities. By harmonizing RAG techniques with existing approaches and using the robust resources of AWS, we’ve uncovered a pathway to streamlined creativity and accelerated learning.

For additional resources, visit the following:


About the authors

James Yi is a Senior AI/ML Partner Solutions Architect in the Emerging Technologies team at Amazon Web Services. He is passionate about working with enterprise customers and partners to design, deploy and scale AI/ML applications to derive their business values. Outside of work, he enjoys playing soccer, traveling and spending time with his family.

Rumi Olsen is a Solutions Architect in the AWS Partner Program. She specializes in serverless and machine learning solutions in her current role, and has a background in natural language processing technologies. She spends most of her spare time with her daughter exploring the nature of the Pacific Northwest.

Read More

Streamlining ETL data processing at Talent.com with Amazon SageMaker

Streamlining ETL data processing at Talent.com with Amazon SageMaker

This post is co-authored by Anatoly Khomenko, Machine Learning Engineer, and Abdenour Bezzouh, Chief Technology Officer at Talent.com.

Established in 2011, Talent.com aggregates paid job listings from their clients and public job listings, and has created a unified, easily searchable platform. Covering over 30 million job listings across more than 75 countries and spanning various languages, industries, and distribution channels, Talent.com caters to the diverse needs of job seekers, effectively connecting millions of job seekers with job opportunities.

Talent.com’s mission is to facilitate global workforce connections. To achieve this, Talent.com aggregates job listings from various sources on the web, offering job seekers access to an extensive pool of over 30 million job opportunities tailored to their skills and experiences. In line with this mission, Talent.com collaborated with AWS to develop a cutting-edge job recommendation engine driven by deep learning, aimed at assisting users in advancing their careers.

To ensure the effective operation of this job recommendation engine, it is crucial to implement a large-scale data processing pipeline responsible for extracting and refining features from Talent.com’s aggregated job listings. This pipeline is able to process 5 million daily records in less than 1 hour, and allows for processing multiple days of records in parallel. In addition, this solution allows for a quick deployment to production. The primary data source for this pipeline consists of JSON Lines files stored in Amazon Simple Storage Service (Amazon S3) and partitioned by date. Tens of thousands of new JSON Lines files arrive each day as incremental updates.

The primary objective of this data processing pipeline is to facilitate the creation of features necessary for training and deploying the job recommendation engine on Talent.com. It’s worth noting that this pipeline must support incremental updates and cater to the intricate feature extraction requirements necessary for the training and deployment modules essential for the job recommendation system. Our pipeline belongs to the general ETL (extract, transform, and load) process family that combines data from multiple sources into a large, central repository.

For further insights into how Talent.com and AWS collaboratively built cutting-edge natural language processing and deep learning model training techniques, utilizing Amazon SageMaker to craft a job recommendation system, refer to From text to dream job: Building an NLP-based job recommender at Talent.com with Amazon SageMaker. The system includes feature engineering, deep learning model architecture design, hyperparameter optimization, and model evaluation, where all modules are run using Python.

This post shows how we used SageMaker to build a large-scale data processing pipeline for preparing features for the job recommendation engine at Talent.com. The resulting solution enables a Data Scientist to prototype feature extraction in a SageMaker notebook using Python libraries, such as Scikit-Learn or PyTorch, and then quickly deploy the same code into the data processing pipeline performing feature extraction at scale. The solution does not require porting the feature extraction code to PySpark, as would be required when using AWS Glue as the ETL solution. Our solution can be developed and deployed end-to-end by a Data Scientist using only SageMaker, and does not require knowledge of other ETL solutions, such as AWS Batch. This can significantly shorten the time needed to deploy the machine learning (ML) pipeline to production. The pipeline is operated through Python and seamlessly integrates with feature extraction workflows, rendering it adaptable to a wide range of data analytics applications.

Solution overview

Overview for ETL pipeline using SageMaker Processing

The pipeline is comprised of three primary phases:

  1. Utilize an Amazon SageMaker Processing job to handle raw JSONL files associated with a specified day. Multiple days of data can be processed by separate Processing jobs simultaneously.
  2. Employ AWS Glue for data crawling after processing multiple days of data.
  3. Load processed features for a specified date range using SQL from an Amazon Athena table, then train and deploy the job recommender model.

Process raw JSONL files

We process raw JSONL files for a specified day using a SageMaker Processing job. The job implements feature extraction and data compaction, and saves processed features into Parquet files with 1 million records per file. We take advantage of CPU parallelization to perform feature extraction for each raw JSONL file in parallel. The processing result of each JSONL file is saved into a separate Parquet file inside a temporary directory. After all of the JSONL files have been processed, we compact the thousands of small Parquet files into several files with 1 million records per file. The compacted Parquet files are then uploaded to Amazon S3 as the output of the processing job. The data compaction ensures efficient crawling and SQL queries in the next stages of the pipeline.

The following is the sample code to schedule a SageMaker Processing job for a specified day, for example 2020-01-01, using the SageMaker SDK. The job reads raw JSONL files from Amazon S3 (for example from s3://bucket/raw-data/2020/01/01) and saves the compacted Parquet files into Amazon S3 (for example to s3://bucket/processed/table-name/day_partition=2020-01-01/).

### install dependencies 
%pip install sagemaker pyarrow s3fs awswrangler

import sagemaker
import boto3

from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn
from sagemaker import get_execution_role
from sagemaker.processing import ProcessingInput, ProcessingOutput

region = boto3.session.Session().region_name
role = get_execution_role()
bucket = sagemaker.Session().default_bucket()

### we use instance with 16 CPUs and 128 GiB memory
### note that the script will NOT load the entire data into memory during compaction
### depending on the size of individual jsonl files, larger instance may be needed
instance = "ml.r5.4xlarge"
n_jobs = 8  ### we use 8 process workers
date = "2020-01-01" ### process data for one day

est_cls = SKLearn
framework_version_str = "0.20.0"

### schedule processing job
script_processor = FrameworkProcessor(
    role=role,
    instance_count=1,
    instance_type=instance,
    estimator_cls=est_cls,
    framework_version=framework_version_str,
    volume_size_in_gb=500,
)

script_processor.run(
    code="processing_script.py", ### name of the main processing script
    source_dir="../src/etl/", ### location of source code directory

    ### our processing script loads raw jsonl files directly from S3
    ### this avoids long start-up times of the processing jobs,
    ### since raw data does not need to be copied into instance
    inputs=[], ### processing job input is empty

    outputs=[
        ProcessingOutput(destination="s3://bucket/processed/table-name/",
                         source="/opt/ml/processing/output"),
    ],
    arguments=[
        ### directory with job's output
        "--output", "/opt/ml/processing/output",

        ### temporary directory inside instance
        "--tmp_output", "/opt/ml/tmp_output",

        "--n_jobs", str(n_jobs), ### number of process workers
        "--date", date, ### date to process

        ### location with raw jsonl files in S3
        "--path", "s3://bucket/raw-data/",
    ],
    wait=False
)

The following is a code outline for the main script (processing_script.py) that runs inside the SageMaker Processing job:

import argparse
import concurrent.futures
import os
from pathlib import Path

import pyarrow.dataset as ds
import s3fs

### function to process a raw jsonl file and save extracted features into a parquet file
from process_data import process_jsonl

### parse the command line arguments passed by the Processing job
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", type=str, required=True)
    parser.add_argument("--tmp_output", type=str, required=True)
    parser.add_argument("--n_jobs", type=int, default=8)
    parser.add_argument("--date", type=str, required=True)
    parser.add_argument("--path", type=str, required=True)
    return parser.parse_args()

args = parse_args()

### we use s3fs to crawl S3 input path for raw jsonl files
fs = s3fs.S3FileSystem()
### we assume raw jsonl files are stored in S3 directories partitioned by date
### for example: s3://bucket/raw-data/2020/01/01/
jsons = fs.find(os.path.join(args.path, *args.date.split('-')))

### temporary directory location inside the Processing job instance
tmp_out = os.path.join(args.tmp_output, f"day_partition={args.date}")

### directory location with job's output
out_dir = os.path.join(args.output, f"day_partition={args.date}")

### process individual jsonl files in parallel using n_jobs process workers
futures=[]
with concurrent.futures.ProcessPoolExecutor(max_workers=args.n_jobs) as executor:
    for file in jsons:
        inp_file = Path(file)
        out_file = os.path.join(tmp_out, inp_file.stem + ".snappy.parquet")
        ### process_jsonl function reads raw jsonl file from S3 location (inp_file)
        ### and saves result into parquet file (out_file) inside temporary directory
        futures.append(executor.submit(process_jsonl, file, out_file))

    ### wait until all jsonl files are processed
    for future in concurrent.futures.as_completed(futures):
        result = future.result()

### compact parquet files
dataset = ds.dataset(tmp_out)

if len(dataset.schema) > 0:
    ### save compacted parquet files with 1MM records per file
    ds.write_dataset(dataset, out_dir, format="parquet", 
                     max_rows_per_file=1024 * 1024)

Scalability is a key feature of our pipeline. First, multiple SageMaker Processing jobs can be used to process data for several days simultaneously. Second, we avoid loading the entire processed or raw data into memory at once, while processing each specified day of data. This enables the processing of data using instance types that can’t accommodate a full day’s worth of data in primary memory. The only requirement is that the instance type should be capable of loading N raw JSONL or processed Parquet files into memory simultaneously, with N being the number of process workers in use.

Crawl processed data using AWS Glue

After all the raw data for multiple days has been processed, we can create an Athena table from the entire dataset by using an AWS Glue crawler. We use the AWS SDK for pandas (awswrangler) library to create the table using the following snippet:

import awswrangler as wr

### crawl processed data in S3
res = wr.s3.store_parquet_metadata(
    path='s3://bucket/processed/table-name/',
    database="database_name",
    table="table_name",
    dataset=True,
    mode="overwrite",
    sampling=1.0,
    path_suffix='.parquet',
)

### print table schema
print(res[0])

Load processed features for training

Processed features for a specified date range can now be loaded from the Athena table using SQL, and these features can then be used for training the job recommender model. For example, the following snippet loads one month of processed features into a DataFrame using the awswrangler library:

import awswrangler as wr

query = """
    SELECT * 
    FROM table_name
    WHERE day_partition BETWEEN '2020-01-01' AND '2020-02-01' 
"""

### load 1 month of data from database_name.table_name into a DataFrame
df = wr.athena.read_sql_query(query, database='database_name')

Additionally, the use of SQL for loading processed features for training can be extended to accommodate various other use cases. For instance, we can apply a similar pipeline to maintain two separate Athena tables: one for storing user impressions and another for storing user clicks on these impressions. Using SQL join statements, we can retrieve impressions that users either clicked on or didn’t click on and then pass these impressions to a model training job.
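
For example, the following sketch labels each impression with a click indicator by joining the two tables in Athena; the table and column names here are illustrative, not the actual schema used at Talent.com.

import awswrangler as wr

### join impressions with clicks to label each impression
### (table and column names are illustrative)
query = """
    SELECT i.*,
           CASE WHEN c.impression_id IS NOT NULL THEN 1 ELSE 0 END AS clicked
    FROM impressions i
    LEFT JOIN clicks c
        ON i.impression_id = c.impression_id
    WHERE i.day_partition BETWEEN '2020-01-01' AND '2020-02-01'
"""

### load the labeled impressions into a DataFrame for model training
df_train = wr.athena.read_sql_query(query, database="database_name")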

Solution benefits

Implementing the proposed solution brings several advantages to our existing workflow, including:

  • Simplified implementation – The solution enables feature extraction to be implemented in Python using popular ML libraries, and does not require the code to be ported to PySpark. This streamlines feature extraction, because the same code a Data Scientist develops in a notebook is executed by the pipeline.
  • Quick path to production – The solution can be developed and deployed by a Data Scientist to perform feature extraction at scale, enabling them to develop an ML recommender model against this data. At the same time, the same solution can be deployed to production by an ML Engineer with few modifications.
  • Reusability – The solution provides a reusable pattern for feature extraction at scale, and can be easily adapted for other use cases beyond building recommender models.
  • Efficiency – The solution offers good performance: processing a single day of Talent.com’s data took less than 1 hour.
  • Incremental updates – The solution also supports incremental updates. New daily data can be processed with a SageMaker Processing job, and the S3 location containing the processed data can be recrawled to update the Athena table. We can also use a cron job to update today’s data several times per day (for example, every 3 hours).

We used this ETL pipeline to help Talent.com process 50,000 files per day containing 5 million records, and created training data using features extracted from 90 days of raw data from Talent.com (a total of 450 million records across 900,000 files). Our pipeline helped Talent.com build and deploy the recommendation system into production within only 2 weeks. The solution performed all ML processes, including ETL, on Amazon SageMaker without using other AWS services. The job recommendation system drove an 8.6% increase in clickthrough rate in online A/B testing against a previous XGBoost-based solution, helping connect millions of Talent.com’s users to better jobs.

Conclusion

This post outlines the ETL pipeline we developed for feature processing for training and deploying a job recommender model at Talent.com. Our pipeline uses SageMaker Processing jobs for efficient data processing and feature extraction at a large scale. Feature extraction code is implemented in Python, enabling the use of popular ML libraries to perform feature extraction at scale, without the need to port the code to PySpark.

We encourage readers to explore using the pipeline presented in this post as a template for their own use cases where feature extraction at scale is required. The pipeline can be used by a Data Scientist to build an ML model, and the same pipeline can then be adopted by an ML Engineer to run in production. This can significantly reduce the time needed to productize the ML solution end-to-end, as was the case with Talent.com. Readers can refer to the tutorial for setting up and running SageMaker Processing jobs. We also refer readers to the post From text to dream job: Building an NLP-based job recommender at Talent.com with Amazon SageMaker, where we discuss deep learning model training techniques utilizing Amazon SageMaker to build Talent.com’s job recommendation system.


About the authors

Dmitriy Bespalov is a Senior Applied Scientist at the Amazon Machine Learning Solutions Lab, where he helps AWS customers across different industries accelerate their AI and cloud adoption.

Yi Xiang is an Applied Scientist II at the Amazon Machine Learning Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.

Tong Wang is a Senior Applied Scientist at the Amazon Machine Learning Solutions Lab, where he helps AWS customers across different industries accelerate their AI and cloud adoption.

Anatoly Khomenko is a Senior Machine Learning Engineer at Talent.com with a passion for natural language processing and matching good people to good jobs.

Abdenour Bezzouh is an executive with more than 25 years of experience building and delivering technology solutions that scale to millions of customers. Abdenour held the position of Chief Technology Officer (CTO) at Talent.com when the AWS team designed and executed this particular solution for Talent.com.

Yanjun Qi is a Senior Applied Science Manager at the Amazon Machine Learning Solutions Lab. She innovates and applies machine learning to help AWS customers speed up their AI and cloud adoption.

Read More

FunSearch: Making new discoveries in mathematical sciences using Large Language Models

FunSearch: Making new discoveries in mathematical sciences using Large Language Models

In a paper published in Nature, we introduce FunSearch, a method for searching for “functions” written in computer code, and find new solutions in mathematics and computer science. FunSearch works by pairing a pre-trained LLM, whose goal is to provide creative solutions in the form of computer code, with an automated “evaluator”, which guards against hallucinations and incorrect ideas.

Read More

‘Forza Horizon’ Races Over to GeForce NOW

‘Forza Horizon’ Races Over to GeForce NOW

This GFN Thursday is burning rubber with the latest Forza Horizon games from Microsoft Studios. Check them out on PC Game Pass.

Plus, give the gift of cloud gaming with the latest membership bundle, which includes a free, three-month PC Game Pass subscription with the purchase of a six-month GeForce NOW Ultimate membership.

It’s all part of an exciting week, with 13 new games joining the GeForce NOW library.

Zoom, Zoom

Jump into the driver’s seat in Forza Horizon 4 and Forza Horizon 5 from Playground Games and Microsoft Studios. Explore the critically acclaimed open-world racing games, featuring dynamic weather and seasons that can make or break even the most seasoned drivers.

Forza Horizon 4 on GeForce NOW
For-za cloud.

Race across beautiful, historical Great Britain in Forza Horizon 4. Ride solo or team up online with players from around the globe in a shared, open world. Collect, modify and drive over 450 cars from the Horizon car roster — plus, race, stunt, create and explore to become a Horizon Superstar.

Forza Horizon 5 on GeForce NOW
The ultimate “Horizon” adventure plays best on the ultimate cloud gaming service.

Clutch in, shift gears and head over to the vibrant open world of Mexico in Forza Horizon 5. Jump-start the week with limitless driving action in hundreds of the world’s greatest cars. Join a campaign with hundreds of challenges across varied terrains and climates, or head online for multiplayer action. Members can enjoy both titles on Steam, and Forza Horizon 5 on PC Game Pass. Visit this Knowledgebase article for further details.

Stream every turn at GeForce quality on nearly any device and max out image resolution thanks to the cloud. Ultimate members can get in gear at up to 4K resolution and 120 frames per second for the most realistic driving experience.

The Ultimate Adventure

Minecraft Dungeons on GeForce NOW
What a blockhead.

Minecraft Dungeons from Mojang Studios and Xbox Game Studios is an immensely popular title that’s amassed over 25 million players and brings the thrill of classic dungeon crawlers to a whole new level.

Brave the dungeons alone or team up with a squad. Up to four players can battle together online or in couch co-op, making it a great game for group gatherings. Fight through action-packed, treasure-stuffed, wildly varied levels — all part of an epic quest to save the villagers and take down the evil Arch-Illager, preventing his army from controlling the Overworld.

Stream it on an Ultimate and Priority account for longer gaming sessions and faster access to GeForce RTX-powered servers. Venture forth across devices and play it on the big screen with NVIDIA SHIELD TV or on Samsung and LG smart TVs for the ultimate couch co-op experience.

Games, Games, Games

Pioneers of Pagonia on GeForce NOW
Be a pioneer of the cloud.

Time for some new games. Explore, discover and reunite the fantastical islands of Pagonia in Pioneers of Pagonia from Envision Entertainment. Build over 40 types of buildings, use more than 70 types of goods, manage widely branched production chains and get creative to establish a thriving economy.

Don’t miss the 13 newly supported games joining the GeForce NOW library this week:

  • Stellaris Nexus (New release on Steam, Dec. 12)
  • Tin Hearts (New release on Xbox, available PC Game Pass, Dec. 12)
  • Pioneers of Pagonia (New release on Steam, Dec. 13)
  • House Flipper 2 (New release on Steam, Dec. 14)
  • Soulslinger: Envoy of Death (New release on Steam, Dec. 14)
  • Escape the Backrooms (Steam)
  • Flashback 2 (Steam)
  • Forza Horizon 4 (Steam)
  • Forza Horizon 5 (Steam, Xbox, and available on PC Game Pass)
  • The Front (Steam)
  • Minecraft Dungeons (Steam, Xbox and available on PC Game Pass)
  • Primal Carnage: Extinction (Steam)
  • Universe Sandbox (Steam)

What are you planning to play this weekend? Let us know on Twitter or in the comments below.

Read More

Understanding GPU Memory 1: Visualizing All Allocations over Time

Understanding GPU Memory 1: Visualizing All Allocations over Time

During your time with PyTorch on GPUs, you may be familiar with this common error message:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU 0 has a total capacity of 79.32 GiB of which 401.56 MiB is free.

In this series, we show how to use memory tooling, including the Memory Snapshot, the Memory Profiler, and the Reference Cycle Detector to debug out of memory errors and improve memory usage.

Memory Timeline

The Memory Snapshot tool provides a fine-grained GPU memory visualization for debugging GPU OOMs. Captured memory snapshots will show memory events including allocations, frees and OOMs, along with their stack traces.

In a snapshot, each tensor’s memory allocation is color coded separately. The x axis is over time, and the y axis is the amount of GPU memory in MB. The snapshot is interactive, so we can observe the stack trace for any allocation by mousing over.

In this snapshot, there are 3 peaks showing the memory allocations over 3 training iterations. When looking at the peaks, it is easy to see the rise of memory in the forward pass and the fall during the backward pass as the gradients are computed. It is also possible to see that the program has the same pattern of memory use from iteration to iteration. One thing that stands out is the many tiny spikes in memory; by mousing over them, we see that they are buffers used temporarily by convolution operators.

Capturing Memory Snapshots

The API to capture memory snapshots is fairly simple and available in torch.cuda.memory:

  • Start: torch.cuda.memory._record_memory_history(max_entries=100000)
  • Save: torch.cuda.memory._dump_snapshot(file_name)
  • Stop: torch.cuda.memory._record_memory_history(enabled=None)

Code Snippet (for full code sample, see Appendix A):

   # Start recording memory snapshot history, initialized with a buffer
   # capacity of 100,000 memory events, via the `max_entries` field.
   torch.cuda.memory._record_memory_history(
       max_entries=MAX_NUM_OF_MEM_EVENTS_PER_SNAPSHOT
   )

   # Run your PyTorch Model.
   # At any point in time, save a snapshot to file for later.
   for _ in range(5):
       pred = model(inputs)
       loss_fn(pred, labels).backward()
       optimizer.step()
       optimizer.zero_grad(set_to_none=True)

   # In this sample, we save the snapshot after running 5 iterations.
   #   - Save as many snapshots as you'd like.
   #   - Snapshots will save last `max_entries` number of memory events
   #     (100,000 in this example).
   try:
       torch.cuda.memory._dump_snapshot(f"{file_prefix}.pickle")
   except Exception as e:
       logger.error(f"Failed to capture memory snapshot {e}")

   # Stop recording memory snapshot history.
   torch.cuda.memory._record_memory_history(enabled=None)

To visualize the snapshot file, we have a tool hosted at https://pytorch.org/memory_viz. There, you can drag and drop your saved snapshot file and it will plot each allocation over time.

Memory Timeline

Alternatively, you can generate an HTML report from a .pickle file by using the script at pytorch/torch/cuda/_memory_viz.py. Here is an example:

python torch/cuda/_memory_viz.py trace_plot snapshot.pickle -o snapshot.html

Debugging CUDA OOMs

Let’s look at how we can use the memory snapshot tool to answer:

  1. Why did a CUDA OOM happen?
  2. Where is the GPU Memory being used?

ResNet50 with a bug

We’ve taken a look at a properly working model in the first snapshot. Now, let’s take a look at a training example with a bug; see the following snapshot:

Memory Timeline

Notice how the second iteration uses far more memory than the first iteration. If this model were much larger, it could have hit a CUDA OOM in the second iteration without providing much insight into why.

Memory Timeline

When examining this snapshot further, we can clearly see that several tensors are staying alive from the first iteration to the second and later iterations. If we mouse over one of these tensors, it would show a stack trace suggesting that these were gradient tensors.

And indeed, if we go to the code, we can see that it doesn’t clear the gradient tensors, when it could have cleared them before the next forward pass.

Before:

        for _ in range(num_iters):
          pred = model(inputs)
          loss_fn(pred, labels).backward()
          optimizer.step()

After:

        for _ in range(num_iters):
          pred = model(inputs)
          loss_fn(pred, labels).backward()
          optimizer.step()
          # Add this line to clear grad tensors
          optimizer.zero_grad(set_to_none=True)

We can simply add an optimizer.zero_grad(set_to_none=True) instruction to clear the gradient tensors from iteration to iteration (more details about why we need to zero the gradients here: https://pytorch.org/tutorials/recipes/recipes/zeroing_out_gradients.html).

This is a simplification of a bug we’ve found in more complicated programs using this tool. We encourage you to try out the Memory Snapshot on your GPU memory problems and let us know how it goes.

ResNet50 after bug fix

After applying the fix, the snapshot shows that the gradients are now being cleared from iteration to iteration.

Memory Timeline

We now have the snapshot of a properly working ResNet50 model. Try out the code yourself (see code sample in Appendix A).

But you may be wondering, why is there still an increase in memory after the first iteration? To answer this, let’s visit the Memory Profiler in the next section.

Categorized Memory Usage

The Memory Profiler is an added feature of the PyTorch Profiler that categorizes memory usage over time. We still rely on the Memory Snapshot for stack traces for deep dives into memory allocations.

To generate a memory timeline, here is a code snippet (full code sample in Appendix B):

   # Initialize the profiler context with record_shapes, profile_memory,
   # and with_stack set to True.
   with torch.profiler.profile(
       activities=[
           torch.profiler.ProfilerActivity.CPU,
           torch.profiler.ProfilerActivity.CUDA,
       ],
       schedule=torch.profiler.schedule(wait=0, warmup=0, active=6, repeat=1),
       record_shapes=True,
       profile_memory=True,
       with_stack=True,
       on_trace_ready=trace_handler,
   ) as prof:
       # Run the PyTorch Model inside the profile context.
       for _ in range(5):
           prof.step()
           with record_function("## forward ##"):
               pred = model(inputs)

           with record_function("## backward ##"):
               loss_fn(pred, labels).backward()

           with record_function("## optimizer ##"):
               optimizer.step()
               optimizer.zero_grad(set_to_none=True)

   # Construct the memory timeline HTML plot.
   prof.export_memory_timeline(f"{file_prefix}.html", device="cuda:0")

For further reference, see https://pytorch.org/docs/main/profiler.html.

The Memory Profiler automatically generates categories based on the graph of tensor operations recorded during profiling.

Memory Timeline

In this Memory Timeline collected using the Memory Profiler, we have the same training example as before. We can observe the gradients in blue are now being cleared from iteration to iteration. We can also notice that the optimizer state in yellow is allocated after the first iteration, and is kept constant for the rest of the job.

This optimizer state is the reason behind the increase of GPU memory from the first iteration to the second. Try out the code yourself (see code sample in Appendix B). The Memory Profiler helps to improve training memory understanding so that model authors can figure out which categories are using the most GPU memory.

Where can I find these tools?

We hope that these tools will greatly improve your ability to debug CUDA OOMs and to understand your memory usage by category.

The Memory Snapshot and the Memory Profiler are available in the v2.1 release of PyTorch as experimental features.

Feedback

We look forward to hearing from you about any enhancements, bugs, or memory stories that our tools helped to solve! As always, please feel free to open new issues on PyTorch’s GitHub page.

We are also open to contributions from the OSS community; feel free to tag Aaron Shi and Zachary DeVito in any GitHub PRs for reviews.

Acknowledgements

Really appreciate the content reviewers, Mark Saroufim, Gregory Chanan, and Adnan Aziz for reviewing this post and improving its readability.

Appendix

Appendix A – ResNet50 Memory Snapshot Code Example

# (c) Meta Platforms, Inc. and affiliates. 
import logging
import socket
from datetime import datetime, timedelta

import torch

from torchvision import models

logging.basicConfig(
   format="%(levelname)s:%(asctime)s %(message)s",
   level=logging.INFO,
   datefmt="%Y-%m-%d %H:%M:%S",
)
logger: logging.Logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)

TIME_FORMAT_STR: str = "%b_%d_%H_%M_%S"

# Keep a max of 100,000 alloc/free events in the recorded history
# leading up to the snapshot.
MAX_NUM_OF_MEM_EVENTS_PER_SNAPSHOT: int = 100000

def start_record_memory_history() -> None:
   if not torch.cuda.is_available():
       logger.info("CUDA unavailable. Not recording memory history")
       return

   logger.info("Starting snapshot record_memory_history")
   torch.cuda.memory._record_memory_history(
       max_entries=MAX_NUM_OF_MEM_EVENTS_PER_SNAPSHOT
   )

def stop_record_memory_history() -> None:
   if not torch.cuda.is_available():
       logger.info("CUDA unavailable. Not recording memory history")
       return

   logger.info("Stopping snapshot record_memory_history")
   torch.cuda.memory._record_memory_history(enabled=None)

def export_memory_snapshot() -> None:
   if not torch.cuda.is_available():
       logger.info("CUDA unavailable. Not exporting memory snapshot")
       return

   # Prefix for file names.
   host_name = socket.gethostname()
   timestamp = datetime.now().strftime(TIME_FORMAT_STR)
   file_prefix = f"{host_name}_{timestamp}"

   try:
       logger.info(f"Saving snapshot to local file: {file_prefix}.pickle")
       torch.cuda.memory._dump_snapshot(f"{file_prefix}.pickle")
   except Exception as e:
       logger.error(f"Failed to capture memory snapshot {e}")
       return

# Simple Resnet50 example to demonstrate how to capture memory visuals.
def run_resnet50(num_iters=5, device="cuda:0"):
   model = models.resnet50().to(device=device)
   inputs = torch.randn(1, 3, 224, 224, device=device)
   labels = torch.rand_like(model(inputs))
   optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
   loss_fn = torch.nn.CrossEntropyLoss()

   # Start recording memory snapshot history
   start_record_memory_history()

   for _ in range(num_iters):
       pred = model(inputs)
       loss_fn(pred, labels).backward()
       optimizer.step()
       optimizer.zero_grad(set_to_none=True)

   # Create the memory snapshot file
   export_memory_snapshot()

   # Stop recording memory snapshot history
   stop_record_memory_history()

if __name__ == "__main__":
    # Run the resnet50 model
    run_resnet50()

Appendix B – ResNet50 Memory Profiler Code Example

# (c) Meta Platforms, Inc. and affiliates. 
import logging
import socket
from datetime import datetime, timedelta

import torch

from torch.autograd.profiler import record_function
from torchvision import models

logging.basicConfig(
   format="%(levelname)s:%(asctime)s %(message)s",
   level=logging.INFO,
   datefmt="%Y-%m-%d %H:%M:%S",
)
logger: logging.Logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)

TIME_FORMAT_STR: str = "%b_%d_%H_%M_%S"

def trace_handler(prof: torch.profiler.profile):
   # Prefix for file names.
   host_name = socket.gethostname()
   timestamp = datetime.now().strftime(TIME_FORMAT_STR)
   file_prefix = f"{host_name}_{timestamp}"

   # Construct the trace file.
   prof.export_chrome_trace(f"{file_prefix}.json.gz")

   # Construct the memory timeline file.
   prof.export_memory_timeline(f"{file_prefix}.html", device="cuda:0")

def run_resnet50(num_iters=5, device="cuda:0"):
   model = models.resnet50().to(device=device)
   inputs = torch.randn(1, 3, 224, 224, device=device)
   labels = torch.rand_like(model(inputs))
   optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
   loss_fn = torch.nn.CrossEntropyLoss()

   with torch.profiler.profile(
       activities=[
           torch.profiler.ProfilerActivity.CPU,
           torch.profiler.ProfilerActivity.CUDA,
       ],
       schedule=torch.profiler.schedule(wait=0, warmup=0, active=6, repeat=1),
       record_shapes=True,
       profile_memory=True,
       with_stack=True,
       on_trace_ready=trace_handler,
   ) as prof:
       for _ in range(num_iters):
           prof.step()
           with record_function("## forward ##"):
               pred = model(inputs)

           with record_function("## backward ##"):
               loss_fn(pred, labels).backward()

           with record_function("## optimizer ##"):
               optimizer.step()
               optimizer.zero_grad(set_to_none=True)

if __name__ == "__main__":
    # Warm up
    run_resnet50()
    # Run the resnet50 model
    run_resnet50()

Read More

Superalignment Fast Grants

We’re launching $10M in grants to support technical research towards the alignment and safety of superhuman AI systems, including weak-to-strong generalization, interpretability, scalable oversight, and more.

OpenAI Blog