Assessing the absolute utility of query results, rather than just their relative utility, improves learning-to-rank models.
Scale training and inference of thousands of ML models with Amazon SageMaker
As machine learning (ML) becomes increasingly prevalent in a wide range of industries, organizations are finding the need to train and serve large numbers of ML models to meet the diverse needs of their customers. For software as a service (SaaS) providers in particular, the ability to train and serve thousands of models efficiently and cost-effectively is crucial for staying competitive in a rapidly evolving market.
Training and serving thousands of models requires a robust and scalable infrastructure, which is where Amazon SageMaker can help. SageMaker is a fully managed platform that enables developers and data scientists to build, train, and deploy ML models quickly, while also offering the cost-saving benefits of using the AWS Cloud infrastructure.
In this post, we explore how you can use SageMaker features, including Amazon SageMaker Processing, SageMaker training jobs, and SageMaker multi-model endpoints (MMEs), to train and serve thousands of models in a cost-effective way. To get started with the described solution, you can refer to the accompanying notebook on GitHub.
Use case: Energy forecasting
For this post, we assume the role of an independent software vendor (ISV) that helps its customers become more sustainable by tracking their energy consumption and providing forecasts. Our company has 1,000 customers who want to better understand their energy usage and make informed decisions about how to reduce their environmental impact. To do this, we use a synthetic dataset and train an ML model based on Prophet for each customer to make energy consumption forecasts. With SageMaker, we can efficiently train and serve these 1,000 models, providing our customers with accurate and actionable insights into their energy usage.
There are three features in the generated dataset:
- customer_id – This is an integer identifier for each customer, ranging from 0 to 999.
- timestamp – This is a date/time value that indicates the time at which the energy consumption was measured. The timestamps are randomly generated between the start and end dates specified in the code.
- consumption – This is a float value that indicates the energy consumption, measured in an arbitrary unit. The consumption values are randomly generated between 0 and 1,000 with sinusoidal seasonality.
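To make the dataset description concrete, the following is a minimal sketch of how such rows could be generated. It is not the code from the accompanying notebook: the function name generate_rows, the date range, the yearly period, and the 70/30 mix of seasonality and noise are all assumptions chosen for illustration.

```python
import math
import random
from datetime import datetime, timedelta


def generate_rows(n_customers=3, rows_per_customer=5,
                  start=datetime(2023, 1, 1), end=datetime(2023, 12, 31),
                  seed=0):
    """Return synthetic (customer_id, timestamp, consumption) rows.

    Hypothetical generator: the period and weights below are assumptions.
    """
    rng = random.Random(seed)
    span_seconds = int((end - start).total_seconds())
    rows = []
    for customer_id in range(n_customers):
        for _ in range(rows_per_customer):
            # Random timestamp between the start and end dates.
            ts = start + timedelta(seconds=rng.randrange(span_seconds))
            # Yearly sinusoid scaled to the range [0, 1].
            phase = 2 * math.pi * ts.timetuple().tm_yday / 365.0
            seasonal = 0.5 * (1 + math.sin(phase))
            # Uniform noise in [0, 1).
            noise = rng.random()
            # The weighted mix stays within 0 and 1,000.
            consumption = 1000.0 * (0.7 * seasonal + 0.3 * noise)
            rows.append((customer_id, ts.isoformat(), round(consumption, 2)))
    return rows
```

Calling generate_rows(n_customers=1000, rows_per_customer=...) would produce one block of rows per customer, which can then be written to a single CSV file as input for the preprocessing step.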
Solution overview
To efficiently train and serve thousands of ML models, we can use the following SageMaker features:
- SageMaker Processing – SageMaker Processing is a fully managed data preparation service that enables you to perform data processing and model evaluation tasks on your input data. You can use SageMaker Processing to transform raw data into the format needed for training and inference, as well as to run batch and online evaluations of your models.
- SageMaker training jobs – You can use SageMaker training jobs to train models on a variety of algorithms and input data types, and specify the compute resources needed for training.
- SageMaker MMEs – Multi-model endpoints enable you to host multiple models on a single endpoint, which makes it easy to serve predictions from multiple models using a single API. SageMaker MMEs can save time and resources by reducing the number of endpoints needed to serve predictions from multiple models. MMEs support hosting of both CPU- and GPU-backed models. Note that in our scenario, we use 1,000 models, but this is not a limitation of the service itself.
The following diagram illustrates the solution architecture.
The workflow includes the following steps:
- We use SageMaker Processing to preprocess data and create a single CSV file per customer and store it in Amazon Simple Storage Service (Amazon S3).
- The SageMaker training job is configured to read the output of the SageMaker Processing job and distribute it in a round-robin fashion to the training instances. Note that this can also be achieved with Amazon SageMaker Pipelines.
- The model artifacts are stored in Amazon S3 by the training job, and are served directly from the SageMaker MME.
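The preprocessing step in the workflow above can be sketched as follows. This is an illustrative script, not the one from the notebook: the function name split_by_customer and the customer_<id>/data.csv layout are assumptions. In a SageMaker Processing job, the input and output directories would typically be local container paths that the job maps to Amazon S3 locations.

```python
import csv
import os
from collections import defaultdict


def split_by_customer(input_csv, output_dir):
    """Write one CSV per customer_id and return the paths written."""
    rows_by_customer = defaultdict(list)
    with open(input_csv, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            rows_by_customer[row["customer_id"]].append(row)

    paths = []
    for customer_id, rows in sorted(rows_by_customer.items()):
        # Hypothetical layout: one folder per customer holding data.csv.
        customer_dir = os.path.join(output_dir, f"customer_{customer_id}")
        os.makedirs(customer_dir, exist_ok=True)
        path = os.path.join(customer_dir, "data.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths
```

Keeping one object per customer is what later allows the training data to be sharded by S3 key, because each key then corresponds to exactly one model.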
Scale training to thousands of models
Scaling the training of thousands of models is possible via the distribution parameter of the TrainingInput class in the SageMaker Python SDK, which allows you to specify how data is distributed across multiple training instances for a training job. There are three options for the distribution parameter: FullyReplicated, ShardedByS3Key, and ShardedByRecord. The ShardedByS3Key option means that the training data is sharded by S3 object key, with each training instance receiving a unique subset of the data, avoiding duplication. After the data is copied by SageMaker to the training containers, we can read the folder and files structure to train a unique model per customer file. The following is an example code snippet:
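As a rough sketch of this idea (not the notebook's code), the example below shows two pieces. The first builds a channel definition in the shape used by the low-level CreateTrainingJob API, where S3DataDistributionType carries the same setting that distribution="ShardedByS3Key" expresses on TrainingInput. The second walks the files an instance received and fits one model per customer. The helper names and the mean-value baseline are assumptions; the baseline stands in for a real forecaster such as Prophet so that the example stays dependency-free.

```python
import csv
import os


def build_training_channel(s3_uri, channel_name="training"):
    """Build one InputDataConfig entry for the CreateTrainingJob API."""
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                # Each instance receives a distinct subset of the S3 objects.
                "S3DataDistributionType": "ShardedByS3Key",
            }
        },
    }


def fit_mean_baseline(csv_path):
    """Stand-in for a real forecaster: the mean of the consumption column."""
    with open(csv_path, newline="") as f:
        values = [float(row["consumption"]) for row in csv.DictReader(f)]
    if not values:
        raise ValueError(f"no rows in {csv_path}")
    return {"mean_consumption": sum(values) / len(values)}


def train_all(data_dir):
    """Fit one model per customer CSV found under data_dir.

    Assumes the hypothetical layout <data_dir>/<customer>/data.csv.
    """
    models = {}
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            if name.endswith(".csv"):
                customer = os.path.basename(root)
                models[customer] = fit_mean_baseline(os.path.join(root, name))
    return models
```

Inside a training container, data_dir would be the channel folder (for a channel named training, /opt/ml/input/data/training), so each instance only iterates over the customer files that were assigned to it.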
Every SageMaker training job archives the model saved in the /opt/ml/model folder of the training container into a model.tar.gz file, and then uploads it to Amazon S3 upon training job completion. Power users can also automate this process with SageMaker Pipelines. When storing multiple models via the same training job, SageMaker creates a single model.tar.gz file containing all the trained models. This would then mean that, in order to serve an individual model, we would need to unpack the archive first. To avoid this, we use checkpoints to save the state of individual models. SageMaker provides the functionality to copy checkpoints created during the training job to Amazon S3. Here, the checkpoints need to be saved in a pre-specified location, with the default being /opt/ml/checkpoints. These checkpoints can be used to resume training at a later moment or as a model to deploy on an endpoint. For a high-level summary of how the SageMaker training platform manages storage paths for training datasets, model artifacts, checkpoints, and outputs between AWS Cloud storage and training jobs in SageMaker, refer to Amazon SageMaker Training Storage Folders for Training Datasets, Checkpoints, Model Artifacts, and Outputs.
The following code uses a fictitious model.save() function inside the train.py script containing the training logic:
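The checkpoint approach can be sketched as follows. This is a simplified illustration rather than the original train.py: the function name save_customer_model is hypothetical, and the model is serialized as JSON instead of through a model.save() call. Each customer gets its own archive in the checkpoint folder, so every archive can later be served individually.

```python
import json
import os
import tarfile


def save_customer_model(customer, model, checkpoint_dir="/opt/ml/checkpoints"):
    """Write the model as JSON and wrap it in its own <customer>.tar.gz."""
    os.makedirs(checkpoint_dir, exist_ok=True)

    # Serialize the (hypothetical, JSON-compatible) model state.
    json_path = os.path.join(checkpoint_dir, f"{customer}.json")
    with open(json_path, "w") as f:
        json.dump(model, f)

    # One archive per customer, with the model stored as model.json inside.
    archive_path = os.path.join(checkpoint_dir, f"{customer}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(json_path, arcname="model.json")

    # Keep only the archive in the checkpoint folder.
    os.remove(json_path)
    return archive_path
```

When a checkpoint S3 location is configured for the training job, the files written to the checkpoint folder are copied there, which gives one archive per customer under a single S3 prefix.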
Scale inference to thousands of models with SageMaker MMEs
SageMaker MMEs allow you to serve multiple models at the same time by creating an endpoint configuration that includes a list of all the models to serve, and then creating an endpoint using that endpoint configuration. There is no need to re-deploy the endpoint every time you add a new model because the endpoint will automatically serve all models stored in the specified S3 paths. This is achieved with Multi Model Server (MMS), an open-source framework for serving ML models that can be installed in containers to provide the front end that fulfills the requirements for the new MME container APIs. In addition, you can use other model servers including TorchServe and Triton. MMS can be installed in your custom container via the SageMaker Inference Toolkit. To learn more about how to configure your Dockerfile to include MMS and use it to serve your models, refer to Build Your Own Container for SageMaker Multi-Model Endpoints.
The following code snippet shows how to create an MME using the SageMaker Python SDK:
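As an illustration, the sketch below creates an MME with the low-level SageMaker API through Boto3 rather than the SageMaker Python SDK (the SDK offers a MultiDataModel class for the same purpose). The function name, the AllTraffic variant name, and the default instance type are assumptions; the client is passed in so the calls are easy to inspect. The key detail is that the container runs in MultiModel mode and ModelDataUrl points to an S3 prefix instead of a single archive.

```python
def create_multi_model_endpoint(sm_client, name, image_uri, model_data_prefix,
                                role_arn, instance_type="ml.m5.xlarge",
                                instance_count=1):
    """Create a model, endpoint configuration, and endpoint for an MME.

    sm_client is expected to behave like boto3.client("sagemaker").
    """
    sm_client.create_model(
        ModelName=name,
        ExecutionRoleArn=role_arn,
        PrimaryContainer={
            "Image": image_uri,
            # S3 prefix that holds one archive per model.
            "ModelDataUrl": model_data_prefix,
            "Mode": "MultiModel",
        },
    )
    sm_client.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    )
    sm_client.create_endpoint(EndpointName=name, EndpointConfigName=name)
    return name
```

Because the endpoint only references the prefix, adding a model later is a matter of uploading another archive under that prefix; no endpoint update is needed.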
When the MME is live, we can invoke it to generate predictions. Invocations can be done in any AWS SDK as well as with the SageMaker Python SDK, as shown in the following code snippet:
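A minimal invocation sketch using the Boto3 SageMaker runtime client might look like the following. The function name, the customer_<id>.tar.gz naming scheme, and the JSON payload format are assumptions that depend on how the models were stored and how the container parses requests. TargetModel is the parameter that selects which archive under the endpoint's S3 prefix handles the request.

```python
import json


def predict_for_customer(runtime_client, endpoint_name, customer_id, payload):
    """Invoke one customer's model on a multi-model endpoint.

    runtime_client is expected to behave like
    boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        # Path of the model archive relative to the endpoint's S3 prefix.
        TargetModel=f"customer_{customer_id}.tar.gz",
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it and parse the JSON result.
    return json.loads(response["Body"].read())
```

For example, predict_for_customer(client, "energy-mme", 42, {"periods": 7}) would route the request to the archive named customer_42.tar.gz, assuming that file exists under the configured prefix.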
The first time a model is invoked, it is loaded from Amazon S3 onto the instance, which can result in a cold start. Frequently used models are cached in memory and on disk to provide low-latency inference.
Conclusion
SageMaker is a powerful and cost-effective platform for training and serving thousands of ML models. Its features, including SageMaker Processing, training jobs, and MMEs, enable organizations to efficiently train and serve thousands of models at scale, while also benefiting from the cost-saving advantages of using the AWS Cloud infrastructure. To learn more about how to use SageMaker for training and serving thousands of models, refer to Process data, Train a Model with Amazon SageMaker, and Host multiple models in one container behind one endpoint.
About the Authors
Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.
Maurits de Groot is a Solutions Architect at Amazon Web Services, based out of Amsterdam. He likes to work on machine learning-related topics and has a predilection for startups. In his spare time, he enjoys skiing and playing squash.
Accelerate business outcomes with 70% performance improvements to data processing, training, and inference with Amazon SageMaker Canvas
Amazon SageMaker Canvas is a visual interface that enables business analysts to generate accurate machine learning (ML) predictions on their own, without requiring any ML experience or having to write a single line of code. SageMaker Canvas’s intuitive user interface lets business analysts browse and access disparate data sources in the cloud or on premises, prepare and explore the data, build and train ML models, and generate accurate predictions within a single workspace.
SageMaker Canvas allows analysts to use different data workloads to achieve the desired business outcomes with high accuracy and performance. The compute, storage, and memory requirements to generate accurate predictions are abstracted from the end-user, enabling them to focus on the business problem to be solved. Earlier this year, we announced performance optimizations based on customer feedback to deliver faster, more accurate model training with SageMaker Canvas.
In this post, we show how SageMaker Canvas can now process data, train models, and generate predictions with increased speed and efficiency for different dataset sizes.
Prerequisites
If you would like to follow along, complete the following prerequisites:
- Have an AWS account.
- Set up SageMaker Canvas. For instructions, refer to Prerequisites for setting up Amazon SageMaker Canvas.
- Download the following two datasets to your local computer. The first is the NYC Yellow Taxi Trip dataset; the second is the eCommerce behavior data about retail events related to products and users.
Both datasets come under the Attribution 4.0 International (CC BY 4.0) license and are free to share and adapt.
Data processing improvements
With underlying performance optimizations, the time to import data into SageMaker Canvas has improved by over 70%. You can now import datasets of up to 2 GB in approximately 50 seconds and up to 5 GB in approximately 65 seconds.
After importing data, business analysts typically validate the data to ensure there are no issues within the dataset. Example validation checks include ensuring that columns contain the correct data type, confirming that value ranges are in line with expectations, and making sure values are unique where applicable.
Data validation is now faster. In our tests, all validations took 50 seconds for the taxi dataset exceeding 5 GB in size, a 10-times improvement in speed.
Model training improvements
The performance optimizations related to ML model training in SageMaker Canvas now enable you to train models without running into out-of-memory failures.
The following screenshot shows the results of a successful build run using a large dataset, including the impact of the total_amount feature on the target variable.
Inference improvements
Finally, SageMaker Canvas inference improvements achieved a 3.5-times reduction in memory consumption for larger datasets in our internal testing.
Conclusion
In this post, we saw various improvements with SageMaker Canvas in importing, validation, training, and inference. The time to import large datasets improved by over 70%. We saw a 10-times improvement in data validation speed and a 3.5-times reduction in memory consumption during inference. These improvements allow you to better work with large datasets and reduce time when building ML models with SageMaker Canvas.
We encourage you to experience the improvements yourself. We welcome your feedback as we continuously work on performance optimizations to improve the user experience.
About the authors
Peter Chung is a Solutions Architect for AWS, and is passionate about helping customers uncover insights from their data. He has been building solutions to help organizations make data-driven decisions in both the public and private sectors. He holds all AWS certifications as well as two GCP certifications. He enjoys coffee, cooking, staying active, and spending time with his family.
Tim Song is a Software Development Engineer at AWS SageMaker. With 10+ years of experience as a software developer, consultant, and tech leader, he has a demonstrated ability to deliver scalable and reliable products and solve complex problems. In his spare time, he enjoys nature, outdoor running, and hiking.
Hariharan Suresh is a Senior Solutions Architect at AWS. He is passionate about databases, machine learning, and designing innovative solutions. Prior to joining AWS, Hariharan was a product architect, core banking implementation specialist, and developer, and worked with BFSI organizations for over 11 years. Outside of technology, he enjoys paragliding and cycling.
Maia Haile is a Solutions Architect at Amazon Web Services based in the Washington, D.C. area. In that role, she helps public sector customers achieve their mission objectives with well-architected solutions on AWS. She has 5 years of experience spanning nonprofit healthcare, media and entertainment, and retail. Her passion is leveraging artificial intelligence (AI) and machine learning (ML) to help public sector customers achieve their business and technical goals.
Build and train computer vision models to detect car positions in images using Amazon SageMaker and Amazon Rekognition
Computer vision (CV) is one of the most common applications of machine learning (ML) and deep learning. Use cases include self-driving cars, content moderation on social media platforms, cancer detection, and automated defect detection. Amazon Rekognition is a fully managed service that can perform CV tasks like object detection, video segment detection, content moderation, and more to extract insights from data without the need of any prior ML experience. In some cases, a more custom solution might be needed along with the service to solve a very specific problem.
In this post, we address use cases where the pose of objects, that is, their position and orientation, is important. One such use case would be customer-facing mobile applications where an image upload is required. It might be for compliance reasons or to provide a consistent user experience and improve engagement. For example, on online shopping platforms, the angle at which products are shown in images has an effect on how often the product is purchased. One such case is to detect the position of a car. We demonstrate how you can combine well-known ML solutions with postprocessing to address this problem on the AWS Cloud.
We use deep learning models to solve this problem. Training ML algorithms for pose estimation requires a lot of expertise and custom training data. Both requirements are hard and costly to obtain. Therefore, we present two options: one that doesn’t require any ML expertise and uses Amazon Rekognition, and another that uses Amazon SageMaker to train and deploy a custom ML model. In the first option, we use Amazon Rekognition to detect the wheels of the car. We then infer the car orientation from the wheel positions using a rule-based system. In the second option, we detect the wheels and other car parts using the Detectron model. These are again used to infer the car position with rule-based code. The second option requires ML experience but is also more customizable. It can be used for further postprocessing on the image, for example, to crop out the whole car. Both of the options can be trained on publicly available datasets. Finally, we show how you can integrate this car pose detection solution into your existing web application using services like Amazon API Gateway and AWS Amplify.
Solution overview
The following diagram illustrates the solution architecture.
The solution consists of a mock web application in Amplify where a user can upload an image and invoke either the Amazon Rekognition model or the custom Detectron model to detect the position of the car. For each option, we host an AWS Lambda function behind an API Gateway that is exposed to our mock application. We configured our Lambda function to run with either the Detectron model trained in SageMaker or Amazon Rekognition.
Prerequisites
For this walkthrough, you should have the following prerequisites:
- An AWS account.
- An AWS Identity and Access Management (IAM) user with the permissions to deploy and provision the infrastructure, for example, PowerUserAccess (note that permissions would need to be restricted further for a production-ready application and depend on possible integrations with other services).
- Docker in your development environment (local machine or a SageMaker notebook instance where you are deploying the solution from).
- The AWS Cloud Development Kit (AWS CDK) installed. It can be installed using npm as explained in our GitHub repository.
Create a serverless app using Amazon Rekognition
Our first option demonstrates how you can detect car orientations in images using Amazon Rekognition. The idea is to use Amazon Rekognition to detect the location of the car and its wheels and then do postprocessing to derive the orientation of the car from this information. The whole solution is deployed using Lambda as shown in the GitHub repository. This folder contains two main files: a Dockerfile that defines the Docker image that will run in our Lambda function, and the app.py file, which will be the main entry point of the Lambda function:
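A minimal sketch of such an entry point is shown below. It is not the repository's app.py: the optional rekognition argument (added so the handler can be exercised with a stub), the response shape, and the choice to return only wheel bounding boxes are assumptions. The detect_labels call and the Labels, Name, Instances, and BoundingBox fields follow the Amazon Rekognition API.

```python
import base64
import json


def handler(event, context, rekognition=None):
    """Lambda entry point: decode the image and return wheel bounding boxes."""
    if rekognition is None:
        # Imported lazily so the sketch can be tested without AWS access.
        import boto3
        rekognition = boto3.client("rekognition")

    # The request body carries the image as a base64-encoded string.
    image_bytes = base64.b64decode(event["body"])
    response = rekognition.detect_labels(Image={"Bytes": image_bytes})

    # Collect the bounding box of every detected instance labeled "Wheel".
    wheels = [
        instance["BoundingBox"]
        for label in response["Labels"]
        if label["Name"] == "Wheel"
        for instance in label.get("Instances", [])
    ]
    return {"statusCode": 200, "body": json.dumps({"wheels": wheels})}
```

In the deployed function, the bounding boxes would be passed on to the postprocessing logic instead of being returned directly.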
The Lambda function expects an event that contains a header and body, where the body should be the image to be labeled as a base64-encoded object. Given the image, the Amazon Rekognition detect_labels function is invoked from the Lambda function using Boto3. The function returns one or more labels for each object in the image and bounding box details for all of the detected object labels as part of the response, along with other information like confidence of the assigned label, the ancestor labels of the detected label, possible aliases for the label, and the categories the detected label belongs to. Based on the labels returned by Amazon Rekognition, we run the function label_image, which calculates the car angle from the detected wheels as follows:
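To illustrate what a rule-based step of this kind can look like, the following sketch classifies the orientation from wheel bounding boxes. It is not the repository's label_image function: the function names, the three output categories, and the 0.5 threshold are assumptions for illustration. The boxes use the Left, Top, Width, and Height ratios that Amazon Rekognition returns.

```python
def box_center_x(box):
    """Horizontal center of a Rekognition-style bounding box."""
    return box["Left"] + box["Width"] / 2.0


def classify_orientation(car_box, wheel_boxes, side_threshold=0.5):
    """Classify the car view from the horizontal spread of its wheels.

    Hypothetical rule: wheels far apart relative to the car width suggest
    a side view; wheels close together suggest an angled view; fewer than
    two visible wheels suggest a front or back view.
    """
    if len(wheel_boxes) < 2:
        return "front_or_back"
    centers = sorted(box_center_x(b) for b in wheel_boxes)
    spread = (centers[-1] - centers[0]) / car_box["Width"]
    return "side" if spread >= side_threshold else "angled"
```

A continuous angle estimate could be derived from the same spread value, but the mapping would need to be calibrated on labeled images.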
Note that the application requires that only one car is present in the image and returns an error if that’s not the case. However, the postprocessing can be adapted to provide more granular orientation descriptions, cover several cars, or calculate the orientation of more complex objects.
Improve wheel detection
To further improve the accuracy of the wheel detection, you can use Amazon Rekognition Custom Labels. Similar to fine-tuning using SageMaker to train and deploy a custom ML model, you can bring your own labeled data so that Amazon Rekognition can produce a custom image analysis model for you in just a few hours. With Rekognition Custom Labels, you only need a small set of training images that are specific to your use case, in this case car images with specific angles, because it uses the existing capabilities in Amazon Rekognition of being trained on tens of millions of images across many categories. Rekognition Custom Labels can be integrated with only a few clicks and small adaptations to the Lambda function we use for the standard Amazon Rekognition solution.
Train a model using a SageMaker training job
In our second option, we train a custom deep learning model on SageMaker. We use the Detectron2 framework for the segmentation of car parts. These segments are then used to infer the position of the car.
The Detectron2 framework is a library that provides state-of-the-art detection and segmentation algorithms. Detectron provides a variety of Mask R-CNN models that were trained on the famous COCO (Common Objects in Context) dataset. To build our car object detection model, we use transfer learning to fine-tune a pretrained Mask R-CNN model on the car parts segmentation dataset. This dataset allows us to train a model that can detect wheels but also other car parts. This additional information can be further used in the car angle computations relative to the image.
The dataset contains annotated data of car parts to be used for object detection and semantic segmentation tasks: approximately 500 images of sedans, pickups, and sports utility vehicles (SUVs), taken in multiple views (front, back, and side views). Each image is annotated by 18 instance masks and bounding boxes representing the different parts of a car like wheels, mirrors, lights, and front and back glass. We modified the base annotations of the wheels such that each wheel is considered an individual object instead of considering all the available wheels in the image as one object.
We use Amazon Simple Storage Service (Amazon S3) to store the dataset used for training the Detectron model along with the trained model artifacts. Moreover, the Docker container that runs in the Lambda function is stored in Amazon Elastic Container Registry (Amazon ECR). The Docker container in the Lambda function is needed to include the required libraries and dependencies for running the code. We could alternatively use Lambda layers, but they are limited to an unzipped deployment package size quota of 250 MB, and a maximum of five layers can be added to a Lambda function.
Our solution is built on SageMaker: we extend prebuilt SageMaker Docker containers for PyTorch to run our custom PyTorch training code. Next, we use the SageMaker Python SDK to wrap the training image into a SageMaker PyTorch estimator, as shown in the following code snippets:
Finally, we start the training job by calling the fit() function on the created PyTorch estimator. When the training is finished, the trained model artifact is stored in the session bucket in Amazon S3 to be used for the inference pipeline.
Deploy the model using SageMaker and inference pipelines
We also use SageMaker to host the inference endpoint that runs our custom Detectron model. The full infrastructure used to deploy our solution is provisioned using the AWS CDK. We can host our custom model through a SageMaker real-time endpoint by calling deploy on the PyTorch estimator. This is the second time we extend a prebuilt SageMaker PyTorch container to include PyTorch Detectron. We use it to run the inference script and host our trained PyTorch model as follows:
Note that we used an ml.g4dn.xlarge GPU instance for deployment because it’s the smallest GPU instance available and sufficient for this demo. Two components need to be configured in our inference script: model loading and model serving. The function model_fn() loads the trained model, which is part of the hosted Docker container and can also be found in Amazon S3, and returns a model object that can be used for model serving as follows:
The function predict_fn() performs the prediction and returns the result. Besides using our trained model, we use a pretrained version of the Mask R-CNN model trained on the COCO dataset to extract the main car in the image. This is an extra postprocessing step to deal with images where more than one car exists. See the following code:
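The selection of the main car can be sketched independently of the model. The example below assumes the detections have already been converted into plain dictionaries with label, score, and box keys, and it assumes that "main car" means the confident detection with the largest bounding box. Both the data format and that criterion are illustrative choices, not necessarily what the original predict_fn() does.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def select_main_car(detections, car_label="car", min_score=0.5):
    """Return the confident car detection with the largest box, or None.

    Each detection is a dict with hypothetical keys:
    {"label": str, "score": float, "box": (x1, y1, x2, y2)}.
    """
    cars = [
        d for d in detections
        if d["label"] == car_label and d["score"] >= min_score
    ]
    if not cars:
        return None
    return max(cars, key=lambda d: box_area(d["box"]))
```

The returned box could then be used to crop the image or to discard wheel detections that fall outside the main car.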
Similar to the Amazon Rekognition solution, the bounding boxes predicted for the wheel class are filtered from the detection outputs and supplied to the postprocessing module to assess the car position relative to the output.
Finally, we also improved the postprocessing for the Detectron solution. It also uses the segments of different car parts to infer the car position. For example, whenever a front bumper is detected, but no back bumper, it is assumed that we have a front view of the car and the corresponding angle is calculated.
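The bumper rule described above can be written as a small function. In this sketch, the part names front_bumper and back_bumper are hypothetical class labels, and the symmetric back-view rule is an assumption added for completeness; only the front-view rule is stated in this post.

```python
def infer_view(detected_parts):
    """Infer the car view from the set of detected part labels."""
    parts = set(detected_parts)
    has_front = "front_bumper" in parts
    has_back = "back_bumper" in parts
    if has_front and not has_back:
        # Rule from the text: front bumper without back bumper.
        return "front"
    if has_back and not has_front:
        # Assumed mirror-image rule for the back view.
        return "back"
    # Both or neither bumper detected: fall back to other cues (wheels).
    return "undetermined"
```

When the result is undetermined, the wheel-based postprocessing can be used as the fallback.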
Connect your solution to the web application
The steps to connect the model endpoints to Amplify are as follows:
- Clone the application repository that the AWS CDK stack created, named car-angle-detection-website-repo. Make sure you are looking for it in the Region you used for deployment.
- Copy the API Gateway endpoints for each of the deployed Lambda functions into the index.html file in the preceding repository (there are placeholders where the endpoint needs to be placed). The following code is an example of what this section of the .html file looks like:
- Save the HTML file and push the code change to the remote main branch.
This will update the HTML file in the deployment. The application is now ready to use.
- Navigate to the Amplify console and locate the project you created.
The application URL will be visible after the deployment is complete.
- Navigate to the URL and have fun with the UI.
Conclusion
Congratulations! We have deployed a complete serverless architecture in which we used Amazon Rekognition, but also gave an option for your own custom model, with this example available on GitHub. If you don’t have ML expertise in your team or enough custom data to train a model, you could select the option that uses Amazon Rekognition. If you want more control over your model, would like to customize it further, and have enough data, you can choose the SageMaker solution. If you have a team of data scientists, they might also want to enhance the models further and pick a more custom and flexible option. You can put the Lambda function and the API Gateway behind your web application using either of the two options. You can also use this approach for a different use case for which you might want to adapt the code.
The advantage of this serverless architecture is that the building blocks are completely exchangeable. The opportunities are almost limitless. So, get started today!
As always, AWS welcomes feedback. Please submit any comments or questions.
About the Authors
Michael Wallner is a Senior Consultant Data & AI with AWS Professional Services and is passionate about enabling customers on their journey to become data-driven and AWSome in the AWS cloud. On top of that, he likes thinking big with customers to innovate and invent new ideas for them.
Aamna Najmi is a Data Scientist with AWS Professional Services. She is passionate about helping customers innovate with Big Data and Artificial Intelligence technologies to tap business value and insights from data. She has experience in working on data platform and AI/ML projects in the healthcare and life sciences vertical. In her spare time, she enjoys gardening and traveling to new places.
David Sauerwein is a Senior Data Scientist at AWS Professional Services, where he enables customers on their AI/ML journey on the AWS cloud. David focuses on digital twins, forecasting and quantum computation. He has a PhD in theoretical physics from the University of Innsbruck, Austria. He was also a doctoral and post-doctoral researcher at the Max-Planck-Institute for Quantum Optics in Germany. In his free time he loves to read, ski and spend time with his family.
Srikrishna Chaitanya Konduru is a Senior Data Scientist with AWS Professional Services. He supports customers in prototyping and operationalising their ML applications on AWS. Srikrishna focuses on computer vision and NLP. He also leads ML platform design and use case identification initiatives for customers across diverse industry verticals. Srikrishna has an M.Sc in Biomedical Engineering from RWTH Aachen University, Germany, with a focus on Medical Imaging.
Ahmed Mansour is a Data Scientist at AWS Professional Services. He provides technical support for customers through their AI/ML journey on the AWS cloud. Ahmed focuses on applications of NLP to the protein domain along with RL. He has a PhD in Engineering from the Technical University of Munich, Germany. In his free time he loves to go to the gym and play with his kids.
Build a personalized avatar with generative AI using Amazon SageMaker
Generative AI has become a common tool for enhancing and accelerating the creative process across various industries, including entertainment, advertising, and graphic design. It enables more personalized experiences for audiences and improves the overall quality of the final products.
One significant benefit of generative AI is creating unique and personalized experiences for users. For example, generative AI is used by streaming services to generate personalized movie titles and visuals to increase viewer engagement and build visuals for titles based on a user’s viewing history and preferences. The system then generates thousands of variations of a title’s artwork and tests them to determine which version most attracts the user’s attention. In some cases, personalized artwork for TV series significantly increased clickthrough rates and view rates as compared to shows without personalized artwork.
In this post, we demonstrate how you can use generative AI models like Stable Diffusion to build a personalized avatar solution on Amazon SageMaker and save inference cost with multi-model endpoints (MMEs) at the same time. The solution demonstrates how, by uploading 10–12 images of yourself, you can fine-tune a personalized model that can then generate avatars based on any text prompt, as shown in the following screenshots. Although this example generates personalized avatars, you can apply the technique to any creative art generation by fine-tuning on specific objects or styles.
Solution overview
The following architecture diagram outlines the end-to-end solution for our avatar generator.
The scope of this post and the example GitHub code we provide focus only on the model training and inference orchestration (the green section in the preceding diagram). You can reference the full solution architecture and build on top of the example we provide.
Model training and inference can be broken down into four steps:
- Upload images to Amazon Simple Storage Service (Amazon S3). In this step, we ask you to provide a minimum of 10 high-resolution images of yourself. The more images, the better the result, but the longer it will take to train.
- Fine-tune a Stable Diffusion 2.1 base model using SageMaker asynchronous inference. We explain the rationale for using an inference endpoint for training later in this post. The fine-tuning process starts with preparing the images, including face cropping, background variation, and resizing for the model. Then we use Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique for large language models (LLMs), to fine-tune the model. Finally, in postprocessing, we package the fine-tuned LoRA weights with the inference script and configuration files (tar.gz) and upload them to an S3 bucket location for SageMaker MMEs.
- Host the fine-tuned models using SageMaker MMEs with GPU. SageMaker will dynamically load and cache the model from the Amazon S3 location based on the inference traffic to each model.
- Use the fine-tuned model for inference. After the Amazon Simple Notification Service (Amazon SNS) notification indicating that the fine-tuning is complete is sent, you can immediately use that model by supplying a target_model parameter when invoking the MME to create your avatar.
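To see why LoRA in step 2 is parameter-efficient, recall that it keeps a pretrained weight matrix W of shape d_out × d_in frozen and learns only a low-rank update ΔW = BA, where B has shape d_out × r and A has shape r × d_in. The following sketch compares the number of trainable parameters for a single layer. The layer size and rank are made-up values for illustration, not the settings used in this solution.

```python
def full_params(d_out, d_in):
    """Parameters updated when fine-tuning the full weight matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Parameters in the low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical 1024 x 1024 projection layer with rank 8.
full = full_params(1024, 1024)
lora = lora_params(1024, 1024, rank=8)
ratio = lora / full
```

For this hypothetical layer, the low-rank factors hold 16,384 parameters compared with 1,048,576 for the full matrix, or about 1.6%. Small per-user weight files of this kind are what make it practical to host many fine-tuned models behind one MME.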
We explain each step in more detail in the following sections and walk through some of the sample code snippets.
Prepare the images
To achieve the best results from fine-tuning Stable Diffusion to generate images of yourself, you typically need to provide a large quantity and variety of photos of yourself from different angles, with different expressions, and in different backgrounds. However, with our implementation, you can now achieve a high-quality result with as few as 10 input images. We have also added automated preprocessing to extract your face from each photo. All you need is to capture the essence of how you look clearly from multiple perspectives. Include a front-facing photo, a profile shot from each side, and photos from angles in between. You should also include photos with different facial expressions like smiling, frowning, and a neutral expression. Having a mix of expressions will allow the model to better reproduce your unique facial features. The input images dictate the quality of avatar you can generate. To make sure this is done properly, we recommend an intuitive front-end UI experience to guide the user through the image capture and upload process.
The following are example selfie images at different angles with different facial expressions.
Fine-tune a Stable Diffusion model
After the images are uploaded to Amazon S3, we can invoke the SageMaker asynchronous inference endpoint to start our training process. Asynchronous endpoints are intended for inference use cases with large payloads (up to 1 GB) and long processing times (up to 1 hour). They also provide a built-in mechanism for queuing requests and a task completion notification mechanism via Amazon SNS, in addition to other native features of SageMaker hosting such as auto scaling.
Even though fine-tuning is not an inference use case, we chose asynchronous inference here in lieu of SageMaker training jobs because of its built-in queuing and notification mechanisms and managed auto scaling, including the ability to scale down to 0 instances when the service is not in use. This allows us to easily scale the fine-tuning service to a large number of concurrent users and eliminates the need to implement and manage the additional components. However, it does come with the drawback of a 1 GB payload limit and a 1-hour maximum processing time. In our testing, we found that 20 minutes is sufficient time to get reasonably good results with roughly 10 input images on an ml.g5.2xlarge instance. For larger-scale fine-tuning jobs, SageMaker training would be the recommended approach.
To host the asynchronous endpoint, we must complete several steps. The first is to define our model server. For this post, we use the Large Model Inference Container (LMI). LMI is powered by DJL Serving, which is a high-performance, programming language-agnostic model serving solution. We chose this option because the SageMaker managed inference container already has many of the training libraries we need, such as Hugging Face Diffusers and Accelerate. This greatly reduces the amount of work required to customize the container for our fine-tuning job.
The following code snippet shows the version of the LMI container we used in our example:
In addition to that, we need to have a serving.properties file that configures the serving properties, including the inference engine to use, the location of the model artifact, and dynamic batching. Lastly, we must have a model.py file that loads the model into the inference engine and prepares the data input and output from the model. In our example, we use the model.py file to spin up the fine-tuning job, which we explain in greater detail in a later section. Both the serving.properties and model.py files are provided in the training_service folder.
The next step after defining our model server is to create an endpoint configuration that defines how our asynchronous inference will be served. For our example, we are just defining the maximum concurrent invocation limit and the output S3 location. With the ml.g5.2xlarge instance, we have found that we are able to fine-tune up to two models concurrently without encountering an out-of-memory (OOM) exception, and therefore we set max_concurrent_invocations_per_instance to 2. This number may need to be adjusted if we’re using a different set of tuning parameters or a smaller instance type. We recommend setting this to 1 initially and monitoring the GPU memory utilization in Amazon CloudWatch.
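The endpoint configuration described above can be sketched as a request for the boto3 create_endpoint_config call. This is a minimal sketch rather than the listing from the example repository: the configuration name, model name, S3 path, and SNS topic ARN are hypothetical placeholders, and the helper function is introduced here only for illustration.

```python
def build_async_endpoint_config(config_name, model_name, output_s3_uri,
                                success_topic_arn=None, error_topic_arn=None,
                                instance_type="ml.g5.2xlarge", max_concurrent=2):
    """Build keyword arguments for the SageMaker create_endpoint_config API."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")

    # Results of each asynchronous invocation are written to this S3 location.
    output_config = {"S3OutputPath": output_s3_uri}

    # Optional SNS topics that receive success and error notifications.
    notification = {}
    if success_topic_arn:
        notification["SuccessTopic"] = success_topic_arn
    if error_topic_arn:
        notification["ErrorTopic"] = error_topic_arn
    if notification:
        output_config["NotificationConfig"] = notification

    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
        "AsyncInferenceConfig": {
            # Caps how many fine-tuning jobs run at once on each instance.
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": max_concurrent},
            "OutputConfig": output_config,
        },
    }


# Hypothetical names, shown only to illustrate the shape of the request.
config = build_async_endpoint_config(
    config_name="avatar-finetune-config",
    model_name="avatar-finetune-model",
    output_s3_uri="s3://example-bucket/async-output/",
    success_topic_arn="arn:aws:sns:us-east-1:111122223333:finetune-success",
)
```

With AWS credentials in place, the dictionary could be passed as boto3.client("sagemaker").create_endpoint_config(**config). Starting with max_concurrent=1 and raising it only after checking GPU memory matches the recommendation above.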
Finally, we create a SageMaker model that packages the container information, model files, and AWS Identity and Access Management (IAM) role into a single object. The model is deployed using the endpoint configuration we defined earlier:
When the endpoint is ready, we use the following sample code to invoke the asynchronous endpoint and start the fine-tuning process:
For more details about LMI on SageMaker, refer to Deploy large models on Amazon SageMaker using DJLServing and DeepSpeed model parallel inference.
After invocation, the asynchronous endpoint starts queueing our fine-tuning job. Each job runs through the following steps: prepare the images, perform Dreambooth and LoRA fine-tuning, and prepare the model artifacts. Let’s dive deeper into the fine-tuning process.
Prepare the images
As we mentioned earlier, the quality of the input images directly impacts the quality of the fine-tuned model. For the avatar use case, we want the model to focus on the facial features. Instead of requiring users to provide carefully curated images of exact size and content, we implement a preprocessing step using computer vision techniques to alleviate this burden. In the preprocessing step, we first use a face detection model to isolate the largest face in each image. Then we crop and pad the image to the required size of 512 x 512 pixels for our model. Finally, we segment the face from the background and add random background variations. This helps highlight the facial features, allowing our model to learn from the face itself rather than the background. The following images illustrate the three steps in this process.
Step 1: Face detection using computer vision | Step 2: Crop and pad the image to 512 x 512 pixels | Step 3 (Optional): Segment and add background variation |
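The geometry behind the first two steps can be sketched in plain Python. This is an illustrative sketch, not the preprocessing code from the example repository: it assumes a face detector has already returned bounding boxes as (left, top, right, bottom) pixel tuples, and the function names and the 40% margin are assumptions chosen for the example.

```python
TARGET_SIZE = 512  # side length, in pixels, expected by the model


def largest_face(boxes):
    """Pick the bounding box with the largest area."""
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))


def square_crop_box(face_box, margin_percent=40):
    """Return a square box centered on the face, widened by a margin.

    The box may extend past the image border; the overhang is padded later.
    """
    left, top, right, bottom = face_box
    side = max(right - left, bottom - top) * (100 + margin_percent) // 100
    center_x = (left + right) // 2
    center_y = (top + bottom) // 2
    x0 = center_x - side // 2
    y0 = center_y - side // 2
    return (x0, y0, x0 + side, y0 + side)


def padding_needed(box, image_size):
    """Pixels of padding on (left, top, right, bottom) so the box fits."""
    width, height = image_size
    x0, y0, x1, y1 = box
    return (max(0, -x0), max(0, -y0), max(0, x1 - width), max(0, y1 - height))


# Two hypothetical detections in a 360 x 480 photo.
faces = [(100, 80, 300, 330), (400, 50, 450, 110)]
face = largest_face(faces)
box = square_crop_box(face)
pad = padding_needed(box, image_size=(360, 480))
scale = TARGET_SIZE / (box[2] - box[0])  # resize factor to reach 512 x 512
```

An imaging library would then pad the photo by the computed amounts, crop to the box, and resize by the scale factor.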
Dreambooth and LoRA fine-tuning
For fine-tuning, we combined the techniques of Dreambooth and LoRA. Dreambooth allows you to personalize your Stable Diffusion model, embedding a subject into the model’s output domain using a unique identifier and expanding the model’s language vision dictionary. It uses a method called prior preservation to preserve the model’s semantic knowledge of the class of the subject, in this case a person, and use other objects in the class to improve the final image output. This is how Dreambooth can achieve high-quality results with just a few input images of the subject.
The following code snippet shows the inputs to our trainer.py class for our avatar solution. Notice we chose <<TOK>> as the unique identifier. This is purposely done to avoid picking a name that may already be in the model’s dictionary. If the name already exists, the model has to unlearn and then relearn the subject, which may lead to poor fine-tuning results. The subject class is set to “a photo of person”, which enables prior preservation by first generating photos of people to feed in as additional inputs during the fine-tuning process. This helps reduce overfitting because the model tries to preserve its previous knowledge of a person using the prior preservation method.
A number of memory-saving options have been enabled in the configuration, including fp16, use_8bit_adam, and gradient accumulation. This reduces the memory footprint to under 12 GB, which allows for fine-tuning of up to two models concurrently on an ml.g5.2xlarge instance.
LoRA is an efficient fine-tuning technique for LLMs that freezes the pre-trained weights and attaches a small adapter network to specific layers of the pre-trained LLM, allowing for faster training and optimized storage. For Stable Diffusion, the adapter is attached to the text encoder and U-Net components of the inference pipeline. The text encoder converts the input prompt to a latent space that is understood by the U-Net model, and the U-Net model uses the latent meaning to generate the image in the subsequent diffusion process. The output of the fine-tuning is just the text_encoder and U-Net adapter weights. At inference time, these weights can be reattached to the base Stable Diffusion model to reproduce the fine-tuning results.
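To make the adapter idea concrete, the following is a minimal sketch of a single LoRA-adapted linear layer in plain Python. It is not the training code used in the example: the class name, matrix sizes, rank, and scaling factor are illustrative assumptions. The frozen weight W is left untouched, and only the two small matrices A and B are trained and saved.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen, shared base weight
        self.scale = alpha / rank     # scaling applied to the low-rank update
        # A starts random and B starts at zero, so the adapter is a no-op at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A), the weight used at inference time."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def adapter_parameters(self):
        """Number of values that must be trained and stored."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_parameter_count(d_out, d_in, rank):
    """Adapter size for one layer: rank * (d_in + d_out)."""
    return rank * (d_in + d_out)


weight = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
layer = LoRALinear(weight, rank=1)
unchanged = layer.effective_weight() == weight      # adapter starts as a no-op

full_parameters = 1024 * 1024                       # hypothetical 1024 x 1024 layer
adapter_size = lora_parameter_count(1024, 1024, rank=4)
```

For the hypothetical 1024 x 1024 layer at rank 4, the adapter holds 8,192 values instead of 1,048,576, which illustrates why the saved adapter is so much smaller than the base model.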
The following figures are detailed diagrams of LoRA fine-tuning, provided by the original authors: Cheng-Han Chiang, Yung-Sung Chuang, Hung-yi Lee, “AACL_2022_tutorial_PLMs,” 2022
By combining both methods, we were able to generate a personalized model while tuning an order-of-magnitude fewer parameters. This resulted in a much faster training time and reduced GPU utilization. Additionally, storage was optimized with the adapter weight being only 70 MB, compared to 6 GB for a full Stable Diffusion model, representing a 99% size reduction.
Prepare the model artifacts
After fine-tuning is complete, the postprocessing step packages the LoRA weights into a TAR file with the rest of the model serving files for NVIDIA Triton. We use a Python backend, which means the Triton config file and the Python script used for inference are required. Note that the Python script has to be named model.py. The final model TAR file should have the following file structure:
Host the fine-tuned models using SageMaker MMEs with GPU
After the models have been fine-tuned, we host the personalized Stable Diffusion models using a SageMaker MME. A SageMaker MME is a powerful deployment feature that allows hosting multiple models in a single container behind a single endpoint. It automatically manages traffic and routing to your models to optimize resource utilization, save costs, and minimize operational burden of managing thousands of endpoints. In our example, we run on GPU instances, and SageMaker MMEs support GPU using Triton Server. This allows you to run multiple models on a single GPU device and take advantage of accelerated compute. For more detail on how to host Stable Diffusion on SageMaker MMEs, refer to Create high-quality images with Stable Diffusion models and deploy them cost-efficiently with Amazon SageMaker.
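Selecting a specific fine-tuned model on the MME comes down to passing TargetModel on each invocation. The following is a minimal sketch, assuming hypothetical endpoint and archive names. The request body follows the JSON inference format that Triton accepts, but the input name "prompt" is an assumption: the real name, shape, and data type are defined by each model's Triton config file.

```python
import json


def build_mme_request(endpoint_name, target_model, prompt):
    """Build keyword arguments for sagemaker-runtime invoke_endpoint on an MME."""
    if not target_model.endswith(".tar.gz"):
        raise ValueError("target_model should name a .tar.gz archive under the MME S3 prefix")
    payload = {
        "inputs": [{
            "name": "prompt",          # hypothetical input name
            "shape": [1, 1],
            "datatype": "BYTES",
            "data": [prompt],
        }]
    }
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,   # selects which fine-tuned model to load and run
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


def invoke(request):
    """Send the request; needs boto3 and AWS credentials, so it is not run here."""
    import boto3
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(**request)
    return json.loads(response["Body"].read())


request = build_mme_request(
    endpoint_name="avatar-mme",                 # hypothetical endpoint
    target_model="user-123.tar.gz",             # hypothetical model archive
    prompt="a photo of <<TOK>> as an astronaut",
)
```

The first request for a given TargetModel is slower because SageMaker has to download and load that archive; later requests are served from the cached copy.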
For our example, we made an additional optimization to load the fine-tuned models faster during cold start situations. This is possible because of LoRA’s adapter design. Because the base model weights and Conda environments are the same for all fine-tuned models, we can share these common resources by pre-loading them onto the hosting container. This leaves only the Triton config file, Python backend (model.py), and LoRA adapter weights to be dynamically loaded from Amazon S3 after the first invocation. The following diagram provides a side-by-side comparison.
This significantly reduces the model TAR file from approximately 6 GB to 70 MB, and therefore is much faster to load and unpack. To do the preloading in our example, we created a utility Python backend model in models/model_setup. The script simply copies the base Stable Diffusion model and Conda environment from Amazon S3 to a common location to share across all the fine-tuned models. The following is the code snippet that performs the task:
Then each fine-tuned model will point to the shared location on the container. The Conda environment is referenced in the config.pbtxt.
The Stable Diffusion base model is loaded from the initialize() function of each model.py file. We then apply the personalized LoRA weights to the unet and text_encoder model to reproduce each fine-tuned model:
“Helping people stay reliably informed … that’s my motivation”
Amazon Scholar Heng Ji, who leads the Blender Lab at UIUC, has made it her mission to separate truly valuable information from noise.
SageMaker Distribution is now available on Amazon SageMaker Studio
SageMaker Distribution is a pre-built Docker image containing many popular packages for machine learning (ML), data science, and data visualization. This includes deep learning frameworks like PyTorch, TensorFlow, and Keras; popular Python packages like NumPy, scikit-learn, and pandas; and IDEs like JupyterLab. In addition to this, SageMaker Distribution supports conda, micromamba, and pip as Python package managers.
In May 2023, we launched SageMaker Distribution as an open-source project at JupyterCon. This launch helped you use SageMaker Distribution to run experiments on your local environments. We are now natively providing that image in Amazon SageMaker Studio so that you gain the high performance, compute, and security benefits of running your experiments on Amazon SageMaker.
Compared to the earlier open-source launch, you have the following additional capabilities:
- The open-source image is now available as a first-party image in SageMaker Studio. You can now simply choose the open-source SageMaker Distribution from the list when choosing an image and kernel for your notebooks, without having to create a custom image.
- The SageMaker Python SDK package is now built-in with the image.
In this post, we show the features and advantages of using the SageMaker Distribution image.
Use SageMaker Distribution in SageMaker Studio
If you have access to an existing Studio domain, you can launch SageMaker Studio. To create a Studio domain, follow the directions in Onboard to Amazon SageMaker Domain.
- In the SageMaker Studio UI, choose File from the menu bar, choose New, and choose Notebook.
- When prompted for the image and instance, choose the SageMaker Distribution v0 CPU or SageMaker Distribution v0 GPU image.
- Choose your Kernel, then choose Select.
You can now start running your commands without needing to install common ML packages and frameworks! You can also run notebooks running on supported frameworks such as PyTorch and TensorFlow from the SageMaker examples repository, without having to switch the active kernels.
Run code remotely using SageMaker Distribution
In the public beta announcement, we discussed graduating notebooks from local compute environments to SageMaker Studio, and also operationalizing the notebook using notebook jobs.
Additionally, you can directly run your local notebook code as a SageMaker training job by simply adding a @remote decorator to your function.
Let’s try an example. Add the following code to your Studio notebook running on the SageMaker Distribution image:
When you run the cell, the function will run as a remote SageMaker training job on an ml.m5.xlarge instance, and the SDK automatically picks up the SageMaker Distribution image as the training image in Amazon Elastic Container Registry (Amazon ECR). For deep learning workloads, you can also run your script on multiple parallel instances.
Reproduce Conda environments from SageMaker Distribution elsewhere
SageMaker Distribution is available as a public Docker image. However, for data scientists more familiar with Conda environments than Docker, the GitHub repository also provides the environment files for each image build so you can build Conda environments for both CPU and GPU versions.
The build artifacts for each version are stored under the sagemaker-distribution/build_artifacts directory. To create the same environment as any of the available SageMaker Distribution versions, run the following commands, replacing the --file parameter with the right environment files:
Customize the open-source SageMaker Distribution image
The open-source SageMaker Distribution image has the most commonly used packages for data science and ML. However, data scientists might require access to additional packages, and enterprise customers might have proprietary packages that provide additional capabilities for their users. In such cases, there are multiple options to have a runtime environment with all required packages. In order of increasing complexity, they are listed as follows:
- You can install packages directly on the notebook. We recommend Conda and micromamba, but pip also works.
- Data scientists familiar with Conda for package management can reproduce the Conda environment from SageMaker Distribution elsewhere and install and manage additional packages in that environment going forward.
- If administrators want a repeatable and controlled runtime environment for their users, they can extend SageMaker Distribution’s Docker images and maintain their own image. See Bring your own SageMaker image for detailed instructions to create and use a custom image in Studio.
Clean up
If you experimented with SageMaker Studio, shut down all Studio apps to avoid paying for unused compute usage. See Shut down and Update Studio Apps for instructions.
Conclusion
Today, we announced the launch of the open-source SageMaker Distribution image within SageMaker Studio. We showed you how to use the image in SageMaker Studio as one of the available first-party images, how to operationalize your scripts using the SageMaker Python SDK @remote decorator, how to reproduce the Conda environments from SageMaker Distribution outside Studio, and how to customize the image. We encourage you to try out SageMaker Distribution and share your feedback through GitHub!
Additional References
- SageMaker-distribution documentation
- JupyterCon Contributions by AWS in 2023
- Get Started on SageMaker Studio
About the authors
Durga Sury is an ML Solutions Architect in the Amazon SageMaker Service SA team. She is passionate about making machine learning accessible to everyone. In her 4 years at AWS, she has helped set up AI/ML platforms for enterprise customers. When she isn’t working, she loves motorcycle rides, mystery novels, and hiking with her 5-year-old husky.
Ketan Vijayvargiya is a Senior Software Development Engineer in Amazon Web Services (AWS). His focus areas are machine learning, distributed systems and open source. Outside work, he likes to spend his time self-hosting and enjoying nature.
Automate caption creation and search for images at enterprise scale using generative AI and Amazon Kendra
Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines search for your websites and applications so your employees and customers can easily find the content they are looking for, even when it’s scattered across multiple locations and content repositories within your organization.
Amazon Kendra supports a variety of document formats, such as Microsoft Word, PDF, and text from various data sources. In this post, we focus on extending the document support in Amazon Kendra to make images searchable by their displayed content. Images can often be searched using supplemented metadata such as keywords. However, it takes a lot of manual effort to add detailed metadata to potentially thousands of images. Generative AI (GenAI) can be helpful in generating the metadata automatically. By generating textual captions, the GenAI caption predictions offer descriptive metadata for images. The Amazon Kendra index can then be enriched with the generated metadata during document ingestion to enable searching the images without any manual effort.
As an example, a GenAI model can be used to generate a textual description for the following image as “a dog laying on the ground under an umbrella” during document ingestion of the image.
An object recognition model can still detect keywords such as “dog” and “umbrella,” but a GenAI model offers deeper understanding of what is represented in the image by identifying that the dog lies under the umbrella. This helps us build more refined searches in the image search process. The textual description is added as metadata to an Amazon Kendra search index via an automated custom document enrichment (CDE). Users searching for terms like “dog” or “umbrella” will then be able to find the image, as shown in the following screenshot.
In this post, we show how to use CDE in Amazon Kendra using a GenAI model deployed on Amazon SageMaker. We demonstrate CDE using simple examples and provide a step-by-step guide for you to experience CDE in an Amazon Kendra index in your own AWS account. It allows users to quickly and easily find the images they need without having to manually tag or categorize them. This solution can also be customized and scaled to meet the needs of different applications and industries.
Image captioning with GenAI
Image description with GenAI involves using ML algorithms to generate textual descriptions of images. The process is also known as image captioning, and operates at the intersection of computer vision and natural language processing (NLP). It has applications in areas where data is multi-modal such as ecommerce, where data contains text in the form of metadata as well as images, or in healthcare, where data could contain MRIs or CT scans along with doctor’s notes and diagnoses, to name a few use cases.
GenAI models learn to recognize objects and features within the images, and then generate descriptions of those objects and features in natural language. The state-of-the-art models use an encoder-decoder architecture, where the image information is encoded in the intermediate layers of the neural network and decoded into textual descriptions. These can be considered as two distinct stages: feature extraction from images and textual caption generation. In the feature extraction stage (encoder), the GenAI model processes the image to extract relevant visual features, such as object shapes, colors, and textures. In the caption generation stage (decoder), the model generates a natural language description of the image based on the extracted visual features.
GenAI models are typically trained on vast amounts of data, which make them suitable for various tasks without additional training. Adapting to custom datasets and new domains is also easily achievable through few-shot learning. Pre-training methods allow multi-modal applications to be easily trained using state-of-the-art language and image models. These pre-training methods also allow you to mix and match the vision model and language model that best fits your data.
The quality of the generated image descriptions depends on the quality and size of the training data, the architecture of the GenAI model, and the quality of the feature extraction and caption generation algorithms. Although image description with GenAI is an active area of research, it shows very good results in a wide range of applications, such as image search, visual storytelling, and accessibility for people with visual impairments.
Use cases
GenAI image captioning is useful in the following use cases:
- Ecommerce – A common industry use case where images and text occur together is retail. Ecommerce in particular stores vast amounts of data as product images along with textual descriptions. The textual description or metadata is important to ensure that the best products are displayed to the user based on the search queries. Moreover, with the trend of ecommerce sites obtaining data from 3P vendors, the product descriptions are often incomplete, amounting to numerous manual hours and huge overhead resulting from tagging the right information in the metadata columns. GenAI-based image captioning is particularly useful for automating this laborious process. Fine-tuning the model on custom fashion data such as fashion images along with text describing the attributes of fashion products can be used to generate metadata that then improves a user’s search experience.
- Marketing – Another use case of image search is digital asset management. Marketing firms store vast amounts of digital data that needs to be centralized, easily searchable, and scalable, all of which data catalogs make possible. A centralized data lake with informative data catalogs would reduce duplicated effort and enable wider sharing of creative content and consistency between teams. For graphic design platforms commonly used to generate social media content or presentations in corporate settings, a faster search could improve the user experience by returning the right results for the images users are looking for and by letting users search with natural language queries.
- Manufacturing – The manufacturing industry stores a lot of image data like architecture blueprints of components, buildings, hardware, and equipment. The ability to search through such data enables product teams to easily recreate designs from a starting point that already exists and eliminates a lot of design overhead, thereby speeding up the process of design generation.
- Healthcare – Doctors and medical researchers can catalog and search through MRIs and CT scans, specimen samples, images of the ailment such as rashes and deformities, along with doctor’s notes, diagnoses, and clinical trials details.
- Metaverse or augmented reality – Advertising a product is about creating a story that users can imagine and relate to. With AI-powered tools and analytics, it has become easier than ever to build not just one story but customized stories that appeal to end-users’ unique tastes and sensibilities. This is where image-to-text models can be a game changer. Visual storytelling can assist in creating characters, adapting them to different styles, and captioning them. It can also be used to power stimulating experiences in the metaverse or augmented reality and immersive content including video games. Image search enables developers, designers, and teams to search their content using natural language queries, which can maintain consistency of content between various teams.
- Accessibility of digital content for blind and low vision – This is primarily enabled by assistive technologies such as screenreaders, Braille systems that allow touch reading and writing, and special keyboards for navigating websites and applications across the internet. Images, however, need to be delivered as textual content that can then be communicated as speech. Image captioning using GenAI algorithms is a crucial piece for redesigning the internet and making it more inclusive by providing everyone a chance to access, understand, and interact with online content.
Model details and model fine-tuning for custom datasets
In this solution, we use the vit-gpt2-image-captioning model available from Hugging Face, which is licensed under Apache 2.0, without performing any further fine-tuning. ViT is a foundational model for image data, and GPT-2 is a foundational model for language. The multi-modal combination of the two offers the capability of image captioning. Hugging Face hosts state-of-the-art image captioning models, which can be deployed in AWS in a few clicks and offer simple-to-deploy inference endpoints. Although we can use this pre-trained model directly, we can also customize the model to fit domain-specific datasets, more data types such as video or spatial data, and unique use cases. Several GenAI models are available, and some perform best with certain datasets, or your team might already be using particular vision and language models. This solution offers the flexibility of choosing the best-performing vision and language model as the image captioning model through straightforward replacement of the model we have used.
For customization of the models to unique industry applications, open-source models available on AWS through Hugging Face offer several possibilities. A pre-trained model can be tested for the unique dataset or trained on samples of the labeled data to fine-tune it. Novel research methods also allow any combination of vision and language models to be combined efficiently and trained on your dataset. This newly trained model can then be deployed in SageMaker for the image captioning described in this solution.
An example of a customized image search is Enterprise Resource Planning (ERP). In ERP, image data collected from different stages of logistics or supply chain management could include tax receipts, vendor orders, payslips, and more, which need to be automatically categorized for the purview of different teams within the organization. Another example is to use medical scans and doctor diagnoses to predict new medical images for automatic classification. The vision model extracts features from the MRI, CT, or X-ray images and the text model captions it with the medical diagnoses.
Solution overview
The following diagram shows the architecture for image search with GenAI and Amazon Kendra.
We ingest images from Amazon Simple Storage Service (Amazon S3) into Amazon Kendra. During ingestion to Amazon Kendra, the GenAI model hosted on SageMaker is invoked to generate an image description. Additionally, text visible in an image is extracted by Amazon Textract. The image description and the extracted text are stored as metadata and made available to the Amazon Kendra search index. After ingestion, images can be searched via the Amazon Kendra search console, API, or SDK.
We use the advanced operations of CDE in Amazon Kendra to call the GenAI model and Amazon Textract during the image ingestion step. However, we can use CDE for a wider range of use cases. With CDE, you can create, modify, or delete document attributes and content when you ingest your documents into Amazon Kendra. This means you can manipulate and ingest your data as needed. This can be achieved by invoking pre- and post-extraction AWS Lambda functions during ingestion, which allows for data enrichment or modification. For example, we can use Amazon Comprehend Medical when ingesting medical textual data to add ML-generated insights to the search metadata.
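The enrichment step can be pictured as a Lambda handler that receives the location of a document and returns metadata updates. The following is a hedged sketch, not the function deployed by this solution. The event and response field names (s3Bucket, s3ObjectKey, metadataUpdates, stringValue) reflect our understanding of the CDE Lambda data contract and should be checked against the Amazon Kendra documentation. The attribute names image_caption and image_text are hypothetical, and the two callables stand in for the SageMaker endpoint and Amazon Textract calls.

```python
def make_handler(captioner, text_extractor):
    """Create a CDE Lambda handler from two injected callables.

    captioner(bucket, key) returns a caption string (for example, by invoking
    a SageMaker endpoint); text_extractor(bucket, key) returns any text found
    in the image (for example, via Amazon Textract). Both are stand-ins here.
    """

    def handler(event, context):
        bucket = event["s3Bucket"]
        key = event["s3ObjectKey"]

        caption = captioner(bucket, key)
        updates = [{"name": "image_caption", "value": {"stringValue": caption}}]

        extracted = text_extractor(bucket, key)
        if extracted:
            updates.append({"name": "image_text", "value": {"stringValue": extracted}})

        # Return the same object key and the new metadata for the index.
        return {
            "version": "v0",
            "s3ObjectKey": key,
            "metadataUpdates": updates,
        }

    return handler


# Stub callables so the sketch runs without AWS access.
handler = make_handler(
    captioner=lambda bucket, key: "a dog laying on the ground under an umbrella",
    text_extractor=lambda bucket, key: "",
)
response = handler(
    {"s3Bucket": "example-bucket", "s3ObjectKey": "images/dog.jpg", "metadata": {"attributes": []}},
    None,
)
```

Injecting the two callables keeps the handler logic testable locally; in a deployed function they would wrap the real service calls.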
You can use our solution to search images through Amazon Kendra by following these steps:
- Upload images to an image repository like an S3 bucket.
- The image repository is then indexed by Amazon Kendra, which is a search engine that can be used to search for structured and unstructured data. During indexing, the GenAI model as well as Amazon Textract are invoked to generate the image metadata. You can trigger the indexing manually or on a predefined schedule.
- You can then search for images using natural language queries, such as “Find images of red roses” or “Show me pictures of dogs playing in the park,” through the Amazon Kendra console, SDK, or API. These queries are processed by Amazon Kendra, which uses ML algorithms to understand the meaning behind the queries and retrieve relevant images from the indexed repository.
- The search results are presented to you, along with their corresponding textual descriptions, allowing you to quickly and easily find the images you are looking for.
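The search step above can also be run programmatically. The following is a minimal sketch that uses the Amazon Kendra Query API through boto3 and reduces the response to titles, excerpts, and URIs. The index ID is a placeholder you would supply, and the helper names are introduced here for illustration only.

```python
def summarize_results(response, limit=5):
    """Reduce a Kendra Query response to title, excerpt, and URI per result."""
    rows = []
    for item in response.get("ResultItems", [])[:limit]:
        rows.append({
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
        })
    return rows


def search_images(index_id, query_text, limit=5):
    """Query the index; needs boto3 and AWS credentials, so it is not run here."""
    import boto3
    kendra = boto3.client("kendra")
    response = kendra.query(IndexId=index_id, QueryText=query_text)
    return summarize_results(response, limit)


# A hand-written response in the shape returned by the Query API.
sample_response = {
    "ResultItems": [{
        "DocumentTitle": {"Text": "dog.jpg"},
        "DocumentExcerpt": {"Text": "a dog laying on the ground under an umbrella"},
        "DocumentURI": "s3://example-bucket/images/dog.jpg",
    }]
}
results = summarize_results(sample_response)
```

Because the generated caption is stored with the image at ingestion time, the excerpt returned for a match is the caption text itself.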
Prerequisites
You must have the following prerequisites:
- An AWS account
- Permissions to provision and invoke the following services via AWS CloudFormation: Amazon S3, Amazon Kendra, Lambda, and Amazon Textract.
Cost estimate
The cost of deploying this solution as a proof of concept is projected in the following table. Because this is a proof of concept, we use the Amazon Kendra Developer Edition, which is not recommended for production workloads but provides a low-cost option for developers. We assume that the search functionality of Amazon Kendra is used for 20 working days for 3 hours each day, and therefore calculate associated costs for 60 monthly active hours.
Service | Usage | Cost Estimate per Month |
Amazon S3 | Storage of 10 GB with data transfer | 2.30 USD |
Amazon Kendra | Developer Edition with 60 hours/month | 67.90 USD |
Amazon Textract | 100% detect document text on 10,000 images | 15.00 USD |
Amazon SageMaker | Real-time inference with ml.g4dn.xlarge for one model deployed on one endpoint for 3 hours every day for 20 days | 44.00 USD |
Total | | 129.20 USD |
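The total and the active-hours assumption can be checked with a few lines of arithmetic. The figures below are copied from the table; the dictionary is only an illustration and is not part of the deployed solution.

```python
from decimal import Decimal

# Monthly estimates in USD, copied from the table above.
monthly_costs = {
    "Amazon S3": Decimal("2.30"),
    "Amazon Kendra": Decimal("67.90"),
    "Amazon Textract": Decimal("15.00"),
    "Amazon SageMaker": Decimal("44.00"),
}

# 20 working days at 3 hours per day.
active_hours = 20 * 3

# Decimal avoids floating-point rounding when summing currency values.
total = sum(monthly_costs.values(), Decimal("0"))
print(f"{active_hours} active hours, {total} USD per month")
```

Changing the usage assumption (for example, more hours per day) changes the Amazon Kendra and SageMaker rows, so those two entries are the ones to revisit first when adapting the estimate.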
Deploy resources with AWS CloudFormation
The CloudFormation stack deploys the following resources:
- A Lambda function that downloads the image captioning model from Hugging Face hub and subsequently builds the model assets
- A Lambda function that populates the inference code and zipped model artifacts to a destination S3 bucket
- An S3 bucket for storing the zipped model artifacts and inference code
- An S3 bucket for storing the uploaded images and Amazon Kendra documents
- An Amazon Kendra index for searching through the generated image captions
- A SageMaker real-time inference endpoint for deploying the Hugging Face image captioning model
- A Lambda function that is triggered while enriching the Amazon Kendra index on demand. It invokes Amazon Textract and a SageMaker real-time inference endpoint.
Additionally, AWS CloudFormation deploys all the necessary AWS Identity and Access Management (IAM) roles and policies, a VPC along with subnets, a security group, and an internet gateway in which the custom resource Lambda function is run.
Complete the following steps to provision your resources:
- Choose Launch stack to launch the CloudFormation template in the us-east-1 Region.
- Choose Next.
- On the Specify stack details page, leave the template URL and S3 URI of the parameters file at their defaults, then choose Next.
- Continue to choose Next on the subsequent pages.
- Choose Create stack to deploy the stack.
Monitor the status of the stack. When the status shows as CREATE_COMPLETE, the deployment is complete.
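If you prefer to monitor the stack from a script rather than the console, the status check can be sketched as a small polling loop. This is a minimal sketch, not part of the original solution: the function name `wait_for_stack` and its polling parameters are hypothetical, and the client is expected to be a boto3 CloudFormation client (or any object with a compatible `describe_stacks` method).

```python
import time


def wait_for_stack(cfn_client, stack_name, poll_seconds=30, max_polls=120):
    """Poll CloudFormation until the stack leaves its *_IN_PROGRESS states.

    cfn_client is expected to behave like boto3.client("cloudformation").
    Returns the final status, for example CREATE_COMPLETE or ROLLBACK_COMPLETE.
    """
    for _ in range(max_polls):
        response = cfn_client.describe_stacks(StackName=stack_name)
        status = response["Stacks"][0]["StackStatus"]
        if not status.endswith("_IN_PROGRESS"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Stack {stack_name} did not settle in time")


# Example usage (requires boto3 and AWS credentials):
# import boto3
# cfn = boto3.client("cloudformation", region_name="us-east-1")
# print(wait_for_stack(cfn, "kendra-genai-image-search"))
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.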
Ingest and search example images
Complete the following steps to ingest and search your images:
- On the Amazon S3 console, create a folder called images in the kendra-image-search-stack-imagecaptions S3 bucket in the us-east-1 Region.
- Upload the following images to the images folder.
- Navigate to the Amazon Kendra console in the us-east-1 Region.
- In the navigation pane, choose Indexes, then choose your index (kendra-index).
- Choose Data sources, then choose generated_image_captions.
- Choose Sync now.
Wait for the synchronization to be complete before continuing to the next steps.
- In the navigation pane, choose Indexes, then choose kendra-index.
- Navigate to the search console.
- Try the following queries individually or combined: “dog,” “umbrella,” and “newsletter,” and find out which images are ranked high by Amazon Kendra.
Feel free to test your own queries that fit the uploaded images.
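The same queries can also be sent through the SDK. The following is a minimal sketch, assuming a boto3 Amazon Kendra client and the ID of the deployed index (the index ID is not the index name, so you need to look it up first). The helper name `search_captions` is hypothetical.

```python
def search_captions(kendra_client, index_id, query_text, top_k=5):
    """Run a natural language query and return the top results.

    kendra_client is expected to behave like boto3.client("kendra").
    """
    response = kendra_client.query(IndexId=index_id, QueryText=query_text)
    results = []
    for item in response.get("ResultItems", [])[:top_k]:
        results.append(
            {
                "title": item.get("DocumentTitle", {}).get("Text", ""),
                "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
                "confidence": item.get("ScoreAttributes", {}).get("ScoreConfidence", ""),
            }
        )
    return results


# Example usage (requires boto3 and AWS credentials; the index ID is a placeholder):
# import boto3
# kendra = boto3.client("kendra", region_name="us-east-1")
# for hit in search_captions(kendra, "<your-index-id>", "dog umbrella"):
#     print(hit["confidence"], hit["title"], hit["excerpt"])
```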
Clean up
To deprovision all the resources, complete the following steps:
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Select the stack kendra-genai-image-search and choose Delete.
Wait until the stack status changes to DELETE_COMPLETE.
Conclusion
In this post, we saw how Amazon Kendra and GenAI can be combined to automate the creation of meaningful metadata for images. State-of-the-art GenAI models are extremely useful for generating text captions describing the content of an image. This has several industry use cases, including healthcare and life sciences, retail and ecommerce, digital asset platforms, and media. Image captioning is also crucial for building a more inclusive digital world and redesigning the internet, metaverse, and immersive technologies to cater to the needs of people with visual impairments.
Image search through captions makes digital content easily searchable without manual effort for these applications, and removes duplicated effort. The CloudFormation template we provided makes it straightforward to deploy this solution to enable image search using Amazon Kendra. A simple architecture of images stored in Amazon S3 and GenAI to create textual descriptions of the images can be used with CDE in Amazon Kendra to power this solution.
This is only one application of GenAI with Amazon Kendra. To dive deeper into how to build GenAI applications with Amazon Kendra, refer to Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models. For building and scaling GenAI applications, we recommend checking out Amazon Bedrock.
About the Authors
Charalampos Grouzakis is a Data Scientist within AWS Professional Services. He has over 11 years of experience in developing and leading data science, machine learning, and big data initiatives. Currently he is helping enterprise customers modernize their AI/ML workloads within the cloud using industry best practices. Prior to joining AWS, he was consulting customers in various industries such as Automotive, Manufacturing, Telecommunications, Media & Entertainment, Retail and Financial Services. He is passionate about enabling customers to accelerate their AI/ML journey in the cloud and to drive tangible business outcomes.
Bharathi Srinivasan is a Data Scientist at AWS Professional Services where she loves to build cool things on SageMaker. She is passionate about driving business value from machine learning applications, with a focus on ethical AI. Outside of building new AI experiences for customers, Bharathi loves to write science fiction and challenge herself with endurance sports.
Jean-Michel Lourier is a Senior Data Scientist within AWS Professional Services. He leads teams implementing data driven applications side by side with AWS customers to generate business value out of their data. He’s passionate about diving into tech and learning about AI, machine learning, and their business applications. He is also an enthusiastic cyclist, taking long bike-packing trips.
Tanvi Singhal is a Data Scientist within AWS Professional Services. Her skills and areas of expertise include data science, machine learning, and big data. She supports customers in developing machine learning models and MLOps solutions within the cloud. Prior to joining AWS, she was also a consultant in various industries such as Transportation Networking, Retail and Financial Services. She is passionate about enabling customers on their data/AI journey to the cloud.
Abhishek Maligehalli Shivalingaiah is a Senior AI Services Solution Architect at AWS with a focus on Amazon Kendra. He is passionate about building applications using Amazon Kendra, Generative AI, and NLP. He has around 10 years of experience in building Data & AI solutions to create value for customers and enterprises. He has built a (personal) chatbot for fun to answer questions about his career and professional journey. Outside of work he enjoys making portraits of family & friends, and loves creating artworks.
Exploring summarization options for Healthcare with Amazon SageMaker
In today’s rapidly evolving healthcare landscape, doctors are faced with vast amounts of clinical data from various sources, such as caregiver notes, electronic health records, and imaging reports. This wealth of information, while essential for patient care, can also be overwhelming and time-consuming for medical professionals to sift through and analyze. Efficiently summarizing and extracting insights from this data is crucial for better patient care and decision-making. Summarized patient information can be useful to a number of downstream processes like data aggregation, effectively coding patients, or grouping patients with similar diagnoses for review.
Artificial intelligence (AI) and machine learning (ML) models have shown great promise in addressing these challenges. Models can be trained to analyze and interpret large volumes of text data, effectively condensing information into concise summaries. By automating the summarization process, doctors can quickly gain access to relevant information, allowing them to focus on patient care and make more informed decisions. See the following case study to learn more about a real-world use case.
Amazon SageMaker, a fully managed ML service, provides an ideal platform for hosting and implementing various AI/ML-based summarization models and approaches. In this post, we explore different options for implementing summarization techniques on SageMaker, including using Amazon SageMaker JumpStart foundation models, fine-tuning pre-trained models from Hugging Face, and building custom summarization models. We also discuss the pros and cons of each approach, enabling healthcare professionals to choose the most suitable solution for generating concise and accurate summaries of complex clinical data.
Two important terms to know before we begin: pre-trained and fine-tuning. A pre-trained or foundation model is one that has been built and trained on a large corpus of data, typically for general language knowledge. Fine-tuning is the process by which a pre-trained model is given another more domain-specific dataset in order to enhance its performance on a specific task. In a healthcare setting, this would mean giving the model some data including phrases and terminology pertaining specifically to patient care.
Build custom summarization models on SageMaker
Though the most high-effort approach, some organizations might prefer to build custom summarization models on SageMaker from scratch. This approach requires more in-depth knowledge of AI/ML models and may involve creating a model architecture from scratch or adapting existing models to suit specific needs. Building custom models can offer greater flexibility and control over the summarization process, but also requires more time and resources compared to approaches that start from pre-trained models. It’s essential to weigh the benefits and drawbacks of this option carefully before proceeding, because it may not be suitable for all use cases.
SageMaker JumpStart foundation models
A great option for implementing summarization on SageMaker is using JumpStart foundation models. These models, developed by leading AI research organizations, offer a range of pre-trained language models optimized for various tasks, including text summarization. SageMaker JumpStart provides two types of foundation models: proprietary models and open-source models. SageMaker JumpStart also provides HIPAA eligibility, making it useful for healthcare workloads. It is ultimately up to the customer to ensure compliance, so be sure to take the appropriate steps. See Architecting for HIPAA Security and Compliance on Amazon Web Services for more details.
Proprietary foundation models
Proprietary models, such as Jurassic models from AI21 and the Cohere Generate model from Cohere, can be discovered through SageMaker JumpStart on the AWS Management Console and are currently under preview. Utilizing proprietary models for summarization is ideal when you don’t need to fine-tune your model on custom data. This offers an easy-to-use, out-of-the-box solution that can meet your summarization requirements with minimal configuration. By using the capabilities of these pre-trained models, you can save time and resources that would otherwise be spent on training and fine-tuning a custom model. Furthermore, proprietary models typically come with user-friendly APIs and SDKs, streamlining the integration process with your existing systems and applications. If your summarization needs can be met by pre-trained proprietary models without requiring specific customization or fine-tuning, they offer a convenient, cost-effective, and efficient solution for your text summarization tasks. Because these models are not trained specifically for healthcare use cases, quality can’t be guaranteed for medical language out of the box without fine-tuning.
Jurassic-2 Grande Instruct is a large language model (LLM) by AI21 Labs, optimized for natural language instructions and applicable to various language tasks. It offers an easy-to-use API and Python SDK, balancing quality and affordability. Popular uses include generating marketing copy, powering chatbots, and text summarization.
On the SageMaker console, navigate to SageMaker JumpStart, find the AI21 Jurassic-2 Grande Instruct model, and choose Try out model.
If you want to deploy the model to a SageMaker endpoint that you manage, you can follow the steps in this sample notebook, which shows you how to deploy Jurassic-2 Large using SageMaker.
Open-source foundation models
Open-source models include FLAN T5, Bloom, and GPT-2 models that can be discovered through SageMaker JumpStart in the Amazon SageMaker Studio UI, SageMaker JumpStart on the SageMaker console, and SageMaker JumpStart APIs. These models can be fine-tuned and deployed to endpoints under your AWS account, giving you full ownership of model weights and script codes.
Flan-T5 XL is a powerful and versatile model designed for a wide range of language tasks. By fine-tuning the model with your domain-specific data, you can optimize its performance for your particular use case, such as text summarization or any other NLP task. For details on how to fine-tune Flan-T5 XL using the SageMaker Studio UI, refer to Instruction fine-tuning for FLAN T5 XL with Amazon SageMaker Jumpstart.
Fine-tuning pre-trained models with Hugging Face on SageMaker
One of the most popular options for implementing summarization on SageMaker is fine-tuning pre-trained models using the Hugging Face Transformers library. Hugging Face provides a wide range of pre-trained transformer models specifically designed for various natural language processing (NLP) tasks, including text summarization. With the Hugging Face Transformers library, you can easily fine-tune these pre-trained models on your domain-specific data using SageMaker. This approach has several advantages, such as faster training times, better performance on specific domains, and easier model packaging and deployment using built-in SageMaker tools and services. If you’re unable to find a suitable model in SageMaker JumpStart, you can choose any model offered by Hugging Face and fine-tune it using SageMaker.
To start working with a model to learn about the capabilities of ML, all you need to do is open SageMaker Studio, find a pre-trained model you want to use in the Hugging Face Model Hub, and choose SageMaker as your deployment method. Hugging Face will give you the code to copy, paste, and run in your notebook. It’s as easy as that! No ML engineering experience required.
The Hugging Face Transformers library enables builders to operate on the pre-trained models and do advanced tasks like fine-tuning, which we explore in the following sections.
Provision resources
Before we can begin, we need to provision a notebook. For instructions, refer to Steps 1 and 2 in Build and Train a Machine Learning Model Locally. For this example, we used the settings shown in the following screenshot.
We also need to create an Amazon Simple Storage Service (Amazon S3) bucket to store the training data and training artifacts. For instructions, refer to Creating a bucket.
Prepare the dataset
To fine-tune our model to have better domain knowledge, we need to get data suitable for the task. When training for an enterprise use case, you’ll need to go through a number of data engineering tasks to prepare your own data to be ready for training. Those tasks are outside the scope of this post. For this example, we’ve generated some synthetic data to emulate nursing notes and stored it in Amazon S3. Storing our data in Amazon S3 enables us to architect our workloads for HIPAA compliance. We start by getting those notes and loading them on the instance where our notebook is running:
The notes are composed of a column containing the full entry, note, and a column containing a shortened version exemplifying what our desired output should be, summary. The purpose of using this dataset is to improve our model’s biological and medical vocabulary so that it’s more attuned to summarizing in a healthcare context, called domain fine-tuning, and show our model how to structure its summarized output. In some summarization cases, we may want to create an abstract out of an article or a one-line synopsis of a review, but in this case, we’re trying to get our model to output an abbreviated version of the symptoms and actions taken for a patient so far.
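The loading step can be sketched as follows. This is a minimal sketch, not the original listing: it assumes the notes are stored as a CSV object with `note` and `summary` columns, and the helper names, bucket, and key are hypothetical. The sample row is made up for illustration.

```python
import csv
import io


def read_notes_from_s3(s3_client, bucket, key):
    """Download a CSV object and return its text.

    s3_client is expected to behave like boto3.client("s3").
    """
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
    return body.read().decode("utf-8")


def load_notes(csv_text):
    """Parse CSV text into a list of {"note": ..., "summary": ...} records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"note": row["note"], "summary": row["summary"]} for row in reader]


# Illustrative row only; real notes would come from read_notes_from_s3(...).
sample_csv = (
    "note,summary\n"
    '"Patient reports chest pain, nitroglycerin given.","Chest pain, nitroglycerin."\n'
)
records = load_notes(sample_csv)
print(records[0]["summary"])
```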
Load the model
The model we use as our foundation is a version of Google’s Pegasus, made available in the Hugging Face Hub, called pegasus-xsum. It’s already pre-trained for summarization, so our fine-tuning process can focus on extending its domain knowledge. Modifying the task our model runs is a different type of fine-tuning not covered in this post. The Transformers library supplies us with a class to load the model definition from our model_checkpoint: google/pegasus-xsum. This will load the model from the hub and instantiate it in our notebook so we can use it later on. Because pegasus-xsum is a sequence-to-sequence model, we want to use the Seq2Seq type of the AutoModel class:
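A minimal sketch of this step, assuming the `transformers` package is installed. The wrapper function `load_model` is hypothetical; the import is deferred so the snippet can be read or imported without triggering the download.

```python
MODEL_CHECKPOINT = "google/pegasus-xsum"


def load_model(checkpoint=MODEL_CHECKPOINT):
    """Download the checkpoint from the Hugging Face Hub and instantiate it."""
    # Imported lazily: loading transformers and the weights is slow.
    from transformers import AutoModelForSeq2SeqLM

    return AutoModelForSeq2SeqLM.from_pretrained(checkpoint)


# Example usage (downloads the model weights on first call):
# model = load_model()
```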
Now that we have our model, it’s time to put our attention to the other components that will enable us to run our training loop.
Create a tokenizer
The first of these components is the tokenizer. Tokenization is the process by which words from the input data are transformed into numerical representations that our model can understand. Again, the Transformers library provides a class for us to load a tokenizer definition from the same checkpoint we used to instantiate the model:
With this tokenizer object, we can create a preprocessing function and map it onto our dataset to give us tokens ready to be fed into the model. Finally, we format the tokenized output and remove the columns containing our original text, because the model will not be able to interpret them. Now we’re left with a tokenized input ready to be fed into the model. See the following code:
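The preprocessing function can be sketched as follows. This is a sketch under stated assumptions rather than the original listing: the factory name `make_preprocess_fn` and the maximum lengths are hypothetical, and the `text_target` argument requires a reasonably recent version of Transformers.

```python
def make_preprocess_fn(tokenizer, max_input_length=512, max_target_length=64):
    """Build a function that tokenizes a batch of notes and summaries."""

    def preprocess(examples):
        # Tokenize the full notes as model inputs.
        model_inputs = tokenizer(
            examples["note"], max_length=max_input_length, truncation=True
        )
        # Tokenize the reference summaries as labels.
        labels = tokenizer(
            text_target=examples["summary"],
            max_length=max_target_length,
            truncation=True,
        )
        model_inputs["labels"] = labels["input_ids"]
        return model_inputs

    return preprocess


# Example usage with Hugging Face objects (assumes a datasets.Dataset named dataset):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")
# tokenized = dataset.map(
#     make_preprocess_fn(tokenizer), batched=True, remove_columns=["note", "summary"]
# )
# tokenized.set_format("torch")
```

Removing the `note` and `summary` columns in the `map` call drops the original text, leaving only the token IDs, attention mask, and labels.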
With our data tokenized and our model instantiated, we’re almost ready to run a training loop. The next components we want to create are the data collator and the optimizer. The data collator is another class provided by Hugging Face through the Transformers library, which we use to create batches of our tokenized data for training. We can easily build this using the tokenizer and model objects we already have just by finding the corresponding class type we’ve used previously for our model (Seq2Seq) for the collator class. The optimizer’s function is to maintain the training state and update the parameters based on our training loss as we work through the loop. To create an optimizer, we can import the optim package from the torch module, where a number of optimization algorithms are available. Some common ones you may have encountered before are Stochastic Gradient Descent and Adam, the latter of which is applied in our example. Adam’s constructor takes in the model parameters and the parameterized learning rate for the given training run. See the following code:
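A minimal sketch of these two components, assuming `torch` and `transformers` are installed. The function name and the learning rate value are assumptions for illustration, not values taken from the post.

```python
LEARNING_RATE = 2e-5  # assumed value; tune for your own training run


def build_collator_and_optimizer(tokenizer, model, learning_rate=LEARNING_RATE):
    """Create the Seq2Seq data collator and an Adam optimizer for the model."""
    # Imported lazily so the snippet can be inspected without the heavy dependencies.
    from torch.optim import Adam
    from transformers import DataCollatorForSeq2Seq

    # Pads inputs and labels to a common length within each batch.
    data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
    # Adam takes the model parameters and the learning rate.
    optimizer = Adam(model.parameters(), lr=learning_rate)
    return data_collator, optimizer
```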
The last steps before we can begin training are to build the accelerator and the learning rate scheduler. The accelerator comes from a different library (we’ve been primarily using Transformers) produced by Hugging Face, aptly named Accelerate, and will abstract away logic required to manage devices during training (using multiple GPUs for example). For the final component, we revisit the ever-useful Transformers library to implement our learning rate scheduler. By specifying the scheduler type, the total number of training steps in our loop, and the previously created optimizer, the get_scheduler function returns an object that enables us to adjust our initial learning rate throughout the training process:
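To make the scheduler's effect concrete, the following sketch reproduces in plain Python what a linear schedule with warmup computes: a multiplier on the initial learning rate that rises during warmup and then decays to zero. The post does not state which scheduler type was used, so the linear type, the step counts, and the initial learning rate here are assumptions.

```python
def linear_schedule_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the initial learning rate at a given step."""
    if step < num_warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


initial_lr = 2e-5  # assumed value
for step in (0, 50, 100):
    print(step, initial_lr * linear_schedule_factor(step, 0, 100))

# With the Transformers library, a comparable scheduler would be requested as:
# from transformers import get_scheduler
# lr_scheduler = get_scheduler(
#     "linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=100
# )
```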
We’re now fully set up for training! Let’s set up a training job, starting by instantiating the training_args using the Transformers library and choosing parameter values. We can pass these, along with our other prepared components and dataset, directly to the trainer and start training, as shown in the following code. Depending on the size of your dataset and chosen parameters, this may take a significant amount of time.
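A minimal sketch of this setup, assuming the Seq2Seq variants of the Transformers training classes. The hyperparameter values and the output directory name are placeholders, not the values used in the post.

```python
# Assumed hyperparameters; adjust to your dataset and instance size.
TRAINING_CONFIG = {
    "output_dir": "pegasus-finetuned",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "predict_with_generate": True,
}


def build_trainer(model, train_dataset, data_collator, optimizer, lr_scheduler):
    """Assemble a trainer from the components prepared in the previous steps."""
    # Imported lazily so the snippet can be inspected without transformers installed.
    from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

    training_args = Seq2SeqTrainingArguments(**TRAINING_CONFIG)
    return Seq2SeqTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        data_collator=data_collator,
        optimizers=(optimizer, lr_scheduler),
    )


# Example usage:
# trainer = build_trainer(model, tokenized, data_collator, optimizer, lr_scheduler)
# trainer.train()
```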
Package the model for inference
After training has been run, the model object is ready to be used for inference. As a best practice, let’s save our work for future use. We need to create our model artifacts, zip them together, and upload our tarball to Amazon S3 for storage. To prepare our model for zipping, we need to unwrap the now fine-tuned model, then save the model binary and associated config files. We also need to save our tokenizer to the same directory that we saved our model artifacts to so it is available when we use the model for inference. Our model_dir folder should now look something like the following code:
All that’s left is to run a tar command to zip up our directory and upload the tar.gz file to Amazon S3:
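The packaging and upload can also be done from Python instead of the shell. The following is a minimal sketch using the standard library's `tarfile` module and an S3 client passed in by the caller; the function names, bucket, and key are hypothetical. Files are added at the root of the archive, which is the layout SageMaker expects for a model artifact.

```python
import os
import tarfile


def package_model_dir(model_dir, tar_path="model.tar.gz"):
    """Compress the contents of model_dir into a gzip-compressed tarball."""
    with tarfile.open(tar_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # arcname keeps files at the archive root instead of under model_dir/.
            tar.add(os.path.join(model_dir, name), arcname=name)
    return tar_path


def upload_artifact(s3_client, tar_path, bucket, key):
    """Upload the tarball and return its S3 URI.

    s3_client is expected to behave like boto3.client("s3").
    """
    s3_client.upload_file(tar_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Example usage (requires boto3 and AWS credentials; names are placeholders):
# import boto3
# tar_path = package_model_dir("model_dir")
# uri = upload_artifact(boto3.client("s3"), tar_path, "<your-bucket>", "models/model.tar.gz")
```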
Our newly fine-tuned model is now ready and available to be used for inference.
Perform inference
To use this model artifact for inference, open a new file and use the following code, modifying the model_data parameter to fit your artifact save location in Amazon S3. The HuggingFaceModel constructor will rebuild our model from the checkpoint we saved to model.tar.gz, which we can then deploy for inference using the deploy method. Deploying the endpoint will take a few minutes.
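A minimal sketch of the deployment, assuming the SageMaker Python SDK is installed. The framework versions, the `HF_TASK` setting, and the instance type below are assumptions: choose a version combination supported by the Hugging Face containers in your Region and an instance type that fits your model.

```python
def deploy_summarizer(model_data, role, instance_type="ml.g4dn.xlarge"):
    """Create a HuggingFaceModel from a model.tar.gz in S3 and deploy it."""
    # Imported lazily so the snippet can be inspected without the SageMaker SDK.
    from sagemaker.huggingface import HuggingFaceModel

    huggingface_model = HuggingFaceModel(
        model_data=model_data,  # for example s3://<your-bucket>/models/model.tar.gz
        role=role,  # IAM role with SageMaker permissions
        transformers_version="4.26",  # assumed version combination
        pytorch_version="1.13",
        py_version="py39",
        env={"HF_TASK": "summarization"},  # tells the container which pipeline to run
    )
    # Returns a predictor bound to the new real-time endpoint.
    return huggingface_model.deploy(
        initial_instance_count=1, instance_type=instance_type
    )
```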
After the endpoint is deployed, we can use the predictor we’ve created to test it. Pass the predict method a data payload and run the cell, and you’ll get the response from your fine-tuned model:
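The call can be sketched as follows. The payload and response shapes are assumptions based on the default summarization handler in the Hugging Face inference container (an `inputs` field in, a list with a `summary_text` field out); a custom inference script may use a different format. The helper name `summarize` is hypothetical.

```python
def summarize(predictor, note_text):
    """Send one note to the endpoint and return the generated summary."""
    response = predictor.predict({"inputs": note_text})
    # Assumed response shape: [{"summary_text": "..."}]
    return response[0]["summary_text"]


# Example usage with the predictor returned by the deploy method:
# print(summarize(predictor, "Patient is a 45 year old male with complaints of ..."))
```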
To see the benefit of fine-tuning a model, let’s do a quick test. The following table includes a prompt and the results of passing that prompt to the model before and after fine-tuning.
Prompt | Response with No Fine-Tuning | Response with Fine-Tuning |
Summarize the symptoms that the patient is experiencing. Patient is a 45 year old male with complaints of substernal chest pain radiating to the left arm. Pain is sudden onset while he was doing yard work, associated with mild shortness of breath and diaphoresis. On arrival patient’s heart rate was 120, respiratory rate 24, blood pressure 170/95. 12 lead electrocardiogram done on arrival to the emergency department and three sublingual nitroglycerin administered without relief of chest pain. Electrocardiogram shows ST elevation in anterior leads demonstrating acute anterior myocardial infarction. We have contacted cardiac catheterization lab and prepping for cardiac catheterization by cardiologist. | We present a case of acute myocardial infarction. | Chest pain, anterior MI, PCI. |
As you can see, our fine-tuned model uses health terminology differently, and we’ve been able to change the structure of the response to fit our purposes. Note that results are dependent on your dataset and the design choices made during training. Your version of the model could offer very different results.
Clean up
When you’re finished with your SageMaker notebook, be sure to shut it down to avoid costs from long-running resources. Note that shutting down the instance will cause you to lose any data stored in the instance’s ephemeral memory, so you should save all your work to persistent storage before cleanup. You will also need to go to the Endpoints page on the SageMaker console and delete any endpoints deployed for inference. To remove all artifacts, you also need to go to the Amazon S3 console to delete files uploaded to your bucket.
Conclusion
In this post, we explored various options for implementing text summarization techniques on SageMaker to help healthcare professionals efficiently process and extract insights from vast amounts of clinical data. We discussed using SageMaker JumpStart foundation models, fine-tuning pre-trained models from Hugging Face, and building custom summarization models. Each approach has its own advantages and drawbacks, catering to different needs and requirements.
Building custom summarization models on SageMaker allows for lots of flexibility and control but requires more time and resources than using pre-trained models. SageMaker JumpStart foundation models provide an easy-to-use and cost-effective solution for organizations that don’t require specific customization or fine-tuning, as well as some options for simplified fine-tuning. Fine-tuning pre-trained models from Hugging Face offers faster training times, better domain-specific performance, and seamless integration with SageMaker tools and services across a broad catalog of models, but it requires some implementation effort. At the time of writing this post, Amazon has announced another option, Amazon Bedrock, which will offer summarization capabilities in an even more managed environment.
By understanding the pros and cons of each approach, healthcare professionals and organizations can make informed decisions on the most suitable solution for generating concise and accurate summaries of complex clinical data. Ultimately, using AI/ML-based summarization models on SageMaker can significantly enhance patient care and decision-making by enabling medical professionals to quickly access relevant information and focus on providing quality care.
Resources
For the full script discussed in this post and some sample data, refer to the GitHub repo. For more information on how to run ML workloads on AWS, see the following resources:
- Hugging Face on Amazon SageMaker Workshop
- Hugging Face Transformers Amazon SageMaker Examples
- Technology Innovation Institute trains the state-of-the-art Falcon LLM 40B foundation model on Amazon SageMaker
- Training large language models on Amazon SageMaker: Best practices
- How Forethought saves over 66% in costs for generative AI models using Amazon SageMaker
About the authors
Cody Collins is a New York based Solutions Architect at Amazon Web Services. He works with ISV customers to build industry leading solutions in the cloud. He has successfully delivered complex projects for diverse industries, optimizing efficiency and scalability. In his spare time, he enjoys reading, traveling, and training jiu jitsu.
Ameer Hakme is an AWS Solutions Architect residing in Pennsylvania. His professional focus involves collaborating with independent software vendors throughout the Northeast, guiding them in designing and constructing scalable, state-of-the-art platforms on the AWS Cloud.
Unlocking creativity: How generative AI and Amazon SageMaker help businesses produce ad creatives for marketing campaigns with AWS
Advertising agencies can use generative AI and text-to-image foundation models to create innovative ad creatives and content. In this post, we demonstrate how you can generate new images from existing base images using Amazon SageMaker, a fully managed service to build, train, and deploy ML models at scale. With this solution, businesses large and small can develop new custom ad creatives much faster and at lower cost than before.
Solution overview
Consider the following scenario: a global automotive company needs new marketing material generated for their new car design being released and hires a creative agency that is known for providing advertising solutions for clients with strong brand equity. The car manufacturer is looking for low-cost ad creatives that display the model in diverse locations, colors, views, and perspectives while maintaining the brand identity of the car manufacturer. With the power of state-of-the-art techniques, the creative agency can support their customer by using generative AI models within their secure AWS environment.
The solution is developed with generative AI and text-to-image models in Amazon SageMaker. SageMaker is a fully managed machine learning (ML) service that makes it straightforward to build, train, and deploy ML models for any use case with fully managed infrastructure, tools, and workflows. Stable Diffusion is a text-to-image foundation model from Stability AI that powers the image generation process. Diffusers are pre-trained models that use Stable Diffusion to use an existing image to generate new images based on a prompt. Combining Stable Diffusion with Diffusers like ControlNet can take existing brand-specific content and develop stunning versions of it. Key benefits of developing the solution within AWS along with Amazon SageMaker are:
- Privacy – Storing the data in Amazon Simple Storage Service (Amazon S3) and using SageMaker to host models allows you to adhere to security best practices within your AWS account while not exposing assets publicly.
- Scalability – The Stable Diffusion model, when deployed as a SageMaker endpoint, brings scalability by allowing you to configure instance sizes and number of instances. SageMaker endpoints also have auto scaling features and are highly available.
- Flexibility – When creating and deploying endpoints, SageMaker provides the flexibility to choose GPU instance types. Also, instances behind SageMaker endpoints can be changed with minimum effort as business needs change. AWS has also developed hardware and chips using AWS Inferentia2 for high performance at the lowest cost for generative AI inference.
- Rapid innovation – Generative AI is a rapidly evolving domain, with new approaches and models constantly being developed and released. Amazon SageMaker JumpStart regularly onboards new models along with foundation models.
- End-to-end integration – AWS allows you to integrate the creative process with any AWS service and develop an end-to-end process using fine-grained access control through AWS Identity and Access Management (IAM), notification through Amazon Simple Notification Service (Amazon SNS), and postprocessing with the event-driven compute service AWS Lambda.
- Distribution – When the new creatives are generated, AWS allows distributing the content across global channels in multiple Regions using Amazon CloudFront.
For this post, we use the following GitHub sample, which uses Amazon SageMaker Studio with foundation models (Stable Diffusion), prompts, computer vision techniques, and a SageMaker endpoint to generate new images from existing images. The following diagram illustrates the solution architecture.
The workflow contains the following steps:
- We store the existing content (images, brand styles, and so on) securely in S3 buckets.
- Within SageMaker Studio notebooks, the original image data is transformed into monotone intermediate images using computer vision techniques that preserve the shape of the product (the car model) and remove color and background.
- The intermediate image acts as a control image for Stable Diffusion with ControlNet.
- We deploy a SageMaker endpoint with the Stable Diffusion text-to-image foundation model from SageMaker JumpStart and ControlNet on a preferred GPU-based instance size.
- Prompts describing new backgrounds and car colors along with the intermediate monotone image are used to invoke the SageMaker endpoint, yielding new images.
- New images are stored in S3 buckets as they’re generated.
Deploy ControlNet on SageMaker endpoints
To deploy the model to SageMaker endpoints, we must create a compressed file for each individual technique model artifact along with the Stable Diffusion weights, inference script, and NVIDIA Triton config file.
In the following code, we download the model weights for the different ControlNet techniques and Stable Diffusion 1.5 to the local directory as tar.gz files:
To create the model pipeline, we define an inference.py script that SageMaker real-time endpoints will use to load and host the Stable Diffusion and ControlNet tar.gz files. The following is a snippet from inference.py that shows how the models are loaded and how the Canny technique is called:
We deploy the SageMaker endpoint with the required instance size (GPU type) from the model URI:
Generate new images
Now that the endpoint is deployed on SageMaker endpoints, we can pass in our prompts and the original image we want to use as our baseline.
To define the prompt, we create a positive prompt, p_p, for what we’re looking for in the new image, and the negative prompt, n_p, for what is to be avoided:
Finally, we invoke our endpoint with the prompt and source image to generate our new image:
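The prompt definition and the invocation can be sketched as follows. This is a sketch under stated assumptions, not the original listing: the payload field names, the negative prompt text, and the helper names are hypothetical and must match whatever your inference.py expects. The positive prompt is the one used for the canny technique in this post, and the runtime client is expected to be a boto3 SageMaker Runtime client.

```python
import base64
import json

# Positive prompt: what we want in the new image.
p_p = (
    "metal orange colored car, complete car, colour photo, "
    "outdoors in a pleasant landscape, realistic, high quality"
)
# Negative prompt: what to avoid (illustrative wording only).
n_p = "cartoon, low quality, blurry, distorted"


def build_payload(image_bytes, positive_prompt, negative_prompt, technique="canny"):
    """Bundle the prompts and the base64-encoded source image into one request."""
    return {
        "prompt": positive_prompt,
        "negative_prompt": negative_prompt,
        "technique": technique,
        "image": base64.b64encode(image_bytes).decode("utf-8"),
    }


def generate_image(runtime_client, endpoint_name, payload):
    """Invoke the endpoint and return the decoded JSON response.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())


# Example usage (requires boto3, AWS credentials, and a deployed endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# with open("car.png", "rb") as fh:
#     result = generate_image(runtime, "<your-endpoint>", build_payload(fh.read(), p_p, n_p))
```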
Different ControlNet techniques
In this section, we compare the different ControlNet techniques and their effect on the resulting image. We use the following original image to generate new content using Stable Diffusion with ControlNet in Amazon SageMaker.
The following table shows how the technique output dictates what, from the original image, to focus on.
Technique Name | Technique Type | Technique Output | Prompt | Stable Diffusion with ControlNet |
canny | A monochrome image with white edges on a black background. | (image) | metal orange colored car, complete car, colour photo, outdoors in a pleasant landscape, realistic, high quality | (image) |
depth | A grayscale image with black representing deep areas and white representing shallow areas. | (image) | metal red colored car, complete car, colour photo, outdoors in pleasant landscape on beach, realistic, high quality | (image) |
hed | A monochrome image with white soft edges on a black background. | (image) | metal white colored car, complete car, colour photo, in a city, at night, realistic, high quality | (image) |
scribble | A hand-drawn monochrome image with white outlines on a black background. | (image) | metal blue colored car, similar to original car, complete car, colour photo, outdoors, breath-taking view, realistic, high quality, different viewpoint | (image) |
Clean up
After you generate new ad creatives with generative AI, clean up any resources that won’t be used. Delete the data in Amazon S3 and stop any SageMaker Studio notebook instances to avoid incurring further charges. If you used SageMaker JumpStart to deploy Stable Diffusion as a SageMaker real-time endpoint, delete the endpoint either through the SageMaker console or SageMaker Studio.
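The endpoint can also be removed programmatically. The following sketch uses the boto3 SageMaker client to delete the endpoint together with its endpoint configuration and model; the helper name `delete_endpoint_resources` is hypothetical, and the S3 data and Studio resources still need to be cleaned up separately.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint plus the endpoint config and models behind it."""
    # Look up the dependent resources before anything is deleted.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"]]

    # Deleting the endpoint is what stops the instance charges.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return model_names


# Example usage (requires AWS credentials; the name is a placeholder):
# import boto3
# delete_endpoint_resources(boto3.client("sagemaker"), "controlnet-demo")
```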
Conclusion
In this post, we used foundation models on SageMaker to create new content images from existing images stored in Amazon S3. With these techniques, marketing, advertisement, and other creative agencies can use generative AI tools to augment their ad creatives process. To dive deeper into the solution and code shown in this demo, check out the GitHub repo.
Also, refer to Amazon Bedrock for use cases on generative AI, foundation models, and text-to-image models.
About the Authors
Sovik Kumar Nath is an AI/ML solution architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. Sovik has published articles and holds a patent in ML model monitoring. He holds two master's degrees, from the University of South Florida and the University of Fribourg, Switzerland, and a bachelor's degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.
Sandeep Verma is a Sr. Prototyping Architect with AWS. He enjoys diving deep into customer challenges and building prototypes for customers to accelerate innovation. He has a background in AI/ML, is the founder of New Knowledge, and is generally passionate about tech. In his free time, he loves traveling and skiing with his family.
Uchenna Egbe is an Associate Solutions Architect at AWS. He spends his free time researching herbs, teas, and superfoods, and how to incorporate them into his daily diet.
Mani Khanuja is an Artificial Intelligence and Machine Learning Specialist SA at Amazon Web Services (AWS). She helps customers use machine learning to solve their business challenges on AWS. She spends most of her time diving deep and teaching customers on AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. She is passionate about ML at the edge, so she has created her own lab with a self-driving kit and a prototype manufacturing production line, where she spends a lot of her free time.