Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments

To deploy machine learning models on-device, practitioners use compression algorithms to shrink and speed up models while maintaining their high-quality output. A critical aspect of compression in practice is model comparison, including tracking many compression experiments, identifying subtle changes in model behavior, and negotiating complex accuracy-efficiency trade-offs. However, existing compression tools poorly support comparison, leading to tedious and, sometimes, incomplete analyses spread across disjoint tools. To support real-world comparative workflows, we… (Apple Machine Learning Research)

GenAI for Aerospace: Empowering the workforce with expert knowledge on Amazon Q and Amazon Bedrock

Aerospace companies face a generational workforce challenge today. With the strong post-COVID recovery, manufacturers are committing to record production rates, requiring the sharing of highly specialized domain knowledge across more workers. At the same time, maintaining the headcount and experience level of the workforce is increasingly challenging, as a generation of subject matter experts (SMEs) retires and increased fluidity characterizes the post-COVID labor market. This domain knowledge is traditionally captured in reference manuals, service bulletins, quality ticketing systems, engineering drawings, and more, but the quantity and complexity of these documents are growing, and they take time to learn. You simply can’t train new SMEs overnight. Without a mechanism to manage this knowledge transfer gap, productivity across all phases of the lifecycle might suffer from losing expert knowledge and repeating past mistakes.

Generative AI is a modern form of machine learning (ML) that has recently shown significant gains in reasoning, content comprehension, and human interaction. It can be a significant force multiplier to help the human workforce quickly digest, summarize, and answer complex questions from large technical document libraries, accelerating your workforce development. AWS is uniquely positioned to help you address these challenges through generative AI, with a broad and deep range of AI/ML services and over 20 years of experience in developing AI/ML technologies.

This post shows how aerospace customers can use AWS generative AI and ML-based services to address this document-based knowledge use case, using a Q&A chatbot to provide expert-level guidance to technical staff based on large libraries of technical documents. We focus on the use of two AWS services:

  • Amazon Q can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take actions using the data and expertise found in your company’s information repositories, code, and enterprise systems.
  • Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Although Amazon Q is a great way to get started with no code for business users, Amazon Bedrock Knowledge Bases offers more flexibility at the API level for generative AI developers; we explore both these solutions in the following sections. But first, let’s revisit some basic concepts around Retrieval Augmented Generation (RAG) applications.

Generative AI constraints and RAG

Although generative AI holds great promise for automating complex tasks, our aerospace customers often express concerns about the use of the technology in such a safety- and security-sensitive industry. They ask questions such as:

  • “How do I keep my generative AI applications secure?”
  • “How do I make sure my business-critical data isn’t used to train proprietary models?”
  • “How do I know that answers are accurate and only drawn from authoritative sources?” (Avoiding the well-known problem of hallucination.)
  • “How can I trace the reasoning of my model back to source documents to build user trust?”
  • “How do I keep my generative AI applications up to date with an ever-evolving knowledge base?”

In many generative AI applications built on proprietary technical document libraries, these concerns can be addressed by using the RAG architecture. RAG helps maintain the accuracy of responses, keeps up with the rapid pace of document updates, and provides traceable reasoning while keeping your proprietary data private and secure.

This architecture combines a general-purpose large language model (LLM) with a customer-specific document database, which is accessed through a semantic search engine. Rather than fine-tuning the LLM to the specific application, the document library is loaded with the relevant reference material for that application. In RAG, these knowledge sources are often referred to as a knowledge base.

A high-level RAG architecture is shown in the following figure. The workflow includes the following steps:

  1. When the technician has a question, they enter it at the chat prompt.
  2. The technician’s question is used to search the knowledge base.
  3. The search results include a ranked list of most relevant source documentation.
  4. Those documentation snippets are added to the original query as context, and sent to the LLM as a combined prompt.
  5. The LLM returns the answer to the question, as synthesized from the source material in the prompt.

Because RAG uses semantic search, it can find more relevant material in the database than a keyword match alone. For more details on the operation of RAG systems, refer to Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart.
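
To make steps 3-5 concrete, the following minimal Python sketch shows the prompt-assembly pattern: retrieved snippets are appended to the user's question as context before the combined prompt is sent to the LLM. The search and LLM functions are placeholders for whatever retriever and model you use; this illustrates the pattern only and is not the internal implementation of Amazon Q or Amazon Bedrock.

def answer_with_rag(question, search_knowledge_base, call_llm, top_k=3):
    # Steps 2-3: semantic search returns the most relevant document snippets
    snippets = search_knowledge_base(question, top_k=top_k)
    # Step 4: add the snippets to the original question as context
    context = "\n\n".join(f"[{i + 1}] {s['text']}" for i, s in enumerate(snippets))
    prompt = (
        "Answer the question using only the sources below, and cite the sources you use.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # Step 5: the LLM synthesizes an answer grounded in the retrieved context
    return call_llm(prompt)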

RAG architecture

This architecture addresses the concerns listed earlier in a few key ways:

  • The underlying LLM doesn’t require custom training because the domain-specialized knowledge is contained in a separate knowledge base. As a result, the RAG-based system can be kept up to date, or retrained to completely new domains, simply by changing the documents in the knowledge base. This mitigates the significant cost typically associated with training custom LLMs.
  • Because of the document-based prompting, generative AI answers can be constrained to come only from trusted document sources, and they provide direct attribution back to those source documents for verification.
  • RAG-based systems can securely manage access to different knowledge bases by role-based access control. Proprietary knowledge in generative AI remains private and protected in those knowledge bases.

AWS provides customers in aerospace and other high-tech domains the tools they need to rapidly build and securely deploy generative AI solutions at scale, with world-class security. Let’s look at how you can use Amazon Q and Amazon Bedrock to build RAG-based solutions in two different use cases.

Use case 1: Create a chatbot “expert” for technicians with Amazon Q

Aerospace is a high-touch industry, and technicians are the front line of that workforce. Technician work appears at every lifecycle stage of the aircraft and its components: engineering prototyping, qualification testing, manufacturing, quality inspection, maintenance, and repair. Technician work is demanding and highly specialized; it requires detailed knowledge of highly technical documentation to make sure products meet safety, functional, and cost requirements. Knowledge management is a high priority for many companies seeking to spread domain knowledge from experts to junior employees to offset attrition, scale production capacity, and improve quality.

Our customers frequently ask us how they can use customized chatbots built on customized generative AI models to automate access to this information and help technicians make better-informed decisions and accelerate their development. The RAG architecture shown in this post is an excellent solution to this use case because it allows companies to quickly deploy domain-specialized generative AI chatbots built securely on their own proprietary documentation. Amazon Q can deploy fully managed, scalable RAG systems tailored to address a wide range of business problems. It provides immediate, relevant information and advice to help streamline tasks, accelerate decision-making, and help spark creativity and innovation at work. It can automatically connect to over 40 different data sources, including Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, Atlassian Confluence, Slack, and Jira Cloud.

Let’s look at an example of how you can quickly deploy a generative AI-based chatbot “expert” using Amazon Q.

  1. Sign in to the Amazon Q console.

If you haven’t used Amazon Q before, you might be greeted with a request for initial configuration.

  2. Under Connect Amazon Q to IAM Identity Center, choose Create account instance to create a custom credential set for this demo.
  3. Under Select a bundle to get started, under Amazon Q Business Lite, choose Subscribe in Q Business to create a test subscription.

If you have previously used Amazon Q in this account, you can simply reuse an existing user or subscription for this walkthrough.

Amazon Q subscription

  4. After you create your IAM Identity Center instance and Amazon Q subscription, choose Get started on the Amazon Q landing page.

Amazon Q getting started

  5. Choose Create application.
  6. For Application name, enter a name (for example, my-tech-assistant).
  7. Under Service access, select Create and use a new service-linked role (SLR).
  8. Choose Create.

This creates the application framework.

Amazon Q create app

  9. Under Retrievers, select Use native retriever.
  10. Under Index provisioning, select Starter for a basic, low-cost retriever.
  11. Choose Next.

Amazon Q indexer / retriever

Next, we need to configure a data source. For this example, we use Amazon S3 and assume that you have already created a bucket and uploaded documents to it (for more information, see Step 1: Create your first S3 bucket). For this example, we have uploaded some public domain documents from the Federal Aviation Administration (FAA) technical library relating to software, system standards, instrument flight rating, aircraft construction and maintenance, and more.
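
If you still need to load documents into Amazon S3, a minimal boto3 sketch looks like the following. The bucket name, local file name, and key prefix are placeholders; substitute your own values.

import boto3

s3 = boto3.client("s3")
bucket = "my-faa-docs-bucket"  # placeholder: the bucket you created earlier

# Upload a local document so the Amazon Q data source can index it
s3.upload_file("faa-std-063.pdf", bucket, "faa-docs/faa-std-063.pdf")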

  12. For Data sources, choose Amazon S3 to point our RAG assistant to this S3 bucket.

Amazon Q data source

  13. For Data source name, enter a name for your data source (independent of the S3 bucket name, such as my-faa-docs).
  14. Under IAM role, choose Create new service role (Recommended).
  15. Under Sync scope, choose the S3 bucket where you uploaded your documents.
  16. Under Sync run schedule, choose Run on demand (or another option, if you want your documents to be re-indexed on a set schedule).
  17. Choose Add data source.
  18. Leave the remaining settings as default and choose Next to finish adding your Amazon S3 data source.

Amazon Q S3 source

Finally, we need to create user access permissions to our chatbot.

  19. Under Add groups and users, choose Add groups and users.
  20. In the popup that appears, you can choose to either create new users or select existing ones. If you want to use an existing user, you can skip the following steps:
    • Select Add new users, then choose Next.
    • Enter the new user information, including a valid email address.

An email will be sent to that address with a link to validate that user.

  21. Now that you have a user, select Assign existing users and groups and choose Next.
  22. Choose your user, then choose Assign.

Amazon Q add user

You should now have a user assigned to your new chatbot application.

  23. Under Web experience service access, select Create and use a new service role.
  24. Choose Create application.

Amazon Q create app

You now have a new generative AI application! Before the chatbot can answer your questions, you have to run the indexer on your documents at least one time.

  25. On the Applications page, choose your application.

Amazon Q select app

  26. Select your data source and choose Sync now.

The synchronization process takes a few minutes to complete.

  27. When the sync is complete, on the Web experience settings tab, choose the link under Deployed URL.

If you haven’t logged in yet, you will be prompted to do so using the user credentials you created; use the email address as the user name.

Your chatbot is now ready to answer technical questions on the large library of documents you provided. Try it out! You’ll notice that for each answer, the chatbot provides a Sources option that indicates the authoritative reference from which it drew its answer.

Amazon Q chat

Our fully customized chatbot required no coding, no custom data schemas, and no management of underlying infrastructure to scale! Amazon Q fully manages the infrastructure required to securely deploy your technician’s assistant at scale.

Use case 2: Use Amazon Bedrock Knowledge Bases

As we demonstrated in the previous use case, Amazon Q fully manages the end-to-end RAG workflow and allows business users to get started quickly. But what if you need more granular control of parameters related to the vector database, chunking, retrieval, and models used to generate final answers? Amazon Bedrock Knowledge Bases allows generative AI developers to build and interact with proprietary document libraries for accurate and efficient Q&A over documents. In this example, we use the same FAA documents as before, but this time we set up the RAG solution using Amazon Bedrock Knowledge Bases. We demonstrate how to do this using both APIs and the Amazon Bedrock console. The full notebook for following the API-based approach can be downloaded from the GitHub repo.

The following diagram illustrates the architecture of this solution.

Amazon Bedrock Knowledge Bases

Create your knowledge base using the API

To implement the solution using the API, complete the following steps:

  1. Create a role with the necessary policies to access data from Amazon S3 and write embeddings to Amazon OpenSearch Serverless. This role will be used by the knowledge base to retrieve relevant chunks from OpenSearch Serverless based on the input query.
# Create security, network and data access policies within OSS
encryption_policy, network_policy, access_policy = create_policies_in_oss(vector_store_name=vector_store_name,
    aoss_client=aoss_client, bedrock_kb_execution_role_arn=bedrock_kb_execution_role_arn)
  2. Create an empty OpenSearch Serverless index to store the document embeddings and metadata. OpenSearch Serverless is a fully managed option that allows you to run petabyte-scale workloads without managing clusters.
# Create the OpenSearch Serverless collection
collection = aoss_client.create_collection(name=vector_store_name, type='VECTORSEARCH')

# Create the index within the collection
response = oss_client.indices.create(index=index_name, body=json.dumps(body_json))
print('Creating index:')
pp.pprint(response)
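
The body_json passed to indices.create isn't shown above; the following is a rough sketch of what a vector index body can look like. The field names, vector dimension, and engine settings are assumptions for illustration and must match your embeddings model and the field mapping you specify when creating the knowledge base.

# Illustrative index body: a 1024-dimension vector field (matching Titan Text
# Embeddings V2) plus text and metadata fields used by the knowledge base
body_json = {
    "settings": {"index.knn": "true", "number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "vector": {
                "type": "knn_vector",
                "dimension": 1024,
                "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
            },
            "text": {"type": "text"},
            "text-metadata": {"type": "text"},
        }
    },
}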
  3. With the OpenSearch Serverless index set up, you can now create the knowledge base and associate it with a data source containing our documents. For brevity, we haven’t included the full code; to run this example end-to-end, refer to the GitHub repo.
# Initialize OSS configuration for the Knowledge Base
opensearchServerlessConfiguration = { ... }

# Set chunking strategy for how to split documents
chunkingStrategyConfiguration = { ... }

# Configure S3 data source
s3Configuration = { ... }

# Set embedding model ARN
embeddingModelArn = f"arn:aws:bedrock:{region}::foundation-model/amazon.titan-embed-text-v2:0"

# Create the Knowledge Base
kb = create_knowledge_base_func()

# Create a data source and associate it with the KB
ds = bedrock_agent_client.create_data_source(...)

# Start ingestion job to load data into OSS
start_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb['knowledgeBaseId'], dataSourceId=ds["dataSourceId"])

The ingestion job will fetch documents from the Amazon S3 data source, preprocess and chunk the text, create embeddings for each chunk, and store them in the OpenSearch Serverless index.
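
For reference, the configurations elided with { ... } in the previous snippet follow the general shapes below. Treat the values as a sketch: adjust the ARNs, field names, and chunking parameters to your environment, and keep the field mapping consistent with the OpenSearch Serverless index you created.

# OpenSearch Serverless storage configuration (field names must match the index)
opensearchServerlessConfiguration = {
    "collectionArn": collection["createCollectionDetail"]["arn"],
    "vectorIndexName": index_name,
    "fieldMapping": {
        "vectorField": "vector",
        "textField": "text",
        "metadataField": "text-metadata",
    },
}

# Fixed-size chunking: roughly 512-token chunks with 20% overlap (illustrative values)
chunkingStrategyConfiguration = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {"maxTokens": 512, "overlapPercentage": 20},
}

# S3 bucket holding the FAA documents
s3Configuration = {"bucketArn": f"arn:aws:s3:::{bucket_name}"}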

  4. With the knowledge base populated, you can now query it using the RetrieveAndGenerate API and get responses generated by LLMs like Anthropic’s Claude on Amazon Bedrock:
# Helper function to query the knowledge base
def ask_bedrock_llm_with_knowledge_base(query, model_arn, kb_id):
    response = bedrock_agent_runtime_client.retrieve_and_generate(...)
    generated_text = response['output']['text']
    return generated_text

# Example query
query = "How are namespaces registered with the FAA for service providers?"

# Query knowledge base with different Claude models
for model_id in claude_model_ids:
    model_arn = f'arn:aws:bedrock:{region}::foundation-model/{model_id[1]}'
    generated_text = ask_bedrock_llm_with_knowledge_base(query, model_arn, kb_id)
    print(f"Generated using {model_id[0]}:")
    pp.pprint(generated_text)
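
For completeness, the retrieve_and_generate call elided in the helper function takes roughly the following form; this is a sketch of the Amazon Bedrock Agents runtime API call, with the knowledge base ID and model ARN supplied by your own setup. The response also carries citations that you can surface for source attribution.

response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={"text": query},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": model_arn,
        },
    },
)
print(response["output"]["text"])  # generated answer
print(response["citations"])       # retrieved references backing the answer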

The RetrieveAndGenerate API converts the query into an embedding, searches the knowledge base for relevant document chunks, and generates a response by providing the retrieved context to the specified language model. We asked the question “How are namespaces registered with the FAA for service providers?” Anthropic’s Claude 3 Sonnet uses the chunks retrieved from our OpenSearch vector index to answer as follows:

To register a namespace with the FAA as a service provider, you need to follow these steps:

  1. Develop the namespaces metadata according to FAA-STD-063 and submit it for registration in the FAA Data Registry (FDR).
  2. The FDR registrar will perform the namespace registration function. The specific process for developing and registering a namespace in the FDR involves:
  3. Searching the FDR for an existing namespace that matches your business domain. If none exists, work with the FDR registrar to create a new one.
  4. Create and document the new namespace according to FAA-STD-063, following the guidelines for organization, web service, or taxonomy namespaces.
  5. Register the namespace in the FDR by either filling out a registration form and submitting it to the FDR registrar, or requesting access to enter the metadata directly into the FDR.

Create your knowledge base on the Amazon Bedrock console

If you prefer, you can build the same solution in Amazon Bedrock Knowledge Bases using the Amazon Bedrock console instead of the API-based implementation shown in the previous section. Complete the following steps:

  1. Sign in to your AWS account.
  2. On the Amazon Bedrock console, choose Get started.

Amazon Bedrock getting started

As a first step, you need to set up your permissions to use the various LLMs in Amazon Bedrock.

  3. Choose Model access in the navigation pane.
  4. Choose Modify model access.

Amazon Bedrock model access

  5. Select the LLMs to enable.
  6. Choose Next, then choose Submit to complete your access request.

You should now have access to the models you requested.

Amazon Bedrock model select

Now you can set up your knowledge base.

  7. Choose Knowledge bases under Builder tools in the navigation pane.
  8. Choose Create knowledge base.

Amazon Bedrock create Knowledge Base

  9. On the Provide knowledge base details page, keep the default settings and choose Next.
  10. For Data source name, enter a name for your data source or keep the default.
  11. For S3 URI, choose the S3 bucket where you uploaded your documents.
  12. Choose Next.

Amazon Bedrock Knowledge Base details

  13. Under Embeddings model, choose the embeddings LLM to use (for this post, we choose Titan Text Embeddings).
  14. Under Vector database, select Quick create a new vector store.

This option uses OpenSearch Serverless as the vector store.

  15. Choose Next.

Amazon Bedrock embeddings

  16. Choose Create knowledge base to finish the process.

Your knowledge base is now set up! Before interacting with the chatbot, you need to index your documents. Make sure you have already loaded the desired source documents into your S3 bucket; for this walkthrough, we use the same public-domain FAA library referenced in the previous section.

  17. Under Data source, select the data source you created, then choose Sync.
  18. When the sync is complete, choose Select model in the Test knowledge base pane, and choose the model you want to try (for this post, we use Anthropic Claude 3 Sonnet, but Amazon Bedrock gives you the flexibility to experiment with many other models).

Amazon Bedrock data source

Your technician’s assistant is now set up! You can experiment with it using the chat window in the Test knowledge base pane. Experiment with different LLMs and see how they perform. Amazon Bedrock provides a simple API-based framework to experiment with different models and RAG components so you can tune them to help meet your requirements in production workloads.

Amazon Bedrock chat

Clean up

When you’re done experimenting with the assistant, complete the following steps to clean up your created resources to avoid ongoing charges to your account:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Select the application you created, and on the Actions menu, choose Delete.
  3. On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
  4. Select the knowledge base you created, then choose Delete.

Conclusion

This post showed how quickly you can launch generative AI-enabled expert chatbots, built on your proprietary document sets, to empower your workforce across specific aerospace roles with Amazon Q and Amazon Bedrock. After you have taken these basic steps, more work will be needed to solidify these solutions for production. Future editions in this “GenAI for Aerospace” series will explore follow-up topics, such as creating additional security controls and tuning performance for different content.

Generative AI is changing the way companies address some of their largest challenges. For our aerospace customers, generative AI can help with many of the scaling challenges that come from ramping production rates and scaling workforce skills to match. This post showed how you can apply this technology to expert knowledge challenges in various functions of aerospace development today. The RAG architecture shown can help meet key requirements for aerospace customers: maintaining privacy of data and custom models, minimizing hallucinations, customizing models with private and authoritative reference documents, and direct attribution of answers back to those reference documents. There are many other aerospace applications where generative AI can be applied: non-conformance tracking, business forecasting, bid and proposal management, engineering design and simulation, and more. We examine some of these use cases in future posts.

AWS provides a broad range of AI/ML services to help you develop generative AI solutions for these use cases and more. This includes newly announced services like Amazon Q, which provides fast, relevant answers to pressing business questions drawn from enterprise data sources, with no coding required, and Amazon Bedrock, which provides quick API-level access to a wide range of LLMs, with knowledge base management for your proprietary document libraries and direct integration to external workflows through agents. AWS also offers competitive price-performance for AI workloads, running on purpose-built silicon—the AWS Trainium and AWS Inferentia processors—so you can run your generative AI services in a cost-effective, scalable, simple-to-manage way. Get started on addressing your toughest business challenges with generative AI on AWS today!

For more information on working with generative AI and RAG on AWS, refer to Generative AI. For more details on building an aerospace technician’s assistant with AWS generative AI services, refer to Guidance for Aerospace Technician’s Assistant on AWS.


About the authors

Peter Bellows is a Principal Solutions Architect and Head of Technology for Commercial Aviation in the Worldwide Specialist Organization (WWSO) at Amazon Web Services (AWS). He leads technical development for solutions across aerospace domains, including manufacturing, engineering, operations, and security. Prior to AWS, he worked in aerospace engineering for 20+ years.

Shreyas Subramanian is a Principal Data Scientist who helps customers solve their business challenges with machine learning on the AWS platform. Shreyas has a background in large-scale optimization and machine learning, and in the use of machine learning and reinforcement learning to accelerate optimization tasks.

Priyanka Mahankali is a Senior Specialist Solutions Architect for Aerospace at AWS, bringing over 7 years of experience across the cloud and aerospace sectors. She is dedicated to streamlining the journey from innovative industry ideas to cloud-based implementations.

Scalable training platform with Amazon SageMaker HyperPod for innovation: a video generation case study

Video generation has become the latest frontier in AI research, following the success of text-to-image models. Luma AI’s recently launched Dream Machine represents a significant advancement in this field. This text-to-video API generates high-quality, realistic videos quickly from text and images. Trained on Amazon SageMaker HyperPod, Dream Machine excels in creating consistent characters, smooth motion, and dynamic camera movements.

To accelerate iteration and innovation in this field, sufficient computing resources and a scalable platform are essential. During the iterative research and development phase, data scientists and researchers need to run multiple experiments with different versions of algorithms and scale to larger models. Model parallel training becomes necessary when the total model footprint (model weights, gradients, and optimizer states) exceeds the memory of a single GPU. However, building large distributed training clusters is a complex and time-intensive process that requires in-depth expertise. Furthermore, as clusters scale to larger sizes (for example, more than 32 nodes), they require built-in resiliency mechanisms such as automated faulty node detection and replacement to improve cluster goodput and maintain efficient operations. These challenges underscore the importance of robust infrastructure and management systems in supporting advanced AI research and development.

Amazon SageMaker HyperPod, introduced during re:Invent 2023, is a purpose-built infrastructure designed to address the challenges of large-scale training. It removes the undifferentiated heavy lifting involved in building and optimizing machine learning (ML) infrastructure for training foundation models (FMs). SageMaker HyperPod offers a highly customizable user interface using Slurm, allowing users to select and install any required frameworks or tools. Clusters are provisioned with the instance type and count of your choice and can be retained across workloads. With these capabilities, customers are adopting SageMaker HyperPod as their innovation platform for more resilient and performant model training, enabling them to build state-of-the-art models faster.

In this post, we share an ML infrastructure architecture that uses SageMaker HyperPod to support research team innovation in video generation. We will discuss the advantages and pain points addressed by SageMaker HyperPod, provide a step-by-step setup guide, and demonstrate how to run a video generation algorithm on the cluster.

Training video generation algorithms on Amazon SageMaker HyperPod: background and architecture

Video generation is an exciting and rapidly evolving field that has seen significant advancements in recent years. While generative modeling has made tremendous progress in the domain of image generation, video generation still faces several challenges that require further improvement.

Algorithm architecture complexity with the diffusion model family

Diffusion models have recently made significant strides in generating high-quality images, prompting researchers to explore their potential in video generation. By leveraging the architecture and pre-trained generative capabilities of diffusion models, scientists aim to create visually impressive videos. The process extends image generation techniques to the temporal domain. Starting with noisy frames, the model iteratively refines them, removing random elements while adding meaningful details guided by text or image prompts. This approach progressively transforms abstract patterns into coherent video sequences, effectively translating diffusion models’ success in static image creation to dynamic video synthesis.

However, the compute requirements for video generation using diffusion models increase substantially compared to image generation for several reasons:

  1. Temporal dimension – Unlike image generation, video generation requires processing multiple frames simultaneously. This adds a temporal dimension to the original 2D UNet, significantly increasing the amount of data that needs to be processed in parallel.
  2. Iterative denoising process – The diffusion process involves multiple iterations of denoising for each frame. When extended to videos, this iterative process must be applied to multiple frames, multiplying the computational load.
  3. Increased parameter count – To handle the additional complexity of video data, models often require more parameters, leading to larger memory footprints and increased computational demands.
  4. Higher resolution and longer sequences – Video generation often aims for higher resolution outputs and longer sequences compared to single image generation, further amplifying the computational requirements.

Due to these factors, the operational efficiency of diffusion models for video generation is lower and significantly more compute-intensive compared to image generation. This increased computational demand underscores the need for advanced hardware solutions and optimized model architectures to make video generation more practical and accessible.

Handling the increased computational requirements

The improvement in video generation quality necessitates a significant increase in the size of the models and training data. Researchers have concluded that scaling up the base model size leads to substantial enhancements in video generation performance. However, this growth comes with considerable challenges in terms of computing power and memory resources. Training larger models requires more computational power and memory space, which can limit the accessibility and practical use of these models. As the model size increases, the computational and memory requirements grow rapidly, making it difficult to train these models on a single GPU, or even in a single-node, multi-GPU environment. Moreover, storing and manipulating the large datasets required for training also pose significant challenges in terms of infrastructure and costs. High-quality video datasets tend to be massive, requiring substantial storage capacity and efficient data management systems. Transferring and processing these datasets can be time-consuming and resource-intensive, adding to the overall computational burden.

Maintaining temporal consistency and continuity

Maintaining temporal consistency and continuity becomes increasingly challenging as the length of the generated video increases. Temporal consistency refers to the continuity of visual elements, such as objects, characters, and scenes, across subsequent frames. Inconsistencies in appearance, movement, or lighting can lead to jarring visual artifacts and disrupt the overall viewing experience. To address this challenge, researchers have explored the use of multiframe inputs, which provide the model with information from multiple consecutive frames to better understand and model the relationships and dependencies across time. These techniques preserve high-resolution details in visual quality while simulating a continuous and smooth temporal motion process. However, they require more sophisticated modeling techniques and increased computational resources.

Algorithm overview

In the following sections, we illustrate how to run the Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation algorithm on Amazon SageMaker HyperPod for video generation. Animate Anyone is one of the methods for transforming character images into animated videos controlled by desired pose sequences. The key components of the architecture include:

  1. ReferenceNet – A symmetrical UNet structure that captures spatial details of the reference image and integrates them into the denoising UNet using spatial-attention to preserve appearance consistency
  2. Pose guider – A lightweight module that efficiently integrates pose control signals into the denoising process to ensure pose controllability
  3. Temporal layer – Added to the denoising UNet to model relationships across multiple frames, preserving high-resolution details and ensuring temporal stability and continuity of the character’s motion

The model architecture is illustrated in the following image from its original research paper. The method is trained on a dataset of video clips and achieves state-of-the-art results on fashion video and human dance synthesis benchmarks, demonstrating its ability to animate arbitrary characters while maintaining appearance consistency and temporal stability. The implementation of AnimateAnyone can be found in this repository.

To address the challenges of large-scale training infrastructure required in video generation training process, we can use the power of Amazon SageMaker HyperPod. While many customers have adopted SageMaker HyperPod for large-scale training, such as Luma’s launch of Dream Machine and Stability AI’s work on FMs for image or video generation, we believe that the capabilities of SageMaker HyperPod can also benefit lighter ML workloads, including full fine-tuning.

Amazon SageMaker HyperPod concept and advantage

SageMaker HyperPod offers a comprehensive set of features that significantly enhance the efficiency and effectiveness of ML workflows. From purpose-built infrastructure for distributed training to customizable environments and seamless integration with tools like Slurm, SageMaker HyperPod empowers ML practitioners to focus on their core tasks while taking advantage of the power of distributed computing. With SageMaker HyperPod, you can accelerate your ML projects, handle larger datasets and models, and drive innovation in your organization. SageMaker HyperPod provides several key features and advantages in the scalable training architecture.

Purpose-built infrastructure – One of the primary advantages of SageMaker HyperPod is its purpose-built infrastructure for distributed training. It simplifies the setup and management of clusters, allowing you to easily configure the desired instance types and counts, which can be retained across workloads. As a result of this flexibility, you can adapt to various scenarios. For example, when working with a smaller backbone model like Stable Diffusion 1.5, you can run multiple experiments simultaneously on a single GPU to accelerate the iterative development process. As your dataset grows, you can seamlessly switch to data parallelism and distribute the workload across multiple GPUs, such as eight GPUs, to reduce compute time. Furthermore, when dealing with larger backbone models like Stable Diffusion XL, SageMaker HyperPod offers the flexibility to scale and use model parallelism.

Shared file system – SageMaker HyperPod supports the attachment of a shared file system, such as Amazon FSx for Lustre. This integration brings several benefits to your ML workflow. FSx for Lustre enables full bidirectional synchronization with Amazon Simple Storage Service (Amazon S3), including the synchronization of deleted files and objects. It also allows you to synchronize file systems with multiple S3 buckets or prefixes, providing a unified view across multiple datasets. In our case, this means that the installed libraries within the conda virtual environment will be synchronized across different worker nodes, even if the cluster is torn down and recreated. Additionally, input video data for training and inference results can be seamlessly synchronized with S3 buckets, enhancing the experience of validating inference results.

Customizable environment – SageMaker HyperPod offers the flexibility to customize your cluster environment using lifecycle scripts. These scripts allow you to install additional frameworks, debugging tools, and optimization libraries tailored to your specific needs. You can also split your training data and model across all nodes for parallel processing, fully using the cluster’s compute and network infrastructure. Moreover, you have full control over the execution environment, including the ability to easily install and customize virtual Python environments for each project. In our case, all the required libraries for running the training script are installed within a conda virtual environment, which is shared across all worker nodes, simplifying the process of distributed training on multi-node setups. We also installed MLflow Tracking on the controller node to monitor the training progress.

Job distribution with Slurm integration – SageMaker HyperPod seamlessly integrates with Slurm, a popular open source cluster management and job scheduling system. Slurm can be installed and set up through lifecycle scripts as part of the cluster creation process, providing a highly customizable user interface. With Slurm, you can efficiently schedule jobs across different GPU resources so you can run multiple experiments in parallel or use distributed training to train large models for improved performance. You can also customize the job queues, prioritization algorithms, and job preemption policies, ensuring optimal resource use and streamlining your ML workflows. If you prefer a Kubernetes-based administration experience, SageMaker HyperPod recently introduced Amazon EKS support, so you can manage your clusters using a Kubernetes-based interface.

Enhanced productivity – To further enhance productivity, SageMaker HyperPod supports connecting to the cluster using Visual Studio Code (VS Code) through a Secure Shell (SSH) connection. You can easily browse and modify code within an integrated development environment (IDE), execute Python scripts seamlessly as if in a local environment, and launch Jupyter notebooks for quick development and debugging. The Jupyter notebook application experience within VS Code provides a familiar and intuitive interface for iterative experimentation and analysis.

Set up SageMaker HyperPod and run video generation algorithms

In this walkthrough, we use the AnimateAnyone algorithm as an illustration for video generation. AnimateAnyone is a state-of-the-art algorithm that generates high-quality videos from input images or videos. Our walkthrough guidance code is available on GitHub.

Set up the cluster

To create the SageMaker HyperPod infrastructure, follow the detailed, step-by-step cluster setup guidance in the Amazon SageMaker HyperPod workshop studio.

The two things you need to prepare are a provisioning_parameters.json file required by HyperPod for setting up Slurm and a cluster-config.json file as the configuration file for creating the HyperPod cluster. Inside these configuration files, you need to specify the InstanceGroupName, InstanceType, and InstanceCount for the controller group and worker group, as well as the execution role attached to the group.
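
If you prefer to script cluster creation instead of (or in addition to) preparing these JSON files by hand, the same information maps onto the CreateCluster API. The following boto3 sketch is illustrative only; the instance types, counts, lifecycle script location, and execution role ARN are placeholders to replace with values from your own configuration.

import boto3

sm = boto3.client("sagemaker")

lifecycle = {"SourceS3Uri": "s3://<bucket>/lifecycle-scripts/", "OnCreate": "on_create.sh"}
role_arn = "<hyperpod-execution-role-arn>"

sm.create_cluster(
    ClusterName="videogen-hyperpod",
    InstanceGroups=[
        {  # controller (head) node group
            "InstanceGroupName": "controller-group",
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "LifeCycleConfig": lifecycle,
            "ExecutionRole": role_arn,
            "ThreadsPerCore": 1,
        },
        {  # GPU worker node group
            "InstanceGroupName": "worker-group",
            "InstanceType": "ml.g5.24xlarge",
            "InstanceCount": 2,
            "LifeCycleConfig": lifecycle,
            "ExecutionRole": role_arn,
            "ThreadsPerCore": 1,
        },
    ],
)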

One practical step is to set up bidirectional synchronization between Amazon FSx for Lustre and Amazon S3, using the Amazon S3 integration for Amazon FSx for Lustre. This establishes full bidirectional synchronization of your file systems with Amazon S3, and it can also synchronize your file systems with multiple S3 buckets or prefixes.

In addition, if you prefer a local IDE such as VSCode, you can set up an SSH connection to the controller node within your IDE. In this way, the worker nodes can be used for running scripts within a conda environment and a Jupyter notebook server.

Run the AnimateAnyone algorithm

When the cluster is in service, you can connect to the controller node using SSH, then move to the worker nodes, where the GPU compute resources are available. You can follow the SSH Access to compute guide. We suggest installing the libraries directly on the worker nodes.

To create the conda environment, follow the instructions at Miniconda’s Quick command line install. You can then use the conda environment to install all required libraries.

source ~/miniconda3/bin/activate
conda create -n videogen python=3.10 -y   # Python version is an assumption; match the repo's requirements
conda activate videogen
pip install -r requirements.txt

To run AnimateAnyone, clone the GitHub repo and follow the instructions.

To train AnimateAnyone, launch stage 1 for training the denoising UNet and ReferenceNet, which enables the model to generate high-quality animated images under the condition of a given reference image and target pose. The denoising UNet and ReferenceNet are initialized based on the pre-trained weights from Stable Diffusion.

accelerate launch train_stage_1.py --config configs/train/stage1.yaml

In stage 2, the objective is to train the temporal layer to capture the temporal dependencies among video frames.

accelerate launch train_stage_2.py --config configs/train/stage2.yaml

Once the training script runs as expected, submit it as a Slurm scheduled job to run on a single node. We provide a batch file to simulate the single-node training job. It can use a single GPU or a single node with multiple GPUs. If you want to know more, the documentation provides detailed instructions on running jobs on SageMaker HyperPod clusters.

sbatch submit-animateanyone-algo.sh
#!/bin/bash
#SBATCH --job-name=video-gen
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH -o video-gen-stage-1.out
export OMP_NUM_THREADS=1
# Activate the conda environment
source ~/miniconda3/bin/activate
conda activate videogen
srun accelerate launch train_stage_1.py --config configs/train/stage1.yaml

Check the job status using the following code snippet.

squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
10 dev video-ge ubuntu R 0:16 1 ip-10-1-93-196

By using a small batch size and setting use_8bit_adam=True, you can achieve efficient training on a single GPU. Even when each experiment needs only a single GPU, a multi-GPU cluster lets you run multiple experiments in parallel.

The following code block is one example of running four jobs in parallel to test different hyperparameters. We provide the batch file here as well.

sbatch submit-hyperparameter-testing.sh

squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4_0 dev video-ge ubuntu R 0:08 1 ip-10-1-17-56
4_1 dev video-ge ubuntu R 0:08 1 ip-10-1-33-49
4_2 dev video-ge ubuntu R 0:08 1 ip-10-1-37-152
4_3 dev video-ge ubuntu R 0:08 1 ip-10-1-83-68

The experiments can then be compared, and you can move forward with the best configuration. In our scenario, shown in the following screenshot, we use different datasets and video preprocessing strategies to validate the stage 1 training, and then quickly draw conclusions about their impact on the quality of the stage 1 results. For experiment tracking, besides installing MLflow on the controller node to monitor the training progress, you can also leverage the fully managed MLflow capability on Amazon SageMaker. This makes it easy for data scientists to use MLflow on SageMaker for model training, registration, and deployment.
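
As an illustration of how tracking plugs into the training loop, the following MLflow sketch logs hyperparameters and a metric for one run. The tracking URI, experiment name, and values are placeholders: point the URI at the MLflow server on your controller node, or at the ARN of a SageMaker managed MLflow tracking server if you use that capability (which requires the sagemaker-mlflow plugin).

import mlflow

mlflow.set_tracking_uri("http://<controller-node-ip>:5000")  # placeholder tracking server
mlflow.set_experiment("animateanyone-stage1")

with mlflow.start_run(run_name="dataset-a-768px"):
    mlflow.log_params({"dataset": "dataset-a", "train_width": 768, "use_8bit_adam": True})
    mlflow.log_metric("train_loss", 0.123, step=100)  # illustrative value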

Scale to multi-node GPU setup

As model sizes grow, single GPU memory quickly becomes a bottleneck. Large models easily exhaust memory with pure data parallelism, and implementing model parallelism can be challenging. DeepSpeed addresses these issues, accelerating model development and training.

ZeRO

DeepSpeed is a deep learning optimization library that aims to make distributed training easy, efficient, and effective. DeepSpeed’s ZeRO removes memory redundancies in data-parallel training by partitioning the three model states (optimizer states, gradients, and parameters) across data-parallel processes instead of replicating them. This approach significantly boosts memory efficiency compared to classic data parallelism while maintaining computational granularity and communication efficiency.

ZeRO offers three stages of optimization:

  1. ZeRO Stage 1 – Partitions optimizer states across processes, with each process updating only its partition
  2. ZeRO Stage 2 – Additionally partitions gradients, with each process retaining only the gradients corresponding to its optimizer state portion
  3. ZeRO Stage 3 – Partitions model parameters across processes, automatically collecting and partitioning them during forward and backward passes

Each stage offers progressively higher memory efficiency at the cost of increased communication overhead. These techniques enable training of extremely large models that would otherwise be impossible. This is particularly useful when working with limited GPU memory or training very large models.
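
To see why the stages matter, the following back-of-the-envelope sketch estimates per-GPU memory for the model states of a hypothetical 7B-parameter model trained with mixed-precision Adam (roughly 2 bytes of weights, 2 bytes of gradients, and 12 bytes of optimizer state per parameter) on an 8-GPU data-parallel group. The numbers are illustrative assumptions, not measurements.

params = 7e9          # hypothetical 7B-parameter model
n_gpus = 8            # data-parallel group size
weights, grads, optim = 2 * params, 2 * params, 12 * params  # bytes per model state
gib = 1024 ** 3

baseline = (weights + grads + optim) / gib                 # plain data parallelism
stage1 = (weights + grads + optim / n_gpus) / gib          # optimizer states sharded
stage2 = (weights + (grads + optim) / n_gpus) / gib        # + gradients sharded
stage3 = (weights + grads + optim) / n_gpus / gib          # + parameters sharded

print(f"Per-GPU model states (GiB): baseline {baseline:.0f}, "
      f"ZeRO-1 {stage1:.0f}, ZeRO-2 {stage2:.0f}, ZeRO-3 {stage3:.0f}")
# Roughly 104 GiB without ZeRO versus about 36 / 24 / 13 GiB for stages 1-3 in this example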

Accelerate

Accelerate is a library that enables running the same PyTorch code across any distributed configuration with minimal code changes. It handles the complexities of distributed setups, allowing developers to focus on their models rather than infrastructure. To put it briefly, Accelerate makes training and inference at scale straightforward, efficient, and adaptable.

Accelerate allows easy integration of DeepSpeed features through a configuration file. Users can supply a custom configuration file or use provided templates. The following is an example of how to use DeepSpeed with Accelerate.
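
As a minimal sketch (an illustrative assumption, not the exact settings used by this walkthrough's batch scripts), ZeRO Stage 2 can be enabled programmatically through Accelerate's DeepSpeedPlugin; the same options can instead live in an Accelerate config file generated by accelerate config.

from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Requires deepspeed to be installed in the training environment
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,                     # shard optimizer states and gradients
    gradient_accumulation_steps=1,
    gradient_clipping=1.0,
    offload_optimizer_device="none",  # keep optimizer states on GPU
    offload_param_device="none",
)
accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)

# The training objects are then wrapped as usual, for example:
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)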

Single node with multiple GPUs job

To run a job on a single node with multiple GPUs, we tested this configuration on an instance with four GPUs (for example, ml.g5.24xlarge). For these instances, adjust train_width: 768 and train_height: 768, and set use_8bit_adam: False in your configuration file. You’ll likely notice that the model can handle much larger images for generation with these settings.

sbatch submit-deepspeed-singlenode.sh

This Slurm job will:

  1. Allocate a single node
  2. Activate the training environment
  3. Run accelerate launch train_stage_1.py --config configs/train/stage1.yaml

Multi-node with multiple GPUs job

To run a job across multiple nodes, each with multiple GPUs, we have tested this distribution with two ml.g5.24xlarge instances.

sbatch submit-deepspeed-multinode.sh

This Slurm job will:

  1. Allocate the specified number of nodes
  2. Activate the training environment on each node
  3. Run accelerate launch --multi_gpu --num_processes <num_processes> --num_machines <num_machines> train_stage_1.py --config configs/train/stage1.yaml

When running a multi-node job, make sure that the num_processes and num_machines arguments are set correctly based on your cluster configuration.

For optimal performance, adjust the batch size and learning rate according to the number of GPUs and nodes being used. Consider using a learning rate scheduler to adapt the learning rate during training.

Additionally, monitor the GPU memory usage and adjust the model’s architecture or batch size if necessary to prevent out-of-memory issues.

By following these steps and configurations, you can efficiently train your models on single-node and multi-node setups with multiple GPUs, taking advantage of the power of distributed training.

Monitor cluster usage

To achieve comprehensive observability into your SageMaker HyperPod cluster resources and software components, integrate the cluster with Amazon Managed Service for Prometheus and Amazon Managed Grafana. The integration with Amazon Managed Service for Prometheus makes it possible to export metrics related to your HyperPod cluster resources, providing insights into their performance, utilization, and health. The integration with Amazon Managed Grafana makes it possible to visualize these metrics through various Grafana dashboards that offer intuitive interfaces for monitoring and analyzing the cluster’s behavior. You can follow the SageMaker documentation on Monitor SageMaker HyperPod cluster resources and the Workshop Studio Observability section to bootstrap your cluster monitoring with the metric exporter services. The following screenshot shows a Grafana dashboard.

Inference and results discussion

When the fine-tuned model is ready, you have two primary deployment options: using popular image and video generation GUIs like ComfyUI or deploying an inference endpoint with Amazon SageMaker. The SageMaker option offers several advantages, including easy integration of image generation APIs with video generation endpoints to create end-to-end pipelines. As a managed service with auto scaling, SageMaker makes parallel generation of multiple videos possible using either the same reference image with different reference videos or the reverse. Furthermore, you can deploy various video generation model endpoints such as MimicMotion and UniAnimate, allowing for quality comparisons by generating videos in parallel with the same reference image and video. This approach not only provides flexibility and scalability but also accelerates the production process by enabling the rapid generation of a large number of videos, ultimately streamlining the process of obtaining content that meets business requirements. The SageMaker option thus offers a powerful, efficient, and scalable solution for video generation workflows. The following diagram shows a basic version of the video generation pipeline. You can modify it based on your own specific business requirements.
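
For example, once a video generation model is deployed behind a SageMaker asynchronous inference endpoint (a good fit for long-running video jobs), each request can be submitted as follows. The endpoint name and payload location are placeholders, and the request schema depends entirely on your inference container.

import boto3

smr = boto3.client("sagemaker-runtime")

# The request JSON (reference image URI, driving pose video URI, and so on) is staged in S3
response = smr.invoke_endpoint_async(
    EndpointName="animateanyone-endpoint",  # placeholder endpoint name
    InputLocation="s3://<bucket>/inference-inputs/request.json",
    ContentType="application/json",
)
print(response["OutputLocation"])  # the generated video is written here when the job finishes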

Recent advancements in video generation have rapidly overcome limitations of earlier models like AnimateAnyone. Two notable research papers showcase significant progress in this domain.

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance enhances shape alignment and motion guidance. It demonstrates superior ability in generating high-quality human animations that accurately capture both pose and shape variations, with improved generalization on in-the-wild datasets.

UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation makes it possible to generate longer videos, up to one minute, compared to earlier models’ limited frame outputs. It introduces a unified noise input supporting both random noise input and first frame conditioned input, enhancing long-term video generation capabilities.

Cleanup

To avoid incurring future charges, delete the resources created as part of this post:

  1. Delete the SageMaker HyperPod cluster using either the CLI or the console.
  2. Once the SageMaker HyperPod cluster deletion is complete, delete the CloudFormation stack. For more details on cleanup, refer to the cleanup section in the Amazon SageMaker HyperPod workshop.
  3. To delete the endpoints created during deployment, refer to the endpoint deletion section we provided in the Jupyter notebook. Then manually delete the SageMaker notebook.

Conclusion

In this post, we explored the exciting field of video generation and showcased how SageMaker HyperPod can be used to efficiently train video generation algorithms at scale. By using the AnimateAnyone algorithm as an example, we demonstrated the step-by-step process of setting up a SageMaker HyperPod cluster, running the algorithm, scaling it to multiple GPU nodes, and monitoring GPU usage during the training process.

SageMaker HyperPod offers several key advantages that make it an ideal platform for training large-scale ML models, particularly in the domain of video generation. Its purpose-built infrastructure allows for distributed training at scale so you can manage clusters with desired instance types and counts. The ability to attach a shared file system such as Amazon FSx for Lustre provides efficient data storage and retrieval, with full bidirectional synchronization with Amazon S3. Moreover, the SageMaker HyperPod customizable environment, integration with Slurm, and seamless connectivity with Visual Studio Code enhance productivity and simplify the management of distributed training jobs.

We encourage you to use SageMaker HyperPod for your ML training workloads, especially those involved in video generation or other computationally intensive tasks. By harnessing the power of SageMaker HyperPod, you can accelerate your research and development efforts, iterate faster, and build state-of-the-art models more efficiently. Embrace the future of video generation and unlock new possibilities with SageMaker HyperPod. Start your journey today and experience the benefits of distributed training at scale.


About the authors

Yanwei Cui, PhD, is a Senior Machine Learning Specialist Solutions Architect at AWS. He started machine learning research at IRISA (Research Institute of Computer Science and Random Systems), and has several years of experience building AI-powered industrial applications in computer vision, natural language processing, and online user behavior prediction. At AWS, he shares his domain expertise and helps customers unlock business potentials and drive actionable outcomes with machine learning at scale. Outside of work, he enjoys reading and traveling.

Gordon Wang is a Senior Data Scientist at AWS. He helps customers imagine and scope the use cases that will create the greatest value for their businesses and define paths to navigate technical or business challenges. He is passionate about computer vision, NLP, generative AI, and MLOps. In his spare time, he loves running and hiking.

Gary LO is a Solutions Architect at AWS based in Hong Kong. He is a highly passionate IT professional with over 10 years of experience in designing and implementing critical and complex solutions for distributed systems, web applications, and mobile platforms for startups and enterprise companies. Outside of the office, he enjoys cooking and sharing the latest technology trends and insights on his social media platforms with thousands of followers.

Control data access to Amazon S3 from Amazon SageMaker Studio with Amazon S3 Access Grants

Amazon SageMaker Studio provides a single web-based visual interface where different personas like data scientists, machine learning (ML) engineers, and developers can build, train, debug, deploy, and monitor their ML models. These personas rely on access to data in Amazon Simple Storage Service (Amazon S3) for tasks such as extracting data for model training, logging model training metrics, and storing model artifacts after training. For example, data scientists need access to datasets stored in Amazon S3 for tasks like data exploration and model training. ML engineers require access to intermediate model artifacts stored in Amazon S3 from past training jobs.

Traditionally, access to data in Amazon S3 from SageMaker Studio for these personas is provided through roles configured in SageMaker Studio—either at the domain level or user profile level. The SageMaker Studio domain role grants permissions for the SageMaker Studio domain to interact with other AWS services, providing access to data in Amazon S3 for all users of that domain. If no specific user profile roles are created, this role will apply to all user profiles, granting uniform access privileges across the domain. However, if different users of the domain have different access restrictions, then configuring individual user roles allows for more granular control. These roles define the specific actions and access each user profile can have within the environment, providing granular permissions.

Although this approach offers a degree of flexibility, it also entails frequent updates to the policies attached to these roles whenever access requirements change, which can add maintenance overhead. This is where Amazon S3 Access Grants can significantly streamline the process. S3 Access Grants enables you to manage access to Amazon S3 data more dynamically, without the need to constantly update AWS Identity and Access Management (IAM) roles. S3 Access Grants allows data owners or permission administrators to set permissions, such as read-only, write-only, or read/write access, at various levels of Amazon S3, such as at the bucket, prefix, or object level. The permissions can be granted to IAM principals or to users and groups from their corporate directory through integration with AWS IAM Identity Center.
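
For illustration, the following boto3 sketch registers a bucket as an Access Grants location, grants read access on a prefix to an IAM role, and then shows how a grantee exchanges its identity for temporary, scoped credentials at run time. It assumes an S3 Access Grants instance already exists in the account; the IDs, ARNs, and prefixes are placeholders.

import boto3

s3control = boto3.client("s3control")
account_id = "<account-id>"

# Register the bucket as an Access Grants location
location = s3control.create_access_grants_location(
    AccountId=account_id,
    LocationScope="s3://<bucket-name>/",
    IAMRoleArn="<access-grants-location-role-arn>",
)

# Grant User A's role read-only access to the UserA/ prefix
s3control.create_access_grant(
    AccountId=account_id,
    AccessGrantsLocationId=location["AccessGrantsLocationId"],
    AccessGrantsLocationConfiguration={"S3SubPrefix": "UserA/*"},
    Grantee={"GranteeType": "IAM", "GranteeIdentifier": "<user-a-role-arn>"},
    Permission="READ",
)

# At run time, the grantee requests temporary credentials scoped to the grant
creds = s3control.get_data_access(
    AccountId=account_id,
    Target="s3://<bucket-name>/UserA/*",
    Permission="READ",
)["Credentials"]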

In this post, we demonstrate how to simplify data access to Amazon S3 from SageMaker Studio using S3 Access Grants, specifically for different user personas using IAM principals.

Solution overview

Now that we’ve discussed the benefits of S3 Access Grants, let’s look at how grants can be applied with SageMaker Studio user roles and domain roles for granular access control.

Consider a scenario involving a product team with two members: User A and User B. They use an S3 bucket where the following access requirements are implemented:

  • All members of the team should have access to the folder named Product within the S3 bucket.
  • The folder named UserA should be accessible only by User A.
  • The folder named UserB should be accessible only by User B.
  • User A will be running an Amazon SageMaker Processing job that uses S3 Access Grants to get data from the S3 bucket. The processing job will access the required data from the S3 bucket using the temporary credentials provided by the access grants.

The following diagram illustrates the solution architecture and workflow.

Let’s start by creating a SageMaker Studio environment as needed for our scenario. This includes establishing a SageMaker Studio domain, setting up user profiles for User A and User B, configuring an S3 bucket with the necessary folders, and configuring S3 Access Grants.

Prerequisites

To set up the SageMaker Studio environment and configure S3 Access Grants as described in this post, you need administrative privileges for the AWS account you’ll be working with. If you don’t have administrative access, request assistance from someone who does. Throughout this post, we assume that you have the necessary permissions to create SageMaker Studio domains, create S3 buckets, and configure S3 Access Grants. If you don’t have these permissions, consult with your AWS administrator or account owner for guidance.

Deploy the solution resources using AWS CloudFormation

To provision the necessary resources and streamline the deployment process, we’ve provided an AWS CloudFormation template that automates the provisioning of required services. Deploying the CloudFormation stack in your account incurs AWS usage charges.

The CloudFormation stack creates the following resources:

  • Virtual private cloud (VPC) with private subnets, relevant route tables, a NAT gateway, an internet gateway, and security groups
  • IAM execution roles
  • S3 Access Grants instance
  • AWS Lambda function to load the Abalone dataset into Amazon S3
  • SageMaker domain
  • SageMaker Studio user profiles

Complete the following steps to deploy the stack:

  1. Choose Launch Stack to launch the CloudFormation stack.
  2. On the Create stack page, leave the default options and choose Next.
  3. On the Specify stack details page, for Stack name, enter a name (for example, blog-sagemaker-s3-access-grants).
  4. Under Parameters, provide the following information:
    1. For PrivateSubnetCIDR, enter the IP address range in CIDR notation that should be allocated for the private subnet.
    2. For ProjectName, enter sagemaker-blog.
    3. For VpcCIDR, enter the desired IP address range in CIDR notation for the VPC being created.
  5. Choose Next.
  6. On the Configure stack options page, leave the default options and choose Next.
  7. On the Review and create page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  8. Review the template and choose Create stack.

After the stack is successfully deployed, you can view the resources created on the stack’s Outputs tab on the AWS CloudFormation console.

Validate data in the S3 bucket

To validate access to the S3 bucket, we use the Abalone dataset. As part of the CloudFormation stack deployment process, a Lambda function is invoked to load the data into Amazon S3. After the Lambda function is complete, you should find the abalone.csv file in all three folders (Product, UserA, and UserB) within the S3 bucket.
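
If you prefer to check this programmatically rather than through the Amazon S3 console, a quick boto3 listing like the following (run with credentials that can read the bucket, and with the bucket name created by the stack substituted in) should show abalone.csv under each prefix:

import boto3

s3 = boto3.client("s3")
bucket = "blog-access-grants-<account-id>-<region>"  # bucket created by the CloudFormation stack

# List the objects under each folder to confirm the dataset was loaded.
for prefix in ["Product/", "UserA/", "UserB/"]:
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    print(prefix, keys)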

Validate the SageMaker domain and associated user profiles

Complete the following steps to validate the SageMaker resources:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose Product-Domain to be directed to the domain details page.
  3. In the User profiles section, verify that the userA and userB profiles are present.
  4. Choose a user profile name to be directed to the user profile details.
  5. Validate that each user profile is associated with its corresponding IAM role: userA is associated with sagemaker-usera-role, and userB is associated with sagemaker-userb-role.

Validate S3 Access Grants setup

Complete the following steps to validate your configuration of S3 Access Grants:

  1. On the Amazon S3 console, choose Access Grants in the navigation pane.
  2. Choose View details to be directed to the details page of S3 Access Grants.
  3. On the Locations tab, confirm that the URI of the S3 bucket created is registered with the S3 Access Grants instance as the location scope.
  4. On the Grants tab, confirm the following:
    1. sagemaker-usera-role has been given read/write permissions on the S3 prefixes Product/* and UserA/*
    2. sagemaker-userb-role has been given read/write permissions on the S3 prefixes Product/* and UserB/*

Validate access from your SageMaker Studio environment

To validate the access grants we set up, we run a distributed data processing job on the Abalone dataset using SageMaker Processing jobs and PySpark.

To get started, complete the following steps:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose the domain Product-Domain to be directed to the domain details page.
  3. Choose userA under User profiles.
  4. On the User Details page, choose Launch and choose Studio.
  5. On the SageMaker Studio console, choose JupyterLab in the navigation pane.
  6. Choose Create JupyterLab space.
  7. For Name, enter usera-space.

  8. For Sharing, select Private.

  9. Choose Create space.

  10. After the space is created, choose Run space.
  11. When the status shows as Running, choose Open JupyterLab, which will redirect you to the SageMaker JupyterLab experience.
  12. On the Launcher page, choose Python 3 under Notebook.
    This will open a new Python notebook, which we use to run the PySpark script.

    Let’s validate the access grants by running a distributed job using SageMaker Processing jobs to process data, because we often need to process data before it can be used for training ML models. SageMaker Processing jobs allow you to run distributed data processing workloads while using the access grants you set up earlier.
  13. Copy the following PySpark script into a cell in your SageMaker Studio notebook.
    The %%writefile directive is used to save the script locally. The script is used to generate temporary credentials using the access grant and configures Spark to use these credentials for accessing data in Amazon S3. It performs some basic feature engineering on the Abalone dataset, including string indexing, one-hot encoding, and vector assembly, and combines them into a pipeline. It then does an 80/20 split to produce training and validation datasets as outputs, and saves these datasets in Amazon S3.
    Make sure to replace region_name with the AWS Region you’re using in the script.
    %%writefile ./preprocess.py
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
    import argparse
    import subprocess
    import sys
    
    def install_packages():
        subprocess.check_call([sys.executable, "-m", "pip", "install", "boto3==1.35.1", "botocore>=1.35.0"])
    
    install_packages()
    import boto3
    print(f"logs: boto3 version in the processing job: {boto3.__version__}")
    import botocore
    print(f"logs: botocore version in the processing job: {botocore.__version__}")
    
    def get_temporary_credentials(account_id, bucket_name, object_key_prefix):
        region_name = '<region>'
        s3control_client = boto3.client('s3control', region_name=region_name)
        response = s3control_client.get_data_access(
            AccountId=account_id,
            Target=f's3://{bucket_name}/{object_key_prefix}/',
            Permission='READWRITE'
        )
        return response['Credentials']
    
    def configure_spark_with_s3a(credentials):
        spark = SparkSession.builder \
            .appName("PySparkApp") \
            .config("spark.hadoop.fs.s3a.access.key", credentials['AccessKeyId']) \
            .config("spark.hadoop.fs.s3a.secret.key", credentials['SecretAccessKey']) \
            .config("spark.hadoop.fs.s3a.session.token", credentials['SessionToken']) \
            .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
            .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider") \
            .getOrCreate()
        
        spark.sparkContext._jsc.hadoopConfiguration().set(
            "mapred.output.committer.class", "org.apache.hadoop.mapred.FileOutputCommitter"
        )
        return spark
    
    def csv_line(data):
        r = ",".join(str(d) for d in data[1])
        return str(data[0]) + "," + r
    
    def main():
        parser = argparse.ArgumentParser(description="app inputs and outputs")
        parser.add_argument("--account_id", type=str, help="AWS account ID")
        parser.add_argument("--s3_input_bucket", type=str, help="s3 input bucket")
        parser.add_argument("--s3_input_key_prefix", type=str, help="s3 input key prefix")
        parser.add_argument("--s3_output_bucket", type=str, help="s3 output bucket")
        parser.add_argument("--s3_output_key_prefix", type=str, help="s3 output key prefix")
        args = parser.parse_args()
    
        # Get temporary credentials for both reading and writing
        credentials = get_temporary_credentials(args.account_id, args.s3_input_bucket, args.s3_input_key_prefix)
        spark = configure_spark_with_s3a(credentials)
    
        # Defining the schema corresponding to the input data
        schema = StructType([
            StructField("sex", StringType(), True),
            StructField("length", DoubleType(), True),
            StructField("diameter", DoubleType(), True),
            StructField("height", DoubleType(), True),
            StructField("whole_weight", DoubleType(), True),
            StructField("shucked_weight", DoubleType(), True),
            StructField("viscera_weight", DoubleType(), True),
            StructField("shell_weight", DoubleType(), True),
            StructField("rings", DoubleType(), True),
        ])
    
        # Reading data directly from S3 using s3a protocol
        total_df = spark.read.csv(
            f"s3a://{args.s3_input_bucket}/{args.s3_input_key_prefix}/abalone.csv",
            header=False,
            schema=schema
        )
    
        # Transformations and data processing
        sex_indexer = StringIndexer(inputCol="sex", outputCol="indexed_sex")
        sex_encoder = OneHotEncoder(inputCol="indexed_sex", outputCol="sex_vec")
        assembler = VectorAssembler(
            inputCols=[
                "sex_vec",
                "length",
                "diameter",
                "height",
                "whole_weight",
                "shucked_weight",
                "viscera_weight",
                "shell_weight",
            ],
            outputCol="features"
        )
        pipeline = Pipeline(stages=[sex_indexer, sex_encoder, assembler])
        model = pipeline.fit(total_df)
        transformed_total_df = model.transform(total_df)
        (train_df, validation_df) = transformed_total_df.randomSplit([0.8, 0.2])
    
        # Saving transformed datasets to S3 using RDDs and s3a protocol
        train_rdd = train_df.rdd.map(lambda x: (x.rings, x.features))
        train_lines = train_rdd.map(csv_line)
        train_lines.saveAsTextFile(
            f"s3a://{args.s3_output_bucket}/{args.s3_output_key_prefix}/train"
        )
    
        validation_rdd = validation_df.rdd.map(lambda x: (x.rings, x.features))
        validation_lines = validation_rdd.map(csv_line)
        validation_lines.saveAsTextFile(
            f"s3a://{args.s3_output_bucket}/{args.s3_output_key_prefix}/validation"
        )
    
    if __name__ == "__main__":
        main()
  14. Run the cell to create the preprocess.py file locally.
  15. Next, you use the PySparkProcessor class to define a Spark job and run it using SageMaker Processing. Copy the following code into a new cell in your SageMaker Studio notebook, and run the cell to invoke the SageMaker Processing job:
    from sagemaker.spark.processing import PySparkProcessor
    from time import gmtime, strftime
    import boto3
    import sagemaker
    import logging
    
    # Get region
    region = boto3.Session().region_name
    
    # Initialize Boto3 and SageMaker sessions
    boto_session = boto3.Session(region_name=region)
    sagemaker_session = sagemaker.Session(boto_session=boto_session)
    
    # Get account id
    def get_account_id():
        client = boto3.client("sts")
        return client.get_caller_identity()["Account"]
    account_id = get_account_id()
    
    bucket = sagemaker_session.default_bucket()
    role = sagemaker.get_execution_role()
    sagemaker_logger = logging.getLogger("sagemaker")
    sagemaker_logger.setLevel(logging.INFO)
    sagemaker_logger.addHandler(logging.StreamHandler())
    
    # Set up S3 bucket and paths
    timestamp_prefix = strftime("%Y-%m-%d-%H-%M-%S", gmtime())
    prefix = "Product/sagemaker/spark-preprocess-demo/{}".format(timestamp_prefix)
    
    # Define the account ID and S3 bucket details
    input_bucket = f'blog-access-grants-{account_id}-{region}'
    input_key_prefix = 'UserA'
    output_bucket = f'blog-access-grants-{account_id}-{region}'
    output_key_prefix = 'UserA/output'
    
    # Define the Spark processor using the custom Docker image
    spark_processor = PySparkProcessor(
        framework_version="3.3",
        role=role,
        instance_count=2,
        instance_type="ml.m5.2xlarge",
        base_job_name="spark-preprocess-job",
        sagemaker_session=sagemaker_session 
    )
    
    # Run the Spark processing job
    spark_processor.run(
        submit_app="./preprocess.py",
        arguments=[
            "--account_id", account_id,
            "--s3_input_bucket", input_bucket,
            "--s3_input_key_prefix", input_key_prefix,
            "--s3_output_bucket", output_bucket,
            "--s3_output_key_prefix", output_key_prefix,
        ],
        spark_event_logs_s3_uri=f"s3://{output_bucket}/{prefix}/spark_event_logs",
        logs=False
    )

    A few things to note in the definition of the PySparkProcessor:

    • This is a multi-node job with two ml.m5.2xlarge instances (specified in the instance_count and instance_type parameters)
    • The Spark framework version is set to 3.3 using the framework_version parameter
    • The PySpark script is passed using the submit_app parameter
    • Command line arguments to the PySpark script (such as the account ID, input/output bucket names, and input/output key prefixes) are passed through the arguments parameter
    • Spark event logs will be offloaded to the Amazon S3 location specified in spark_event_logs_s3_uri and can be used to view the Spark UI while the job is in progress or after it’s complete.
  16. After the job is complete, validate the output of the preprocessing job by looking at the first five rows of the output dataset using the following validation script:
    import boto3
    import pandas as pd
    import io
    
    # Initialize S3 client
    s3 = boto3.client('s3')
    
    # Get region
    region = boto3.Session().region_name
    
    # Get account id
    def get_account_id():
        client = boto3.client("sts")
        return client.get_caller_identity()["Account"]
    account_id = get_account_id()
    # Replace with your bucket name and output key prefix
    bucket_name = f'blog-access-grants-{account_id}-{region}'
    output_key_prefix = 'UserA/output/train'

    # Get temporary credentials for accessing S3 data using user profile role
    s3control_client = boto3.client('s3control')
    response = s3control_client.get_data_access(
        AccountId=account_id,
        Target=f's3://{bucket_name}/{output_key_prefix}',
        Permission='READ'
    )
    credentials = response['Credentials']

    # Create an S3 client with the temporary credentials
    s3_client = boto3.client(
        's3',
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken']
    )

    objects = s3_client.list_objects(Bucket=bucket_name, Prefix=output_key_prefix)

    # Read the first part file into a pandas DataFrame
    first_part_key = f"{output_key_prefix}/part-00000"
    obj = s3_client.get_object(Bucket=bucket_name, Key=first_part_key)
    data = obj['Body'].read().decode('utf-8')
    df = pd.read_csv(io.StringIO(data), header=None)

    # Print the top 5 rows
    print(f"Top 5 rows from s3://{bucket_name}/{first_part_key}")
    print(df.head())

    This script uses the access grants to obtain temporary credentials, reads the first part file (part-00000) from the output location into a pandas DataFrame, and prints the top five rows of the DataFrame.
    Because the User A role has access to the userA folder, the user can read the contents of the file part-00000, as shown in the following screenshot.

    Now, let’s validate access to the userA folder from the User B profile.

  17. Repeat the earlier steps to launch a Python notebook under the User B profile.

  18. Use the validation script to read the contents of the file part-00000, which is in the userA folder.

If User B tries to read the contents of the file part-00000, which is in the userA folder, their access will be denied, as shown in the following screenshot, because User B doesn’t have access to the userA folder.

Clean up

To avoid incurring future charges, delete the CloudFormation stack. This will delete resources such as the SageMaker Studio domain, S3 Access Grants instance, and S3 bucket you created.
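
If you'd rather delete the stack programmatically than through the AWS CloudFormation console, a minimal boto3 sketch (assuming the stack name suggested earlier in this post) looks like the following:

import boto3

cloudformation = boto3.client("cloudformation")
stack_name = "blog-sagemaker-s3-access-grants"  # stack name used during deployment

# Delete the stack and wait until deletion completes.
cloudformation.delete_stack(StackName=stack_name)
cloudformation.get_waiter("stack_delete_complete").wait(StackName=stack_name)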

Conclusion

In this post, you learned how to control data access to Amazon S3 from SageMaker Studio with S3 Access Grants. S3 Access Grants provides a more flexible and scalable mechanism for defining access patterns than IAM-based techniques alone. These grants not only support IAM principals but also allow direct granting of access to users and groups from a corporate directory that is synchronized with IAM Identity Center.

Take the next step in optimizing your data management workflow by integrating S3 Access Grants into your AWS environment alongside SageMaker Studio, a web-based visual interface for building, training, debugging, deploying, and monitoring ML models. Take advantage of the granular access control and scalability offered by S3 Access Grants to enable efficient collaboration, secure data access, and simplified access management for your team working in the SageMaker Studio environment. For more details, refer to Managing access with S3 Access Grants and Amazon SageMaker Studio.


About the authors

Koushik Konjeti is a Senior Solutions Architect at Amazon Web Services. He has a passion for aligning architectural guidance with customer goals, ensuring solutions are tailored to their unique requirements. Outside of work, he enjoys playing cricket and tennis.

Vijay Velpula is a Data Architect with AWS Professional Services. He helps customers implement Big Data and Analytics Solutions. Outside of work, he enjoys spending time with family, traveling, hiking and biking.

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over three decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, scalable, reliable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey. In his spare time, he rides his motorcycle and enjoys nature with his family.

Read More

A Whole New World: ‘GreedFall II: The Dying World’ Joins GeForce NOW

A Whole New World: ‘GreedFall II: The Dying World’ Joins GeForce NOW

Whether looking for a time-traveling adventure, strategic roleplay or epic action, anyone can find something to play on GeForce NOW, with over 2,000 games in the cloud.

The GeForce NOW library continues to grow with seven titles arriving this week, including the role-playing game GreedFall II: The Dying World from developer Spiders and publisher Nacon.

Plus, be sure to claim the new in-game reward for Guild Wars 2 for extra style points.

GeForce NOW is improving experiences for members using Windows on Arm laptops. Support for these products is currently in beta, and improvements will be included in the GeForce NOW 2.0.67 app update, rolling out this week to bring GeForce NOW streaming at up to 4K resolution, 120 frames per second and high dynamic range to Arm-based laptops.

Greed Is Good

Greedfall II on GeForce NOW
Greed falls, frame rates rise in the cloud.

GreedFall II: The Dying World, the sequel to the acclaimed GreedFall, transports players to a captivating world set three years before the events of the original game. It features a revamped combat system, offering players enhanced control over Companions, and introduces a tactical pause feature during live battles for strategic planning. In this adventure, step into the shoes of a person native to the magical archipelago, uprooted from their homeland and thrust into the complex political landscape of the Old Continent. GreedFall II delivers an immersive experience filled with alliances, schemes and intense battles as players navigate the treacherous waters of colonial conflict and supernatural forces.

Members can shape the destiny of the Old Continent all from the cloud. Ultimate and Priority members can elevate their gaming experiences with longer gaming sessions and higher-resolution gameplay over free members. Upgrade today to get immersed in the fight for freedom.

Adventure in Style

The Guild Wars 2: Janthir Wilds expansion is here, bringing new adventures and challenges to explore in the world of Tyria. To celebrate this release, GeForce NOW is offering a special member reward: a unique style bundle to enliven members’ in-game experiences.

Guild Wars II reward on GeForce NOW
So fancy.

Transform characters’ hairstyle, horns and facial hair, customize armor and tailor a wardrobe for epic quests. The reward allows players to stand out as a true champion of Tyria while exploring the new lands of Janthir.

Members enrolled in the GeForce NOW rewards program can check their email for instructions on how to claim the reward. Ultimate and Priority members can redeem their style packages today, and free members can access the reward beginning on Friday, Sept. 27. Don’t miss out — the offer is available through Saturday, Oct. 26, on a first-come, first-served basis.

Something for Everyone

Remnant II DLC on GeForce NOW
The apocalypse never looked so good.

The hit survival action shooter Remnant II from Arc Games this week released its newest and final downloadable content (DLC), The Dark Horizon, along with a free update that brings a brand-new game mode called Boss Rush. In the DLC, players return to N’Erud and uncover a mysterious place preserved in time, where alien farmlands are tended by robots for inhabitants who have long since perished. But time corrupts all, and robotic creations threaten at every turn. Stream the game instantly on GeForce NOW without waiting for downloads or updates.

Members can look for the following games available to stream in the cloud this week:

  • Witchfire (New release on Steam, Sept. 23)
  • Tiny Glade (New release on Steam, Sept. 23)
  • Disney Epic Mickey: Rebrushed (New release on Steam, Sept. 24)
  • GreedFall II: The Dying World (New release on Steam, Sept. 24)
  • Breachway (New release on Steam, Sept. 26)
  • Mechabellum (New release on Steam, Sept. 26)
  • Monopoly (New release on Ubisoft Connect, Sept. 26)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Microsoft Research Forum Episode 4: The future of multimodal models, a new “small” language model, and other AI updates

Microsoft Research Forum Episode 4: The future of multimodal models, a new “small” language model, and other AI updates

Microsoft Research Forum is a continuous exchange of ideas about science and technology research in the era of general AI. In the latest episode, researchers discussed the latest multimodal AI models, advanced benchmarks for AI evaluation and model self-improvement, and an entirely new kind of computer for AI inference and hard optimization. Researchers at Microsoft are working to explore breakthrough technology that can help advance everything from weather prediction to materials design.

Below is a brief recap of the event, including select quotes from the presentations. Register to join future Research Forum episodes and view previous sessions. Transcripts and additional resources can be found in the Research Forum briefing book.

Keynote

Phi-3-Vision: A highly capable and “small” language vision model

Research Forum | Episode 4 Keynote | Jianfeng Gao

Jianfeng Gao introduced Phi-3-Vision, an advanced and economical open-source multimodal model. As a member of the Phi-3 model family, Phi-3-Vision enhances language models by integrating multisensory skills, seamlessly combining language and vision capabilities.

“Phi-3-Vision is the first multimodal model in the Phi small model family. It matches and sometimes exceeds some of the capabilities of much larger models … at a much lower cost. And to help everyone build more affordable and accessible AI systems, we have released the model weights into the open-source community.”

Jianfeng Gao, Distinguished Scientist and Vice President, Microsoft Research Redmond


Panel Discussion

Beyond language: The future of multimodal models in healthcare, gaming, and AI

Research Forum | Episode 4 Panel | John Langford, Hoifung Poon, Katja Hofmann, Jianwei Yang

This discussion examined the transformative potential and core challenges of multimodal models across various domains, including precision health, game intelligence, and foundation models. Microsoft researchers John Langford, Hoifung Poon, Katja Hofmann, and Jianwei Yang shared their thoughts on future directions, bridging gaps, and fostering synergies within the field. 

“One of the really cutting-edge treatments for cancer these days is immunotherapy. That works by mobilizing the immune system to fight the cancer. And then one of the blockbuster drugs is a KEYTRUDA, that really can work miracles for some of the late-stage cancers … Unfortunately, only 20 to 30 percent of the patients actually respond. So that’s … a marquee example of what are the growth opportunity in precision health.”
Hoifung Poon, General Manager, Microsoft Research Health Futures

“We experience the world through vision, touch, and all our other senses before we start to make sense of any of the language that is spoken around us. So, it’s really, really interesting to think through the implications of that, and potentially, as we start to understand more about the different modalities that we can model and the different ways in which we combine them.”
Katja Hofmann, Senior Principal Researcher, Microsoft Research

“To really have a capable multimodal model, we need to encode different information from different modalities, for example, from vision, from language, from even audio, speech, etc. We need to develop a very capable encoder for each of these domains and then … tokenize each of these raw data.”
Jianwei Yang, Principal Researcher, Microsoft Research Redmond


Lightning Talks

Analog optical computing for sustainable AI and beyond

Research Forum | Episode 4 Talk 1 | Francesca Parmigiani and Jiaqi Chu

This talk presented a new kind of computer—an analog optical computer—that has the potential to accelerate AI inference and hard optimization workloads by 100x, leveraging hardware-software co-design to improve the efficiency and sustainability of real-world applications. 

“Most likely, you or your loved ones have been inside an MRI scan, not really a great place to be in. Imagine if you can reduce that amount of time from 20 to 40 minutes to less than five minutes.”
Francesca Parmigiani, Principal Researcher, Microsoft Research Cambridge

“I’m really excited to share that we have just completed the second generation of [this] computer. It is much smaller in physical size, and this is a world first in that exactly the same computer is simultaneously solving hard optimization problems and accelerating machine learning inference. Looking ahead, we estimate that at scale, this computer can achieve around 450 tera operations per second per watt, which is a 100-times improvement as compared to state-of-the-art GPUs.”
Jiaqi Chu, Principal Researcher, Microsoft Research Cambridge


Direct Nash Optimization: Teaching language models to self-improve with general preferences

Research Forum | Episode 4 Talk 2 | Corby Rosset

This talk explored teaching language models to self-improve using AI preference feedback, challenging the model to play against itself and a powerful teacher until it arrives at a Nash equilibrium, resulting in state-of-the-art win rates against GPT-4 Turbo on benchmarks such as AlpacaEval and MT-Bench. 

“The traditional way to fine-tune an LLM for post-training … basically tells the model to emulate good behaviors, but it does not target or correct any mistakes or bad behaviors that it makes explicitly. … Self-improving post-training explicitly identifies and tries to correct bad behaviors or mistakes that the model makes.”
Corby Rosset, Senior Researcher, Microsoft Research AI Frontiers


Project Aurora: The first large-scale foundation model of the atmosphere

Research Forum | Episode 4 Talk 3 | Megan Stanley

This talk presented Aurora, a cutting-edge foundation model that offers a new approach to weather forecasting that could transform our ability to predict and mitigate the impacts of extreme events, air pollution, and the changing climate.

“If we look at Aurora’s ability to predict pollutants such as nitrogen dioxide that are strongly related to emissions from human activity, we can see that the model has learned to make these predictions with no emissions data provided. It’s learned the implicit patterns that cause the gas concentrations, which is very impressive.”
Megan Stanley, Senior Researcher, Microsoft Research AI for Science


A generative model of biology for in-silico experimentation and discovery

Research Forum | Episode 4 Talk 4 | Kevin Yang

This talk explored how deep learning enables generation of novel and useful biomolecules, allowing researchers and practitioners to better understand biology. This includes EvoDiff, a general-purpose diffusion framework that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models to generate new proteins, given a protein sequence.

“Often, protein engineers want proteins that perform a similar function to a natural protein, or they want to produce a protein that performs the same function but has other desirable properties, such as stability. By conditioning EvoDiff with a family of related sequences, we can generate new proteins that are very different in sequence space to the natural proteins but are predicted to fold into similar three-dimensional structures. These may be good starting points for finding new functions or for discovering versions of a protein with desirable properties.”
Kevin Yang, Senior Researcher, Microsoft Research New England


Fostering appropriate reliance on AI

Research Forum | Episode 4 Talk 5 | Mihaela Vorvoreanu

Since AI systems are probabilistic, they can make mistakes. One of the main challenges in human-AI interaction is to avoid overreliance on AI and empower people to determine when to accept or not accept an AI system’s recommendation. This talk explores Microsoft’s work in this area.

“This is where I think it is our responsibility as people working in UX disciplines—as people researching UX and human-computer interaction—to really, really step up to the front and see how it is our moment to shine and to address this problem.”
Mihaela Vorvoreanu, Director UX Research and Responsible AI Education, Microsoft AI Ethics and Effects in Engineering and Research (Aether)

The post Microsoft Research Forum Episode 4: The future of multimodal models, a new “small” language model, and other AI updates appeared first on Microsoft Research.

Read More

Contextualization of ASR with LLM Using Phonetic Retrieval-Based Augmentation

Large language models (LLMs) have shown superb capability of modeling multimodal signals including audio and text, allowing the model to generate spoken or textual response given a speech input. However, it remains a challenge for the model to recognize personal named entities, such as contacts in a phone book, when the input modality is speech. In this work, we start with a speech recognition task and propose a retrieval-based solution to contextualize the LLM: we first let the LLM detect named entities in speech without any context, then use this named entity as a query to retrieve…Apple Machine Learning Research

Improve employee productivity using generative AI with Amazon Bedrock

Improve employee productivity using generative AI with Amazon Bedrock

The Employee Productivity GenAI Assistant Example is a practical AI-powered solution designed to streamline writing tasks, allowing teams to focus on creativity rather than repetitive content creation. Built on AWS technologies like AWS Lambda, Amazon API Gateway, and Amazon DynamoDB, this tool automates the creation of customizable templates and supports both text and image inputs. Using generative AI models such as Anthropic’s Claude 3 from Amazon Bedrock, it provides a scalable, secure, and efficient way to generate high-quality content. Whether you’re new to AI or an experienced user, this simplified interface allows you to quickly take advantage of the power of this sample code, enhancing your team’s writing capabilities and enabling them to focus on more valuable tasks.

By using Amazon Bedrock and generative AI on AWS, organizations can accelerate their innovation cycles, unlock new business opportunities, and deliver innovative solutions powered by the latest advancements in generative AI technology, while maintaining high standards of security, scalability, and operational efficiency.

AWS takes a layered approach to generative AI, providing a comprehensive stack that covers the infrastructure for training and inference, tools to build with large language models (LLMs) and other foundation models (FMs), and applications that use these models. At the bottom layer, AWS offers advanced infrastructure like graphics processing units (GPUs), AWS Trainium, AWS Inferentia, and Amazon SageMaker, along with capabilities like UltraClusters, Elastic Fabric Adapter (EFA), and Amazon EC2 Capacity Blocks for efficient model training and inference. The middle layer, Amazon Bedrock, provides a managed service that allows you to choose from industry-leading models, customize them with your own data, and use security, access controls, and other features. This layer includes capabilities like guardrails, agents, Amazon Bedrock Studio, and customization options. The top layer consists of applications like Amazon Q Business, Amazon Q Developer, Amazon Q in QuickSight, and Amazon Q in Connect, which enable you to use generative AI for various tasks and workflows. This post focuses exclusively on the middle layer, tools with LLMs and other FMs, specifically Amazon Bedrock and its capabilities for building and scaling generative AI applications.

Employee GenAI Assistant Example: Key features

In this section, we discuss the key features of the Employee Productivity GenAI Assistant Example and its console options.

The Playground page of the Employee Productivity GenAI Assistant Example is designed to interact with Anthropic’s Claude language models on Amazon Bedrock. In this example, we explore how to use the Playground feature to request a poem about New York City, with the model’s response dynamically streamed back to the user.

Playground GIF

This process includes the following steps:

  1. The Playground interface provides a dropdown menu to choose the specific AI model to be used. In this case, use anthropic.claude-3-sonnet-20240229-v1:0, which is a version of Anthropic’s Claude 3.
  2. In the Input field, enter the prompt “Write a poem about NYC” to request the AI model to compose a poem about New York.
  3. After you enter the prompt, choose Submit. This sends the API request to Amazon Bedrock, which is hosting the Anthropic’s Claude 3 Sonnet language model. 

As the AI model processes the request and generates the poem, it’s streamed back to Output in real time, allowing you to observe the text being generated word by word or line by line.
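
Behind the solution's API layer, the streaming behavior comes down to a call to the Amazon Bedrock runtime. As a rough, standalone sketch (the solution itself routes the request through API Gateway and Lambda, and the Region and inference settings below are assumptions):

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse_stream(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "Write a poem about NYC"}]}],
    inferenceConfig={"maxTokens": 512},
)

# Print each text delta as it arrives, similar to how the Playground streams its output.
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)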

The Templates page lists various predefined sample prompt templates, such as Interview Question Crafter, Perspective Change Prompt, Grammar Genie, and Tense Change Prompt.

Template GIF

Now let’s create a template called Product Naming Pro:

  1. Add a customized prompt by choosing Add Prompt Template.
  2. Enter Product Naming Pro as the name and Create catchy product names from descriptions and keywords as the description.
  3. Choose anthropic.claude-3-sonnet-20240229-v1:0 as the model.

The template section includes a System Prompt option. In this example, we provide the System Prompt with guidance on creating effective product names that capture the essence of the product and leave a lasting impression.

The ${INPUT_DATA} field is a placeholder variable that allows template users to provide their input text, which will be incorporated into the prompt used by the system. The visibility of the template can be set as Public or Private. A public template can be seen by authenticated users within the deployment of the solution, making sure that only those with an account and proper authentication can access it. In contrast, a private template is only visible to your own authenticated user, keeping it exclusive to you. Additional information, such as the creator’s email address, is also displayed.
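
Conceptually, rendering a template is a simple substitution of the user's text into the ${INPUT_DATA} placeholder before the prompt is sent to the model. A minimal sketch, using a hypothetical template record:

# Hypothetical template record mirroring the fields shown in the console.
template = {
    "name": "Product Naming Pro",
    "system_prompt": (
        "You create catchy product names that capture the essence of the product. "
        "Suggest five names for the following product:\n${INPUT_DATA}"
    ),
}

def render_prompt(template: dict, user_input: str) -> str:
    # Substitute the user's input text into the placeholder.
    return template["system_prompt"].replace("${INPUT_DATA}", user_input)

print(render_prompt(template, "A noise-canceling, wireless, over-ear headphone"))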

The interface showcases the creation of a Product Naming Pro template designed to generate catchy product names from descriptions and keywords, enabling efficient prompt engineering.

On the Activity page, you can choose a prompt template to generate output based on provided input.

Activity GIF

The following steps demonstrate how to use the Activity feature:

  1. Choose the Product Naming Pro template created in the previous section.
  2. In the input field, enter a description: A noise-canceling, wireless, over-ear headphone with a 20-hour battery life and touch controls. Designed for audiophiles and frequent travelers.
  3. Add relevant keywords: immersive, comfortable, high-fidelity, long-lasting, convenient.
  4. After you provide the input description and keywords, choose Submit.

The output section displays five suggested product names that were generated based on the input. For example, SoundScape Voyager, AudioOasis Nomad, EnvoyAcoustic, FidelityTrek, and SonicRefuge Traveler.

The template has processed the product description and keywords to create catchy and descriptive product name suggestions that capture the essence of the noise-canceling, wireless, over-ear headphones designed for audiophiles and frequent travelers.

The History page displays logs of the interactions and activities performed within the application, including requests made on the Playground and Activity pages.

History GIF

At the top of the interface, a notification indicates that text has been copied to the clipboard, enabling you to copy generated outputs or prompts for use elsewhere.

The View and Delete options allow you to review the full details of the interaction or delete the entry from the history log, respectively.

The History page provides a way to track and revisit past activities within the application, providing transparency and allowing you to reference or manage your previous interactions with the system. The history saves your inputs and outputs on the Playground and Activity page (at the time of writing, Chat page history is not yet supported). You can only see the history of your own user requests, safeguarding security and privacy, and no other users can access your data. Additionally, you have the option to delete records stored in the history at any time if you prefer not to keep them.

Chat GIF

The interactive chat interface displays a chat conversation. The user is greeted by the assistant, and then chooses the Product Naming Pro template and provides a product description for a noise-canceling, wireless headphone designed for audiophiles and frequent travelers. The assistant responds with an initial product name recommendation based on the description. The user then requests additional recommendations, and the assistant provides five more product name suggestions. This interactive conversation highlights how the chat functionality allows continued natural language interaction with the AI model to refine responses and explore multiple options.

In the following example, the user chooses an AI model (for example, anthropic.claude-3-sonnet-20240229-v1:0) and provides input for that model. An image named headphone.jpg has been uploaded, and the user asks “Please describe the image uploaded in detail to me.”

MultiModal GIF

The user chooses Submit and the AI model’s output is displayed, providing a detailed description of the headphone image. It describes the headphones as “over-ear wireless headphones in an all-black color scheme with a sleek and modern design.” It mentions the matte black finish on the ear cups and headband, as well as the well-padded soft leather or leatherette material for comfort during extended listening sessions.

This demonstrates the power of multi-modality models like the Anthropic’s Claude 3 family on Amazon Bedrock, allowing you to upload and use up to six images on the Playground or Activity pages as inputs for generating context-rich, multi-modal responses.
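
At the API level, a multi-modal request of this kind can be expressed by including an image block alongside the text in the message content. A hedged sketch using the Bedrock Converse API (the file name, default Region, and model choice are assumptions):

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

with open("headphone.jpg", "rb") as f:
    image_bytes = f.read()

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"text": "Please describe the image uploaded in detail to me."},
            {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
        ],
    }],
)

# Print the model's description of the image.
print(response["output"]["message"]["content"][0]["text"])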

Solution overview

The Employee Productivity GenAI Assistant Example is built on robust AWS serverless technologies such as AWS Lambda, API Gateway, DynamoDB, and Amazon Simple Storage Service (Amazon S3), maintaining scalability, high availability, and security through Amazon Cognito. These technologies provide a foundation that allows the Employee Productivity GenAI Assistant Example to respond to user needs on-demand while maintaining strict security standards. The core of its generative abilities is derived from the powerful AI models available in Amazon Bedrock, which help deliver tailored and high-quality content swiftly.

The following diagram illustrates the solution architecture.

Architecture Diagram

The workflow of the Employee Productivity GenAI Assistant Example includes the following steps:

  1. Users access a static website hosted in the us-east-1 AWS Region, secured with AWS WAF. The frontend of the application consists of a React application hosted on an S3 bucket (S3 React Frontend), distributed using Amazon CloudFront.
  2. Users can initiate REST API calls from the static website, which are routed through an API Gateway. API Gateway manages these calls and interacts with multiple components:
    1. The API interfaces with a DynamoDB table to store and retrieve template and history data.
    2. The API communicates with a Python-based Lambda function to process requests.
    3. The API generates pre-signed URLs for image uploads and downloads to and from an S3 bucket (S3 Images), as shown in the sketch after this list.
  3. API Gateway integrates with Amazon Cognito for user authentication and authorization, managing users and groups.
  4. Users upload images to the S3 bucket (S3 Images) using the pre-signed URLs provided by API Gateway.
  5. When users request image downloads, a Lambda authorizer function written in Java is invoked, recording the request in the history database (DynamoDB table).
  6. For streaming data, users establish a WebSocket connection with an API Gateway WebSocket, which interacts with a Python Lambda function to handle the streaming data. The streaming data undergoes processing before being transmitted to an Amazon Bedrock streaming service.
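
The pre-signed URL pattern in step 2 above is straightforward with boto3; the bucket and key names below are hypothetical stand-ins for the solution's S3 Images bucket:

import boto3

s3 = boto3.client("s3")
bucket = "employee-genai-images-example"  # hypothetical stand-in for the S3 Images bucket

# Pre-signed URL the frontend can use to upload an image (valid for 5 minutes).
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": bucket, "Key": "uploads/headphone.jpg"},
    ExpiresIn=300,
)

# Pre-signed URL for downloading the same object later.
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": "uploads/headphone.jpg"},
    ExpiresIn=300,
)

print(upload_url)
print(download_url)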

Running generative AI workloads in Amazon Bedrock offers a robust and secure environment that seamlessly scales to help meet the demanding computational requirements of generative AI models. The layered security approach of Amazon Bedrock, built on the foundational principles of the comprehensive security services provided by AWS, provides a fortified environment for handling sensitive data and processing AI workloads with confidence. Its flexible architecture lets organizations use AWS elastic compute resources to scale dynamically with workload demands, providing efficient performance and cost control. Furthermore, the modular design of Amazon Bedrock empowers organizations to integrate their existing AI and machine learning (ML) pipelines, tools, and frameworks, fostering a seamless transition to a secure and scalable generative AI infrastructure within the AWS ecosystem.

In addition to the interactive features, the Employee Productivity GenAI Assistant Example provides a robust architectural pattern for building generative AI solutions on AWS. By using Amazon Bedrock and AWS serverless services such as Lambda, API Gateway, and DynamoDB, the Employee Productivity GenAI Assistant Example demonstrates a scalable and secure approach to deploying generative AI applications. You can use this architecture pattern as a foundation to build various generative AI solutions tailored to different use cases. Furthermore, the solution includes a reusable component-driven UI built on the React framework, enabling developers to quickly extend and customize the interface to fit their specific needs. The example also showcases the implementation of streaming support using WebSockets, allowing for real-time responses in both chat-based interactions and one-time requests, enhancing the user experience and responsiveness of the generative AI assistant.

Prerequisites

You should have the following prerequisites:

  • An AWS account
  • Permission to use Lambda, API Gateway, Amazon Bedrock, Amazon Cognito, CloudFront, AWS WAF, Amazon S3, and DynamoDB

Deploy the solution

To deploy and use the application, complete the following steps:

  1. Clone the GitHub repository into your AWS environment:
    git clone https://github.com/aws-samples/improve-employee-productivity-using-genai

  2. See the How to Deploy Locally section if you want to deploy from your computer.
  3. See How to Deploy via AWS CloudShell if you want to deploy from AWS CloudShell in your AWS account.
  4. After deployment is complete, see Post Deployment Steps to get started.
  5. See Demos to see examples of the solution’s capabilities and features.

Cost estimate for running the Employee Productivity GenAI Assistant Example

The cost of running the Employee Productivity GenAI Assistant Example will vary depending on the Amazon Bedrock model you choose and your usage patterns, as well as the Region you use. The primary cost drivers are the Amazon Bedrock model pricing and the AWS services used to host and run the application.

For this example, let’s assume a scenario with 50 users, each using this example code five times a day, with an average of 500 input tokens and 200 output tokens per use.

The total monthly token usage calculation is as follows:

  • Input tokens: 3.75 million
    • 500 tokens per request * 5 requests per day * 50 users * 30 days = 3.75 million tokens
  • Output tokens: 1.5 million
    • 200 tokens per request * 5 requests per day * 50 users * 30 days = 1.5 million tokens

The estimated monthly costs (us-east-1 Region) for different Anthropic’s Claude models on Amazon Bedrock would be the following:

  • Anthropic’s Claude 3 Haiku model:
    • Amazon Bedrock: $2.81
      • 3.75 million input tokens at $0.00025/thousand tokens = $0.9375
      • 1.5 million output tokens at $0.00125/thousand tokens = $1.875
    • Other AWS services: $16.51
    • Total: $19.32
  • Anthropic’s Claude 3 and 3.5 Sonnet model:
    • Amazon Bedrock: $33.75
      • 3.75 million input tokens at $0.003/thousand tokens = $11.25
      • 1.5 million output tokens at $0.015/thousand tokens = $22.50
    • Other AWS services: $16.51
    • Total: $50.26
  • Anthropic’s Claude 3 Opus model:
    • Amazon Bedrock: $168.75
      • 3.75 million input tokens at $0.015/thousand tokens = $56.25
      • 1.5 million output tokens at $0.075/thousand tokens = $112.50
    • Other AWS services: $16.51
    • Total: $185.26
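
The arithmetic behind these estimates can be reproduced with a short script; the per-token prices below mirror the figures used above and may change over time:

# Reproduce the monthly Amazon Bedrock cost estimates above.
users, requests_per_day, days = 50, 5, 30
input_tokens = 500 * requests_per_day * users * days      # 3.75 million
output_tokens = 200 * requests_per_day * users * days     # 1.5 million

# Price per 1,000 tokens (input, output) for each Anthropic Claude model on Amazon Bedrock.
prices = {
    "Claude 3 Haiku": (0.00025, 0.00125),
    "Claude 3 Sonnet": (0.003, 0.015),
    "Claude 3 Opus": (0.015, 0.075),
}

other_aws_services = 16.51  # estimate from the breakdown above

for model, (p_in, p_out) in prices.items():
    bedrock = input_tokens / 1000 * p_in + output_tokens / 1000 * p_out
    print(f"{model}: Bedrock ${bedrock:.2f}, total ${bedrock + other_aws_services:.2f}")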

These estimates don’t consider the AWS Free Tier for eligible services, so your actual costs might be lower if you’re still within the Free Tier limits. Additionally, the pricing for AWS services might change over time, so the actual costs might vary from these estimates.

The beauty of this serverless architecture is that you can scale resources up or down based on demand, making sure that you only pay for the resources you consume. Some components, such as Lambda, Amazon S3, CloudFront, DynamoDB, and Amazon Cognito, might not incur additional costs if you’re still within the AWS Free Tier limits.

For a detailed breakdown of the cost estimate, including assumptions and calculations, refer to the Cost Estimator.

Clean up

When you’re done, delete any resources you no longer need to avoid ongoing costs.

To delete the stack, use the following command:

./deploy.sh --delete --region=<your-aws-region> --email=<your-email>

For example:

./deploy.sh --delete --region=us-east-1 --email=abc@example.com

For more information about how to delete the resources from your AWS account, see the How to Deploy Locally section in the GitHub repo.

Summary

The Employee Productivity GenAI Assistant Example is a cutting-edge sample code that uses generative AI to automate repetitive writing tasks, freeing up resources for more meaningful work. It uses Amazon Bedrock and generative AI models to create initial templates that can be customized. You can input both text and images, benefiting from the multimodal capabilities of AI models. Key features include a user-friendly playground, template creation and application, activity history tracking, interactive chat with templates, and support for multi-modal inputs. The solution is built on robust AWS serverless technologies such as Lambda, API Gateway, DynamoDB, and Amazon S3, maintaining scalability, security, and high availability.

Visit our GitHub repository and try it firsthand.

By using Amazon Bedrock and generative AI on AWS, organizations can accelerate innovation cycles, unlock new business opportunities, and deliver AI-powered solutions while maintaining high standards of security and operational efficiency.


About the Authors

Samuel Baruffi is a seasoned technology professional with over 17 years of experience in the information technology industry. Currently, he works at AWS as a Principal Solutions Architect, providing valuable support to global financial services organizations. His vast expertise in cloud-based solutions is validated by numerous industry certifications. Away from cloud architecture, Samuel enjoys soccer, tennis, and travel.

Somnath Chatterjee is an accomplished Senior Technical Account Manager at AWS, dedicated to guiding customers in crafting and implementing their cloud solutions on AWS. He collaborates strategically with customers to help them run cost-optimized and resilient workloads in the cloud. Beyond his primary role, Somnath is a member of the Compute technical field community. He is an SAP on AWS Specialty certified professional and EFS SME. With over 14 years of experience in the information technology industry, he excels in cloud architecture and helps customers achieve their desired outcomes on AWS.

Mohammed Nawaz Shaikh is a Technical Account Manager at AWS, dedicated to guiding customers in crafting and implementing their AWS strategies. Beyond his primary role, Nawaz serves as an AWS GameDay Regional Lead and is an active member of the AWS NextGen Developer Experience technical field community. With over 16 years of expertise in solution architecture and design, he is not only a passionate coder but also an innovator, holding three US patents.

Read More