Significant new capabilities make it easier to use Amazon Bedrock to build and scale generative AI applications – and achieve impressive results


We introduced Amazon Bedrock to the world a little over a year ago, delivering an entirely new way to build generative artificial intelligence (AI) applications. With the broadest selection of first- and third-party foundation models (FMs) as well as user-friendly capabilities, Amazon Bedrock is the fastest and easiest way to build and scale secure generative AI applications. Now tens of thousands of customers are using Amazon Bedrock to build and scale impressive applications. They are innovating quickly, easily, and securely to advance their AI strategies. And we’re supporting their efforts by enhancing Amazon Bedrock with exciting new capabilities including even more model choice and features that make it easier to select the right model, customize the model for a specific use case, and safeguard and scale generative AI applications.

Customers across diverse industries, from finance to travel and hospitality to healthcare to consumer technology, are making remarkable progress. They are realizing real business value by quickly moving generative AI applications into production to improve customer experiences and increase operational efficiency. Consider the New York Stock Exchange (NYSE), the world’s largest capital market, processing billions of transactions each day. NYSE is leveraging Amazon Bedrock’s choice of FMs and cutting-edge generative AI capabilities across several use cases, including the processing of thousands of pages of regulations to provide answers in easy-to-understand language.

Global airline United Airlines modernized its Passenger Service System to translate legacy passenger reservation codes into plain English so that agents can provide swift and efficient customer support. LexisNexis Legal & Professional, a leading global provider of information and analytics, developed a personalized legal generative AI assistant on Lexis+ AI. LexisNexis customers receive trusted results two times faster than the nearest competing product and can save up to five hours per week on legal research and summarization. And HappyFox, an online help desk software provider, selected Amazon Bedrock for its security and performance, boosting the efficiency of the AI-powered automated ticket system in its customer support solution by 40% and agent productivity by 30%.

And across Amazon, we are continuing to innovate with generative AI to deliver more immersive, engaging experiences for our customers. Just last week, Amazon Music announced Maestro, an AI playlist generator powered by Amazon Bedrock that gives Amazon Music subscribers an easier, more fun way to create playlists based on prompts. Maestro is now rolling out in beta to a small number of U.S. customers on all tiers of Amazon Music.

With Amazon Bedrock, we’re focused on the key areas that customers need to build production-ready, enterprise-grade generative AI applications at the right cost and speed. Today I’m excited to share new features that we’re announcing across the areas of model choice, tools for building generative AI applications, and privacy and security.

1. Amazon Bedrock expands model choice with Llama 3 models and helps you find the best model for your needs

In these early days, customers are still learning and experimenting with different models to determine which ones to use for various purposes. They want to be able to easily try the latest models, and test which capabilities and features will give them the best results and cost characteristics for their use cases. The majority of Amazon Bedrock customers use more than one model, and Amazon Bedrock provides the broadest selection of first- and third-party large language models (LLMs) and other FMs. This includes models from AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI, as well as our own Amazon Titan models. In fact, Joel Hron, head of AI and Thomson Reuters Labs at Thomson Reuters, recently said this about their adoption of Amazon Bedrock: “Having the ability to use a diverse range of models as they come out was a key driver for us, especially given how quickly this space is evolving.” The cutting-edge models of the Mistral AI model family, including Mistral 7B, Mixtral 8x7B, and Mistral Large, have customers excited about their high performance in text generation, summarization, Q&A, and code generation. Since we introduced the Anthropic Claude 3 model family, thousands of customers have experienced how Claude 3 Haiku, Sonnet, and Opus have established new benchmarks across cognitive tasks with unrivaled intelligence, speed, and cost-efficiency. After the initial evaluation using Claude 3 Haiku and Opus in Amazon Bedrock, BlueOcean.ai, a brand intelligence platform, saw a cost reduction of over 50% when they were able to consolidate four separate API calls into a single, more efficient call.

Masahiro Oba, General Manager, Group Federated Governance of DX Platform at Sony Group Corporation, shared,

“While there are many challenges with applying generative AI to the business, Amazon Bedrock’s diverse capabilities help us to tailor generative AI applications to Sony’s business. We are able to take advantage of not only the powerful LLM capabilities of Claude 3, but also capabilities that help us safeguard applications at the enterprise-level. I’m really proud to be working with the Bedrock team to further democratize generative AI within the Sony Group.”

I recently sat down with Aaron Linsky, CTO of Artificial Investment Associate Labs at Bridgewater Associates, a premier asset management firm, where they are using generative AI to enhance their “Artificial Investment Associate,” a major leap forward for their customers. It builds on their experience of giving rules-based expert advice for investment decision-making. With Amazon Bedrock, they can use the best available FMs, such as Claude 3, for different tasks, combining fundamental market understanding with the flexible reasoning capabilities of AI. Amazon Bedrock allows for seamless model experimentation, enabling Bridgewater to build a powerful, self-improving investment system that marries systematic advice with cutting-edge capabilities, creating an evolving, AI-first process.

To bring even more model choice to customers, today we are making Meta Llama 3 models available in Amazon Bedrock. The Llama 3 8B and Llama 3 70B models are designed for building, experimenting, and responsibly scaling generative AI applications. They offer significant improvements over the previous generation, including scaled-up pretraining and refined instruction fine-tuning. Llama 3 8B excels in text summarization, classification, sentiment analysis, and translation, making it ideal for environments with limited resources and for edge devices. Llama 3 70B shines in content creation, conversational AI, language understanding, R&D, enterprise applications, accurate summarization, nuanced classification and sentiment analysis, language modeling, dialogue systems, code generation, and instruction following. Read more about Meta Llama 3 now available in Amazon Bedrock.
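As a quick illustration, here is a minimal sketch of invoking Llama 3 8B Instruct through the Bedrock Runtime InvokeModel API with boto3. The model ID, request fields, and Region are assumptions; check the Bedrock model catalog for the exact identifiers available to you.

# Minimal sketch: invoking Llama 3 8B Instruct through the Bedrock Runtime API.
# The model ID and prompt format below are assumptions; verify them in the
# Bedrock model catalog for your Region.
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "prompt": "Summarize the benefits of retrieval-augmented generation in two sentences.",
    "max_gen_len": 256,   # maximum tokens to generate
    "temperature": 0.5,
    "top_p": 0.9,
}

response = bedrock_runtime.invoke_model(
    modelId="meta.llama3-8b-instruct-v1:0",  # assumed model ID
    body=json.dumps(body),
)
result = json.loads(response["body"].read())
print(result["generation"])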

We are also announcing support coming soon for Cohere’s Command R and Command R+ enterprise FMs. These models are highly scalable and optimized for long-context tasks like retrieval-augmented generation (RAG) with citations to mitigate hallucinations, multi-step tool use for automating complex business tasks, and support for 10 languages for global operations. Command R+ is Cohere’s most powerful model optimized for long-context tasks, while Command R is optimized for large-scale production workloads. With the Cohere models coming soon in Amazon Bedrock, businesses can build enterprise-grade generative AI applications that balance strong accuracy and efficiency for day-to-day AI operations beyond proof-of-concept.

Amazon Titan Image Generator now generally available and Amazon Titan Text Embeddings V2 coming soon

In addition to adding the most capable third-party models, we are making Amazon Titan Image Generator generally available today. With Amazon Titan Image Generator, customers in industries like advertising, e-commerce, media, and entertainment can efficiently generate realistic, studio-quality images in large volumes and at low cost, using natural language prompts. They can edit generated or existing images using text prompts, configure image dimensions, or specify the number of image variations to guide the model. By default, every image produced by Amazon Titan Image Generator contains an invisible watermark, which aligns with AWS’s commitment to promoting responsible and ethical AI by reducing the spread of misinformation. The Watermark Detection feature identifies images created by Image Generator and is designed to be tamper-resistant, helping increase transparency around AI-generated content. Watermark Detection helps mitigate intellectual property risks and enables content creators, news organizations, risk analysts, fraud-detection teams, and others to better identify and mitigate dissemination of misleading AI-generated content. Read more about Watermark Detection for Titan Image Generator.
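For example, a hedged sketch of a text-to-image request to Titan Image Generator through InvokeModel might look like the following; the model ID and request schema are assumptions to verify against the Bedrock documentation for your Region.

# Minimal sketch: text-to-image generation with Amazon Titan Image Generator.
# Model ID and request fields are assumptions based on the Titan image schema.
import base64
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

request = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": "A studio-quality photo of a ceramic coffee mug on a wooden table"
    },
    "imageGenerationConfig": {
        "numberOfImages": 1,   # number of image variations to return
        "height": 1024,
        "width": 1024,
        "cfgScale": 8.0,       # how closely the image should follow the prompt
    },
}

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-image-generator-v1",  # assumed model ID
    body=json.dumps(request),
)
payload = json.loads(response["body"].read())
with open("generated.png", "wb") as f:
    f.write(base64.b64decode(payload["images"][0]))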

Coming soon, Amazon Titan Text Embeddings V2 efficiently delivers more relevant responses for critical enterprise use cases like search. Efficient embedding models are crucial to performance when leveraging RAG to enrich responses with additional information. Embeddings V2 is optimized for RAG workflows and provides seamless integration with Knowledge Bases for Amazon Bedrock to deliver more informative and relevant responses efficiently. Embeddings V2 enables a deeper understanding of data relationships for complex tasks like retrieval, classification, semantic similarity search, and enhancing search relevance. Offering flexible embedding sizes of 256, 512, and 1024 dimensions, Embeddings V2 prioritizes cost reduction while retaining 97% of the accuracy for RAG use cases, outperforming other leading models. Additionally, the flexible embedding sizes cater to diverse application needs, from low-latency mobile deployments to high-accuracy asynchronous workflows.
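As a hedged sketch, generating a reduced-dimension embedding for a RAG index could look like the following once Embeddings V2 is available; the model ID and the dimensions and normalize fields are assumptions to confirm in the Bedrock documentation.

# Minimal sketch: generating an embedding with Amazon Titan Text Embeddings V2
# at a reduced dimension for a RAG index. Model ID and fields are assumptions.
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",  # assumed model ID
    body=json.dumps({
        "inputText": "What is the refund policy for international orders?",
        "dimensions": 512,    # 256, 512, or 1024, per the flexible sizes described above
        "normalize": True,    # unit-length vectors simplify cosine similarity
    }),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # 512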

New Model Evaluation simplifies the process of accessing, comparing, and selecting LLMs and FMs

Choosing the appropriate model is a critical first step toward building any generative AI application. LLMs can vary drastically in performance based on the task, domain, data modalities, and other factors. For example, a biomedical model is likely to outperform general healthcare models in specific medical contexts, whereas a coding model may face challenges with natural language processing tasks. Using an excessively powerful model could lead to inefficient resource usage, while an underpowered model might fail to meet minimum performance standards – potentially providing incorrect results. And selecting an unsuitable FM at a project’s onset could undermine stakeholder confidence and trust.

With so many models to choose from, we want to make it easier for customers to pick the right one for their use case.

Amazon Bedrock’s Model Evaluation tool, now generally available, simplifies the selection process by enabling benchmarking and comparison against specific datasets and evaluation metrics, ensuring developers select the model that best aligns with their project goals. This guided experience allows developers to evaluate models across criteria tailored to each use case. Through Model Evaluation, developers select candidate models to assess – public options, imported custom models, or fine-tuned versions. They define relevant test tasks, datasets, and evaluation metrics, such as accuracy, latency, cost projections, and qualitative factors. Read more about Model Evaluation in Amazon Bedrock.

The ability to select from the top-performing FMs in Amazon Bedrock has been extremely beneficial for Elastic Security. James Spiteri, Director of Product Management at Elastic shared,

“With just a few clicks, we can assess a single prompt across multiple models simultaneously. This model evaluation functionality enables us to compare the outputs, metrics, and associated costs across different models, allowing us to make an informed decision on which model would be most suitable for what we are trying to accomplish. This has significantly streamlined our process, saving us a considerable amount of time in deploying our applications to production.”

2. Amazon Bedrock offers capabilities to tailor generative AI to your business needs

While models are incredibly important, it takes more than a model to build an application that is useful for an organization. That’s why Amazon Bedrock has capabilities to help you easily tailor generative AI solutions to specific use cases. Customers can use their own data to privately customize applications through fine-tuning or by using Knowledge Bases for a fully managed RAG experience to deliver more relevant, accurate, and customized responses. Agents for Amazon Bedrock allows developers to define specific tasks, workflows, or decision-making processes, enhancing control and automation while ensuring consistent alignment with an intended use case. Starting today, you can now use Agents with Anthropic Claude 3 Haiku and Sonnet models. We are also introducing an updated AWS console experience, supporting a simplified schema and return of control to make it easy for developers to get started. Read more about Agents for Amazon Bedrock, now faster and easier to use.
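As an illustration of how an application calls an agent at runtime, here is a minimal boto3 sketch using the InvokeAgent API; the agent ID and alias ID are placeholders for values from your own agent.

# Minimal sketch: calling an existing agent with InvokeAgent and reading its
# streamed response. The agent and alias IDs are placeholders.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.invoke_agent(
    agentId="AGENT_ID",             # placeholder
    agentAliasId="AGENT_ALIAS_ID",  # placeholder
    sessionId="demo-session-1",     # reuse the same ID to keep conversational context
    inputText="Create a support ticket for a delayed shipment and summarize next steps.",
)

# The completion is returned as an event stream of chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)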

With new Custom Model Import, customers can leverage the full capabilities of Amazon Bedrock with their own models

All these features are essential to building generative AI applications, which is why we wanted to make them available to even more customers, including those who have already invested significant resources in fine-tuning LLMs with their own data on different services or in training custom models from scratch. Many customers have customized models available on Amazon SageMaker, which provides the broadest array of over 250 pre-trained FMs. These FMs include cutting-edge models such as Mistral, Llama 2, Code Llama, Jurassic-2, Jamba, pplx-7B, 70B, and the impressive Falcon 180B. Amazon SageMaker helps with getting data organized and fine-tuned, building scalable and efficient training infrastructure, and then deploying models at scale in a low-latency, cost-efficient manner. It has been a game changer for developers in preparing their data for AI, managing experiments, training models faster (for example, Perplexity AI trains models 40% faster in Amazon SageMaker), lowering inference latency (for example, Workday has reduced inference latency by 80% with Amazon SageMaker), and improving developer productivity (for example, NatWest reduced its time-to-value for AI from 12–18 months to under seven months using Amazon SageMaker). However, operationalizing these customized models securely and integrating them into applications for specific business use cases still has challenges.

That is why today we’re introducing Amazon Bedrock Custom Model Import, which enables organizations to leverage their existing AI investments along with Amazon Bedrock’s capabilities. With Custom Model Import, customers can now import and access their own custom models built on popular open model architectures, including Flan-T5, Llama, and Mistral, as a fully managed application programming interface (API) in Amazon Bedrock. Customers can take models that they customized on Amazon SageMaker or other tools and easily add them to Amazon Bedrock. After an automated validation, they can seamlessly access their custom model, as with any other model in Amazon Bedrock. They get all the same benefits, including seamless scalability, powerful capabilities to safeguard their applications, and adherence to responsible AI principles, as well as the ability to expand a model’s knowledge base with RAG, easily create agents to complete multi-step tasks, and carry out fine-tuning to keep teaching and refining models, all without needing to manage the underlying infrastructure.

With this new capability, we’re making it easy for organizations to choose a combination of Amazon Bedrock models and their own custom models while maintaining the same streamlined development experience. Today, Amazon Bedrock Custom Model Import is available in preview and supports three of the most popular open model architectures, with plans for more in the future. Read more about Custom Model Import for Amazon Bedrock.
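The exact programmatic interface may evolve during the preview, but a hypothetical import job submitted with boto3 could look like the following sketch; the operation name, fields, role, and S3 location shown are assumptions and placeholders, so confirm them against the current Bedrock documentation.

# Hypothetical sketch: submitting a model import job with boto3. The
# create_model_import_job operation and its fields reflect our understanding of
# the API and may differ during the preview; bucket, role, and names are placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_model_import_job(
    jobName="import-my-mistral-finetune",
    importedModelName="my-mistral-finetune",
    roleArn="arn:aws:iam::123456789012:role/BedrockModelImportRole",  # placeholder role
    modelDataSource={
        "s3DataSource": {
            # S3 prefix holding the fine-tuned weights and config files
            "s3Uri": "s3://my-model-artifacts/mistral-finetune/"
        }
    },
)
print(job["jobArn"])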

ASAPP is a generative AI company with a 10-year history of building ML models.

“Our conversational generative AI voice and chat agent leverages these models to redefine the customer service experience. To give our customers end to end automation, we need LLM agents, knowledge base, and model selection flexibility. With Custom Model Import, we will be able to use our existing custom models in Amazon Bedrock. Bedrock will allow us to onboard our customers faster, increase our pace of innovation, and accelerate time to market for new product capabilities.”

– Priya Vijayarajendran, President, Technology.

3. Amazon Bedrock provides a secure and responsible foundation to implement safeguards easily

As generative AI capabilities progress and expand, building trust and addressing ethical concerns becomes even more important. Amazon Bedrock addresses these concerns by leveraging AWS’s secure and trustworthy infrastructure with industry-leading security measures, robust data encryption, and strict access controls.

Guardrails for Amazon Bedrock, now generally available, helps customers prevent harmful content and manage sensitive information within an application.

We also offer Guardrails for Amazon Bedrock, which is now generally available. Guardrails offers industry-leading safety protection, giving customers the ability to define content policies, set application behavior boundaries, and implement safeguards against potential risks. Guardrails for Amazon Bedrock is the only solution offered by a major cloud provider that enables customers to build and customize safety and privacy protections for their generative AI applications in a single solution. It helps customers block as much as 85% more harmful content than the protection natively provided by FMs on Amazon Bedrock. Guardrails provides comprehensive support for harmful content filtering and robust personally identifiable information (PII) detection capabilities. Guardrails works with all LLMs in Amazon Bedrock as well as fine-tuned models, driving consistency in how models respond to undesirable and harmful content. You can configure thresholds to filter content across six categories – hate, insults, sexual, violence, misconduct (including criminal activity), and prompt attack (jailbreak and prompt injection). You can also define a set of topics or words that need to be blocked in your generative AI application, including harmful words, profanity, competitor names, and products. For example, a banking application can configure a guardrail to detect and block topics related to investment advice. A contact center application summarizing call center transcripts can use PII redaction to remove PII from call summaries, or a conversational chatbot can use content filters to block harmful content. Read more about Guardrails for Amazon Bedrock.
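To make this concrete, the following is a hedged sketch that creates a guardrail denying investment-advice topics (the banking example above) and then applies it on an InvokeModel call. The field names follow our reading of the Guardrails API, and the model ID and messages are placeholders; verify details against the current API reference.

# Minimal sketch: create a guardrail that blocks investment-advice topics and
# filters harmful content, then apply it when invoking a model. Field names and
# the model ID are assumptions; IDs and messages are placeholders.
import json
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

guardrail = bedrock.create_guardrail(
    name="banking-assistant-guardrail",
    description="Blocks investment advice and filters harmful content",
    topicPolicyConfig={
        "topicsConfig": [{
            "name": "InvestmentAdvice",
            "definition": "Recommendations about specific securities, portfolios, or investment strategies.",
            "type": "DENY",
        }]
    },
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that information.",
)

# Apply the guardrail on an InvokeModel call.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
    guardrailIdentifier=guardrail["guardrailId"],
    guardrailVersion="DRAFT",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Which stocks should I buy this week?"}],
    }),
)
print(json.loads(response["body"].read()))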

Companies like Aha!, a software company that helps more than 1 million people bring their product strategy to life, use Amazon Bedrock to power many of their generative AI capabilities.

“We have full control over our information through Amazon Bedrock’s data protection and privacy policies, and can block harmful content through Guardrails for Amazon Bedrock. We just built on it to help product managers discover insights by analyzing feedback submitted by their customers. This is just the beginning. We will continue to build on advanced AWS technology to help product development teams everywhere prioritize what to build next with confidence.”

With even more choice of leading FMs and features that help you evaluate models and safeguard applications as well as leverage your prior investments in AI along with the capabilities of Amazon Bedrock, today’s launches make it even easier and faster for customers to build and scale generative AI applications. This blog post highlights only a subset of the new features. You can learn more about everything we’ve launched in the resources of this post, including asking questions and summarizing data from a single document without setting up a vector database in Knowledge Bases and the general availability of support for multiple data sources with Knowledge Bases.

Early adopters leveraging Amazon Bedrock’s capabilities are gaining a crucial head start – driving productivity gains, fueling ground-breaking discoveries across domains, and delivering enhanced customer experiences that foster loyalty and engagement. I’m excited to see what our customers will do next with these new capabilities.

As my mentor Werner Vogels always says “Now Go Build” and I’ll add “…with Amazon Bedrock!”

Resources

Check out the following resources to learn more about this announcement:


About the author

Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, visualize, and predict.


Building scalable, secure, and reliable RAG applications using Knowledge Bases for Amazon Bedrock


Generative artificial intelligence (AI) has gained significant momentum with organizations actively exploring its potential applications. As successful proof-of-concepts transition into production, organizations are increasingly in need of enterprise scalable solutions. However, to unlock the long-term success and viability of these AI-powered solutions, it is crucial to align them with well-established architectural principles.

The AWS Well-Architected Framework provides best practices and guidelines for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. Aligning generative AI applications with this framework is essential for several reasons, including providing scalability, maintaining security and privacy, achieving reliability, optimizing costs, and streamlining operations. Embracing these principles is critical for organizations seeking to use the power of generative AI and drive innovation.

This post explores the new enterprise-grade features for Knowledge Bases on Amazon Bedrock and how they align with the AWS Well-Architected Framework. With Knowledge Bases for Amazon Bedrock, you can quickly build applications using Retrieval Augmented Generation (RAG) for use cases like question answering, contextual chatbots, and personalized search.

Here are some of the features we will cover:

  1. AWS CloudFormation support
  2. Private network policies for Amazon OpenSearch Serverless
  3. Multiple S3 buckets as data sources
  4. Service Quotas support
  5. Hybrid search, metadata filters, custom prompts for the RetrieveAndGenerate API, and the maximum number of retrievals.

AWS Well-Architected design principles

RAG-based applications built using Knowledge Bases for Amazon Bedrock can greatly benefit from following the AWS Well-Architected Framework. This framework has six pillars that help organizations make sure their applications are secure, high-performing, resilient, efficient, cost-effective, and sustainable:

  • Operational Excellence – Well-Architected principles streamline operations, automate processes, and enable continuous monitoring and improvement of generative AI app performance.
  • Security – Implementing strong access controls, encryption, and monitoring helps secure sensitive data used in your organization’s knowledge base and prevent misuse of generative AI.
  • Reliability – Well-Architected principles guide the design of resilient and fault-tolerant systems, providing consistent value delivery to users.
  • Performance Efficiency – Choosing the appropriate resources, implementing caching strategies, and proactively monitoring performance metrics ensures that applications deliver fast and accurate responses, leading to optimal performance and an enhanced user experience.
  • Cost Optimization – Well-Architected guidelines assist in optimizing resource usage, using cost-saving services, and monitoring expenses, resulting in long-term viability of generative AI projects.
  • Sustainability – Well-Architected principles promote efficient resource utilization and minimizing carbon footprints, addressing the environmental impact of growing generative AI usage.

By aligning with the Well-Architected Framework, organizations can effectively build and manage enterprise-grade RAG applications using Knowledge Bases for Amazon Bedrock. Now, let’s dive deep into the new features launched within Knowledge Bases for Amazon Bedrock.

AWS CloudFormation support

For organizations building RAG applications, it’s important to provide efficient and effective operations and consistent infrastructure across different environments. This can be achieved by implementing practices such as automating deployment processes. To accomplish this, Knowledge Bases for Amazon Bedrock now offers support for AWS CloudFormation.

With AWS CloudFormation and the AWS Cloud Development Kit (AWS CDK), you can now create, update, and delete knowledge bases and associated data sources. Adopting AWS CloudFormation and the AWS CDK for managing knowledge bases and associated data sources not only streamlines the deployment process, but also promotes adherence to the Well-Architected principles. By performing operations (applications, infrastructure) as code, you can provide consistent and reliable deployments in multiple AWS accounts and AWS Regions, and maintain versioned and auditable infrastructure configurations.

The following is a sample CloudFormation script in JSON format for creating and updating a knowledge base in Amazon Bedrock:

{
    "Type": "AWS::Bedrock::KnowledgeBase",
    "Properties": {
        "Name": String,
        "RoleArn": String,
        "Description": String,
        "KnowledgeBaseConfiguration": {
            "Type": String,
            "VectorKnowledgeBaseConfiguration": VectorKnowledgeBaseConfiguration
        },
        "StorageConfiguration": StorageConfiguration
    }
}

Type specifies a knowledge base as a resource in a top-level template. Minimally, you must specify the following properties:

  • Name – Specify a name for the knowledge base.
  • RoleArn – Specify the Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role with permissions to invoke API operations on the knowledge base. For more information, see Create a service role for Knowledge bases for Amazon Bedrock.
  • KnowledgeBaseConfiguration – Specify the embeddings configuration of the knowledge base. The following sub-properties are required:
    • Type – Specify the value VECTOR.
    • VectorKnowledgeBaseConfiguration – Contains details about the model used to create vector embeddings for the knowledge base.
  • StorageConfiguration – Specify information about the vector store in which the data source is stored. The following sub-properties are required:
    • Type – Specify the vector store service that you are using.
    • You also need to select one of the vector stores supported by Knowledge Bases, such as OpenSearch Serverless, Pinecone, or Amazon Aurora PostgreSQL, and provide the configuration for the selected vector store.

For details on all the fields and providing configuration of various vector stores supported by Knowledge Bases for Amazon Bedrock, refer to AWS::Bedrock::KnowledgeBase.

As of this writing, Redis Enterprise Cloud vector stores are not supported in AWS CloudFormation. For the latest information, refer to the documentation above.

After you create a knowledge base, you need to create a data source from the Amazon Simple Storage Service (Amazon S3) bucket containing the files for your knowledge base. Behind the scenes, this resource calls the CreateDataSource and DeleteDataSource API operations.

The following is the sample CloudFormation script in JSON format:

{
    "Type": "AWS::Bedrock::DataSource",
    "Properties": {
        "KnowledgeBaseId": String,
        "Name": String,
        "RoleArn": String,
        "Description": String,
        "DataSourceConfiguration": {
            "S3Configuration": S3DataSourceConfiguration,
            "Type": String
        },
        "ServerSideEncryptionConfiguration": ServerSideEncryptionConfiguration,
        "VectorIngestionConfiguration": VectorIngestionConfiguration
    }
}

Type specifies a data source as a resource in a top-level template. Minimally, you must specify the following properties:

  • Name – Specify a name for the data source.
  • KnowledgeBaseId – Specify the ID of the knowledge base for the data source to belong to.
  • DataSourceConfiguration – Specify information about the S3 bucket containing the data source. The following sub-properties are required:
    • Type – Specify the value S3.
    • S3Configuration – Contains details about the configuration of the S3 object containing the data source.
  • VectorIngestionConfiguration – Contains details about how to ingest the documents in a data source. You need to provide “ChunkingConfiguration” where you can define your chunking strategy.
  • ServerSideEncryptionConfiguration – Contains the configuration for server-side encryption, where you can provide the Amazon Resource Name (ARN) of the AWS KMS key used to encrypt the resource.

For more information about setting up data sources in Amazon Bedrock, see Set up a data source for your knowledge base.

Note: You cannot change the chunking configuration after you create the data source.

The CloudFormation template allows you to define and manage your knowledge base resources using infrastructure as code (IaC). By automating the setup and management of the knowledge base, you can provide a consistent infrastructure across different environments. This approach aligns with the Operational Excellence pillar, which emphasizes performing operations as code. By treating your entire workload as code, you can automate processes, create consistent responses to events, and ultimately reduce human errors.

Private network policies for Amazon OpenSearch Serverless

For companies building RAG applications, it’s critical that the data remains secure and the network traffic does not go to public internet. To support this, Knowledge Bases for Amazon Bedrock now supports private network policies for Amazon OpenSearch Serverless.

Knowledge Bases for Amazon Bedrock provides an option for using OpenSearch Serverless as a vector store. You can now access OpenSearch Serverless collections that have a private network policy, which further enhances the security posture of your RAG application. To achieve this, create an OpenSearch Serverless collection and configure it for private network access: when creating the collection, set Network access settings to Private and specify the VPC endpoint for access, then create a vector index within the collection to store the embeddings. Importantly, you can now provide private network access to OpenSearch Serverless collections specifically for Amazon Bedrock. To do this, select AWS service private access and specify bedrock.amazonaws.com as the service.

This private network configuration makes sure that your embeddings are stored securely and are only accessible by Amazon Bedrock, enhancing the overall security and privacy of your knowledge bases. It aligns closely with the Security Pillar of controlling traffic at all layers, because all network traffic is kept within the AWS backbone with these settings.
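If you prefer to manage this policy programmatically, the following is a minimal sketch using the OpenSearch Serverless API. The collection name, VPC endpoint ID, and especially the SourceVPCEs and SourceServices policy keys are assumptions based on the console options described above; verify them against the OpenSearch Serverless documentation before use.

# Hypothetical sketch: a network security policy for an OpenSearch Serverless
# collection that disables public access and allows access only through a VPC
# endpoint and the Amazon Bedrock service. Policy keys and names are assumptions.
import json
import boto3

aoss = boto3.client("opensearchserverless", region_name="us-east-1")

network_policy = [{
    "Rules": [
        {"ResourceType": "collection", "Resource": ["collection/kb-embeddings"]},
    ],
    "AllowFromPublic": False,
    "SourceVPCEs": ["vpce-0123456789abcdef0"],    # placeholder VPC endpoint
    "SourceServices": ["bedrock.amazonaws.com"],  # AWS service private access for Bedrock
}]

aoss.create_security_policy(
    name="kb-embeddings-network-policy",
    type="network",
    policy=json.dumps(network_policy),
)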

So far, we have explored the automation of creating, deleting, and updating knowledge base resources and the enhanced security through private network policies for OpenSearch Serverless to store vector embeddings securely. Now, let’s understand how to build more reliable, comprehensive, and cost-optimized RAG applications.

Multiple S3 buckets as data sources

Knowledge Bases for Amazon Bedrock now supports adding multiple S3 buckets as data sources within a single knowledge base, including cross-account access. This enhancement increases the knowledge base’s comprehensiveness and accuracy by allowing users to aggregate and use information from various sources seamlessly.

The following are key features:

  • Multiple S3 buckets – Knowledge Bases for Amazon Bedrock can now incorporate data from multiple S3 buckets, enabling users to combine and use information from different sources effortlessly. This feature promotes data diversity and makes sure that relevant information is readily available for RAG-based applications.
  • Cross-account data access – Knowledge Bases for Amazon Bedrock supports the configuration of S3 buckets as data sources across different accounts. You can provide the necessary credentials to access these data sources, expanding the range of information that can be incorporated into their knowledge bases.
  • Efficient data management – When a data source or knowledge base is deleted, the related or existing items in the vector stores are automatically removed. This feature makes sure that the knowledge base remains up to date and free from obsolete or irrelevant data, maintaining the integrity and accuracy of the RAG process.

By supporting multiple S3 buckets as data sources, the need for creating multiple knowledge bases or redundant data copies is eliminated, thereby optimizing cost and promoting cloud financial management. Furthermore, the cross-account access capabilities enable the development of resilient architectures, aligning with the Reliability pillar of the AWS Well-Architected Framework, providing high availability and fault tolerance.
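As a hedged illustration, the following sketch attaches an additional S3 bucket, owned by another account, as a data source on an existing knowledge base using boto3. The knowledge base ID, bucket ARN, and account ID are placeholders, and the bucketOwnerAccountId field reflects our understanding of how cross-account access is declared; confirm the exact fields in the CreateDataSource API reference.

# Minimal sketch: adding a cross-account S3 bucket as an additional data source
# on an existing knowledge base. IDs, ARNs, and account numbers are placeholders.
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

data_source = bedrock_agent.create_data_source(
    knowledgeBaseId="KB12345678",  # placeholder knowledge base ID
    name="partner-docs-bucket",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {
            "bucketArn": "arn:aws:s3:::partner-shared-docs",
            "bucketOwnerAccountId": "210987654321",  # owner of the cross-account bucket
            "inclusionPrefixes": ["manuals/"],       # optional: limit ingestion to a prefix
        },
    },
)
print(data_source["dataSource"]["dataSourceId"])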

Other recently announced features for Knowledge Bases

To further enhance the reliability of your RAG application, Knowledge Bases for Amazon Bedrock now extends support for Service Quotas. This feature provides a single pane of glass to view applied AWS quota values and usage. For example, you now have quick access to information such as the allowed number of RetrieveAndGenerate API requests per second.

This feature allows you to effectively manage resource quotas, prevent overprovisioning, and limit API request rates to safeguard services from potential abuse.
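For example, a short boto3 sketch like the following lists the applied Amazon Bedrock quotas in your account; the specific quota names returned vary, so the string match on RetrieveAndGenerate is illustrative.

# Minimal sketch: listing applied Amazon Bedrock quotas with the Service Quotas API.
import boto3

quotas = boto3.client("service-quotas", region_name="us-east-1")

paginator = quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="bedrock"):
    for quota in page["Quotas"]:
        # Print only the RetrieveAndGenerate-related quotas (name match is illustrative)
        if "RetrieveAndGenerate" in quota["QuotaName"]:
            print(f'{quota["QuotaName"]}: {quota["Value"]}')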

You can also enhance your application’s performance by using recently announced features like hybrid search, filtering based on metadata, custom prompts for the RetrieveAndGenerate API, and the maximum number of retrievals. These features collectively improve the accuracy, relevance, and consistency of generated responses, and align with the Performance Efficiency pillar of the AWS Well-Architected Framework.
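To make this concrete, here is a hedged sketch of a RetrieveAndGenerate call that combines these options: hybrid search, a metadata filter, a custom prompt template, and a cap on the number of retrieved passages. The knowledge base ID, model ARN, and the metadata key are placeholders, and the field names reflect our reading of the API; check the current RetrieveAndGenerate reference before relying on them.

# Minimal sketch: RetrieveAndGenerate with hybrid search, a metadata filter,
# a custom prompt template, and a retrieval cap. IDs and the metadata key are placeholders.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our parental leave policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB12345678",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": 5,            # maximum number of retrievals
                    "overrideSearchType": "HYBRID",  # combine semantic and keyword search
                    "filter": {
                        # metadata filter; "department" is an assumed metadata key
                        "equals": {"key": "department", "value": "HR"}
                    },
                }
            },
            "generationConfiguration": {
                "promptTemplate": {
                    # $search_results$ is the placeholder the service substitutes
                    "textPromptTemplate": "Answer strictly from the following search results:\n$search_results$"
                }
            },
        },
    },
)
print(response["output"]["text"])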

Knowledge Bases for Amazon Bedrock aligns with the Sustainability pillar of the AWS Well-Architected Framework by using managed services and optimizing resource utilization. As a fully managed service, Knowledge Bases for Amazon Bedrock removes the burden of provisioning, managing, and scaling the underlying infrastructure, thereby reducing the environmental impact associated with operating and maintaining these resources.

Additionally, by aligning with the AWS Well-Architected principles, organizations can design and operate their RAG applications in a sustainable manner. Practices such as automating deployments through AWS CloudFormation, implementing private network policies for secure data access, and using efficient services like OpenSearch Serverless contribute to minimizing the environmental impact of these workloads.

Overall, Knowledge Bases for Amazon Bedrock, combined with the AWS Well-Architected Framework, empowers organizations to build scalable, secure, and reliable RAG applications while prioritizing environmental sustainability through efficient resource utilization and the adoption of managed services.

Conclusion

The new enterprise-grade features, such as AWS CloudFormation support, private network policies, the ability to use multiple S3 buckets as data sources, and support for Service Quotas, make it straightforward to build scalable, secure, and reliable RAG applications with Knowledge Bases for Amazon Bedrock. Using AWS managed services and following Well-Architected best practices allows organizations to focus on delivering innovative generative AI solutions while providing operational excellence, robust security, and efficient resource utilization. As you build applications on AWS, aligning RAG applications with the AWS Well-Architected Framework provides a solid foundation for building enterprise-grade solutions that drive business value while adhering to industry standards.

For additional resources, refer to the following:


About the authors

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML. He is deeply passionate about exploring the possibilities of generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.

Pallavi Nargund is a Principal Solutions Architect at AWS. In her role as a cloud technology enabler, she works with customers to understand their goals and challenges, and gives prescriptive guidance to help them achieve their objectives with AWS offerings. She is passionate about women in technology and is a core member of Women in AI/ML at Amazon. She speaks at internal and external conferences such as AWS re:Invent, AWS Summits, and webinars. Outside of work she enjoys volunteering, gardening, cycling, and hiking.


Integrate HyperPod clusters with Active Directory for seamless multi-user login


Amazon SageMaker HyperPod is purpose-built to accelerate foundation model (FM) training, removing the undifferentiated heavy lifting involved in managing and optimizing a large training compute cluster. With SageMaker HyperPod, you can train FMs for weeks and months without disruption.

Typically, HyperPod clusters are used by multiple users: machine learning (ML) researchers, software engineers, data scientists, and cluster administrators. They edit their own files, run their own jobs, and want to avoid impacting each other’s work. To achieve this multi-user environment, you can take advantage of Linux’s user and group mechanism and statically create multiple users on each instance through lifecycle scripts. The drawback to this approach, however, is that user and group settings are duplicated across multiple instances in the cluster, making it difficult to configure them consistently on all instances, such as when a new team member joins.

To solve this pain point, we can use Lightweight Directory Access Protocol (LDAP) and LDAP over TLS/SSL (LDAPS) to integrate with a directory service such as AWS Directory Service for Microsoft Active Directory. With the directory service, you can centrally maintain users and groups, and their permissions.

In this post, we introduce a solution to integrate HyperPod clusters with AWS Managed Microsoft AD, and explain how to achieve a seamless multi-user login environment with a centrally maintained directory.

Solution overview

The solution uses several AWS services and resources, which we walk through in the following sections.

We also use AWS CloudFormation to deploy a stack to create the prerequisites for the HyperPod cluster: VPC, subnets, security group, and Amazon FSx for Lustre volume.

The following diagram illustrates the high-level solution architecture.

Architecture diagram for HyperPod and Active Directory integration

In this solution, HyperPod cluster instances use the LDAPS protocol to connect to the AWS Managed Microsoft AD via an NLB. We terminate TLS at the NLB by installing a certificate on it. To configure LDAPS in HyperPod cluster instances, the lifecycle script installs and configures System Security Services Daemon (SSSD), an open source client for LDAP/LDAPS.

Prerequisites

This post assumes you already know how to create a basic HyperPod cluster without SSSD. For more details on how to create HyperPod clusters, refer to Getting started with SageMaker HyperPod and the HyperPod workshop.

Also, in the setup steps, you will use a Linux machine to generate a self-signed certificate and obtain an obfuscated password for the AD reader user. If you don’t have a Linux machine, you can create an EC2 Linux instance or use AWS CloudShell.

Create a VPC, subnets, and a security group

Follow the instructions in the Own Account section of the HyperPod workshop. You will deploy a CloudFormation stack and create prerequisite resources such as VPC, subnets, security group, and FSx for Lustre volume. You need to create both a primary subnet and backup subnet when deploying the CloudFormation stack, because AWS Managed Microsoft AD requires at least two subnets with different Availability Zones.

In this post, for simplicity, we use the same VPC, subnets, and security group for both the HyperPod cluster and the directory service. If you need to use different networks for the cluster and directory service, make sure the security groups and route tables are configured so that they can communicate with each other.

Create AWS Managed Microsoft AD on Directory Service

Complete the following steps to set up your directory:

  1. On the Directory Service console, choose Directories in the navigation pane.
  2. Choose Set up directory.
  3. For Directory type, select AWS Managed Microsoft AD.
  4. Choose Next.
    Directory type selection screen
  5. For Edition, select Standard Edition.
  6. For Directory DNS name, enter your preferred directory DNS name (for example, hyperpod.abc123.com).
  7. For Admin password, set a password and save it for later use.
  8. Choose Next.
    Directory creation configuration screen
  9. In the Networking section, specify the VPC and two private subnets you created.
  10. Choose Next.
    Directory network configuration screen
  11. Review the configuration and pricing, then choose Create directory.
    Directory creation confirmation screen
    The directory creation starts. Wait until the status changes from Creating to Active, which can take 20–30 minutes.
  12. When the status changes to Active, open the detail page of the directory and take note of the DNS addresses for later use.

Create an NLB in front of Directory Service

To create the NLB, complete the following steps:

  1. On the Amazon EC2 console, choose Target groups in the navigation pane.
  2. Choose Create target groups.
  3. Create a target group with the following parameters:
    1. For Choose a target type, select IP addresses.
    2. For Target group name, enter LDAP.
    3. For Protocol: Port, choose TCP and enter 389.
    4. For IP address type, select IPv4.
    5. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    6. For Health check protocol, choose TCP.
  4. Choose Next.
    Load balancing target creation configuration screen
  5. In the Register targets section, register the directory service’s DNS addresses as the targets.
  6. For Ports, choose Include as pending below. The addresses are added in the Review targets section with Pending status.
  7. Choose Create target group.
  8. On the Load Balancers console, choose Create load balancer.
  9. Under Network Load Balancer, choose Create.
  10. Configure an NLB with the following parameters:
    1. For Load balancer name, enter a name (for example, nlb-ds).
    2. For Scheme, select Internal.
    3. For IP address type, select IPv4.
    4. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    5. Under Mappings, select the two private subnets and their CIDR ranges (which you created with the CloudFormation template).
    6. For Security groups, choose CfStackName-SecurityGroup-XYZXYZ (which you created with the CloudFormation template).
  11. In the Listeners and routing section, specify the following parameters:
    1. For Protocol, choose TCP.
    2. For Port, enter 389.
    3. For Default action, choose the target group named LDAP.

    Here, we are adding a listener for LDAP. We will add LDAPS later.

  12. Choose Create load balancer. Wait until the status changes from Provisioning to Active, which can take 3–5 minutes.
  13. When the status changes to Active, open the detail page of the provisioned NLB and take note of the DNS name (xyzxyz.elb.region-name.amazonaws.com) for later use.

Create a self-signed certificate and import it to Certificate Manager

To create a self-signed certificate, complete the following steps:

  1. On your Linux-based environment (local laptop, EC2 Linux instance, or CloudShell), run the following OpenSSL commands to create a self-signed certificate and private key:
    $ openssl genrsa 2048 > ldaps.key
    
    $ openssl req -new -key ldaps.key -out ldaps_server.csr
    
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [AU]:US
    State or Province Name (full name) [Some-State]:Washington
    Locality Name (eg, city) []:Bellevue
    Organization Name (eg, company) [Internet Widgits Pty Ltd]:CorpName
    Organizational Unit Name (eg, section) []:OrgName
    Common Name (e.g., server FQDN or YOUR name) []:nlb-ds-abcd1234.elb.region.amazonaws.com
    Email Address []:your@email.address.com
    
    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:
    An optional company name []:
    
    $ openssl x509 -req -sha256 -days 365 -in ldaps_server.csr -signkey ldaps.key -out ldaps.crt
    
    Certificate request self-signature ok
    subject=C = US, ST = Washington, L = Bellevue, O = CorpName, OU = OrgName, CN = nlb-ds-abcd1234.elb.region.amazonaws.com, emailAddress = your@email.address.com
    
    $ chmod 600 ldaps.key

  2. On the Certificate Manager console, choose Import.
  3. Enter the certificate body and private key, from the contents of ldaps.crt and ldaps.key respectively.
  4. Choose Next.
  5. Add any optional tags, then choose Next.
  6. Review the configuration and choose Import.

Add an LDAPS listener

We added a listener for LDAP already in the NLB. Now we add a listener for LDAPS with the imported certificate. Complete the following steps:

  1. On the Load Balancers console, navigate to the NLB details page.
  2. On the Listeners tab, choose Add listener.
  3. Configure the listener with the following parameters:
    1. For Protocol, choose TLS.
    2. For Port, enter 636.
    3. For Default action, choose LDAP.
    4. For Certificate source, select From ACM.
    5. For Certificate, choose the certificate you imported in ACM.
  4. Choose Add. Now the NLB listens on both LDAP and LDAPS. It is recommended to delete the LDAP listener, because it transmits data without encryption, unlike LDAPS.

Create an EC2 Windows instance to administer users and groups in the AD

To create and maintain users and groups in the AD, complete the following steps:

  1. On the Amazon EC2 console, choose Instances in the navigation pane.
  2. Choose Launch instances.
  3. For Name, enter a name for your instance.
  4. For Amazon Machine Image, choose Microsoft Windows Server 2022 Base.
  5. For Instance type, choose t2.micro.
  6. In the Network settings section, provide the following parameters:
    1. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    2. For Subnet, choose either of two subnets you created with the CloudFormation template.
    3. For Common security groups, choose CfStackName-SecurityGroup-XYZXYZ (which you created with the CloudFormation template).
  7. For Configure storage, set storage to 30 GB gp2.
  8. In the Advanced details section, for Domain join directory, choose the AD you created.
  9. For IAM instance profile, choose an AWS Identity and Access Management (IAM) role with at least the AmazonSSMManagedEC2InstanceDefaultPolicy policy.
  10. Review the summary and choose Launch instance.

Create users and groups in AD using the EC2 Windows instance

With Remote Desktop, connect to the EC2 Windows instance you created in the previous step. Using an RDP client is recommended over using a browser-based Remote Desktop so that you can exchange the contents of the clipboard with your local machine using copy-paste operations. For more details about connecting to EC2 Windows instances, refer to Connect to your Windows instance.

If you are prompted for a login credential, use hyperpod\Admin (where hyperpod is the first part of your directory DNS name) as the user name, and use the admin password you set for the directory service.

  1. When the Windows desktop screen opens, choose Server Manager from the Start menu.
  2. Choose Local Server in the navigation pane, and confirm that the domain is what you specified for the directory service.
  3. On the Manage menu, choose Add Roles and Features.
  4. Choose Next until you are at the Features page.
  5. Expand the feature Remote Server Administration Tools, expand Role Administration Tools, and select AD DS and AD LDS Tools and Active Directory Rights Management Service.
  6. Choose Next and Install. Feature installation starts.
  7. When the installation is complete, choose Close.
  8. Open Active Directory Users and Computers from the Start menu.
  9. Under hyperpod.abc123.com, expand hyperpod.
  10. Choose (right-click) hyperpod, choose New, and choose Organizational Unit.
  11. Create an organizational unit called Groups.
  12. Choose (right-click) Groups, choose New, and choose Group.
  13. Create a group called ClusterAdmin.
  14. Create a second group called ClusterDev.
  15. Choose (right-click) Users, choose New, and choose User.
  16. Create a new user.
  17. Choose (right-click) the user and choose Add to a group.
  18. Add your users to the groups ClusterAdmin or ClusterDev. Users added to the ClusterAdmin group will have sudo privilege on the cluster.

Create a ReadOnly user in AD

Create a user called ReadOnly under Users. The ReadOnly user is used by the cluster to programmatically access users and groups in AD.

User creation dialog to create ReadOnly user

Take note of the password for later use.

Password entering screen for ReadOnly user

(For SSH public key authentication) Add SSH public keys to users

By storing an SSH public key to a user in AD, you can log in without entering a password. You can use an existing key pair, or you can create a new key pair with OpenSSH’s ssh-keygen command. For more information about generating a key pair, refer to Create a key pair for your Amazon EC2 instance.

  1. In Active Directory Users and Computers, on the View menu, enable Advanced Features.
  2. Open the Properties dialog of the user.
  3. On the Attribute Editor tab, choose altSecurityIdentities, then choose Edit.
  4. For Value to add, choose Add.
  5. For Values, add an SSH public key.
  6. Choose OK. Confirm that the SSH public key appears as an attribute.

Get an obfuscated password for the ReadOnly user

To avoid including a plain text password in the SSSD configuration file, you obfuscate the password. For this step, you need a Linux environment (local laptop, EC2 Linux instance, or CloudShell).

Install the sssd-tools package on the Linux machine to install the Python module pysss for obfuscation:

# Ubuntu
$ sudo apt install sssd-tools

# Amazon Linux
$ sudo yum install sssd-tools

Run the following one-line Python script. Input the password of the ReadOnly user. You will get the obfuscated password.

$ python3 -c "import getpass,pysss; print(pysss.password().encrypt(getpass.getpass('AD reader user password: ').strip(), pysss.password().AES_256))"
AD reader user password: (Enter ReadOnly user password) 
AAAQACK2....

Create a HyperPod cluster with an SSSD-enabled lifecycle script

Next, you create a HyperPod cluster with LDAPS/Active Directory integration.

  1. Find the configuration file config.py in your lifecycle script directory, open it with your text editor, and edit the properties in the Config class and SssdConfig class:
    1. Set True for enable_sssd to enable setting up SSSD.
    2. The SssdConfig class contains configuration parameters for SSSD.
    3. Make sure you use the obfuscated password for the ldap_default_authtok property, not a plain text password.
    # Basic configuration parameters
    class Config:
             :
        # Set true if you want to install SSSD for ActiveDirectory/LDAP integration.
        # You need to configure parameters in SssdConfig as well.
        enable_sssd = True
    # Configuration parameters for ActiveDirectory/LDAP/SSSD
    class SssdConfig:
    
        # Name of domain. Can be default if you are not sure.
        domain = "default"
    
        # Comma separated list of LDAP server URIs
        ldap_uri = "ldaps://nlb-ds-xyzxyz.elb.us-west-2.amazonaws.com"
    
        # The default base DN to use for performing LDAP user operations
        ldap_search_base = "dc=hyperpod,dc=abc123,dc=com"
    
        # The default bind DN to use for performing LDAP operations
        ldap_default_bind_dn = "CN=ReadOnly,OU=Users,OU=hyperpod,DC=hyperpod,DC=abc123,DC=com"
    
        # "password" or "obfuscated_password". Obfuscated password is recommended.
        ldap_default_authtok_type = "obfuscated_password"
    
        # You need to modify this parameter with the obfuscated password, not plain text password
        ldap_default_authtok = "placeholder"
    
        # SSH authentication method - "password" or "publickey"
        ssh_auth_method = "publickey"
    
        # Home directory. You can change it to "/home/%u" if your cluster doesn't use FSx volume.
        override_homedir = "/fsx/%u"
    
        # Group names to accept SSH login
        ssh_allow_groups = {
            "controller" : ["ClusterAdmin", "ubuntu"],
            "compute" : ["ClusterAdmin", "ClusterDev", "ubuntu"],
            "login" : ["ClusterAdmin", "ClusterDev", "ubuntu"],
        }
    
        # Group names for sudoers
        sudoers_groups = {
            "controller" : ["ClusterAdmin", "ClusterDev"],
            "compute" : ["ClusterAdmin", "ClusterDev"],
            "login" : ["ClusterAdmin", "ClusterDev"],
        }
    

  2. Copy the certificate file ldaps.crt to the same directory (where config.py exists).
  3. Upload the modified lifecycle script files to your Amazon Simple Storage Service (Amazon S3) bucket, and create a HyperPod cluster with it (see the SDK sketch after this list).
  4. Wait until the status changes to InService.
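
As a hedged illustration of step 3, the following sketch creates the cluster using the AWS SDK for Python (Boto3). The cluster name, instance group layout, S3 URI, role ARN, and VPC settings are placeholders that you would replace with your own values; the HyperPod workshop covers the full set of options.

import boto3

sagemaker = boto3.client("sagemaker")

# Placeholder values: replace with your own lifecycle script location,
# execution role, instance groups, subnets, and security group.
response = sagemaker.create_cluster(
    ClusterName="my-hyperpod-cluster",
    InstanceGroups=[
        {
            "InstanceGroupName": "controller",
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "ExecutionRole": "arn:aws:iam::123456789012:role/MyHyperPodClusterRole",
            "LifeCycleConfig": {
                # S3 prefix that contains config.py, ldaps.crt, and the rest
                # of the lifecycle scripts uploaded in step 3
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",
            },
        },
    ],
    VpcConfig={
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "Subnets": ["subnet-0123456789abcdef0"],
    },
)
print(response["ClusterArn"])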

Verification

Let’s verify the solution by logging in to the cluster with SSH. Because the cluster was created in a private subnet, you can’t directly SSH into the cluster from your local environment. You can choose from two options to connect to the cluster.

Option 1: SSH login through AWS Systems Manager

You can use AWS Systems Manager as a proxy for the SSH connection. Add a host entry to the SSH configuration file ~/.ssh/config using the following example. For the HostName field, specify the Systems Manager target name in the format sagemaker-cluster:[cluster-id]_[instance-group-name]-[instance-id]. For the IdentityFile field, specify the file path to the user’s SSH private key. This field is not required if you chose password authentication.

Host MyCluster-LoginNode
    HostName sagemaker-cluster:abcd1234_LoginGroup-i-01234567890abcdef
    User user1
    IdentityFile ~/keys/my-cluster-ssh-key.pem
    ProxyCommand aws --profile default --region us-west-2 ssm start-session --target %h --document-name AWS-StartSSHSession --parameters portNumber=%p

Run the ssh command using the host name you specified. Confirm you can log in to the instance with the specified user.

$ ssh MyCluster-LoginNode
   :
   :
   (ASCII art login banner)
You're on the controller
Instance Type: ml.m5.xlarge
user1@ip-10-1-111-222:~$

At this point, users can still use the Systems Manager default shell session to log in to the cluster as ssm-user with administrative privileges. To block the default Systems Manager shell access and enforce SSH access, you can configure your IAM policy by referring to the following example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:StartSession",
                "ssm:TerminateSession"
            ],
            "Resource": [
                "arn:aws:sagemaker:us-west-2:123456789012:cluster/abcd1234efgh",
                "arn:aws:ssm:us-west-2:123456789012:document/AWS-StartSSHSession"
            ],
            "Condition": {
                "BoolIfExists": {
                    "ssm:SessionDocumentAccessCheck": "true"
                }
            }
        }
    ]
}

For more details on how to enforce SSH access, refer to Start a session with a document by specifying the session documents in IAM policies.

Option 2: SSH login through bastion host

Another option to access the cluster is to use a bastion host as a proxy. You can use this option when the user doesn’t have permission to use Systems Manager sessions, or to troubleshoot when Systems Manager is not working.

  1. Create a bastion security group that allows inbound SSH access (TCP port 22) from your local environment.
  2. Update the security group for the cluster to allow inbound SSH access from the bastion security group.
  3. Create an EC2 Linux instance.
  4. For Amazon Machine Image, choose Ubuntu Server 20.04 LTS.
  5. For Instance type, choose t3.small.
  6. In the Network settings section, provide the following parameters:
    1. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    2. For Subnet, choose the public subnet you created with the CloudFormation template.
    3. For Common security groups, choose the bastion security group you created.
  7. For Configure storage, set storage to 8 GB.
  8. Identify the public IP address of the bastion host and the private IP address of the target instance (for example, the login node of the cluster), and add two host entries in the SSH config, by referring to the following example:
    Host Bastion
        HostName 11.22.33.44
        User ubuntu
        IdentityFile ~/keys/my-bastion-ssh-key.pem
    
    Host MyCluster-LoginNode-with-Proxy
        HostName 10.1.111.222
        User user1
        IdentityFile ~/keys/my-cluster-ssh-key.pem
        ProxyCommand ssh -q -W %h:%p Bastion

  9. Run the ssh command using the target host name you specified earlier, and confirm you can log in to the instance with the specified user:
    $ ssh MyCluster-LoginNode-with-Proxy
       :
       :
       (ASCII art login banner)
    You're on the controller
    Instance Type: ml.m5.xlarge
    user1@ip-10-1-111-222:~$

Clean up

Clean up the resources in the following order:

  1. Delete the HyperPod cluster.
  2. Delete the Network Load Balancer.
  3. Delete the load balancing target group.
  4. Delete the certificate imported to Certificate Manager.
  5. Delete the EC2 Windows instance.
  6. Delete the EC2 Linux instance for the bastion host.
  7. Delete the AWS Managed Microsoft AD.
  8. Delete the CloudFormation stack for the VPC, subnets, security group, and FSx for Lustre volume.

Conclusion

This post provided steps to create a HyperPod cluster integrated with Active Directory. This solution removes the hassle of user maintenance on large-scale clusters and allows you to manage users and groups centrally in one place.

For more information about HyperPod, check out the HyperPod workshop and the SageMaker HyperPod Developer Guide. Leave your feedback on this solution in the comments section.


About the Authors

Tomonori Shimomura is a Senior Solutions Architect on the Amazon SageMaker team, where he provides in-depth technical consultation to SageMaker customers and suggests product improvements to the product team. Before joining Amazon, he worked on the design and development of embedded software for video game consoles, and now he applies his in-depth skills to cloud-side technology. In his free time, he enjoys playing video games, reading books, and writing software.

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering experience and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

Monidipa Chakraborty currently serves as a Senior Software Development Engineer at Amazon Web Services (AWS), specifically within the SageMaker HyperPod team. She is committed to assisting customers by designing and implementing robust and scalable systems that demonstrate operational excellence. Bringing nearly a decade of software development experience, Monidipa has contributed to various sectors within Amazon, including Video, Retail, Amazon Go, and AWS SageMaker.

Satish Pasumarthi is a Software Developer at Amazon Web Services. With several years of software engineering and an ML background, he loves to bridge the gap between ML and systems and is passionate about building systems that make large-scale model training possible. He has worked on projects in a variety of domains, including machine learning frameworks, model benchmarking, and building the HyperPod beta, involving a broad set of AWS services. In his free time, Satish enjoys playing badminton.

Read More

The executive’s guide to generative AI for sustainability

The executive’s guide to generative AI for sustainability

Organizations are facing ever-increasing requirements for sustainability goals alongside environmental, social, and governance (ESG) practices. A Gartner, Inc. survey revealed that 87 percent of business leaders expect to increase their organization’s investment in sustainability in the coming years. This post serves as a starting point for any executive seeking to navigate the intersection of generative artificial intelligence (generative AI) and sustainability. It provides examples of use cases and best practices for using generative AI’s potential to accelerate sustainability and ESG initiatives, as well as insights into the main operational challenges of generative AI for sustainability. This guide can be used as a roadmap for integrating generative AI effectively within sustainability strategies while ensuring alignment with organizational objectives.

A roadmap to generative AI for sustainability

In the sections that follow, we provide a roadmap for integrating generative AI into sustainability initiatives.

1. Understand the potential of generative AI for sustainability

Generative AI has the power to transform every part of a business with its wide range of capabilities. These include the ability to analyze massive amounts of data, identify patterns, summarize documents, perform translations, correct errors, or answer questions. These capabilities can be used to add value throughout the entire value chain of your organization. Figure 1 illustrates selected examples of use cases of generative AI for sustainability across the value chain.

Figure 1: Examples of generative AI for sustainability use cases across the value chain

According to KPMG’s 2024 ESG Organization Survey, investment in ESG capabilities is another top priority for executives as organizations face increasing regulatory pressure to disclose information about ESG impacts, risks, and opportunities. Within this context, you can use generative AI to advance your organization’s ESG goals.

The typical ESG workflow consists of multiple phases, each presenting unique pain points. Generative AI offers solutions that can address these pain points throughout the process and contribute to sustainability efforts. Figure 2 provides examples illustrating how generative AI can support each phase of the ESG workflow within your organization. These examples include speeding up market trend analysis, ensuring accurate risk management and compliance, and facilitating data collection or report generation. Note that ESG workflows may vary across different verticals, organizational maturities, and legislative frameworks. Factors such as industry-specific regulations, company size, and regional policies can influence the ESG workflow steps. Therefore, prioritizing use cases according to your specific needs and context and defining a clear plan to measure success is essential for optimal effectiveness.

Figure 2: Mapping generative AI benefits across the ESG workflow

2. Recognize the operational challenges of generative AI for sustainability

Understanding and appropriately addressing the challenges of implementing generative AI is crucial for organizations aiming to use its potential to address the organization’s sustainability goals and ESG initiatives. These challenges include collecting and managing high-quality data, integrating generative AI into existing IT systems, navigating ethical concerns, and filling skills gaps. Setting the organization up for success also means bringing in key stakeholders, such as the chief information security officer (CISO) or chief financial officer (CFO), early so that you build responsibly. Legal challenges are a major blocker for transitioning from proof of concept (POC) to production, so it’s essential to involve legal teams early in the process to build with compliance in mind. Figure 3 provides an overview of the main operational challenges of generative AI for sustainability.

Figure 3: Operational challenges of generative AI for sustainability

3. Set the right data foundations

As a CEO aiming to use generative AI to achieve sustainability goals, remember that data is your differentiator. Companies that lack ready access to high-quality data will not be able to customize generative AI models with their own data, thus missing out on realizing the full scaling potential of generative AI and creating a competitive advantage. Invest in acquiring diverse and high-quality datasets to enrich and accelerate your ESG initiatives. You can use resources such as the Amazon Sustainability Data Initiative or the AWS Data Exchange to simplify and expedite the acquisition and analysis of comprehensive datasets. Alongside external data acquisition, prioritize internal data management to maximize the potential of generative AI and use its capabilities in analyzing your organizational data and uncovering new insights.

From an operational standpoint, you can embrace foundation model ops (FMOps) and large language model ops (LLMOps) to make sure your sustainability efforts are data-driven and scalable. This involves documenting data lineage, data versioning, automating data processing, and monitoring data management costs.

4. Identify high-impact opportunities

You can use Amazon’s working backwards principle to pinpoint opportunities within your sustainability strategy where generative AI can make a significant impact. Prioritize projects that promise immediate enhancements in key areas within your organization. While ESG remains a key aspect of sustainability, tapping into industry-specific expertise across sectors such as energy, supply chain, manufacturing, transportation, or agriculture can uncover diverse generative AI for sustainability use cases tailored to your business’s applications. Moreover, exploring alternative avenues, such as using generative AI to improve research and development, enable customer self-service, optimize energy usage in buildings, or slow down deforestation, can also provide impactful opportunities for sustainable innovation.

5. Use the right tools

Failing to use the appropriate tools can add complexity, compromise security, and reduce effectiveness in using generative AI for sustainability. The right tool should offer you choice and flexibility and enable you to customize your solutions to specific needs and requirements.

Figure 4 illustrates the AWS generative AI stack as of 2023, which offers a set of capabilities that encompass choice, breadth, and depth across all layers. Moreover, it is built on a data-first approach, ensuring that every aspect of its offerings is designed with security and privacy in mind.

Examples of tools you can use to advance sustainability initiatives are:

Amazon Bedrock – A fully managed service that provides access to high-performing FMs from leading AI companies through a single API, enabling you to choose the right model for your sustainability use cases.

AWS Trainium2 – Purpose-built for high-performance training of FMs and LLMs, Trainium2 provides up to 2x better energy efficiency (performance/watt) compared to first-generation Trainium chips.

Inferentia2-based Amazon EC2 Inf2 instances – These instances offer up to 50 percent better performance/watt over comparable Amazon Elastic Compute Cloud (Amazon EC2) instances. Purpose-built to handle deep learning models at scale, Inf2 instances are indispensable for deploying ultra-large models while meeting sustainability goals through improved energy efficiency.

Figure 4: AWS generative AI stack

6. Use the right approach

Generative AI isn’t a one-size-fits-all solution. Tailoring your approach by choosing the right modality and optimization strategy is crucial for maximizing its impact on sustainability initiatives. Figure 5 offers an overview of generative AI modalities.

Figure 5: Generative AI modalities

In addition, Figure 6 outlines the main generative AI optimization strategies, including prompt engineering, Retrieval Augmented Generation, and fine-tuning or continued pre-training.

Figure 6: Generative AI optimization strategies

7. Simplify the development of your applications by using generative AI agents

Generative AI agents offer a unique opportunity to drive sustainability initiatives forward with their advanced capabilities of automating a wide range of routine and repetitive tasks, such as data entry, customer support inquiries, and content generation. Moreover, they can orchestrate complex, multistep workflows by breaking down tasks into smaller, manageable steps, coordinating various actions, and ensuring the efficient execution of processes within your organization. For example, you can use Agents for Amazon Bedrock to configure an agent that monitors and analyzes energy usage patterns across your operations and identifies opportunities for energy savings. Alternatively, you can create a specialized agent that monitors compliance with sustainability regulations in real time.

8. Build robust feedback mechanisms for evaluation

Take advantage of feedback insights for strategic improvements, whether adjusting generative AI models or redefining objectives to ensure agility and alignment with sustainability challenges. Consider the following guidelines:

Implement real-time monitoring – Set up monitoring systems to track generative AI performance against sustainability benchmarks, focusing on efficiency and environmental impact. Establish a metrics pipeline to provide insights into the sustainability contributions of your generative AI initiatives.

Engage stakeholders for human-in-the-loop evaluation – Rely on human-in-the-loop auditing and regularly collect feedback from internal teams, customers, and partners to gauge the impact of generative AI–driven processes on the organization’s sustainability benchmarks. This enhances transparency and promotes trust in your commitment to sustainability.

Use automated testing for continuous improvement – With tools such as RAGAS and LangSmith, you can use LLM-based evaluation to identify and correct inaccuracies or hallucinations, facilitating rapid optimization of generative AI models in line with sustainability goals.
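
As a minimal sketch of this pattern, the following example uses the Amazon Bedrock runtime API as an LLM-based judge to flag answers that aren’t grounded in their source context. The model ID and prompt wording are illustrative assumptions; purpose-built tools such as RAGAS and LangSmith provide richer, production-grade versions of this kind of evaluation.

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def judge_faithfulness(context: str, answer: str) -> str:
    """Ask a judge model whether an answer is grounded in the given context."""
    prompt = (
        "You are auditing a sustainability reporting assistant.\n\n"
        f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
        "Reply with FAITHFUL or HALLUCINATED, followed by one sentence of reasoning."
    )
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 200,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    }
    # Assumes Claude 3 Haiku is enabled for your account in this Region.
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        contentType="application/json",
        accept="application/json",
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["content"][0]["text"]

print(judge_faithfulness(
    context="Scope 2 emissions fell 12% year over year.",
    answer="The report states Scope 2 emissions fell 12% year over year.",
))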

9. Measure impact and maximize ROI from generative AI for sustainability

Establish clear key performance indicators (KPIs) that capture the environmental impact, such as carbon footprint reduction, alongside economic benefits, such as cost savings or enhanced business agility. This dual focus ensures that your investments not only contribute to programs focused on environmental sustainability but also reinforce the business case for sustainability, while empowering you to drive innovation and competitive advantage in sustainable practices. Share success stories internally and externally to inspire others and demonstrate your organization’s commitment to sustainability leadership.

10. Minimize resource usage throughout the generative AI lifecycle

In some cases, generative AI itself can have a high energy cost. To achieve maximum impact, consider the trade-off between the benefits of using generative AI for sustainability initiatives and the energy efficiency of the technology itself. Make sure to gain a deep understanding of the iterative generative AI lifecycle and optimize each phase for environmental sustainability. Typically, the journey into generative AI begins with identifying specific application requirements. From there, you have the option to either train your model from scratch or use an existing one. In most cases, opting for an existing model and customizing it is preferred. Following this step and evaluating your system thoroughly is essential before deployment. Lastly, continuous monitoring enables ongoing refinement and adjustments. Throughout this lifecycle, implementing AWS Well-Architected Framework best practices is recommended. Refer to Figure 7 for an overview of the generative AI lifecycle.

Figure 7: The generative AI lifecycle

11. Manage risks and implement responsibly

While generative AI holds significant promise for working towards your organization’s sustainability goals, it also poses challenges such as toxicity and hallucinations. Striking the right balance between innovation and the responsible use of generative AI is fundamental for mitigating risks and enabling responsible AI innovation. This balance must account for the assessment of risk in terms of several factors such as quality, disclosures, or reporting. To achieve this, adopting specific tools and capabilities and working with your security team experts to adopt security best practices is necessary. Scaling generative AI in a safe and secure manner requires putting in place guardrails that are customized to your use cases and aligned with responsible AI policies.

12. Invest in educating and training your teams

Continuously upskill your team and empower them with the right skills to innovate and actively contribute to achieving your organization’s sustainability goals. Identify relevant resources for sustainability and generative AI to ensure your teams stay updated with the essential skills required in both areas.

Conclusion

In this post, we provided a guide for executives to integrate generative AI into their sustainability strategies, focusing on both sustainability and ESG goals. The adoption of generative AI in sustainability efforts is not just about technological innovation. It is about fostering a culture of responsibility, innovation, and continuous improvement. By prioritizing high-quality data, identifying impactful opportunities, and fostering stakeholders’ engagement, companies can harness the transformative power of generative AI to not only achieve but surpass their sustainability goals.

How can AWS help?

Explore the AWS Solutions Library to discover ways to build sustainability solutions on AWS.

The AWS Generative AI Innovation Center can assist you in the process with expert guidance on ideation, strategic use case identification, execution, and scaling to production.

To learn more about how Amazon is using AI to reach our climate pledge commitment of net-zero carbon by 2040, explore the 7 ways AI is helping Amazon build a more sustainable future and business.


About the Authors

Dr. Wafae Bakkali is a Data Scientist at AWS. As a generative AI expert, Wafae is driven by the mission to empower customers to solve their business challenges through generative AI techniques, ensuring they do so with maximum efficiency and sustainability.

Dr. Mehdi Noori is a Senior Scientist at AWS Generative AI Innovation Center. With a passion for bridging technology and innovation in the sustainability field, he assists AWS customers in unlocking the potential of Generative AI, turning potential challenges into opportunities for rapid experimentation and innovation. By focusing on scalable, measurable, and impactful uses of advanced AI technologies and streamlining the path to production, he helps customers achieve their sustainability goals.

Rahul Sareen is the GM for Sustainability Solutions and GTM at AWS. Rahul leads a team of high-performing sustainability strategists, GTM specialists, and technology architects who create strong business outcomes for customers’ sustainability goals, from carbon emission tracking, sustainable packaging and operations, and the circular economy to renewable energy. Rahul’s team provides technical expertise (ML, generative AI, IoT) to solve sustainability use cases.

Read More

Climate Tech Startups Integrate NVIDIA AI for Sustainability Applications

Climate Tech Startups Integrate NVIDIA AI for Sustainability Applications

Whether they’re monitoring minuscule insects or delivering insights from satellites in space, NVIDIA-accelerated startups are making every day Earth Day.

Sustainable Futures, an initiative within the NVIDIA Inception program for cutting-edge startups, is supporting 750+ companies globally focused on agriculture, carbon capture, clean energy, climate and weather, environmental analysis, green computing, sustainable infrastructure and waste management.

This Earth Day, discover how five of these sustainability-focused startups are advancing their work with accelerated computing and the NVIDIA Earth-2 platform for climate tech.

Earth-2 features a suite of AI models that help simulate, visualize and deliver actionable insights about weather and climate.

Insect Farming Catches the AI Bug

Image courtesy of Bug Mars

Amid a changing climate, a key component of environmental resilience is food security: the ability to produce and provide enough food to meet the nutrition needs of all people. Edible insects, such as crickets and black soldier flies, are one solution that could reduce humans’ reliance on resource-intensive livestock farming for protein.

Bug Mars, a startup based in Ontario, Canada, supports insect protein production with AI tools that monitor variables including temperature, pests, and insect counts, then predict issues and recommend actions based on that data. The technology can help insect farmers increase yield by 30%.

The company uses NVIDIA Jetson Orin Nano modules to accelerate its work, and recently announced it’s using synthetic data and digital twin technology to further advance its AI solutions for insect agriculture.

Seeing the Forest for the Trees

Based in Truckee, Calif., Vibrant Planet is modeling trillions of trees and other flammable vegetation such as shrublands and grasslands to help land managers, counties and fire districts across North America build wildfire and climate resilience.

NVIDIA hardware and software have helped Vibrant Planet develop transformer models for forest and ecosystem management and AI-enhanced operational planning.

Visualization courtesy of Vibrant Planet

The startup collects and analyzes data from lidar sensors, satellites and aircraft to train AI models that can map vegetation with high precision, estimate canopy height and detect characteristics of forest and vegetation areas such as carbon, water, biodiversity and built infrastructure. Customers can use this data to understand fire and drought hazards, and, with these insights, conduct scenario planning to forecast the effects of potential forest thinning, prescribed fire or other actions.

Delivering Tomorrow’s Forecast

Tomorrow.io, based in Boston, is a leading resilience platform that helps organizations adapt to increasing weather and climate volatility. Powered by next-generation space technology, advanced AI models and proprietary modeling capabilities, the startup enables businesses and governments to proactively mitigate risk, ensure operational resilience and drive critical decision-making.

Image courtesy of Tomorrow.io

The startup is developing weather forecasting AI and is launching its own satellites to collect environmental data to further train its models. It’s also conducting experiments using Earth-2 AI forecast models to determine the optimal configurations of satellites to improve weather-forecasting conditions.

One of Tomorrow.io’s projects is an initiative in Kenya with the Bill and Melinda Gates Foundation that provides daily alerts to 6 million farmers, with insights on when to water their crops, when to spray pesticides, when to harvest, or when to change crops altogether due to changes in the local climate. The team hopes to scale its user base to 100 million farmers in Africa by 2030.

Winds of Change

Palo Alto, Calif.-based WindBorne Systems is developing weather sensing balloons equipped with WeatherMesh, a state-of-the-art AI model for real-time global weather forecasts.

Image courtesy of WindBorne Systems

WeatherMesh predicts factors including surface temperature, pressure, winds, precipitation and radiation. The model has set world records for accuracy and is lightweight enough to run on a gaming laptop, unlike traditional models that run on supercomputers.

WindBorne uses NVIDIA GPUs to develop its AI and is an early-access user of Earth-2. The company’s weather balloon development is funded in part by the National Oceanic and Atmospheric Administration’s Weather Program Office.

Taking the Temperature of Global Cities

FortyGuard, a startup founded in Abu Dhabi with headquarters in Miami, is developing a system to measure urban heat with AI models that present insights for public health officials, city planners, landscape architects and environmental engineers.

FortyGuard presented in the Expo Hall Theater at NVIDIA GTC.

The company — an early-access user of the Earth-2 platform — aims for its temperature AI models to provide a more granular view into urban heat dynamics, providing data that can help industries and governments shape cooler and more livable cities.

FortyGuard’s technology, offered via application programming interfaces, could integrate with existing enterprise platforms to enable use cases including temperature-based route navigation, predictive enhanced EV performance and property insights.

To learn more about the Sustainable Futures program, watch the “AI Nations and Sustainable Futures Day” session from NVIDIA GTC.

NVIDIA is a member of the U.S. Department of State’s Coalition for Climate Entrepreneurship, which aims to address the United Nations’ Sustainable Development Goals using emerging technologies. Learn more in the GTC session, “Global Strategies: Startups, Venture Capital, and Climate Change Solutions.”

Video at top courtesy of Vibrant Planet.

Read More

The Slingshot Effect: A Late-Stage Optimization Anomaly in Adam-Family of Optimization Methods

Adaptive gradient methods, notably Adam, have become indispensable for optimizing neural networks, particularly in conjunction with Transformers. In this paper, we present a novel optimization anomaly called the Slingshot Effect, which manifests during extremely late stages of training. We identify a distinctive characteristic of this phenomenon through cyclic phase transitions between stable and unstable training regimes, as evidenced by the cyclic behavior of the norm of the last layer’s weights. Although the Slingshot Effect can be easily reproduced in more general settings, it does not…Apple Machine Learning Research

Introducing automatic training for solutions in Amazon Personalize

Introducing automatic training for solutions in Amazon Personalize

Amazon Personalize is excited to announce automatic training for solutions. Solution training is fundamental to maintain the effectiveness of a model and make sure recommendations align with users’ evolving behaviors and preferences. As data patterns and trends change over time, retraining the solution with the latest relevant data enables the model to learn and adapt, enhancing its predictive accuracy. Automatic training generates a new solution version, mitigating model drift and keeping recommendations relevant and tailored to end-users’ current behaviors while including the newest items. Ultimately, automatic training provides a more personalized and engaging experience that adapts to changing preferences.

Amazon Personalize accelerates your digital transformation with machine learning (ML), making it effortless to integrate personalized recommendations into existing websites, applications, email marketing systems, and more. Amazon Personalize enables developers to quickly implement a customized personalization engine, without requiring ML expertise. Amazon Personalize provisions the necessary infrastructure and manages the entire ML pipeline, including processing the data, identifying features, using the appropriate algorithms, and training, optimizing, and hosting the customized models based on your data. All your data is encrypted to be private and secure.

In this post, we guide you through the process of configuring automatic training, so your solutions and recommendations maintain their accuracy and relevance.

Solution overview

A solution refers to the combination of an Amazon Personalize recipe, customized parameters, and one or more solution versions (trained models). When you create a custom solution, you specify a recipe matching your use case and configure training parameters. For this post, you configure automatic training in the training parameters.

Prerequisites

To enable automatic training for your solutions, you first need to set up Amazon Personalize resources. Start by creating a dataset group, schemas, and datasets representing your items, interactions, and user data. For instructions, refer to Getting Started (console) or Getting Started (AWS CLI).
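
If you prefer to script the setup, the following condensed sketch shows the resource creation calls with the AWS SDK for Python (Boto3). The resource names are illustrative, the schema is the minimal interactions schema, and the dataset import job (create_dataset_import_job) is omitted; refer to the Getting Started guides for the complete flow.

import json
import boto3

personalize = boto3.client('personalize')

# Create a dataset group; wait until describe_dataset_group reports ACTIVE
# before creating datasets in it.
dataset_group_arn = personalize.create_dataset_group(
    name='my-dataset-group'
)['datasetGroupArn']

# Minimal interactions schema in the Avro format expected by Amazon Personalize.
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}
schema_arn = personalize.create_schema(
    name='my-interactions-schema',
    schema=json.dumps(interactions_schema),
)['schemaArn']

dataset_arn = personalize.create_dataset(
    name='my-interactions-dataset',
    datasetGroupArn=dataset_group_arn,
    schemaArn=schema_arn,
    datasetType='Interactions',
)['datasetArn']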

After you finish importing your data, you are ready to create a solution.

Create a solution

To set up automatic training, complete the following steps:

  1. On the Amazon Personalize console, create a new solution.
  2. Specify a name for your solution, choose the type of solution you want to create, and choose your recipe.
  3. Optionally, add any tags. For more information about tagging Amazon Personalize resources, see Tagging Amazon Personalize resources.
  4. To use automatic training, in the Automatic training section, select Turn on and specify your training frequency.

Automatic training is enabled by default and trains a new solution version once every 7 days. You can configure the training cadence to suit your business needs, from once every 1 day to once every 30 days.

  5. If your recipe generates item recommendations or user segments, optionally use the Columns for training section to choose the columns Amazon Personalize considers when training solution versions.
  6. In the Hyperparameter configuration section, optionally configure any hyperparameter options based on your recipe and business needs.
  7. Provide any additional configurations, then choose Next.
  8. Review the solution details and confirm that your automatic training is configured as expected.
  9. Choose Create solution.

Amazon Personalize will automatically create your first solution version. A solution version refers to a trained ML model. When a solution version is created for the solution, Amazon Personalize trains the model backing the solution version based on the recipe and training configuration. It can take up to 1 hour for the solution version creation to start.

The following is sample code for creating a solution with automatic training using the AWS SDK:

import boto3

personalize = boto3.client('personalize')

solution_config = {
    "autoTrainingConfig": {
        "schedulingExpression": "rate(3 days)"
    }
}

# Placeholder ARNs: replace with your own recipe and dataset group.
recipe_arn = "arn:aws:personalize:::recipe/aws-similar-items"
dataset_group_arn = "arn:aws:personalize:<region>:<accountId>:dataset-group/<dataset_group_name>"
name = "test_automatic_training"

response = personalize.create_solution(
    name=name,
    recipeArn=recipe_arn,
    datasetGroupArn=dataset_group_arn,
    performAutoTraining=True,
    solutionConfig=solution_config,
)

solution_arn = response['solutionArn']
print(solution_arn)

After a solution is created, you can confirm whether automatic training is enabled on the solution details page.

You can also use the following sample code to confirm via the AWS SDK that automatic training is enabled:

response = personalize.describe_solution(solutionArn=solution_arn)
print(response)

Your response will contain the fields performAutoTraining and autoTrainingConfig, displaying the values you set in the CreateSolution call.

On the solution details page, you will also see the solution versions that are created automatically. The Training type column specifies whether the solution version was created manually or automatically.

You can also use the following sample code to return a list of solution versions for the given solution:

response = personalize.list_solution_versions(solutionArn=solution_arn)['solutionVersions']
print("List Solution Version response\n")
for val in response:
    print(f"SolutionVersion: {val}")
    print("\n")

Your response will contain the field trainingType, which specifies whether the solution version was created manually or automatically.

When your solution version is ready, you can create a campaign for your solution version.

Create a campaign

A campaign deploys a solution version (trained model) to generate real-time recommendations. With Amazon Personalize, you can streamline your workflow and automate the deployment of the latest solution version to campaigns via automatic syncing. To set up auto sync, complete the following steps:

  1. On the Amazon Personalize console, create a new campaign.
  2. Specify a name for your campaign.
  3. Choose the solution you just created.
  4. Select Automatically use the latest solution version.
  5. Set the minimum provisioned transactions per second.
  6. Create your campaign.

The campaign is ready when its status is ACTIVE.

The following is sample code for creating a campaign with syncWithLatestSolutionVersion set to true using the AWS SDK. When you set syncWithLatestSolutionVersion to true, the solutionVersionArn you pass must be the solution ARN with the suffix $LATEST appended.

campaign_config = {
    "syncWithLatestSolutionVersion": True
}
resource_name = "test_campaign_sync"
solution_version_arn = "arn:aws:personalize:<region>:<accountId>:solution/<solution_name>/$LATEST"
response = personalize.create_campaign(name=resource_name, solutionVersionArn=solution_version_arn, campaignConfig=campaign_config)
campaign_arn = response['campaignArn']
print(campaign_arn)
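
The campaign takes a few minutes to become available. The following is a minimal polling sketch using the AWS SDK; the status strings follow the values documented for Amazon Personalize resources.

import time

# Poll until the campaign finishes creating before sending
# GetRecommendations traffic.
while True:
    status = personalize.describe_campaign(campaignArn=campaign_arn)['campaign']['status']
    print(f"Campaign status: {status}")
    if status in ('ACTIVE', 'CREATE FAILED'):
        break
    time.sleep(60)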

On the campaign details page, you can see whether the campaign selected has auto sync enabled. When enabled, your campaign will automatically update to use the most recent solution version, whether it was automatically or manually created.

Use the following sample code to confirm via the AWS SDK that syncWithLatestSolutionVersion is enabled:

response = personalize.describe_campaign(campaignArn=campaign_arn)
print(response)

Your response will contain the field syncWithLatestSolutionVersion under campaignConfig, displaying the value you set in the CreateCampaign call.

You can enable or disable the option to automatically use the latest solution version on the Amazon Personalize console after a campaign is created by updating your campaign. Similarly, you can enable or disable syncWithLatestSolutionVersion with UpdateCampaign using the AWS SDK.
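
As a hedged example, the following sketch turns off automatic syncing on an existing campaign with the AWS SDK; depending on your setup, you may also want to pass an explicit solutionVersionArn to pin the campaign to a specific version.

# Disable automatic use of the latest solution version for this campaign.
personalize.update_campaign(
    campaignArn=campaign_arn,
    campaignConfig={"syncWithLatestSolutionVersion": False},
)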

Conclusion

With automatic training, you can mitigate model drift and maintain recommendation relevance by streamlining your workflow and automating the deployment of the latest solution version in Amazon Personalize.

For more information about optimizing your user experience with Amazon Personalize, see the Amazon Personalize Developer Guide.


About the authors

Ba’Carri Johnson is a Sr. Technical Product Manager working with AWS AI/ML on the Amazon Personalize team. With a background in computer science and strategy, she is passionate about product innovation. In her spare time, she enjoys traveling and exploring the great outdoors.

Ajay Venkatakrishnan is a Software Development Engineer on the Amazon Personalize team. In his spare time, he enjoys writing and playing soccer.

Pranesh Anubhav is a Senior Software Engineer for Amazon Personalize. He is passionate about designing machine learning systems to serve customers at scale. Outside of his work, he loves playing soccer and is an avid follower of Real Madrid.

Read More

Use Kubernetes Operators for new inference capabilities in Amazon SageMaker that reduce LLM deployment costs by 50% on average

Use Kubernetes Operators for new inference capabilities in Amazon SageMaker that reduce LLM deployment costs by 50% on average

We are excited to announce a new version of the Amazon SageMaker Operators for Kubernetes using the AWS Controllers for Kubernetes (ACK). ACK is a framework for building Kubernetes custom controllers, where each controller communicates with an AWS service API. These controllers allow Kubernetes users to provision AWS resources like buckets, databases, or message queues simply by using the Kubernetes API.

Release v1.2.9 of the SageMaker ACK Operators adds support for inference components, which until now were only available through the SageMaker API and the AWS Software Development Kits (SDKs). Inference components can help you optimize deployment costs and reduce latency. With the new inference component capabilities, you can deploy one or more foundation models (FMs) on the same Amazon SageMaker endpoint and control how many accelerators and how much memory is reserved for each FM. This helps improve resource utilization, reduces model deployment costs on average by 50%, and lets you scale endpoints together with your use cases. For more details, see Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency.

The availability of inference components through the SageMaker controller enables customers who use Kubernetes as their control plane to take advantage of inference components while deploying their models on SageMaker.

In this post, we show how to use SageMaker ACK Operators to deploy SageMaker inference components.

How ACK works

To demonstrate how ACK works, let’s look at an example using Amazon Simple Storage Service (Amazon S3). In the following diagram, Alice is our Kubernetes user. Her application depends on the existence of an S3 bucket named my-bucket.

How ACK Works

The workflow consists of the following steps:

  1. Alice issues a call to kubectl apply, passing in a file that describes a Kubernetes custom resource describing her S3 bucket. kubectl apply passes this file, called a manifest, to the Kubernetes API server running in the Kubernetes controller node.
  2. The Kubernetes API server receives the manifest describing the S3 bucket and determines if Alice has permissions to create a custom resource of kind s3.services.k8s.aws/Bucket, and that the custom resource is properly formatted.
  3. If Alice is authorized and the custom resource is valid, the Kubernetes API server writes the custom resource to its etcd data store.
  4. It then responds to Alice that the custom resource has been created.
  5. At this point, the ACK service controller for Amazon S3, which is running on a Kubernetes worker node within the context of a normal Kubernetes Pod, is notified that a new custom resource of kind s3.services.k8s.aws/Bucket has been created.
  6. The ACK service controller for Amazon S3 then communicates with the Amazon S3 API, calling the S3 CreateBucket API to create the bucket in AWS.
  7. After communicating with the Amazon S3 API, the ACK service controller calls the Kubernetes API server to update the custom resource’s status with information it received from Amazon S3.

Key components

The new inference capabilities build upon SageMaker’s real-time inference endpoints. As before, you create the SageMaker endpoint with an endpoint configuration that defines the instance type and initial instance count for the endpoint. The model is configured in a new construct, an inference component. Here, you specify the number of accelerators and amount of memory you want to allocate to each copy of a model, together with the model artifacts, container image, and number of model copies to deploy.

You can use the new inference capabilities from Amazon SageMaker Studio, the SageMaker Python SDK, AWS SDKs, and AWS Command Line Interface (AWS CLI). They are also supported by AWS CloudFormation. Now you also can use them with SageMaker Operators for Kubernetes.

Solution overview

For this demo, we use the SageMaker controller to deploy a copy of the Dolly v2 7B model and a copy of the FLAN-T5 XXL model from the Hugging Face Model Hub on a SageMaker real-time endpoint using the new inference capabilities.

Prerequisites

To follow along, you should have a Kubernetes cluster with the SageMaker ACK controller v1.2.9 or above installed. For instructions on how to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with Amazon Elastic Compute Cloud (Amazon EC2) Linux managed nodes using eksctl, see Getting started with Amazon EKS – eksctl. For instructions on installing the SageMaker controller, refer to Machine Learning with the ACK SageMaker Controller.

You need access to accelerated instances (GPUs) for hosting the LLMs. This solution uses one instance of ml.g5.12xlarge; you can check the availability of these instances in your AWS account and request these instances as needed via a Service Quotas increase request, as shown in the following screenshot.

Service Quotas Increase Request

Create an inference component

To create your inference component, define the EndpointConfig, Endpoint, Model, and InferenceComponent YAML files, similar to the ones shown in this section. Use kubectl apply -f <yaml file> to create the Kubernetes resources.

You can list the status of the resource via kubectl describe <resource-type>; for example, kubectl describe inferencecomponent.

You can also create the inference component without a model resource. Refer to the guidance provided in the API documentation for more details.

EndpointConfig YAML

The following is the code for the EndpointConfig file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: EndpointConfig
metadata:
  name: inference-component-endpoint-config
spec:
  endpointConfigName: inference-component-endpoint-config
  executionRoleARN: <EXECUTION_ROLE_ARN>
  productionVariants:
  - variantName: AllTraffic
    instanceType: ml.g5.12xlarge
    initialInstanceCount: 1
    routingConfig:
      routingStrategy: LEAST_OUTSTANDING_REQUESTS

Endpoint YAML

The following is the code for the Endpoint file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Endpoint
metadata:
  name: inference-component-endpoint
spec:
  endpointName: inference-component-endpoint
  endpointConfigName: inference-component-endpoint-config

Model YAML

The following is the code for the Model file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Model
metadata:
  name: dolly-v2-7b
spec:
  modelName: dolly-v2-7b
  executionRoleARN: <EXECUTION_ROLE_ARN>
  containers:
  - image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04
    environment:
      HF_MODEL_ID: databricks/dolly-v2-7b
      HF_TASK: text-generation
---
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Model
metadata:
  name: flan-t5-xxl
spec:
  modelName: flan-t5-xxl
  executionRoleARN: <EXECUTION_ROLE_ARN>
  containers:
  - image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04
    environment:
      HF_MODEL_ID: google/flan-t5-xxl
      HF_TASK: text-generation

InferenceComponent YAMLs

In the following YAML files, given that the ml.g5.12xlarge instance comes with 4 GPUs, we are allocating 2 GPUs, 2 CPUs and 1,024 MB of memory to each model:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: InferenceComponent
metadata:
  name: inference-component-dolly
spec:
  inferenceComponentName: inference-component-dolly
  endpointName: inference-component-endpoint
  variantName: AllTraffic
  specification:
    modelName: dolly-v2-7b
    computeResourceRequirements:
      numberOfAcceleratorDevicesRequired: 2
      numberOfCPUCoresRequired: 2
      minMemoryRequiredInMb: 1024
  runtimeConfig:
    copyCount: 1
---
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: InferenceComponent
metadata:
  name: inference-component-flan
spec:
  inferenceComponentName: inference-component-flan
  endpointName: inference-component-endpoint
  variantName: AllTraffic
  specification:
    modelName: flan-t5-xxl
    computeResourceRequirements:
      numberOfAcceleratorDevicesRequired: 2
      numberOfCPUCoresRequired: 2
      minMemoryRequiredInMb: 1024
  runtimeConfig:
    copyCount: 1

Invoke models

You can now invoke the models using the following code:

import boto3
import json

sm_runtime_client = boto3.client(service_name="sagemaker-runtime")
payload = {"inputs": "Why is California a great place to live?"}

response_dolly = sm_runtime_client.invoke_endpoint(
    EndpointName="inference-component-endpoint",
    InferenceComponentName="inference-component-dolly",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)
result_dolly = json.loads(response_dolly['Body'].read().decode())
print(result_dolly)

response_flan = sm_runtime_client.invoke_endpoint(
    EndpointName="inference-component-endpoint",
    InferenceComponentName="inference-component-flan",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)
result_flan = json.loads(response_flan['Body'].read().decode())
print(result_flan)
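
The exact response shape depends on the serving container. As a hedged example, the Hugging Face TGI containers used here typically return a JSON list of objects with a generated_text field, so you can extract just the generated text as follows:

# Assumes the TGI response format: a JSON list of {"generated_text": ...} objects.
print(result_dolly[0]["generated_text"])
print(result_flan[0]["generated_text"])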

Update an inference component

To update an existing inference component, you can update the YAML files and then use kubectl apply -f <yaml file>. The following is an example of an updated file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: InferenceComponent
metadata:
  name: inference-component-dolly
spec:
  inferenceComponentName: inference-component-dolly
  endpointName: inference-component-endpoint
  variantName: AllTraffic
  specification:
    modelName: dolly-v2-7b
    computeResourceRequirements:
      numberOfAcceleratorDevicesRequired: 2
      numberOfCPUCoresRequired: 4 # Update the numberOfCPUCoresRequired.
      minMemoryRequiredInMb: 1024
  runtimeConfig:
    copyCount: 1

Delete an inference component

To delete an existing inference component, use the command kubectl delete -f <yaml file>.

Availability and pricing

The new SageMaker inference capabilities are available today in AWS Regions US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Jakarta, Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Stockholm), Middle East (UAE), and South America (São Paulo). For pricing details, visit Amazon SageMaker Pricing.

Conclusion

In this post, we showed how to use SageMaker ACK Operators to deploy SageMaker inference components. Fire up your Kubernetes cluster and deploy your FMs using the new SageMaker inference capabilities today!


About the Authors

Rajesh Ramchander is a Principal ML Engineer in Professional Services at AWS. He helps customers at various stages in their AI/ML and GenAI journey, from those that are just getting started all the way to those that are leading their business with an AI-first strategy.

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.

Suryansh Singh is a Software Development Engineer at AWS SageMaker and works on developing ML-distributed infrastructure solutions for AWS customers at scale.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

Johna Liu is a Software Development Engineer in the Amazon SageMaker team. Her current work focuses on helping developers efficiently host machine learning models and improve inference performance. She is passionate about spatial data analysis and using AI to solve societal problems.

Read More