Unlocking generative AI for enterprises: How SnapLogic powers their low-code Agent Creator using Amazon Bedrock

This post is co-written with Greg Benson, Aaron Kesler, and David Dellsperger from SnapLogic.

The landscape of enterprise application development is undergoing a seismic shift with the advent of generative AI. SnapLogic, a leader in generative integration and automation, has introduced the industry’s first low-code generative AI development platform, Agent Creator, designed to democratize AI capabilities across all organizational levels. Agent Creator is a no-code visual tool that empowers business users and application developers to create sophisticated large language model (LLM) powered applications and agents without programming expertise.

This intuitive platform enables the rapid development of AI-powered solutions such as conversational interfaces, document summarization tools, and content generation apps through a drag-and-drop interface. By using SnapLogic’s library of more than 800 pre-built connectors and data transformation capabilities, users can seamlessly integrate various data sources and AI models, dramatically accelerating the development process compared to traditional coding methods. This innovative platform empowers employees, regardless of their coding skills, to create generative AI processes and applications through a low-code visual designer.

Pre-built templates tailored to various use cases are included, significantly enhancing both employee and customer experiences. Agent Creator is a versatile extension to the SnapLogic platform that is compatible with modern databases, APIs, and even legacy mainframe systems, fostering seamless integration across various data environments. Its low-code interface drastically reduces the time needed to develop generative AI applications.

Agent Creator

Creating enterprise-grade, LLM-powered applications and integrations that meet security, governance, and compliance requirements has traditionally demanded the expertise of programmers and data scientists. Not anymore! SnapLogic’s Agent Creator revolutionizes this landscape by empowering everyone to create generative AI–powered applications and automations without any coding. Enterprises can use SnapLogic’s Agent Creator to store their knowledge in vector databases and create powerful generative AI solutions that augment LLMs with relevant enterprise-specific knowledge, a framework also known as Retrieval Augmented Generation (RAG). This capability accelerates business operations by providing a toolkit for users to create departmental chat assistants, add LLM-powered search to portals, automate processes involving documents, and much more. Additionally, this platform offers:

  • LLM-powered processes and apps in minutes – Agent Creator empowers enterprise users to create custom LLM-powered workflows without coding. Whether your HR department needs a Q&A workflow for employee benefits, your legal team needs a contract redlining solution, or your analysts need a research report analysis engine, Agent Creator provides the tools and flexibility to build it all.
  • Automate intelligent document processing (IDP) – Agent Creator can extract valuable data from invoices, purchase orders, resumes, insurance claims, loan applications, and other unstructured sources automatically. The IDP solution uses the power of LLMs to automate tedious document-centric processes, freeing up your team for higher-value work.
  • Boost productivity – Empowers knowledge workers with the ability to automatically and reliably summarize reports and articles, quickly find answers, and extract valuable insights from unstructured data. Agent Creator’s low-code approach allows anyone to use the power of AI to automate tedious portions of their work, regardless of their technical expertise.

The following demo shows Agent Creator in action.

To deliver these robust features, Agent Creator uses Amazon Bedrock, a foundational platform that provides managed infrastructure to use state-of-the-art foundation models (FMs). This eliminates the complexities of setting up and maintaining the underlying hardware and software so SnapLogic can focus on innovation and application development rather than infrastructure management.

What is Amazon Bedrock?

Amazon Bedrock is a fully managed service that provides access to high-performing FMs from leading AI startups and Amazon through a unified API, making it easier for enterprises to develop generative AI applications. Users can choose from a wide range of FMs to find the best fit for their use case. With Amazon Bedrock, organizations can experiment with and evaluate top models, customize them with their data using techniques like fine-tuning and RAG, and build intelligent agents that use enterprise systems and data sources. The serverless experience offered by Amazon Bedrock enables quick deployment, private customization, and secure integration of these models into applications without the need to manage underlying infrastructure. Key features include experimenting with prompts, augmenting response generation with data sources, creating reasoning agents, adapting models to specific tasks, and improving application efficiency with provisioned throughput, providing a robust and scalable solution for enterprise AI needs. The robust capabilities and unified API of Amazon Bedrock make it an ideal foundation for developing enterprise-grade AI applications.

By using the Amazon Bedrock high-performing FMs, secure customization options, and seamless integration features, SnapLogic’s Agent Creator maximizes its potential to deliver powerful, low-code AI solutions. This integration not only enhances the Agent Creator’s ability to create and deploy sophisticated AI models quickly but also makes them scalable, secure, and efficient.

Why Agent Creator uses Amazon Bedrock

SnapLogic’s Agent Creator uses Amazon Bedrock to deliver a powerful, low-code generative AI development platform that meets the unique needs of its enterprise customers. By integrating Amazon Bedrock, Agent Creator benefits from several key advantages:

  • Access to top-tier FMs – Amazon Bedrock provides access to high-performing FMs from leading AI providers through a unified API. Agent Creator offers enterprises the ability to experiment with and deploy sophisticated AI models without the complexity of managing the underlying infrastructure.
  • Seamless customization and integration – The serverless architecture of Amazon Bedrock frees up the time of Agent Creator developers so they can focus on innovation and rapid development. It facilitates the seamless customization of FMs with enterprise-specific data using advanced techniques like prompt engineering and RAG so outputs are relevant and accurate.
  • Enhanced security and compliance – Security and compliance are paramount for enterprise AI applications. SnapLogic uses Amazon Bedrock to build its platform, capitalizing on the proximity to data already stored in Amazon Web Services (AWS). Because of this strategic decision, SnapLogic can offer enhanced security and compliance measures while significantly reducing latency for its customers. By processing data closer to where it resides, SnapLogic promotes faster, more efficient operations that meet stringent regulatory requirements, ultimately delivering a superior experience for businesses relying on their data integration and management solutions. Because Amazon Bedrock offers robust features to meet these requirements, Agent Creator adheres to stringent security protocols and governance standards, giving enterprises confidence in their generative AI deployments.
  • Accelerated development and deployment – With Amazon Bedrock, Agent Creator empowers users to quickly experiment with various FMs, accelerating the development cycle. The managed infrastructure streamlines the testing and deployment process, enabling rapid iteration and implementation of intelligent applications.
  • Scalability and performance – Generative AI applications built using Agent Creator are scalable and performant because of Amazon Bedrock. It can handle large volumes of data and interactions, which is crucial for enterprises requiring robust applications. Provisioned throughput options enable efficient model inference, promoting smooth operation even under heavy usage.

By harnessing the capabilities of Amazon Bedrock, SnapLogic’s Agent Creator delivers a comprehensive, low-code solution that allows enterprises to capitalize on the transformative potential of generative AI. This integration simplifies the development process while enhancing the capabilities, security, and scalability of AI applications, driving significant business value and innovation.

Solution approach

Agent Creator integrates Amazon Bedrock, Anthropic’s Claude, and Amazon OpenSearch Service vector databases to deliver a comprehensive and powerful low-code visual interface for building generative AI solutions. At its core, Amazon Bedrock provides the foundational infrastructure for robust performance, security, and scalability for deploying machine learning (ML) models. This foundational layer is critical for managing the complexities of AI model deployment, and therefore SnapLogic can offer a seamless user experience. This integrated architecture not only supports advanced AI functionalities but also makes it easy to use. By abstracting the complexities of generative AI development and providing a user-friendly visual interface, Agent Creator offers enterprises the ability to use powerful AWS generative AI services without needing deep technical knowledge.

Control plane and data plane implementation

SnapLogic’s Agent Creator platform follows a decoupled architecture, separating the control plane and data plane for enhanced security and scalability.

Control plane

The control plane is responsible for managing and orchestrating the various components of the platform. The control plane is hosted and managed by SnapLogic, meaning that customers don’t have to worry about the underlying infrastructure and can focus on their core business requirements. SnapLogic’s control plane comprises several components that manage and orchestrate the platform’s operations. Here are some key components:

  • Designer – A visual interface where users can design, build, and configure integrations and data flows
  • Manager – A centralized management console for monitoring, scheduling, and controlling the execution of integrations and data pipelines
  • Monitor – A comprehensive reporting and analytics dashboard that provides insights into the performance, usage, and health of the platform
  • API management (APIM) – A component that manages and secures the exposure of integrations and data services as APIs, providing seamless integration with external applications and systems

By separating the control plane from the data plane, SnapLogic offers a scalable and secure architecture so customers can use generative AI capabilities while maintaining control over their data within their own virtual private cloud (VPC) environment.

Data plane

The data plane is where the actual data processing and integration take place. To address customers’ requirements about data privacy and sovereignty, SnapLogic deploys the data plane within the customer’s VPC on AWS. This approach means that customer data never leaves their controlled environment, providing an extra layer of security and compliance. By using Amazon Bedrock, SnapLogic can invoke generative AI models directly from the customer’s VPC, enabling real-time processing and analysis of customer data without needing to move it outside the secure environment. The integration with Amazon Bedrock is achieved through the Amazon Bedrock InvokeModel APIs. SnapLogic’s data plane, running within the customer’s VPC, calls these APIs to invoke the desired generative AI models hosted on Amazon Bedrock.
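
The following is a minimal sketch of how a data plane component might call the Amazon Bedrock InvokeModel API with boto3 from inside the customer’s VPC. The model ID, prompt, and request format shown here are illustrative assumptions, not SnapLogic’s implementation.

import json
import boto3

# Amazon Bedrock Runtime client, typically reached through a VPC endpoint from the data plane
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative request body for an Anthropic Claude model (Messages API format)
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": "Summarize the key terms of the attached vendor contract."}
    ],
}

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["content"][0]["text"])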

Functional components

The solution comprises the following functional components:

  • Vector Database Snap Pack – Manages the reading and writing of data to vector databases. This pack is crucial for maintaining the integrity and accessibility of the enterprise-specific knowledge stored in the OpenSearch vector database.
  • Chunker Snap – Segments large texts into manageable pieces. This functionality is important for processing large documents so the AI can handle and analyze text effectively.
  • Embedding Snap – Converts text segments into vectors. This step is vital for integrating enterprise-specific knowledge into AI prompts, enhancing the relevance and accuracy of AI responses.
  • LLM Snap Pack – Facilitates interactions with Claude and other language models. The AI can generate responses and perform tasks based on the processed and retrieved data.
  • Prompt Generator Snap – Enriches queries with the most relevant data so the AI prompts are contextually accurate and tailored to the specific needs of the enterprise.
  • Pre-Built Pipeline Patterns for indexing and retrieving – To streamline the deployment of intelligent applications, Agent Creator includes pre-built pipeline patterns. These patterns simplify common tasks such as indexing, retrieving data, and processing documents so AI-driven solutions can be deployed without the need for deep technical expertise.
  • Frontend Starter Kit – To simplify the deployment of user-facing applications, Agent Creator includes a Frontend Starter Kit. This kit provides pre-built components and templates for creating intuitive and responsive interfaces. Enterprises can quickly develop and deploy chat assistant UI applications that not only function well but also provide a seamless and engaging user experience.

Data flow and control flow

In the architecture of Agent Creator, the interaction between the Agent Creator platform, Amazon Bedrock, OpenSearch Service, and Anthropic’s Claude involves sophisticated and efficient management of data flow and control flow. By effectively managing the data and control flows between Agent Creator and AWS services, SnapLogic provides a robust, secure, and efficient platform for developing and deploying enterprise-grade solutions. This architecture supports advanced integration functionalities and offers a seamless, user-friendly experience, making it a valuable tool for enterprise customers.

Data flow

Here is an example of this data flow for an Agent Creator pipeline that involves data ingestion, preprocessing, and vectorization using Chunker and Embedding Snaps. The resulting vectors are stored in OpenSearch Service databases for efficient retrieval and querying. When a query is initiated, relevant vectors are retrieved to augment the query with context-specific data, and the enriched query is processed by the LLM Snap Pack to generate responses.

The data flow follows these steps:

  1. Data ingestion and preprocessing – Enterprise data is ingested from various sources such as documents, databases, and APIs. Chunker Snap processes large texts and documents by segmenting them into smaller, manageable chunks to make them compatible with downstream processing steps.
  2. Vectorization – The text chunks are passed to the Embedding Snap, which converts them into vector representations using embedding models. These vectors are numerical representations that capture the semantic meaning of the text. The resulting vectors are stored in OpenSearch Service vector databases, which manage and index these vectors for efficient retrieval and querying.
  3. Data retrieval and augmentation – When a query is initiated, the Vector Database Snap Pack retrieves relevant vectors from OpenSearch Service using similarity search algorithms to match the query with stored vectors. The retrieved vectors augment the initial query with context-specific enterprise data, enhancing its relevance.
  4. Prompt generation – The Prompt Generator Snap refines the final query so it’s well-formed and optimized for the language model.
  5. Response generation – The augmented query is forwarded to the LLM Snap Pack, which interacts with Anthropic’s Claude and other integrated language models. The generated response is postprocessed, if necessary, before delivery.
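
The following minimal sketch illustrates steps 2 through 5 above with boto3 and the opensearch-py client. The model IDs, index name, field names, and endpoint are assumptions for illustration; they are not SnapLogic’s Snap implementations, and authentication details are omitted.

import json
import boto3
from opensearchpy import OpenSearch

bedrock = boto3.client("bedrock-runtime")
opensearch = OpenSearch(hosts=[{"host": "my-opensearch-endpoint", "port": 443}], use_ssl=True)  # illustrative

def embed(text):
    # Vectorize text with an embedding model (Amazon Titan Text Embeddings V2 shown as an example)
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

query = "What is our parental leave policy?"

# Retrieve the most similar chunks from a k-NN index (index and field names are assumptions)
hits = opensearch.search(
    index="enterprise-knowledge",
    body={"size": 3, "query": {"knn": {"embedding": {"vector": embed(query), "k": 3}}}},
)
context = "\n".join(hit["_source"]["text"] for hit in hits["hits"]["hits"])

# Augment the query with the retrieved context and send it to the LLM
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
answer = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }),
)
print(json.loads(answer["body"].read())["content"][0]["text"])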

Control flow

The control flow in Agent Creator is orchestrated between the control plane and the data plane. The control plane hosts the user environment, stores configuration settings and user-created assets, and provides access to various components. The data plane executes pipelines, connecting to cloud-based or on-premises data endpoints, with the control plane orchestrating the workflow across interconnected snaps. Here is an example of this control flow for an Agent Creator pipeline.

The control flow follows these steps:

  1. Initiating requests – Users initiate requests using Agent Creator’s low-code visual interface, specifying tasks such as creating Q&A assistants or automating document processing. Pre-built UI components such as the Frontend Starter Kit capture user inputs and streamline the interaction process.
  2. Orchestrating pipelines – Agent Creator orchestrates workflows using interconnected snaps, each performing a specific function such as ingestion, chunking, vectorization, or querying. The architecture employs an event-driven model, where the completion of one snap triggers the next step in the workflow.
  3. Managing interactions with AWS services – Agent Creator communicates with AWS services, including Amazon Bedrock and OpenSearch Service, and Anthropic’s Claude in Amazon Bedrock, through secure API calls. The serverless infrastructure of Amazon Bedrock manages the execution of ML models, resulting in a scalable and reliable application.
  4. Observability – Robust mechanisms are in place for handling errors during data processing or model inference. Errors are logged and notifications are sent to system administrators for resolution. Continuous logging and monitoring provide transparency and facilitate troubleshooting. Logs are centrally stored and analyzed to maintain system integrity.
  5. Final output delivery – The generated AI responses are delivered to end user applications or interfaces, integrated into SnapLogic’s dashboards. User feedback is collected to continuously improve AI models and processing pipelines, enhancing overall system performance.

Use cases

You can use the SnapLogic Agent Creator for many different use cases. The next paragraphs illustrate just a few.

IDP on quarterly reports

A leading pharmaceutical data provider empowered their analysts by using Agent Creator and AutoIDP to automate data extraction on pharmaceutical drugs. By processing their portfolio of quarterly reports through LLMs, they could ask standardized questions to extract information that was previously gathered manually. This automation not only reduced errors but also saved significant time and resources, leading to a 35% reduction in costs and a centralized pool of reusable data assets, providing a single source of truth for their entire organization.

Automating market intelligence insights

A global telecommunications company used Agent Creator to process a multitude of RSS feeds, extracting only business-relevant information. This data was then integrated into Salesforce as a real-time feed of market insights. As the customer noted, “This automation allows us to filter and synthesize crucial data, delivering targeted, real-time insights to our sales teams, enhancing their productivity without the need for individual AI licenses.”

Agent Creator Amazon Bedrock roadmap

Development and improvement are ongoing for Agent Creator, with several enhancements released recently and more to come in the future.

Recent releases

Extended support for more Amazon Bedrock capabilities became available with the August 2024 release. This added support for retrieving and generating against Amazon Bedrock Knowledge Bases through Snap orchestration, as well as support for invoking Amazon Bedrock Agents. Continual enhancements for new models and additional authentication mechanisms have been released, including support for AWS Identity and Access Management (IAM) role authentication and cross-account IAM role authentication. All Agent Creator LLM Snaps have also been updated to accept raw request payloads, adding the ability to specify entire conversations (for continued, multi-turn exchanges) as well as prompts that go beyond plain text.

Support for the Amazon Bedrock Converse API was released recently. With Amazon Bedrock Converse API support, Agent Creator can work with models beyond Amazon Titan and Anthropic’s Claude. This comes with added support for multimodal prompt capabilities, which are delivered through new Snaps that orchestrate the building of these more complex payloads.
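
The following is a minimal boto3 sketch of the Amazon Bedrock Converse API call that such Snaps orchestrate, including a multimodal prompt that mixes text and an image. The model ID and file name are illustrative assumptions.

import boto3

bedrock = boto3.client("bedrock-runtime")

with open("invoice.png", "rb") as f:  # illustrative local file
    image_bytes = f.read()

# The Converse API accepts a model-agnostic request whose content blocks can mix text and images
response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
    messages=[{
        "role": "user",
        "content": [
            {"text": "Extract the vendor name and total amount from this invoice."},
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
        ],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0},
)
print(response["output"]["message"]["content"][0]["text"])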

Conclusion

SnapLogic has revolutionized enterprise AI with its Agent Creator, the industry’s first low-code generative AI development platform. By integrating advanced generative AI services such as Amazon Bedrock and OpenSearch Service vector databases and cutting edge LLMs such as Anthropic’s Claude, SnapLogic empowers enterprise users, from product to sales to marketing, to create sophisticated generative AI–driven applications without deep technical expertise. This platform reduces dependency on specialized programmers and accelerates innovation by streamlining the generative AI development process with pre-built pipeline patterns and a Frontend Starter Kit.

Agent Creator offers robust performance, security, and scalability so enterprises can use powerful generative AI tools for competitive advantage. By pioneering this comprehensive approach, SnapLogic not only addresses current enterprise needs but also positions organizations to harness Amazon Bedrock for future advancements in generative AI technology, driving significant business value and operational efficiency for our enterprise customers.

To use Agent Creator effectively, schedule a demo of SnapLogic’s Agent Creator to learn how it can address your specific use cases. Identify potential pilot projects, such as creating departmental Q&A assistants, automating document processing, or putting an LLM to work for you behind the scenes. Prepare to store your enterprise knowledge in vector databases, which Agent Creator can use to augment LLMs with your specific information through RAG. Begin with a small project, such as creating a departmental Q&A assistant, to demonstrate the value of Agent Creator and use this success to build momentum for larger initiatives. To learn more about how to make best use of Amazon Bedrock, refer to the Amazon Bedrock Documentation.


About the authors

Asheesh Goja is Principal Solutions Architect at AWS. Prior to AWS, Asheesh worked at prominent organizations such as Cisco and UPS, where he spearheaded initiatives to accelerate the adoption of several emerging technologies. His expertise spans ideation, co-design, incubation, and venture product development. Asheesh holds a wide portfolio of hardware and software patents, including a real-time C++ DSL, IoT hardware devices, Computer Vision and Edge AI prototypes. As an active contributor to the emerging fields of Generative AI and Edge AI, Asheesh shares his knowledge and insights through tech blogs and as a speaker at various industry conferences and forums.

Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including the NLP and computer vision domains. He helps customers achieve high-performance model inference on SageMaker.

Greg Benson is a Professor of Computer Science at the University of San Francisco and Chief Scientist at SnapLogic. He joined the USF Department of Computer Science in 1998 and has taught undergraduate and graduate courses including operating systems, computer architecture, programming languages, distributed systems, and introductory programming. Greg has published research in the areas of operating systems, parallel computing, and distributed systems. Since joining SnapLogic in 2010, Greg has helped design and implement several key platform features including cluster processing, big data processing, the cloud architecture, and machine learning. He currently is working on Generative AI for data integration.

Aaron Kesler is the Senior Product Manager for AI products and services at SnapLogic, where he applies over ten years of product management expertise to pioneer AI/ML product development and evangelize services across the organization. He is the author of the upcoming book “What’s Your Problem?” aimed at guiding new product managers through the product management career. His entrepreneurial journey began with his college startup, STAK, which was later acquired by Carvertise, with Aaron contributing significantly to their recognition as Tech Startup of the Year 2015 in Delaware. Beyond his professional pursuits, Aaron finds joy in golfing with his father, exploring new cultures and foods on his travels, and practicing the ukulele.

David Dellsperger is a Senior Staff Software Engineer and Technical Lead of the Agent Creator product at SnapLogic. David has been working as a software engineer with an emphasis on machine learning and AI for over a decade, previously focusing on AI in healthcare and now focusing on the SnapLogic Agent Creator. David spends his time outside of work playing video games and spending quality time with his yellow lab, Sudo.

Next-generation learning experience using Amazon Bedrock and Anthropic’s Claude: Innovation from Classworks

This post is co-written with Jerry Henley, Hans Buchheim, and Roy Gunter from Classworks.

Classworks is an online teacher and student platform that includes academic screening, progress monitoring, and specially designed instruction for reading and math for grades K–12. Classworks’s unique ability to ingest student assessment data from various sources, analyze it, and automatically deliver a customized learning progression for each student sets them apart. Although this evidence-based model has significantly impacted student growth, supporting diverse learning needs in a classroom of 25 students working independently remains challenging. Teachers often find themselves torn between assisting individual students and delivering group instruction, ultimately hindering the learning experience for all.

To address the challenges of personalized learning and teacher workload, Classworks introduces Wittly by Classworks, an AI-powered learning assistant built on Amazon Bedrock, a fully managed service that makes it straightforward to build generative AI applications.

Wittly’s innovative approach centers on two key aspects:

  • Harnessing Anthropic’s Claude in Amazon Bedrock for advanced AI capabilities – Wittly uses Amazon Bedrock to seamlessly integrate with Anthropic’s Claude 3.5 Sonnet, a state-of-the-art large language model (LLM). This powerful combination enables Wittly to provide tailored learning support and foster self-directed learning environments at scale.
  • Personalization and teacher empowerment – This comprises two objectives:
    • Personalized learning – Through AI-driven differentiated instruction, Wittly adapts to individual student needs, enhancing their learning experience.
    • Reduced teacher workload – Wittly allows educators to concentrate on high-impact student support, facilitating better educational outcomes.

In this post, we discuss how Classworks uses Amazon Bedrock and Anthropic’s Claude Sonnet to deliver next-generation differentiated learning with Wittly.

Powering differentiated learning with Amazon Bedrock

The ability to deliver differentiated learning to a classroom of diverse learners is transformative. Engaging students with instruction tailored to their current learning skills accelerates mastery and fosters critical thinking and independent problem-solving. However, providing such personalized instruction to an entire classroom is labor-intensive and time-consuming for teachers.

Wittly uses generative AI to offer explanations of each skill at a student’s interest level in various ways. When students encounter challenging concepts, Wittly provides clear, concise guidance tailored to their learning style and language preferences, enabling them to grasp concepts at their own pace and overcome obstacles independently. With the scalable infrastructure of Amazon Bedrock, Wittly handles diverse classroom needs simultaneously, making personalized instruction a reality for every student.

Amazon Bedrock serves as the cornerstone of Wittly’s AI capabilities, offering several key advantages:

  • Single API access – Simplifies integration with Anthropic’s Claude foundation models (FMs), allowing for straightforward updates and potential expansion to other models in the future. This unified interface accelerates development cycles by reducing the complexity of working with multiple AI models. It also future-proofs Wittly’s AI infrastructure, enabling seamless adoption of new models and capabilities as they become available, without significant code changes.
  • Serverless architecture – Eliminates the need for infrastructure management, enabling Classworks to focus on educational content and user experience. This approach provides automatic scaling to handle varying loads, from individual student sessions to entire school districts accessing the platform simultaneously. It also optimizes costs by allocating resources based on actual usage rather than maintaining constant capacity. The reduced operational overhead allows Wittly’s team to dedicate more time and resources to enhancing the core educational features of the platform.

Combining cutting-edge AI technology with thoughtful implementation and robust safeguards, Wittly represents a significant leap forward in personalized digital learning assistance. The system’s architecture, powered by Amazon Bedrock and Anthropic’s Claude 3.5 Sonnet, enables Wittly to adapt to individual student needs while maintaining high standards of safety, privacy, and educational efficacy. By integrating these advanced technologies, Wittly not only enhances the learning experience but also makes sure it’s accessible, secure, and tailored to the unique requirements of every student.

Increasing teacher capacity and bandwidth

Meeting the diverse needs of students in a single classroom, particularly during intervention periods or in resource rooms, can be overwhelming. By differentiating instruction for students learning independently, Wittly saves valuable teacher time. Students can seek clarification and guidance from Wittly before asking for the teacher’s help, fostering a self-directed learning environment that eases the teacher’s burden.

This approach is particularly beneficial when a teacher delivers small group lessons while others learn independently. Knowing that interactive explanations are available to students learning each concept is a significant relief for teachers managing diverse ability levels in a classroom. By harnessing the powerful capabilities of Anthropic’s Claude 3.5 Sonnet, Wittly creates a more efficient, personalized learning ecosystem that benefits both students and teachers.

Solution overview

The following diagram illustrates the solution architecture.

 

The solution consists of the following key components:

  • Wittly interface – The frontend component where students interact with the learning assistant is designed to be intuitive and engaging.
  • Classworks API – This API manages the data exchange and serves as the central hub for communication between various system components.
  • Wittly AI assistant prompt – A tailored prompt is generated for the AI based on the student’s first name, grade level, learning objectives, and conversation history.
  • Student common misconception prompt – This prompt actively identifies potential misconceptions related to the current learning objective, enhancing the student experience.
  • Anthropic’s Claude on Amazon Bedrock – Amazon Bedrock orchestrates AI interactions, providing a fully managed service that simplifies the integration of Anthropic’s state-of-the-art Claude models.

Monitoring the Wittly platform

In the rapidly evolving landscape of AI-powered education, robust monitoring isn’t only beneficial—it’s essential. Classworks recognizes this criticality and has developed a comprehensive monitoring strategy for the Wittly platform. This approach is pivotal in maintaining the highest standards of performance, optimizing resource allocation, and continually refining the user experience. More specifically, the Wittly platform monitors the following metrics:

  • Token usage – By tracking overall token consumption and visualizing usage patterns by feature and user type, we can plan resources efficiently and manage costs effectively.
  • Request volume – Monitoring API calls helps us detect unusual spikes and analyze usage patterns, enabling predictive scaling decisions and providing system reliability.
  • Response times – We measure and analyze latency, breaking down response times by query complexity and user segment, which allows us to identify and address performance bottlenecks promptly.
  • Costs – Implementing detailed cost tracking and modeling for various usage scenarios supports our budget management and pricing strategies, leading to sustainable growth.
  • Quality metrics – Logging and analyzing user feedback, along with correlating satisfaction metrics with model performance, guides our continuous improvement efforts.
  • Error tracking – Setting up alerts for critical errors and performing advanced error categorization and trend analysis helps us integrate seamlessly with our development workflow and maintain system integrity.
  • User engagement – Visualizing user journeys and feature adoption rates through monitoring feature usage informs our product development priorities, enhancing the overall user experience.
  • System health – By tracking overall system performance, we gain a holistic view of system dependencies, supporting proactive maintenance and maintaining a stable platform.

To achieve this, we use Amazon CloudWatch to capture key performance data, such as average latency and token counts. This information is then seamlessly integrated into our Grafana dashboard for real-time visualization and analysis. The following screenshot showcases our monitoring dashboard created using Grafana, which visually represents these critical metrics and provides actionable insights. Grafana is an open-source platform for monitoring and observability, enabling users to query, visualize, and understand their data through customizable dashboards.
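
As a minimal sketch, assuming the application publishes its own custom metrics rather than relying only on the metrics Amazon Bedrock emits automatically, per-request latency and token counts could be pushed to CloudWatch like this and then visualized in Grafana through its CloudWatch data source. The namespace, metric names, and dimensions are illustrative, not Classworks’s actual schema.

import time
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_invocation(feature, latency_ms, input_tokens, output_tokens):
    # Publish custom metrics that a Grafana CloudWatch data source can later query and graph
    cloudwatch.put_metric_data(
        Namespace="Wittly/Bedrock",  # illustrative namespace
        MetricData=[
            {"MetricName": "InvocationLatency", "Value": latency_ms, "Unit": "Milliseconds",
             "Dimensions": [{"Name": "Feature", "Value": feature}]},
            {"MetricName": "InputTokens", "Value": input_tokens, "Unit": "Count",
             "Dimensions": [{"Name": "Feature", "Value": feature}]},
            {"MetricName": "OutputTokens", "Value": output_tokens, "Unit": "Count",
             "Dimensions": [{"Name": "Feature", "Value": feature}]},
        ],
    )

# Example usage around a model invocation
start = time.time()
# ... invoke the model here ...
record_invocation("explain-concept", (time.time() - start) * 1000, input_tokens=350, output_tokens=120)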

This comprehensive monitoring framework enables Classworks to deliver exceptional value to our users by optimizing AI-powered features and maintaining high performance standards. With cutting-edge tools like Grafana for data collection, alerting, and in-depth visualization and analysis, we can adapt and expand our monitoring capabilities in tandem with the growing complexity of our AI integration.

Engaging with Wittly: A student’s experience

As students embark on their Classworks activities, they are greeted by Wittly, their AI-powered learning assistant, integrated seamlessly into the Classworks instructional toolbar. When students encounter challenging concepts or need additional help, they can choose the Wittly icon to open an interactive chat window.

Unlike other AI chat-based systems that rely on open-ended questions, Wittly offers a set of pre-created AI response options. This guided approach makes sure conversations remain focused and relevant to the current activity. When Wittly provides explanations or poses questions, students can select from the provided responses, indicating their understanding or need for further clarification.

The student engagement workflow includes the following steps:

  1. Wittly is called when a student needs help with a specific activity in Classworks.
  2. Each Classworks activity focuses on a particular skill or concept, and we’ve tagged every activity with its learning objectives.
  3. When a student accesses Wittly, we send key pieces of information, including the student’s first name, the learning objective of the activity they’re working on, and the language preference of the student.
  4. Wittly generates a personalized response to help the student. This typically includes a greeting using the student’s name, an explanation of the concept, an example related to the learning objective, and a prompt asking if the explanation helped the student understand the concept.

The following is a sample interaction, starting with the input sent to Wittly:

{
  "student_name": "Alex",
  "learning_objective": "Identify and use proper punctuation in compound sentences",
  "language": "English"
}

Wittly’s output is as follows:

"Hi Alex! Let's work on punctuating compound sentences. Remember to use a comma before coordinating conjunctions like 'and' or 'but'. For example: 'I love pizza, and I enjoy pasta.' Do you understand this? Please reply with 'thumbs up' or 'thumbs down'."

Wittly is designed to adapt to each student’s unique needs. It can communicate in both English and Spanish, and students can choose a voice they find engaging. For those who prefer auditory learning, Wittly reads its answers aloud while highlighting the corresponding text, making the learning experience both dynamic and accessible.

The structured interactions with Wittly are recorded, allowing teachers to monitor student progress and identify areas where additional support may be needed. This makes sure teachers remain actively involved in the learning process and that Wittly’s interactions are always appropriate and aligned with educational objectives.

With Wittly as their learning companion, students can delve into complex concepts in language arts, math, and science through guided, interactive exchanges. Wittly supports their learning journey, making their time in Classworks more engaging and personalized, all within a safe and controlled environment.

The following example showcases the interactive experience with Wittly in action, demonstrating how students engage with personalized learning through guided interactions.

Data privacy and safety considerations

In the era of AI-powered education, protecting student data and providing safe interactions are paramount. Classworks has implemented rigorous measures to uphold the highest standards of privacy and safety in Wittly’s design and operation.

Ethical AI foundation

Classworks employs a human-in-the-loop (HITL) model, combining AI technology with human expertise and insight. Wittly uses advanced AI algorithms, overseen and enhanced by the expertise of human educators and engineers, to generate instructional recommendations.

Student data protection

A core tenet in developing Wittly was achieving personalized learning without compromising student privacy. We don’t share any personally identifiable information with Wittly. Anthropic’s Claude LLM is trained on a dataset of anonymous data, not data from the Classworks platform, providing complete student privacy. Furthermore, when engaging with Wittly, students select from various pre-created responses to indicate whether the differentiated instruction was helpful or if they need further assistance. This approach eliminates the risk of inappropriate conversations, maintaining a safe learning environment.

Amazon Bedrock enhances this protection by encrypting data both in transit and at rest and by preventing the sharing of prompts with any third parties, including Anthropic. Additionally, Amazon Bedrock doesn’t train models with Classworks’s data, so all interactions remain secure and private.

Conclusion

Amazon Bedrock represents a pivotal advancement in AI technology, offering vast opportunities for innovation and efficiency in education. At Classworks, we’re not just adopting this technology, we’re pioneering its application to craft exceptional, personalized learning experiences. Our commitment extends beyond students to empowering educators with cutting-edge resources that elevate learning outcomes.

Based on Wittly’s capabilities, we estimate that teachers could potentially save 15–25 hours per month. This time savings might come from reduced need for individual student support, decreased time spent on classroom management, and less after-hours support. These efficiency gains significantly enhance the learning environment, allowing teachers to focus more on high impact, tailored educational experiences.

As AI continues to evolve, we’re committed to refining our policies and practices to uphold the highest standards of safety, quality, and efficacy in educational technology. By embracing Amazon Bedrock, we can make sure Classworks remains at the forefront of delivering safe, impactful, and meaningful educational experiences to students and educators alike.

To learn more about how generative AI and Amazon Bedrock can revolutionize your educational platform by delivering personalized learning experiences, enhancing teacher capacity, and enforcing data privacy, visit Amazon Bedrock. Discover how you can use advanced AI to create innovative applications, streamline development processes, and provide impactful data insights for your users.

To learn more about Classworks and our groundbreaking generative AI capabilities, visit our website.

This is a guest post from Classworks. Classworks is an award-winning K–12 special education and tiered intervention platform that uses advanced technology and comprehensive data to deliver superior personalized learning experiences. The comprehensive solution includes academic screeners, math and reading interventions, specially designed instruction, progress monitoring, and powerful data. Validated by the National Center on Intensive Intervention (NCII) and endorsed by The Council of Administrators of Special Education (CASE), Classworks partners with districts nationwide to deliver data-driven personalized learning to students where they are ready to learn.

 


About the Authors

Jerry Henley, VP of Technology at Curriculum Advantage, leads the product technical vision, platform services, and support for Classworks. With 18 years in EdTech, he oversees innovation, roadmaps, and AI integration, enhancing personalized learning experiences for students and educators.

 

Hans Buchheim, VP of Engineering at Curriculum Advantage, has spent 25 years developing Classworks. He leads software architecture decisions, mentors junior developers, and ensures the product evolves to meet educator needs.

 

Roy Gunter, DevOps Engineer at Curriculum Advantage, manages cloud infrastructure and automation for Classworks. He focuses on system reliability, troubleshooting, and performance optimization to deliver an excellent user experience.

 

Gowtham Shankar is a Solutions Architect at Amazon Web Services (AWS). He is passionate about working with customers to design and implement cloud-native architectures to address business challenges effectively. Gowtham actively engages in various open source projects, collaborating with the community to drive innovation.

 

Dr. Changsha Ma is an AI/ML Specialist at AWS. She is a technologist with a PhD in Computer Science, a master’s degree in Education Psychology, and years of experience in data science and independent consulting in AI/ML. She is passionate about researching methodological approaches for machine and human intelligence. Outside of work, she loves hiking, cooking, hunting for good food, and spending time with friends and family.

Fine-tune a BGE embedding model using synthetic data from Amazon Bedrock

Have you ever faced the challenge of obtaining high-quality data for fine-tuning your machine learning (ML) models? Generating synthetic data can provide a robust solution, especially when real-world data is scarce or sensitive. For instance, when developing a medical search engine, obtaining a large dataset of real user queries and relevant documents is often infeasible due to privacy concerns surrounding personal health information. However, synthetic data generation techniques can be employed to create realistic query-document pairs that resemble authentic user searches and relevant medical content, enabling the training of accurate retrieval models while preserving user privacy.

In this post, we demonstrate how to use Amazon Bedrock to create synthetic data, fine-tune a BAAI General Embeddings (BGE) model, and deploy it using Amazon SageMaker.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

You can find the full code associated with this post at the accompanying GitHub repository.

Solution overview

BGE stands for Beijing Academy of Artificial Intelligence (BAAI) General Embeddings. It is a family of embedding models with a BERT-like architecture, designed to produce high-quality embeddings from text data. The BGE models come in three sizes:

  • bge-large-en-v1.5: 1.34 GB, 1,024 embedding dimensions
  • bge-base-en-v1.5: 0.44 GB, 768 embedding dimensions
  • bge-small-en-v1.5: 0.13 GB, 384 embedding dimensions

For comparing two pieces of text, the BGE model functions as a bi-encoder architecture, processing each piece of text through the same model in parallel to obtain their embeddings.
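
The following is a minimal sketch of this bi-encoder pattern using the sentence-transformers library: both texts pass through the same BGE model independently, and similarity is computed between the resulting vectors. The passage text and the query instruction prefix are illustrative.

from sentence_transformers import SentenceTransformer, util

# Load a BGE model; bge-base-en-v1.5 balances size and quality
model = SentenceTransformer("BAAI/bge-base-en-v1.5")

query = "Represent this sentence for searching relevant passages: What was Lyft's revenue in 2021?"
passage = "Lyft reported revenue of approximately $3.2 billion for fiscal year 2021."  # illustrative text

# Each text is encoded independently by the same encoder (bi-encoder), then compared
query_emb, passage_emb = model.encode([query, passage], normalize_embeddings=True)
print(util.cos_sim(query_emb, passage_emb))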

Generating synthetic data can significantly enhance the performance of your models by providing ample, high-quality training data without the constraints of traditional data collection methods. This post guides you through generating synthetic data using Amazon Bedrock, fine-tuning a BGE model, evaluating its performance, and deploying it with SageMaker.

The high-level steps are as follows:

  1. Set up an Amazon SageMaker Studio environment with the necessary AWS Identity and Access Management (IAM) policies.
  2. Open SageMaker Studio.
  3. Create a Conda environment for dependencies.
  4. Generate synthetic data using Meta Llama 3 on Amazon Bedrock.
  5. Fine-tune the BGE embedding model with the generated data.
  6. Merge the model weights.
  7. Test the model locally.
  8. Evaluate and compare the fine-tuned model.
  9. Deploy the model using SageMaker and Hugging Face Text Embeddings Inference (TEI).
  10. Test the deployed model.

Prerequisites

First-time users need an AWS account and an IAM user role with the following permission policies attached:

  • AmazonSageMakerFullAccess
  • IAMFullAccess (or a custom IAM policy that grants iam:GetRole and iam:AttachRolePolicy permissions for the specific SageMaker execution role and the required policies: AmazonBedrockFullAccess, AmazonS3FullAccess, and AmazonEC2ContainerRegistryFullAccess)

Create a SageMaker Studio domain and user

Complete the following steps to create a SageMaker Studio domain and user:

  1. On the SageMaker console, under Admin configurations in the navigation pane, choose Domains.
  2. Choose Create domain.

SageMaker Domains

  3. Choose Set up for single user (Quick setup). Your domain, along with an IAM role with the AmazonSageMakerFullAccess policy, will be automatically created.
  4. After the domain is prepared, choose Add user.
  5. Provide a name for the new user profile and choose the IAM role (use the default role that was created automatically with the domain).
  6. Choose Next on the next three screens, then choose Submit.

After you add the user profile, update the IAM role.

  1. On the IAM console, choose Roles in the navigation pane.
  2. Navigate to the Domain settings page of your newly created domain and locate the IAM role created earlier (it should have a name similar to AmazonSageMaker-ExecutionRole-YYYYMMDDTHHMMSS).
  3. On the role details page, on the Add permissions drop down menu, choose Attach policies.
  4. Select the following policies and choose Add permissions to attach them to the role:
    1. AmazonBedrockFullAccess
    2. AmazonS3FullAccess
    3. AmazonEC2ContainerRegistryFullAccess

Open SageMaker Studio

To open SageMaker Studio, complete the following steps:

  1. On the SageMaker console, choose Studio in the navigation pane.
  2. On the SageMaker Studio landing page, select the newly created user profile and choose Open Studio.
  3. After you launch SageMaker Studio, choose JupyterLab.
  4. In the top-right corner, choose Create JupyterLab Space.
  5. Give the space a name, such as embedding-finetuning, and choose Create space.
  6. Change the instance type to ml.g5.2xlarge and the Storage (GB) value to 100.

You may need to request a service quota increase before being able to select the ml.g5.2xlarge instance type.

  7. Choose Run space and wait a few minutes for the space to start.
  8. Choose Open JupyterLab.

Set up a Conda environment in SageMaker Studio

Next, you create a Conda environment with the necessary dependencies for running the code in this post. You can use the environment.yml file provided in the code repository to create this.

  1. Open the previous terminal, or choose Terminal in Launcher to open a new one.
  2. Clone the code repository, and enter the directory:
    # TODO: replace this with final public version 
    git clone https://gitlab.aws.dev/austinmw/Embedding-Finetuning-Blog

  3. Create the Conda environment by running the following command (this step will take several minutes to complete):
    conda env create -f environment.yml

  4. Activate the environment by running the following commands one by one:
    conda init
    source ~/.bashrc
    conda activate ft-embedding-blog

  5. Add the newly created Conda environment to Jupyter:
    python -m ipykernel install --user --name=ft-embedding-blog

  6. From the Launcher, open the repository folder named embedding-finetuning-blog and open the file Embedding Blog.ipynb.
  7. On the Kernel drop down menu in the notebook, choose Change Kernel, then choose ft-embedding-blog.

You may need to refresh your browser if it doesn’t show up as available.

Now you have a Jupyter notebook that includes the necessary dependencies required to run the code in this post.

Generate synthetic data using Amazon Bedrock

We start by adapting LlamaIndex’s embedding model fine-tuning guide to use Amazon Bedrock to generate synthetic data for fine-tuning. We use the sample data and evaluation procedures outlined in this guide.

To generate synthetic data, we use the Meta Llama3-70B-Instruct model on Amazon Bedrock, which offers great price performance. The process involves the following steps:

  1. Download the training and validation data, which consists of PDFs from Uber and Lyft 10K documents. These PDFs will serve as the source for generating document chunks.
  2. Parse the PDFs into plain text chunks using LlamaIndex functionality. The Lyft corpus will be used as the training dataset, and the Uber corpus will be used as the evaluation dataset.
  3. Clean the parsed data by removing samples that are too short or contain special characters that could cause errors during training.
  4. Set up the large language model (LLM) Meta Llama3-70B-Instruct and define a prompt template for generating questions based on the context provided by the document chunks.
  5. Use the LLM to generate synthetic question-answer pairs for each document chunk. The document chunks serve as the context, and the generated questions are designed to be answerable using the information within the corresponding chunk.
  6. Save the generated synthetic data in JSONL format, where each line is a dictionary containing the query (generated question), positive passages (the document chunk used as context), and negative passages (if available). This format is compatible with the FlagEmbedding library, which will be used for fine-tuning the BGE model.

By generating synthetic question-answer pairs using the Meta Llama3-70B-Instruct model and the document chunks from the Uber and Lyft datasets, you create a high-quality dataset that can be used to fine-tune the BGE embedding model for improved performance in retrieval tasks.
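
The following is a minimal sketch, under assumed prompt wording and model ID, of generating one synthetic question per chunk with Meta Llama3-70B-Instruct on Amazon Bedrock and writing FlagEmbedding-style JSONL records. The neg field is left empty here and is typically filled later by hard negative mining.

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

chunks = ["Lyft's revenue increased year over year, driven by ..."]  # parsed 10-K text chunks (illustrative)

with open("train.jsonl", "w") as f:
    for chunk in chunks:
        prompt = (
            "Context:\n" + chunk + "\n\n"
            "Write one question that can be answered using only the context above. "
            "Return only the question."
        )
        resp = bedrock.converse(
            modelId="meta.llama3-70b-instruct-v1:0",  # illustrative model ID
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": 128, "temperature": 0.7},
        )
        question = resp["output"]["message"]["content"][0]["text"].strip()
        # FlagEmbedding-style record: a query, its positive passages, and (later) hard negatives
        f.write(json.dumps({"query": question, "pos": [chunk], "neg": []}) + "\n")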

Fine-tune the BGE embedding model

For fine-tuning, you can use the bge-base-en-v1.5 model, which offers a good balance between performance and resource requirements. You define retrieval instructions for the query to enhance the model’s performance during fine-tuning and inference.

Before fine-tuning, generate hard negatives using a predefined script available from the FlagEmbedding library. Hard negative mining is an essential step that helps improve the model’s ability to distinguish between similar but not identical text pairs. By including hard negatives in the training data, you encourage the model to learn more discriminative embeddings.

You then initiate the fine-tuning process using the FlagEmbedding library, which trains the model with InfoNCE contrastive loss. The library provides a convenient way to fine-tune the BGE model using the synthetic data you generated earlier. During fine-tuning, the model learns to produce embeddings that bring similar query-document pairs closer together in the embedding space while pushing dissimilar pairs further apart.
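
The post fine-tunes with the FlagEmbedding library; as an illustrative alternative that trains under the same InfoNCE-style, in-batch-negatives objective, here is a minimal sentence-transformers sketch. The file name, batch size, and other hyperparameters are assumptions.

import json
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Build (query, positive passage) pairs from the synthetic JSONL created earlier
examples = []
with open("train.jsonl") as f:
    for line in f:
        record = json.loads(line)
        examples.append(InputExample(texts=[record["query"], record["pos"][0]]))

loader = DataLoader(examples, shuffle=True, batch_size=16)
# In-batch negatives give an InfoNCE-style contrastive objective
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100,
          output_path="bge-base-finetuned")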

Merge the model weights

After fine-tuning, you can use the LM-Cocktail library to merge the fine-tuned weights with the original weights of the BGE model. LM-Cocktail creates new model parameters by calculating a weighted average of the parameters from two or more models. This process helps mitigate the problem of catastrophic forgetting, where the model might lose its previously learned knowledge during fine-tuning.

By merging the fine-tuned weights with the original weights, you obtain a model that benefits from the specialized knowledge acquired during fine-tuning while retaining the general language understanding capabilities of the original model. This approach often leads to improved performance compared to using either the fine-tuned or the original model alone.
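
The post merges weights with LM-Cocktail; the idea it implements is a weighted average of model parameters, shown here as a conceptual PyTorch sketch rather than the library’s API. The merge weight and paths are assumptions.

import torch
from sentence_transformers import SentenceTransformer

base = SentenceTransformer("BAAI/bge-base-en-v1.5")
tuned = SentenceTransformer("bge-base-finetuned")  # output path from the fine-tuning step (assumption)

alpha = 0.5  # contribution of the fine-tuned parameters; tune on validation data
base_sd, tuned_sd, merged_sd = base.state_dict(), tuned.state_dict(), {}
for name, tuned_param in tuned_sd.items():
    if torch.is_floating_point(tuned_param):
        merged_sd[name] = alpha * tuned_param + (1 - alpha) * base_sd[name]
    else:
        merged_sd[name] = tuned_param  # keep integer buffers (for example, position IDs) unchanged

tuned.load_state_dict(merged_sd)
tuned.save("bge-base-merged")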

Test the model locally

Before you evaluate the fine-tuned BGE model on the validation set, it’s a good idea to perform a quick local test to make sure the model behaves as expected. You can do this by comparing the cosine similarity scores for pairs of queries and documents that you expect to have high similarity and those that you expect to have low similarity.

To test the model, prepare two small sets of document-query pairs:

  • Similar document-query pairs – These are pairs where the document and query are closely related and should have a high cosine similarity score
  • Different document-query pairs – These are pairs where the document and query are not closely related and should have a lower cosine similarity score

Then use the fine-tuned BGE model to generate embeddings for each document and query in both sets of pairs. By calculating the cosine similarity between the document and query embeddings for each pair, you can assess how well the model captures the semantic similarity between them.

When comparing the cosine similarity scores, we expect to see higher scores for the similar document-query pairs compared to the different document-query pairs. This would indicate that the fine-tuned model is able to effectively distinguish between similar and dissimilar pairs, assigning higher similarity scores to the pairs that are more closely related.

If the local testing results align with your expectations, it provides a quick confirmation that the fine-tuned model is performing as intended. You can then move on to a more comprehensive evaluation of the model’s performance using the validation set.

However, if the local testing results are not satisfactory, it may be necessary to investigate further and identify potential issues with the fine-tuning process or the model architecture before proceeding to the evaluation step.

This local testing step serves as a quick sanity check to make sure the fine-tuned model is behaving reasonably before investing time and resources in a full evaluation on the validation set. It can help catch obvious issues early on and provide confidence in the model’s performance before moving forward with more extensive testing.

Evaluate the model

We evaluate the performance of the fine-tuned BGE model using two procedures:

  • Hit rate – This straightforward metric assesses the model’s performance by checking if the retrieved results for a given query include the relevant document. You calculate the hit rate by taking each query-document pair from the validation set, retrieving the top-K documents using the fine-tuned model, and verifying if the relevant document is present in the retrieved results.
  • InformationRetrievalEvaluator – This procedure, provided by the sentence-transformers library, offers a more comprehensive suite of metrics for detailed performance analysis. It evaluates the model on various information retrieval tasks and provides metrics such as Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), and more. However, InformationRetrievalEvaluator is only compatible with models loaded through the sentence-transformers library.

To get a better understanding of the fine-tuned model’s performance, you can compare it against the base (non-fine-tuned) BGE model and the Amazon Titan Text Embeddings V2 model on Amazon Bedrock. This comparison helps you assess the effectiveness of the fine-tuning process and determine if the fine-tuned model outperforms the baseline models.

By evaluating the model using both the hit rate and InformationRetrievalEvaluator (when applicable), you gain insights into its performance on different aspects of retrieval tasks and can make informed decisions about its suitability for your specific use case.
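
The following is a simplified hit rate sketch, assuming a small in-memory corpus and a validation set of query-document ID pairs; in practice the corpus and validation pairs come from the dataset you generated earlier.

import numpy as np
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("./bge-merged")
top_k = 5

# Illustrative corpus and validation pairs (query, ID of the relevant document)
corpus = {"doc-1": "Text of the first document...", "doc-2": "Text of the second document..."}
validation = [("example query that should match the first document", "doc-1")]

doc_ids = list(corpus.keys())
doc_embeddings = model.encode([corpus[d] for d in doc_ids], normalize_embeddings=True)

hits = 0
for query, relevant_id in validation:
    q_emb = model.encode(query, normalize_embeddings=True)
    scores = util.cos_sim(q_emb, doc_embeddings)[0].numpy()
    retrieved = [doc_ids[i] for i in np.argsort(-scores)[:top_k]]
    hits += int(relevant_id in retrieved)

print(f"Hit rate @ {top_k}: {hits / len(validation):.3f}")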

Deploy the model

To deploy the fine-tuned BGE model, you can use the Hugging Face Text Embeddings Inference (TEI) container on SageMaker. TEI is a high-performance toolkit for deploying and serving popular text embeddings and sequence classification models, including support for FlagEmbedding models. It provides a fast and efficient serving framework for your fine-tuned model on SageMaker.

The deployment process involves the following steps:

  1. Upload the fine-tuned model to the Hugging Face Hub or Amazon Simple Storage Service (Amazon S3).
  2. Retrieve the new Hugging Face Embedding Container image URI.
  3. Deploy the model to SageMaker.
  4. Optionally, set up auto scaling for the endpoint to automatically adjust the number of instances based on the incoming request traffic. Auto scaling helps make sure the endpoint can handle varying workloads efficiently.

By deploying the fine-tuned BGE model using TEI on SageMaker, you can integrate it into your applications and use it for efficient text embedding and retrieval tasks. The deployment process outlined in this post provides a scalable and manageable solution for serving the model in production environments.
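
The following is a hedged sketch of the deployment, assuming the merged model was pushed to the Hugging Face Hub under an illustrative repository name; you can instead point HF_MODEL_ID at a model stored in Amazon S3. The TEI backend name and versions accepted by get_huggingface_llm_image_uri depend on your SageMaker SDK version.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Retrieve the Hugging Face Text Embeddings Inference (TEI) container image
image_uri = get_huggingface_llm_image_uri("huggingface-tei")

# Create and deploy the model to a real-time endpoint
tei_model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={"HF_MODEL_ID": "my-org/bge-merged"},  # illustrative Hub repository
)
tei_endpoint = tei_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    endpoint_name="bge-finetuned-tei",
)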

Test the deployed model

After you deploy the fine-tuned BGE model using TEI on SageMaker, you can test the model by sending requests to the SageMaker endpoint and evaluating the model’s responses.

When testing the deployed model, you can optionally add instructions to the queries. If the model was fine-tuned with instructions for queries or passages, it’s important to match those instructions at inference time. In this case, you used instructions for queries but not for passages, so follow the same approach during testing.

To test the deployed model, you send queries to the SageMaker endpoint using the tei_endpoint.predict() method provided by the SageMaker SDK. You prepare a batch of queries, optionally prepending any instructions used during fine-tuning, and pass them to the predict() method. The model generates embeddings for each query, which are returned in the response.

By examining the generated embeddings, you can assess the quality and relevance of the model’s output. You can compare the embeddings of similar queries and verify that they have high cosine similarity scores, indicating that the model accurately captures the semantic meaning of the queries.

Additionally, you can measure the average response time of the deployed model to evaluate its performance and make sure it adheres to the required latency constraints for your application.
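
The following sketch sends a small batch of queries through the tei_endpoint predictor from the previous step; the instruction prefix mirrors the one used for queries during fine-tuning, and the queries themselves are illustrative.

import numpy as np

instruction = "Represent this sentence for searching relevant passages: "
queries = [
    "How do I rotate my API keys?",
    "How can I replace my API credentials?",
]

response = tei_endpoint.predict({"inputs": [instruction + q for q in queries]})
embeddings = np.array(response)

# Two semantically similar queries should yield a high cosine similarity
a, b = embeddings[0], embeddings[1]
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Cosine similarity between the two queries: {cosine:.3f}")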

Integrate the model with LangChain

Additionally, you can integrate the deployed BGE model with LangChain, a library for building applications with language models. To do this, you create a custom content handler that inherits from LangChain’s EmbeddingsContentHandler. This handler implements methods to convert input data into a format compatible with the SageMaker endpoint and converts the endpoint’s output into embeddings.

You then create a SagemakerEndpointEmbeddings instance, specifying the endpoint name, SageMaker runtime client, and custom content handler. This instance wraps the deployed BGE model and integrates it with LangChain workflows.

Using the embed_documents method of the SagemakerEndpointEmbeddings instance, you generate embeddings for documents or queries, which can be used for downstream tasks like similarity search, clustering, or classification.
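
The following is a minimal sketch of that integration, assuming the endpoint name from the deployment step; the content handler formats requests in the shape TEI expects, and the classes come from the langchain-community package.

import json
import boto3
from langchain_community.embeddings import SagemakerEndpointEmbeddings
from langchain_community.embeddings.sagemaker_endpoint import EmbeddingsContentHandler


class BGEContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs: list, model_kwargs: dict) -> bytes:
        # TEI expects a JSON body with an "inputs" field
        return json.dumps({"inputs": inputs, **model_kwargs}).encode("utf-8")

    def transform_output(self, output: bytes) -> list:
        # TEI returns a list of embedding vectors
        return json.loads(output.read().decode("utf-8"))


embeddings = SagemakerEndpointEmbeddings(
    endpoint_name="bge-finetuned-tei",
    client=boto3.client("sagemaker-runtime"),
    content_handler=BGEContentHandler(),
)

vectors = embeddings.embed_documents(["First document", "Second document"])
query_vector = embeddings.embed_query("How do I rotate my API keys?")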

Integrating the deployed BGE model with LangChain allows you to take advantage of LangChain’s features and abstractions to build sophisticated language model applications that utilize the fine-tuned BGE embeddings. Testing the integration makes sure the model performs as expected and can be seamlessly incorporated into real-world workflows and applications.

Clean up

After you’re finished with the deployed endpoint, don’t forget to delete it to prevent unexpected SageMaker costs.
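
For example, if you deployed through the SageMaker Python SDK as sketched earlier, you can remove the resources with the predictor object:

# Remove the model and the endpoint (with its configuration) to stop incurring charges
tei_endpoint.delete_model()
tei_endpoint.delete_endpoint()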

Conclusion

In this post, we walked through the process of fine-tuning a BGE embedding model using synthetic data generated from Amazon Bedrock. We covered key steps, including generating high-quality synthetic data, fine-tuning the model, evaluating performance, and deploying the optimized model using Amazon SageMaker.

By using synthetic data and advanced fine-tuning techniques like hard negative mining and model merging, you can significantly enhance the performance of embedding models for your specific use cases. This approach is especially valuable when real-world data is limited or difficult to obtain.

To get started, we encourage you to experiment with the code and techniques demonstrated in this post. Adapt them to your own datasets and models to unlock performance improvements in your applications. You can find all the code used in this post in our GitHub repository.

About the Authors

Austin Welch is a Senior Applied Scientist at Amazon Web Services Generative AI Innovation Center.

Bryan Yost is a Principal Deep Learning Architect at Amazon Web Services Generative AI Innovation Center.

Mehdi Noori is a Senior Applied Scientist at Amazon Web Services Generative AI Innovation Center.

Read More

Boost post-call analytics with Amazon Q in QuickSight

Boost post-call analytics with Amazon Q in QuickSight

In today’s customer-centric business world, providing exceptional customer service is crucial for success. Contact centers play a vital role in shaping customer experiences, and analyzing post-call interactions can provide valuable insights to improve agent performance, identify areas for improvement, and enhance overall customer satisfaction.

Amazon Web Services (AWS) has AI and generative AI solutions that you can integrate into your existing contact centers to improve post-call analysis.

Post Call Analytics (PCA) is a solution that does most of the heavy lifting associated with providing an end-to-end solution that can process call recordings from your existing contact center. PCA provides actionable insights to spot emerging trends, identify agent coaching opportunities, and assess the general sentiment of calls.

Complementing PCA, Live call analytics with agent assist (LCA) provides AI and generative AI capabilities for real-time analysis while calls are in progress.

In this post, we show you how to unlock powerful post-call analytics and visualizations, empowering your organization to make data-driven decisions and drive continuous improvement.

Enrich and boost your post-call recording files with Amazon Q and Amazon QuickSight

Amazon QuickSight is a unified business intelligence (BI) service that provides modern interactive dashboards, natural language querying, paginated reports, machine learning (ML) insights, and embedded analytics at scale.

Amazon Q is a powerful, new capability in Amazon QuickSight that you can use to ask questions about your data using natural language and share presentation-ready data stories to communicate insights to others.

These capabilities can significantly enhance your post-call analytics workflow, making it easier to derive insights from your contact center data.

To get started using Amazon Q in QuickSight, you first need QuickSight Enterprise Edition, which you can sign up for by following this process.

Amazon Q in QuickSight provides users a suite of new generative BI capabilities.

Depending on their role, users have access to different sets of capabilities. For instance, a Reader Pro user can create data stories and executive summaries, while an Author Pro user can also create topics and build dashboards using natural language. The following figure shows the available roles and their capabilities.

The following are some key ways that Amazon Q in QuickSight can boost your post-call analytics productivity.

  • Quick insights: Instead of spending time building complex dashboards and visualizations, you can enable users to quickly get answers to their questions about call volumes, agent performance, customer sentiment, and more. Amazon Q in QuickSight understands the context of your data and generates relevant visualizations on the fly.
  • One-time analysis: With Amazon Q in QuickSight, you can perform one-time analysis on your post-call data without any prior setup. Ask your questions using natural language, and QuickSight will provide the relevant insights, allowing you to explore your data in new ways and uncover hidden patterns.
  • Natural language interface: Amazon Q in QuickSight has a natural language interface that makes it accessible to non-technical users. Business analysts, managers, and executives can ask questions about post-call data without needing to learn complex querying languages or data visualization tools.
  • Contextual recommendations: Amazon Q in QuickSight can provide contextual recommendations based on your questions and the data available. For example, if you ask about customer sentiment, it might suggest analyzing sentiment by agent, call duration, or other relevant dimensions.
  • Automated dashboards: Amazon Q can help accelerate dashboard development based on your questions, saving you the effort of manually building and maintaining dashboards for post-call analytics.

By using Amazon Q in QuickSight, your organization can streamline post-call analytics, enabling faster insights, better decision-making, and improved customer experiences. With its natural language interface and automated visualizations, Amazon Q empowers users at all levels to explore and understand post-call data more efficiently.

Let’s dive into a couple of the capabilities available to Pro users, such as building executive summaries and data stories for post-call analytics.

Executive summaries

When a user is just starting to explore a new dashboard that has been shared with them, it often takes time to familiarize themselves with what is contained in the dashboard and where they should be looking for key insights. Executive summaries are a great way to use AI to highlight key insights and draw the user’s attention to specific visuals that contain metrics worth looking into further.

You can build an executive summary on any dashboard that you have access to, such as the dashboard shown in the following figure.

As shown in the following figure, you can change to another sheet, or even apply filters and regenerate the summary to get a fresh set of highlights for the filtered set of data.

The key benefits of using executive summaries include:

  • Automated insights: Amazon Q can automatically surface key insights and trends from your post-call data, making it possible to quickly create executive summaries that highlight the most important information.
  • Customized views: Executives can customize the visualizations and summaries generated by Amazon Q to align with their specific requirements and preferences, ensuring that the executive summaries are tailored to their needs.

Data storytelling

After a user has found an interesting trend or insight within a dashboard, they often need to communicate with others to drive a decision on what to do next. That decision might be made in a meeting or offline, but a presentation with key metrics and a structured narrative is often the basis for presenting the argument. This is exactly what data stories are designed to support. Rather than taking screenshots and pasting into a document or email, at which point you lose all governance and the data becomes static, stories in QuickSight are interactive, governed, and can be updated in a click.

To build a story, you always start from a dashboard. You then select visuals to support your story and input a prompt of what you want the story to be about. In the example, we want to generate a story to get insights and recommendations to improve call center operations (shown in the following figure).

As the following figure shows, after a few moments, you will see a fully structured story including visuals and insights, including recommendations for next steps.

Key benefits of using data stories:

  1. Narrative exploration: With Amazon Q, you can explore your post-call data through a narrative approach, asking follow-up questions based on the insights generated. This allows you to build a compelling data story that uncovers the underlying patterns and trends in your contact center operations.
  2. Contextual recommendations: Amazon Q can provide contextual recommendations for additional visualizations or analyses based on your questions and the data available. These recommendations can help you uncover new perspectives and enrich your data storytelling.
  3. Automated narratives: Amazon Q can generate automated narratives that explain the visualizations and insights, making it easier to communicate the data story to stakeholders who might not be familiar with the technical details.
  4. Interactive presentations: By integrating Amazon Q with QuickSight presentation mode, you can create interactive data storytelling experiences. Executives and stakeholders can ask questions during the presentation, and Amazon Q will generate visualizations and insights in real time, enabling a more engaging and dynamic data storytelling experience.

Conclusion

By using the capabilities of Amazon Q in QuickSight, you can uncover valuable insights from your call recordings and post-call analytics data. These insights can then inform data-driven decisions to improve customer experiences, optimize contact center operations, and drive overall business performance.

In the era of customer-centricity, post-call analytics has become a game-changer for contact center operations. By using the power of Amazon Q and Amazon QuickSight on top of your PCA data, you can unlock a wealth of insights, optimize agent performance, and deliver exceptional customer experiences. Embrace the future of customer service with cutting-edge AI and analytics solutions from AWS, and stay ahead of the competition in today’s customer-centric landscape.


About the Author

Daniel Martinez is a Solutions Architect in Iberia Enterprise, part of the worldwide commercial sales organization (WWCS) at AWS.

Read More

Create a next generation chat assistant with Amazon Bedrock, Amazon Connect, Amazon Lex, LangChain, and WhatsApp

Create a next generation chat assistant with Amazon Bedrock, Amazon Connect, Amazon Lex, LangChain, and WhatsApp

This post is co-written with Harrison Chase, Erick Friis and Linda Ye from LangChain.

Generative AI is set to revolutionize user experiences over the next few years. A crucial step in that journey involves bringing in AI assistants that intelligently use tools to help customers navigate the digital landscape. In this post, we demonstrate how to deploy a contextual AI assistant. Built using Amazon Bedrock Knowledge Bases, Amazon Lex, and Amazon Connect, with WhatsApp as the channel, our solution provides users with a familiar and convenient interface.

Amazon Bedrock Knowledge Bases gives foundation models (FMs) and agents contextual information from your company’s private data sources for Retrieval Augmented Generation (RAG) to deliver more relevant, accurate, and customized responses. It also offers a powerful solution for organizations seeking to enhance their generative AI–powered applications. This feature simplifies the integration of domain-specific knowledge into conversational AI through native compatibility with Amazon Lex and Amazon Connect. By automating document ingestion, chunking, and embedding, it eliminates the need to manually set up complex vector databases or custom retrieval systems, significantly reducing development complexity and time.

The result is improved accuracy in FM responses, with reduced hallucinations due to grounding in verified data. Cost efficiency is achieved through minimized development resources and lower operational costs compared to maintaining custom knowledge management systems. The solution’s scalability quickly accommodates growing data volumes and user queries thanks to AWS serverless offerings. It also uses the robust security infrastructure of AWS to maintain data privacy and regulatory compliance. With the ability to continuously update and add to the knowledge base, AI applications stay current with the latest information. By choosing Amazon Bedrock Knowledge Bases, organizations can focus on creating value-added AI applications while AWS handles the intricacies of knowledge management and retrieval, enabling faster deployment of more accurate and capable AI solutions with less effort.
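
For example, once a knowledge base has been populated, a single Retrieve API call returns the most relevant chunks for a query. The following is a minimal sketch using boto3; the knowledge base ID and query are placeholders.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KB1234567890",  # placeholder knowledge base ID
    retrievalQuery={"text": "What is the return policy for damaged items?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
)

for result in response["retrievalResults"]:
    print(f'{result["score"]:.3f} {result["content"]["text"][:120]}')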

Prerequisites

To implement this solution, you need the following:

Solution overview

This solution uses several key AWS AI services to build and deploy the AI assistant:

  • Amazon Bedrock – Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI
  • Amazon Bedrock Knowledge Bases – Gives the AI assistant contextual information from a company’s private data sources
  • Amazon OpenSearch Service – Works as the vector store that is natively supported by Amazon Bedrock Knowledge Bases
  • Amazon Lex – Enables building the conversational interface for the AI assistant, including defining intents and slots
  • Amazon Connect – Powers the integration with WhatsApp to make the AI assistant available to users on the popular messaging application
  • AWS Lambda – Runs the code to integrate the services and implement the LangChain agent that forms the core logic of the AI assistant
  • Amazon API Gateway – Receives the incoming requests triggered from WhatsApp and routes the request to AWS Lambda for further processing
  • Amazon DynamoDB – Stores the messages received and generated to enable conversation memory
  • Amazon SNS – Handles the routing of the outgoing response from Amazon Connect
  • LangChain – Provides a powerful abstraction layer for building the LangChain agent that helps your FMs perform context-aware reasoning
  • LangSmith – Uploads agent traces to LangSmith for added observability, including debugging, monitoring, and testing and evaluation capabilities

The following diagram illustrates the architecture.

Solution Architecture

Flow description

Numbers in red on the right side of the diagram illustrate the data ingestion process:

  1. Upload files to Amazon Simple Storage Service (Amazon S3) Data Source
  2. New files trigger Lambda Function
  3. Lambda Function invokes sync operation of the knowledge base data source
  4. Amazon Bedrock Knowledge Bases fetches the data from Amazon S3, chunks it, and generates the embeddings through the FM of your selection
  5. Amazon Bedrock Knowledge Bases stores the embeddings in Amazon OpenSearch Service

Numbers on the left side of the diagram illustrate the messaging process:

  1. User initiates communication by sending a message through WhatsApp to the webhook hosted on Amazon API Gateway.
  2. Amazon API Gateway routes the incoming message to the inbound message handler, executed on AWS Lambda.
  3. The inbound message handler records the user’s contact details in Amazon DynamoDB.
  4. For first-time users, the inbound message handler establishes a new session in Amazon Connect and logs it in DynamoDB. For returning users, it resumes their existing Amazon Connect session.
  5. Amazon Connect forwards the user’s message to Amazon Lex for natural language processing.
  6. Amazon Lex triggers the LangChain AI assistant, implemented as a Lambda function (a simplified sketch of the agent core follows this list).
  7. The LangChain AI assistant retrieves the conversation history from DynamoDB.
  8. Using Amazon Bedrock Knowledge Bases, the LangChain AI assistant fetches relevant contextual information.
  9. The LangChain AI assistant compiles a prompt, incorporating context data and the user’s query, and submits it to an FM running on Amazon Bedrock.
  10. Amazon Bedrock processes the input and returns the model’s response to the LangChain AI assistant.
  11. The LangChain AI assistant relays the model’s response back to Amazon Lex.
  12. Amazon Lex transmits the model’s response to Amazon Connect.
  13. Amazon Connect publishes the model’s response to Amazon Simple Notification Service (Amazon SNS).
  14. Amazon SNS triggers the outbound message handler Lambda function.
  15. The outbound message handler retrieves the relevant chat contact information from Amazon DynamoDB.
  16. The outbound message handler dispatches the response to the user through Meta’s WhatsApp API.
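
The following is a simplified sketch of the agent core described in steps 6–10, using the langchain-aws package; the knowledge base ID, model ID, and prompt are illustrative, and the Amazon Lex and DynamoDB plumbing is omitted.

from langchain_aws import ChatBedrock
from langchain_aws.retrievers import AmazonKnowledgeBasesRetriever

retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id="KB1234567890",  # placeholder knowledge base ID
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 4}},
)
llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0")

def answer(question: str) -> str:
    # Fetch relevant chunks from the knowledge base and ground the response in them
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt).content

print(answer("What is the return policy for damaged items?"))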

Deploying this AI assistant involves three main steps:

  1. Create the knowledge base using Amazon Bedrock Knowledge Bases and ingest relevant product documentation, FAQs, knowledge articles, and other useful data that the AI assistant can use to answer user questions. The data should cover the key use cases and topics the AI assistant will support.
  2. Create a LangChain agent that powers the AI assistant’s logic. The agent is implemented in a Lambda function and uses the knowledge base as its primary tool to look up information. Deploying the agent with other resources is automated through the provided AWS CloudFormation template. See the list of resources in the next section.
  3. Create the Amazon Connect instance and configure the WhatsApp integration. This allows users to chat with the AI assistant using WhatsApp, providing a familiar interface and enabling rich interactions such as images and buttons. WhatsApp’s popularity improves the accessibility of the AI assistant.

Solution deployment

We’ve provided pre-built AWS CloudFormation templates that deploy everything you need in your AWS account.

  1. Sign in to the AWS console if you aren’t already.
  2. Choose the following Launch Stack button to open the CloudFormation console and create a new stack.
  3. Enter the following parameters:
    • StackName: Name your Stack, for example, WhatsAppAIStack
    • LangchainAPIKey: The API key generated through LangChain
The stack is currently provided for the N. Virginia (us-east-1) Region through the Launch Stack button; a YML template URL is provided to upgrade an existing stack to a new release, and an AWS CDK stack is available on GitHub to customize as needed.
  4. Check the box to acknowledge that you are creating AWS Identity and Access Management (IAM) resources and choose Create Stack.
  5. Wait for the stack creation to be complete in approximately 10 minutes, which will create the following:
  6. Upload files to the data source (Amazon S3) created for WhatsApp. As soon as you upload a file, the data source will synchronize automatically.
  7. To test the agent, on the Amazon Lex console, select the most recently created assistant. Choose English, choose Test, and send it a message.

Create the Amazon Connect instance and integrate WhatsApp

Configure Amazon Connect to integrate with your WhatsApp business account and enable the WhatsApp channel for the AI assistant:

  1. Navigate to Amazon Connect in the AWS console. If you haven’t already, create an instance. Copy your Instance ARN under Distribution settings. You will need this information later to link your WhatsApp business account.
  2. Choose your instance, then in the navigation panel, choose Flows. Scroll down and select Amazon Lex. Select your bot and choose Add Amazon Lex Bot.
  3. In the navigation panel, choose Overview. Under Access Information, choose Log in for emergency access.
  4. On the Amazon Connect console, under Routing in the navigation panel, choose Flows. Choose Create flow. Drag a Get customer input block onto the flow. Select the block. Select Text-to-speech or chat text and add an intro message such as, “Hello, how can I help you today?” Scroll down and choose Amazon Lex, then select the Amazon Lex bot you created in step 2.
  5. After you save the block, add another block called “Disconnect.” Drag the Entry arrow to the Get customer input and the Get customer input arrow to Disconnect. Choose Publish.
  6. After it’s published, choose Show additional flow information at the bottom of the navigation panel. Copy the flow’s Amazon Resource Name (ARN), which you will need to deploy the WhatsApp integration. The following screenshot shows the Amazon Connect console with the flow.

Connect Flow Diagram

  7. Deploy the WhatsApp integration as detailed in Provide WhatsApp messaging as a channel with Amazon Connect.

Testing the solution

Interact with the AI assistant through WhatsApp, as shown in the following video:

Clean up

To avoid incurring ongoing costs, delete the resources after you are done:

  1. Delete the CloudFormation stacks.
  2. Delete the Amazon Connect instance.

Conclusion

This post showed you how to create an intelligent conversational AI assistant by integrating Amazon Bedrock, Amazon Lex, and Amazon Connect and deploying it on WhatsApp.

The solution ingests relevant data into a knowledge base on Amazon Bedrock Knowledge Bases, implements a LangChain agent that uses the knowledge base to answer questions, and makes the agent available to users through WhatsApp. This provides an accessible, intelligent AI assistant that can guide users through your company’s products and services.

Possible next steps include customizing the AI assistant for your specific use case, expanding the knowledge base, and analyzing conversation logs using LangSmith to identify issues, improve errors, and break down performance bottlenecks in your FM call sequence.


About the Authors

Kenton Blacutt is an AI Consultant within the GenAI Innovation Center. He works hands-on with customers helping them solve real-world business problems with cutting edge AWS technologies, especially Amazon Q and Bedrock. In his free time, he likes to travel, experiment with new AI techniques, and run an occasional marathon.

Lifeth Álvarez is a Cloud Application Architect at Amazon. She enjoys working closely with others, embracing teamwork and autonomous learning. She likes to develop creative and innovative solutions, applying special emphasis on details. She enjoys spending time with family and friends, reading, playing volleyball, and teaching others.

Mani Khanuja is a Tech Lead – Generative AI Specialist, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for Women in Manufacturing Education Foundation Board. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Linda Ye leads product marketing at LangChain. Previously, she worked at Sentry, Splunk, and Harness, driving product and business value for technical audiences, and studied economics at Sanford. In her free time, Linda enjoys writing half-baked novels, playing tennis, and reading.

Erick Friis, Founding Engineer at LangChain, currently spends most of his time on the open source side of the company. He’s an ex-founder with a passion for language-based applications. He spends his free time outdoors on skis or training for triathlons.

Harrison Chase is the CEO and cofounder of LangChain, an open source framework and toolkit that helps developers build context-aware reasoning applications. Prior to starting LangChain, he led the ML team at Robus Intelligence, led the entity linking team at Kensho, and studied statistics and computer science at Harvard.

Read More

Generative AI foundation model training on Amazon SageMaker

Generative AI foundation model training on Amazon SageMaker

To stay competitive, businesses across industries use foundation models (FMs) to transform their applications. Although FMs offer impressive out-of-the-box capabilities, achieving a true competitive edge often requires deep model customization through pre-training or fine-tuning. However, these approaches demand advanced AI expertise, high-performance compute, and fast storage access, and can be prohibitively expensive for many organizations.

In this post, we explore how organizations can address these challenges and cost-effectively customize and adapt FMs using AWS managed services such as Amazon SageMaker training jobs and Amazon SageMaker HyperPod. We discuss how these powerful tools enable organizations to optimize compute resources and reduce the complexity of model training and fine-tuning. We explore how you can make an informed decision about which Amazon SageMaker service is most applicable to your business needs and requirements.

Business challenge

Businesses today face numerous challenges in effectively implementing and managing machine learning (ML) initiatives. These challenges include scaling operations to handle rapidly growing data and models, accelerating the development of ML solutions, and managing complex infrastructure without diverting focus from core business objectives. Additionally, organizations must navigate cost optimization, maintain data security and compliance, and democratize ease of use and access to machine learning tools across teams.

Customers have built their own ML architectures on bare metal machines using open source solutions such as Kubernetes, Slurm, and others. Although this approach provides control over the infrastructure, the amount of effort needed to manage and maintain the underlying infrastructure (for example, hardware failures) over time can be substantial. Organizations often underestimate the complexity involved in integrating these various components, maintaining security and compliance, and keeping the system up-to-date and optimized for performance.

As a result, many companies struggle to use the full potential of ML while maintaining efficiency and innovation in a competitive landscape.

How Amazon SageMaker can help

Amazon SageMaker addresses these challenges by providing a fully managed service that streamlines and accelerates the entire ML lifecycle. You can use the comprehensive set of SageMaker tools for building and training your models at scale while offloading the management and maintenance of underlying infrastructure to SageMaker.

You can use SageMaker to scale your training cluster to thousands of accelerators, with your own choice of compute and optimize your workloads for performance with SageMaker distributed training libraries. For cluster resiliency, SageMaker offers self-healing capabilities that automatically detect and recover from faults, allowing for continuous FM training for months with little to no interruption and reducing training time by up to 40%. SageMaker also supports popular ML frameworks such as TensorFlow and PyTorch through managed pre-built containers. For those who need more customization, SageMaker also allows users to bring in their own libraries or containers.

To address various business and technical use cases, Amazon SageMaker offers two options for distributed pre-training and fine-tuning: SageMaker training jobs and SageMaker HyperPod.

SageMaker training jobs

SageMaker training jobs offer a managed user experience for large, distributed FM training, removing the undifferentiated heavy lifting around infrastructure management and cluster resiliency while offering a pay-as-you-go option. SageMaker training jobs automatically spin up a resilient distributed training cluster, provide managed orchestration, monitor the infrastructure, and automatically recover from faults for a smooth training experience. After the training is complete, SageMaker spins down the cluster and the customer is billed for the net training time in seconds. FM builders can further optimize this experience by using SageMaker Managed Warm Pools, which allows you to retain and reuse provisioned infrastructure after the completion of a training job for reduced latency and faster iteration time between different ML experiments.

With SageMaker training jobs, FM builders have the flexibility to choose the right instance type to best fit an individual workload and further optimize their training budget. For example, you can pre-train a large language model (LLM) on a P5 cluster or fine-tune an open source LLM on p4d instances. This allows businesses to offer a consistent training user experience across ML teams with varying levels of technical expertise and different workload types.
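
The following is a hedged sketch of launching such a job with the SageMaker Python SDK; the training script, instance choices, and S3 paths are illustrative, and keep_alive_period_in_seconds enables SageMaker Managed Warm Pools.

import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",             # your training or fine-tuning script
    source_dir="src",                   # local directory containing the code
    role=sagemaker.get_execution_role(),
    instance_type="ml.p4d.24xlarge",
    instance_count=2,
    framework_version="2.2",
    py_version="py310",
    distribution={"torch_distributed": {"enabled": True}},
    keep_alive_period_in_seconds=1800,  # retain the warm pool between experiments
)

estimator.fit({"train": "s3://my-bucket/datasets/train/"})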

Additionally, Amazon SageMaker training jobs integrate tools such as SageMaker Profiler for training job profiling, Amazon SageMaker with MLflow for managing ML experiments, Amazon CloudWatch for monitoring and alerts, and TensorBoard for debugging and analyzing training jobs. Together, these tools enhance model development by offering performance insights, tracking experiments, and facilitating proactive management of training processes.

AI21 Labs, Technology Innovation Institute, Upstage, and Bria AI chose SageMaker training jobs to train and fine-tune their FMs with a reduced total cost of ownership by offloading the workload orchestration and management of underlying compute to SageMaker. They delivered faster results by focusing their resources on model development and experimentation while SageMaker handled the provisioning, creation, and termination of their compute clusters.

The following demo provides a high-level, step-by-step guide to using Amazon SageMaker training jobs.

SageMaker HyperPod

SageMaker HyperPod offers persistent clusters with deep infrastructure control, which builders can use to connect through Secure Shell (SSH) into Amazon Elastic Compute Cloud (Amazon EC2) instances for advanced model training, infrastructure management, and debugging. To maximize availability, HyperPod maintains a pool of dedicated and spare instances (at no additional cost to the customer), minimizing downtime for critical node replacements. Customers can use familiar orchestration tools such as Slurm or Amazon Elastic Kubernetes Service (Amazon EKS), and the libraries built on top of these tools for flexible job scheduling and compute sharing. Additionally, orchestrating SageMaker HyperPod clusters with Slurm allows NVIDIA’s Enroot and Pyxis integration to quickly schedule containers as performant unprivileged sandboxes. The operating system and software stack are based on the Deep Learning AMI, which are preconfigured with NVIDIA CUDA, NVIDIA cuDNN, and the latest versions of PyTorch and TensorFlow. HyperPod also includes SageMaker distributed training libraries, which are optimized for AWS infrastructure so users can automatically split training workloads across thousands of accelerators for efficient parallel training.

FM builders can use built-in ML tools in HyperPod to enhance model performance, such as using Amazon SageMaker with TensorBoard to visualize a model architecture and address convergence issues, while Amazon SageMaker Debugger captures real-time training metrics and profiles. Additionally, integrating with observability tools such as Amazon CloudWatch Container Insights, Amazon Managed Service for Prometheus, and Amazon Managed Grafana offers deeper insights into cluster performance, health, and utilization, saving valuable development time.

This self-healing, high-performance environment, trusted by customers like Articul8, IBM, Perplexity AI, Hugging Face, Luma, and Thomson Reuters, supports advanced ML workflows and internal optimizations.

The following demo provides a high-level, step-by-step guide to using Amazon SageMaker HyperPod.

Choosing the right option

For organizations that require granular control over training infrastructure and extensive customization options, SageMaker HyperPod is the ideal choice. HyperPod offers custom network configurations, flexible parallelism strategies, and support for custom orchestration techniques. It integrates seamlessly with tools such as Slurm, Amazon EKS, Nvidia’s Enroot, and Pyxis, and provides SSH access for in-depth debugging and custom configurations.

SageMaker training jobs are tailored for organizations that want to focus on model development rather than infrastructure management and prefer ease of use with a managed experience. SageMaker training jobs feature a user-friendly interface, simplified setup and scaling, automatic handling of distributed training tasks, built-in synchronization, checkpointing, fault tolerance, and abstraction of infrastructure complexities.

When choosing between SageMaker HyperPod and training jobs, organizations should align their decision with their specific training needs, workflow preferences, and desired level of control over the training infrastructure. HyperPod is the preferred option for those seeking deep technical control and extensive customization, and training jobs is ideal for organizations that prefer a streamlined, fully managed solution.

Conclusion

Learn more about Amazon SageMaker and large-scale distributed training on AWS by visiting Getting Started on Amazon SageMaker, watching the Generative AI on Amazon SageMaker Deep Dive Series, and exploring the awsome-distributed-training and amazon-sagemaker-examples GitHub repositories.


About the authors

Trevor Harvey is a Principal Specialist in Generative AI at Amazon Web Services and an AWS Certified Solutions Architect – Professional. Trevor works with customers to design and implement machine learning solutions and leads go-to-market strategies for generative AI services.

Kanwaljit Khurmi is a Principal Generative AI/ML Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.

Miron Perel is a Principal Machine Learning Business Development Manager with Amazon Web Services. Miron advises Generative AI companies building their next generation models.

Guillaume Mangeot is Senior WW GenAI Specialist Solutions Architect at Amazon Web Services with over one decade of experience in High Performance Computing (HPC). With a multidisciplinary background in applied mathematics, he leads highly scalable architecture design in cutting-edge fields such as GenAI, ML, HPC, and storage, across various verticals including oil & gas, research, life sciences, and insurance.

Read More

Automate fine-tuning of Llama 3.x models with the new visual designer for Amazon SageMaker Pipelines

Automate fine-tuning of Llama 3.x models with the new visual designer for Amazon SageMaker Pipelines

You can now create an end-to-end workflow to train, fine tune, evaluate, register, and deploy generative AI models with the visual designer for Amazon SageMaker Pipelines. SageMaker Pipelines is a serverless workflow orchestration service purpose-built for foundation model operations (FMOps). It accelerates your generative AI journey from prototype to production because you don’t need to learn about specialized workflow frameworks to automate model development or notebook execution at scale. Data scientists and machine learning (ML) engineers use pipelines for tasks such as continuous fine-tuning of large language models (LLMs) and scheduled notebook job workflows. Pipelines can scale up to run tens of thousands of workflows in parallel and scale down automatically depending on your workload.

Whether you are new to pipelines or are an experienced user looking to streamline your generative AI workflow, this step-by-step post will demonstrate how you can use the visual designer to enhance your productivity and simplify the process of building complex AI and machine learning (AI/ML) pipelines. Specifically, you will learn how to:

Llama fine-tuning pipeline overview

In this post, we will show you how to set up an automated LLM customization (fine-tuning) workflow so that the Llama 3.x models from Meta can provide a high-quality summary of SEC filings for financial applications. Fine-tuning allows you to configure LLMs to achieve improved performance on your domain-specific tasks. After fine-tuning, the Llama 3 8b model should be able to generate insightful financial summaries for its application users. But fine-tuning an LLM just once isn’t enough. You need to regularly tune the LLM to keep it up to date with the most recent real-world data, which in this case would be the latest SEC filings from companies. Instead of repeating this task manually each time new data is available (for example, once every quarter after earnings calls), you can create a Llama 3 fine-tuning workflow using SageMaker Pipelines that can be automatically triggered in the future. This will help you improve the quality of financial summaries produced by the LLM over time while ensuring accuracy, consistency, and reproducibility.

The SEC filings dataset is publicly available through an Amazon SageMaker JumpStart bucket. Here’s an overview of the steps to create the pipeline.

  1. Fine tune a Meta Llama 3 8B model from SageMaker JumpStart using the SEC financial dataset.
  2. Prepare the fine-tuned Llama 3 8B model for deployment to SageMaker Inference.
  3. Deploy the fine-tuned Llama 3 8B model to SageMaker Inference.
  4. Evaluate the performance of the fine-tuned model using the open-source Foundation Model Evaluations (fmeval) library
  5. Use a condition step to determine if the fine-tuned model meets your desired performance. If it does, register the fine-tuned model to the SageMaker Model Registry. If the performance of the fine-tuned model falls below the desired threshold, then the pipeline execution fails.
    SageMaker Pipelines visual editor pipeline overview

Prerequisites

To build this solution, you need the following prerequisites:

  • An AWS account that will contain all your AWS resources.
  • An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, see Identity and Access Management for Amazon SageMaker.
  • Access to SageMaker Studio to access the SageMaker Pipelines visual editor. You first need to create a SageMaker domain and a user profile. See the Guide to getting set up with Amazon SageMaker.
  • An ml.g5.12xlarge instance for endpoint usage to deploy the model to, and an ml.g5.12xlarge training instance to fine-tune the model. You might need to request a quota increase; see Requesting a quota increase for more information.

Accessing the visual editor

Access the visual editor in the SageMaker Studio console by choosing Pipelines in the navigation pane, and then selecting Create in visual editor on the right. SageMaker pipelines are composed of a set of steps. You will see a list of step types that the visual editor supports.

At any time while following this post, you can pause your pipeline building process, save your progress, and resume later. Download the pipeline definition as a JSON file to your local environment by choosing Export at the bottom of the visual editor. Later, you can resume building the pipeline by choosing the Import button and re-uploading the JSON file.

Step #1: Fine tune the LLM

With the new editor, we introduce a convenient way to fine tune models from SageMaker JumpStart using the Fine tune step. To add the Fine tune step, drag it to the editor and then enter the following details:

  1. In the Model (input) section select Meta-Llama-3-8B. Scroll to the bottom of the window to accept the EULA and choose Save.
  2. The Model (output) section automatically populates the default Amazon Simple Storage Service (Amazon S3) location. You can update the S3 URI to change the location where the model artifacts will be stored.
  3. This example uses the default SEC dataset for training. You can also bring your own dataset by updating the Dataset (input) section.
    SageMaker Pipelines fine-tune step
  4. Choose the ml.g5.12xlarge instance.
  5. Leave the default hyperparameter settings. These can be adjusted depending on your use case.
  6. (Optional) You can update the name of the step on the Details tab under Step display name. For this example, update the step name to Fine tune Llama 3 8B.
    SageMaker Pipelines fine-tune Llama 3

Step #2: Prepare the fine-tuned LLM for deployment

Before you deploy the model to an endpoint, you will create the model definition, which includes the model artifacts and Docker container needed to host the model.

  1. Drag the Create model step to the editor.
  2. Connect the Fine tune step to the Create model step using the visual editor.
  3. Add the following details under the Settings tab:
    1. Choose an IAM role with the required permissions.
    2. Model (input): Select Step variable and Fine-tuning Model Artifacts.
    3. Container: Bring your own container and enter the image URI dkr.ecr.<region_name>.amazonaws.com/djl-inference:0.28.0-lmi10.0.0-cu124 (replace <region_name> with your AWS Region) as the Location (ECR URI). This example uses a large model inference container. You can learn more about the deep learning containers that are available on GitHub.

    SageMaker Pipelines create fine-tuned model

Step #3: Deploy the fine-tuned LLM

Next, deploy the model to a real-time inference endpoint.

  1. Drag the Deploy model (endpoint) step to the editor.
  2. Enter a name such as llama-fine-tune for the endpoint name.
  3. Connect this step to the Create model step using the visual editor.
  4. In the Model (input) section, select Inherit model. Under Model name, select Step variable and the Model Name variable should be populated from the previous step. Choose Save.
    SageMaker Pipelines create model step
  5. Select the ml.g5.12xlarge instance as the Endpoint Type.
    SageMaker Pipelines deploy model

Step #4: Evaluate the fine-tuned LLM

After the LLM is customized and deployed on an endpoint, you want to evaluate its performance against real-world queries. To do this, you will use an Execute code step type that allows you to run the Python code that performs model evaluation using the factual knowledge evaluation from the fmeval library. The Execute code step type was introduced along with the new visual editor and provides three execution modes in which code can be run: Jupyter Notebooks, Python functions, and Shell or Python scripts. For more information about the Execute code step type, see the developer guide. In this example, you will use a Python function. The function will install the fmeval library, create a dataset to use for evaluation, and automatically test the model on its ability to reproduce facts about the real world.

Download the complete Python file, including the function and all imported libraries. The following are some code snippets of the model evaluation.

Define the LLM evaluation logic

Define a predictor to test your endpoint with a prompt:

import sagemaker

# Set up SageMaker predictor for the specified endpoint
predictor = sagemaker.predictor.Predictor(
    endpoint_name=endpoint_name,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer()
)

# Function to test the endpoint with a sample prompt
def test_endpoint(predictor):

    # Test endpoint and convert the payload to JSON
    prompt = "Tell me about Amazon SageMaker"
    payload = {
        "inputs": prompt,
        "parameters": {
            "do_sample": True,
            "top_p": 0.9,
            "temperature": 0.8,
            "max_new_tokens": 100
        },
    }
    response = predictor.predict(payload)
    print(f'Query successful. \n\nExample: Prompt: {prompt} Model response: {response["generated_text"]}')
    output_format = '[0].generated_text'
    return output_format

output_format = test_endpoint(predictor)

Invoke your endpoint:

runtime = boto3.client("sagemaker-runtime")  # low-level client; boto3, json, and payload are defined in the complete file
content_type = "application/json"
response = runtime.invoke_endpoint(EndpointName=endpoint_name, Body=json.dumps(payload), ContentType=content_type)
result = json.loads(response['Body'].read().decode())

Generate a dataset:

import json
import random

# Create an evaluation dataset in JSONL format with capital cities and their regions
capitals = [
    ("Aurillac", "Cantal"),
    ("Bamiyan", "Bamiyan Province"),
    ("Sokhumi", "Abkhazia"),
    ("Bukavu", "South Kivu"),
    ("Senftenberg", "Oberspreewald-Lausitz"),
    ("Legazpi City", "Albay"),
    ("Sukhum", "Abkhazia"),
    ("Paris", "France"),
    ("Berlin", "Germany"),
    ("Tokyo", "Japan"),
    ("Moscow", "Russia"),
    ("Madrid", "Spain"),
    ("Rome", "Italy"),
    ("Beijing", "China"),
    ("London", "United Kingdom"),
]

# Function to generate a single entry for the dataset
def generate_entry():
    city, region = random.choice(capitals)
    if random.random() < 0.2:
        alternatives = [f"{region} Province", f"{region} province", region]
        answers = f"{region}<OR>" + "<OR>".join(random.sample(alternatives, k=random.randint(1, len(alternatives))))
    else:
        answers = region
    return {
        "answers": answers,
        "knowledge_category": "Capitals",
        "question": f"{city} is the capital of"
    }

# Generate the dataset
num_entries = 15
dataset = [generate_entry() for _ in range(num_entries)]
input_file = "capitals_dataset.jsonl"
with open(input_file, "w") as f:
    for entry in dataset:
        f.write(json.dumps(entry) + "\n")

Set up and run model evaluation using fmeval:

# SageMakerModelRunner, DataConfig, MIME_TYPE_JSONLINES, FactualKnowledge, and
# FactualKnowledgeConfig are imported from the fmeval library in the complete
# Python file; content_template and output_file are also defined there.

# Set up SageMaker model runner
model_runner = SageMakerModelRunner(
    endpoint_name=endpoint_name,
    content_template=content_template,
    output="generated_text"
)

# Configure the dataset for evaluation
config = DataConfig(
    dataset_name="capitals_dataset_with_model_outputs",
    dataset_uri=output_file,
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answers",
    model_output_location="model_output"
)

# Set up and run the factual knowledge evaluation
eval_algo = FactualKnowledge(FactualKnowledgeConfig(target_output_delimiter="<OR>"))
eval_output = eval_algo.evaluate(model=model_runner, dataset_config=config, prompt_template="$model_input", save=True)

# Print the evaluation results
print(json.dumps(eval_output, default=vars, indent=4))

Upload the LLM evaluation logic

Drag a new Execute code (Run notebook or code) step onto the editor and update the display name to Evaluate model using the Details tab from the settings panel.

SageMaker Pipelines evaluate model

To configure the Execute code step settings, follow these steps in the Settings panel:

  1. Upload the Python file evaluating_function.py containing the function.
  2. Under Code Settings, change the Mode to Function and update the Handler to evaluating_function.py:evaluate_model. The handler input parameter is structured by putting the file name on the left side of the colon, and the handler function name on the right side: file_name.py:handler_function.
  3. Add the endpoint_name parameter for your handler with the value of the endpoint created previously under Function Parameters (input); for example, llama-fine-tune.
  4. Keep the default container and instance type settings.
    SageMaker Pipelines evaluate function

After configuring this step, you connect the Deploy model (endpoint) step to the Execute code step using the visual editor.

Step #5: Condition step

After you execute the model evaluation code, you drag a Condition step to the editor. The condition step registers the fine-tuned model to a SageMaker Model Registry if the factual knowledge evaluation score exceeds the desired threshold. If the performance of the model is below the threshold, then the model isn’t added to the model registry and the pipeline execution fails.

  1. Update the Condition step name under the Details tab to Is LLM factually correct.
  2. Drag a Register model step and a Fail step to the editor as shown in the following GIF. You will not configure these steps until the next sections.
  3. Return to the Condition step and add a condition under Conditions (input).
    1. For the first String, enter factual_knowledge.
    2. Select Greater Than as the test.
    3. For the second String, enter 7. The evaluation averages a single binary metric across every prompt in the dataset. For more information, see Factual Knowledge.

    SageMaker Pipelines condition for evaluating factual knowledge

  4. In the Conditions (output) section, for Then (execute if true), select Register model, and for Else (execute if false), select Fail.
    SageMaker Pipelines condition step
  5. After configuring this step, connect the Execute code step to the Condition step using the visual editor.

You will configure the Register model and Fail steps in the following sections.

Step #6: Register the model

To register your model to the SageMaker Model Registry, you need to configure the step to include the S3 URI of the model and the image URI.

  1. Return to the Register model step in the Pipelines visual editor that you created in the previous section and use the following steps to connect the Fine-tune step to the Register model step. This is required to inherit the model artifacts of the fine-tuned model.
  2. Select the step and choose Add under the Model (input) section.
  3. Enter the image URI dkr.ecr.<region_name>.amazonaws.com/djl-inference:0.28.0-lmi10.0.0-cu124 (replace <region_name> with your Region) in the Image field. For the Model URI field, select Step variable and Fine-tuning Model Artifacts. Choose Save.
  4. Enter a name for the Model group.

Step #7: Fail step

Select the Fail step on the canvas and enter a failure message to be displayed if the model fails to be registered to the model registry. For example: Model below evaluation threshold. Failed to register.

SageMaker Pipelines fail step

Save and execute the pipeline

Now that your pipeline has been constructed, choose Execute and enter a name for the execution to run the pipeline. You can then select the pipeline to view its progress. The pipeline will take 30–40 minutes to execute.

SageMaker Pipelines visual editor fine-tuning pipeline

LLM customization at scale

In this example you executed the pipeline once manually from the UI. But by using the SageMaker APIs and SDK, you can trigger multiple concurrent executions of this pipeline with varying parameters (for example, different LLMs, different datasets, or different evaluation scripts) as part of your regular CI/CD processes. You don’t need to manage the capacity of the underlying infrastructure for SageMaker Pipelines because it automatically scales up or down based on the number of pipelines, number of steps in the pipelines, and number of pipeline executions in your AWS account. To learn more about the default scalability limits and request an increase in the performance of Pipelines, see the Amazon SageMaker endpoints and quotas.
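
For example, the following sketch triggers an execution of the pipeline from code; the pipeline name and parameter are placeholders and must match what you defined in the visual editor.

import boto3

sm_client = boto3.client("sagemaker")

response = sm_client.start_pipeline_execution(
    PipelineName="llama3-finetuning-pipeline",            # placeholder pipeline name
    PipelineExecutionDisplayName="quarterly-sec-refresh",
    PipelineParameters=[
        # Only parameters defined on the pipeline can be overridden here
        {"Name": "TrainingInstanceType", "Value": "ml.g5.12xlarge"},
    ],
)
print(response["PipelineExecutionArn"])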

Clean up

Delete the SageMaker model endpoint to avoid incurring additional charges.

Conclusion

In this post, we walked you through a solution to fine-tune a Llama 3 model using the new visual editor for Amazon SageMaker Pipelines. We introduced the fine-tuning step to fine-tune LLMs, and the Execute code step to run your own code in a pipeline step. The visual editor provides a user-friendly interface to create and manage AI/ML workflows. By using this capability, you can rapidly iterate on workflows before executing them at scale in production tens of thousands of times. For more information about this new feature, see Create and Manage Pipelines. Try it out and let us know your thoughts in the comments!


About the Authors

Lauren Mullennex is a Senior AI/ML Specialist Solutions Architect at AWS. She has a decade of experience in DevOps, infrastructure, and ML. Her areas of focus include MLOps/LLMOps, generative AI, and computer vision.

Brock Wade is a Software Engineer for Amazon SageMaker. Brock builds solutions for MLOps, LLMOps, and generative AI, with experience spanning infrastructure, DevOps, cloud services, SDKs, and UIs.

Piyush Kadam is a Product Manager for Amazon SageMaker, a fully managed service for generative AI builders. Piyush has extensive experience delivering products that help startups and enterprise customers harness the power of foundation models.

Read More

Implement Amazon SageMaker domain cross-Region disaster recovery using custom Amazon EFS instances

Amazon SageMaker is a cloud-based machine learning (ML) platform within the AWS ecosystem that offers developers a seamless and convenient way to build, train, and deploy ML models. Extensively used by data scientists and ML engineers across various industries, this robust tool provides high availability and uninterrupted access for its users. When working with SageMaker, your environment resides within a SageMaker domain, which encompasses critical components like Amazon Elastic File System (Amazon EFS) for storage, user profiles, and a diverse array of security configurations. This comprehensive setup enables collaborative efforts by allowing users to store, share, and access notebooks, Python files, and other essential artifacts.

In 2023, SageMaker announced the release of the new SageMaker Studio, which offers two new types of applications: JupyterLab and Code Editor. The old SageMaker Studio was renamed to SageMaker Studio Classic. Unlike other applications that share one single storage volume in SageMaker Studio Classic, each JupyterLab and Code Editor instance has its own Amazon Elastic Block Store (Amazon EBS) volume. For more information about this architecture, see New – Code Editor, based on Code-OSS VS Code Open Source now available in Amazon SageMaker Studio. Another new feature was the ability to bring your own EFS instance, which enables you to attach and detach a custom EFS instance.

A SageMaker domain exclusive to the new SageMaker Studio is composed of the following entities:

  • User profiles
  • Applications including JupyterLab, Code Editor, RStudio, Canvas, and MLflow
  • A variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations

As a precautionary measure, some customers may want to ensure continuous operation of SageMaker in the unlikely event of a Regional impairment of the SageMaker service. This solution uses the built-in cross-Region replication capability of Amazon EFS as a robust disaster recovery mechanism, providing continuous and uninterrupted access to your SageMaker domain data across multiple Regions. Replicating your data and resources across multiple Regions helps safeguard against Regional outages and fortifies your defenses against natural disasters or unforeseen technical failures, thereby providing business continuity and disaster recovery capabilities. This setup is particularly crucial for mission-critical and time-sensitive workloads, so data scientists and ML engineers can seamlessly continue their work without disruptive interruptions.

The solution illustrated in this post focuses on the new SageMaker Studio experience, particularly private JupyterLab and Code Editor spaces. Although the code base doesn’t include shared spaces, the solution is straightforward to extend with the same concept. In this post, we guide you through a step-by-step process to seamlessly migrate and safeguard your new SageMaker domain in Amazon SageMaker Studio from one active AWS Region to another AWS Region, including all associated user profiles and files. By using a combination of AWS services, you can implement this feature effectively, overcoming the current limitations within SageMaker.

Solution overview

In active-passive mode, the SageMaker domain infrastructure is only provisioned in the primary AWS Region. Data backup is in near real time using Amazon EFS replication. Diagram 1 illustrates this architecture.

Diagram 1:

When the primary Region is down, a new domain is launched in the secondary Region, and an AWS Step Functions workflow runs to restore data as seen in diagram 2.

Diagram 2:

In active-active mode depicted in diagram 3, the SageMaker domain infrastructure is provisioned in two AWS Regions. Data backup is in near real time using Amazon EFS replication. The data sync is completed by the Step Functions workflow, and its cadence can be on demand, scheduled, or invoked by an event.

Diagram 3:

You can find the complete code sample in the GitHub repo.


With all the benefits of upgraded SageMaker domains, we developed a fast and robust cross-Region disaster recovery solution, using Amazon EFS to back up and recover user data stored in SageMaker Studio applications. In addition, domain user profiles and their custom POSIX configurations are managed through a YAML file in an AWS Cloud Development Kit (AWS CDK) code base to make sure domain entities in the secondary AWS Region are identical to those in the primary AWS Region. Because user-level custom EFS instances are only configurable through programmatic API calls, creating users on the AWS Management Console is not considered in our context.

Backup

Backup is performed within the primary AWS Region. There are two types of sources: an EBS space and a custom EFS instance.

For an EBS space, a lifecycle config is attached to JupyterLab or Code Editor for the purposes of backing up files. Every time the user opens the application, the lifecycle config takes a snapshot of its EBS spaces and stores them in the custom EFS instance using an rsync command.
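
In the sample code base this backup is a shell lifecycle configuration script; the following Python sketch only illustrates the idea, and the paths for the EBS-backed home directory and the custom EFS mount point are assumptions.

import subprocess

EBS_HOME = "/home/sagemaker-user/"                       # EBS-backed home directory of the space
EFS_BACKUP_DIR = "/mnt/custom-efs/backups/studio-user/"  # hypothetical mount point of the custom EFS instance

# Mirror the EBS home directory into the per-user backup folder on the custom EFS instance
subprocess.run(["rsync", "-a", "--delete", EBS_HOME, EFS_BACKUP_DIR], check=True)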

For a custom EFS instance, it’s automatically replicated to its read-only replica in the secondary AWS Region.
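
Replication is configured once on the custom EFS instance. The following is a minimal boto3 sketch; the file system ID and Regions are placeholders.

import boto3

efs = boto3.client("efs", region_name="us-east-1")  # primary Region

# Create a read-only replica of the custom EFS instance in the secondary Region
efs.create_replication_configuration(
    SourceFileSystemId="fs-0123456789abcdef0",      # custom EFS instance in the primary Region
    Destinations=[{"Region": "us-west-2"}],         # secondary Region
)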

Recovery

For recovery in the secondary AWS Region, a SageMaker domain with the same user profiles and spaces is deployed, and an empty custom EFS instance is created and attached to it. Then an Amazon Elastic Container Service (Amazon ECS) task runs to copy all the backup files to the empty custom EFS instance. At the last step, a lifecycle config script is run to restore the Amazon EBS snapshots before the SageMaker space launched.
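You can also start the recovery workflow programmatically, for example from an automated health check. The following is a minimal boto3 sketch; the state machine ARN is a placeholder for the one created by the ECSTaskStack-NewStudio stack.

import boto3

sfn = boto3.client("stepfunctions", region_name="us-west-2")  # secondary Region

sfn.start_execution(
    stateMachineArn="arn:aws:states:us-west-2:123456789012:stateMachine:sagemaker-dr-recovery",  # placeholder
    name="recover-sagemaker-domain-001",  # unique execution name
)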

Prerequisites

Complete the following prerequisite steps:

  1. Clone the GitHub repo to your local machine by running the following command in your terminal:
    git clone git@github.com:aws-samples/sagemaker-domain-cross-region-disaster-recovery-using-custom-efs.git

  2. Navigate to the project working directory and set up the Python virtual environment:
    python3 -m venv .venv
    source .venv/bin/activate

  3. Install the required dependencies:
    pip3 install -r requirements.txt

  4. Bootstrap your AWS account and set up the AWS CDK environment in both Regions:
    cdk bootstrap aws://ACCOUNT-NUMBER/PRIMARY_REGION
    cdk bootstrap aws://ACCOUNT-NUMBER/SECONDARY_REGION

  5. Synthesize the AWS CloudFormation templates by running the following code:
    cdk synth

  6. Configure the necessary arguments in the constants.py file:
    1. Set the primary Region in which you want to deploy the solution.
    2. Set the secondary Region in which you want to recover the primary domain.
    3. Replace the account ID variable with your AWS account ID.

Deploy the solution

Complete the following steps to deploy the solution:

  1. Deploy the primary SageMaker domain:
    cdk deploy SagemakerDomainPrimaryStack-NewStudio

  2. Deploy the secondary SageMaker domain:
    cdk deploy SagemakerDomainSecondaryStack-NewStudio

  3. Deploy the disaster recovery Step Functions workflow:
    cdk deploy ECSTaskStack-NewStudio

  4. Launch the application with the custom EFS instance attached and add files to the application’s EBS volume and custom EFS instance.

Test the solution

Complete the following steps to test the solution:

  1. Add test files using Code Editor or JupyterLab in the primary Region.
  2. Stop and restart the application.

This invokes the lifecycle config script, which takes a snapshot of the application’s EBS space and stores it in the custom EFS instance.

  3. On the Step Functions console in the secondary Region, run the disaster recovery Step Functions workflow.

The following figure illustrates the workflow steps.

  4. On the SageMaker console in the secondary Region, launch the same user’s SageMaker Studio.

You will find your files backed up in either Code Editor or JupyterLab.

Clean up

To avoid incurring ongoing charges, clean up the resources you created as part of this post:

  1. Stop all Code Editor and JupyterLab applications.
  2. Delete all the AWS CDK stacks:
    cdk destroy ECSTaskStack-NewStudio
    cdk destroy SagemakerDomainSecondaryStack-NewStudio
    cdk destroy SagemakerDomainPrimaryStack-NewStudio

Conclusion

SageMaker offers a robust and highly available ML platform, enabling data scientists and ML engineers to build, train, and deploy models efficiently. For critical use cases, implementing a comprehensive disaster recovery strategy enhances the resilience of your SageMaker domain, ensuring continuous operation in the unlikely event of regional impairment. This post presents a detailed solution for migrating and safeguarding your SageMaker domain, including user profiles and files, from one active AWS Region to another passive or active AWS Region. By using a strategic combination of AWS services, such as Amazon EFS, Step Functions, and the AWS CDK, this solution overcomes the current limitations within SageMaker Studio and provides continuous access to your valuable data and resources. Whether you choose an active-passive or active-active architecture, this solution provides a robust and resilient backup and recovery mechanism, fortifying your defenses against natural disasters, technical failures, and Regional outages. With this comprehensive guide, you can confidently safeguard your mission-critical and time-sensitive workloads, maintaining business continuity and uninterrupted access to your SageMaker domain, even in the case of unforeseen circumstances.

For more information on disaster recovery on AWS, refer to the following:


About the Authors

Jinzhao Feng is a Machine Learning Engineer at AWS Professional Services. He focuses on architecting and implementing large-scale generative AI and classic ML pipeline solutions. He is specialized in FMOps, LLMOps, and distributed training.

Nick Biso is a Machine Learning Engineer at AWS Professional Services. He solves complex organizational and technical challenges using data science and engineering. In addition, he builds and deploys AI/ML models on the AWS Cloud. His passion extends to his proclivity for travel and diverse cultural experiences.

Natasha Tchir is a Cloud Consultant at the Generative AI Innovation Center, specializing in machine learning. With a strong background in ML, she now focuses on the development of generative AI proof-of-concept solutions, driving innovation and applied research within the GenAIIC.

Katherine Feng is a Cloud Consultant at AWS Professional Services within the Data and ML team. She has extensive experience building full-stack applications for AI/ML use cases and LLM-driven solutions.

Read More

Amazon Bedrock Custom Model Import now generally available


Today, we’re pleased to announce the general availability (GA) of Amazon Bedrock Custom Model Import. This feature empowers customers to import and use their customized models alongside existing foundation models (FMs) through a single, unified API. Whether leveraging fine-tuned models like Meta Llama, Mistral Mixtral, and IBM Granite, or developing proprietary models based on popular open-source architectures, customers can now bring their custom models into Amazon Bedrock without the overhead of managing infrastructure or model lifecycle tasks.

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.

With Amazon Bedrock Custom Model Import, customers can access their imported custom models on demand in a serverless manner, freeing them from the complexities of deploying and scaling models themselves. They’re able to accelerate generative AI application development by using native Amazon Bedrock tools and features such as Knowledge Bases, Guardrails, Agents, and more—all through a unified and consistent developer experience.

Benefits of Amazon Bedrock Custom Model Import include:

  1. Flexibility to use existing fine-tuned models: Customers can use their prior investments in model customization by importing existing customized models into Amazon Bedrock without the need to recreate or retrain them. This flexibility maximizes the value of previous efforts and accelerates application development.
  2. Integration with Amazon Bedrock Features: Imported custom models can be seamlessly integrated with the native tools and features of Amazon Bedrock, such as Knowledge Bases, Guardrails, Agents, and Model Evaluation. This unified experience enables developers to use the same tooling and workflows across both base FMs and imported custom models.
  3. Serverless: Customers can access their imported custom models in an on-demand and serverless manner. This eliminates the need to manage or scale underlying infrastructure, as Amazon Bedrock handles all those aspects. Customers can focus on developing generative AI applications without worrying about infrastructure management or scalability issues.
  4. Support for popular model architectures: Amazon Bedrock Custom Model Import supports a variety of popular model architectures, including Meta Llama 3.2, Mistral 7B, Mixtral 8x7B, and more. Customers can import custom weights in formats like Hugging Face Safetensors from Amazon SageMaker and Amazon S3. This broad compatibility allows customers to work with models that best suit their specific needs and use cases, allowing for greater flexibility and choice in model selection.
  5. Leverage the Amazon Bedrock Converse API: Amazon Bedrock Custom Model Import allows customers to use their supported fine-tuned models with the Amazon Bedrock Converse API, which simplifies and unifies access to the models (a minimal example follows this list).
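
The following is a minimal sketch of calling an imported model through the Converse API with boto3; the model ARN is a placeholder for the ARN returned by your import job.

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="arn:aws:bedrock:us-east-1:123456789012:imported-model/abcd1234",  # imported model ARN (placeholder)
    messages=[{"role": "user", "content": [{"text": "Summarize Amazon's Q4 2022 results."}]}],
    inferenceConfig={"maxTokens": 150, "temperature": 0.0},
)
print(response["output"]["message"]["content"][0]["text"])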

Getting started with Custom Model Import

One of the critical requirements from our customers is the ability to customize models with their proprietary data while retaining complete ownership and control over the tuned model artifact and its deployment. Customization could be in the form of domain adaptation or instruction fine-tuning. Customers have a wide range of options for fine-tuning models efficiently and cost-effectively. However, hosting models presents its own unique set of challenges. Customers are looking for some key aspects, namely:

  • Using the existing customization investment and fine-grained control over customization.
  • Having a unified developer experience when accessing custom models or base models through Amazon Bedrock’s API.
  • Ease of deployment through a fully managed, serverless service.
  • Using pay-as-you-go inference to minimize the costs of their generative AI workloads.
  • Being backed by enterprise-grade security and privacy tooling.

The Amazon Bedrock Custom Model Import feature seeks to address these concerns. To bring your custom model into the Amazon Bedrock ecosystem, you need to run an import job. The import job can be invoked using the AWS Management Console or through APIs. In this post, we demonstrate the code for running the import model process through APIs. After the model is imported, you can invoke the model by using the model’s Amazon Resource Name (ARN).

As of this writing, supported model architectures include Meta Llama (v2, 3, 3.1, and 3.2), Mistral 7B, Mixtral 8x7B, Flan, and IBM Granite models such as Granite 3B-Code, 8B-Code, 20B-Code, and 34B-Code.

A few points to be aware of when importing your model:

  • Models must be serialized in Safetensors format.
  • If you have a different format, you can potentially use Llama convert scripts or Mistral convert scripts to convert your model to a supported format.
  • The import process expects at least the following files: .safetensors, config.json, tokenizer_config.json, tokenizer.json, and tokenizer.model.
  • The precision for the model weights supported is FP32, FP16, and BF16.
  • For fine-tuning jobs that create adapters such as LoRA-PEFT adapters, the import process expects the adapters to be merged into the main base model weights, as described in Model merging (a minimal merge sketch follows this list).
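
The following is a minimal sketch, with hypothetical paths, of merging a LoRA adapter into the base model weights with the Hugging Face PEFT library and saving the result as Safetensors before import.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"   # base model the adapter was trained on
adapter_path = "./lora-adapter"                      # hypothetical path to the LoRA-PEFT adapter
output_path = "./merged-model"                       # folder to upload to Amazon S3 for import

base = AutoModelForCausalLM.from_pretrained(base_model_id)
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()

merged.save_pretrained(output_path, safe_serialization=True)               # writes .safetensors and config.json
AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_path)  # tokenizer files expected by the import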

Importing a model using the Amazon Bedrock console

  1. Go to the Amazon Bedrock console and choose Foundation models and then Imported models in the navigation pane to get to the Models page.
  2. Choose Import model to configure the import process.
  3. Configure the model.
    1. Enter the location of your model weights. This can be an Amazon S3 location or a SageMaker model ARN.
    2. Enter a Job name. We recommend this be suffixed with the version of the model. As of now, you need to manage the generative AI operations aspects outside of this feature.
    3. Configure your AWS Key Management Service (AWS KMS) key for encryption. If you don’t specify one, a key owned and managed by AWS is used by default.
    4. Configure the service access role. You can create a new role or use an existing role that has the necessary permissions to run the import process. The permissions must include access to your Amazon S3 bucket if you’re specifying model weights through S3.
  4. After the Import model job is complete, you will see the model and the model ARN. Make a note of the ARN to use later.
  5. Test the model using the on-demand feature in the Text playground as you would for any base foundation model.

The import process validates that the model configuration complies with the specified architecture for that model by reading the config.json file and validates the model architecture values such as the maximum sequence length and other relevant details. It also checks that the model weights are in the Safetensors format. This validation verifies that the imported model meets the necessary requirements and is compatible with the system.

Fine-tuning a Meta Llama model on SageMaker

Meta Llama 3.2 offers multi-modal vision and lightweight models, representing Meta’s latest advances in large language models (LLMs). These new models provide enhanced capabilities and broader applicability across various use cases. With a focus on responsible innovation and system-level safety, the Llama 3.2 models demonstrate state-of-the-art performance on a wide range of industry benchmarks and introduce features to help you build a new generation of AI experiences.

SageMaker JumpStart provides FMs through two primary interfaces: SageMaker Studio and the SageMaker Python SDK. This gives you multiple options to discover and use hundreds of models for your use case.

In this section, we’ll show you how to fine-tune the Llama 3.2 3B Instruct model using SageMaker JumpStart. We’ll also share the supported instance types and context for the Llama 3.2 models available in SageMaker JumpStart. Although not highlighted in this post, you can also find other Llama 3.2 Model variants that can be fine-tuned using SageMaker JumpStart.

Instruction fine-tuning

The text generation model can be instruction fine-tuned on any text data, provided that the data is in the expected format. The instruction fine-tuned model can be further deployed for inference. The training data must be formatted in a JSON Lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, but can be saved in multiple JSON Lines files. The training folder can also contain a template.json file describing the input and output formats.

Synthetic dataset

For this use case, we’ll use a synthetically generated dataset named amazon10Ksynth.jsonl in an instruction-tuning format. This dataset contains approximately 200 entries designed for training and fine-tuning LLMs in the finance domain.

The following is an example of the data format:

instruction_sample = {
    "question": "What is Amazon's plan for expanding their physical store footprint and how will that impact their overall revenue?",
    "context": "The 10-K report mentions that Amazon is continuing to expand their physical store network, including 611 North America stores and 32 International stores as of the end of 2022. This physical store expansion is expected to contribute to increased product sales and overall revenue growth.",
    "answer": "Amazon is expanding their physical store footprint, with 611 North America stores and 32 International stores as of the end of 2022. This physical store expansion is expected to contribute to increased product sales and overall revenue growth."
}
 
print(instruction_sample)

Prompt template

Next, we create a prompt template for using the data in an instruction input format for the training job (because we are instruction fine-tuning the model in this example) and for inference against the deployed endpoint.

import json

prompt_template = {
  "prompt": "question: {question} context: {context}",
  "completion": "{answer}"
}

with open("prompt_template.json", "w") as f:
    json.dump(prompt_template, f)

After the prompt template is created, upload the prepared dataset that will be used for fine-tuning to Amazon S3.

from sagemaker.s3 import S3Uploader
import sagemaker
output_bucket = sagemaker.Session().default_bucket()
local_data_file = "amazon10Ksynth.jsonl"
train_data_location = f"s3://{output_bucket}/amazon10Ksynth_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("prompt_template.json", train_data_location)
print(f"Training data: {train_data_location}")

Fine-tuning the Meta Llama 3.2 3B model

Now, we’ll fine-tune the Llama 3.2 3B model on the financial dataset. The fine-tuning scripts are based on the scripts provided by the Llama fine-tuning repository.

from sagemaker.jumpstart.estimator import JumpStartEstimator

# JumpStart model ID and version; the ID below matches the Llama 3.2 3B model
# artifacts referenced later in this post. Adjust it if you fine-tune a
# different Llama 3.2 variant.
model_id, model_version = "meta-textgeneration-llama-3-2-3b", "*"

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},
    disable_output_compression=True,
    instance_type="ml.g5.12xlarge",
)
 
# Set the hyperparameters for instruction tuning
estimator.set_hyperparameters(
    instruction_tuned="True", epoch="5", max_input_length="1024"
)
 
# Fit the model on the training data
estimator.fit({"training": train_data_location})

Importing a custom model from SageMaker to Amazon Bedrock

In this section, we will use the Python SDK to create a model import job, get the imported model ARN, and finally generate inferences. You can refer to the console steps in the earlier section for how to import a model using the Amazon Bedrock console.

Parameter and helper function set up

First, we’ll create a few helper functions and set up our parameters to create the import job. The import job is responsible for collecting and deploying the model from SageMaker to Amazon Bedrock. This is done by using the create_model_import_job function.

Stored safetensors need to be formatted so that the Amazon S3 location is the top-level folder. The configuration files and safetensors will be stored as shown in the following figure.

import json
import boto3
from botocore.exceptions import ClientError
bedrock = boto3.client('bedrock', region_name='us-east-1')
job_name = 'fine-tuned-model-import-demo'
sagemaker_model_name = 'meta-textgeneration-llama-3-2-3b-2024-10-12-23-29-57-373'
model_url = {'s3DataSource':
                 {'s3Uri':
                      "s3://sagemaker-{REGION}-{AWS_ACCOUNT}/meta-textgeneration-llama-3-2-3b-2024-10-12-23-19-53-906/output/model/"
                  }
             }
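
With these parameters in place, you can create the import job. The following is a minimal sketch; the IAM role ARN is a placeholder for a service role that Amazon Bedrock can assume and that has read access to the model artifacts in Amazon S3.

# Placeholder: a service role that Amazon Bedrock can assume with read access to the S3 location above
role_arn = "arn:aws:iam::<AWS_ACCOUNT>:role/BedrockModelImportRole"

response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName=sagemaker_model_name,
    roleArn=role_arn,
    modelDataSource=model_url,
)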

Check the status and get job ARN from the response:

After a few minutes, the model will be imported, and the status of the job can be checked using get_model_import_job. The job ARN is then used to get the imported model ARN, which we will use to generate inferences.

def get_import_model_from_job(job_identifier):
    # jobIdentifier accepts either the import job name or its ARN
    response = bedrock.get_model_import_job(jobIdentifier=job_identifier)
    return response['importedModelArn']


job_arn = response['jobArn']
import_model_arn = get_import_model_from_job(job_arn)
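
You can also poll the import job until it reaches a terminal state before invoking the model. The following is a minimal sketch:

import time

while True:
    status = bedrock.get_model_import_job(jobIdentifier=job_arn)["status"]
    if status in ("Completed", "Failed"):
        break
    time.sleep(60)  # check once a minute
print(f"Import job status: {status}")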

Generating inferences using the imported custom model

The model can be invoked by using the invoke_model and converse APIs. The following is a helper function that invokes the model and extracts the generated text from the response.

from botocore.exceptions import ClientError

client = boto3.client('bedrock-runtime', region_name='us-east-1')

def generate_conversation_with_imported_model(native_request, model_id):
    request = json.dumps(native_request)
    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
        model_response = json.loads(response["body"].read())

        response_text = model_response["outputs"][0]["text"]
        print(response_text)
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)

Context set up and model response

Finally, we can use the custom model. First, we format our inquiry to match the fine-tuned prompt structure. This will make sure that the generated responses closely resemble the format used in the fine-tuning phase and are more aligned with our needs. To do this, we use the same prompt template that we used to format the fine-tuning data. In practice, the context would come from your RAG solution, such as Amazon Bedrock Knowledge Bases. For this example, we take a sample context to demonstrate the concept:

input_output_demarkation_key = "\n\n### Response:\n"
question = "Tell me what was the improved inflow value of cash?"

context = "Amazons free cash flow less principal repayments of finance leases and financing obligations improved to an inflow of $46.1 billion for the trailing twelve months, compared with an outflow of $10.1 billion for the trailing twelve months ended March 31, 2023."

payload = {
    "prompt": prompt_template["prompt"].format(
        question=question,                                # user query
        context=context + input_output_demarkation_key,   # RAG context
    ),
    "max_tokens": 100,
    "temperature": 0.01
}
generate_conversation_with_imported_model(payload, import_model_arn)

The output will look similar to:

After the model has been fine-tuned and imported into Amazon Bedrock, you can experiment by sending different sets of input questions and context to the model to generate a response, as shown in the following example:

question: """How did Amazon's international segment operating income change 
            in Q4 2022 compared to the prior year?"""
context: """Amazon's international segment reported an operating loss of 
            $1.1 billion in Q4 2022, an improvement from a $1.7 billion 
            operating loss in Q4 2021."""
response:

Some points to note

The examples in this post are meant to demonstrate Custom Model Import and aren’t designed to be used in production. Because the model has been trained on only 200 samples of synthetically generated data, it’s only useful for testing purposes. You would ideally have more diverse datasets and additional samples, with continuous experimentation and hyperparameter tuning for your respective use case, thereby steering the model to create a more desirable output. For this post, ensure that the model temperature parameter is set to 0 and the max_tokens runtime parameter is set to a lower value, such as 100–150 tokens, so that a succinct response is generated. You can experiment with other parameters to generate a desirable outcome. See Amazon Bedrock Recipes and GitHub for more examples.

Best practices to consider:

This feature brings significant advantages for hosting your fine-tuned models efficiently. As we continue to develop this feature to meet our customers’ needs, there are a few points to be aware of:

  • Define your test suite and acceptance metrics before starting the journey. Automating this will help to save time and effort.
  • Currently, the model weights need to be all-inclusive, including the adapter weights. There are multiple methods for merging the models and we recommend experimenting to determine the right methodology. The Custom Model Import feature lets you test your model on demand.
  • When creating your import jobs, add versioning to the job name to help quickly track your models. Currently, we’re not offering model versioning, and each import is a unique job and creates a unique model.
  • The precision supported for the model weights is FP32, FP16, and BF16. Run tests to validate that these will work for your use case.
  • The maximum concurrency that you can expect for each model will be 16 per account. Higher concurrency requests will cause the service to scale and increase the number of model copies.
  • The number of model copies active at any point in time will be available through Amazon CloudWatch. See Import a customized model to Amazon Bedrock for more information.
  • As of this writing, we are releasing this feature in the US-EAST-1 and US-WEST-2 AWS Regions only. We will continue to release it to other Regions. Follow Model support by AWS Region for updates.
  • The default import quota for each account is three models. If you need more for your use cases, work with your account teams to increase your account quota.
  • The default throttling limits for this feature for each account will be 100 invocations per second.
  • You can use this sample notebook to performance test your models imported through this feature. The notebook is a reference only and isn’t designed to be an exhaustive test. We always recommend that you run your own full performance testing, along with end-to-end testing that includes functional and evaluation testing.

Now available

Amazon Bedrock Custom Model Import is generally available today in Amazon Bedrock in the US-East-1 (N. Virginia) and US-West-2 (Oregon) AWS Regions. See the full Region list for future updates. To learn more, see the Custom Model Import product page and pricing page.

Give Custom Model Import a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.


About the authors

Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.

Shikhar Kwatra is a Sr. Partner Solutions Architect at Amazon Web Services, working with leading Global System Integrators. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports GSI partners in building strategic industry solutions on AWS.

Claudio Mazzoni is a Sr. GenAI Specialist Solutions Architect at AWS working on world-class applications, guiding customers through their implementation of generative AI to reach their goals and improve their business outcomes. Outside of work, Claudio enjoys spending time with family, working in his garden, and cooking Uruguayan food.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers leverage GenAI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a Ph.D. degree in Electrical Engineering. Outside of work, she loves traveling, working out and exploring new things.

Simon Zamarin is an AI/ML Solutions Architect whose main focus is helping customers extract value from their data assets. In his spare time, Simon enjoys spending time with family, reading sci-fi, and working on various DIY house projects.

Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.

Read More

Deploy a serverless web application to edit images using Amazon Bedrock


Generative AI adoption among various industries is revolutionizing different types of applications, including image editing. Image editing is used in various sectors, such as graphic designing, marketing, and social media. Users rely on specialized tools for editing images. Building a custom solution for this task can be complex. However, by using various AWS services, you can quickly deploy a serverless solution to edit images. This approach can give your teams access to image editing foundation models (FMs) using Amazon Bedrock.

Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that’s best suited for your use case. Amazon Bedrock is serverless, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.

Amazon Titan Image Generator G1 is an AI FM available with Amazon Bedrock that allows you to generate an image from text, or upload and edit your own image. Some of the key features we focus on include inpainting and outpainting.

This post introduces a solution that simplifies the deployment of a web application for image editing using AWS serverless services. We use AWS Amplify, Amazon Cognito, Amazon API Gateway, AWS Lambda, and Amazon Bedrock with the Amazon Titan Image Generator G1 model to build an application to edit images using prompts. We cover the inner workings of the solution to help you understand the function of each service and how they are connected to give you a complete solution. At the time of writing this post, Amazon Titan Image Generator G1 comes in two versions; for this post, we use version 2.

Solution overview

The following diagram provides an overview and highlights the key components. The architecture uses Amazon Cognito for user authentication and Amplify as the hosting environment for our frontend application. A combination of API Gateway and a Lambda function is used for our backend services, and Amazon Bedrock integrates with the FM model, enabling users to edit the image using prompts.

Solution Overview

Prerequisites

You must have the following in place to complete the solution in this post:

Deploy solution resources using AWS CloudFormation

When you run the AWS CloudFormation template, the following resources are deployed:

  • Amazon Cognito resources:
  • Lambda resources:
    • Function: <Stack name>-ImageEditBackend-<auto-generated>
  • AWS Identity Access Management (IAM) resources:
    • IAM role: <Stack name>-ImageEditBackendRole-<auto-generated>
    • IAM inline policy: AmazonBedrockAccess (this policy allows Lambda to invoke Amazon Bedrock FM amazon.titan-image-generator-v2:0)
  • API Gateway resources:
    • Rest API: ImageEditingAppBackendAPI
    • Methods:
      • OPTIONS – Added header mapping for CORS
      • POST – Lambda integration
    • Authorization: Through Amazon Cognito using CognitoAuthorizer

After you deploy the CloudFormation template, copy the following from the Outputs tab to be used during the deployment of Amplify:

  • userPoolId
  • userPoolClientId
  • invokeUrl

CFN Output

Deploy the Amplify application

You have to manually deploy the Amplify application using the frontend code found on GitHub. Complete the following steps:

  1. Download the frontend code from the GitHub repo.
  2. Unzip the downloaded file and navigate to the folder.
  3. In the js folder, find the config.js file and replace the values of XYZ for userPoolId, userPoolClientId, and invokeUrl with the values you collected from the CloudFormation stack outputs. Set the region value based on the Region where you’re deploying the solution.

The following is an example config.js file:

window._config = {
    cognito: {
        userPoolId: 'XYZ', // e.g. us-west-2_uXboG5pAb
        userPoolClientId: 'XYZ', // e.g. 25ddkmj4v6hfsfvruhpfi7n4hv
        region: 'XYZ' // e.g. us-west-2
    },
    api: {
        invokeUrl: 'XYZ' // e.g. https://rc7nyt4tql.execute-api.us-west-2.amazonaws.com/prod
    }
};

Extract Update Config File

  4. Select all the files and compress them as shown in the following screenshot.

Make sure you zip the contents and not the top-level folder. For example, if your build output generates a folder named AWS-Amplify-Code, navigate into that folder, select all the contents, and zip them.

Create New Zip File

  5. Use the new .zip file to manually deploy the application in Amplify.

After it’s deployed, you will receive a domain that you can use in later steps to access the application.

AWS Amplify Search Create App

  6. Create a test user in the Amazon Cognito user pool.

An email address is required for this user because you will need to mark the email address as verified.

Cognito Create User

  7. Return to the Amplify page and use the domain it automatically generated to access the application.

Use Amazon Cognito for user authentication

Amazon Cognito is an identity platform that you can use to authenticate and authorize users. We use Amazon Cognito in our solution to verify the user before they can use the image editing application.

Upon accessing the Image Editing Tool URL, you will be prompted to sign in with a previously created test user. For first-time sign-ins, users will be asked to update their password. After this process, the user’s credentials are validated against the records stored in the user pool. If the credentials match, Amazon Cognito will issue a JSON Web Token (JWT). In the API payload to be sent section of the page, you will notice that the Authorization field has been updated with the newly issued JWT.
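
For testing outside the browser, the following is a minimal sketch that obtains a JWT with boto3 and calls the API directly. It assumes the Cognito app client allows the USER_PASSWORD_AUTH flow and that the user's password has already been set; the request body fields are illustrative rather than the deployed API's exact contract.

import boto3
import requests

cognito = boto3.client("cognito-idp", region_name="us-west-2")

auth = cognito.initiate_auth(
    ClientId="XYZ",  # userPoolClientId from the CloudFormation outputs
    AuthFlow="USER_PASSWORD_AUTH",
    AuthParameters={"USERNAME": "test-user@example.com", "PASSWORD": "YourPassword1!"},
)
id_token = auth["AuthenticationResult"]["IdToken"]

resp = requests.post(
    "https://XYZ.execute-api.us-west-2.amazonaws.com/prod",  # invokeUrl from the CloudFormation outputs
    headers={"Authorization": id_token},
    json={"prompt": "Make the driveway clear and empty"},    # illustrative payload
)
print(resp.status_code, resp.text)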

Use Lambda for backend code and Amazon Bedrock for generative AI function

The backend code is hosted on Lambda and invoked by user requests routed through API Gateway. The Lambda function processes the request payload and forwards it to Amazon Bedrock. The reply from Amazon Bedrock follows the same route as the initial request.
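
The following is a minimal sketch of what such a Lambda handler might look like. The request field names (base_image, mask, prompt) are assumptions about the frontend payload, while the Amazon Bedrock request body follows the Amazon Titan Image Generator G1 v2 inpainting schema.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    body = json.loads(event["body"])

    # Build the Titan Image Generator request from the frontend payload
    request = {
        "taskType": "INPAINTING",                      # the app also supports OUTPAINTING
        "inPaintingParams": {
            "image": body["base_image"],               # base64-encoded source image
            "maskImage": body["mask"],                 # base64-encoded mask
            "text": body["prompt"],                    # edit instruction
        },
        "imageGenerationConfig": {"numberOfImages": 2},
    }

    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-image-generator-v2:0",
        body=json.dumps(request),
    )
    images = json.loads(response["body"].read())["images"]  # list of base64-encoded images

    return {"statusCode": 200, "body": json.dumps({"images": images})}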

Use API Gateway for API management

API Gateway streamlines API management, allowing developers to deploy, maintain, monitor, secure, and scale their APIs effortlessly. In our use case, API Gateway serves as the orchestrator for the application logic and provides throttling to manage the load to the backend. Without API Gateway, you would need to use the JavaScript SDK in the frontend to interact directly with the Amazon Bedrock API, bringing more work to the frontend.

Use Amplify for frontend code

Amplify offers a development environment for building secure, scalable mobile and web applications. It allows developers to focus on their code rather than worrying about the underlying infrastructure. Amplify also integrates with many Git providers. For this solution, we manually upload our frontend code using the method outlined earlier in this post.

Image editing tool walkthrough

Navigate to the URL provided after you created the application in Amplify and sign in. On your first login attempt, you’ll be asked to reset your password.

App Login

As you follow the steps for this tool, you will notice the API Payload to be Sent section on the right side updating dynamically, reflecting the details mentioned in the corresponding steps that follow.

Step 1: Create a mask on your image

To create a mask on your image, choose a file (JPEG, JPG, or PNG).

After the image is loaded, the frontend converts the file into base64, and the base_image value is updated.

As you select a portion of the image you want to edit, a mask is created, and the mask value is updated with a new base64 value. You can also use the stroke size option to adjust the area you are selecting.

You now have the original image and the mask image encoded in base64. (The Amazon Titan Image Generator G1 model requires the inputs to be in base64 encoding.)

Choose File and Create Mask

Step 2: Write a prompt and set your options

Write a prompt that describes what you want to do with the image. For this example, we enter Make the driveway clear and empty. This is reflected in the prompt on the right.

You can choose from the following image editing options: inpainting and outpainting. The value for mode is updated depending on your selection.

  • Use inpainting to remove masked elements and replace them with background pixels
  • Use outpainting to extend the pixels of the masked image to the image boundaries

Choose Send to API to send the payload to API Gateway. This action invokes the Lambda function, which validates the received payload. If the payload is validated successfully, the Lambda function proceeds to invoke the Amazon Bedrock API for further processing.

The Amazon Bedrock API generates two image outputs in base64 format, which are transmitted back to the frontend application and rendered as visual images.

Prompt

Step 3: View and download the result

The following screenshot shows the results of our test. You can download the results or provide an updated prompt to get a new output.

Download

Testing and troubleshooting

When you initiate the Send to API action, the system performs a validation check. If required information is missing or incorrect, it will display an error notification. For instance, if you attempt to send an image to the API without providing a prompt, an error message will appear on the right side of the interface, alerting you to the missing input, as shown in the following screenshot.

App Error

Clean up

If you decide to discontinue using the Image Editing Tool, follow these steps to remove it, its associated resources deployed using AWS CloudFormation, and the Amplify deployment:

  1. Delete the CloudFormation stack:
    1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
    2. Locate the stack you created during the deployment process (you assigned a name to it).
    3. Select the stack and choose Delete.
  2. Delete the Amplify application and its resources. For instructions, refer to Clean Up Resources.

Conclusion

In this post, we explored a sample solution that you can use to deploy an image editing application by using AWS serverless services and generative AI services. We used Amazon Bedrock and an Amazon Titan FM that allows you to edit images by using prompts. By adopting this solution, you gain the advantage of using AWS managed services, so you don’t have to maintain the underlying infrastructure. Get started today by deploying this sample solution.

Additional resources

To learn more about Amazon Bedrock, see the following resources:

To learn more about the Amazon Titan Image Generator G1 model, see the following resources:


About the Authors

Salman Ahmed is a Senior Technical Account Manager in AWS Enterprise Support. He enjoys helping customers in the travel and hospitality industry to design, implement, and support cloud infrastructure. With a passion for networking services and years of experience, he helps customers adopt various AWS networking services. Outside of work, Salman enjoys photography, traveling, and watching his favorite sports teams.

Sergio Barraza is a Senior Enterprise Support Lead at AWS, helping energy customers design and optimize cloud solutions. With a passion for software development, he guides energy customers through AWS service adoption. Outside work, Sergio is a multi-instrument musician playing guitar, piano, and drums, and he also practices Wing Chun Kung Fu.

Ravi Kumar is a Senior Technical Account Manager in AWS Enterprise Support who helps customers in the travel and hospitality industry to streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience. In his free time, Ravi enjoys creative activities like painting. He also likes playing cricket and traveling to new places.

Ankush Goyal is an Enterprise Support Lead in AWS Enterprise Support who helps customers streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience.

Read More