Generative AI can revolutionize organizations by enabling the creation of innovative applications that offer enhanced customer and employee experiences. Intelligent document processing, translation and summarization, flexible and insightful responses for customer support agents, personalized marketing content, and image and code generation are a few of the generative AI use cases that organizations are rolling out in production.
Large organizations often have many business units with multiple lines of business (LOBs), along with a central governing entity, and typically use AWS Organizations with an Amazon Web Services (AWS) multi-account strategy. They implement landing zones to automate secure account creation and streamline management across accounts, including logging, monitoring, and auditing. Although LOBs operate their own accounts and workloads, a central team, such as the Cloud Center of Excellence (CCoE), manages identity, guardrails, and access policies.
As generative AI adoption grows, organizations should establish a generative AI operating model. An operating model defines the organizational design, core processes, technologies, roles and responsibilities, governance structures, and financial models that drive a business’s operations.
In this post, we evaluate different generative AI operating model architectures that organizations can adopt.
Operating model patterns
Organizations can adopt different operating models for generative AI, depending on their priorities around agility, governance, and centralized control. Governance in the context of generative AI refers to the frameworks, policies, and processes that streamline the responsible development, deployment, and use of these technologies. It encompasses a range of measures aimed at mitigating risks, promoting accountability, and aligning generative AI systems with ethical principles and organizational objectives. Three common operating model patterns are decentralized, centralized, and federated, as shown in the following diagram.
Decentralized model
In a decentralized approach, generative AI development and deployment are initiated and managed by the individual LOBs themselves. LOBs have autonomy over their AI workflows, models, and data within their respective AWS accounts.
This enables faster time-to-market and agility because LOBs can rapidly experiment and roll out generative AI solutions tailored to their needs. However, even in a decentralized model, LOBs often must align with central governance controls and obtain approvals from the CCoE team for production deployment, adhering to global enterprise standards for areas such as access policies, model risk management, data privacy, and compliance posture. This can introduce governance complexities.
Centralized model
In a centralized operating model, all generative AI activities go through a central generative artificial intelligence and machine learning (AI/ML) team that provisions and manages end-to-end AI workflows, models, and data across the enterprise.
LOBs interact with the central team for their AI needs, trading off agility, and potentially accepting a longer time-to-market, in exchange for stronger top-down governance. A centralized model can introduce bottlenecks that slow down time-to-market, so organizations need to resource the team adequately, with sufficient personnel and automated processes, to meet demand from the various LOBs efficiently. Failure to scale the team can negate the governance benefits of a centralized approach.
Federated model
A federated model strikes a balance by having key activities of the generative AI processes managed by a central generative AI/ML platform team.
While LOBs drive their AI use cases, the central team governs guardrails, model risk management, data privacy, and compliance posture. This enables agile LOB innovation while providing centralized oversight on governance areas.
Generative AI architecture components
Before diving deeper into the common operating model patterns, this section provides a brief overview of a few components and AWS services used in the featured architectures.
Large language models
Large language models (LLMs) are large-scale ML models that contain billions of parameters and are pre-trained on vast amounts of data. LLMs may hallucinate, meaning a model can provide a confident but factually incorrect response. Furthermore, the data the model was trained on might be out of date, which can lead to inaccurate responses. One way to mitigate these issues is a technique known as Retrieval Augmented Generation (RAG). RAG is an advanced natural language processing technique that combines knowledge retrieval with generative text models, pairing the power of pre-trained language models with a retrieval-based approach to generate more informed and accurate responses. To set up RAG, you need a vector database that provides your model with related source documents. With RAG, the relevant document segments or other texts are retrieved and shared with the LLM to generate targeted responses with enhanced content quality and relevance.
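The following is a minimal sketch of this RAG flow using the Amazon Bedrock Converse API through boto3. The retrieve() function is a stand-in for a real vector store query (an embedding example appears later in this post), and the model ID and Region are illustrative assumptions:

```python
import boto3

# A minimal RAG sketch using the Amazon Bedrock Converse API. The retrieve()
# function is a stand-in for a real vector store query; the model ID and
# Region are illustrative assumptions.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def retrieve(question: str) -> list[str]:
    # Placeholder for a nearest-neighbor lookup against a vector store
    return ["Our refund window is 30 days from the date of purchase."]

question = "How long do customers have to request a refund?"
context = "\n".join(retrieve(question))
prompt = (
    "Answer the question using only the context below.\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)

# The retrieved context grounds the model's answer in organizational data
response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```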
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies, including AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon, through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
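Because Amazon Bedrock exposes models through a single API, switching providers is largely a matter of changing the model ID, as the following sketch shows. Model availability varies by account and Region, so the IDs here are examples that must be enabled first:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# The same Converse call works across model providers; only the model ID
# changes. These IDs are examples and must be enabled in your account.
for model_id in [
    "anthropic.claude-3-haiku-20240307-v1:0",
    "meta.llama3-8b-instruct-v1:0",
    "mistral.mistral-7b-instruct-v0:2",
]:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": "Summarize RAG in one sentence."}]}],
    )
    print(f"{model_id}: {response['output']['message']['content'][0]['text']}")
```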
Amazon SageMaker JumpStart provides access to proprietary FMs from third-party providers such as AI21 Labs, Cohere, and LightOn. In addition, Amazon SageMaker JumpStart onboards and maintains open source FMs from third-party sources such as Hugging Face.
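As a sketch of the JumpStart path, the following deploys an open source FM to a SageMaker endpoint using the SageMaker Python SDK. The model ID and instance type are assumptions; check SageMaker JumpStart for the models and versions available to your account:

```python
from sagemaker.jumpstart.model import JumpStartModel

# A sketch of deploying an open source FM through SageMaker JumpStart. The
# model ID and instance type are assumptions; check JumpStart for the models
# and versions available to your account.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

# Hugging Face LLM containers accept a JSON payload with an "inputs" field
response = predictor.predict({"inputs": "Explain RAG in one sentence."})
print(response)

# Delete the endpoint when finished to avoid ongoing charges
predictor.delete_endpoint()
```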
Data sources, embeddings, and vector store
Organizations’ domain-specific data, which provides context and relevance, typically resides in internal databases, data lakes, unstructured data repositories, or document stores, collectively referred to as organizational data sources or proprietary data stores.
A vector store is a system you can use to store and query vectors at scale, with efficient nearest neighbor query algorithms and appropriate indexes to improve data retrieval. It includes not only the embeddings of an organization’s data (mathematical representations of the data in the form of vectors) but also the raw text of the data in chunks. These vectors are generated by specialized embedding models, which process the organization’s text chunks into numerical representations (vectors) that are stored with the chunks in the vector store. For a comprehensive read about vector stores and embeddings, refer to The role of vector databases in generative AI applications.
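The following sketch illustrates these ideas: an embedding model converts text chunks into vectors, both are stored together, and a query is answered by a nearest neighbor (cosine similarity) search. The in-memory list stands in for a real vector store, and the Amazon Titan embedding model ID is an illustrative assumption:

```python
import json

import boto3
import numpy as np

# A sketch of embeddings and nearest-neighbor retrieval. The in-memory list
# stands in for a real vector store; the model ID is an assumption.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> np.ndarray:
    # Amazon Titan Text Embeddings returns a numerical vector for the input
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(response["body"].read())["embedding"])

# Store each raw text chunk together with its embedding, as a vector store does
chunks = [
    "Invoices are processed within five business days.",
    "Our headquarters are located in Seattle.",
]
store = [{"text": chunk, "vector": embed(chunk)} for chunk in chunks]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A query is embedded with the same model, then matched by vector similarity
query = embed("How long does invoice processing take?")
best = max(store, key=lambda item: cosine(query, item["vector"]))
print(best["text"])  # Prints the invoice chunk, the closest match
```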
With Amazon Bedrock Knowledge Bases, you securely connect FMs in Amazon Bedrock to your company data for RAG. Amazon Bedrock Knowledge Bases facilitates data ingestion from various supported data sources; manages data chunking, parsing, and embeddings; and populates the vector store with the embeddings. With all that provided as a service, you can think of Amazon Bedrock Knowledge Bases as a fully managed and serverless option to build powerful conversational AI systems using RAG.
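As a sketch, querying an existing knowledge base takes a single RetrieveAndGenerate call; retrieval and prompt augmentation happen inside the service. The knowledge base ID and model ARN below are placeholders for resources you have already created:

```python
import boto3

# A sketch of querying an existing Amazon Bedrock knowledge base. The
# knowledge base ID and model ARN are placeholders for existing resources.
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
print(response["output"]["text"])
```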
Guardrails
Content filtering mechanisms are implemented as safeguards that control user-AI interactions, aligning them with application requirements and responsible AI policies by minimizing undesirable and harmful content. Guardrails can check user inputs and FM outputs to filter or deny unsafe topics, redact personally identifiable information (PII), and enhance content safety and privacy in generative AI applications.
Amazon Bedrock Guardrails is a feature of Amazon Bedrock that you can use to put safeguards in place. You define what the safeguards filter or deny based on your company policies, and they are FM agnostic. You can create multiple guardrails with different configurations tailored to specific use cases. For a review of Amazon Bedrock Guardrails, refer to these blog posts: Guardrails for Amazon Bedrock helps implement safeguards customized to your use cases and responsible AI policies and Guardrails for Amazon Bedrock now available with new safety filters and privacy controls.
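The following sketch creates a guardrail that denies a topic and anonymizes email addresses, then applies it to a Converse call. The topic definition, PII setting, and model ID are illustrative assumptions:

```python
import boto3

# A sketch of creating a guardrail and applying it at inference time. The
# topic definition, PII setting, and model ID are illustrative assumptions.
bedrock = boto3.client("bedrock", region_name="us-east-1")
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

guardrail = bedrock.create_guardrail(
    name="support-app-guardrail",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Investment advice",
                "definition": "Recommendations about specific financial investments.",
                "type": "DENY",
            }
        ]
    },
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [{"type": "EMAIL", "action": "ANONYMIZE"}]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
)

# Attach the guardrail to a Converse call; it checks both inputs and outputs
response = runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Which stocks should I buy?"}]}],
    guardrailConfig={
        "guardrailIdentifier": guardrail["guardrailId"],
        "guardrailVersion": "DRAFT",
    },
)
print(response["output"]["message"]["content"][0]["text"])
```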
Operating model architectures
This section provides an overview of the architecture for each of the three operating models.
Decentralized operating model
In a decentralized operating model, LOB teams maintain control and ownership of their AWS accounts. Each LOB configures and orchestrates generative AI components, common functionalities, applications, and Amazon Bedrock configurations within their respective AWS accounts. This model empowers LOBs to tailor their generative AI solutions according to their specific requirements, while taking advantage of the power of Amazon Bedrock.
With this model, the LOBs configure the core components, such as LLMs and guardrails, and the Amazon Bedrock service account manages the hosting, execution, and provisioning of interface endpoints. These endpoints enable LOBs to access and interact with the Amazon Bedrock services they’ve configured.
Each LOB performs monitoring and auditing of their configured Amazon Bedrock services within their account, using Amazon CloudWatch Logs and AWS CloudTrail for log capture, analysis, and auditing tailored to their needs. Amazon Bedrock cost and usage are recorded in each LOB’s AWS account. By adopting this decentralized model, LOBs retain control over their generative AI solutions through a decentralized configuration, while benefiting from the scalability, reliability, and security of Amazon Bedrock.
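As a sketch of this per-account monitoring, each LOB can enable Amazon Bedrock model invocation logging to CloudWatch Logs in its own account. The log group name, role ARN, and account ID below are placeholders for resources created in that account:

```python
import boto3

# A sketch of enabling Amazon Bedrock model invocation logging to CloudWatch
# Logs in an LOB account. The log group name, role ARN, and account ID are
# placeholders for resources created in that account.
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/lob-a/bedrock/model-invocations",
            "roleArn": "arn:aws:iam::111122223333:role/BedrockLoggingRole",
        },
        # Capture prompt and completion text in the invocation logs
        "textDataDeliveryEnabled": True,
    }
)
```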
The following diagram shows the architecture of the decentralized operating model.
Centralized operating model
The centralized AWS account serves as the primary hub for configuring and managing the core generative AI functionalities, including reusable agents, prompt flows, and shared libraries. LOB teams contribute their business-specific requirements and use cases to the centralized team, which then integrates and orchestrates the appropriate generative AI components within the centralized account.
Although the orchestration and configuration of generative AI solutions reside in the centralized account, they often require interaction with LOB-specific resources and services. To facilitate this, the centralized account uses API gateways or other integration points provided by the LOBs’ AWS accounts. These integration points enable secure and controlled communication between the centralized generative AI orchestration and the LOBs’ business-specific applications, data sources, or services. This centralized operating model promotes consistency, governance, and scalability of generative AI solutions across the organization.
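One possible integration pattern, sketched below, is for the centralized account to assume a scoped-down IAM role that an LOB account exposes and then read an LOB data source with the temporary credentials. The role ARN, bucket, and object key are hypothetical:

```python
import boto3

# A sketch of one cross-account integration pattern: the centralized account
# assumes a scoped-down IAM role exposed by an LOB account, then reads an LOB
# data source. The role ARN, bucket, and object key are hypothetical.
sts = boto3.client("sts")

credentials = sts.assume_role(
    RoleArn="arn:aws:iam::444455556666:role/LobAGenAiIntegrationRole",
    RoleSessionName="central-genai-orchestration",
)["Credentials"]

# Use the temporary credentials to access the LOB's resources
lob_s3 = boto3.client(
    "s3",
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)
document = lob_s3.get_object(Bucket="lob-a-documents", Key="policies/refunds.txt")
print(document["Body"].read().decode("utf-8"))
```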
The centralized team maintains adherence to common standards, best practices, and organizational policies, while also enabling efficient sharing and reuse of generative AI components. Furthermore, the core components of Amazon Bedrock, such as LLMs and guardrails, continue to be hosted and executed by AWS in the Amazon Bedrock service account, promoting secure, scalable, and high-performance execution environments for these critical components. In this centralized model, monitoring and auditing of Amazon Bedrock can be achieved within the centralized account, allowing for comprehensive monitoring, auditing, and analysis of all generative AI activities and configurations. Amazon CloudWatch Logs provides a unified view of generative AI operations across the organization.
By consolidating the orchestration and configuration of generative AI solutions in a centralized account while enabling secure integration with LOB-specific resources, this operating model promotes standardization, governance, and centralized control over generative AI operations. It uses the scalability, reliability, security, and centralized monitoring capabilities of AWS managed infrastructure and services, while still allowing for integration with LOB-specific requirements and use cases.
The following is the architecture for a centralized operating model.
Federated operating model
In a federated model, Amazon Bedrock enables a collaborative approach where LOB teams can develop and contribute common generative AI functionalities within their respective AWS accounts. These common functionalities, such as reusable agents, prompt flows, or shared libraries, can then be migrated to a centralized AWS account managed by a dedicated team or CCoE.
The centralized AWS account acts as a hub for integrating and orchestrating these common generative AI components, providing a unified platform for action groups and prompt flows. Although the orchestration and configuration of generative AI solutions remain within the LOBs’ AWS accounts, they can use the centralized Amazon Bedrock agents, prompt flows, and other shared components defined in the centralized account.
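The following sketch shows an LOB application invoking a shared Amazon Bedrock agent. The agent and alias IDs are placeholders for components defined in the centralized account, and the cross-account IAM and resource policies this requires are omitted:

```python
import boto3

# A sketch of an LOB application invoking a shared Amazon Bedrock agent. The
# agent and alias IDs are placeholders for components defined in the
# centralized account; cross-account IAM and resource policies are omitted.
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.invoke_agent(
    agentId="AGENT_ID",
    agentAliasId="AGENT_ALIAS_ID",
    sessionId="lob-a-session-001",
    inputText="Summarize this quarter's open support tickets.",
)

# The agent response arrives as a stream of event chunks
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"), end="")
```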
This federated model allows LOBs to retain control over their generative AI solutions, tailoring them to specific business requirements while benefiting from the reusable and centrally managed components. The centralized account maintains consistency, governance, and scalability of these shared generative AI components, promoting collaboration and standardization across the organization.
Organizations frequently prefer storing sensitive data, including Payment Card Industry (PCI) data, PII, and data governed by the General Data Protection Regulation (GDPR) or the Health Insurance Portability and Accountability Act (HIPAA), within their respective LOB AWS accounts. This approach makes sure that LOBs maintain control over their sensitive business data in the vector store while preventing centralized teams from accessing it without proper governance and security measures.
A federated model combines decentralized development, centralized integration, and centralized monitoring. This operating model fosters collaboration, reusability, and standardization while empowering LOBs to retain control over their generative AI solutions. It uses the scalability, reliability, security, and centralized monitoring capabilities of AWS managed infrastructure and services, promoting a harmonious balance between autonomy and governance.
The following is the architecture for a federated operating model.
Cost management
Organizations may want to analyze Amazon Bedrock usage and costs per LOB. To track the cost and usage of FMs across LOBs’ AWS accounts, you can implement solutions that record model invocations per LOB.
Amazon Bedrock supports inference profiles, which you can define to track Amazon Bedrock usage metrics, monitor model invocation requests, or route model invocation requests to multiple AWS Regions for increased throughput.
There are two types of inference profiles. Cross-Region inference profiles are predefined in Amazon Bedrock and include multiple AWS Regions to which requests for a model can be routed. Application inference profiles are user created to track cost and model usage when submitting on-demand model invocation requests, and you can attach custom tags, such as cost allocation tags, to them. When submitting a prompt, you can include an inference profile ID or its Amazon Resource Name (ARN). This capability enables organizations to track and monitor costs for various LOBs, cost centers, or applications. For a detailed explanation of application inference profiles, refer to this post: Track, allocate, and manage your generative AI cost and usage with Amazon Bedrock.
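The following sketch creates an application inference profile tagged with a cost allocation tag for an LOB, then invokes a model through it so that usage is attributed to that tag. The model ARN and tag values are illustrative assumptions:

```python
import boto3

# A sketch of creating an application inference profile tagged for an LOB and
# invoking a model through it. The model ARN and tag values are assumptions.
bedrock = boto3.client("bedrock", region_name="us-east-1")
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

profile = bedrock.create_inference_profile(
    inferenceProfileName="lob-a-claude-haiku",
    modelSource={
        "copyFrom": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
    },
    tags=[{"key": "cost-center", "value": "lob-a"}],
)

# Pass the profile ARN as the model ID; cost and usage roll up to the tag
response = runtime.converse(
    modelId=profile["inferenceProfileArn"],
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```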
Conclusion
Although enterprises often begin with a centralized operating model, the rapid pace of development in generative AI technologies, the need for agility, and the desire to quickly capture value often lead organizations to converge on a federated operating model.
In a federated operating model, lines of business have the freedom to innovate and experiment with generative AI solutions, taking advantage of their domain expertise and proximity to business problems. Key aspects of the AI workflow, such as data access policies, model risk management, and compliance monitoring, are managed by a central cloud governance team. Successful generative AI solutions developed by a line of business can be promoted and productionized by the central team for enterprise-wide reuse.
This federated model fosters innovation from the lines of business closest to domain problems. Simultaneously, it allows the central team to curate, harden, and scale those solutions in line with organizational policies, then redeploy them efficiently to other relevant areas of the business.
To sustain this operating model, enterprises often establish a dedicated product team with a business owner that works in partnership with lines of business. This team is responsible for continually evolving the operating model, refactoring and enhancing the generative AI services to help meet the changing needs of the lines of business and keep up with the rapid advancements in LLMs and other generative AI technologies.
Federated operating models strike a balance, mitigating the risks of fully decentralized initiatives while minimizing bottlenecks from overly centralized approaches. By empowering business agility with curation by a central team, enterprises can accelerate compliant, high-quality generative AI capabilities aligned with their innovation goals, risk tolerances, and need for rapid value delivery in the evolving AI landscape.
As enterprises look to capitalize on the generative AI revolution, Amazon Bedrock provides the ideal foundation to establish a flexible operating model tailored to their organization’s needs. Whether you’re starting with a centralized, decentralized, or federated approach, AWS offers a comprehensive suite of services to support the full generative AI lifecycle.
Try Amazon Bedrock and let us know your feedback on how you’re planning to implement the operating model that suits your organization.
About the Authors
Martin Tunstall is a Principal Solutions Architect at AWS. With over three decades of experience in the finance sector, he helps global finance and insurance customers unlock the full potential of Amazon Web Services (AWS).
Yashar Araghi is a Senior Solutions Architect at AWS. He has over 20 years of experience designing and building infrastructure and application security solutions. He has worked with customers across various industries such as government, education, finance, energy, and utilities. In the last 6 years at AWS, Yashar has helped customers design, build, and operate cloud solutions that are secure, reliable, performant, and cost optimized in the AWS Cloud.