ARMADA: Augmented Reality for Robot Manipulation and Robot-Free Data Acquisition

Teleoperation for robot imitation learning is bottlenecked by hardware availability. Can high-quality robot data be collected without a physical robot? We present a system for augmenting Apple Vision Pro with real-time virtual robot feedback. By providing users with an intuitive understanding of how their actions translate to robot motions, we enable the collection of natural barehanded human data that is compatible with the limitations of physical robot hardware. We conducted a user study with 15 participants demonstrating 3 different tasks each under 3 different feedback conditions and… (Apple Machine Learning Research)

Tech Leader, AI Visionary, Endlessly Curious Jensen Huang to Keynote CES 2025

On Jan. 6 at 6:30 p.m. PT, NVIDIA founder and CEO Jensen Huang — with his trademark leather jacket and an unwavering vision — will step onto the CES 2025 stage.

From humble beginnings as a busboy at a Denny’s to founding NVIDIA, Huang’s story embodies innovation and perseverance.

Huang has been named the world’s best CEO by Fortune and The Economist, as well as one of TIME magazine’s 100 most influential people in the world.

Today, NVIDIA is a driving force behind breakthroughs in AI and accelerated computing, technologies transforming industries ranging from healthcare to automotive and entertainment.

Across the globe, NVIDIA’s innovations enable advanced chatbots, robots, software-defined vehicles, sprawling virtual worlds, hypersynchronized factory floors and much more.

NVIDIA’s accelerated computing and AI platforms power hundreds of millions of computers, available from major cloud providers and server manufacturers.

They fuel 76% of the world’s fastest supercomputers on the TOP500 list and are supported by a thriving community of more than 5 million developers.

For decades, Huang has led NVIDIA through revolutions that ripple across industries.

GPUs redefined gaming as an art form, and NVIDIA’s AI tools empower labs, factory floors and Hollywood sets. From self-driving cars to automated industrial processes, these tools are foundational to the next generation of technological breakthroughs.

CES has long been the stage for the unveiling of technological advancements, and Huang’s keynote is no exception.

Since its inception in 1967, CES has unveiled iconic innovations, including transistor radios, VCRs and HDTVs.

Over the decades, CES has launched numerous NVIDIA flagship innovations, from a first look at NVIDIA SHIELD to NVIDIA DRIVE for autonomous vehicles.

NVIDIA at CES 2025

The keynote is just the beginning.

From Jan. 7-10, NVIDIA will host press, analysts, customers and partners at the Fontainebleau Resort Las Vegas.

The space will feature hands-on demos showcasing innovations in AI, robotics and accelerated computing across NVIDIA’s automotive, consumer, enterprise, Omniverse and robotics portfolios.

Meanwhile, NVIDIA’s technologies will take center stage on the CES show floor at the Las Vegas Convention Center, where partners will highlight AI-powered technologies, immersive gaming experiences and groundbreaking automotive advancements.

Attendees can also participate in NVIDIA’s “Explore to Win” program, an interactive scavenger hunt featuring missions, points and prizes.

Curious about the future? Tune in live on NVIDIA’s website or the company’s YouTube channels to witness how NVIDIA is shaping the future of technology.

AWS re:Invent 2024 Highlights: Top takeaways from Swami Sivasubramanian to help customers manage generative AI at scale

We spoke with Dr. Swami Sivasubramanian, Vice President of Data and AI, shortly after AWS re:Invent 2024 to hear his impressions—and to get insights on how the latest AWS innovations help meet the real-world needs of customers as they build and scale transformative generative AI applications.

Q: What made this re:Invent different?

Swami Sivasubramanian: The theme I spoke about in my re:Invent keynote was simple but powerful—convergence. I believe that we’re at an inflection point unlike any other in the evolution of AI. We’re seeing a remarkable convergence of data, analytics, and generative AI. It’s a combination that enables next-level generative AI applications that are far more capable. And it lets our customers move faster in a really significant way, getting more value, more quickly. Companies like Rocket Mortgage are building on an AI-driven platform powered by Amazon Bedrock to create AI agents and automate tasks—working to give their employees access to generative AI with no-code tools. Canva uses AWS to power 1.2 million requests a day and sees 450 new designs created every second. There’s also a human side to convergence, as people across organizations are working together in new ways, requiring a deeper level of collaboration between groups, like science and engineering teams. And this isn’t just a one-time collaboration. It’s an ongoing process.

People’s expectations for applications and customer experiences are changing again with generative AI. Increasingly, I think generative AI inference is going to be a core building block for every application. To realize this future, organizations need more than just a chatbot or a single powerful large language model (LLM). At re:Invent, we made some exciting announcements about the future of generative AI, of course. But we also launched a remarkable portfolio of new products, capabilities, and features that will help our customers manage generative AI at scale—making it easier to control costs, build trust, increase productivity, and deliver ROI.

Q: Are there key innovations that build on the experience and lessons learned at Amazon in adopting generative AI? How are you bringing those capabilities to your customers?

Swami Sivasubramanian: Yes. With Amazon Nova, we announced a new generation of foundation models (FMs) that deliver state-of-the-art intelligence across a wide range of tasks and industry-leading price performance. Amazon Nova models expand the growing selection of the broadest and most capable FMs in Amazon Bedrock for enterprise customers. Amazon Nova Micro, Lite, and Pro demonstrate exceptional intelligence, capabilities, and speed—and perform quite competitively against the best models in their respective categories. Amazon Nova Canvas, our state-of-the-art image generation model, creates professional-grade images from text and image inputs, democratizing access to production-grade visual content for advertising, training, social media, and more. Finally, Amazon Nova Reel offers state-of-the-art video generation that allows customers to create high-quality video from text or images. With about 1,000 generative AI applications in motion inside Amazon, groups like Amazon Ads are using Amazon Nova to remove barriers for sellers and advertisers, enabling new levels of creativity and innovation. New capabilities like image and video generation are helping Amazon Ads customers promote more products in their catalogs, and experiment with new strategies like keyword-level creative to increase engagement and drive sales.

But there’s more ahead, and here’s where an important shift is happening. We’re working on an even more capable any-to-any model where you can provide text, images, audio, and video as input and the model can generate outputs in any of these modalities. And we think this multi-modal approach is how models are going to evolve, moving ahead where one model can accept any kind of input and generate any kind of output. Over time, I think this is what state-of-the-art models will look like.

Q: Speaking of announcements like Amazon Nova, you’ve been a key innovator in AI for many years. What continues to inspire you?

Swami Sivasubramanian: It’s fascinating to think about what LLMs are capable of. What inspires me most, though, is how we can help our customers unblock the challenges they are facing and realize that potential. Consider hallucinations. As highly capable as today’s models are, they still have a tendency to get things wrong occasionally. It’s a challenge that many of our customers struggle with when integrating generative AI into their businesses and moving to production. We explored the problem and asked ourselves if we could do more to help. We looked inward and leveraged Automated Reasoning, an innovation that Amazon has been using as a behind-the-scenes technology in many of our services, like identity and access management.

I like to think of this situation as yin and yang. Automated Reasoning is all about certainty and being able to mathematically prove that something is correct. Generative AI is all about creativity and open-ended responses. Though they might seem like opposites, they’re actually complementary—with Automated Reasoning completing and strengthening generative AI. We’ve found that Automated Reasoning works really well when you have a huge surface area of a problem, a corpus of knowledge about that problem area, and when it’s critical that you get the correct answer—which makes Automated Reasoning a good fit for addressing hallucinations.

At re:Invent, we announced Amazon Bedrock Guardrails Automated Reasoning checks—the first and only generative AI safeguard that helps prevent factual errors due to hallucinations, using logically accurate and verifiable reasoning that explains why generative AI responses are correct. I think it’s an innovation that will have significant impact across organizations and industries, helping build trust and accelerate generative AI adoption.

Q: Controlling costs is important to all organizations, large and small, particularly as they take generative AI applications into production. How do the announcements at re:Invent answer this need?

Swami Sivasubramanian: Like our customers, here at Amazon we’re increasing our investment in generative AI development, with multiple projects in process—all requiring timely access to accelerated compute resources. But allocating optimal compute capacity to each project can create a supply/demand challenge. To address this challenge, we created an internal service that helped Amazon drive utilization of compute resources to more than 90% across all our projects. This service enabled us to smooth out demand across projects and achieve higher capacity utilization, speeding development.

As with Automated Reasoning, we realized that our customers would also benefit from these capabilities. So, at re:Invent, I announced the new task governance capability in Amazon SageMaker HyperPod, which helps our customers optimize compute resource utilization and reduce time to market by up to 40%. With this capability, users can dynamically run tasks across the end-to-end FM workflow— accelerating time to market for AI innovations while avoiding cost overruns due to underutilized compute resources.

Our customers also tell me that the trade-off between cost and accuracy for models is real. We’re answering this need by making it super-easy to evaluate models on Amazon Bedrock, so they don’t have to spend months researching and making comparisons. We’re also lowering costs with game-changing capabilities such as Amazon Bedrock Model Distillation, which transfers knowledge from a larger model to a smaller, lower-cost one; Amazon Bedrock Intelligent Prompt Routing, which manages prompts more efficiently, at scale; and prompt caching, which reduces repeated processing without compromising on accuracy.

Q: Higher productivity is one of the core promises of generative AI. How is AWS helping employees at all levels be more productive?

Swami Sivasubramanian: I like to point out that using generative AI becomes irresistible when it makes employees 10 times more productive. In short, not an incremental increase, but a major leap in productivity. And we’re helping employees get there. For example, Amazon Q Developer is transforming code development by taking care of the time-consuming chores that developers don’t want to deal with, like software upgrades. And it also helps them move much faster by automating code reviews and dealing with mainframe modernization. Consider Novacomp, a leading IT company in Latin America, which leveraged Amazon Q Developer to upgrade a project with over 10,000 lines of Java code in just 50 minutes, a task that would have typically taken an estimated 3 weeks. The company also simplified everyday tasks for developers, reducing its technical debt by 60% on average.

On the business side, Amazon Q Business is bridging the gap between unstructured and structured data, recognizing that most businesses need to draw from a mix of data. With Amazon Q in QuickSight, non-technical users can leverage natural language to build, discover, and share meaningful insights in seconds. Now they can access databases and data warehouses, as well as unstructured business data, like emails, reports, charts, graphs, and images.

And looking ahead, we announced advanced agentic capabilities for Amazon Q Business, coming in 2025, which will use agents to automate complex tasks that stretch across multiple teams and applications. Agents give generative AI applications next-level capabilities, and we’re bringing them to our customers via Amazon Q Business, as well as Amazon Bedrock multi-agent collaboration, which improves successful task completion by 40% over popular solutions. This major improvement translates to more accurate and human-like outcomes in use cases like automating customer support, analyzing financial data for risk management, or optimizing supply-chain logistics.

It’s all part of how we’re enabling greater productivity today, with even more on the horizon.

Q: To get employees and customers adopting generative AI and benefiting from that increased productivity, it has to be trusted. What steps is AWS taking to help build that trust?

Swami Sivasubramanian: I think that lack of trust is a big obstacle to moving from proof of concept to production. Business leaders are about to hit go and they hesitate because they don’t want to lose the trust of their customers. As generative AI continues to drive innovation across industries and our daily life, the need for responsible AI has become increasingly acute. And we’re helping meet that need with innovations like the Automated Reasoning checks I mentioned earlier, which work to prevent hallucinations and increase trust. We also announced new LLM-as-a-judge capabilities with Amazon Bedrock Model Evaluation, so you can now perform tests and evaluate other models with humanlike quality at a fraction of the cost and time of running human evaluations. These evaluations assess multiple quality dimensions, including correctness, helpfulness, and responsible AI criteria such as answer refusal and harmfulness.

I should also mention that AWS recently became the first major cloud provider to announce ISO/IEC 42001 accredited certification for AI services, covering Amazon Bedrock, Amazon Q Business, Amazon Textract, and Amazon Transcribe. This international management system standard outlines requirements and controls for organizations to promote the responsible development and use of AI systems. Technical standards like ISO/IEC 42001 are significant because they provide a much-needed common framework for responsible AI development and deployment.

Q: Data remains central to building more personalized experiences applicable to your business. How do the re:Invent launches help AWS customers get their data ready for generative AI?

Swami Sivasubramanian: Generative AI isn’t going to be useful for organizations unless it can seamlessly access and deeply understand the organization’s data. With these insights, our customers can create customized experiences, such as highly personalized customer service agents that can help service representatives resolve issues faster. For AWS customers, getting data ready for generative AI isn’t just a technical challenge—it’s a strategic imperative. Proprietary, high-quality data is the key differentiator in transforming generic AI into powerful, business-specific applications. To prepare for this AI-driven future, we’re helping our customers build a robust, cloud-based data foundation, with built-in security and privacy. That’s the backbone of AI readiness.

With the next generation of Amazon SageMaker announced at re:Invent, we’re introducing an integrated experience to access, govern, and act on all your data by bringing together widely adopted AWS data, analytics, and AI capabilities. Collaborate and build faster from a unified studio using familiar AWS tools for model development, generative AI, data processing, and SQL analytics—with Amazon Q Developer assisting you along the way. Access all your data whether it’s stored in data lakes, data warehouses, third-party or federated data sources. And move with confidence and trust, thanks to built-in governance to address enterprise security needs.

At re:Invent, we also launched key Amazon Bedrock capabilities that help our customers maximize the value of their data. Amazon Bedrock Knowledge Bases now offers the only managed, out-of-the-box Retrieval Augmented Generation (RAG) solution, which enables our customers to natively query their structured data where it resides, accelerating development. Support for GraphRAG generates more relevant responses by modeling and storing relationships between data. And Amazon Bedrock Data Automation transforms unstructured, multimodal data into structured data for generative AI—automatically extracting, transforming, and generating usable data from multimodal content, at scale. These capabilities and more help our customers leverage their data to create powerful, insightful generative AI applications.

Q: What did you take away from your customer conversations at re:Invent?

Swami Sivasubramanian: I continue to be amazed and inspired by our customers and the important work they’re doing. We continue to offer our customers the choice and specialization they need to power their unique use cases. With Amazon Bedrock Marketplace, customers now have access to more than 100 popular, emerging, and specialized models.

At re:Invent, I heard a lot about the new efficiency and transformative experiences customers are creating. I also heard about innovations that are changing people’s lives. Like Exact Sciences, a molecular diagnostic company, which developed an AI-powered solution using Amazon Bedrock to accelerate genetic testing and analysis by 50%. Behind that metric there’s a real human value—enabling earlier cancer detection and personalized treatment planning. And that’s just one story among thousands, as our customers reach higher and build faster, achieving impressive results that change industries and improve lives.

I get excited when I think about how we can help educate the next wave of innovators building these experiences. With the launch of the new Education Equity Initiative, Amazon is committing up to $100 million in cloud technology and technical resources to help existing, dedicated learning organizations reach more learners by creating new and innovative digital learning solutions. That’s truly inspiring to me.

In fact, the pace of change, the remarkable innovations we introduced at re:Invent, and the enthusiasm of our customers all reminded me of the early days of AWS, when anything seemed possible. And now, it still is.


About the author

Swami Sivasubramanian is VP, AWS AI & Data. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, visualize, and predict.

Moving to GraphRAG 1.0 – Streamlining ergonomics for developers and users

Introducing GraphRAG 1.0

Microsoft debuted the pre-release version of GraphRAG in July 2024 to advance AI use in complex domains. Since that time, we’ve seen incredible adoption and community engagement (over 20k stars and 2k forks on GitHub as of this writing), with numerous fixes and improvements by the core team and community contributors. We’re deeply grateful for the contributions and feedback we’ve received and are excited to share a number of major ergonomic and structural improvements that culminate in the official release of GraphRAG 1.0.

Ergonomic refactors

Easier setup for new projects

When we first launched GraphRAG, most config was done using environment variables, which could be daunting given the many options available. We’ve reduced the friction of setup by adding an init command that generates a simplified starter settings.yml file with all core required config already set. We recommend developers start here to ensure they get the clearest initial config. With this update, a quick setup no longer requires GraphRAG expertise, only an OpenAI API key in the user’s environment.

New and expanded command line interface

We expanded the functionality and ease of use of the command line interface (CLI) and adopted Typer to provide better inline documentation and a richer CLI experience. The original CLI was intended as a starter demo for users to try GraphRAG on a sample dataset. We’ve since learned from the community that most people actually want to use this as their primary interaction mode for GraphRAG, so as part of this milestone release, we’ve incorporated enhancements that result in a more streamlined experience. From this work, CLI startup times dropped from an average of 148 seconds to 2 seconds.

Consolidated API layer

In August 2024 we introduced a standalone API layer to simplify developer usage. The original CLI contained all the code required to instantiate and execute basic indexing and query commands, which users often needed to replicate. The API layer is still considered provisional as we gather feedback, but is intended to be the primary entry point for developers who wish to integrate GraphRAG functionality into their own applications without deep pipeline or query class customization. In fact, the CLI and Accelerator are built entirely on top of the API layer, acting as a documented example of how to interact with the API. We have also added examples of how to use this API to our notebook collection that we will continue to update as we iterate in future releases.

Simplified data model

GraphRAG creates several output artifacts to store the indexed knowledge model. The initial model contained a large number of files, fields, and cross-references based on experimental ideas during the early research, which can be overwhelming for both new and routine users. We performed a comprehensive review of the data model and incorporated fixes to add clarity and consistency, remove redundant or unused fields, improve storage space, and simplify the data models. Previously, the output lacked standardization, and relevant outputs could easily be confused with non-critical intermediary output files. Now with GraphRAG 1.0, the output will only include relevant outputs that are easily readable and traceable. 

Streamlined vector stores

Embeddings and their vector stores are some of the primary drivers of  GraphRAG’s storage needs. Our original data model stored all embeddings within the parquet output files after data ingestion and indexing. This made the files portable, which was convenient for early research, but for many users it became unnecessary as they configured their own vector stores and the scale of data ingestion grew. We have updated the GraphRAG pipeline to create a default vector store during indexing, so no post-processing is needed, and the query library shares this configuration for seamless use. The benefit of this change is that those vectors (which can be quite large) no longer need to be loaded when the output files are read from disk, saving read time and memory during every query. Coupled with the simplified data model, this resulted in output parquet disk savings of 80%, and total disk space (including embeddings in the vector store) reduction of 43%. GraphRAG supports LanceDB and Azure AI Search out-of-the-box for vector stores. For simple startup, LanceDB is used as the default, and is written to a local database alongside the knowledge model artifacts. 

Flatter, clearer code structure

A key initiative on the road to version 1.0 has been to simplify the codebase so it is easier to maintain and more approachable for third-party users. We’ve removed much of the code depth from the organization to make it easier to browse, and co-located more code that our own usage patterns indicate was not required to be in separate functional areas. 

We have also found that very few users need the declarative configuration that the underlying DataShaper engine provides, so we collapsed these 88 verbose workflow definitions into a smaller set of 11 workflows that operate in a functional versus composed manner. This makes the pipeline easier to understand, is a step toward an architecture better suited to our future research plans, and improves performance across the board. By collapsing workflows, we now have fewer unused output artifacts, reduced data duplication, and fewer disk I/O operations. This streamlining has also reduced the in-memory footprint of the pipeline, enabling users to index and analyze larger datasets with GraphRAG.

Incremental ingest

Until now, an evolving dataset needed complete re-indexing every time new information was acquired in order to re-generate the knowledge model. In GraphRAG 1.0 we are including a new update command in the CLI that computes the deltas between an existing index and newly added content and intelligently merges the updates to minimize re-indexing. GraphRAG uses an LLM caching mechanism to save as much cost as possible when re-indexing, so re-runs over a dataset are often significantly faster and cheaper than an initial run. Adding brand new content can alter the community structure such that much of an index needs to be re-computed – the update command resolves this while also improving answer quality.

Availability

GraphRAG version 1.0 is now available on GitHub and published to PyPI. Check out the Getting Started guide to use GraphRAG 1.0 today.

Migrating

We recommend users migrate to GraphRAG 1.0, which offers a streamlined experience including multiple improvements for both users and developers. However, because of the breadth of its updates, version 1.0 is not backwards compatible. If you’ve used GraphRAG prior to version 1.0 and have existing indexes, there are a handful of breaking changes that need to be addressed, but this should be a straightforward process. To support the community in this migration, we’ve created a migration guide in the repository with more information.

Future directions

We recently posted about a brand-new approach to GraphRAG called LazyGraphRAG, which performs minimal up-front indexing to avoid LLM usage until user queries are executed. This avoids LLM-based summarization of large volumes of content that may not be interesting to users – and therefore never explored even after expensive processing. This approach shows strong performance at a fraction of the cost of GraphRAG, and will be added to the core GraphRAG codebase in the near future as a new option for users. 

Additionally, Microsoft has been active in exploring how GraphRAG can advance the rate of scientific progress, and is in the process of building relevant GraphRAG capabilities to align with our broader work in AI-enabled scientific discovery.

We continue to refine the codebase and investigate architectural changes that will enable users to use their own language model APIs, storage providers, and vector stores. We’re excited about this major milestone, and the foundation that this refactoring lays for our continued research in the GraphRAG space.

Economics Nobelist on causal inference

In a keynote address at the latest Amazon Machine Learning Conference, Amazon Visiting Academic, Stanford professor, and recent Nobel laureate Guido Imbens offered insights on the estimation of causal effects in “panel data” settings.

Multi-tenant RAG with Amazon Bedrock Knowledge Bases


Organizations are continuously seeking ways to use their proprietary knowledge and domain expertise to gain a competitive edge. With the advent of foundation models (FMs) and their remarkable natural language processing capabilities, a new opportunity has emerged to unlock the value of their data assets.

As organizations strive to deliver personalized experiences to customers using generative AI, it becomes paramount to specialize the behavior of FMs using their own—and their customers’—data. Retrieval Augmented Generation (RAG) has emerged as a simple yet effective approach to achieve a desired level of specialization.

Amazon Bedrock Knowledge Bases is a fully managed capability that simplifies the management of the entire RAG workflow, empowering organizations to give FMs and agents contextual information from a company’s private data sources to deliver more relevant and accurate responses tailored to their specific needs.

For organizations developing multi-tenant products, such as independent software vendors (ISVs) creating software as a service (SaaS) offerings, the ability to personalize experiences for each of their customers (tenants in their SaaS application) is particularly significant. This personalization can be achieved by implementing a RAG approach that selectively uses tenant-specific data.

In this post, we discuss and provide examples of how to achieve personalization using Amazon Bedrock Knowledge Bases. We focus particularly on addressing the multi-tenancy challenges that ISVs face, including data isolation, security, tenant management, and cost management. We focus on scenarios where the RAG architecture is integrated into the ISV application and not directly exposed to tenants. Although the specific implementations presented in this post use Amazon OpenSearch Service as a vector database to store tenants’ data, the challenges and architecture solutions proposed can be extended and tailored to other vector store implementations.

Multi-tenancy design considerations

When architecting a multi-tenanted RAG system, organizations need to take several considerations into account:

  • Tenant isolation – One crucial consideration in designing multi-tenanted systems is the level of isolation between the data and resources related to each tenant. These resources include data sources, ingestion pipelines, vector databases, and RAG client application. The level of isolation is typically governed by security, performance, and the scalability requirements of your solution, together with your regulatory requirements. For example, you may need to encrypt the data related to each of your tenants using a different encryption key. You may also need to make sure that high activity generated by one of the tenants doesn’t affect other tenants.
  • Tenant variability – A similar yet distinct consideration is the level of variability of the features provided to each tenant. In the context of RAG systems, tenants might have varying requirements for data ingestion frequency, document chunking strategy, or vector search configuration.
  • Tenant management simplicity – Multi-tenant solutions need a mechanism for onboarding and offboarding tenants. This dimension determines the degree of complexity for this process, which might involve provisioning or tearing down tenant-specific infrastructure, such as data sources, ingestion pipelines, vector databases, and RAG client applications. This process could also involve adding or deleting tenant-specific data in its data sources.
  • Cost-efficiency – The operating costs of a multi-tenant solution depend on the way it provides the isolation mechanism for tenants, so designing a cost-efficient architecture for the solution is crucial.

These four considerations need to be carefully balanced and weighted to suit the needs of the specific solution. In this post, we present a model to simplify the decision-making process. Using the core isolation concepts of silo, pool, and bridge defined in the SaaS Tenant Isolation Strategies whitepaper, we propose three patterns for implementing a multi-tenant RAG solution using Amazon Bedrock Knowledge Bases, Amazon Simple Storage Service (Amazon S3), and OpenSearch Service.

A typical RAG solution using Amazon Bedrock Knowledge Bases is composed of several components, as shown in the following figure:

A Typical RAG Solution Architecture

The main challenge in adapting this architecture for multi-tenancy is determining how to provide isolation between tenants for each of the components. We propose three prescriptive patterns that cater to different use cases and offer varying levels of isolation, variability, management simplicity, and cost-efficiency. The following figure illustrates the trade-offs between these three architectural patterns in terms of achieving tenant isolation, variability, cost-efficiency, and ease of tenant management.

Trade offs of the three RAG architectural patterns

Multi-tenancy patterns

In this section, we describe the implementation of these three different multi-tenancy patterns in a RAG architecture based on Amazon Bedrock Knowledge Bases, discussing their use cases as well as their pros and cons.

Silo

The silo pattern, illustrated in the following figure, offers the highest level of tenant isolation, because the entire stack is deployed and managed independently for each single tenant.

Solution architecture for the Silo pattern

In the context of the RAG architecture implemented by Amazon Bedrock Knowledge Bases, this pattern prescribes the following:

  • A separate data source per tenant – In this post, we consider the scenario in which tenant documents to be vectorized are stored in Amazon S3, therefore a separate S3 bucket is provisioned per tenant. This allows for per-tenant AWS Key Management Service (AWS KMS) encryption keys, as well as per-tenant S3 lifecycle policies to manage object expiration, and object versioning policies to maintain multiple versions of objects. Having separate buckets per tenant provides isolation and allows for customized configurations based on tenant requirements.
  • A separate knowledge base per tenant – This allows for a separate chunking strategy per tenant, and it’s particularly useful if you envision the document basis of your tenants to be different in nature. For example, one of your tenants might have a document base composed of flat text documents, which can be treated with fixed-size chunking, whereas another tenant might have a document base with explicit sections, for which semantic chunking would be better suited. Having a different knowledge base per tenant also lets you decide on different embedding models, giving you the possibility to choose different vector dimensions, balancing accuracy, cost, and latency. You can choose a different KMS key per tenant for the transient data stores, which Amazon Bedrock uses for end-to-end per-tenant encryption. You can also choose per-tenant data deletion policies to control whether your vectors are deleted from the vector database when a knowledge base is deleted. Separate knowledge bases also mean that you can have different ingestion schedules per tenant, allowing you to agree to different data freshness standards with your customers.
  • A separate OpenSearch Serverless collection per tenant – Having a separate OpenSearch Serverless collection per tenant allows you to have a separate KMS encryption key per tenant, maintaining per-tenant end-to-end encryption. For each tenant-specific collection, you can create a separate vector index, therefore choosing for each tenant the distance metric between Euclidean and dot product, so that you can choose how much importance to give to the document length. You can also choose the specific settings for the HNSW algorithm per tenant to control memory consumption, cost, and indexing time. Each vector index, in conjunction with the setup of metadata mappings in your knowledge base, can have a different metadata set per tenant, which can be used to perform filtered searches. Metadata filtering can be used in the silo pattern to restrict the search to a subset of documents with a specific characteristic. For example, one of your tenants might be uploading dated documents and wants to filter documents pertaining to a specific year, whereas another tenant might be uploading documents coming from different company divisions and wants to filter over the documentation of a specific company division.

Because the silo pattern offers tenant architectural independence, onboarding and offboarding a tenant means creating and destroying the RAG stack for that tenant, composed of the S3 bucket, knowledge base, and OpenSearch Serverless collection. You would typically do this using infrastructure as code (IaC). Depending on your application architecture, you may also need to update the log sinks and monitoring systems for each tenant.
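
To make the onboarding flow concrete, the following is a minimal boto3 sketch of provisioning per-tenant resources. It is illustrative only: the bucket and collection names are hypothetical, it assumes the OpenSearch Serverless encryption, network, and data access policies for the collection already exist, and it omits the knowledge base creation itself (which additionally requires an IAM role, an embedding model, and index mappings). A production setup would typically express all of this as IaC rather than ad hoc API calls.

import boto3

# Hypothetical tenant identifier supplied by your tenant management system.
tenant_id = "tenant-1234"

s3 = boto3.client("s3")
aoss = boto3.client("opensearchserverless")

# Per-tenant S3 bucket for source documents (bucket names must be globally
# unique; outside us-east-1 a CreateBucketConfiguration is also required).
s3.create_bucket(Bucket=f"example-rag-{tenant_id}-docs")

# Per-tenant OpenSearch Serverless collection for the tenant's vector index.
# Assumes matching encryption, network, and data access policies exist.
aoss.create_collection(name=f"rag-{tenant_id}", type="VECTORSEARCH")

# Offboarding reverses these steps: delete the tenant's knowledge base
# (for example, with the bedrock-agent delete_knowledge_base API), then
# the collection, and finally the emptied bucket.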

Although the silo pattern offers the highest level of tenant isolation, it is also the most expensive to implement, mainly due to creating a separate OpenSearch Serverless collection per tenant for the following reasons:

  • Minimum capacity charges – Each OpenSearch Serverless collection encrypted with a separate KMS key has a minimum of 2 OpenSearch Compute Units (OCUs) charged hourly. These OCUs are charged independently from usage, meaning that you will incur charges for dormant tenants if you choose to have a separate KMS encryption key per tenant.
  • Scalability overhead – Each collection separately scales OCUs depending on usage, in steps of 6 GB of memory, and associated vCPUs and fast access storage. This means that resources might not be fully and optimally utilized across tenants.

When choosing the silo pattern, note that a maximum of 100 knowledge bases are supported in each AWS account. This makes the silo pattern favorable for your largest tenants with specific isolation requirements. Having a separate knowledge base per tenant also reduces the impact of quotas on concurrent ingestion jobs (maximum one concurrent job per KB, five per account), job size (100 GB per job), and data sources (maximum of 5 million documents per data source). It also improves the performance fairness as perceived by your tenants.
Deleting a knowledge base when offboarding a tenant might be time-consuming, depending on the size of the data sources and the synchronization process. To mitigate this, you can set the data deletion policy in your tenants’ knowledge bases to RETAIN. This way, the knowledge base deletion process will not delete your tenants’ data from the OpenSearch Service index. You can delete the index by deleting the OpenSearch Serverless collection.

Pool

In contrast with the silo pattern, in the pool pattern, illustrated in the following figure, the whole end-to-end RAG architecture is shared by your tenants, making it particularly suitable to accommodate many small tenants.

Solution architecture for the pool pattern

The pool pattern prescribes the following:

  • Single data source – The tenants’ data is stored within the same S3 bucket. This implies that the pool model supports a shared KMS key for encryption at rest, not offering the possibility of per-tenant encryption keys. To identify tenant ownership downstream for each document uploaded to Amazon S3, a corresponding JSON metadata file has to be generated and uploaded. The metadata file generation process can be asynchronous, or even batched for multiple files, because Amazon Bedrock Knowledge Bases requires an explicit triggering of the ingestion job. The metadata file must use the same name as its associated source document file, with .metadata.json appended to the end of the file name, and must be stored in the same folder or location as the source file in the S3 bucket. The following code is an example of the format:
{
  "metadataAttributes" : {
    "tenantId" : "tenant_1",
  ...
  }
}

In the preceding JSON structure, the key tenantId has been deliberately chosen, and can be changed to a key you want to use to express tenancy. The tenancy field will be used at runtime to filter documents belonging to a specific tenant, therefore the filtering key used at runtime must match the metadata key in the JSON used to index the documents. Additionally, you can include other metadata keys to perform further filtering that isn’t based on tenancy. If you don’t upload the object’s .metadata.json file, the client application won’t be able to find the document using metadata filtering.
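
As an illustration of this convention, here is a minimal sketch (with a hypothetical bucket name and object key) that uploads a tenant document together with its sidecar metadata file, so the next ingestion job tags the resulting chunks with tenantId:

import json
import boto3

s3 = boto3.client("s3")

bucket = "example-shared-rag-documents"        # hypothetical shared bucket
tenant_id = "tenant_1"
document_key = "contracts/2024/agreement.pdf"  # hypothetical object key

# Upload the source document.
s3.upload_file("agreement.pdf", bucket, document_key)

# Upload the sidecar metadata file with the same key plus .metadata.json,
# so ingestion attaches tenantId to every chunk produced from this document.
metadata = {"metadataAttributes": {"tenantId": tenant_id}}
s3.put_object(
    Bucket=bucket,
    Key=f"{document_key}.metadata.json",
    Body=json.dumps(metadata).encode("utf-8"),
    ContentType="application/json",
)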

  • Single knowledge base – A single knowledge base is created to handle the data ingestion for your tenants. This means that your tenants will share the same chunking strategy and embedding model, and share the same encryption at-rest KMS key. Moreover, because ingestion jobs are triggered per data source per KB, you will be restricted to offer to your tenants the same data freshness standards.
  • Single OpenSearch Serverless collection and index – Your tenant data is pooled in a single OpenSearch Service vector index, therefore your tenants share the same KMS encryption key for vector data, and the same HNSW parameters for indexing and query. Because tenant data isn’t physically segregated, it’s crucial that the query client be able to filter results for a single tenant. This can be efficiently achieved using either the Amazon Bedrock Knowledge Bases Retrieve or RetrieveAndGenerate API, expressing the tenant filtering condition as part of the retrievalConfiguration (for more details, see Amazon Bedrock Knowledge Bases now supports metadata filtering to improve retrieval accuracy). If you want to restrict the vector search to return results for tenant_1, the following is an example client implementation performing RetrieveAndGenerate based on the AWS SDK for Python (Boto3):

import boto3

bedrock_agent_runtime = boto3.client(
    service_name = "bedrock-agent-runtime"
)

tenant_filter = {
    "equals": {
        "key": "tenantId",
        "value": "tenant_1"
    }
}

retrievalConfiguration = {
    "vectorSearchConfiguration": {
        "filter": tenant_filter
    }
}

bedrock_agent_runtime.retrieve_and_generate(
    input = {
        'text': 'The original user query'
    },
    retrieveAndGenerateConfiguration = {
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': <YOUR_KNOWLEDGEBASE_ID>,
            'modelArn': <FM_ARN>,
            'retrievalConfiguration': retrievalConfiguration
        }
    }
)

In the preceding code, text contains the original user query that needs to be answered, <YOUR_KNOWLEDGEBASE_ID> needs to be substituted with the identifier of the knowledge base used to pool your tenants, and <FM_ARN> needs to be substituted with the Amazon Resource Name (ARN) of the Amazon Bedrock model you want to use to reply to the user query. The client presented here has been streamlined to highlight the tenant filtering functionality. In a production case, we recommend implementing session and error handling, logging and retry logic, and separating the tenant filtering logic from the client invocation so that it isn’t accessible to application developers.
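
One way to apply that recommendation is to wrap the call in a small helper that owns the tenant filter, so application code only ever passes a tenant identifier. The following is an illustrative sketch under those assumptions; the function and parameter names are hypothetical:

import boto3
from botocore.config import Config

# Retries are delegated to botocore; the values here are illustrative.
_bedrock_agent_runtime = boto3.client(
    "bedrock-agent-runtime",
    config=Config(retries={"max_attempts": 5, "mode": "adaptive"}),
)

def answer_for_tenant(tenant_id: str, query: str,
                      knowledge_base_id: str, model_arn: str) -> str:
    """Run RetrieveAndGenerate scoped to a single tenant.

    The tenant filter is built here, so application code never constructs
    (or forgets) the filtering condition itself.
    """
    response = _bedrock_agent_runtime.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "filter": {
                            "equals": {"key": "tenantId", "value": tenant_id}
                        }
                    }
                },
            },
        },
    )
    return response["output"]["text"]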

Because the end-to-end architecture is pooled in this pattern, onboarding and offboarding a tenant doesn’t require you to create new physical or logical constructs; it’s as simple as starting or stopping the upload of that tenant’s documents to Amazon S3. This also implies that there is no AWS managed API that can be used to offboard a specific tenant and forget its data end to end. To delete the historical documents belonging to a specific tenant, you can just delete the relevant objects in Amazon S3. Typically, customers will have an external application that maintains the list of available tenants and their status, facilitating the onboarding and offboarding process.

Sharing the monitoring system and logging capabilities in this pattern reduces the complexity of operations with a large number of tenants. However, it requires you to collect the tenant-specific metrics from the client side to perform specific tenant attribution.

The pool pattern optimizes the end-to-end cost of your RAG architecture, because sharing OCUs across tenants maximizes the use of each OCU and minimizes the tenants’ idle time. Sharing the same pool of OCUs across tenants means that this pattern doesn’t offer performance isolation at the vector store level, so the largest and most active tenants might impact the experience of other tenants.

When choosing the pool pattern for your RAG architecture, you should be aware that a single ingestion job can ingest or delete a maximum of 100 GB. Additionally, the data source can have a maximum of 5 million documents. If the solution has many tenants that are geographically distributed, consider triggering the ingestion job multiple times a day so you don’t hit the ingestion job size limit. Also, depending on the number and size of your documents to be synchronized, the time for ingestion will be determined by the embedding model invocation rate. For example, consider the following scenario:

  • Number of tenants to be synchronized = 10
  • Average number of documents per tenant = 100
  • Average size per document = 2 MB, containing roughly 200,000 tokens divided in 220 chunks of 1,000 tokens to allow for overlap
  • Using Amazon Titan Embeddings v2 on demand, allowing for 2,000 RPM and 300,000 TPM

This would result in the following:

  • Total embeddings requests = 10*100*220 = 220,000
  • Total tokens to process = 10*100*1,000=1,000,000
  • Total time taken to embed is dominated by the RPM, therefore 220,000/2,000 = 1 hour, 50 minutes

This means you could trigger an ingestion job 12 times per day to have a good time distribution of data to be ingested. This calculation is a best-case scenario and doesn’t account for the latency introduced by the FM when creating the vector from the chunk. If you expect to have to synchronize a large number of tenants at the same time, consider using provisioned throughput to decrease the time it takes to create vector embeddings. This approach will also help distribute the load on the embedding models, limiting throttling of the Amazon Bedrock runtime API calls.
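
Because Amazon Bedrock Knowledge Bases ingestion must be triggered explicitly, a scheduled job (for example, a Lambda function invoked by Amazon EventBridge Scheduler) is a common way to implement the periodic synchronization described above. The following is a minimal sketch; the knowledge base and data source identifiers are placeholders:

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Placeholder identifiers of the pooled knowledge base and its S3 data source.
KNOWLEDGE_BASE_ID = "<YOUR_KNOWLEDGEBASE_ID>"
DATA_SOURCE_ID = "<YOUR_DATA_SOURCE_ID>"

def trigger_sync(event=None, context=None):
    """Start an ingestion job; intended to run on a schedule
    (for example, every 2 hours)."""
    response = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        dataSourceId=DATA_SOURCE_ID,
        description="Scheduled sync of newly uploaded tenant documents",
    )
    return response["ingestionJob"]["ingestionJobId"]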

Bridge

The bridge pattern, illustrated in the following figure, strikes a balance between the silo and pool patterns, offering a middle ground that balances tenant data isolation and security.

Solution architecture for the bridge pattern

The bridge pattern delivers the following characteristics:

  • Separate data source per tenant in a common S3 bucket – Tenant data is stored in the same S3 bucket, but prefixed by a tenant identifier. Although having a different prefix per tenant doesn’t offer the possibility of using per-tenant encryption keys, it does create a logical separation that can be used to segregate data downstream in the knowledge bases.
  • Separate knowledge base per tenant – This pattern prescribes creating a separate knowledge base per tenant similar to the silo pattern. Therefore, the considerations in the silo pattern apply. Applications built using the bridge pattern usually share query clients across tenants, so they need to identify the specific tenant’s knowledge base to query. They can identify the knowledge base by storing the tenant-to-knowledge base mapping in an external database, which manages tenant-specific configurations. The following example shows how to store this tenant-specific information in an Amazon DynamoDB table:
    import boto3
    # Create a DynamoDB resource
    dynamodb = boto3.resource('dynamodb')
    
    table_name = 'tenantKbConfig'
    attribute_definitions = [
        {'AttributeName': 'tenantId', 'AttributeType': 'S'}
    ]
    
    key_schema = [
        {'AttributeName': 'tenantId', 'KeyType': 'HASH'}
    ]
    
    #Create the table holding KB tenant configurations
    tenant_kb_config_table = dynamodb.create_table(
        TableName=table_name,
        AttributeDefinitions=attribute_definitions,
        KeySchema=key_schema,
        BillingMode='PAY_PER_REQUEST' # Use on-demand billing mode for illustration
    )
    
    # Create a tenant entry
    tenant_kb_config_table.put_item(
        Item={
            'tenantId': 'tenant_1',
            'knowledgebaseId': <YOUR_KNOWLEDGEBASE_ID>,
            'modelArn': <FM_ARN>
        }
    )

    In a production setting, your application will store tenant-specific parameters belonging to other functionality in your data stores. Depending on your application architecture, you might choose to store knowledgebaseId and modelARN alongside the other tenant-specific parameters, or create a separate data store (for example, the tenantKbConfig table) specifically for your RAG architecture.

    This mapping can then be used by the client application by invoking the RetrieveAndGenerate API. The following is an example implementation:

    import json
    import boto3
    
    # Create a DynamoDB resource
    dynamodb = boto3.resource('dynamodb')
    
    # Create a Bedrock Agent Runtime client
    bedrock_runtime = boto3.client('bedrock-agent-runtime')
    
    # Define the table name
    table_name = 'tenantKbConfig'
    
    # Define function returning tenant config
    def get_tenant_config(tenant_id):
        table = dynamodb.Table(table_name)
        response = table.get_item(
            Key = {
                'tenantId': tenant_id
            }
        )
        if 'Item' in response:
            return {
                'knowledgebaseId': response['Item'].get('knowledgebaseId'),
                'modelArn': response['Item'].get('modelArn')
            }
        else:
            return None
    
    # Retrieve the tenant configurations from DynamoDB
    
    tenant_config = get_tenant_config('tenant_1')
    
    #Invoke the Retrieve and Generate API
    bedrock_runtime.retrieve_and_generate(
        input = {
            'text': 'What type of info do your documents contain?'
        },
        retrieveAndGenerateConfiguration = {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': tenant_config['knowledgebaseId'],
                'modelArn': tenant_config['modelArn']
            }
        }
    )

  • Separate OpenSearch Service index per tenant – You store data within the same OpenSearch Serverless collection, but you create a vector index per tenant. This implies your tenants share the same KMS encryption key and the same pool of OCUs, optimizing the OpenSearch Service resources usage for indexing and querying. The separation in vector indexes gives you the flexibility of choosing different HNSW parameters per tenant, letting you tailor the performance of your k-NN indexing and querying for your different tenants.

The bridge pattern supports up to 100 tenants, and onboarding and offboarding a tenant requires the creation and deletion of a knowledge base and OpenSearch Service vector index. To delete the data pertaining to a particular tenant, you can delete the created resources and use the tenant-specific prefix as a logical parameter in your Amazon S3 API calls. Unlike the silo pattern, the bridge pattern doesn’t allow for per-tenant end-to-end encryption; it offers the same level of tenant customization offered by the silo pattern while optimizing costs.
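
As an illustration of the offboarding flow just described (deleting the tenant’s knowledge base and the objects under its S3 prefix), here is a minimal boto3 sketch with hypothetical names; deleting the tenant’s vector index through the OpenSearch API is omitted:

import boto3

s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent")

def offboard_tenant(tenant_id: str, bucket: str, knowledge_base_id: str) -> None:
    """Remove a tenant's knowledge base and its objects under the tenant prefix."""
    # Delete the tenant-specific knowledge base.
    bedrock_agent.delete_knowledge_base(knowledgeBaseId=knowledge_base_id)

    # Delete every object stored under the tenant's prefix in the shared bucket.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=f"{tenant_id}/"):
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:
            s3.delete_objects(Bucket=bucket, Delete={"Objects": objects})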

Summary of differences

The following figure and table provide a consolidated view for comparing the characteristics of the different multi-tenant RAG architecture patterns. This comprehensive overview highlights the key attributes and trade-offs associated with the pool, bridge, and silo patterns, enabling informed decision-making based on specific requirements.

The following figure illustrates the mapping of design characteristics to components of the RAG architecture.

The following table summarizes the characteristics of the multi-tenant RAG architecture patterns.

Characteristic | Attribute of | Pool | Bridge | Silo
Per-tenant chunking strategy | Amazon Bedrock Knowledge Base Data Source | No | Yes | Yes
Customer managed key for encryption of transient data and at rest | Amazon Bedrock Knowledge Base Data Source | No | No | Yes
Per-tenant distance measure | Amazon OpenSearch Service Index | No | Yes | Yes
Per-tenant ANN index configuration | Amazon OpenSearch Service Index | No | Yes | Yes
Per-tenant data deletion policies | Amazon Bedrock Knowledge Base Data Source | No | Yes | Yes
Per-tenant vector size | Amazon Bedrock Knowledge Base Data Source | No | Yes | Yes
Tenant performance isolation | Vector database | No | No | Yes
Tenant onboarding and offboarding complexity | Overall solution | Simplest, requires management of new tenants in existing infrastructure | Medium, requires minimal management of end-to-end infrastructure | Hardest, requires management of end-to-end infrastructure
Query client implementation | Original Data Source | Medium, requires dynamic filtering | Hardest, requires external tenant mapping table | Simplest, same as single-tenant implementation
Amazon S3 tenant management complexity | Amazon S3 buckets and objects | Hardest, need to maintain tenant-specific metadata files for each object | Medium, each tenant needs a different S3 path | Simplest, each tenant requires a different S3 bucket
Cost | Vector database | Lowest | Medium | Highest
Per-tenant FM used to create vector embeddings | Amazon Bedrock Knowledge Base | No | Yes | Yes

Conclusion

This post explored three distinct patterns for implementing a multi-tenant RAG architecture using Amazon Bedrock Knowledge Bases and OpenSearch Service. The silo, pool, and bridge patterns offer varying levels of tenant isolation, variability, management simplicity, and cost-efficiency, catering to different use cases and requirements. By understanding the trade-offs and considerations associated with each pattern, organizations can make informed decisions and choose the approach that best aligns with their needs.

Get started with Amazon Bedrock Knowledge Bases today.


About the Authors

Emanuele Levi is a Solutions Architect in the Enterprise Software and SaaS team, based in London. Emanuele helps UK customers on their journey to refactor monolithic applications into modern microservices SaaS architectures. Emanuele is mainly interested in event-driven patterns and designs, especially when applied to analytics and AI, where he has expertise in the fraud-detection industry.

Mehran Nikoo is a Generative AI Go-To-Market Specialist at AWS. He leads the generative AI go-to-market strategy for UK and Ireland.

Dani Mitchell is a Generative AI Specialist Solutions Architect at AWS. He is focused on computer vision use cases and helps AWS customers in EMEA accelerate their machine learning and generative AI journeys with Amazon SageMaker and Amazon Bedrock.
