Taking AI to Warp Speed: Decoding How NVIDIA’s Latest RTX-Powered Tools and Apps Help Developers Accelerate AI on PCs and Workstations

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC users.

NVIDIA is spotlighting the latest NVIDIA RTX-powered tools and apps at SIGGRAPH, an annual trade show at the intersection of graphics and AI.

These AI technologies provide advanced ray-tracing and rendering techniques, enabling highly realistic graphics and immersive experiences in gaming, virtual reality, animation and cinematic special effects. RTX AI PCs and workstations are helping drive the future of interactive digital media, content creation, productivity and development.

ACE’s AI Magic

During a SIGGRAPH fireside chat, NVIDIA founder and CEO Jensen Huang introduced “James” — an interactive digital human built on NVIDIA NIM microservices — that showcases the potential of AI-driven customer interactions.

Using NVIDIA ACE technology and based on a customer-service workflow, James is a virtual assistant that can connect with people using emotions, humor and contextually accurate responses. Soon, users will be able to interact with James in real time at ai.nvidia.com.

James is a virtual assistant in NVIDIA ACE.

NVIDIA also introduced the latest advancements in the NVIDIA Maxine AI platform for telepresence, as well as companies adopting NVIDIA ACE, a suite of technologies for bringing digital humans to life with generative AI. These technologies enable digital human development with AI models for speech and translation, vision, intelligence, realistic animation and behavior, and lifelike appearance.

Maxine features two AI technologies that enhance the digital human experience in telepresence scenarios: Maxine 3D and Audio2Face-2D.

Developers can harness Maxine and ACE technologies to drive more engaging and natural interactions for people using digital interfaces across customer service, gaming and other interactive experiences.

Tapping advanced AI, NVIDIA ACE technologies allow developers to design avatars that can respond to users in real time with lifelike animations, speech and emotions. RTX GPUs provide the necessary computational power and graphical fidelity to render ACE avatars with stunning detail and fluidity.

With ongoing advancements and increasing adoption, ACE is setting new benchmarks for building virtual worlds and sparking innovation across industries. Developers tapping into the power of ACE with RTX GPUs can build more immersive applications and advanced, AI-based, interactive digital media experiences.

RTX Updates Unleash AI-rtistry for Creators

NVIDIA GeForce RTX PCs and NVIDIA RTX workstations are getting an upgrade with GPU accelerations that provide users with enhanced AI content-creation experiences.

For video editors, RTX Video HDR is now available through Wondershare Filmora and DaVinci Resolve. With this technology, users can transform any content into high dynamic range video with richer colors and greater detail in light and dark scenes — making it ideal for gaming videos, travel vlogs or event filmmaking. Combining RTX Video HDR with RTX Video Super Resolution further improves visual quality by removing encoding artifacts and enhancing details.

RTX Video HDR requires an RTX GPU connected to an HDR10-compatible monitor or TV. Users with an RTX GPU-powered PC can send files to the Filmora desktop app and continue to edit with local RTX acceleration, doubling the speed of the export process with dual encoders on GeForce RTX 4070 Ti or above GPUs. Popular media player VLC in June added support for RTX Video Super Resolution and RTX Video HDR, adding AI-enhanced video playback.

Read this blog on RTX-powered video editing and the RTX Video FAQ for more information. Learn more about Wondershare Filmora’s AI-powered features.

In addition, 3D artists are gaining more AI applications and tools that simplify and enhance workflows, including offerings from Replikant, Adobe, Topaz and Getty Images.

Replikant, an AI-assisted 3D animation platform, is integrating NVIDIA Audio2Face, an ACE technology, to enable improved lip sync and facial animation. By taking advantage of NVIDIA-accelerated generative models, users can enjoy real-time visuals enhanced by RTX and NVIDIA DLSS technology. Replikant is now available on Steam.

Adobe Substance 3D Modeler has added Search Asset Library by Shape, an AI-powered feature designed to streamline the replacement and enhancement of complex shapes using existing 3D models. This new capability significantly accelerates prototyping and enhances design workflows.

New AI features in Adobe Substance 3D integrate advanced generative AI capabilities, enhancing its texturing and material-creation tools. Adobe has launched the first integration of its Firefly generative AI capabilities into Substance 3D Sampler and Stager, making 3D workflows more seamless and productive for industrial designers, game developers and visual effects professionals.

For tasks like text-to-texture generation and prompt descriptions, Substance 3D users can generate photorealistic or stylized textures. These textures can then be applied directly to 3D models. The new Text to Texture and Generative Background features significantly accelerate traditionally time-consuming and intricate 3D texturing and staging tasks.

Powered by NVIDIA RTX Tensor Cores, Substance 3D significantly accelerates computations and allows for more intuitive and creative design processes. This development builds on Adobe’s innovation with Firefly-powered Creative Cloud upgrades in Substance 3D workflows.

Topaz AI has added NVIDIA TensorRT acceleration for multi-GPU workflows, enabling parallelization across multiple GPUs for supercharged rendering speeds — up to 2x faster with two GPUs over a single GPU system, and scaling further with additional GPUs.

Getty Images has updated its Generative AI by iStock service with new features to enhance image generation and quality. Powered by NVIDIA Edify models, the latest enhancement generates four images in around six seconds, doubling the performance of the previous model, with speeds at the forefront of the industry. The improved Text-2-Image and Image-2-Image functionalities provide higher-quality results and greater adherence to user prompts.

Generative AI by iStock users can now also designate camera settings such as focal length (narrow, standard or wide) and depth of field (near or far). Improvements to generative AI super-resolution enhance image quality by using AI to create new pixels, significantly improving resolution without over-sharpening the image.

LLM-azing AI

ChatRTX — a tech demo that connects a large language model (LLM), like Meta’s Llama, to a user’s data for quickly querying notes, documents or images — is getting a user interface (UI) makeover, offering a cleaner, more polished experience.

ChatRTX also serves as an open-source reference project that shows developers how to build powerful, local, retrieval-augmented generation (RAG) applications accelerated by RTX.

ChatRTX is getting a user interface (UI) makeover.

The latest version of ChatRTX, released today, uses the Electron + Material UI framework, which lets developers more easily add their own UI elements or extend the technology’s functionality. The update also includes a new architecture that simplifies the integration of different UIs and streamlines the building of new chat and RAG applications on top of the ChatRTX backend application programming interface.

End users can download the latest version of ChatRTX from the ChatRTX web page. Developers can find the source code for the new release on the ChatRTX GitHub repository.

Meta Llama 3.1-8B models are now optimized for inference on NVIDIA GeForce RTX PCs and NVIDIA RTX workstations. These models are natively supported with NVIDIA TensorRT-LLM, open-source software that accelerates LLM inference performance.
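To give a concrete sense of what this enables, here is a minimal sketch of running one of these models locally with the TensorRT-LLM Python LLM API. The model identifier, sampling values and prompt are illustrative assumptions rather than details from the announcement, and the API surface should be checked against the TensorRT-LLM documentation for your installed version.

# Hedged sketch: local Llama 3.1-8B inference with the TensorRT-LLM LLM API.
# The checkpoint name and sampling settings below are assumptions for illustration.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # builds or loads a TensorRT engine
params = SamplingParams(temperature=0.7, max_tokens=128)

for output in llm.generate(["Explain what NVIDIA TensorRT-LLM does."], params):
    print(output.outputs[0].text)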

Dell’s AI Chatbots: Harnessing RTX Rocket Fuel

Dell is presenting how enterprises can boost AI development with an optimized RAG chatbot using NVIDIA AI Workbench and an NVIDIA NIM microservice for Llama 3. Using the NVIDIA AI Workbench Hybrid RAG Project, Dell is demonstrating how the chatbot can be used to converse with enterprise data that’s embedded in a local vector database, with inference running in one of three ways:

  • Locally on a Hugging Face TGI server
  • In the cloud using NVIDIA inference endpoints
  • On self-hosted NVIDIA NIM microservices

Learn more about the AI Workbench Hybrid RAG Project. SIGGRAPH attendees can experience this technology firsthand at Dell Technologies’ booth 301.

HP AI Studio: Innovate Faster With CUDA-X and Galileo

At SIGGRAPH, HP is presenting the Z by HP AI Studio, a centralized data science platform. Announced in October 2023, AI Studio has now been enhanced with the latest NVIDIA CUDA-X libraries as well as HP’s recent partnership with Galileo, a generative AI trust-layer company. Key benefits include:

  • Deploy projects faster: Configure, connect and share local and remote projects quickly.
  • Collaborate with ease: Access and share data, templates and experiments effortlessly.
  • Work your way: Choose where to work on your data, easily switching between online and offline modes.

Designed to enhance productivity and streamline AI development, AI Studio allows data science teams to focus on innovation. Visit HP’s booth 501 to see how AI Studio with RAPIDS cuDF can boost data preprocessing to accelerate AI pipelines. Apply for early access to AI Studio.

An RTX Speed Surge for Stable Diffusion

Stable Diffusion 3.0, the latest model from Stability AI, has been optimized with TensorRT to provide a 60% speedup.

A NIM microservice for Stable Diffusion 3 with optimized performance is available for preview on ai.nvidia.com.

There’s still time to join NVIDIA at SIGGRAPH to see how RTX AI is transforming the future of content creation and visual media experiences. The conference runs through Aug. 1.

Generative AI is transforming graphics and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Configure Amazon Q Business with AWS IAM Identity Center trusted identity propagation

Amazon Q Business is a fully managed, permission aware generative artificial intelligence (AI)-powered assistant built with enterprise grade security and privacy features. Amazon Q Business can be configured to answer questions, provide summaries, generate content, and securely complete tasks based on your enterprise data. The native data source connectors provided by Amazon Q Business can seamlessly integrate and index content from multiple repositories into a unified index. Amazon Q Business uses AWS IAM Identity Center to record the workforce users you assign access to and their attributes, such as group associations. IAM Identity Center is used by many AWS managed applications such as Amazon Q. You connect your existing source of identities to Identity Center once and can then assign users to any of these AWS services. Because Identity Center serves as their common reference of your users and groups, these AWS applications can give your users a consistent experience as they navigate AWS. For example, it enables user subscription management across Amazon Q offerings and consolidates Amazon Q billing from across multiple AWS accounts. Additionally, Q Business conversation APIs employ a layer of privacy protection by leveraging trusted identity propagation enabled by IAM Identity Center.

Amazon Q Business comes with rich API support to perform administrative tasks or to build an AI-assistant with customized user experience for your enterprise. With administrative APIs you can automate creating Q Business applications, set up data source connectors, build custom document enrichment, and configure guardrails. With conversation APIs, you can chat and manage conversations with Q Business AI assistant. Trusted identity propagation provides authorization based on user context, which enhances the privacy controls of Amazon Q Business.

In this blog post, you will learn what trusted identity propagation is and why to use it, how to automate configuration of a trusted token issuer in AWS IAM Identity Center with the provided AWS CloudFormation templates, and which APIs to invoke from your application to facilitate calling Amazon Q Business identity-aware conversation APIs.

Why use trusted identity propagation?

Trusted identity propagation provides a mechanism that enables applications that authenticate outside of AWS to make requests on behalf of their users with the use of a trusted token issuer. Consider a client-server application that uses an external identity provider (IdP) to authenticate a user to provide access to an AWS resource that’s private to the user. For example, your web application might use Okta as an external IdP to authenticate a user to view their private conversations from Q Business. In this scenario, Q Business is unable to use the identity token generated by the third party provider to provide direct access to the user’s private data since there is no mechanism to trust the identity token issued by the third party.

To solve this, you can use IAM Identity Center to get the user identity from your external IdP into an AWS Identity and Access Management (IAM) role session, which allows you to authorize requests based on the human, their attributes, and their group memberships, rather than setting up fine-grained permissions in an IAM policy. You can exchange the token issued by the external IdP for a token generated by Identity Center, which refers to the corresponding Identity Center user. The web application can now use the new token to initiate a request to Q Business for the private chat conversation. Because that token refers to the corresponding user in Identity Center, Q Business can authorize the requested access to the private conversation based on the user or their group membership as represented in Identity Center.

Some of the benefits of using trusted identity propagation are:

  • Prevents user impersonation and protects against unauthorized access to a user’s private data through identity spoofing.
  • Facilitates auditability and fosters responsible use of resources, because Q Business automatically logs API invocations to AWS CloudTrail along with the user identifier.
  • Promotes software design principles rooted in user privacy.

Overview of trusted identity propagation deployment

The following figure is a model of a client-server architecture for trusted identity propagation.

To understand how your application can be integrated with IAM Identity Center for trusted identity propagation, consider the model client-server web application shown in the preceding figure. In this model architecture, the web browser represents the user interface to your application. This could be a web page rendered on a web browser, Slack, Microsoft Teams, or other applications. The application server might be a web server running on Amazon Elastic Container Service (Amazon ECS), or a Slack or Microsoft Teams gateway implemented with AWS Lambda. Identity Center itself might be deployed in a delegated admin account (the Identity Account in the preceding figure), or in the same AWS account where the application server is deployed along with Amazon Q Business (the Application Account in the preceding figure). Finally, you have an OAuth 2.0 OpenID Connect (OIDC) external IdP such as Okta, Ping One, Microsoft Entra ID, or Amazon Cognito for authenticating and authorizing users.

Deployment of trusted identity propagation involves five steps. As a best practice, we recommend that the security owner manages IAM Identity Center updates and the application owner manages application updates, providing clear separation of duties. The security owner is responsible for administering the Identity Center of an organization or account. The application owner is responsible for creating an application on AWS.

  1. The security owner adds the external OIDC IdP’s issuer URL to the IAM Identity Center instance’s trusted token issuer. It’s important that the issuer URL matches the iss claim attribute present in the JSON Web Token (JWT) identity token generated by the IdP after user authentication. This is configured once for a given issuer URL.
  2. The security owner creates a customer managed identity provider application in IAM Identity Center and explicitly configures the specific audience for which a given trusted token issuer is authorized to perform token exchange using Identity Center. Because there could be more than one application (or audience) for which the external IdP could be authenticating users, explicitly specifying an audience helps prevent unauthorized applications from using the token exchange process. It’s important that the audience ID matches the aud claim attribute present in the JWT identity token generated by the IdP after user authentication.
  3. The security owner edits the application policy for the customer managed identity provider application created in the previous step to add or update the IAM execution role used by the application server or AWS Lambda. This helps prevent any unapproved users or applications from invoking the CreateTokenWithIAM API in Identity Center to initiate the token exchange.
  4. The application owner creates and adds an IAM policy to the application execution role to allow the application to invoke the CreateTokenWithIAM API on Identity Center to perform a token exchange and to create temporary credentials using AWS Security Token Service (AWS STS).
  5. The application owner creates an IAM role with a policy allowing access to the Q Business Conversation API for use with STS to create a temporary credential to invoke Q Business APIs.

You can use AWS CloudFormation templates, discussed later in this blog, to automate the preceding deployment steps. See the IAM Identity Center documentation for detailed step-by-step instructions on setting up trusted identity propagation. You can also use the AWS Command Line Interface (AWS CLI) setup process in Making authenticated Amazon Q Business API calls using IAM Identity Center.
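For teams that script their setup instead of using CloudFormation, step 1 above might look roughly like the following boto3 sketch. Treat it as an assumption-laden illustration: the configuration keys and claim mappings shown here should be verified against the current sso-admin API reference and your IdP before use.

# Hedged sketch of step 1 (adding a trusted token issuer) using boto3's sso-admin client.
# The instance ARN, issuer URL and claim mappings are placeholder assumptions.
import boto3

sso_admin = boto3.client("sso-admin", region_name="us-east-1")

response = sso_admin.create_trusted_token_issuer(
    InstanceArn="arn:aws:sso:::instance/ssoins-EXAMPLE",
    Name="external-idp-trusted-issuer",
    TrustedTokenIssuerType="OIDC_JWT",
    TrustedTokenIssuerConfiguration={
        "OidcJwtConfiguration": {
            "IssuerUrl": "https://example.okta.com/oauth2/default",  # must match the iss claim
            "ClaimAttributePath": "email",                 # token claim used to look up the user
            "IdentityStoreAttributePath": "emails.value",  # matching Identity Center attribute
            "JwksRetrievalOption": "OPEN_ID_DISCOVERY",
        }
    },
)
print(response["TrustedTokenIssuerArn"])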

Important: Choosing to add a trusted token issuer is a security decision that requires careful consideration. Only choose trusted token issuers that you trust to perform the following tasks:

  • Authenticate the user who is specified in the token. Control the audience claim, a claim you configure as the user identifier.
  • Generate a token that IAM Identity Center can exchange for an Identity Center-created token. Control the Identity Center customer managed application policy to add only IAM users, roles, and execution roles that can perform the exchange.

Authorization flow

For a typical web application, the trusted identity propagation process will involve five steps as shown in the following flow diagram.

  1. Sign-in and obtain an authorization code from the IdP.
  2. Use the authorization code and client secret to retrieve the ID token from the IdP.
  3. Exchange the IdP generated JWT ID token with the IAM Identity Center token that includes the AWS STS context identity.
  4. Use the STS context identity to obtain temporary access credentials from AWS STS.
  5. Use temporary access credentials to access Q Business APIs.

An end-to-end implementation of the identity propagation is available for reference in <project_home>/webapp/main.py of AWS Samples – main.py.
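For readers who want the shape of the code before opening the sample, the following condensed sketch shows steps 3 through 5. It is loosely modeled on the referenced sample rather than copied from it; the application ARN, role ARN and application ID are placeholders you would take from your own deployment, and the PyJWT decode skips signature verification purely to read the identity context claim.

# Hedged sketch of steps 3-5: exchange the IdP token, create an identity-aware
# STS session, and call an Amazon Q Business conversation API.
import boto3
import jwt  # PyJWT, used here only to read claims without verifying the signature

IDC_APP_ARN = "arn:aws:sso::111122223333:application/ssoins-EXAMPLE/apl-EXAMPLE"  # IDCApiAppArn output
ASSUME_ROLE_ARN = "arn:aws:iam::111122223333:role/QBusinessSTSAssumeRole"         # QBusinessSTSAssumeRoleArn output
QB_APP_ID = "your-qbusiness-application-id"

def chat_with_identity(idp_id_token: str, question: str, region: str = "us-east-1"):
    # Step 3: exchange the external IdP's JWT ID token for an Identity Center token.
    oidc = boto3.client("sso-oidc", region_name=region)
    idc_token = oidc.create_token_with_iam(
        clientId=IDC_APP_ARN,
        grantType="urn:ietf:params:oauth:grant-type:jwt-bearer",
        assertion=idp_id_token,
    )["idToken"]

    # The Identity Center token carries the identity context that STS needs.
    identity_context = jwt.decode(idc_token, options={"verify_signature": False})["sts:identity_context"]

    # Step 4: create an identity-enhanced role session with AWS STS.
    creds = boto3.client("sts", region_name=region).assume_role(
        RoleArn=ASSUME_ROLE_ARN,
        RoleSessionName="qbusiness-chat",
        ProvidedContexts=[{
            "ProviderArn": "arn:aws:iam::aws:contextProvider/IdentityCenter",
            "ContextAssertion": identity_context,
        }],
    )["Credentials"]

    # Step 5: call Q Business with the temporary, identity-aware credentials.
    qbusiness = boto3.client(
        "qbusiness",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    return qbusiness.chat_sync(applicationId=QB_APP_ID, userMessage=question)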

Sample JWT tokens

In the preceding authorization flow, one of the key steps is step 3, where the JWT ID token from the OAuth IdP is exchanged with IAM Identity Center for an AWS identity-aware JWT token. Key attributes of the respective JWT tokens are explored in the next section. An understanding of the tokens will help with troubleshooting authorization flow errors.

OpenID Connect JWT ID token

A sample JWT ID token generated by an OIDC OAuth IdP is shown in the following code sample. OIDC’s ID tokens take the form of a JWT, which is a JSON payload that’s signed with the private key of the issuer and can be parsed and verified by the application. In contrast to access tokens, ID tokens are intended to be understood by the OAuth client and include a handful of defined property names that provide information to the application. Important properties include aud, email, iss, and jti, which are used by IAM Identity Center to validate the token issuer, match the user directory, and issue a new Identity Center token. The following code sample shows a JWT identity token issued by an OIDC external IdP (such as Okta).

{
    'amr': ['pwd'],
    'at_hash': '3fMsKeFGoem************',
    'aud': '0oae4epmqqa************',
    'auth_time': 1715792363,
    'email': 'john_doe@******.com',
    'exp': 1715795964,
    'iat': 1715792364,
    'idp': '00oe36vc7kj7************',
    'iss': 'https://*******.okta.com/oauth2/default',
    'jti': 'ID.7l6jFX3KO9M7***********************',
    'name': 'John Doe',
    'nonce': 'SampleNonce',
    'preferred_username': 'john_doe@******.com',
    'sub': '00ue36ou4gCv************',
    'ver': 1
}

IAM Identity Center JWT token with identity context

A sample JWT token generated by CreateTokenWithIAM is shown in the following code sample. This token includes a property called sts:identity_context which allows you to create an identity-enhanced IAM role session using an AWS STS AssumeRole API. The enhanced STS session allows the receiving AWS service to authorize the IAM Identity Center user to perform an action and log the user identity to CloudTrail for auditing.

{
    'act':{
        'sub': 'arn:aws:sso::*********:trustedTokenIssuer/ssoins-*********/74******-7***-7***-d***-fd9*********'
    },
    'aud': 'BTHY************-c9Ed3V************',
    'auth_time': '2024-05-15T16:59:27Z',
    'aws:application_arn': 'arn:aws:sso::************:application/ssoins-************/apl-************',
    'aws:credential_id': 'AAAAAGZE9_8Y******_Zj******',
    'aws:identity_store_arn': 'arn:aws:identitystore::************:identitystore/d-**********',
    'aws:identity_store_id': 'd-**********',
    'aws:instance_account': '************',
    'aws:instance_arn': 'arn:aws:sso:::instance/ssoins-************',
    'exp': 1715795967,
    'iat': 1715792367,
    'iss': 'https://identitycenter.amazonaws.com/ssoins-************',
    'sts:audit_context': 'AQoJb3Jp*********************************Bg==',
    'sts:identity_context': 'AQoJb3Jp********************************************gY=',
    'sub': '34******-d***-7***-b***-e2*********'
}
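When the exchange fails, inspecting the claims of either token is usually the fastest way to spot a mismatch. The small helper below is our own sketch, assuming the PyJWT package; it deliberately skips signature verification because it is only used to read claims for troubleshooting.

# Hedged troubleshooting helper: print the claims that most often cause exchange failures.
import jwt

def inspect_token(token: str) -> None:
    claims = jwt.decode(token, options={"verify_signature": False})
    print("iss:", claims.get("iss"))  # must match the trusted token issuer URL
    print("aud:", claims.get("aud"))  # must match the authorized audience configured in Identity Center
    print("exp:", claims.get("exp"))  # expired tokens are rejected during the exchange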

Automate configuration of a trusted token issuer using AWS CloudFormation

A broad range of possibilities exists to integrate your application with Amazon Q Business using IAM Identity Center and your enterprise IdP. For all integration projects, Identity Center needs to be configured to use a trusted token issuer. The sample CloudFormation templates discussed in this post focus on helping you automate the core trusted token issuer setup. If you’re new to Amazon Q Business and don’t have all the inputs required to deploy the CloudFormation template, the prerequisites section includes links to resources that can help you get started. You can also follow a tutorial on Configuring sample web application with Okta included in the accompanying AWS Samples repository.

Note: The full source code of the solution using AWS CloudFormation templates and sample web application is available in AWS Samples Repository.

Prerequisites and considerations

Template for configuring AWS IAM Identity Center by a security owner

A security owner can use this CloudFormation template to automate configuration of the trusted token issuer in your IAM Identity Center. Deploy this stack in the AWS account where your Identity Center instance is located. This could be in the same AWS account where your application is deployed as a standalone or account instance, or can be in a delegated admin account managed as part of AWS Organizations.

  1. To launch the stack, choose:
    Launch Stack

You can download the latest version of the CloudFormation template from AWS Samples – TTI CFN.

The following figure shows the stack input for the template.

  2. The stack creation requires four parameters:
  • AuthorizedAudiences: The authorized audience is an auto generated UUID by a third-party IdP service or a pseudo-ID configured by the administrator of the third-party IdP to uniquely identify the client (your application) for which the ID token is generated. The value must match the aud attribute value included in the JWT ID token generated by the third-party identity provider.
  • ClientAppExecutionArn: The Amazon Resource Name (ARN) of the IAM user, group or execution role that’s used to run your application, which will invoke Identity Center for token exchange and AWS STS service for generating temporary credentials. For example, this could be the execution role ARN of the Lambda function where your code is run.
  • IDCInstanceArn: The instance ARN of the IAM Identity Center instance used by your application.
  • TokenIssuerUrl: The URL of the trusted token issuer. The trusted token issuer is a third-party identity provider that will authenticate a user and generate an ID token for authorization purposes. The token URL must match the iss attribute value included in the JWT ID token generated by the third-party identity provider.

The following figure shows the output of the CloudFormation stack to configure a trusted token issuer with IAM Identity Center.

The stack creation produces the following output:

  • IDCApiAppArn: The ARN for the IAM Identity Center custom application auth provider. You will use this application to call the Identity Center CreateTokenWithIAM API to exchange the third-party JWT ID token with the Identity Center token.

Validate the configuration

  1. From the AWS Management Console where your IAM Identity Center instance is located, go to the AWS IAM Identity Center console to verify if the trusted token issuer is configured properly.
  2. From the left navigation pane, choose Applications and choose the Customer Managed tab to see a list of applications, as shown in the following figure. The newly created customer managed IdP application will have the same name as the CloudFormation stack. Choose the application name to open the application configuration page.
  3. On your application configuration page, as shown in the following figure, verify the following:
    1. User and group assignments are set to Do not require assignments.
    2. Trusted applications for identity propagation lists Amazon Q and includes the application scope qbusiness:conversations:access.
    3. Authentication with the trusted token issuer is set to configured.
  4. Next, to verify the trusted token issuer configuration, choose Actions on the top right of the page and select Edit configurations from the drop-down menu.
  5. At the bottom of the page, expand Authentication with trusted token issuer and verify the following:
    1. Your issuer URL is selected by default.
    2. The audience ID (Aud claim) is configured properly for the issuer URL, as shown in the following figure.
  6. Next, expand Application credentials to verify that your application execution IAM role is listed.

Depending on your IAM Identity Center instance type, you might not be able to access the console customer managed applications page. In such cases, you can use the AWS CLI or SDK to view the configuration. Here is a list of useful AWS CLI commands: list-applications, list-application-access-scopes, get-application-assignment-configuration, describe-trusted-token-issuer, and list-application-grants.

Template for configuring your application by the application owner

To propagate user identities, your application will need to:

  • Invoke the IAM Identity Center instance to exchange a third-party JWT ID token and obtain an Identity Center ID token
  • Invoke AWS STS to generate a temporary credential with an IAM assumed role.

The application owner can use a CloudFormation template to generate the required IAM policy, which can be attached to your application execution role and the assumed role with the required Q Business chat API privileges for use with AWS STS to generate temporary credentials.

Remember to attach the generated add-on policy to your application’s IAM execution role to allow the application to invoke Identity Center and AWS STS APIs.

  1. To launch the stack, choose:
    Launch Stack

You can download the latest version of the CloudFormation template from AWS Samples – App Roles CFN.

The following figure shows the CloudFormation stack configuration to install IAM roles and policies required for the application to propagate identities.

  2. The stack creation takes four parameters, as shown in the preceding figure:
  • ClientAppExecutionArn: The ARN of an IAM user, group, or execution role that is used to run your application and will invoke IAM Identity Center for token exchange and AWS STS for generating temporary credentials. For example, this could be the execution role ARN of Lambda where your code is run.
  • IDCApiAppArn: ARN for the IAM Identity Center custom application auth provider. This will be created as part of the trusted token issuer configuration.
  • KMSKeyId: [Optional] The AWS Key Management Server (AWS KMS) ID, if the Q Business Application is encrypted with a customer managed encryption key.
  • QBApplicationID: Q Business application ID, which your application will use to invoke chat APIs. The STS assume role will be restricted to this application ID.

The following figure shows the output of the CloudFormation stack to install IAM roles and policies required for the application to propagate identities.

The stack creation produces the following outputs:

  • ClientAppExecutionAddOnPolicyArn: This is a customer managed IAM policy created with the required permissions for your application to invoke the IAM Identity Center CreateTokenWithIAM API and call the STS AssumeRole API to generate temporary credentials to call Q Business chat APIs. You can include this policy in your application IAM execution role to allow access for the APIs.
  • QBusinessSTSAssumeRoleArn: This IAM role will include the necessary permissions to call Q Business chat APIs, for use with the STS AssumeRole API call.

Validate the configuration

  1. From the AWS account where your application is deployed, open the AWS IAM console and verify that the IAM role for STS AssumeRole and the customer managed IAM policy for the application execution role were created.
    • To verify the IAM role for STS AssumeRole, obtain the role name from the QBusinessSTSAssumeRoleArn stack output value, choose the Roles link on the left panel of the IAM console, and use the search bar to enter the role name, as shown in the following figure.
  2. Choose the link to the role to open the role and expand the inline policy to review the permissions, as shown in the following figure.
  3. To verify that the IAM policy for the add-on to the application execution role is created, obtain the IAM policy name from the ClientAppExecutionAddOnPolicyArn stack output value, go to Policies in the IAM console, and search for the policy, as shown in the following figure.
  4. Choose the link to the policy name to open the policy and review the permissions, as shown in the following figure.

Update the application for invoking the Q Business API with identity propagation

Most web applications using OAuth 2.0 with an IdP will have implemented a sign-in mechanism and invoke the IdP’s ID endpoint to retrieve a JWT ID token. However, before invoking Amazon Q Business APIs that require identity propagation, your application needs to be updated to include calls to CreateTokenWithIAM and AssumeRole to facilitate trusted identity propagation.

The CreateTokenWithIAM API enables exchanging the JWT ID token received from the OIDC IdP for an IAM Identity Center-generated JWT token. The newly generated token is then passed to the AssumeRole API to create identity-aware temporary security credentials that you can use to access AWS resources.

Note: Remember to add permissions to your IAM role and user policy to allow invoking these APIs. Alternatively, you can attach the sample policy referenced by ClientAppExecutionAddOnPolicyArn that was created by the CloudFormation template for configuring your application.

Sample access helper methods using get_oidc_id_token, get_idc_sts_id_context, and get_sts_credential are available in <project_home>/src/qbapi_tools/access_helpers.py (AWS Samples – access_helpers.py). An end-to-end sample implementation of the complete authentication sequence is provided in <project_home>/webapp/main.py (AWS Samples – main.py).

Restrictions and limitations

The following are some common limitations and restrictions that you may encounter while configuring trusted identity propagation, along with recommendations on how to mitigate them.

Group membership propagation

Enterprises typically manage group membership in their external IdP. However, when using trusted token propagation, the web identity token generated by the external IdP is exchanged with an ID token generated by IAM Identity Center. Thus, when invoking the Q Business API from an STS session enhanced with Identity Center identity context, only the group membership information available for the user in Identity Center is passed to the Q Business API, not the group membership from the external IdP. To mitigate this issue, it’s recommended that all relevant users and groups are synchronized to Identity Center from the external IdP using System for Cross-domain Identity Management (SCIM). For more information, see automatic provisioning (synchronization) of users and groups.

Caching credentials to prevent invalid grant types

You can use a web identity token only once with the CreateTokenWithIAM API. This is to prevent token replay attacks, where an attacker intercepts a JWT and reuses it multiple times, allowing them to bypass authentication and authorization controls. Because it isn’t practical to generate a new ID token for every Q Business API call, it’s recommended that the temporary credentials generated for a Q Business API session using AWS STS AssumeRole are cached and reused for subsequent API calls.
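A minimal caching pattern, sketched below under the assumption that your application keys sessions by a stable user identifier, is to hold the AssumeRole credentials in memory and refresh them only when they approach expiry. This is illustrative and not part of the sample repository.

# Hedged sketch: reuse identity-aware STS credentials until shortly before they expire,
# so each Q Business call does not require a fresh ID token and token exchange.
from datetime import datetime, timedelta, timezone

_credential_cache = {}  # keyed by a stable user identifier

def get_cached_credentials(user_id: str, fetch_credentials):
    """Return cached STS credentials for user_id, refreshing them when near expiry.

    fetch_credentials is a callable that runs the full token-exchange and AssumeRole
    flow and returns the STS Credentials dict (including the Expiration datetime).
    """
    creds = _credential_cache.get(user_id)
    if creds is None or creds["Expiration"] - timedelta(minutes=5) <= datetime.now(timezone.utc):
        creds = fetch_credentials()
        _credential_cache[user_id] = creds
    return creds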

Clean up

To avoid incurring additional charges, make sure you delete any resources created in this post.

  1. Follow the instructions in Deleting a stack on the AWS CloudFormation console to delete any CloudFormation stacks created using templates provided in this post.
  2. If you enabled an IAM Identity Center instance, follow the instructions to delete your IAM Identity Center instance.
  3. Ensure you unregister or delete any IdP services such as Okta, Entra ID, Ping Identity, or Amazon Cognito that you have created for this post.
  4. Finally, delete any sample code repositories you have cloned or downloaded, and any associated resources deployed as part of setting up the environment for running the samples in the code repository.

Conclusion

Trusted identity propagation is an important mechanism for securely integrating Amazon Q Business APIs into enterprise applications that use external IdPs. By implementing trusted identity propagation with AWS IAM Identity Center, organizations can confidently build AI-powered applications and tools using Amazon Q Business APIs, knowing that user identities are properly verified and protected throughout the process. This approach allows enterprises to harness the full potential of generative AI while maintaining the highest standards of security and privacy. To get started with Amazon Q Business, explore the Getting started guide. To learn more about how trusted token propagation works, see How to develop a user-facing data application with IAM Identity Center and S3 Access Grants.


About the Author

Rajesh Kumar Ravi is a Senior Solutions Architect at Amazon Web Services specializing in building generative AI solutions with Amazon Q Business, Amazon Bedrock, and Amazon Kendra. He is an accomplished technology leader with experience in building innovative AI products, nurturing the builder community, and contributing to the development of new ideas. He enjoys walking and loves to go on short hiking trips outside of work.

Enhance your media search experience using Amazon Q Business and Amazon Transcribe

In today’s digital landscape, the demand for audio and video content is skyrocketing. Organizations are increasingly using media to engage with their audiences in innovative ways. From product documentation in video format to podcasts replacing traditional blog posts, content creators are exploring diverse channels to reach a wider audience. The rise of virtual workplaces has also led to a surge in content captured through recorded meetings, calls, and voicemails. Additionally, contact centers generate a wealth of media content, including support calls, screen-share recordings, and post-call surveys.

We are excited to introduce Mediasearch Q Business, an open source solution powered by Amazon Q Business and Amazon Transcribe. Mediasearch Q Business builds on the Mediasearch solution powered by Amazon Kendra and enhances the search experience using Amazon Q Business. Mediasearch Q Business supercharges the way you consume media files by using them as part of the knowledge base used by Amazon Q Business to generate reliable answers to user questions. The solution also features an enhanced Amazon Q Business query application that allows users to play the relevant section of the original media files or YouTube videos directly from the search results page, providing a seamless and intuitive user experience.

Solution overview

Mediasearch Q Business is straightforward to install and try out.

The solution has two components, as illustrated in the following diagram:

  • A Mediasearch indexer that transcribes media files (audio and video) in an Amazon Simple Storage Service (Amazon S3) bucket or media from a YouTube playlist and ingests the transcriptions into either an Amazon Q Business native index (configured as part of the Amazon Q Business application) or an Amazon Kendra index
  • A Mediasearch finder, which provides a UI and makes calls to the Amazon Q Business service APIs on behalf of the logged-in user. The responses from the API calls are displayed to the end user.

The Mediasearch indexer finds and transcribes audio and video files stored in an S3 bucket. The indexer can also index YouTube videos from a YouTube playlist as audio files and transcribe these audio files. It prepares the transcriptions by embedding time markers at the start of each sentence, and it indexes each prepared transcription in an Amazon Q Business native retriever or an Amazon Kendra retriever. The indexer runs the first time when you install it, and subsequently runs on an interval that you specify, maintaining the index to reflect any new, modified, or deleted files.
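The exact indexer code lives in the repository, but the following simplified sketch conveys the idea of time-marker embedding, assuming the standard Amazon Transcribe JSON output format; the marker format shown is an assumption for illustration.

# Hedged sketch: prepend a start-time marker to each sentence of a Transcribe result
# so a client can later seek media playback to the cited sentence.
import json

def add_time_markers(transcribe_json: str) -> str:
    items = json.loads(transcribe_json)["results"]["items"]
    parts, sentence_start = [], True
    for item in items:
        word = item["alternatives"][0]["content"]
        if item["type"] == "pronunciation":
            if sentence_start:
                parts.append(f"[{float(item['start_time']):.2f}]")  # seconds from media start
                sentence_start = False
            parts.append(word)
        else:
            # Punctuation attaches to the previous word; end-of-sentence marks reset the flag.
            if parts:
                parts[-1] += word
            if word in ".?!":
                sentence_start = True
    return " ".join(parts)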

The Mediasearch finder is a web search client that you use to search for content in your Amazon Q Business application. Additionally, the Mediasearch finder includes in-line embedded media players in the search result, so you can see the relevant section of the transcript, and play the corresponding section from the original media (audio files and video files in your media bucket or a YouTube video) without navigating away from the search page.

In the sections that follow, we discuss the following topics:

  • How to deploy the solution to your AWS account
  • How to use it to index and search sample media files
  • How to use the solution with your own media files
  • How the solution works
  • The estimated costs involved
  • How to monitor usage and troubleshoot problems
  • Options to customize and tune the solution
  • How to uninstall and clean up when you’re done experimenting

Prerequisites

Make sure you have the following:

Deploy the Mediasearch Q Business solution

In this section, we walk through deploying the two solution components: the indexer and the finder. We use a CloudFormation stack to deploy the necessary resources in the us-east-1 AWS Region.

If you’re deploying the solution to another Region, follow the instructions in the README available in the Mediasearch Q Business GitHub repository.

Deploy the Mediasearch Q Business indexer component

To deploy the indexer component, complete the following steps:

  1. Choose Launch Stack.
  2. In the Identity center ARN and Retriever selection section, for IdentityCenterInstanceArn, enter the ARN for your IAM Identity Center instance.

You can find the ARN on the Settings page of the IAM Identity Center console. The ARN is a required field.

  3. Use default values for all other parameters. We will customize these values later to suit your specific requirements.
  4. Acknowledge that the stack might create IAM resources with custom names, then choose Create stack.

The indexer stack takes around 10 minutes to deploy. Wait for the indexer to finish deploying before you deploy the Mediasearch Q Business finder.

Deploy the Mediasearch Q Business finder component

The Mediasearch finder uses Amazon Cognito to authenticate users to the solution. For an authenticated user to interact with an Amazon Q Business application, you must configure an IAM Identity Center customer managed application that either supports SAML 2.0 or OAuth 2.0.

In this post, we create a customer managed application that supports OAuth 2.0, a secure way for applications to communicate and share user data without exposing passwords. We use a technique called trusted identity propagation, which allows the Mediasearch Q Business finder app to access the Amazon Q service securely without sharing passwords between the two identity providers (Amazon Cognito and IAM Identity Center in our example).

Instead of sharing passwords, trusted identity propagation uses tokens. Tokens are like digital certificates that prove who the user is and what they’re allowed to do. AWS managed applications that work with trusted identity propagation get tokens directly from IAM Identity Center. IAM Identity Center can also exchange identity tokens and access tokens from external authorization servers like Amazon Cognito. This lets an application authenticate users and obtain tokens outside of AWS (like with Amazon Cognito, Microsoft Entra ID, or Okta), exchange that token for an IAM Identity Center token, and then use the new token to request access to AWS services like Amazon Q Business.

For more information, see Using trusted identity propagation with customer managed applications.

When the IAM Identity Center instance is in the same account where you are deploying the Mediasearch Q Business solution, the finder stack allows you to automatically create the IAM Identity Center customer managed application as part of the stack deployment.

If you use the organization instance of IAM Identity Center enabled in your management account, then you will be deploying the Mediasearch Q Business finder stack in a different AWS account. In this case, follow the steps in the README to create an IAM Identity Center application manually.

To deploy the finder component and create the IAM Identity Center customer managed application, complete the following steps:

  1. Choose Launch Stack.
  2. For IdentityCenterInstanceArn, enter the ARN for the IAM Identity Center instance. This is the same value you used while deploying the indexer stack.
  3. For CreateIdentityCenterApplication, choose Yes to create the IAM Identity Center application for the Mediasearch finder application.
  4. Under Mediasearch Indexer parameters, enter the Amazon Q Business application ID that was created by the indexer stack. You can copy this from the QBusinessApplicationId output of the indexer stack.
  5. Select the retriever type that was used to deploy the Mediasearch indexer. (If you deployed an Amazon Kendra index, select Kendra; otherwise, select Native.)
  6. If you selected Kendra, enter the Amazon Kendra index ID that was used by the indexer stack.
  7. For MediaBucketNames, use the MediaBucketsUsed output from the indexer CloudFormation stack to allow the search page to access media files across YTMediaBucket and Mediabucket.
  8. Acknowledge that the stack might create IAM resources with custom names, then choose Create stack.

Configure user access to Amazon Q Business

To access the Mediasearch Q Business solution, add a user with an appropriate subscription to the Amazon Q Business application and to the IAM Identity Center customer managed application.

Add a user to the Amazon Q Business application

To start using the Amazon Q Business application, you can add users or groups to the Amazon Q Business application from your IAM Identity Center instance. Complete the following steps to add a user to the application:

  1. Access the Amazon Q Business application by choosing the link for QBusinessApplication in the indexer CloudFormation stack outputs.
  2. Under Groups and users, on the Users tab, choose Manage access and subscription.
  3. Choose Add groups and users.
  4. Choose Add existing users and groups.
  5. Search for an existing user, choose the user, and choose Assign.
  6. Select the added user and on the Change subscription menu, choose Update subscription tier.
  7. Select the appropriate subscription tier and choose Confirm.

For details of each Amazon Q subscription, refer to Amazon Q Business pricing.

Assign users to the IAM Identity Center customer managed application

Now you can assign users or groups to the IAM Identity Center customer managed application. Complete the following steps to add a user:

  1. From the outputs section of the finder CloudFormation stack, choose the URL for IdentityCenterApplicationConsoleURL to navigate to the customer managed application.
  2. Choose Assign users and groups.
  3. Select users and choose Assign users.

This concludes the user access configuration to the Mediasearch Q Business solution.

Test with the sample media files

When the Mediasearch indexer and finder stack are deployed, the indexer should have completed processing the audio (mp3) files for the YouTube videos and sample media files (selected AWS Podcast episodes and AWS Knowledge Center videos). You can now run your first Mediasearch query.

  1. To log in to the Mediasearch finder application, choose the URL for MediasearchFinderURL in the stack outputs.

The Mediasearch finder application in your browser will show a splash page for Amazon Q Business.

  2. Choose Get Started to access the Amazon Cognito page.

To access Mediasearch Q Business, you need to log in to the application using a user ID in the Amazon Cognito user pool created by the finder stack. The email address in Amazon Cognito must match the email address for the user in IAM Identity Center. Alternatively, the Mediasearch solution allows you to create a user through the application.

  3. On the Create Account tab, enter your email (which matches the email address in IAM Identity Center), followed by a password and password confirmation, and choose Create Account.

Amazon Cognito will send an email with a confirmation code for email verification.

  4. Enter this confirmation code to complete your email verification.
  5. After email verification, you will now be able to log in to the Mediasearch Q Business application.
  6. After you’re logged in, in the Enter a prompt box, write a query, such as “What is AWS Fargate?”

The query returns a response from Amazon Q Business based on the media (sample media files and YouTube audio sources) ingested into the index.


The response includes citations, with reference to sources. Users can verify their answer from Amazon Q Business by playing media files from their S3 buckets or YouTube starting at the time marker where the relevant information is found.

  1. Use the embedded video player to play the original video inline. Observe that the media playback starts at the relevant section of the video based on the time marker.
  2. To play the video full screen in a new browser tab, use the Full screen menu option in the player, or choose the media file hyperlink shown above the answer text.
  3. Choose (right-click) the video file hyperlink, copy the URL, and enter it into a text editor.

If the media is an audio file for a YouTube video, it looks something like the following:

https://www.youtube.com/watch?v=unFVfqj9cQ8&t=36.58s

If the media is a non-YouTube file that resides in the MediaBucket, the URL looks like the following:

https://mediasearchtest.s3.amazonaws.com/mediasamples/What_is_an_Interface_VPC_Endpoint_and_how_can_I_create_Interface_Endpoint_for_my_VPC_.mp4?AWSAccessKeyId=ASIAXMBGHMGZLSYWJHGD&Expires=1625526197&Signature=BYeOXOzT585ntoXLDoftkfS4dBU%3D&x-amz-security-token=.... #t=253.52

This is a presigned S3 URL that provides your browser with temporary read access to the media file referenced in the search result. Using presigned URLs means you don’t need to provide permanent public access to all of your indexed media files.
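As a rough sketch of how such a link could be produced, the snippet below generates a presigned GET URL with boto3 and appends a media-fragment offset so the player starts at the cited time marker; the bucket, key and expiry are placeholder assumptions.

# Hedged sketch: presigned S3 URL plus a #t=<seconds> fragment for time-aligned playback.
import boto3

s3 = boto3.client("s3")

def presigned_media_url(bucket: str, key: str, start_seconds: float, expires: int = 3600) -> str:
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )
    return f"{url}#t={start_seconds:.2f}"

# Example: presigned_media_url("mediasearchtest", "mediasamples/sample.mp4", 253.52)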

  1. Experiment with additional queries, such as “How has AWS helped customers in building MLOps platform?” or “How can I use Generative AI to improve customer experience?” or try your own questions.

Index and search your own media files

To index media files stored in your own S3 bucket, replace the MediaBucket and MediaFolderPrefix parameters with your own bucket name and prefix when you install or update the indexer component stack, and modify the MediaBucketName parameter with your own bucket name when you install or update the finder component stack. Additionally, you can replace the YouTube playlist (PlayListURL) with your own playlist URL and update the indexer stack.

  1. When creating a new MediaSearch indexer stack, you can choose to use either a native retriever or an Amazon Kendra retriever. You can make this selection using the parameter RetrieverType. When using the Amazon Kendra retriever, you can either let the indexer stack create an Amazon Kendra index or use an existing Amazon Kendra IndexId to add files stored in the new location. To deploy a new indexer, follow the steps from earlier in this post, but replace the defaults to specify the media bucket name and prefix for your own media files, or replace the YouTube playlist URL with your own playlist URL. Make sure that you comply with the YouTube Terms of Service.
  2. Alternatively, update an existing MediaSearch indexer stack to replace the previously indexed files with files from the new location or update the YouTube playlist URL or the number of videos to download from the playlist:
    1. Select the stack on the AWS CloudFormation console, choose Update, then Use current template, then Next.
    2. Modify the media bucket name and prefix parameter values as needed.
    3. Modify the YouTube Playlist URL and Number of YouTube Videos values as needed.
    4. Choose Next twice, select the acknowledgement check box, and choose Update stack.
  3. Update an existing MediaSearch finder stack to change bucket names or add additional bucket names to the MediaBucketNames parameter.

When the MediaSearch indexer stack is successfully created or updated, the indexer automatically finds, transcribes, and indexes the media files stored in your S3 bucket. When it’s complete, you can submit queries and find answers from the audio tracks of your own audio and video files.

You have the option to provide metadata for any or all of your media files. Use metadata to assign values to index attributes for sorting, filtering, and faceting your search results, or to specify access control lists to govern access to the files. Metadata files can be in the same S3 folder as your media files (default), or in a parallel folder structure specified by the optional indexer parameter MetadataFolderPrefix. For more information about how to create metadata files, see Amazon S3 document metadata.

You can also provide customized transcription options for any or all of your media files. This allows you to take full advantage of Amazon Transcribe features such as custom vocabularies, automatic content redaction, and custom language models.

How the Mediasearch solution works

Let’s take a quick look at how the solution works, as illustrated in the following diagram.

The Mediasearch solution has an event-driven serverless computing architecture with the following steps:

  1. You provide an S3 bucket containing the audio and video files you want to index and search. This is also known as the MediaBucket. Leave this blank if you don’t want to index media from your MediaBucket.
  2. You also provide your YouTube playlist URL and the number of videos to index from the YouTube playlist. Make sure that you comply with the YouTube Terms of Service. The YTIndexer will index the latest files from the YouTube playlist. For example, if the number of videos is set to 5, then the YTIndexer will index the five latest videos in the playlist. Any YouTube video that was indexed previously is skipped.
  3. An AWS Lambda function fetches the YouTube videos from the playlist as audio (mp3 files) into the YTMediaBucket and also creates a metadata file in the MetadataFolderPrefix location with metadata for the YouTube video. The YouTube videoid along with the related metadata are recorded in an Amazon DynamoDB table (YTMediaDDBQueueTable).
  4. Amazon EventBridge generates events on a repeating interval (every 2 hours, every 6 hours, and so on). These events invoke the Lambda function S3CrawlLambdaFunction.
  5. An AWS Lambda function is invoked initially when the CloudFormation stack is first deployed, and then subsequently by the scheduled events from EventBridge. The S3CrawlLambdaFunction function crawls through the MediaBucket and the YTMediabucket and starts an Amazon Q Business index (or Amazon Kendra) data source sync job. The Lambda function lists all the supported media files (FLAC, MP3, MP4, Ogg, WebM, AMR, or WAV) and associated metadata and transcribe options stored in the user provided S3 bucket.
  6. Each new file is added to another DynamoDB tracking table and submitted to be transcribed by an Amazon Transcribe job. Any file that has been previously transcribed is submitted for transcription again only if it has been modified since it was previously transcribed, or if associated Amazon Transcribe options have been updated. The DynamoDB table is updated to reflect the transcription status and last modified timestamp of each file. Any tracked files that no longer exist in the S3 bucket are removed from the DynamoDB table and from the Amazon Q Business index (or Amazon Kendra index). If no new or updated files are discovered, the Amazon Q Business index (or Amazon Kendra) data source sync job is immediately stopped. The DynamoDB table holds a record for each media file with attributes to track transcription job names and status, and last modified timestamps.
  7. As each Amazon Transcribe job completes, EventBridge generates a job complete event, which invokes another Lambda function (S3JobCompletionLambdaFunction).
  8. The Lambda function processes the transcription job output, generating a modified transcription that has a time marker inserted at the start of each sentence. This modified transcription is indexed in Amazon Q Business (or Amazon Kendra), and the job status for the file is updated in the DynamoDB table. When the last file has been transcribed and indexed, the Amazon Q Business (or Amazon Kendra) data source sync job is stopped.
  9. The index is populated and kept in sync with the transcriptions of all the media files in the S3 bucket monitored by the Mediasearch indexer component, integrated with any additional content from any other provisioned data sources. The media transcriptions are used by the Amazon Q Business application, which allows users to find content and answers to their questions.
  10. The sample finder client application enhances users’ search experience by embedding an inline media player with each source or citation that is based on a transcribed media file. The client uses the time markers embedded in the transcript to start media playback at the relevant section of the original media file.
  11. An Amazon Cognito user pool is used to authenticate users and is configured to exchange tokens from IAM Identity Center to support Amazon Q Business service calls.
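To make steps 6 through 8 more concrete, the following minimal sketch shows how a media file could be submitted to Amazon Transcribe and tracked in DynamoDB. It is an illustration only, not the solution's actual Lambda code; the bucket, key schema, and job names are hypothetical.

import boto3

transcribe = boto3.client("transcribe")
dynamodb = boto3.resource("dynamodb")
tracking_table = dynamodb.Table("MediaSearch-Indexer-MediaDynamoTable")  # name depends on your stack

media_uri = "s3://my-media-bucket/videos/townhall.mp4"   # hypothetical media file
job_name = "mediasearch-townhall-20240701"               # hypothetical job name

# Submit the media file to Amazon Transcribe.
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={"MediaFileUri": media_uri},
    MediaFormat="mp4",
    LanguageCode="en-US",
)

# Track the job so the completion Lambda can update the status and index the transcript later.
tracking_table.put_item(
    Item={"MediaFileUri": media_uri, "TranscribeJobName": job_name, "Status": "IN_PROGRESS"}
)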

Estimated costs

In addition to Amazon S3 costs associated with storing your media, the Mediasearch solution incurs usage costs from Amazon Q Business, Amazon Kendra (if using an Amazon Kendra index), Amazon Transcribe, and Amazon API Gateway. Additional minor costs are incurred by the other services mentioned after free tier allowances have been used. For more information, see the pricing pages for Amazon Q Business, Amazon Kendra, Amazon Transcribe, Lambda, DynamoDB, and EventBridge.

Monitor and troubleshoot

To see the details of each media file transcript job, navigate to the Transcription jobs page on the Amazon Transcribe console.

Each media file is transcribed only one time, unless the file is modified. Modified files are re-transcribed and re-indexed to reflect the changes.

Choose any transcription job to review the transcription and examine additional job details.

You can check the status of the data source sync by navigating to the Amazon Q Business application deployed by the indexer stack (choose the link on the indexer stack outputs page for QApplication). In the data source section, choose the custom data source and view the status of the sync job.

On the DynamoDB console, choose Tables in the navigation pane. Use your MediaSearch stack name as a filter to display the MediaSearch DynamoDB tables, and examine the items showing each indexed media file and corresponding status.

The table MediaSearch-Indexer-YTMediaDDBQueueTable has one record for each YouTube video ID that is downloaded as an audio (MP3) file, along with metadata for the video such as author, view count, and video title.

The table MediaSearch-Indexer-MediaDynamoTable has one record for each media file (including YouTube videos), and contains attributes with information about the file and its processing status.
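You can also inspect these items programmatically with a quick scan, for example (a minimal sketch; the full table name includes your stack name, and the attribute names are illustrative):

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MediaSearch-Indexer-MediaDynamoTable")  # adjust to your stack's table name

# Print each tracked media file and its current processing status.
for item in table.scan()["Items"]:
    print(item.get("MediaFileUri"), item.get("Status"))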

On the Functions page of the Lambda console, use your indexer stack name as a filter to list the Lambda functions that are part of the solution:

  • The YouTubeVideoIndexer function indexes and downloads YouTube videos if the CloudFormation stack parameter PlayListURL is set to a valid YouTube playlist
  • The S3CrawlLambdaFunction function crawls the YTMediaBucket and the MediaBucket for media files and initiates the transcription jobs for the media files

When the transcription job is complete, a completion event invokes the S3JobCompletionLambdaFunction function, which ingests the transcription into the Amazon Q Business index (or Amazon Kendra index) with any related metadata.

Choose any of the functions to examine the function details, including environment variables, source code, and more. Choose Monitor and View logs in CloudWatch to examine the output of each function invocation and troubleshoot any issues.

On the Functions page of the Lambda console, use your finder stack name as a filter to list the Lambda functions that are part of the solution:

  • The BuildTriggerLambda function runs the build of the finder AWS Amplify application after cloning the AWS CodeCommit repository with the finder ReactJS code.
  • The IDCTokenCreateLambda function uses the authorization header that contains a JWT token from a successful authentication with Amazon Cognito to exchange bearer tokens from IAM Identity Center.
  • The IDCAppCreateLambda function creates an OAuth 2.0 IAM Identity Center application to exchange tokens from IAM Identity Center and a trusted token issuer for the Amazon Cognito user pool.
  • The UserConversationLambda function is called from API Gateway to list or delete Amazon Q Business conversations.
  • The UserPromptsLambda function is called from API Gateway to call the chat_sync API of the Amazon Q Business service.
  • The PreSignedURLCreateLambda function is called from API Gateway to create a presigned URL for S3 buckets. The presigned URL is used to play the media files residing in the MediaBucket that serves as the source for an Amazon Q Business response.

Choose any of the functions to examine the function details, including environment variables, source code, and more. Choose Monitor and View logs in CloudWatch to examine the output of each function invocation and troubleshoot any issues.
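As an aside, the core of what the PreSignedURLCreateLambda function does can be sketched in a few lines (the bucket and key names here are hypothetical):

import boto3

s3 = boto3.client("s3")

# Generate a time-limited URL so the finder UI can stream the cited media file directly from Amazon S3.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "mediasearch-media-bucket", "Key": "videos/townhall.mp4"},
    ExpiresIn=3600,  # URL validity in seconds
)
print(url)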

Customize and enhance the solution

You can fork the MediaSearch Q Business GitHub repository, enhance the code, and send us pull requests so we can incorporate and share your improvements.

The following are a few suggestions for features you might want to implement:

  • Enhance the indexer stack to allow the existing Amazon Q Business application IDs to be used
  • Extend your search sources to include other video streaming platforms relevant to your organization
  • Build Amazon CloudWatch metrics and dashboards to improve the manageability of MediaSearch

Clean up

When you’re finished experimenting with this solution, clean up your resources by using the AWS CloudFormation console to delete the indexer and finder stacks that you deployed. This deletes all the resources that were created by deploying the solution.

Preexisting Amazon Q Business applications or indexes, IAM Identity Center applications, or trusted token issuers that were created manually aren’t deleted.

Conclusion

The combination of Amazon Q Business and Amazon Transcribe enables a scalable, cost-effective solution to surface insights from your media files. You can use the content of your media files to find accurate answers to your users’ questions, whether they’re from text documents or media files, and consume them in their native format. This solution enhances the overall experience of the previous Mediasearch solution by using the powerful generative artificial intelligence (AI) capabilities of Amazon Q Business.

The sample MediaSearch Q Business solution is provided as open source—use it as a starting point for your own solution, and help us make it better by contributing back fixes and features through GitHub pull requests. For expert assistance, AWS Professional Services and other Amazon partners are here to help.

We’d love to hear from you. Let us know what you think in the comments section, or use the issues forum in the MediaSearch Q Business GitHub repository.


About the Authors

Roshan Thomas is a Senior Solutions Architect at Amazon Web Services. He is based in Melbourne, Australia, and works closely with power and utilities customers to accelerate their journey in the cloud. He is passionate about technology and helping customers architect and build solutions on AWS.

Anup Dutta is a Solutions Architect with AWS based in Chennai, India. In his role at AWS, Anup works closely with startups to design and build cloud-centered solutions on AWS.

Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.

Abhinav Jawadekar is a Principal Solutions Architect in the Amazon Q Business service team at AWS. Abhinav works with AWS customers and partners to help them build generative AI solutions on AWS.

Read More

Monks boosts processing speed by four times for real-time diffusion AI image generation using Amazon SageMaker and AWS Inferentia2

Monks boosts processing speed by four times for real-time diffusion AI image generation using Amazon SageMaker and AWS Inferentia2

This post is co-written with Benjamin Moody from Monks.

Monks is the global, purely digital, unitary operating brand of S4Capital plc. With a legacy of innovation and specialized expertise, Monks combines an extraordinary range of global marketing and technology services to accelerate business possibilities and redefine how brands and businesses interact with the world. Its integration of systems and workflows delivers unfettered content production, scaled experiences, enterprise-grade technology and data science fueled by AI—managed by the industry’s best and most diverse digital talent—to help the world’s trailblazing companies outmaneuver and outpace their competition.

Monks leads the way in crafting cutting-edge brand experiences. We shape modern brands through innovative and forward-thinking solutions. As brand experience experts, we harness the synergy of strategy, creativity, and in-house production to deliver exceptional results. Tasked with using the latest advancements in AWS services and machine learning (ML) acceleration, our team embarked on an ambitious project to revolutionize real-time image generation. Specifically, we focused on using AWS Inferentia2 chips with Amazon SageMaker to enhance the performance and cost-efficiency of our image generation processes.

Initially, our setup faced significant challenges with scalability and cost management. The primary issues were maintaining consistent inference performance under varying loads while providing a responsive generative experience for end users. Traditional compute resources were not only costly but also failed to meet the low-latency requirements. This scenario prompted us to explore more advanced solutions from AWS that could offer high-performance computing and cost-effective scalability.

The adoption of AWS Inferentia2 chips and SageMaker asynchronous inference endpoints emerged as a promising solution. These technologies promised to address our core challenges by significantly enhancing processing speed (AWS Inferentia2 chips were four times faster in our initial benchmarks) and reducing costs through fully managed auto scaling inference endpoints.

In this post, we share how we used AWS Inferentia2 chips with SageMaker asynchronous inference to optimize the performance by four times and achieve a 60% reduction in cost per image for our real-time diffusion AI image generation.

Solution overview

The combination of SageMaker asynchronous inference with AWS Inferentia2 allowed us to efficiently handle requests that had large payloads and long processing times while maintaining low latency requirements. A prerequisite was to fine-tune the Stable Diffusion XL model with domain-specific images, which were stored in Amazon Simple Storage Service (Amazon S3). For this, we used Amazon SageMaker JumpStart. For more details, refer to Fine-Tune a Model.

The solution workflow consists of the following components:

  • Endpoint creation – We created an asynchronous inference endpoint using our existing SageMaker models, using AWS Inferentia2 chips for higher price/performance.
  • Request handling – Requests were queued by SageMaker upon invocation. Users submitted their image generation requests, where the input payload was placed in Amazon S3. SageMaker then queued the request for processing.
  • Processing and output – After processing, the results were stored back in Amazon S3 in a specified output bucket. During periods of inactivity, SageMaker automatically scaled the instance count to zero, significantly reducing costs because charges only occurred when the endpoint was actively processing requests.
  • Notifications – Completion notifications were set up through Amazon Simple Notification Service (Amazon SNS), notifying users of success or errors.

The following diagram illustrates our solution architecture and process workflow.

Solution architecture

In the following sections, we discuss the key components of the solution in more detail.

SageMaker asynchronous endpoints

SageMaker asynchronous endpoints queue incoming requests to process them asynchronously, which is ideal for large inference payloads (up to 1 GB) or inference requests with long processing times (up to 60 minutes) that need to be processed as requests arrive. The ability to serve long-running requests enabled Monks to effectively serve their use case. Auto scaling the instance count to zero allows you to design cost-optimal inference in response to spiky traffic, so you pay only when the instances are serving traffic. You can also scale the endpoint instance count to zero in the absence of outstanding requests and scale back up when new requests arrive.

To learn how to create a SageMaker asynchronous endpoint, attach auto scaling policies, and invoke an asynchronous endpoint, refer to Create an Asynchronous Inference Endpoint.
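As an illustration of the request flow described earlier, invoking an asynchronous endpoint looks roughly like the following (a minimal sketch; the endpoint name and S3 locations are hypothetical):

import boto3

sm_runtime = boto3.client("sagemaker-runtime")

# The request payload is staged in Amazon S3; SageMaker queues the request and
# returns immediately with the S3 location where the result will be written.
response = sm_runtime.invoke_endpoint_async(
    EndpointName="sdxl-inf2-async-endpoint",
    InputLocation="s3://my-input-bucket/requests/prompt-123.json",
    ContentType="application/json",
)
print(response["OutputLocation"])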

AWS Inferentia2 chips, which powered the SageMaker asynchronous endpoints, are AWS AI chips optimized to deliver high performance for deep learning inference applications at the lowest cost. Integrated within SageMaker asynchronous inference endpoints, AWS Inferentia2 chips support scale-out distributed inference with ultra-high-speed connectivity between chips. This setup was ideal for deploying our large-scale generative AI model across multiple accelerators efficiently and cost-effectively.

In the context of our high-profile nationwide campaign, the use of asynchronous computing was key in managing peak and unexpected spikes in concurrent requests to our inference infrastructure, which was expected to be in the hundreds of concurrent requests per second. Asynchronous inference endpoints, like those provided by SageMaker, offer dynamic scalability and efficient task management.

The solution offered the following benefits:

  • Efficient handling of longer processing times – SageMaker asynchronous inference endpoints are perfect for scenarios where each request might involve substantial computational work. These fully managed endpoints queue incoming inference requests and process them asynchronously. This method was particularly advantageous in our application, because it allowed the system to manage fluctuating demand efficiently. The ability to process requests asynchronously makes sure our infrastructure can handle large unexpected spikes in traffic without causing delays in response times.
  • Cost-effective resource utilization – One of the most significant advantages of using asynchronous inference endpoints is their impact on cost management. These endpoints can automatically scale the compute resources down to zero in periods of inactivity, without the risk of dropping or losing requests as resources scale back up.

Custom scaling policies using Amazon CloudWatch metrics

SageMaker endpoint auto scaling behavior is defined through the use of a scaling policy, which helps us scale to multiple users using the application concurrently. This policy defines how and when to scale resources up or down to provide optimal performance and cost-efficiency.

SageMaker synchronous inference endpoints are typically scaled using the InvocationsPerInstance metric, which helps determine event triggers based on real-time demands. However, for SageMaker asynchronous endpoints, this metric isn’t available due to their asynchronous nature.

We encountered challenges with alternative metrics such as ApproximateBacklogSizePerInstance because they didn’t meet our real-time requirements. The inherent delay in these metrics resulted in unacceptable latency in our scaling processes.

Consequently, we sought a custom metric that could more accurately reflect the real-time load on our SageMaker instances.

Amazon CloudWatch custom metrics provide a powerful tool for monitoring and managing your applications and services in the AWS Cloud.

We had previously established a range of custom metrics to monitor various aspects of our infrastructure, including a particularly crucial one for tracking cache misses during image generation. Due to the nature of asynchronous endpoints, which don’t provide the InvocationsPerInstance metric, this custom cache miss metric became essential. It enabled us to gauge the number of requests contributing to the size of the endpoint queue. With this insight into the number of requests, one of our senior developers began to explore additional metrics available through CloudWatch to calculate the asynchronous endpoint capacity and utilization rate. We used the following calculations:

  • InferenceCapacity = (CPU utilization * 60 / InferenceTimeInSeconds) * InstanceGPUCount
  • Number of inference requests = (served from cache + cache misses)
  • Usage rate = (number of inference requests) / (InferenceCapacity)

The calculations included the following variables:

  • CPU utilization – Represents the average CPU utilization percentage of the SageMaker instances (CPUUtilization CloudWatch metric). It provides a snapshot of how much CPU resources are currently being used by the instances.
  • InferenceCapacity – The total number of inference tasks that the system can process per minute, calculated based on the average CPU utilization and scaled by the number of GPUs available (inf2.48xlarge has 12 GPUs). This metric provides an estimate of the system’s throughput capability per minute.
    • Multiply by 60 / Divide by InferenceTimeInSeconds – This step effectively adjusts the CPUUtilization metric to reflect how it translates into jobs per minute, assuming each job takes 10 seconds. Therefore, (CPU utilization * 60) / 10 represents the theoretical maximum number of jobs that can be processed in one minute based on current or typical CPU utilization.
    • Multiply by 12 – Because the inf2.48xlarge instance has 12 GPUs, this multiplication provides a total capacity in terms of how many jobs all GPUs can handle collectively in 1 minute.
  • Number of inference requests (served from cache + cache misses) – We monitor the total number of inference requests processed, distinguishing between those served from cache and those requiring real-time processing due to cache misses. This helps us gauge the overall workload.
  • Usage rate (number of inference requests) / (InferenceCapacity) – This formula determines the rate of resource usage by comparing the number of operations that invoke new tasks (number of requests) to the total inference capacity (InferenceCapacity).

A higher InferenceCapacity value suggests that we have either scaled up our resources or that our instances are under-utilized. Conversely, a lower capacity value could indicate that we’re reaching our capacity limits and might need to scale out to maintain performance.

Our custom usage rate metric quantifies the usage rate of available SageMaker instance capacity. It’s a composite measure that factors in both the image generation tasks that were served from cache and those that resulted in a cache miss, relative to the total capacity metric. The usage rate is intended to provide insights into how much of the total provisioned SageMaker instance capacity is actively being used for image generation operations. It serves as a key indicator of operational efficiency and helps identify the workload’s operational demands.

We then used the usage rate metric as our auto scaling trigger metric. The use of this trigger in our auto scaling policy made sure SageMaker instances were neither over-provisioned nor under-provisioned. A high value for usage rate might indicate the need to scale up resources to maintain performance. A low value, on the other hand, could signal under-utilization, indicating a potential for cost optimization by scaling down resources.
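For completeness, the custom request metric referenced in these calculations can be published from application code roughly as follows (a minimal sketch; in our deployment the metric was emitted by the image generation API, and the surrounding code here is illustrative only):

import boto3

cloudwatch = boto3.client("cloudwatch")

# Emit one data point per image generation request so that SampleCount over the
# evaluation period reflects the number of requests.
cloudwatch.put_metric_data(
    Namespace="ImageGenAPI",
    MetricData=[
        {
            "MetricName": "NumberOfInferenceRequests",
            "Dimensions": [{"Name": "service", "Value": "ImageGenerator"}],
            "Value": 1,
            "Unit": "Count",
        }
    ],
)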

We applied our custom metrics as triggers for a scaling policy:

CustomizedMetricSpecification = {
    "Metrics": [
        {
            # m1: CPUUtilization data points reported by the SageMaker endpoint
            "Id": "m1",
            "MetricStat": {
                "Metric": {
                    "MetricName": "CPUUtilization",
                    "Namespace": "/aws/sagemaker/Endpoints",
                    "Dimensions": [
                        { "Name": "EndpointName", "Value": endpoint_name },
                        { "Name": "VariantName", "Value": "AllTraffic" },
                    ]
                },
                "Stat": "SampleCount"
            },
            "ReturnData": False
        },
        {
            # m2: custom request-count metric published by the image generation API
            "Id": "m2",
            "MetricStat": {
                "Metric": {
                    "MetricName": "NumberOfInferenceRequests",
                    "Namespace": "ImageGenAPI",
                    "Dimensions": [
                        { "Name": "service", "Value": "ImageGenerator" },
                        { "Name": "executionEnv", "Value": "AWS_Lambda_nodejs18.x" },
                        { "Name": "region", "Value": "us-west-2" },
                    ]
                },
                "Stat": "SampleCount"
            },
            "ReturnData": False
        },
        {
            # e1: usage rate = requests / capacity, where capacity assumes a 10-second
            # job on each of the 12 accelerators of an inf2.48xlarge (see calculations above)
            "Label": "utilization rate",
            "Id": "e1",
            "Expression": "IF(m1 != 0, m2 / (m1 * 60 / 10 * 12))",
            "ReturnData": True
        }
    ]
}

aas_client.put_scaling_policy(
    PolicyName=endpoint_name,
    PolicyType="TargetTrackingScaling",
    ServiceNamespace=service_namespace,
    ResourceId=resource_id,
    ScalableDimension=scalable_dimension,
    TargetTrackingScalingPolicyConfiguration={
        "CustomizedMetricSpecification": CustomizedMetricSpecification,
        "TargetValue": 0.75,        # keep the usage rate near 75%
        "ScaleOutCooldown": 60,     # seconds to wait before another scale-out
        "ScaleInCooldown": 120,     # seconds to wait before another scale-in
        "DisableScaleIn": False,
    }
)

Deployment on AWS Inferentia2 chips

The integration of AWS Inferentia2 chips into our SageMaker inference endpoints not only resulted in a four-times increase in inference performance for our finely-tuned Stable Diffusion XL model, but also significantly enhanced cost-efficiency. Specifically, SageMaker instances powered by these chips reduced our deployment costs by 60% compared to other comparable instances on AWS. This substantial reduction in cost, coupled with improved performance, underscores the value of using AWS Inferentia2 for intensive computational tasks such as real-time diffusion AI image generation.

Given the importance of swift response times for our specific use case, we established an acceptance criterion of single-digit-second latency.

SageMaker instances equipped with AWS Inferentia2 chips successfully optimized our infrastructure to deliver image generation in just 9.7 seconds. This enhancement not only met our performance requirements at a low cost, but also provided a seamless and engaging user experience owing to the high availability of Inferentia2 chips.

The effort to integrate with the Neuron SDK also proved highly beneficial. The optimized model not only met our performance criteria, but also enhanced the overall efficiency of our inference processes.
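For reference, compiling a PyTorch module with the Neuron SDK follows this general pattern (a minimal sketch with a toy module; the actual Stable Diffusion XL pipeline is compiled component by component and is considerably more involved):

import torch
import torch.nn as nn
import torch_neuronx

# A toy module stands in for the real model purely to show the compile-and-save flow.
class TinyNet(nn.Module):
    def forward(self, x):
        return torch.relu(x)

model = TinyNet().eval()
example_input = torch.rand(1, 3, 64, 64)

# Ahead-of-time compile for AWS Inferentia2 NeuronCores, then save the traced module
# so it can be packaged into a SageMaker model artifact.
neuron_model = torch_neuronx.trace(model, example_input)
torch.jit.save(neuron_model, "model_neuron.pt")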

Results and benefits

The implementation of SageMaker asynchronous inference endpoints significantly enhanced our architecture’s ability to handle varying traffic loads and optimize resource utilization, leading to marked improvements in performance and cost-efficiency:

  • Inference performance – The AWS Inferentia2 setup processed an average of 27,796 images per instance per hour, giving us a 2x improvement in throughput over comparable accelerated compute instances.
  • Inference savings – In addition to performance enhancements, the AWS Inferentia2 configurations achieved a 60% reduction in cost per image compared to the original estimation. The cost for processing each image with AWS Inferentia2 was $0.000425. Although the initial requirement to compile models for the AWS Inferentia2 chips introduced an additional time investment, the substantial throughput gains and significant cost reductions justified this effort. For demanding workloads that necessitate high throughput without compromising budget constraints, AWS Inferentia2 instances are certainly worthy of consideration.
  • Smoothing out traffic spikes – We effectively smoothed out spikes in traffic to provide a continual real-time experience for end users. As shown in the following figure, the SageMaker asynchronous endpoint auto scaling and managed queue prevented significant drift from our goal of single-digit-second latency per image generation.

Image generation request latency

  • Scheduled scaling to manage demand – We can scale up and back down on schedule to cover more predictable traffic demands, reducing inference costs while supplying demand. The following figure illustrates the impact of auto scaling reacting to unexpected demand as well as scaling up and down on a schedule.

Utilization rate
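The scheduled part of this behavior can be expressed with Application Auto Scaling scheduled actions. The following is a minimal sketch (the resource names, schedule, and capacities are hypothetical):

import boto3

aas_client = boto3.client("application-autoscaling")

# Scale the endpoint variant up ahead of a predictable daily traffic peak.
aas_client.put_scheduled_action(
    ServiceNamespace="sagemaker",
    ScheduledActionName="scale-up-before-peak",
    ResourceId="endpoint/sdxl-inf2-async-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    Schedule="cron(0 8 * * ? *)",  # 08:00 UTC every day
    ScalableTargetAction={"MinCapacity": 2, "MaxCapacity": 10},
)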

Conclusion

In this post, we discussed the potential benefits of applying SageMaker and AWS Inferentia2 chips within a production-ready generative AI application. SageMaker fully managed asynchronous endpoints provide an application time to react to both unexpected and predictable demand in a structured manner, even for high-demand applications such as image-based generative AI. Despite the learning curve involved in compiling the Stable Diffusion XL model to run on AWS Inferentia2 chips, using AWS Inferentia2 allowed us to achieve our demanding low-latency inference requirements, providing an excellent user experience, all while remaining cost-efficient.

To learn more about SageMaker deployment options for your generative AI use cases, refer to the blog series Model hosting patterns in Amazon SageMaker. You can get started with hosting a Stable Diffusion model with SageMaker and AWS Inferentia2 by using the following example.

Discover how Monks serves as a comprehensive digital partner by integrating a wide array of solutions. These encompass media, data, social platforms, studio production, brand strategy, and cutting-edge technology. Through this integration, Monks enables efficient content creation, scalable experiences, and AI-driven data insights, all powered by top-tier industry talent.


About the Authors

Benjamin Moody is a Senior Solutions Architect at Monks. He focuses on designing and managing high-performance, robust, and secure architectures, utilizing a broad range of AWS services. Ben is particularly adept at handling projects with complex requirements, including those involving generative AI at scale. Outside of work, he enjoys snowboarding and traveling.

Karan Jain is a Senior Machine Learning Specialist at AWS, where he leads the worldwide Go-To-Market strategy for Amazon SageMaker Inference. He helps customers accelerate their generative AI and ML journey on AWS by providing guidance on deployment, cost-optimization, and GTM strategy. He has led product, marketing, and business development efforts across industries for over 10 years, and is passionate about mapping complex service features to customer solutions.

Raghu Ramesha is a Senior Gen AI/ML Specialist Solutions Architect with AWS. He focuses on helping enterprise customers build and deploy AI/ML production workloads to Amazon SageMaker at scale. He specializes in generative AI, machine learning, and computer vision domains, and holds a master’s degree in Computer Science from UT Dallas. In his free time, he enjoys traveling and photography.

Rupinder Grewal is a Senior Gen AI/ML Specialist Solutions Architect with AWS. He currently focuses on model serving and MLOps on SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.

Parag Srivastava is a Senior Solutions Architect at AWS, where he has been helping customers in successfully applying generative AI to real-life business scenarios. During his professional career, he has been extensively involved in complex digital transformation projects. He is also passionate about building innovative solutions around geospatial aspects of addresses.

Read More

Implement web crawling in Knowledge Bases for Amazon Bedrock

Implement web crawling in Knowledge Bases for Amazon Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

With Amazon Bedrock, you can experiment with and evaluate top FMs for various use cases. It allows you to privately customize them with your enterprise data using techniques like Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources. Knowledge Bases for Amazon Bedrock enables you to aggregate data sources into a repository of information. With knowledge bases, you can effortlessly build an application that takes advantage of RAG.

Accessing up-to-date and comprehensive information from various websites is crucial for many AI applications in order to have accurate and relevant data. Customers using Knowledge Bases for Amazon Bedrock want to extend the capability to crawl and index their public-facing websites. By integrating web crawlers into the knowledge base, you can gather and utilize this web data efficiently. In this post, we explore how to achieve this seamlessly.

Web crawler for knowledge bases

With a web crawler data source in the knowledge base, you can create a generative AI web application for your end-users based on the website data you crawl using either the AWS Management Console or the API. The default crawling behavior of the web connector starts by fetching the provided seed URLs and then traversing all child links within the same top primary domain (TPD) and having the same or deeper URL path.

The current considerations are that the URL can’t require any authentication, it can’t be an IP address for its host, and its scheme has to start with either http:// or https://. Additionally, the web connector will fetch non-HTML supported files such as PDFs, text files, markdown files, and CSVs referenced in the crawled pages regardless of their URL, as long as they aren’t explicitly excluded. If multiple seed URLs are provided, the web connector will crawl a URL if it fits any seed URL’s TPD and path. You can have up to 10 source URLs, which the knowledge base uses as starting points to crawl.

However, the web connector doesn’t traverse pages across different domains by default, although it will still retrieve supported non-HTML files. This makes sure the crawling process remains within the specified boundaries, maintaining focus and relevance to the targeted data sources.

Understanding the sync scope

When setting up a knowledge base with web crawl functionality, you can choose from different sync types to control which webpages are included. The following examples show the paths that are crawled for a given source URL under each sync scope (https://example.com is used for illustration purposes).

  • Default – Source URL: https://example.com/products. Example domain paths crawled: https://example.com/products, https://example.com/products/product1, https://example.com/products/product, https://example.com/products/discounts. Description: same host and the same initial path as the source URL.
  • Host only – Source URL: https://example.com/sellers. Example domain paths crawled: https://example.com/, https://example.com/products, https://example.com/sellers, https://example.com/delivery. Description: same host as the source URL.
  • Subdomains – Source URL: https://example.com. Example domain paths crawled: https://blog.example.com, https://blog.example.com/posts/post1, https://discovery.example.com, https://transport.example.com. Description: subdomains of the primary domain of the source URLs.

You can set the maximum throttling for crawling speed to control the maximum crawl rate. Higher values will reduce the sync time. However, the crawling job will always adhere to the domain’s robots.txt file if one is present, respecting standard robots.txt directives like ‘Allow’, ‘Disallow’, and crawl rate.

You can further refine the scope of URLs to crawl by using inclusion and exclusion filters. These filters are regular expression (regex) patterns applied to each URL. If a URL matches any exclusion filter, it will be ignored. Conversely, if inclusion filters are set, the crawler will only process URLs that match at least one of these filters that are still within the scope. For example, to exclude URLs ending in .pdf, you can use the regex ^.*.pdf$. To include only URLs containing the word “products,” you can use the regex .*products.*.
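As a quick illustrative check of how such patterns behave, you can test them with Python's re module (the sample URLs are hypothetical, and the .pdf pattern is shown with the dot escaped):

import re

# Exclusion and inclusion filters similar to the examples above.
exclude = re.compile(r"^.*\.pdf$")
include = re.compile(r".*products.*")

for url in ["https://example.com/products/item1", "https://example.com/docs/catalog.pdf"]:
    keep = bool(include.match(url)) and not exclude.match(url)
    print(url, "crawled" if keep else "skipped")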

Solution overview

In the following sections, we walk through the steps to create a knowledge base with a web crawler and test it. We also show how to create a knowledge base with a specific embedding model and an Amazon OpenSearch Service vector collection as a vector database, and discuss how to monitor your web crawler.

Prerequisites

Make sure you have permission to crawl the URLs you intend to use, and adhere to the Amazon Acceptable Use Policy. Also make sure any bot detection features are turned off for those URLs. A web crawler in a knowledge base uses the user-agent bedrockbot when crawling webpages.

Create a knowledge base with a web crawler

Complete the following steps to implement a web crawler in your knowledge base:

  1. On the Amazon Bedrock console, in the navigation pane, choose Knowledge bases.
  2. Choose Create knowledge base.
  3. On the Provide knowledge base details page, set up the following configurations:
    1. Provide a name for your knowledge base.
    2. In the IAM permissions section, select Create and use a new service role.
    3. In the Choose data source section, select Web Crawler as the data source.
    4. Choose Next.
  4. On the Configure data source page, set up the following configurations:
    1. Under Source URLs, enter https://www.aboutamazon.com/news/amazon-offices.
    2. For Sync scope, select Host only.
    3. For Include patterns, enter ^https?://www.aboutamazon.com/news/amazon-offices/.*$.
    4. For Exclude patterns, enter .*plants.* (we don’t want any post with a URL containing the word “plants”).
    5. For Content chunking and parsing, choose Default.
    6. Choose Next.
  5. On the Select embeddings model and configure vector store page, set up the following configurations:
    1. In the Embeddings model section, choose Titan Text Embeddings v2.
    2. For Vector dimensions, enter 1024.
    3. For Vector database, choose Quick create a new vector store.
    4. Choose Next.
  6. Review the details and choose Create knowledge base.

In the preceding instructions, the combination of Include patterns and Host only sync scope is used to demonstrate the use of the include pattern for web crawling. The same results can be achieved with the default sync scope, as we learned in the previous section of this post.

Create knowledge base web crawler

You can use the Quick create vector store option when creating the knowledge base to create an Amazon OpenSearch Serverless vector search collection. With this option, a public vector search collection and vector index is set up for you with the required fields and necessary configurations. Additionally, Knowledge Bases for Amazon Bedrock manages the end-to-end ingestion and query workflows.

Test the knowledge base

Let’s go over the steps to test the knowledge base with a web crawler as the data source:

  1. On the Amazon Bedrock console, navigate to the knowledge base that you created.
  2. Under Data source, select the data source name and choose Sync. It could take several minutes to hours to sync, depending on the size of your data.
  3. When the sync job is complete, in the right panel, under Test knowledge base, choose Select model and select the model of your choice.
  4. Enter one of the following prompts and observe the response from the model:
    1. How do I tour the Seattle Amazon offices?
    2. Provide me with some information about Amazon’s HQ2.
    3. What is it like in the Amazon’s New York office?

As shown in the following screenshot, citations are returned within the response and reference the source webpages. The value of x-amz-bedrock-kb-source-uri is a webpage link, which helps you verify the response accuracy.

knowledge base web crawler testing
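You can also query the knowledge base programmatically instead of using the console test panel. The following is a minimal sketch using the Amazon Bedrock agent runtime API (the knowledge base ID and model ARN are placeholders):

import boto3

client = boto3.client('bedrock-agent-runtime')

response = client.retrieve_and_generate(
    input={'text': 'How do I tour the Seattle Amazon offices?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'your-knowledge-base-id',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/your-model-id',
        },
    },
)

# The generated answer; citations with source URIs are also returned in the response.
print(response['output']['text'])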

Create a knowledge base using the AWS SDK

This following code uses the AWS SDK for Python (Boto3) to create a knowledge base in Amazon Bedrock with a specific embedding model and OpenSearch Service vector collection as a vector database:

import boto3

client = boto3.client('bedrock-agent')

response = client.create_knowledge_base(
    name='workshop-aoss-knowledge-base',
    roleArn='your-role-arn',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:your-region::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'your-opensearch-collection-arn',
            'vectorIndexName': 'blog_index',
            'fieldMapping': {
                'vectorField': 'documentid',
                'textField': 'data',
                'metadataField': 'metadata'
            }
        }
    }
)

The following Python code uses Boto3 to create a web crawler data source for an Amazon Bedrock knowledge base, specifying URL seeds, crawling limits, and inclusion and exclusion filters:

import boto3

client = boto3.client('bedrock-agent', region_name='us-east-1')

knowledge_base_id = 'knowledge-base-id'

response = client.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='example',
    description='test description',
    dataSourceConfiguration={
        'type': 'WEB',
        'webConfiguration': {
            'sourceConfiguration': {
                'urlConfiguration': {
                    'seedUrls': [
                        {'url': 'https://example.com/'}
                    ]
                }
            },
            'crawlerConfiguration': {
                'crawlerLimits': {
                    'rateLimit': 300
                },
                'inclusionFilters': [
                    '.*products.*'
                ],
                'exclusionFilters': [
                    '.*.pdf$'
                ],
                'scope': 'HOST_ONLY'
            }
        }
    }
)
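After the data source is created, you can start the sync programmatically as well. The following minimal sketch starts an ingestion job for the new data source (the IDs are placeholders):

import boto3

client = boto3.client('bedrock-agent', region_name='us-east-1')

# Start a sync (ingestion) job for the web crawler data source created above.
response = client.start_ingestion_job(
    knowledgeBaseId='knowledge-base-id',
    dataSourceId='data-source-id'
)
print(response['ingestionJob']['status'])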

Monitoring

You can track the status of an ongoing web crawl in your Amazon CloudWatch logs, which should report the URLs being visited and whether they are successfully retrieved, skipped, or failed. The following screenshot shows the CloudWatch logs for the crawl job.

knowledge base cloudwatch monitoring

Clean up

To clean up your resources, complete the following steps:

  1. Delete the knowledge base:
    1. On the Amazon Bedrock console, choose Knowledge bases under Orchestration in the navigation pane.
    2. Choose the knowledge base you created.
    3. Take note of the AWS Identity and Access Management (IAM) service role name in the knowledge base overview.
    4. In the Vector database section, take note of the OpenSearch Serverless collection ARN.
    5. Choose Delete, then enter delete to confirm.
  2. Delete the vector database:
    1. On the OpenSearch Service console, choose Collections under Serverless in the navigation pane.
    2. Enter the collection ARN you saved in the search bar.
    3. Select the collection and choose Delete.
    4. Enter confirm in the confirmation prompt, then choose Delete.
  3. Delete the IAM service role:
    1. On the IAM console, choose Roles in the navigation pane.
    2. Search for the role name you noted earlier.
    3. Select the role and choose Delete.
    4. Enter the role name in the confirmation prompt and delete the role.

Conclusion

In this post, we showcased how Knowledge Bases for Amazon Bedrock now supports the web data source, enabling you to index public webpages. This feature allows you to efficiently crawl and index websites, so your knowledge base includes diverse and relevant information from the web. By taking advantage of the infrastructure of Amazon Bedrock, you can enhance the accuracy and effectiveness of your generative AI applications with up-to-date and comprehensive data.

For pricing information, see Amazon Bedrock pricing. To get started using Knowledge Bases for Amazon Bedrock, refer to Create a knowledge base. For deep-dive technical content, refer to Crawl web pages for your Amazon Bedrock knowledge base. To learn how our Builder communities are using Amazon Bedrock in their solutions, visit our community.aws website.


About the Authors

Hardik Vasa is a Senior Solutions Architect at AWS. He focuses on Generative AI and Serverless technologies, helping customers make the best use of AWS services. Hardik shares his knowledge at various conferences and workshops. In his free time, he enjoys learning about new tech, playing video games, and spending time with his family.

Malini Chatterjee is a Senior Solutions Architect at AWS. She provides guidance to AWS customers on their workloads across a variety of AWS technologies. She brings a breadth of expertise in Data Analytics and Machine Learning. Prior to joining AWS, she was architecting data solutions in financial industries. She is very passionate about semi-classical dancing and performs in community events. She loves traveling and spending time with her family.

Read More