Moderate your Amazon IVS live stream using Amazon Rekognition

Amazon Interactive Video Service (Amazon IVS) is a managed live streaming solution designed for quick and straightforward setup, letting you build interactive video experiences and handling video content from ingestion to delivery.

With the increased usage of live streaming, the need for effective content moderation becomes even more crucial. User-generated content (UGC) presents complex challenges for safety. Many companies rely on human moderators to monitor video streams, which is time-consuming, error-prone, and doesn’t scale with the pace of business growth. An automated moderation solution that supports a human in the loop (HITL) is increasingly needed.

Amazon Rekognition Content Moderation, a capability of Amazon Rekognition, automates and streamlines image and video moderation workflows without requiring machine learning (ML) experience. In this post, we explain the common practice of live stream visual moderation with a solution that uses the Amazon Rekognition Image API to moderate live streams. You can deploy this solution to your AWS account using the AWS Cloud Development Kit (AWS CDK) package available in our GitHub repo.

Moderate live stream visual content

The most common approach for UGC live stream visual moderation involves sampling images from the stream and utilizing image moderation to receive near-real-time results. Live stream platforms can use flexible rules to moderate visual content. For instance, platforms with younger audiences might have strict rules about adult content and certain products, whereas others might focus on hate symbols. These platforms establish different rules to match their policies effectively. A hybrid process that combines automatic and human review is a common design approach: certain streams are stopped automatically, while human moderators also assess whether a stream violates platform policies and should be deactivated.

The following diagram illustrates the conceptual workflow of a near-real-time moderation system, designed with loose coupling to the live stream system.

Overview

The workflow contains the following steps:

  1. The live stream service (or the client app) samples image frames from video streams based on a specific interval.
  2. A rules engine evaluates moderation guidelines, determining the frequency of stream sampling and the applicable moderation categories, all within predefined policies. This process involves both ML and non-ML algorithms.
  3. The rules engine alerts human moderators upon detecting violations in the video streams.
  4. Human moderators assess the result and deactivate the live stream.

Moderating UGC live streams is distinct from classic video moderation in media: it must accommodate diverse regulations and platform policies. How frequently images are sampled from video frames for moderation is typically determined by the platform’s Trust & Safety policy and the service-level agreement (SLA). For instance, if a live stream platform aims to stop channels within 3 minutes for policy violations, a practical approach is to sample every 1–2 minutes, allowing time for human moderators to verify and take action. Some platforms require flexible moderation frequency control. For instance, highly reputable streamers may need less moderation, whereas new ones require closer attention. This also enables cost optimization by reducing sampling frequency.

Cost is an important consideration in any live stream moderation solution. As UGC live stream platforms rapidly expand, moderating concurrent streams at a high frequency can raise cost concerns. The solution presented in this post is designed to optimize cost by letting you define moderation rules that customize sample frequency, ignore similar image frames, and apply other techniques.

Recording Amazon IVS stream content to Amazon S3

Amazon IVS offers native solutions for recording stream content to an Amazon Simple Storage Service (Amazon S3) bucket and generating thumbnails—image frames from a video stream. It generates thumbnails every 60 seconds by default and provides users the option to customize the image quality and frequency. Using the AWS Management Console, you can create a recording configuration and link it to an Amazon IVS channel. When a recording configuration is associated with a channel, the channel’s live streams are automatically recorded to the specified S3 bucket.
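
If you prefer to set this up programmatically, the following sketch uses the AWS SDK for Python (Boto3) to create a recording configuration with thumbnails every 60 seconds and attach it to a channel. The bucket name and channel ARN are illustrative placeholders, and the recording configuration must reach the ACTIVE state before it can be associated with a channel.

import boto3

ivs = boto3.client('ivs')

# Create a recording configuration that writes recordings and thumbnails to an S3 bucket
recording_config = ivs.create_recording_configuration(
    name='moderation-recording-config',
    destinationConfiguration={'s3': {'bucketName': 'my-ivs-thumbnail-bucket'}},  # illustrative bucket
    thumbnailConfiguration={
        'recordingMode': 'INTERVAL',
        'targetIntervalSeconds': 60,  # thumbnail sampling interval
    },
)

# Once the configuration is ACTIVE, associate it with an existing channel
ivs.update_channel(
    arn='arn:aws:ivs:us-east-1:111122223333:channel/example',  # illustrative channel ARN
    recordingConfigurationArn=recording_config['recordingConfiguration']['arn'],
)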

There are no Amazon IVS charges for using the auto-record to Amazon S3 feature or for writing to Amazon S3. There are charges for Amazon S3 storage, Amazon S3 API calls that Amazon IVS makes on behalf of the customer, and serving the stored video to viewers. For details about Amazon IVS costs, refer to Costs (Low-Latency Streaming).

Amazon Rekognition Moderation APIs

In this solution, we use the Amazon Rekognition DetectModerationLabels API to moderate Amazon IVS thumbnails in near-real time. Amazon Rekognition Content Moderation provides pre-trained APIs to analyze a wide range of inappropriate or offensive content, such as violence, nudity, hate symbols, and more. For a comprehensive list of Amazon Rekognition Content Moderation taxonomies, refer to Moderating content.

The following code snippet demonstrates how to call the Amazon Rekognition DetectModerationLabels API to moderate images within an AWS Lambda function using the Python Boto3 library:

import boto3

# Initialize the Amazon Rekognition client object
rekognition = boto3.client('rekognition')

# Location of the sampled image frame in Amazon S3 (illustrative values)
data_bucket = 'my-ivs-thumbnail-bucket'
s3_key = 'thumbnails/my-channel/frame0.jpg'

# Call the Rekognition Image moderation API
response = rekognition.detect_moderation_labels(
    Image={'S3Object': {'Bucket': data_bucket, 'Name': s3_key}}
)

The following is an example response from the Amazon Rekognition Image Moderation API:

{
    "ModerationLabels": [
        {
            "Confidence": 99.9290542602539,
            "Name": "Female Swimwear Or Underwear",
            "ParentName": "Suggestive"
        },
        ...
    ],
    "ModerationModelVersion": "6.1"
}
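
A downstream rules check can then compare the returned labels against the categories and confidence thresholds you configure. The following is a minimal sketch, assuming a hypothetical configured_rules mapping of top-level category names to minimum confidence values and the response object from the preceding snippet:

# Hypothetical per-channel rule configuration: top-level category -> minimum confidence
configured_rules = {'Explicit Nudity': 80.0, 'Violence': 70.0, 'Suggestive': 90.0}

def find_violations(moderation_response, rules):
    """Return moderation labels that match a configured category at or above its threshold."""
    violations = []
    for label in moderation_response.get('ModerationLabels', []):
        # Top-level labels have an empty ParentName, so fall back to the label's own name
        category = label['ParentName'] or label['Name']
        threshold = rules.get(category)
        if threshold is not None and label['Confidence'] >= threshold:
            violations.append(label)
    return violations

violations = find_violations(response, configured_rules)
if violations:
    print('Moderation warning:', [v['Name'] for v in violations])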

For additional examples of the Amazon Rekognition Image Moderation API, refer to our Content Moderation Image Lab.

Solution overview

This solution integrates with Amazon IVS by reading thumbnail images from an S3 bucket and sending images to the Amazon Rekognition Image Moderation API. It provides choices for stopping the stream automatically and human-in-the-loop review. You can configure rules for the system to automatically halt streams based on conditions. It also includes a light human review portal, empowering moderators to monitor streams, manage violation alerts, and stop streams when necessary.

In this section, we briefly introduce the system architecture. For more detailed information, refer to the GitHub repo.

The following screen recording displays the moderator UI, which enables moderators to monitor active streams with moderation warnings and take actions such as stopping the stream or dismissing warnings.

Demo Moderator

Users can customize moderation rules to control the video stream sample frequency per channel, configure Amazon Rekognition moderation categories with confidence thresholds, and enable similarity checks, which improves performance and optimizes cost by avoiding the processing of redundant images.

The following screen recording displays the UI for managing a global configuration.

Demo configuration

The solution uses a microservices architecture, which consists of two key components loosely coupled with Amazon IVS.

Overall Architecture

Rules engine

The rules engine forms the backbone of the live stream moderation system. It is a live processing service that enables near-real-time moderation. It uses Amazon Rekognition to moderate images, validates results against customizable rules, employs image hashing algorithms to recognize and exclude similar images, and can halt streams automatically or alert the human review subsystem upon rule violations. The service integrates with Amazon IVS through Amazon S3-based image reading and facilitates API invocation via Amazon API Gateway.
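
To illustrate the similarity check, the following sketch uses a perceptual hash (via the Pillow and imagehash libraries, which are an implementation assumption rather than the repository's confirmed dependencies) to decide whether a new frame closely resembles the previously processed frame for the same channel:

import imagehash
from PIL import Image

def is_similar(current_image_path, previous_hash_hex, max_distance=5):
    """Compare perceptual hashes; return (similar, current_hash_hex)."""
    current_hash = imagehash.phash(Image.open(current_image_path))
    if previous_hash_hex is None:
        return False, str(current_hash)
    previous_hash = imagehash.hex_to_hash(previous_hash_hex)
    # A smaller Hamming distance between hashes means more similar images
    return (current_hash - previous_hash) <= max_distance, str(current_hash)

similar, new_hash = is_similar('frame.jpg', previous_hash_hex=None)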

The following architecture diagram illustrates the near-real-time moderation workflow.

Rules Engine

There are two methods to trigger the rules engine processing workflow:

  • S3 file trigger – When a new image is added to the S3 bucket, the workflow starts. This is the recommended way for Amazon IVS integration.
  • REST API call – You can make a RESTful API call to API Gateway with the image bytes in the request body. The API stores the image in an S3 bucket, triggering near-real-time processing. This approach suits images captured on the client side of the live stream app and transmitted over the internet (see the sketch following this list).
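
The following sketch shows what such a client-side call could look like. The endpoint URL, resource path, and payload format are illustrative and depend on how you deploy the API:

import base64
import requests  # third-party HTTP client

# Illustrative values; replace with your API Gateway endpoint and channel name
api_url = 'https://abc123.execute-api.us-east-1.amazonaws.com/prod/moderate'

with open('frame.jpg', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('utf-8')

resp = requests.post(api_url, json={'channel_id': 'my-channel', 'image': image_b64})
print(resp.status_code, resp.text)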

The image processing workflow, managed by AWS Step Functions, involves several steps:

  1. Check the sample frequency rule. Processing halts if the previous sample time is too recent.
  2. If enabled in the config, perform a similarity check using image hash algorithms. The process skips the image if it’s similar to the previous one received for the same channel.
  3. Use the Amazon Rekognition Image Moderation API to assess the image against configured rules, applying a confidence threshold and ignoring unnecessary categories.
  4. If the moderation result violates any rules, send notifications to an Amazon Simple Notification Service (Amazon SNS) topic, alerting downstream systems with moderation warnings.
  5. If the auto stop moderation rule is violated, the Amazon IVS stream will be stopped automatically.

The design manages rules through a Step Functions state machine, providing a drag-and-drop GUI for flexible workflow definition. You can extend the rules engine by incorporating additional Step Functions workflows.
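
As an illustration of the sample frequency rule in step 1, the following sketch checks a channel's last processed timestamp in a hypothetical DynamoDB table before allowing further processing; the table and attribute names are assumptions, not the repository's actual schema:

import time
import boto3

dynamodb = boto3.resource('dynamodb')
state_table = dynamodb.Table('stream-moderation-state')  # hypothetical table name

def should_process(channel_id, min_interval_seconds):
    """Return True if enough time has passed since the channel's last processed frame."""
    item = state_table.get_item(Key={'channel_id': channel_id}).get('Item')
    now = int(time.time())
    if item and now - int(item.get('last_sample_ts', 0)) < min_interval_seconds:
        return False  # too soon since the last sample; skip this frame
    state_table.put_item(Item={'channel_id': channel_id, 'last_sample_ts': now})
    return True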

Monitoring and management dashboard

The monitoring and management dashboard is a web application with a UI that lets human moderators monitor Amazon IVS live streams. It provides near-real-time moderation alerts, allowing moderators to stop streams or dismiss warnings. The web portal also empowers administrators to manage moderation rules for the rules engine. It supports two types of configurations:

  • Channel rules – You can define rules for specific channels.
  • Global rules – These rules apply to all or a subset of Amazon IVS channels that lack specific configurations. You can define a regular expression to apply the global rule to Amazon IVS channel names matching a pattern. For example: .* applies to all channels. /^test-/ applies to channels with names starting with test-. (A sketch of this pattern matching follows the list.)
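
The following is a minimal sketch of how such a pattern could be evaluated; stripping the surrounding slashes before applying the Python re module is an implementation assumption:

import re

def global_rule_applies(channel_name, pattern):
    """Return True if a global rule's regex pattern matches the channel name."""
    # Accept patterns written with or without surrounding slashes, e.g. /^test-/ or .*
    return re.search(pattern.strip('/'), channel_name) is not None

print(global_rule_applies('test-channel-1', '/^test-/'))  # True
print(global_rule_applies('prod-channel', '.*'))          # True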

The system is a serverless web app, featuring a static React front end hosted on Amazon S3 with Amazon CloudFront for caching. Authentication is handled by Amazon Cognito. Data is served through API Gateway and Lambda, with state storage in Amazon DynamoDB. The following diagram illustrates this architecture.

Web application

The monitoring dashboard is a lightweight demo app that provides essential features for moderators. To enhance functionality, you can extend the implementation to support multiple moderators with a management system and reduce latency by implementing a push mechanism using WebSockets.

Moderation latency

The solution is designed for near-real-time moderation, with latency measured across two separate subsystems:

  • Rules engine workflow – The rules engine workflow, from receiving images to sending notifications via Amazon SNS, averages within 2 seconds. This service promptly handles images through a Step Functions state machine. The Amazon Rekognition Image Moderation API typically responds in under 500 milliseconds for average file sizes below 1 MB. (These findings are based on tests conducted with the sample app, meeting near-real-time requirements.) In Amazon IVS, you have the option to select different thumbnail resolutions to adjust the image size.
  • Monitoring web portal – The monitoring web portal subscribes to the rules engine’s SNS topic. It records warnings in a DynamoDB table, while the website UI fetches the latest warnings every 10 seconds. This design showcases a lightweight demonstration of the moderator’s view. To further reduce latency, consider implementing a WebSocket to instantly push warnings to the UI upon their arrival via Amazon SNS.

Extend the solution

This post focuses on live stream visual content moderation. However, the solution is intentionally flexible, capable of accommodating complex business rules and extensible to support other media types, including moderating chat messages and audio in live streams. You can enhance the rules engine by introducing new Step Functions state machine workflows with upstream dispatching logic. We’ll delve deeper into live stream text and audio moderation using AWS AI services in upcoming posts.

Summary

In this post, we provided an overview of a sample solution that showcases how to moderate Amazon IVS live stream videos using Amazon Rekognition. You can experience the sample app by following the instructions in the GitHub repo and deploying it to your AWS account using the included AWS CDK package.

Learn more about content moderation on AWS. Take the first step towards streamlining your content moderation operations with AWS.


About the Authors

Lana Zhang is a Senior Solutions Architect on the AWS WWSO AI Services team, specializing in AI and ML for Content Moderation, Computer Vision, Natural Language Processing and Generative AI. With her expertise, she is dedicated to promoting AWS AI/ML solutions and assisting customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, media, advertising & marketing.

Tony Vu is a Senior Partner Engineer at Twitch. He specializes in assessing partner technology for integration with Amazon Interactive Video Service (IVS), aiming to develop and deliver comprehensive joint solutions to our IVS customers.

Retrieval-Augmented Generation with LangChain, Amazon SageMaker JumpStart, and MongoDB Atlas semantic search

Generative AI models have the potential to revolutionize enterprise operations, but businesses must carefully consider how to harness their power while overcoming challenges such as safeguarding data and ensuring the quality of AI-generated content.

The Retrieval-Augmented Generation (RAG) framework augments prompts with external data from multiple sources, such as document repositories, databases, or APIs, to make foundation models effective for domain-specific tasks. This post presents the capabilities of the RAG model and highlights the transformative potential of MongoDB Atlas with its Vector Search feature.

MongoDB Atlas is an integrated suite of data services that accelerate and simplify the development of data-driven applications. Its vector data store seamlessly integrates with operational data storage, eliminating the need for a separate database. This integration enables powerful semantic search capabilities through Vector Search, a fast way to build semantic search and AI-powered applications.

Amazon SageMaker enables enterprises to build, train, and deploy machine learning (ML) models. Amazon SageMaker JumpStart provides pre-trained models and data to help you get started with ML. You can access, customize, and deploy pre-trained models and data through the SageMaker JumpStart landing page in Amazon SageMaker Studio with just a few clicks.

Amazon Lex is a conversational interface that helps businesses create chatbots and voice bots that engage in natural, lifelike interactions. By integrating Amazon Lex with generative AI, businesses can create a holistic ecosystem where user input seamlessly transitions into coherent and contextually relevant responses.

Solution overview

The following diagram illustrates the solution architecture.

Solution overview

In the following sections, we walk through the steps to implement this solution and its components.

Set up a MongoDB cluster

To create a free tier MongoDB Atlas cluster, follow the instructions in Create a Cluster. Set up the database access and network access.

Deploy the SageMaker embedding model

You can choose the embedding model (ALL MiniLM L6 v2) on the SageMaker JumpStart Models, notebooks, solutions page.

SageMaker JumpStart Models, notebooks, solutions

Choose Deploy to deploy the model.

Verify that the model is deployed successfully and that the endpoint is created.

model is successfully deployed

Vector embedding

Vector embedding is the process of converting text or an image into a vector representation. With the following code, we can generate vector embeddings with SageMaker JumpStart and update the collection with the created vector for every document:

import json

# query_endpoint_with_json_payload and parse_response_multiple_texts are helper
# functions for invoking the SageMaker endpoint and parsing its response
payload = {"text_inputs": [document[field_name_to_be_vectorized]]}
query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'))
embeddings = parse_response_multiple_texts(query_response)

# Update the document with the generated embedding
update = {'$set': {vector_field_name: embeddings[0]}}
collection.update_one(query, update)

The preceding code shows how to update a single object in a collection. To update all objects, follow the instructions.
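
As a rough illustration, looping over every document in the collection (reusing the helper functions and variable names from the preceding snippet) might look like this:

import json

# Generate and store an embedding for each document in the collection
for document in collection.find({}):
    payload = {"text_inputs": [document[field_name_to_be_vectorized]]}
    query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'))
    embeddings = parse_response_multiple_texts(query_response)

    collection.update_one(
        {'_id': document['_id']},
        {'$set': {vector_field_name: embeddings[0]}},
    )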

MongoDB vector data store

MongoDB Atlas Vector Search is a new feature that allows you to store and search vector data in MongoDB. Vector data is a type of data that represents a point in a high-dimensional space. This type of data is often used in ML and artificial intelligence applications. MongoDB Atlas Vector Search uses a technique called k-nearest neighbors (k-NN) to search for similar vectors. k-NN works by finding the k most similar vectors to a given vector. The most similar vectors are the ones that are closest to the given vector in terms of the Euclidean distance.

Storing vector data next to operational data can improve performance by reducing the need to move data between different storage systems. This is especially beneficial for applications that require real-time access to vector data.

Create a Vector Search index

The next step is to create a MongoDB Vector Search index on the vector field you created in the previous step. MongoDB uses the knnVector type to index vector embeddings. The vector field should be represented as an array of numbers (BSON int32, int64, or double data types only).

Refer to Review knnVector Type Limitations for more information about the limitations of the knnVector type.

The following code is a sample index definition:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "egVector": {
        "dimensions": 384,
        "similarity": "euclidean",
        "type": "knnVector"
      }
    }
  }
}

Note that the dimensions value must match your embedding model’s output dimension (384 for the ALL MiniLM L6 v2 model deployed earlier).

Query the vector data store

You can query the vector data store using the Vector Search aggregation pipeline. It uses the Vector Search index and performs a semantic search on the vector data store.

The following code is a sample search definition:

{
  $search: {
    "index": "<index name>", // optional, defaults to "default"
    "knnBeta": {
      "vector": [<array-of-numbers>],
      "path": "<field-to-search>",
      "filter": {<filter-specification>},
      "k": <number>,
      "score": {<options>}
    }
  }
}
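
With PyMongo, this search can be run as an aggregation pipeline. The following sketch embeds a question with the SageMaker endpoint and retrieves the closest documents; the index name, vector field, and helper functions are assumptions consistent with the earlier snippets:

import json

question = "How do I reset my password?"  # illustrative query

# Embed the question with the SageMaker JumpStart embedding endpoint
payload = {"text_inputs": [question]}
query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'))
query_vector = parse_response_multiple_texts(query_response)[0]

# k-NN search against the knnVector index created earlier
pipeline = [
    {"$search": {"index": "default", "knnBeta": {"vector": query_vector, "path": "egVector", "k": 3}}},
    {"$limit": 3},
]

for doc in collection.aggregate(pipeline):
    print(doc.get(field_name_to_be_vectorized))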

Deploy the SageMaker large language model

SageMaker JumpStart foundation models are pre-trained large language models (LLMs) that are used to solve a variety of natural language processing (NLP) tasks, such as text summarization, question answering, and natural language inference. They are available in a variety of sizes and configurations. In this solution, we use the Hugging Face FLAN-T5-XL model.

Search for the FLAN-T5-XL model in SageMaker JumpStart.

Search for the FLAN-T5-XL

Choose Deploy to set up the FLAN-T5-XL model.

Deploy

Verify the model is deployed successfully and the endpoint is active.

Create an Amazon Lex bot

To create an Amazon Lex bot, complete the following steps:

  1. On the Amazon Lex console, choose Create bot.

Create bot

  2. For Bot name, enter a name.
  3. For Runtime role, select Create a role with basic Amazon Lex permissions.
  4. Specify your language settings, then choose Done.
  5. Add a sample utterance in the NewIntent UI and choose Save intent.
  6. Navigate to the FallbackIntent that was created for you by default and toggle Active in the Fulfillment section.
    toggle Active
  7. Choose Build and after the build is successful, choose Test.
    Build and Test
  8. Before testing, choose the gear icon.
  9. Specify the AWS Lambda function that will interact with MongoDB Atlas and the LLM to provide responses. To create the Lambda function, follow these steps. (A sketch of the function’s core logic follows this list.)
    Specify the AWS Lambda function
  10. You can now interact with the LLM.
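
The following is a minimal sketch of that Lambda function’s core retrieval-augmented generation logic, assuming it embeds the user’s question, runs the Atlas Vector Search shown earlier, and prompts the FLAN-T5-XL endpoint. The environment variable names, field names, prompt format, and response parsing are illustrative and may differ from the linked instructions:

import json
import os

import boto3
from pymongo import MongoClient

sagemaker_runtime = boto3.client('sagemaker-runtime')
collection = MongoClient(os.environ['MONGODB_URI'])['ragdb']['documents']  # illustrative names

def embed(text):
    """Get a vector embedding from the JumpStart embedding endpoint."""
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=os.environ['EMBEDDING_ENDPOINT'],
        ContentType='application/json',
        Body=json.dumps({'text_inputs': [text]}),
    )
    # The response key name can vary by model version; adjust as needed
    return json.loads(response['Body'].read())['embedding'][0]

def answer(question):
    # Retrieve the most relevant documents with Atlas Vector Search
    pipeline = [
        {'$search': {'knnBeta': {'vector': embed(question), 'path': 'egVector', 'k': 3}}},
        {'$limit': 3},
    ]
    context = '\n'.join(doc.get('text', '') for doc in collection.aggregate(pipeline))

    # Prompt the FLAN-T5-XL endpoint with the retrieved context
    prompt = f'Answer the question using the context.\nContext: {context}\nQuestion: {question}'
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=os.environ['LLM_ENDPOINT'],
        ContentType='application/json',
        Body=json.dumps({'text_inputs': prompt, 'max_length': 200}),
    )
    # The response key name can vary by model version; adjust as needed
    return json.loads(response['Body'].read())['generated_texts'][0]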

Clean up

To clean up your resources, complete the following steps:

  1. Delete the Amazon Lex bot.
  2. Delete the Lambda function.
  3. Delete the LLM SageMaker endpoint.
  4. Delete the embeddings model SageMaker endpoint.
  5. Delete the MongoDB Atlas cluster.

Conclusion

In this post, we showed how to create a simple bot that uses MongoDB Atlas semantic search and integrates with a model from SageMaker JumpStart. This bot allows you to quickly prototype user interaction with different LLMs in SageMaker JumpStart while pairing them with the context originating in MongoDB Atlas.

As always, AWS welcomes feedback. Please leave your feedback and questions in the comments section.


About the authors

Igor Alekseev is a Senior Partner Solution Architect at AWS in the Data and Analytics domain. In his role, Igor works with strategic partners, helping them build complex, AWS-optimized architectures. Prior to joining AWS, he implemented many projects in the big data domain as a Data/Solution Architect, including several data lakes in the Hadoop ecosystem. As a Data Engineer, he was involved in applying AI/ML to fraud detection and office automation.


Babu Srinivasan is a Senior Partner Solutions Architect at MongoDB. In his current role, he is working with AWS to build the technical integrations and reference architectures for the AWS and MongoDB solutions. He has more than two decades of experience in database and cloud technologies. He is passionate about providing technical solutions to customers working with multiple Global System Integrators (GSIs) across multiple geographies.

Build a foundation model (FM) powered customer service bot with agents for Amazon Bedrock

From enhancing the conversational experience to agent assistance, there are plenty of ways that generative artificial intelligence (AI) and foundation models (FMs) can help deliver faster, better support. With the increasing availability and diversity of FMs, it’s difficult to experiment with and keep up to date with the latest model versions. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon. With Amazon Bedrock’s comprehensive capabilities, you can easily experiment with a variety of top FMs and customize them privately with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG).

Agents for Amazon Bedrock

In July, AWS announced the preview of agents for Amazon Bedrock, a new capability for developers to create fully managed agents in a few clicks. Agents extend FMs to run complex business tasks—from booking travel and processing insurance claims to creating ad campaigns and managing inventory—all without writing any code. With fully managed agents, you don’t have to worry about provisioning or managing infrastructure.

In this post, we provide a step-by-step guide with building blocks to create a customer service bot. We use a text generation model (Anthropic Claude V2) and agents for Amazon Bedrock for this solution. We provide an AWS CloudFormation template to provision the resources needed for building this solution. Then we walk you through steps to create an agent for Amazon Bedrock.

ReAct Prompting

FMs determine how to solve user-requested tasks with a technique called ReAct. It’s a general paradigm that combines reasoning and acting with FMs. ReAct prompts FMs to generate verbal reasoning traces and actions for a task. This allows the system to perform dynamic reasoning to create, maintain, and adjust plans for acting while incorporating additional information into the reasoning. The structured prompts include a sequence of question-thought-action-observation examples.

  • The question is the user-requested task or problem to solve.
  • The thought is a reasoning step that helps demonstrate to the FM how to tackle the problem and identify an action to take.
  • The action is an API that the model can invoke from an allowed set of APIs.
  • The observation is the result of carrying out the action.

Components in agents for Amazon Bedrock

Behind the scenes, agents for Amazon Bedrock automate the prompt engineering and orchestration of user-requested tasks. They can securely augment the prompts with company-specific information to provide responses back to the user in natural language. The agent breaks the user-requested task into multiple steps and orchestrates subtasks with the help of FMs. Action groups are tasks that the agent can perform autonomously. Action groups are mapped to an AWS Lambda function and related API schema to perform API calls. The following diagram depicts the agent structure.

Agents for Amazon Bedrock components

Solution overview

We use a shoe retailer use case to build the customer service bot. The bot helps customers purchase shoes by providing options in a humanlike conversation. Customers converse with the bot in natural language with multiple steps invoking external APIs to accomplish subtasks. The following diagram illustrates the sample process flow.

Sequence diagram for use case

The following diagram depicts a high-level architecture of this solution.

Solution architecture diagram

  1. You can create an agent with Amazon Bedrock-supported FMs such as Anthropic Claude V2.
  2. Attach API schema, residing in an Amazon Simple Storage Service (Amazon S3) bucket, and a Lambda function containing the business logic to the agent. (Note: This is a one-time setup step.)
  3. The agent uses customer requests to create a prompt using the ReAct framework. It then uses the API schema to invoke corresponding code in the Lambda function.
  4. You can perform a variety of tasks, including sending email notifications, writing to databases, and triggering application APIs in the Lambda functions.

In this post, we use the Lambda function to retrieve customer details, list shoes matching customer-preferred activity, and finally, place orders. Our code is backed by an in-memory SQLite database. You can use similar constructs to write to a persistent data store.
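
A simplified sketch of that business logic pattern follows; the table schema, seed data, and function names are illustrative rather than the repository’s actual code:

import sqlite3

# In-memory database seeded at Lambda cold start (illustrative schema and data)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE shoes (id INTEGER PRIMARY KEY, name TEXT, activity TEXT, price REAL)')
conn.executemany(
    'INSERT INTO shoes (name, activity, price) VALUES (?, ?, ?)',
    [('TrailRunner X', 'running', 120.0), ('CourtPro', 'tennis', 95.0)],
)

def list_shoes_by_activity(activity):
    """Return shoes that match the customer's preferred activity."""
    rows = conn.execute(
        'SELECT id, name, price FROM shoes WHERE activity = ?', (activity,)
    ).fetchall()
    return [{'id': r[0], 'name': r[1], 'price': r[2]} for r in rows]

def place_order(shoe_id, customer_id):
    """Record an order; a persistent data store would be used in production."""
    return {'status': 'confirmed', 'shoe_id': shoe_id, 'customer_id': customer_id}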

Prerequisites

To implement the solution provided in this post, you should have an AWS account and access to Amazon Bedrock with agents enabled (currently in preview). Use the AWS CloudFormation template to create the resource stack needed for the solution.

us-east-1 CloudFormation stack

The CloudFormation template creates two IAM roles. Update these roles to apply least-privilege permissions as discussed in Security best practices. Refer to the documentation to learn what IAM features are available to use with agents for Amazon Bedrock.

  1. LambdaBasicExecutionRole with Amazon S3 full access and CloudWatch access for logging.
  2. AmazonBedrockExecutionRoleForAgents with Amazon S3 full access and Lambda full access.

Important: Agents for Amazon Bedrock must have the role name prefixed by AmazonBedrockExecutionRoleForAgents_*

Bedrock Agents setup

In the next two sections, we will walk you through creating and testing an agent.

Create an agent for Amazon Bedrock

To create an agent, open the Amazon Bedrock console and choose Agents in the left navigation pane. Then select Create Agent.

This starts the agent creation workflow.

  1. Provide agent details: Give the agent a name and description (optional). Select the service role created by the CloudFormation stack and select Next.

Agent details

  2. Select a foundation model: On the Select model screen, choose a model. Provide clear and precise instructions to the agent about what tasks to perform and how to interact with the users.

Select foundation model

  3. Add action groups: An action is a task the agent can perform by making API calls. A set of actions comprise an action group. You provide an API schema that defines all the APIs in the action group. You must provide an API schema in the OpenAPI schema JSON format. The Lambda function contains the business logic needed to perform API calls. You must associate a Lambda function to each action group.

Give the action group a name and a description for the action. Select the Lambda function, provide an API schema file and select Next.

Agent action groups

  4. In the final step, review the agent configuration and select Create Agent.

Test and deploy agents for Amazon Bedrock

  1. Test the agent: After the agent is created, a dialog box shows the agent overview along with a working draft. The Amazon Bedrock console provides a UI to test your agent.

  2. Deploy: After successful testing, you can deploy your agent. To deploy an agent in your application, you must create an alias. Amazon Bedrock then automatically creates a version for that alias.

The following actions occur with the preceding agent setup and the Lambda code provided with this post:

  1. The agent creates a prompt from the developer-provided instructions (such as “You are an agent that helps customers purchase shoes.”), API schemas needed to complete the tasks, and data source details. The automatic prompt creation saves weeks of experimenting with prompts for different FMs.
  2. The agent orchestrates the user-requested task, such as “I am looking for shoes,” by breaking it into smaller subtasks such as getting customer details, matching the customer-preferred activity with shoe activity, and placing shoe orders. The agent determines the right sequence of tasks and handles error scenarios along the way.

The following screenshot displays some example responses from the agent.

Agent sample responses

Choosing Show trace for each response opens a dialog box that shows the reasoning technique used by the agent and the final response generated by the FM.

Agent trace1

Agent trace2

Agent trace3

Cleanup

To avoid incurring future charges, delete the resources. You can do this by deleting the stack from the CloudFormation console.

Delete CloudFormation stack

Feel free to download and test the code used in this post from the GitHub agents for Amazon Bedrock repository. You can also invoke the agents for Amazon Bedrock programmatically; an example Jupyter Notebook is provided in the repository.
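
For instance, a minimal programmatic invocation with Boto3 might look like the following; the agent ID, alias ID, and input text are placeholders, and the agent’s reply is read from the streamed completion events:

import uuid
import boto3

client = boto3.client('bedrock-agent-runtime')

response = client.invoke_agent(
    agentId='AGENT_ID',             # placeholder
    agentAliasId='AGENT_ALIAS_ID',  # placeholder
    sessionId=str(uuid.uuid4()),
    inputText='I am looking for running shoes',
)

# The agent's answer is streamed back as chunks in the completion event stream
answer = ''
for event in response['completion']:
    if 'chunk' in event:
        answer += event['chunk']['bytes'].decode('utf-8')
print(answer)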

Conclusion

Agents for Amazon Bedrock can help you increase productivity, improve your customer service experience, or automate DevOps tasks. In this post, we showed you how to set up agents for Amazon Bedrock to create a customer service bot.

We encourage you to learn more by reviewing additional features of Amazon Bedrock. You can use the example code provided in this post to create your implementation. Try our workshop to gain hands-on experience with Amazon Bedrock.


About the Authors

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.

Manju Prasad is a Senior Solutions Architect within Strategic Accounts at Amazon Web Services. She focuses on providing technical guidance in a variety of domains, including AI/ML, to a marquee M&E customer. Prior to joining AWS, she worked for companies in the Financial Services sector and also a startup.

Archana Inapudi is a Senior Solutions Architect at AWS supporting Strategic Customers. She has over a decade of experience helping customers design and build data analytics and database solutions. She is passionate about using technology to provide value to customers and achieve business outcomes.

NVIDIA and Scaleway Speed Development for European Startups and Enterprises

Europe’s startup ecosystem is getting a boost of accelerated computing for generative AI.

NVIDIA and cloud service provider (CSP) Scaleway are working together to deliver access to GPUs, NVIDIA AI Enterprise software, and services for turbocharging large language models (LLMs) and generative AI development for European startups.

Scaleway, a subsidiary of French telecommunications provider iliad Group, is offering cloud credits for access to its AI supercomputer cluster, which packs 1,016 NVIDIA H100 Tensor Core GPUs. As a regional CSP, Scaleway also provides sovereign infrastructure that ensures access and compliance with EU data protection laws — critical to businesses with a European footprint.

Sovereign Cloud, Generative AI 

Complying with regulations governing how data and metadata can be stored in cloud computing is critical. When doing business in Europe, U.S. companies, for example, need to comply with EU regulations on sovereignty to secure data against access from foreign adversaries or entities. Noncompliance risks data vulnerabilities, financial penalties and legal consequences.

Regional CSPs like Scaleway provide a strategic path forward for companies to do business in Europe with a sovereign infrastructure. iliad Group’s data centers, where Scaleway operates, are fortified by compliance certifications that ensure data security, covering key aspects like healthcare, public safety, governance and public service activities.

Delivering Sovereign Accelerated Computing 

NVIDIA is working with Scaleway to expand access to sovereign accelerated computing in the EU, enabling companies to deploy AI applications and scale up faster.   

Through the NVIDIA Inception program, startups already relying on the sovereign cloud computing capabilities of Scaleway’s NVIDIA-accelerated infrastructure include Hugging Face, with more to come. Inception is a free global program that provides technical guidance, training, discounts and networking opportunities.

Inception member Hugging Face, based in New York and with operations in France, creates tools and resources to help developers build, deploy and train AI models.

“AI is the new way of building technology, and making the fastest AI accelerators accessible within regional clouds is key to democratizing AI across the world, enabling enterprises and startups to build the experiences of tomorrow,” said Jeff Boudier, head of product at Hugging Face. “I’m really excited that selected French startups will be able to access NVIDIA H100 GPUs in Scaleway’s cluster through the new startup program Scaleway and Hugging Face just announced with Meta and Station F.”

H100 and NVIDIA AI to Scale 

Scaleway’s newly available Nabuchodonosor supercomputer, an NVIDIA DGX SuperPOD with 127 NVIDIA DGX H100 systems, will help startups in France and across Europe scale up AI workloads.

Regional Inception members will also be able to access NVIDIA AI Enterprise software on Scaleway Marketplace, including the NVIDIA NeMo framework and pretrained models for building LLMs, NVIDIA RAPIDS for accelerated data science, and NVIDIA Triton Inference Server and NVIDIA TensorRT-LLM for boosting inference.

NVIDIA Inception Services on Tap

NVIDIA Inception has more than 4,000 members across Europe. Member companies of Scaleway’s own startup program are eligible to join Inception for benefits and resources. Scaleway is earmarking companies to fast-track for Inception membership.

Inception members gain access to cloud computing credits, NVIDIA Deep Learning Institute courses, technology experts, preferred pricing on hardware and software, guidance on the latest software development kits and AI frameworks, as well as opportunities for matchmaking with investors.

AI Training AI: GatorTronGPT at the Forefront of University of Florida’s Medical AI Innovations

How do you train an AI to understand clinical language with less clinical data? Train another AI to synthesize training data.

Artificial intelligence is changing the way medicine is done, and is increasingly being used in all sorts of clinical tasks.

This is fueled by generative AI and models like GatorTronGPT, a generative language model trained on the University of Florida’s HiPerGator AI supercomputer and detailed in a paper published in Nature Digital Medicine Thursday.

GatorTronGPT joins a growing number of large language models (LLMs) trained on clinical data. Researchers trained the model using the GPT-3 framework, also used by ChatGPT.

They used a massive corpus of 277 billion words for this purpose. The training corpora included 82 billion words from de-identified clinical notes and 195 billion words from various English texts.

But there’s a twist: The research team also used GatorTronGPT to generate a synthetic clinical text corpus of more than 20 billion words, using carefully prepared prompts. The synthetic clinical text focuses on clinical factors and reads just like real clinical notes written by doctors.

This synthetic data was then used to train a BERT-based model called GatorTron-S.

In a comparative evaluation, GatorTron-S exhibited remarkable performance on clinical natural language understanding tasks like clinical concept extraction and medical relation extraction, beating the records set by the original BERT-based model, GatorTron-OG, which was trained on the 82-billion-word clinical dataset.

More impressively, it was able to do so using less data.

Both GatorTron-OG and GatorTron-S models were trained on 560 NVIDIA A100 Tensor Core GPUs running NVIDIA’s Megatron-LM package on the University of Florida’s HiPerGator supercomputer. Technology from the Megatron LM framework used in the project has since been incorporated with the NVIDIA NeMo framework, which has been central to more recent work on GatorTronGPT.

Using synthetic data created by LLMs addresses several challenges. LLMs require vast amounts of data, and there’s a limited availability of quality medical data.

In addition, synthetic data allows for model training that complies with medical privacy regulations, such as HIPAA.

The work with GatorTronGPT is just the latest example of how LLMs — which exploded onto the scene last year with the rapid adoption of ChatGPT — can be tailored to assist in a growing number of fields.

It’s also an example of the advances made possible by new AI techniques powered by accelerated computing.

The GatorTronGPT effort is the latest result of an ambitious collaboration announced in 2020, when the University of Florida and NVIDIA unveiled plans to erect the world’s fastest AI supercomputer in academia.

This initiative was driven by a $50 million gift, a fusion of contributions from NVIDIA founder Chris Malachowsky and NVIDIA itself.

Using AI to train more AI is just one example of HiPerGator’s impact, with the supercomputer promising to power more innovations in medical sciences and across disciplines throughout the University of Florida system.

Responsible AI at Google Research: Adversarial testing for generative AI safety

The Responsible AI and Human-Centered Technology (RAI-HCT) team within Google Research is committed to advancing the theory and practice of responsible human-centered AI through a lens of culturally-aware research, to meet the needs of billions of users today, and blaze the path forward for a better AI future. The BRAIDS (Building Responsible AI Data and Solutions) team within RAI-HCT aims to simplify the adoption of RAI practices through the utilization of scalable tools, high-quality data, streamlined processes, and novel research with a current emphasis on addressing the unique challenges posed by generative AI (GenAI).

GenAI models have enabled unprecedented capabilities leading to a rapid surge of innovative applications. Google actively leverages GenAI to enhance its products’ utility and to improve lives. While enormously beneficial, GenAI also presents risks for disinformation, bias, and security. In 2018, Google pioneered the AI Principles, emphasizing beneficial use and prevention of harm. Since then, Google has focused on effectively implementing our principles in Responsible AI practices through 1) a comprehensive risk assessment framework, 2) internal governance structures, 3) education, empowering Googlers to integrate AI Principles into their work, and 4) the development of processes and tools that identify, measure, and analyze ethical risks throughout the lifecycle of AI-powered products. The BRAIDS team focuses on the last area, creating tools and techniques for identification of ethical and safety risks in GenAI products that enable teams within Google to apply appropriate mitigations.

What makes GenAI challenging to build responsibly?

The unprecedented capabilities of GenAI models have been accompanied by a new spectrum of potential failures, underscoring the urgency for a comprehensive and systematic RAI approach to understanding and mitigating potential safety concerns before the model is made broadly available. One key technique used to understand potential risks is adversarial testing, which is testing performed to systematically evaluate the models to learn how they behave when provided with malicious or inadvertently harmful inputs across a range of scenarios. To that end, our research has focused on three directions:

  1. Scaled adversarial data generation
    Given the diverse user communities, use cases, and behaviors, it is difficult to comprehensively identify critical safety issues prior to launching a product or service. Scaled adversarial data generation with humans-in-the-loop addresses this need by creating test sets that contain a wide range of diverse and potentially unsafe model inputs that stress the model capabilities under adverse circumstances. Our unique focus in BRAIDS lies in identifying societal harms to the diverse user communities impacted by our models.
  2. Automated test set evaluation and community engagement
    Automated test set evaluation helps scale the testing process so that many thousands of model responses can be quickly evaluated to learn how the model behaves across a wide range of potentially harmful scenarios. Beyond testing with adversarial test sets, community engagement is a key component of our approach to identify “unknown unknowns” and to seed the data generation process.
  3. Rater diversity
    Safety evaluations rely on human judgment, which is shaped by community and culture and is not easily automated. To address this, we prioritize research on rater diversity.

Scaled adversarial data generation

High-quality, comprehensive data underpins many key programs across Google. Initially reliant on manual data generation, we’ve made significant strides to automate the adversarial data generation process. A centralized data repository with use-case and policy-aligned prompts is available to jump-start the generation of new adversarial tests. We have also developed multiple synthetic data generation tools based on large language models (LLMs) that prioritize the generation of data sets that reflect diverse societal contexts and that integrate data quality metrics for improved dataset quality and diversity.

Our data quality metrics include:

  • Analysis of language styles, including query length, query similarity, and diversity of language styles.
  • Measurement across a wide range of societal and multicultural dimensions, leveraging datasets such as SeeGULL, SPICE, and the Societal Context Repository.
  • Measurement of alignment with Google’s generative AI policies and intended use cases.
  • Analysis of adversariality to ensure that we examine both explicit (the input is clearly designed to produce an unsafe output) and implicit (where the input is innocuous but the output is harmful) queries.

One of our approaches to scaled data generation is exemplified in our paper on AI-Assisted Red Teaming (AART). AART generates evaluation datasets with high diversity (e.g., sensitive and harmful concepts specific to a wide range of cultural and geographic regions), steered by AI-assisted recipes to define, scope and prioritize diversity within an application context. Compared to some state-of-the-art tools, AART shows promising results in terms of concept coverage and data quality. Separately, we are also working with MLCommons to contribute to public benchmarks for AI Safety.

Adversarial testing and community insights

Evaluating model output with adversarial test sets allows us to identify critical safety issues prior to deployment. Our initial evaluations relied exclusively on human ratings, which resulted in slow turnaround times and inconsistencies due to a lack of standardized safety definitions and policies. We have improved the quality of evaluations by introducing policy-aligned rater guidelines to improve human rater accuracy, and are researching additional improvements to better reflect the perspectives of diverse communities. Additionally, automated test set evaluation using LLM-based auto-raters enables efficiency and scaling, while allowing us to direct complex or ambiguous cases to humans for expert rating.

Beyond testing with adversarial test sets, gathering community insights is vital for continuously discovering “unknown unknowns”. To provide high quality human input that is required to seed the scaled processes, we partner with groups such as the Equitable AI Research Round Table (EARR), and with our internal ethics and analysis teams to ensure that we are representing the diverse communities who use our models. The Adversarial Nibbler Challenge engages external users to understand potential harms of unsafe, biased or violent outputs to end users at scale. Our continuous commitment to community engagement includes gathering feedback from diverse communities and collaborating with the research community, for example during The ART of Safety workshop at the Asia-Pacific Chapter of the Association for Computational Linguistics Conference (IJCNLP-AACL 2023) to address adversarial testing challenges for GenAI.

Rater diversity in safety evaluation

Understanding and mitigating GenAI safety risks is both a technical and social challenge. Safety perceptions are intrinsically subjective and influenced by a wide range of intersecting factors. Our in-depth study on demographic influences on safety perceptions explored the intersectional effects of rater demographics (e.g., race/ethnicity, gender, age) and content characteristics (e.g., degree of harm) on safety assessments of GenAI outputs. Traditional approaches largely ignore inherent subjectivity and the systematic disagreements among raters, which can mask important cultural differences. Our disagreement analysis framework surfaced a variety of disagreement patterns between raters from diverse backgrounds including also with “ground truth” expert ratings. This paves the way to new approaches for assessing quality of human annotation and model evaluations beyond the simplistic use of gold labels. Our NeurIPS 2023 publication introduces the DICES (Diversity In Conversational AI Evaluation for Safety) dataset that facilitates nuanced safety evaluation of LLMs and accounts for variance, ambiguity, and diversity in various cultural contexts.

Summary

GenAI has resulted in a technology transformation, opening possibilities for rapid development and customization even without coding. However, it also comes with a risk of generating harmful outputs. Our proactive adversarial testing program identifies and mitigates GenAI risks to ensure inclusive model behavior. Adversarial testing and red teaming are essential components of a Safety strategy, and conducting them in a comprehensive manner is essential. The rapid pace of innovation demands that we constantly challenge ourselves to find “unknown unknowns” in cooperation with our internal partners, diverse user communities, and other industry experts.
