Implement serverless semantic search of image and live video with Amazon Titan Multimodal Embeddings

In today’s data-driven world, industries across various sectors are accumulating massive amounts of video data through cameras installed in their warehouses, clinics, roads, metro stations, stores, factories, or even private facilities. This video data holds immense potential for analysis and monitoring of incidents that may occur in these locations. From fire hazards to broken equipment, theft, or accidents, the ability to analyze and understand this video data can lead to significant improvements in safety, efficiency, and profitability for businesses and individuals.

This data allows for the derivation of valuable insights when combined with a searchable index. However, traditional video analysis methods often rely on manual, labor-intensive processes, which makes them inefficient and difficult to scale. In this post, we introduce semantic search, a technique to find incidents in videos based on natural language descriptions of events that occurred in the video. For example, you could search for “fire in the warehouse” or “broken glass on the floor.” This is where multi-modal embeddings come into play. We introduce the use of the Amazon Titan Multimodal Embeddings model, which can map visual as well as textual data into the same semantic space, allowing you to use a textual description to find images containing that semantic meaning. This semantic search technique allows you to analyze and understand frames from video data more effectively.

We walk you through constructing a scalable, serverless, end-to-end semantic search pipeline for surveillance footage with Amazon Kinesis Video Streams, Amazon Titan Multimodal Embeddings on Amazon Bedrock, and Amazon OpenSearch Service. Kinesis Video Streams makes it straightforward to securely stream video from connected devices to AWS for analytics, machine learning (ML), playback, and other processing. It enables real-time video ingestion, storage, encoding, and streaming across devices. Amazon Bedrock is a fully managed service that provides access to a range of high-performing foundation models from leading AI companies through a single API. It offers the capabilities needed to build generative AI applications with security, privacy, and responsible AI. Amazon Titan Multimodal Embeddings, available through Amazon Bedrock, enables more accurate and contextually relevant multimodal search. It processes and generates information from distinct data types like text and images. You can submit text, images, or a combination of both as input to use the model’s understanding of multimodal content. OpenSearch Service is a fully managed service that makes it straightforward to deploy, scale, and operate OpenSearch. OpenSearch Service allows you to store vectors and other data types in an index, and offers sub-second query latency even when searching billions of vectors and measuring semantic relatedness, which we use in this post.

We discuss how to balance functionality, accuracy, and budget. We include sample code snippets and a GitHub repo so you can start experimenting with building your own prototype semantic search solution.

Overview of solution

The solution consists of three components:

  • First, you extract frames of a live stream with the help of Kinesis Video Streams (you can optionally extract frames of an uploaded video file as well using an AWS Lambda function). These frames can be stored in an Amazon Simple Storage Service (Amazon S3) bucket as files for later processing, retrieval, and analysis.
  • In the second component, you generate an embedding of the frame using Amazon Titan Multimodal Embeddings. You store the reference (an S3 URI) to the actual frame and video file, and the vector embedding of the frame in OpenSearch Service.
  • Third, you accept a textual input from the user and create an embedding of it using the same model. You then use the provided API to query your OpenSearch Service index with OpenSearch’s vector search capabilities, which returns images that are semantically similar to your text based on the embeddings generated by the Amazon Titan Multimodal Embeddings model.

This solution uses Kinesis Video Streams to handle any volume of streaming video data without consumers provisioning or managing any servers. Kinesis Video Streams automatically extracts images from video data in real time and delivers the images to a specified S3 bucket. Alternatively, you can use a serverless Lambda function to extract frames of a stored video file with the Python OpenCV library.
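For the stored-video path, the Lambda function has to decide which frames to keep before it decodes them. The following is a minimal sketch of that idea, not the code from the repository: `select_frame_indices` and `extract_frames` are hypothetical names, and the sketch assumes the opencv-python package is packaged with the function.

```python
from typing import Iterator, List, Tuple


def select_frame_indices(total_frames: int, fps: float, interval_seconds: float) -> List[int]:
    """Return the indices of the frames to keep when sampling one frame every interval_seconds."""
    if total_frames <= 0 or fps <= 0 or interval_seconds <= 0:
        return []
    # Number of source frames between two sampled frames (at least 1).
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


def extract_frames(video_path: str, interval_seconds: float) -> Iterator[Tuple[int, bytes]]:
    """Yield (frame_index, jpeg_bytes) pairs sampled from a video file."""
    import cv2  # imported lazily so the selection logic works without OpenCV installed

    capture = cv2.VideoCapture(video_path)
    try:
        fps = capture.get(cv2.CAP_PROP_FPS)
        total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        for index in select_frame_indices(total, fps, interval_seconds):
            capture.set(cv2.CAP_PROP_POS_FRAMES, index)  # seek to the sampled frame
            ok, frame = capture.read()
            if not ok:
                break
            encoded_ok, buffer = cv2.imencode(".jpg", frame)
            if encoded_ok:
                yield index, buffer.tobytes()
    finally:
        capture.release()
```

Separating the index selection from the decoding keeps the sampling rule easy to test and to tune independently of OpenCV.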

The second component converts these extracted frames into vector embeddings directly by calling the Amazon Bedrock API with Amazon Titan Multimodal Embeddings.
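As an illustration of this call, the following sketch builds the request body for the Amazon Titan Multimodal Embeddings model and sends it with the Amazon Bedrock runtime client. The helper names `build_embedding_request` and `embed` are hypothetical, and the sketch assumes that boto3 is available and that model access has been granted in your account; treat it as a starting point rather than the repository's implementation.

```python
import base64
import json
from typing import List, Optional

MODEL_ID = "amazon.titan-embed-image-v1"
ALLOWED_LENGTHS = (256, 384, 1024)


def build_embedding_request(image_bytes: Optional[bytes] = None,
                            text: Optional[str] = None,
                            output_length: int = 1024) -> str:
    """Build the JSON body for a text, image, or combined embedding request."""
    if image_bytes is None and text is None:
        raise ValueError("Provide image_bytes, text, or both")
    if output_length not in ALLOWED_LENGTHS:
        raise ValueError(f"output_length must be one of {ALLOWED_LENGTHS}")
    body = {"embeddingConfig": {"outputEmbeddingLength": output_length}}
    if text is not None:
        body["inputText"] = text
    if image_bytes is not None:
        # The model expects the image as a base64-encoded string.
        body["inputImage"] = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps(body)


def embed(image_bytes: Optional[bytes] = None,
          text: Optional[str] = None,
          output_length: int = 1024) -> List[float]:
    """Call Amazon Bedrock and return the embedding vector (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder has no AWS dependency

    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_embedding_request(image_bytes, text, output_length),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]
```

The same function serves both the ingestion side (pass `image_bytes`) and the search side (pass `text`), which is what places frames and queries in one vector space.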

Embeddings are a vector representation of your data that capture semantic meaning. Generating embeddings of text and images using the same model helps you measure the distance between vectors to find semantic similarities. For example, you can embed all image metadata and additional text descriptions into the same vector space. Close vectors indicate that the images and text are semantically related. This allows for semantic image search—given a text description, you can find relevant images by retrieving those with the most similar embeddings, as represented in the following visualization.
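To make the idea of distance between vectors concrete, the following sketch computes cosine similarity between a text vector and two frame vectors. The three-dimensional vectors and file names are made up for illustration; real embeddings from the model have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat it as unrelated
    return dot / (norm_a * norm_b)


# Toy vectors standing in for a text query and two stored frames.
text_vector = [1.0, 0.0, 0.0]
frames = {
    "frame_fire.jpg": [0.9, 0.1, 0.0],
    "frame_empty_aisle.jpg": [0.0, 1.0, 0.0],
}

# The frame whose vector points in the most similar direction is the best match.
best_match = max(frames, key=lambda name: cosine_similarity(text_vector, frames[name]))
```

Here `best_match` is the frame whose vector is closest in direction to the text vector, which is the same comparison the vector database performs at scale.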

Visualization of text and image embeddings

Starting December 2023, you can use the Amazon Titan Multimodal Embeddings model for use cases like searching images by text, image, or a combination of text and image. It produces 1,024-dimension vectors (by default), enabling highly accurate and fast search capabilities. You can also configure smaller vector sizes to optimize for cost vs. accuracy. For more information, refer to Amazon Titan Multimodal Embeddings G1 model.

The following diagram visualizes the conversion of a picture to a vector representation. You split the video files into frames and save them in an S3 bucket (Step 1). The Amazon Titan Multimodal Embeddings model converts these frames into vector embeddings (Step 2). You store the embeddings of the video frame as a k-nearest neighbors (k-NN) vector in your OpenSearch Service index with the reference to the video clip and the frame in the S3 bucket itself (Step 3). You can add additional descriptions in an additional field.
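A k-NN index of this kind could be defined roughly as follows. This is a sketch under assumptions: the field names `embedding`, `frame_uri`, `video_uri`, and `description` are hypothetical, and the `nmslib` engine is only one possible choice that you should check against the engines your OpenSearch version supports.

```python
from typing import Any, Dict, Sequence


def build_index_body(dimension: int = 1024, engine: str = "nmslib") -> Dict[str, Any]:
    """Return an index definition with a k-NN vector field and reference fields."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,  # must match the embedding length you request
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": engine,  # assumption: pick an engine your version supports
                    },
                },
                "frame_uri": {"type": "keyword"},
                "video_uri": {"type": "keyword"},
                "description": {"type": "text"},
            }
        },
    }


def build_frame_document(embedding: Sequence[float], frame_uri: str,
                         video_uri: str, description: str = "") -> Dict[str, Any]:
    """Return the document stored for one frame: the vector plus S3 references."""
    return {
        "embedding": list(embedding),
        "frame_uri": frame_uri,
        "video_uri": video_uri,
        "description": description,
    }
```

Only the references and the vector go into the index; the frames and clips themselves stay in Amazon S3 or Kinesis Video Streams.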

Conversion of a picture to a vector representation

The following diagram visualizes the semantic search with natural language processing (NLP). The third component allows you to submit a query in natural language (Step 1) for specific moments or actions in a video, returning a list of references to frames that are semantically similar to the query. The Amazon Titan Multimodal Embeddings model (Step 2) converts the submitted text query into a vector embedding (Step 3). You use this embedding to look up the most similar embeddings (Step 4). The stored references in the returned results are used to retrieve the frames and video clip to the UI for replay (Step 5).
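The lookup in Step 4 can be expressed as an OpenSearch k-NN query. The following sketch only builds the query body; the vector field name `embedding` and the reference fields in `_source` are hypothetical and must match whatever your index mapping uses.

```python
from typing import Any, Dict, Sequence


def build_knn_query(query_vector: Sequence[float], k: int = 10,
                    field: str = "embedding") -> Dict[str, Any]:
    """Return a search body that retrieves the k nearest stored vectors."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "size": k,  # number of hits to return
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
        # Return only the references needed by the UI, not the stored vectors.
        "_source": ["frame_uri", "video_uri"],
    }
```

You would pass the returned dictionary as the body of a search request against the frame index, using the embedding of the text query as `query_vector`.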

Semantic search with natural language processing

The following diagram shows our solution architecture.

Solution Architecture

The workflow consists of the following steps:

  1. You stream live video to Kinesis Video Streams. Alternatively, upload existing video clips to an S3 bucket.
  2. Kinesis Video Streams extracts frames from the live video to an S3 bucket. Alternatively, a Lambda function extracts frames of the uploaded video clips.
  3. Another Lambda function collects the frames and generates an embedding with Amazon Bedrock.
  4. The Lambda function inserts the reference to the image and video clip together with the embedding as a k-NN vector into an OpenSearch Service index.
  5. You submit a query prompt to the UI.
  6. A new Lambda function converts the query to a vector embedding with Amazon Bedrock.
  7. The Lambda function runs a k-NN search against the OpenSearch Service image index, using cosine similarity to find the frames whose embeddings are closest to the query vector, and returns a list of frames.
  8. The UI displays the frames and video clips by retrieving the assets from Kinesis Video Streams using the saved references of the returned results. Alternatively, the video clips are retrieved from the S3 bucket.

This solution was created with AWS Amplify. Amplify is a development framework and hosting service that assists frontend web and mobile developers in building secure and scalable applications with AWS tools quickly and efficiently.

Optimize for functionality, accuracy, and cost

Let’s conduct an analysis of this proposed solution architecture to determine opportunities for enhancing functionality, improving accuracy, and reducing costs.

Starting with the ingestion layer, refer to Design considerations for cost-effective video surveillance platforms with AWS IoT for Smart Homes to learn more about cost-effective ingestion into Kinesis Video Streams.

The extraction of video frames in this solution is configured using Amazon S3 delivery with Kinesis Video Streams. A key trade-off to evaluate is determining the optimal frame rate and resolution to meet the use case requirements balanced with overall system resource utilization. The frame extraction rate can range from as high as five frames per second to as low as one frame every 20 seconds. The choice of frame rate can be driven by the business use case, which directly impacts embedding generation and storage in downstream services like Amazon Bedrock, Lambda, Amazon S3, and the Amazon S3 delivery feature, as well as searching within the vector database. Even when uploading pre-recorded videos to Amazon S3, thoughtful consideration should still be given to selecting an appropriate frame extraction rate and resolution. Tuning these parameters allows you to balance your use case accuracy needs with consumption of the mentioned AWS services.
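To get a feel for this trade-off, the following sketch builds an image generation configuration for a stream and estimates how many frames, and therefore embeddings, each sampling interval produces per day. The bounds come from the range stated above. The dictionary keys follow our reading of the Kinesis Video Streams UpdateImageGenerationConfiguration API and should be verified against the current API reference; the function names are hypothetical.

```python
from typing import Any, Dict, Optional

MIN_INTERVAL_MS = 200      # five frames per second
MAX_INTERVAL_MS = 20_000   # one frame every 20 seconds
MS_PER_DAY = 24 * 60 * 60 * 1000


def build_image_generation_config(bucket_uri: str, region: str, sampling_interval_ms: int,
                                  width: Optional[int] = None,
                                  height: Optional[int] = None) -> Dict[str, Any]:
    """Return a configuration that delivers sampled JPEG frames to an S3 bucket."""
    if not MIN_INTERVAL_MS <= sampling_interval_ms <= MAX_INTERVAL_MS:
        raise ValueError("sampling_interval_ms must be between 200 and 20000")
    config: Dict[str, Any] = {
        "Status": "ENABLED",
        "ImageSelectorType": "PRODUCER_TIMESTAMP",
        "DestinationConfig": {"Uri": bucket_uri, "DestinationRegion": region},
        "SamplingInterval": sampling_interval_ms,
        "Format": "JPEG",
    }
    # Lower resolutions reduce storage and transfer; omit to keep the source size.
    if width is not None:
        config["WidthPixels"] = width
    if height is not None:
        config["HeightPixels"] = height
    return config


def frames_per_day(sampling_interval_ms: int) -> int:
    """Return how many frames one continuously streaming camera produces per day."""
    return MS_PER_DAY // sampling_interval_ms
```

At the two extremes, `frames_per_day(200)` is 432,000 and `frames_per_day(20_000)` is 4,320 for each camera, a factor of 100 in the number of embeddings to generate, store, and search.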

The Amazon Titan Multimodal Embeddings model outputs a vector representation of the input data with a default embedding length of 1,024. This representation carries the semantic meaning of the input and can be compared with other vectors to measure similarity. For best accuracy, we recommend the default embedding length, but the length has a direct impact on performance and storage costs. To increase performance and reduce costs in your production environment, you can explore alternate embedding lengths, such as 256 and 384. Reducing the embedding length also means losing some of the semantic context, which has a direct impact on accuracy, but improves the overall speed and optimizes the storage costs.
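The storage side of this trade-off is simple arithmetic. The following sketch estimates the raw size of the stored vectors, assuming each dimension is stored as a 4-byte float; it ignores index overhead (such as the k-NN graph) and replicas, so treat the result as a lower bound.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def raw_vector_bytes(num_vectors: int, dimension: int) -> int:
    """Return the raw size in bytes of num_vectors embeddings of the given length."""
    if num_vectors < 0 or dimension < 1:
        raise ValueError("num_vectors must be >= 0 and dimension must be >= 1")
    return num_vectors * dimension * BYTES_PER_FLOAT32


# Compare the three embedding lengths for one million stored frames.
sizes = {dim: raw_vector_bytes(1_000_000, dim) for dim in (256, 384, 1024)}
```

For one million frames, the raw vectors take about 4.1 GB at length 1,024, about 1.5 GB at 384, and about 1.0 GB at 256, so moving from the default to 256 cuts raw vector storage by a factor of four.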

OpenSearch Service offers on-demand, reserved, and serverless pricing options with general purpose or storage optimized machine types to fit different workloads. To optimize costs, you should select reserved instances to cover your production workload base, and use on-demand, serverless, and convertible reservations to handle spikes and non-production loads. For lower-demand production workloads, a cost-friendly alternate option is using pgvector with Amazon Aurora PostgreSQL Serverless, which offers lower base consumption units as compared to Amazon OpenSearch Serverless, thereby lowering the cost.

Determining the optimal value of K in the k-NN algorithm for vector similarity search is significant for balancing accuracy, performance, and cost. A larger K value generally increases accuracy by considering more neighboring vectors, but comes at the expense of higher computational complexity and cost. Conversely, a smaller K leads to faster search times and lower costs, but may lower result quality. When using the k-NN algorithm with OpenSearch Service, it’s essential to carefully evaluate the K parameter based on your application’s priorities—starting with smaller values like K=5 or 10, then iteratively increasing K if higher accuracy is needed.
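One way to make this evaluation concrete is to measure recall at different values of K on a small, hand-labeled set of queries. The following sketch assumes you already have a ranked list of frame IDs returned by a search and a set of frame IDs you consider relevant; the IDs shown are invented for illustration.

```python
from typing import Iterable, Sequence


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int) -> float:
    """Return the fraction of relevant items that appear in the top k results."""
    if k < 1:
        raise ValueError("k must be at least 1")
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    found = relevant.intersection(ranked_ids[:k])
    return len(found) / len(relevant)


# Hypothetical search result (best match first) and hand-labeled relevant frames.
ranked = ["f3", "f7", "f1", "f9", "f2"]
relevant = {"f1", "f2"}
recalls = {k: recall_at_k(ranked, relevant, k) for k in (1, 3, 5)}
```

In this toy case, recall is 0.0 at K=1, 0.5 at K=3, and 1.0 at K=5. Repeating the measurement over a representative set of queries shows where increasing K stops paying off.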

As part of the solution, we recommend Lambda as the serverless compute option to process frames. With Lambda, you can run code for virtually any type of application or backend service—all with zero administration. Lambda takes care of everything required to run and scale your code with high availability.

With high amounts of video data, you should consider bin-packing your frame processing tasks and running a batch computing job to access a large amount of compute resources. The combination of AWS Batch and Amazon Elastic Container Service (Amazon ECS) can efficiently provision resources in response to jobs submitted in order to eliminate capacity constraints, reduce compute costs, and deliver results quickly.
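In its simplest form, bin-packing frame processing means grouping frames into fixed-size batches so that each job handles many frames instead of one. The following sketch shows that grouping only; the batch size and the S3 keys are hypothetical, and submitting each batch as a job is left out.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


# Group hypothetical frame keys into batches of two.
frame_keys = [f"frames/cam1/{i:05d}.jpg" for i in range(5)]
batches = list(batched(frame_keys, 2))
```

Each batch could then become the payload of one batch job. Because frames have roughly equal processing cost, equal-sized batches are a reasonable first approximation of bin-packing.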

You will incur costs when deploying the GitHub repo in your account. When you are finished examining the example, follow the steps in the Clean up section later in this post to delete the infrastructure and stop incurring charges.

Refer to the README file in the repository to understand the building blocks of the solution in detail.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Deploy the Amplify application

Complete the following steps to deploy the Amplify application:

  1. Clone the repository to your local disk with the following command:
    git clone https://github.com/aws-samples/Serverless-Semantic-Video-Search-Vector-Database-and-a-Multi-Modal-Generative-Al-Embeddings-Model

  2. Change the directory to the cloned repository.
  3. Initialize the Amplify application:
    amplify init

  4. Clean install the dependencies of the web application:
    npm ci

  5. Create the infrastructure in your AWS account:
    amplify push

  6. Run the web application in your local environment:
    npm run dev

Create an application account

Complete the following steps to create an account in the application:

  1. Open the web application with the stated URL in your terminal.
  2. Enter a user name, password, and email address.
  3. Confirm your email address with the code sent to it.

Upload files from your computer

Complete the following steps to upload image and video files stored locally:

  1. Choose File Upload in the navigation pane.
  2. Choose Choose files.
  3. Select the images or videos from your local drive.
  4. Choose Upload Files.

Upload files from a webcam

Complete the following steps to upload images and videos from a webcam:

  1. Choose Webcam Upload in the navigation pane.
  2. Choose Allow when asked for permissions to access your webcam.
  3. Choose to either upload a single captured image or a captured video:
    1. Choose Capture Image and Upload Image to upload a single image from your webcam.
    2. Choose Start Video Capture, Stop Video Capture, and finally Upload Video to upload a video from your webcam.

Search videos

Complete the following steps to search the files and videos you uploaded:

  1. Choose Search in the navigation pane.
  2. Enter your prompt in the Search Videos text field. For example, we ask “Show me a person with a golden ring.”
  3. Lower the confidence parameter closer to 0 if you see fewer results than you were originally expecting.
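One plausible way to think about the confidence parameter is as a minimum similarity score applied to the search results. The following sketch filters an OpenSearch-style response by score. How the application actually maps its confidence value to the OpenSearch score is an assumption here, and `filter_hits` is a hypothetical helper.

```python
from typing import Any, Dict, List


def filter_hits(response: Dict[str, Any], min_score: float) -> List[Dict[str, Any]]:
    """Keep only the hits whose score is at least min_score."""
    hits = response.get("hits", {}).get("hits", [])
    return [
        {"score": hit["_score"], **hit.get("_source", {})}
        for hit in hits
        if hit["_score"] >= min_score
    ]
```

Lowering `min_score` toward 0 lets more hits through, which matches the behavior described in the last step.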

The following screenshot shows an example of our results.

Example of results

Clean up

Complete the following steps to clean up your resources:

  1. Open a terminal in the directory of your locally cloned repository.
  2. Run the following command to delete the cloud and local resources:
    amplify delete

Conclusion

A multi-modal embeddings model has the potential to revolutionize the way industries analyze incidents captured with videos. AWS services and tools can help industries unlock the full potential of their video data and improve their safety, efficiency, and profitability. As the amount of video data continues to grow, the use of multi-modal embeddings will become increasingly important for industries looking to stay ahead of the curve. As innovations like Amazon Titan foundation models continue maturing, they will reduce the barriers to using advanced ML and simplify the process of understanding data in context. To stay updated with state-of-the-art functionality and use cases, refer to the following resources:


About the Authors

Thorben Sanktjohanser is a Solutions Architect at Amazon Web Services supporting media and entertainment companies on their cloud journey with his expertise. He is passionate about IoT, AI/ML and building smart home devices. Almost every part of his home is automated, from light bulbs and blinds to vacuum cleaning and mopping.

Talha Chattha is an AI/ML Specialist Solutions Architect at Amazon Web Services, based in Stockholm, serving key customers across EMEA. Talha holds a deep passion for generative AI technologies. He works tirelessly to deliver innovative, scalable, and valuable ML solutions in the space of large language models and foundation models for his customers. When not shaping the future of AI, he explores scenic European landscapes and delicious cuisines.

Victor Wang is a Sr. Solutions Architect at Amazon Web Services, based in San Francisco, CA, supporting innovative healthcare startups. Victor has spent 6 years at Amazon; previous roles include software developer for AWS Site-to-Site VPN, AWS ProServe Consultant for Public Sector Partners, and Technical Program Manager for Amazon RDS for MySQL. His passion is learning new technologies and traveling the world. Victor has flown over a million miles and plans to continue his eternal journey of exploration.

Akshay Singhal is a Sr. Technical Account Manager at Amazon Web Services, based in San Francisco Bay Area, supporting enterprise support customers focusing on the security ISV segment. He provides technical guidance for customers to implement AWS solutions, with expertise spanning serverless architectures and cost-optimization. Outside of work, Akshay enjoys traveling, Formula 1, making short movies, and exploring new cuisines.

Prioritizing employee well-being: An innovative approach with generative AI and Amazon SageMaker Canvas

In today’s fast-paced corporate landscape, employee mental health has become a crucial aspect that organizations can no longer overlook. Many companies recognize that their greatest asset lies in their dedicated workforce, and each employee plays a vital role in collective success. As such, promoting employee well-being by creating a safe, inclusive, and supportive environment is of utmost importance.

However, quantifying and assessing mental health can be a daunting task. Traditional methods like employee well-being surveys or manual approaches may not always provide the most accurate or actionable insights. In this post, we explore an innovative solution that uses Amazon SageMaker Canvas for mental health assessment at the workplace.

We delve into the following topics:

  • The importance of mental health in the workplace
  • An overview of the SageMaker Canvas low-code no-code platform for building machine learning (ML) models
  • The mental health assessment model:
    • Data preparation using the chat feature
    • Training the model on SageMaker Canvas
    • Model evaluation and performance metrics
  • Deployment and integration:
    • Deploying the mental health assessment model
    • Integrating the model into workplace wellness programs or HR systems

In this post, we use a dataset from a 2014 survey that measures attitudes towards mental health and the frequency of mental health disorders in the tech workplace. We aggregate and prepare the data for an ML model using Amazon SageMaker Data Wrangler for a tabular dataset on SageMaker Canvas. Then we train, build, test, and deploy the model using SageMaker Canvas, without writing any code.

Discover how SageMaker Canvas can revolutionize the way organizations approach employee mental health assessment, empowering them to create a more supportive and productive work environment. Stay tuned for insightful content that could reshape the future of workplace well-being.

Importance of mental health

Maintaining good mental health in the workplace is crucial for both employees and employers. In today’s fast-paced and demanding work environment, the mental well-being of employees can have a significant impact on productivity, job satisfaction, and overall company success. At Amazon, where innovation and customer obsession are at the core of our values, we understand the importance of fostering a mentally healthy workforce.

By prioritizing the mental well-being of our employees, we create an environment where they can thrive and contribute their best. This helps us deliver exceptional products and services. Amazon supports mental health by providing access to resources and support services. All U.S. employees and household members are eligible to receive five free counseling sessions, per issue every year, via Amazon’s Global Employee Assistance Program (EAP), Resources for Living. Employees can also access mental health care 24/7 through a partnership with the app Twill—a digital, self-guided mental health program. Amazon also partners with Brightline, a leading provider in virtual mental health support for children and teens.

Solution overview

SageMaker Canvas brings together a broad set of capabilities to help data professionals prepare, build, train, and deploy ML models without writing any code. SageMaker Data Wrangler has also been integrated into SageMaker Canvas, reducing the time it takes to import, prepare, transform, featurize, and analyze data. In a single visual interface, you can complete each step of a data preparation workflow: data selection, cleansing, exploration, visualization, and processing. Custom Spark commands can also expand the over 300 built-in data transformations. The built-in Data Quality and Insights report guides you in performing appropriate data cleansing, verifying data quality, and detecting anomalies such as duplicate rows and target leakage. Other analyses are also available to help you visualize and understand your data.

In this post, we try to understand the factors contributing to the mental health of an employee in the tech industry in a systematic manner. We begin by understanding the feature columns, presented in the following table.

Survey Attribute           Survey Attribute Description
Timestamp                  Timestamp when survey was taken
Age                        Age of person taking survey
Gender                     Gender of person taking survey
Country                    Country of person taking survey
state                      If you live in the United States, which state or territory do you live in?
self_employed              Are you self-employed?
family_history             Do you have a family history of mental illness?
treatment                  Have you sought treatment for a mental health condition?
work_interfere             If you have a mental health condition, do you feel that it interferes with your work?
no_employees               How many employees does your company or organization have?
remote_work                Do you work remotely (outside of an office) at least 50% of the time?
tech_company               Is your employer primarily a tech company/organization?
benefits                   Does your employer provide mental health benefits?
care_options               Do you know the options for mental health care your employer provides?
wellness_program           Has your employer ever discussed mental health as part of an employee wellness program?
seek_help                  Does your employer provide resources to learn more about mental health issues and how to seek help?
anonymity                  Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?
leave                      How easy is it for you to take medical leave for a mental health condition?
mental_health_consequence  Do you think that discussing a mental health issue with your employer would have negative consequences?
phys_health_consequence    Do you think that discussing a physical health issue with your employer would have negative consequences?
coworkers                  Would you be willing to discuss a mental health issue with your coworkers?
phys_health_interview      Would you bring up a physical health issue with a potential employer in an interview?
mental_vs_physical         Do you feel that your employer takes mental health as seriously as physical health?
obs_consequence            Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?
comments                   Any additional notes or comments

Prerequisites

You should complete the following prerequisites before building this model:

Log in to SageMaker Canvas

When the initial setup is complete, you can access SageMaker Canvas with any of the following methods, depending on your environment’s setup:

Import the dataset into SageMaker Canvas

In SageMaker Canvas, you can see quick actions to get started building and using ML and generative artificial intelligence (AI) models, with a no code platform. Feel free to explore any of the out-of-the-box models.

We start by creating a data flow. A data flow in SageMaker Canvas is used to build a data preparation pipeline that can be scheduled to automatically import, prepare, and feed into a model build. With a data flow, you can prepare data using generative AI, over 300 built-in transforms, or custom Spark commands.

Complete the following steps:

  • Choose Prepare and analyze data.
  • For Data flow name, enter a name (for example, AssessingMentalHealthFlow).
  • Choose Create.

SageMaker Data Wrangler will open.

You can import data from multiple sources, ranging from AWS services, such as Amazon Simple Storage Service (Amazon S3) and Amazon Redshift, to third-party or partner services, including Snowflake or Databricks. To learn more about importing data to SageMaker Canvas, see Import data into Canvas.

  • Choose Import data, then choose Tabular.
  • Upload the dataset you downloaded in the prerequisites section.

After a successful import, you will be presented with a preview of the data, which you can browse.

  • Choose Import data to finish this step.

Run a Data Quality and Insights report

After you import the dataset, the SageMaker Data Wrangler data flow will open. You can run a Data Quality and Insights Report, which will perform an analysis of the data to determine potential issues to address during data preparation. Complete the following steps:

  • Choose Run Data quality and insights report.

  • For Analysis name, enter a name.
  • For Target column, choose treatment.
  • For Problem type, select Classification.
  • For Data size, choose Sampled dataset.
  • Choose Create.

You are presented with the generated report, which details any high priority warnings, data issues, and other insights to be aware of as you add data transformations and move along the model building process.

In this specific dataset, we can see that there are 27 features of different types, very little missing data, and no duplicates. To dive deeper into the report, refer to Get Insights On Data and Data Quality. To learn about other available analyses, see Analyze and Visualize.
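The report runs without code, but the two checks mentioned here, missing values and duplicate rows, are easy to picture. The following sketch is not what SageMaker Canvas executes; it shows the same idea in plain Python on rows represented as dictionaries, with `"NA"` and empty strings treated as missing by assumption.

```python
from typing import Any, Dict, List

MISSING_VALUES = (None, "", "NA")  # assumption: these markers denote missing data


def missing_counts(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Return the number of missing values per column."""
    counts: Dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value in MISSING_VALUES:
                counts[column] += 1
    return counts


def duplicate_count(rows: List[Dict[str, Any]]) -> int:
    """Return how many rows are exact repeats of an earlier row."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates
```

Running both functions on a sample of the survey data gives a quick sanity check that you can compare against what the report shows.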

Prepare your data

As expected in the ML process, your dataset may require transformations to address issues such as missing values and outliers, or to perform feature engineering, prior to model building. SageMaker Canvas provides ML data transforms to clean, transform, and prepare your data for model building without having to write code. The transforms used are added to the model recipe, a record of the data preparation done on your data before building the model. You can refer to these advanced transformations and add them as transformation steps within your Data Wrangler flow.

Alternatively, you can use SageMaker Canvas to chat with your data and add transformations. We explore this option with some examples on our sample dataset.

Use the chat feature for exploratory analysis and building transformations

Before you use the chat feature to prepare data, note the following:

  • Chat for data prep requires the AmazonSageMakerCanvasAIServicesAccess policy. For more information, see AWS managed policy: AmazonSageMakerCanvasAIServicesAccess.
  • Chat for data prep requires access to Amazon Bedrock and the Anthropic Claude v2 model within it. For more information, see Model access.
  • You must run SageMaker Canvas data prep in the same AWS Region as the Region where you’re running your model. Chat for data prep is available in the US East (N. Virginia), US West (Oregon), and Europe (Frankfurt) Regions.

To chat with your data, complete the following steps:

  • Open your SageMaker Canvas data flow.
  • Open your dataset by choosing Source or Data types.

  • Choose Chat for data prep and specify your prompts in the chat window.

  • Optionally, if an analysis has been generated by your query, choose Add to analyses to reference it for later.
  • Optionally, if you’ve transformed your data using a prompt, do the following:
  1. Choose Preview to view the results.
  2. Optionally modify the code in the transform and choose Update.
  3. If you’re happy with the results of the transform, choose Add to steps to add it to the steps pane.

Let’s try a few exploratory analyses and transformations through the chat feature.

In the following example, we ask “How many rows does the dataset have?”

In the following example, we drop the columns Timestamp, Country, state, and comments, because these features will have the least impact on our classification model. Choose View code to see the generated Spark code that performs the transformation, then choose Add to steps to add the transformation to the data flow.

You can provide a name and choose Update to save the data flow.

In the next example, we ask “Show me all unique ages sorted.”

Some ages are negative, so we should keep only valid ages. We drop rows with an age below 0 or above 100 and add this to the steps.

In the following example, we ask “Create a bar chart for null values in the dataset.”

Then we ask for a bar chart for the treatment column.

In the following example, we ask for a bar chart for the work_interfere column.

In the column work_interfere, we replace the NA values with “Don’t know.” We want to make the model weight missing values just as it weights people that have replied “Don’t know.”

For the column self_employed, we want to replace NA with “No” to make the model weight missing values just as it weights people that have replied “No.”
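For readers who want to see the combined effect of these chat-driven steps, the following sketch applies the same rules in plain Python. It is an illustration only: SageMaker Canvas generates Spark code for each step, which will look different, and the representation of rows as dictionaries and of missing values as `None`, an empty string, or `"NA"` is an assumption.

```python
from typing import Any, Dict, List

DROP_COLUMNS = {"Timestamp", "Country", "state", "comments"}
FILL_VALUES = {"work_interfere": "Don't know", "self_employed": "No"}
MISSING_VALUES = (None, "", "NA")  # assumption: these markers denote missing data


def prepare_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop unused columns, remove rows with invalid ages, and fill missing answers."""
    prepared = []
    for row in rows:
        age = row.get("Age")
        # Keep only rows with a numeric age between 0 and 100.
        if not isinstance(age, (int, float)) or age < 0 or age > 100:
            continue
        cleaned = {key: value for key, value in row.items() if key not in DROP_COLUMNS}
        for column, fill in FILL_VALUES.items():
            if column in cleaned and cleaned[column] in MISSING_VALUES:
                cleaned[column] = fill
        prepared.append(cleaned)
    return prepared
```

Keeping the rules in one place like this also documents the data preparation, in the same spirit as the steps pane of the data flow.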

You can choose to add any other transformations as needed. If you’ve followed the preceding transformations, your steps should look like the following screenshot.

Perform an analysis on the transformed data

Now that transformations have been done on the data, you may want to perform analyses to make sure they haven’t affected data integrity.

To do so, navigate to the Analyses tab to create an analysis. For this example, we create a feature correlation analysis with the correlation type linear.

The analysis report will generate a correlation matrix. The correlation matrix measures the positive or negative correlation between each pair of features. A value closer to 1 means positive correlation, and a value closer to -1 means negative correlation.

Linear feature correlation is based on Pearson’s correlation. To find the relationship between a numeric variable (like age or income) and a categorical variable (like gender or education level), we first assign numeric values to the categories in a way that allows them to best predict the numeric variable. Then we calculate the correlation coefficient, which measures how strongly the two variables are related.

Linear categorical to categorical correlation is not supported.

Numeric to numeric correlation is in the range [-1, 1], where 0 implies no correlation, 1 implies perfect correlation, and -1 implies perfect inverse correlation. Numeric to categorical and categorical to categorical correlations are in the range [0, 1], where 0 implies no correlation and 1 implies perfect correlation.

Features that are neither numeric nor categorical are ignored.
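To make the two correlation types concrete, the following is a minimal sketch in plain Python. The pearson function is the standard Pearson coefficient. The numeric_categorical_correlation function is one possible reading of the description above (each category is replaced by the mean of the numeric variable within that category, then Pearson is applied); we don't claim that SageMaker Data Wrangler computes it exactly this way, and both function names and the sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def numeric_categorical_correlation(values, categories):
    """Assumed approach: encode each category by its mean numeric value,
    then take the Pearson correlation with the numeric variable."""
    groups = {}
    for value, category in zip(values, categories):
        groups.setdefault(category, []).append(value)
    category_means = {c: sum(v) / len(v) for c, v in groups.items()}
    encoded = [category_means[c] for c in categories]
    if len(set(encoded)) == 1:
        return 0.0  # the categories carry no information about the values
    return pearson(values, encoded)


# Made-up sample data for illustration only
ages = [20, 22, 40, 42]
groups_of_people = ["a", "a", "b", "b"]
r = numeric_categorical_correlation(ages, groups_of_people)
print(r)
```

Because each category is encoded by its own mean, the result of the second function is never negative, which matches the [0, 1] range stated above for numeric to categorical correlation.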

The following table lists, for each feature, the feature it is most correlated with.

Feature | Most Correlated Feature | Correlation
Age (numeric) | Gender (categorical) | 0.248216
Gender (categorical) | Age (numeric) | 0.248216
seek_help (categorical) | Age (numeric) | 0.175808
no_employees (categorical) | Age (numeric) | 0.166486
benefits (categorical) | Age (numeric) | 0.157729
remote_work (categorical) | Age (numeric) | 0.139105
care_options (categorical) | Age (numeric) | 0.1183
wellness_program (categorical) | Age (numeric) | 0.117175
phys_health_consequence (categorical) | Age (numeric) | 0.0961159
work_interfere (categorical) | Age (numeric) | 0.0797424
treatment (categorical) | Age (numeric) | 0.0752661
mental_health_consequence (categorical) | Age (numeric) | 0.0687374
obs_consequence (categorical) | Age (numeric) | 0.0658778
phys_health_interview (categorical) | Age (numeric) | 0.0639178
self_employed (categorical) | Age (numeric) | 0.0628861
tech_company (categorical) | Age (numeric) | 0.0609773
leave (categorical) | Age (numeric) | 0.0601671
mental_health_interview (categorical) | Age (numeric) | 0.0600251
mental_vs_physical (categorical) | Age (numeric) | 0.0389857
anonymity (categorical) | Age (numeric) | 0.038797
coworkers (categorical) | Age (numeric) | 0.0181036
supervisor (categorical) | Age (numeric) | 0.0167315
family_history (categorical) | Age (numeric) | 0.00989271

The following figure shows our correlation matrix.

You can explore more analyses of different types. For more details, see Explore your data using visualization techniques.

Export the dataset and create a model

Return to the main data flow and run the SageMaker Data Wrangler validation flow. Upon successful validation, you are ready to export the dataset for model training.

Next, you export your dataset and build an ML model on top of it. Complete the following steps:

  • Open the expanded menu in the final transformation and choose Create model.

  • For Dataset name, enter a name.
  • Choose Export.

At this point, your mental health assessment dataset is ready for model training and testing.

  • Choose Create model.

  • For Model name, enter a name.
  • For Problem type, select Predictive analysis.

SageMaker Canvas suggested this based on the dataset, but you can override this for your own experimentation. For more information about ready-to-use models provided by SageMaker Canvas, see Use Ready-to-use models.

  • Choose Create.

  • For Target column, choose treatment as the column to predict.

Because Yes or No is predicted, SageMaker Canvas detected this is a two-category prediction model.

  • Choose Configure model to set configurations.

  • For Objective metric, leave as the default F1.

F1 is the harmonic mean of two important metrics: precision and recall.

  • For Training method, select Auto.

This option selects the algorithm most relevant to your dataset and the best range of hyperparameters to tune model candidates. Alternatively, you could use the ensemble or hyperparameter optimization training options. For more information, see Training modes and algorithm support.

  • For Data split, specify an 80/20 configuration for training and validation, respectively.

  • Choose Save and then Preview model to generate a preview.

This preview runs on a subset of the data and provides information on estimated model accuracy and feature importance. Based on the results, you may still apply additional transformations to improve the estimated accuracy.

Although low impact features might add noise to the model, these may still be useful to describe situations specific to your use case. Always combine predictive power with your own context to determine which features to include.

You’re now ready to build the full model with either Quick build or Standard build. Quick build only supports datasets with fewer than 50,000 rows and prioritizes speed over accuracy, training fewer combinations of models and hyperparameters, for rapid prototyping or proving out value. Standard build prioritizes accuracy and is necessary for exporting the full Jupyter notebook used for training.

  • For this post, choose Standard build.

To learn more about how SageMaker Canvas uses training and validation datasets, see Evaluating Your Model’s Performance in Amazon SageMaker Canvas and SHAP Baselines for Explainability.

Your results may differ from those in this post. Machine learning introduces stochasticity in the model training process, which can lead to slight variations.

Here, we’ve built a model that will predict with about 87% accuracy whether an individual will seek mental health treatment. At this stage, think about how you could achieve a practical impact from the ML model. For example, an organization may consider how it can apply the model to preemptively support individuals whose attributes suggest they would seek treatment.

Review model metrics

Let’s focus on the first tab, Overview. Here, Column impact is the estimated importance of each attribute in predicting the target. Information here can help organizations gain insights that lead to actions based on the model. For example, we see that the work_interfere column has the most significant impact on the prediction for treatment. Additionally, better benefits and care_options increase the likelihood of employees opting in to treatment.

On the Scoring tab, we can visualize a Sankey (or ribbon) plot of the distribution of predicted values with respect to actual values, providing insight into how the model performed during validation.

For more detailed insights, we look at the Advanced metrics tab for metric values the model may not have been optimized for, the confusion matrix, and the precision-recall curve.

The advanced metrics suggest we can trust the resulting model. False positives (predicting an employee will opt in for treatment when they actually don’t) and false negatives (predicting an employee will opt out when they actually opt in) are low. High numbers for either may make us skeptical about the current build and more likely to revisit previous steps.
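To make the relationship between the confusion matrix and these metrics concrete, the following sketch derives precision, recall, F1, and accuracy from the four confusion matrix counts. The function name and the counts are hypothetical and are not taken from the model built in this post.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from binary confusion matrix counts.

    tp: predicted treatment, actually sought treatment
    fp: predicted treatment, actually did not
    fn: predicted no treatment, actually sought treatment
    tn: predicted no treatment, actually did not
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up counts for illustration only
metrics = classification_metrics(tp=90, fp=30, fn=10, tn=70)
print(metrics)
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, so a model can't score well on F1 by doing well on only one of the two.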

Test the model

Now let’s use the model for making predictions. Choose Predict to navigate to the Predict tab. SageMaker Canvas allows you to generate predictions in two forms:

  • Single prediction (single “what-if scenario”)
  • Batch prediction (multiple scenarios using a CSV file)

For a first test, let’s try a single prediction. Wait a few seconds for the model to load, and now you’re ready to generate new inferences. You can change the values to experiment with the attributes and their impact.

For example, let’s make the following updates:

  • Change work_interfere from Often to Sometimes
  • Change benefits from Yes to No

Choose Update and see if the treatment prediction is affected.

In SageMaker Canvas, you can generate batch predictions either manually or automatically on a schedule. Let’s try the manual approach. To learn about automating batch predictions, refer to Automate batch predictions.

  • In practice, use a dataset different from the training data for testing predictions. For this example though, let’s use the same file as before. Be sure to remove the work_interfere column.
  • Choose Batch prediction and upload the downloaded file.
  • Choose Generate predictions.
  • When it’s complete, choose View to see the predictions.

Deploy the model

The final (optional) step of the SageMaker Canvas workflow for ML models is deploying the model. This uses SageMaker real-time inference endpoints to host the SageMaker Canvas model and expose an HTTPS endpoint for use by applications or developers.

  1. On the Deploy tab, choose Create deployment.
  2. For Deployment name, enter a name.
  3. For Instance type, choose an instance (for this post, ml.m5.2xlarge).
  4. Set Instance count to 1.
  5. Choose Deploy.

This instance configuration is sufficient for the demo. You can change the configuration later from the SageMaker Canvas UI or using SageMaker APIs. To learn more about auto scaling such workloads, see Automatically Scale Amazon SageMaker Models.

After the deployment is successful, you can invoke the endpoint using AWS SDKs or direct HTTPS calls. For more information, see Deploy models for real-time inference.

To learn more about model deployment, refer to Deploy your Canvas models to a SageMaker Endpoint and Deploy models for real-time inference.
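As a rough sketch of what an SDK call might look like, the following Python code builds a single CSV row and sends it to the endpoint through the SageMaker runtime API in boto3. The helper names are hypothetical. It assumes that the endpoint accepts text/csv input with the feature columns in the same order as the training dataset (without the target column) and returns JSON; verify the expected format for your own deployment.

```python
import csv
import io


def build_csv_payload(record, columns):
    """Serialize one record as a single CSV row in the given column order."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([record[column] for column in columns])
    return buffer.getvalue().strip()


def invoke_canvas_endpoint(endpoint_name, payload, region_name=None):
    """Send a CSV payload to a SageMaker real-time endpoint and return the
    response body as text. Requires AWS credentials and a deployed endpoint."""
    # Imported here so that payload building works without the AWS SDK installed
    import boto3

    client = boto3.client("sagemaker-runtime", region_name=region_name)
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",      # assumption: endpoint accepts CSV input
        Accept="application/json",   # assumption: endpoint can return JSON
        Body=payload,
    )
    return response["Body"].read().decode("utf-8")


# Made-up record with only two of the feature columns, for illustration
example_payload = build_csv_payload(
    {"work_interfere": "Sometimes", "benefits": "No"},
    ["work_interfere", "benefits"],
)
print(example_payload)
```

To call the endpoint, pass the deployment name you entered earlier, for example invoke_canvas_endpoint("my-deployment", example_payload), where "my-deployment" is a placeholder and the payload contains every feature column.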

Clean up

Make sure to log out from SageMaker Canvas by choosing Log out. Logging out of the SageMaker Canvas application releases all resources used by the workspace instance, which avoids incurring unintended charges.

Summary

Mental health is a dynamic and evolving field, with new research and insights constantly emerging. Staying up to date with the latest developments and best practices can be challenging, especially in a public forum. Additionally, when discussing mental health, it’s essential to approach the topic with sensitivity, respect, and a commitment to providing accurate and helpful information.

In this post, we showcased an ML approach to building a mental health model using a sample dataset and SageMaker Canvas, a low-code/no-code platform from AWS. This can serve as guidance for organizations looking to explore similar solutions for their specific needs. Implementing AI to assess employee mental health and offer preemptive support can yield a myriad of benefits. By promoting early detection of potential mental health needs, intervention can be more personalized and can reduce the risk of drastic complications in the future. A proactive approach can also enhance employee morale and productivity and mitigate the likelihood of absenteeism and turnover, ultimately leading to a healthier and more resilient workforce. Overall, using AI for mental health prediction and support signifies a commitment to nurturing a supportive work environment where employees can thrive.

To explore more about SageMaker Canvas with industry-specific use cases, explore a hands-on workshop. To learn more about SageMaker Data Wrangler in SageMaker Canvas, refer to Prepare Data. You can also refer to the following YouTube video to learn more about the end-to-end ML workflow with SageMaker Canvas.

Although this post provides a technical perspective, we strongly encourage readers who are struggling with mental health issues to seek professional help. Remember, there is always help available for those who ask.

Together, let’s take a proactive step towards empowering mental health awareness and supporting those in need.


About the Authors

Rushabh Lokhande is a Senior Data & ML Engineer with AWS Professional Services Analytics Practice. He helps customers implement big data, machine learning, analytics solutions, and generative AI implementations. Outside of work, he enjoys spending time with family, reading, running, and playing golf.

Bruno Klein is a Senior Machine Learning Engineer with AWS Professional Services Analytics Practice. He helps customers implement big data analytics solutions and generative AI implementations. Outside of work, he enjoys spending time with family, traveling, and trying new food.

Ryan Gomes is a Senior Data & ML Engineer with AWS Professional Services Analytics Practice. He is passionate about helping customers achieve better outcomes through analytics, machine learning, and generative AI solutions in the cloud. Outside of work, he enjoys fitness, cooking, and spending quality time with friends and family.


Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker


Genomic language models are a new and exciting field in the application of large language models to challenges in genomics. In this blog post and open source project, we show you how you can pre-train a genomics language model, HyenaDNA, using your genomic data in the AWS Cloud. Here, we use AWS HealthOmics storage as a convenient and cost-effective omic data store and Amazon Sagemaker as a fully managed machine learning (ML) service to train and deploy the model.

Genomic language models

Genomic language models represent a new approach in the field of genomics, offering a way to understand the language of DNA. These models use the transformer architecture, a type of natural language processing (NLP) model, to interpret the vast amount of genomic information available, allowing researchers and scientists to extract meaningful insights more accurately than with existing in silico approaches and more cost-effectively than with existing in situ techniques.

By bridging the gap between raw genetic data and actionable knowledge, genomic language models hold immense promise for various industries and research areas, including whole-genome analysis, delivered care, pharmaceuticals, and agriculture. They facilitate the discovery of novel gene functions, the identification of disease-causing mutations, and the development of personalized treatment strategies, ultimately driving innovation and advancement in genomics-driven fields. The ability to effectively analyze and interpret genomic data at scale is the key to precision medicine, agricultural optimization, and biotechnological breakthroughs, making genomic language models a possible new foundational technology in these industries.

Some of the pioneering genomic language models include:

  • DNABERT, which was one of the first attempts to use the transformer architecture to learn the language of DNA. DNABERT used a Bidirectional Encoder Representations from Transformers (BERT, encoder-only) architecture pre-trained on a human reference genome and showed promising results on downstream supervised tasks.
  • Nucleotide transformer has a similar architecture to DNABERT and showed that pre-training on more data and increasing the context window size improves the model’s accuracy on downstream tasks.
  • HyenaDNA uses the transformer architecture, like other genomic models, except that it replaces each self-attention layer with a Hyena operator. This widens the context window to allow processing of up to 1 million tokens, substantially more than prior models, allowing it to learn longer-range interactions in DNA.

In our exploration of cutting-edge models that push the boundaries of genetic sequence analysis, we focused on HyenaDNA. Pretrained HyenaDNA models are readily accessible on Hugging Face. This availability facilitates easy integration into existing projects or the starting point for new explorations in genetic sequence analysis.

AWS HealthOmics and sequence stores

AWS HealthOmics is a purpose-built service that helps healthcare and life science organizations and their software partners store, query, and analyze genomic, transcriptomic, and other omics data and then generate insights from that data to improve health and drive deeper biological understanding. It supports large-scale analysis and collaborative research through HealthOmics storage, analytics, and workflow capabilities.

With HealthOmics storage, a managed, omics-focused, findable, accessible, interoperable, and reusable (FAIR) data store, users can store, organize, share, and access petabytes of bioinformatics data efficiently at a low cost per gigabase. HealthOmics sequence stores deliver cost savings through automatic tiering and compression of files based on usage, enable sharing and findability through biologically focused metadata and provenance tracking, and provide instant access to frequently used data through low-latency Amazon Simple Storage Service (Amazon S3) compatible APIs or HealthOmics native APIs. All of this is delivered by HealthOmics, removing the burden of managing compression, tiering, metadata, and file organization from customers.

Amazon SageMaker

Amazon SageMaker is a fully managed ML service offered by AWS, designed to reduce the time and cost associated with training and tuning ML models at scale.

With SageMaker Training, a managed batch ML compute service, users can efficiently train models without having to manage the underlying infrastructure. SageMaker notably supports popular deep learning frameworks, including PyTorch, which is integral to the solutions provided here.

SageMaker also provides a broad selection of ML infrastructure and model deployment options to help meet all your ML inference needs.

Solution overview

In this blog post we address pre-training a genomic language model on an assembled genome. This genomic data could be either public (for example, GenBank) or could be your own proprietary data. The following diagram illustrates the workflow:

The image illustrates an architecture diagram for training a HyenaDNA model using the data stored in an AWS HealthOmics sequence store. 1. Read the Data: Data is read from an external genomic data source, such as GenBank. 2. Load the Data to Store: The data is then loaded into an AWS HealthOmics sequence store using the data loading SageMaker notebook. 3. Start Training Job: The train and deploy SageMaker notebook initiates a training job on Amazon SageMaker. 4. Read the Data from Sequence Store: The training job accesses data from the sequence store using the S3 access point of the sequence store. 5. Download Model Checkpoint: A model checkpoint from Hugging Face (HyenaDNA model) is downloaded. 6. Save Trained Model: The trained model is saved following the training process. 7. Deploy Trained Model: The trained model is then deployed using Amazon SageMaker, establishing a real-time endpoint. 8. Inference: Finally, the model performs inference tasks using the deployed SageMaker real-time endpoint.

  1. We start with genomic data. For the purposes of this blog post, we’re using a public non-reference Mouse genome from GenBank. The dataset is part of The Mouse Genomes Project and represents a consensus genome sequence of inbred mouse strains. This type of genomic data could readily be interchanged with proprietary datasets that you might be working with in your research.
  2. We use a SageMaker notebook to process the genomic files and to import these into a HealthOmics sequence store.
  3. A second SageMaker notebook is used to start the training job on SageMaker.
  4. Inside the managed training job in the SageMaker environment, the training job first downloads the mouse genome using the S3 URI supplied by HealthOmics.
  5. Then the training job retrieves the checkpoint weights of the HyenaDNA model from Hugging Face. These weights are pretrained on the human reference genome. This pretraining allows the model to understand and predict genomic sequences, providing a comprehensive baseline for further specialized training on a variety of genomic tasks.
  6. Using these resources, the HyenaDNA model is trained, where it uses the mouse genome to refine its parameters. After pre-training is complete and validation results are satisfactory, the trained model is saved to Amazon S3.
  7. Then we deploy that model as a SageMaker real-time inference endpoint.
  8. Lastly the model is tested against a set of known genome sequences using some inference API calls.

Data preparation and loading into sequence store

The initial step in our machine learning workflow focuses on preparing the data. We start by uploading the genomic sequences into a HealthOmics sequence store. Although FASTA files are the standard format for storing reference sequences, we convert these to FASTQ format. This conversion is carried out to better reflect the format expected to store the assembled data of a sequenced sample.

In the sample Jupyter notebook, we show how to download FASTA files from GenBank, convert them into FASTQ files, and then load them into a HealthOmics sequence store. You can skip this step if you already have your own genomic data in a sequence store.
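The FASTA to FASTQ conversion can be sketched in plain Python as follows. This is a simplified illustration and not necessarily how the sample notebook does it. FASTA has no base quality information, so the sketch fills the FASTQ quality line with a constant placeholder character ("I", which corresponds to a Phred score of 40 in the common Phred+33 encoding). The function names and the placeholder choice are assumptions.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_fastq(lines, quality_char="I"):
    """Convert FASTA text to FASTQ text using a constant placeholder quality."""
    records = []
    for header, sequence in parse_fasta(lines):
        quality = quality_char * len(sequence)  # one quality symbol per base
        records.append(f"@{header}\n{sequence}\n+\n{quality}\n")
    return "".join(records)


# Made-up two-record FASTA input for illustration only
fasta_lines = [">chr1 example", "ACGT", "AC", ">chr2", "GG"]
fastq_text = fasta_to_fastq(fasta_lines)
print(fastq_text)
```

For real genomes, you would read the lines from a file handle and write the output in a streaming fashion instead of building one large string in memory.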

Training on SageMaker

We use PyTorch and Amazon SageMaker script mode to train this model. Script mode’s compatibility with PyTorch was crucial, allowing us to use our existing scripts with minimal modifications. For the training, we extract the training data from the sequence store through the sequence store’s provided S3 URIs. You can, for example, use the boto3 library to obtain this S3 URI.

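import boto3

# Client for the AWS HealthOmics API
omics = boto3.client("omics")

seq_store_id = "4308389581"

# Look up the S3 access details and encryption key of the sequence store
seq_store_info = omics.get_sequence_store(id=seq_store_id)
s3_uri = seq_store_info["s3Access"]["s3Uri"]
s3_arn = seq_store_info["s3Access"]["s3AccessPointArn"]
key_arn = seq_store_info["sseConfig"]["keyArn"]
s3_uri, s3_arn, key_arn

# The read sets are located under the readSet/ prefix
S3_DATA_URI = f"{s3_uri}readSet/"
S3_DATA_URI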

When you provide this to the SageMaker estimator, the training job takes care of downloading the data from the sequence store through its S3 URI. Following Nguyen et al., we train on chromosomes 2, 4, 6, 8, X, and 14–19; cross-validate on chromosomes 1, 3, 12, and 13; and test on chromosomes 5, 7, and 9–11.
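The chromosome split described above can be written down as a small lookup, sketched here in Python. The names CHROMOSOME_SPLITS and split_for are hypothetical, and the training script in the repository may organize the split differently.

```python
# Chromosome assignment as described in the text above
CHROMOSOME_SPLITS = {
    "train": ["2", "4", "6", "8", "X"] + [str(n) for n in range(14, 20)],
    "validation": ["1", "3", "12", "13"],
    "test": ["5", "7"] + [str(n) for n in range(9, 12)],
}


def split_for(chromosome):
    """Return the split name for a chromosome label, or None if unassigned.

    Accepts labels with or without a leading "chr" (for example "chr14" or "14").
    """
    label = str(chromosome)
    if label.lower().startswith("chr"):
        label = label[3:]
    label = label.upper()
    for split_name, chromosomes in CHROMOSOME_SPLITS.items():
        if label in chromosomes:
            return split_name
    return None


print(split_for("chr14"), split_for("12"), split_for("10"))
```

Splitting by whole chromosome, instead of by random sequence, keeps the evaluation sequences physically separate from the training sequences.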

To maximize the training efficiency of our HyenaDNA model, we use distributed data parallel (DDP). DDP is a technique that facilitates the parallel processing of our training tasks across multiple GPUs. To efficiently implement DDP, we used the Hugging Face Accelerate library. Accelerate simplifies running distributed training by abstracting away the complexity typically associated with setting up DDP.

After you have defined your training script, you can configure and submit a SageMaker training job.

First, let’s define the hyperparameters, starting with model_checkpoint. This parameter refers to a Hugging Face model ID for a specific pre-trained model. Notably, the HyenaDNA model lineup includes checkpoints that can handle up to 1 million tokens. However, for demonstration purposes, we are using the hyenadna-small-32k-seqlen-hf model, which has a context window of 32,000 tokens, indicated by the max_length setting. You can select other model IDs and corresponding max_length settings to use models with smaller or larger context windows, depending on your computational needs and objectives.

The species parameter is set to mouse, specifying the type of organism the genomic training data represents.

hyperparameters = {
    "species" : "mouse",
    "epochs": 150,
    "model_checkpoint": MODEL_ID,
    "max_length": 32_000,
    "batch_size": 4,
    "learning_rate": 6e-4,
    "weight_decay" : 0.1,
    "log_level" : "INFO",
    "log_interval" : 100
}

Next, define what metrics, especially the training and validation perplexity, to capture from the training logs:

metric_definitions = [
    {"Name": "epoch", "Regex": "Epoch: ([0-9.]*)"},
    {"Name": "step", "Regex": "Step: ([0-9.]*)"},
    {"Name": "train_loss", "Regex": "Train Loss: ([0-9.e-]*)"},
    {"Name": "train_perplexity", "Regex": "Train Perplexity: ([0-9.e-]*)"},
    {"Name": "eval_loss", "Regex": "Eval Average Loss: ([0-9.e-]*)"},
    {"Name": "eval_perplexity", "Regex": "Eval Perplexity: ([0-9.e-]*)"}
]
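To check that these regular expressions capture what you expect before you submit a job, you can run them locally against sample log lines. The following sketch does that with the Python re module. The log lines are made up for illustration (the exact format depends on what the training script prints), and extract_metrics is a hypothetical helper, not part of the SageMaker SDK.

```python
import re

metric_definitions = [
    {"Name": "epoch", "Regex": "Epoch: ([0-9.]*)"},
    {"Name": "step", "Regex": "Step: ([0-9.]*)"},
    {"Name": "train_loss", "Regex": "Train Loss: ([0-9.e-]*)"},
    {"Name": "train_perplexity", "Regex": "Train Perplexity: ([0-9.e-]*)"},
    {"Name": "eval_loss", "Regex": "Eval Average Loss: ([0-9.e-]*)"},
    {"Name": "eval_perplexity", "Regex": "Eval Perplexity: ([0-9.e-]*)"},
]


def extract_metrics(log_line, definitions):
    """Apply each metric regex to a log line and return the captured values."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if not match or not match.group(1):
            continue
        try:
            found[definition["Name"]] = float(match.group(1))
        except ValueError:
            # The capture group matched something that isn't a number
            continue
    return found


# Hypothetical log lines; adjust to what your training script actually prints
train_line = "Epoch: 3 Step: 100 Train Loss: 1.1632 Train Perplexity: 3.2003"
eval_line = "Eval Average Loss: 1.2e-1 Eval Perplexity: 1.1275"

train_metrics = extract_metrics(train_line, metric_definitions)
eval_metrics = extract_metrics(eval_line, metric_definitions)
print(train_metrics)
print(eval_metrics)
```

If a metric is missing from the result, the text before the capture group in its regex most likely doesn't match the log output character for character.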

Finally, define a PyTorch estimator and submit a training job that refers to the data location obtained from the HealthOmics sequence store.

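from sagemaker.experiments.run import Run
from sagemaker.inputs import TrainingInput
from sagemaker.pytorch import PyTorch

hyenaDNA_estimator = PyTorch(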
    base_job_name=TRAINING_JOB_NAME,
    entry_point="train_hf_accelerate.py",
    source_dir="scripts/",
    instance_type="ml.g5.12xlarge",
    instance_count=1,
    image_uri=pytorch_image_uri,
    role=SAGEMAKER_EXECUTION_ROLE,
    hyperparameters=hyperparameters,
    metric_definitions=metric_definitions,
    sagemaker_session=sagemaker_session,
    distribution={"torch_distributed": {"enabled": True}},
    tags=[{"Key": "project", "Value": "genomics-model-pretraining"}],
    keep_alive_period_in_seconds=1800,
    tensorboard_output_config=tensorboard_output_config,
)

with Run(
    experiment_name=EXPERIMENT_NAME,
    sagemaker_session=sagemaker_session,
) as run:
    hyenaDNA_estimator.fit(
        {
            "data": TrainingInput(
                s3_data=S3_DATA_URI, input_mode="File"
            ),
        },
        wait=True,
    )

Results

In our training cycle for the model, we processed a dataset consisting of one mouse genome with 10,000 entries. The computational resources included a cluster configured with one ml.g5.12xlarge instance, which houses four NVIDIA A10G GPUs, each with 24 GB of GPU memory. The 32k sequence length model was trained using a batch size of four per GPU. With this setup, we completed 150 epochs to report the following results.

Evaluation metrics: The evaluation perplexity and loss graphs show a downward trend at the outset, which then plateaus. The initial steep decrease indicates that the model rapidly learned from the training data, improving its predictive performance. As training progressed, the rate of improvement slowed, as evidenced by the plateau, which is typical in the later stages of training as the model converges.

The image plots the evaluation loss of a HyenaDNA model training over a series of epochs. The overall trend suggests that the model's loss decreased significantly early in the training and reached a plateau, indicating potential convergence of the model training process.

The image plots the evaluation perplexity values of HyenaDNA model during its training over a sequence of epochs. This decreasing trend followed by stabilization indicates that the model's ability to predict or understand the data improved quickly initially and then reached a level of consistency as training progressed.

Training metrics: Similarly, the training perplexity and loss graphs indicate an initial sharp improvement followed by a gradual plateau. This shows that the model effectively learned from the data. The training loss’s slight fluctuations suggest that the model continued to fine-tune its parameters in response to the inherent complexities in the training dataset.

The image plots the training perplexity of the model over training steps. The perplexity decreases significantly early on, followed by a gradual decline and stabilization around 3.2. This behavior suggests that as training progresses, the model becomes increasingly efficient at predicting or understanding the training data, indicated by the decreasing perplexity values. The stabilization at a lower perplexity level indicates that the model has likely achieved a good level of generalization.
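Loss and perplexity are two views of the same quantity. Assuming the loss is the mean per-token cross-entropy measured in nats (the usual convention for PyTorch cross-entropy loss, but an assumption about the training script here), perplexity is its exponential. The following sketch shows the conversion in both directions; the function names are hypothetical.

```python
import math


def perplexity_from_loss(mean_cross_entropy):
    """Perplexity is the exponential of the mean per-token cross-entropy (nats)."""
    return math.exp(mean_cross_entropy)


def loss_from_perplexity(perplexity):
    """Inverse conversion: the mean cross-entropy implied by a perplexity value."""
    return math.log(perplexity)


# Loss implied by the training perplexity of about 3.2 reported above
implied_loss = loss_from_perplexity(3.2)
print(round(implied_loss, 4))
```

For context, DNA has four nucleotides, so a model that guesses uniformly among A, C, G, and T has a perplexity of 4. This assumes a vocabulary of only those four symbols, which is a simplification.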

Deployment

Upon the completion of training, we then deployed the model on a SageMaker real-time endpoint. SageMaker real-time endpoints provide an on-demand, scalable way to generate embeddings for genomic sequences.

In our SageMaker real-time endpoint setup, we need to adjust the default configurations to handle large payload sizes, specifically 32k context windows for both requests and responses. Because the default payload size of 6.5 MB isn’t sufficient, we’re increasing it to 60,000,000 bytes (about 57 MiB):

from sagemaker.pytorch import PyTorchModel

hyenaDNAModel = PyTorchModel(
    model_data=model_data,
    role=SAGEMAKER_EXECUTION_ROLE,
    image_uri=pytorch_deployment_uri,
    entry_point="inference.py",
    source_dir="scripts/",
    sagemaker_session=sagemaker_session,
    name=endpoint_name,
    # Raise the request and response size limits (in bytes) of the model server
    env={
        "TS_MAX_RESPONSE_SIZE": "60000000",
        "TS_MAX_REQUEST_SIZE": "60000000",
    },
)

# Deploy the real-time endpoint; the env settings defined on the model apply here
realtime_predictor = hyenaDNAModel.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.8xlarge",
    endpoint_name=endpoint_name,
)

By submitting a sequence to the endpoint, users can quickly receive the corresponding embeddings generated by HyenaDNA. These embeddings encapsulate the complex patterns and relationships learned during training, representing the genetic sequences in a form that is conducive to further analysis and predictive modeling. Here is an example of how to invoke the model.

import json
from sagemaker.deserializers import JSONDeserializer
from sagemaker.serializers import JSONSerializer

sample_genome_data = []
with open("./sample_mouse_data.json") as file:
    for line in file:
        sample_genome_data.append(json.loads(line))
len(sample_genome_data)

data = [sample_genome_data[0]]
realtime_predictor.serializer = JSONSerializer()
realtime_predictor.deserializer = JSONDeserializer()
realtime_predictor.predict(data=data)

When you submit a sample genomic sequence to the model, it returns the embeddings of that sequence:

{'embeddings': [[-0.50390625, 0.447265625,-1.03125, 0.546875, 0.50390625, -0.53125, 0.59375, 0.71875, 0.349609375, -0.404296875, -4.8125, 0.84375, 0.359375, 1.2265625,………]]}

Conclusion

We’ve shown how to pre-train a HyenaDNA model with a 32k context window and to produce embeddings that can be used for downstream predictive tasks. Using the techniques shown here you can also pre-train a HyenaDNA model with context windows of other sizes (for example, 1 million tokens) and on other genomic data (for example, proprietary genomic sequence data).

Pre-training genomic models on large, diverse datasets is a foundational step in preparing them for downstream tasks, such as identifying genetic variants linked to diseases or predicting gene expression levels. In this blog post, you’ve learned how AWS facilitates this pre-training process by providing a scalable and cost-efficient infrastructure through HealthOmics and SageMaker. Looking forward, researchers can use these pre-trained models to fast-track their projects, fine-tuning them with specific datasets to gain deeper insights into genetic research.

To explore further details and try your hand at using these resources, we invite you to visit our GitHub repository. Additionally, we encourage you to learn more by visiting the Amazon SageMaker documentation and the AWS HealthOmics documentation.


About the authors

Shamika Ariyawansa, serving as a Senior AI/ML Solutions Architect in the Global Healthcare and Life Sciences division at Amazon Web Services (AWS), specializes in Generative AI. He assists customers in integrating Generative AI into their projects, emphasizing the adoption of Large Language Models (LLMs) for healthcare and life sciences domains with a focus on distributed training. Beyond his professional commitments, Shamika passionately pursues skiing and off-roading adventures.

Simon Handley, PhD, is a Senior AI/ML Solutions Architect in the Global Healthcare and Life Sciences team at Amazon Web Services. He has more than 25 years of experience in biotechnology and machine learning and is passionate about helping customers solve their machine learning and genomic challenges. In his spare time, he enjoys horseback riding and playing ice hockey.


Falcon 2 11B is now available on Amazon SageMaker JumpStart


Today, we are excited to announce that the first model in the next generation Falcon 2 family, the Falcon 2 11B foundation model (FM) from Technology Innovation Institute (TII), is available through Amazon SageMaker JumpStart to deploy and run inference.

Falcon 2 11B is a dense decoder model trained on a 5.5 trillion token dataset, and it supports multiple languages. The Falcon 2 11B model is available on SageMaker JumpStart, a machine learning (ML) hub that provides access to built-in algorithms, FMs, and pre-built ML solutions that you can deploy quickly to get started with ML faster.

In this post, we walk through how to discover, deploy, and run inference on the Falcon 2 11B model using SageMaker JumpStart.

What is the Falcon 2 11B model

Falcon 2 11B is the first FM released by TII under their new artificial intelligence (AI) model series Falcon 2. It’s a next generation model in the Falcon family: a more efficient and accessible large language model (LLM) with 11 billion parameters, trained on a 5.5 trillion token dataset primarily consisting of web data from RefinedWeb. It’s built on a causal decoder-only architecture, making it powerful for auto-regressive tasks. It’s equipped with multilingual capabilities and can seamlessly tackle tasks in English, French, Spanish, German, Portuguese, and other languages for diverse scenarios.

Falcon 2 11B is a raw, pre-trained model, which can be a foundation for more specialized tasks, and also allows you to fine-tune the model for specific use cases such as summarization, text generation, chatbots, and more.

Falcon 2 11B is supported by the SageMaker TGI Deep Learning Container (DLC) which is powered by Text Generation Inference (TGI), an open source, purpose-built solution for deploying and serving LLMs that enables high-performance text generation using tensor parallelism and dynamic batching.

The model is available under the TII Falcon License 2.0, the permissive Apache 2.0-based software license, which includes an acceptable use policy that promotes the responsible use of AI.

What is SageMaker JumpStart

SageMaker JumpStart is a powerful feature within the SageMaker ML platform that provides ML practitioners a comprehensive hub of publicly available and proprietary FMs. With this managed service, ML practitioners get access to a growing list of cutting-edge models from leading model hubs and providers that they can deploy to dedicated SageMaker instances within a network isolated environment, and customize models using SageMaker for model training and deployment.

You can discover and deploy the Falcon 2 11B model with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The Falcon 2 11B model is available today for inferencing from 22 AWS Regions where SageMaker JumpStart is available. Falcon 2 11B requires g5 or p4 instances.

Prerequisites

To try out the Falcon 2 model using SageMaker JumpStart, you need the following prerequisites:

  • An AWS account that will contain all your AWS resources.
  • An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, refer to Identity and Access Management for Amazon SageMaker.
  • Access to SageMaker Studio or a SageMaker notebook instance or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.

Discover Falcon 2 11B in SageMaker JumpStart

You can access the FMs through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an IDE that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane or by choosing JumpStart from the Home page.

From the SageMaker JumpStart landing page, you can find pre-trained models from the most popular model hubs. You can search for Falcon in the search box. The search results will list the Falcon 2 11B text generation model and other Falcon model variants available.

You can choose the model card to view details about the model such as license, data used to train, and how to use the model. You will also find two options, Deploy and Preview notebooks, to deploy the model and create an endpoint.

Deploy the model in SageMaker JumpStart

Deployment starts when you choose Deploy. SageMaker performs the deploy operations on your behalf using the IAM SageMaker role assigned in the deployment configurations. After deployment is complete, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.

Falcon 2 11B text generation

To deploy using the SDK, we start by selecting the Falcon 2 11B model, specified by the model_id with value huggingface-llm-falcon2-11b. You can deploy the model on SageMaker with the following code. You can deploy other JumpStart models the same way by using their own model IDs.

from sagemaker.jumpstart.model import JumpStartModel 
accept_eula = False
model = JumpStartModel(model_id="huggingface-llm-falcon2-11b") 
predictor = model.deploy(accept_eula=accept_eula)

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. The recommended instance types for this model endpoint usage are ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, or ml.p4d.24xlarge. Make sure you have the account-level service limit for one or more of these instance types to deploy this model. For more information, refer to Requesting a quota increase.
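As a minimal sketch of overriding the defaults, the following passes an explicit instance type to JumpStartModel. The helper name deploy_falcon and the RECOMMENDED_INSTANCE_TYPES constant are illustrative and not part of the SDK, and the sketch assumes the instance_type argument behaves as in current versions of the SageMaker Python SDK.

```python
# Instance types the post recommends for this endpoint.
RECOMMENDED_INSTANCE_TYPES = (
    "ml.g5.12xlarge",
    "ml.g5.24xlarge",
    "ml.g5.48xlarge",
    "ml.p4d.24xlarge",
)


def deploy_falcon(instance_type="ml.g5.12xlarge", accept_eula=False):
    """Deploy Falcon 2 11B on an explicitly chosen instance type (hypothetical helper)."""
    if instance_type not in RECOMMENDED_INSTANCE_TYPES:
        raise ValueError(f"{instance_type} is not a recommended instance type")

    # Imported lazily so the validation above works without the SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(
        model_id="huggingface-llm-falcon2-11b",
        instance_type=instance_type,
    )
    return model.deploy(accept_eula=accept_eula)
```

Calling deploy_falcon("ml.g5.24xlarge") would create a billable endpoint, so check your service quota for the chosen instance type first.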

After it is deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {
    "inputs": "User: Hello!\nFalcon: ",
    "parameters": {
        "max_new_tokens": 100, 
        "top_p": 0.9, 
        "temperature": 0.6
    },
}
predictor.predict(payload)
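Because every example in this post sends the same payload shape, it can help to build payloads with a small helper. The following is a sketch only: build_payload, extract_text, and DEFAULT_PARAMETERS are hypothetical names, and the defaults are copied from the payload above.

```python
# Defaults copied from the example payload in this post.
DEFAULT_PARAMETERS = {"max_new_tokens": 100, "top_p": 0.9, "temperature": 0.6}


def build_payload(prompt, **overrides):
    """Merge per-request overrides into the default generation parameters."""
    parameters = {**DEFAULT_PARAMETERS, **overrides}
    if parameters["max_new_tokens"] <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {"inputs": prompt, "parameters": parameters}


def extract_text(response):
    """Return the generated text from a response shaped like [{"generated_text": ...}]."""
    return response[0]["generated_text"].strip()


payload = build_payload("User: Hello!\nFalcon: ", max_new_tokens=50)
print(payload)
```

You would then call extract_text(predictor.predict(payload)), which mirrors how the later examples read the generated_text field.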

Example prompts

You can interact with the Falcon 2 11B model like any standard text generation model, where the model processes an input sequence and outputs predicted next words in the sequence. In this section, we provide some example prompts and sample output.

Text generation

The following is an example prompt for text generated by the model:

payload = { 
      "inputs": "Building a website can be done in 10 simple steps:", 
      "parameters": { 
          "max_new_tokens": 80,
          "top_k": 10,
          "do_sample": True,
          "return_full_text": False
          }, 
} 
response = predictor.predict(payload)[0]["generated_text"].strip() 
print(response)

The following is the output:

1. Decide what the site will be about
2. Research the topic 
3. Sketch the layout and design 
4. Register the domain name 
5. Set up hosting 
6. Install WordPress 
7. Choose a theme 
8. Customize theme colors, typography and logo  
9. Add content  
10. Test and finalize

Code generation

Using the preceding example, we can use code generation prompts as follows:

payload = { 
      "inputs": "Write a function in Python to write a json file:", 
      "parameters": { 
          "max_new_tokens": 300,
          "do_sample": True,
          "return_full_text": False
          }, 
} 
response = predictor.predict(payload)[0]["generated_text"].strip() 
print(response)

The code uses Falcon 2 11B to generate a Python function that writes a JSON file. It defines a payload dictionary with the input prompt "Write a function in Python to write a json file:" and some parameters to control the generation process, like the maximum number of tokens to generate and whether to enable sampling. It then sends this payload to the SageMaker predictor, receives the generated text response, and prints it to the console. The printed output should be the Python function for writing a JSON file, as requested in the prompt.

The following is the output:

```json
{
  "name": "John",
  "age": 30,
  "city": "New York"
}
```
```python
import json

def write_json_file(file_name, json_obj):
    try:
        with open(file_name, 'w', encoding="utf-8") as outfile:
            json.dump(json_obj, outfile, ensure_ascii=False, indent=4)
        print("Created json file {}".format(file_name))
    except Exception as e:
        print("Error occurred: ",str(e))

# Example Usage
write_json_file('data.json', {
  "name": "John",
  "age": 30,
  "city": "New York"
})
```

The output from the code generation defines the write_json_file function, which takes a file name and a Python object and writes the object to the file as JSON. The generated code uses Python’s built-in json module and handles exceptions. An example usage is provided at the bottom, writing a dictionary with name, age, and city keys to a file named data.json. The output also shows the expected JSON file content, illustrating the model’s natural language processing (NLP) and code generation capabilities.
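The model returns the generated code wrapped in fenced blocks alongside other text, so you may want to pull out only the Python portion before saving or reviewing it. The following sketch assumes the output uses triple-backtick fences with an optional language tag, as in the sample above; extract_code_blocks is a hypothetical helper.

```python
import re

# Three backticks, built programmatically so this listing does not contain a literal fence.
FENCE = "`" * 3

# An opening fence with an optional language tag, then the body up to the next closing fence.
BLOCK_PATTERN = re.compile(FENCE + r"(\w*)\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text, language=None):
    """Return the bodies of fenced blocks, optionally filtered by language tag."""
    blocks = []
    for tag, body in BLOCK_PATTERN.findall(text):
        if language is None or tag == language:
            blocks.append(body.strip())
    return blocks
```

For example, extract_code_blocks(response, "python") would return a list containing the generated write_json_file source. Always review generated code before running it.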

Sentiment analysis

You can perform sentiment analysis using a prompt like the following with Falcon 2 11B:

payload = {
"inputs": """
Tweet: "I am so excited for the weekend!"
Sentiment: Positive

Tweet: "Why does traffic have to be so terrible?"
Sentiment: Negative

Tweet: "Just saw a great movie, would recommend it."
Sentiment: Positive

Tweet: "According to the weather report, it will be cloudy today."
Sentiment: Neutral

Tweet: "This restaurant is absolutely terrible."
Sentiment: Negative

Tweet: "I love spending time with my family."
Sentiment:""",

"parameters": {
    "max_new_tokens": 2,
    "do_sample": True,
    "return_full_text": False 
},
}
response = predictor.predict(payload)[0]["generated_text"].strip()
print(response)

The following is the output:

Positive

The sentiment analysis code uses few-shot prompting: the prompt gives Falcon 2 11B several example tweets with their sentiment labels (positive, negative, neutral). The last tweet (“I love spending time with my family”) is left without a label so that the model generates the classification itself. The max_new_tokens parameter is set to 2, which limits the output to a short completion, here just the sentiment label. With do_sample set to True, the model samples from its output distribution rather than always choosing the most likely token, so repeated runs can return different labels. The labeled examples in the prompt, rather than any additional training, are what steer the model toward the desired output format.
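The few-shot prompt above can also be assembled from a list of labeled examples, which makes it easier to swap examples in and out. This is a minimal sketch; build_few_shot_prompt is a hypothetical helper, and the examples are taken from the prompt above.

```python
def build_few_shot_prompt(examples, query):
    """Format labeled (tweet, sentiment) pairs followed by an unlabeled query tweet."""
    parts = [f'Tweet: "{tweet}"\nSentiment: {label}\n' for tweet, label in examples]
    # The final entry has no label, so the model's next tokens are the classification.
    parts.append(f'Tweet: "{query}"\nSentiment:')
    return "\n".join(parts)


examples = [
    ("I am so excited for the weekend!", "Positive"),
    ("Why does traffic have to be so terrible?", "Negative"),
    ("According to the weather report, it will be cloudy today.", "Neutral"),
]
prompt = build_few_shot_prompt(examples, "I love spending time with my family.")
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 2, "do_sample": True, "return_full_text": False},
}
print(prompt)
```

Including at least one example per label gives the model a pattern for every class it might need to output.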

Question answering

You can also use a question answering prompt like the following with Falcon 2 11B:

# Question answering
payload = {
    "inputs": (
        "Respond to the question: How did the development of transportation systems, "
        "such as railroads and steamships, impact global trade and cultural exchange?"
    ),
    "parameters": {
        "max_new_tokens": 225,
        "do_sample": True,
        "return_full_text": False
    },
}
response = predictor.predict(payload)[0]["generated_text"].strip()
print(response)

The following is the output:

The development of transportation systems such as railroads and steamships had a significant impact on global trade and cultural exchange. 
These modes of transport allowed goods and people to travel over longer distances and at a faster pace than ever before. As a result, 
goods could be transported across great distances, leading to an increase in the volume of trade between countries. 
This, in turn, led to the development of more diverse economic systems, the growth of new industries, and ultimately, 
the establishment of a more integrated global economy. Moreover, these advancements facilitated the dissemination of knowledge and culture, 
and enabled individuals to exchange ideas, customs, and technologies with other countries. This facilitated the exchange of ideas, customs and 
technologies which helped to foster interconnectedness between various societies globally. Overall, the development of transportation systems 
played a critical role in shaping the world economy and promoting collaboration and exchange of ideas among different cultures.

The user sends an input question or prompt to Falcon 2 11B, along with parameters like the maximum number of tokens to generate and whether to enable sampling. The model then generates a relevant response based on its understanding of the question and its training data. To continue the conversation, you can send a follow-up question in a new request in the same way.

Multilingual capabilities

You can use languages such as German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish with Falcon 2 11B. In the following code, we demonstrate the model’s multilingual capabilities:

# Multilingual Capabilities
payload = {
    "inputs": "Usuario: Hola!\n Asistente:",
    "parameters": {
        "max_new_tokens": 200,
        "do_sample": True,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False
    },
}
response = predictor.predict(payload)[0]["generated_text"].strip()
print(response)

The following is the output:

Hola! ¿En qué puedo ayudarte?
Usuario: Quiero aprender a programar en Python. ¿Dónde puedo empezar?
Asistente: Hay muchas formas de aprender a programar en Python. Una buena opción es empezar 
por leer un libro como "Python for Everybody" o "Learning Python" que te enseñan los conceptos básicos de la programación en Python. 
También puedes encontrar muchos tutoriales en línea en sitios como Codecademy, Udemy o Coursera. Además, hay muchos recursos en línea 
como Stack Overflow o Python.org que te pueden ayudar a resolver dudas y aprender más sobre el lenguaje.

Mathematics and reasoning

Falcon 2 11B also shows strength in mathematical accuracy:

payload = {
    "inputs": (
        "I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. "
        "How many dollars did I get back? Explain first before answering."
    ),
    "parameters": {
        "max_new_tokens": 200,
        "do_sample": True,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False
    },
}
response = predictor.predict(payload)[0]["generated_text"].strip()
print(response)

The following is the output:

Sure, I'll explain the process first before giving the answer.

You bought ice cream for 6 kids, and each cone cost $1.25. To find out the total cost, 
we need to multiply the cost per cone by the number of cones.

Total cost = Cost per cone × Number of cones
Total cost = $1.25 × 6
Total cost = $7.50

You paid with a $10 bill, so to find out how much change you received, 
we need to subtract the total cost from the amount you paid.

Change = Amount paid - Total cost
Change = $10 - $7.50
Change = $2.50

So, you received $2.50 in change.

The code shows Falcon 2 11B’s capability to comprehend natural language prompts involving mathematical reasoning, break them down into logical steps, and generate human-like explanations and solutions.
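Because language models can make arithmetic mistakes, it's worth checking a result like this independently. The following sketch reproduces the calculation with Python's decimal module, which avoids floating-point rounding on currency values; the variable names are illustrative.

```python
from decimal import Decimal

cones = 6
price_per_cone = Decimal("1.25")
amount_paid = Decimal("10.00")

total_cost = cones * price_per_cone  # 6 x 1.25
change = amount_paid - total_cost    # 10.00 - 7.50

print(f"Total cost: ${total_cost}")  # Total cost: $7.50
print(f"Change: ${change}")          # Change: $2.50
```

The result matches the model's answer of $2.50.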

Clean up

After you’re done running the notebook, delete all the resources you created in the process so your billing is stopped. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()
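If you run your experiments in a script rather than a notebook, you can make cleanup harder to forget by wrapping the predictor in a context manager. This is a sketch under that assumption; temporary_endpoint is a hypothetical helper that relies only on the two delete methods shown above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(predictor):
    """Yield the predictor, then delete its model and endpoint even if an error occurs."""
    try:
        yield predictor
    finally:
        # Same order as the cleanup code in this post.
        predictor.delete_model()
        predictor.delete_endpoint()
```

You would use it as `with temporary_endpoint(model.deploy()) as predictor:` and run your inference calls inside the block, so the resources are deleted when the block exits.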

Conclusion

In this post, we showed you how to get started with Falcon 2 11B in SageMaker Studio and deploy the model for inference. Because FMs are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case.

Visit SageMaker JumpStart in SageMaker Studio now to get started. For more information, refer to SageMaker JumpStart, JumpStart Foundation Models, and Getting started with Amazon SageMaker JumpStart.


About the Authors

Supriya Puragundla is a Senior Solutions Architect at AWS. She helps key customer accounts on their generative AI and AI/ML journeys. She is passionate about data-driven AI, and her area of depth is ML and generative AI.

Armando Diaz is a Solutions Architect at AWS. He focuses on generative AI, AI/ML, and data analytics. At AWS, Armando helps customers integrate cutting-edge generative AI capabilities into their systems, fostering innovation and competitive advantage. When he’s not at work, he enjoys spending time with his wife and family, hiking, and traveling the world.

Niithiyn Vijeaswaran is an Enterprise Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.

Avan Bala is a Solutions Architect at AWS. His area of focus is AI for DevOps and machine learning. He holds a Bachelor’s degree in Computer Science with a minor in Mathematics and Statistics from the University of Maryland. Avan is currently working with the Enterprise Engaged East Team and likes to specialize in projects about emerging AI technology. When not working, he likes to play basketball, go on hikes, and try new foods around the country.

Dr. Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.

Hemant Singh is an Applied Scientist with experience in Amazon SageMaker JumpStart. He got his master’s from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He has experience in working on a diverse range of machine learning problems within the domain of natural language processing, computer vision, and time series analysis.

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

The General Data Protection Regulation (GDPR) right to be forgotten, also known as the right to erasure, gives individuals the right to request the deletion of their personally identifiable information (PII) data held by organizations. This means that individuals can ask companies to erase their personal data from their systems and from the systems of any third parties with whom the data was shared.

Amazon Bedrock is a fully managed service that makes foundational models (FMs) from leading artificial intelligence (AI) companies and Amazon available through an API, so you can choose from a wide range of FMs to find the model that’s best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the Amazon Web Services (AWS) tools without having to manage infrastructure.

FMs are trained on vast quantities of data, allowing them to be used to answer questions on a variety of subjects. However, if you want to use an FM to answer questions about your private data that you have stored in your Amazon Simple Storage Service (Amazon S3) bucket, you need to use a technique known as Retrieval Augmented Generation (RAG) to provide relevant answers for your customers.

Knowledge Bases for Amazon Bedrock is a fully managed RAG capability that allows you to customize FM responses with contextual and relevant company data. Knowledge Bases for Amazon Bedrock automates the end-to-end RAG workflow, including ingestion, retrieval, prompt augmentation, and citations, so you don’t have to write custom code to integrate data sources and manage queries.

Many organizations are building generative AI applications and powering them with RAG-based architectures to help avoid hallucinations and respond to the requests based on their company-owned proprietary data, including personally identifiable information (PII) data.

In this post, we discuss the challenges associated with RAG architectures in responding to GDPR right to be forgotten requests, how to build a GDPR compliant RAG architecture pattern using Knowledge Bases for Amazon Bedrock, and actionable best practices for organizations to respond to the right to be forgotten request requirements of the GDPR for data stored in vector datastores.

Who does GDPR apply to?

The GDPR applies to all organizations established in the EU and to organizations, whether or not established in the EU, that process the personal data of EU individuals in connection with either the offering of goods or services to data subjects in the EU or the monitoring of behavior that takes place within the EU.

The following are key terms used when discussing the GDPR:

  • Data subject – An identifiable living person and resident in the EU or UK, on whom personal data is held by a business or organization or service provider.
  • Processor – The entity that processes the data on the instructions of the controller (for example, AWS).
  • Controller – The entity that determines the purposes and means of processing personal data (for example, an AWS customer).
  • Personal data – Information relating to an identified or identifiable person, including names, email addresses, and phone numbers.

Challenges and considerations with RAG architectures

Typical RAG architecture at a high level involves three stages:

  1. Source data pre-processing
  2. Generating embeddings using an embedding LLM
  3. Storing the embeddings in a vector store.

Challenges associated with these stages involve not knowing all touchpoints where data is persisted, maintaining a data pre-processing pipeline for document chunking, choosing a chunking strategy, vector database, and indexing strategy, generating embeddings, and any manual steps to purge data from vector stores and keep it in sync with source data. The following diagram depicts a high-level RAG architecture.

Because Knowledge Bases for Amazon Bedrock is a fully managed RAG solution, no customer data is stored within the Amazon Bedrock service account permanently, and request details without prompts or responses are logged in AWS CloudTrail. Model providers can’t access customer data in the deployment account. Crucially, if you delete data from the source S3 bucket, it’s automatically removed from the underlying vector store after syncing the knowledge base.

However, be aware that the service account keeps the data for eight days; after that, it will be purged from the service account. This data is maintained securely with server-side encryption (SSE) using a service key, and optionally using a customer-provided key. If the data needs to be purged immediately from the service account, you can contact the AWS team to do so. This streamlined approach simplifies the GDPR right to be forgotten compliance for generative AI applications.

When calling knowledge bases, using the RetrieveAndGenerate API, Knowledge Bases for Amazon Bedrock takes care of managing sessions and memory on your behalf. This data is SSE encrypted by default, and optionally encrypted using a customer-managed key (CMK). Data to manage sessions is automatically purged after 24 hours.

The following solution discusses a reference architecture pattern using Knowledge Bases for Amazon Bedrock and best practices to support your data subject’s right to be forgotten request in your organization.

Solution approach: Simplified RAG implementation using Knowledge Bases for Amazon Bedrock

With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for RAG. Access to additional data helps the model generate more relevant, context-specific, and accurate responses without continuously retraining the FM. Information retrieved from the knowledge base comes with source attribution to improve transparency and minimize hallucinations.

Knowledge Bases for Amazon Bedrock manages the end-to-end RAG workflow for you. You specify the location of your data, select an embedding model to convert the data into vector embeddings, and have Knowledge Bases for Amazon Bedrock create a vector store in your account to store the vector data. When you select this option (available only in the console), Knowledge Bases for Amazon Bedrock creates a vector index in Amazon OpenSearch Serverless in your account, removing the need to do so yourself.

Vector embeddings include the numeric representations of text data within your documents. Each embedding aims to capture the semantic or contextual meaning of the data. Amazon Bedrock takes care of creating, storing, managing, and updating your embeddings in the vector store, and it verifies that your data is in sync with your vector store. The following diagram depicts a simplified architecture using Knowledge Bases for Amazon Bedrock:

Prerequisites to create a knowledge base

Before you can create a knowledge base, you must complete the following prerequisites.

Data preparation

Before creating a knowledge base using Knowledge Bases for Amazon Bedrock, it’s essential to prepare the data to augment the FM in a RAG implementation. In this example, we used a simple curated .csv file which contains customer PII information that needs to be deleted to respond to a GDPR right to be forgotten request by the data subject.

Configure an S3 bucket

You’ll need to create an S3 bucket and make it private. Amazon S3 provides several encryption options for securing the data at rest and in transit. Optionally, you can enable bucket versioning as a mechanism to check multiple versions of the same file. For this example, we created a bucket with versioning enabled with the name bedrock-kb-demo-gdpr. After you create the bucket, upload the .csv file to the bucket. The following screenshot shows what the upload looks like when it’s complete.
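If you prefer to apply these bucket settings with the AWS SDK for Python (Boto3) instead of the console, the following sketch blocks public access and enables versioning on an existing bucket. The helper names are hypothetical; the two S3 client calls are standard Boto3 operations, and the optional s3_client argument exists only so the function can be tested without AWS credentials.

```python
def public_access_block_config():
    """Settings that block every form of public access to the bucket."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def harden_bucket(bucket_name, s3_client=None):
    """Make an existing bucket private and turn on versioning (hypothetical helper)."""
    if s3_client is None:
        import boto3  # imported lazily; requires AWS credentials at call time

        s3_client = boto3.client("s3")
    s3_client.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration=public_access_block_config(),
    )
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )
```

For the bucket in this example, you would call harden_bucket("bedrock-kb-demo-gdpr"). Keep in mind that with versioning enabled, earlier object versions that contain PII remain in the bucket until you delete them.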

Select the uploaded file, and from the Actions dropdown, choose the Query with S3 Select option to query the .csv data using SQL and verify that the data was loaded correctly.

The query in the following screenshot displays the first five records from the .csv file. In this demonstration, let’s assume that you need to remove the data related to a particular customer, for example, the customer information pertaining to the email address art@venere.org.

Steps to create a knowledge base

With the prerequisites in place, the next step is to use Knowledge Bases for Amazon Bedrock to create a knowledge base.

  1. On the Amazon Bedrock console, select Knowledge Base under Orchestration in the left navigation pane.
  2. Choose Create Knowledge base.
  3. For Knowledge base name, enter a name.
  4. For Runtime role, select Create and use a new service role, enter a service role name, and choose Next.
  5. In the next stage, to configure the data source, enter a data source name and point to the S3 bucket created in the prerequisites.
  6. Expand the Advanced settings section and select Use default KMS key and then select Default chunking from Chunking strategy. Choose Next.
  7. Choose the embeddings model in the next screen. In this example we chose Titan Embeddings G1-Text v1.2.
  8. For Vector database, choose Quick create a new vector store – Recommended to set up an OpenSearch Serverless vector store on your behalf. Leave all the other options as default.
  9. Choose Review and Create and select Create knowledge base in the next screen which completes the knowledge base setup.
  10. Review the summary page, select the Data source and choose Sync. This begins the process of converting the data stored in the S3 bucket into vector embeddings in your OpenSearch Serverless vector collection.
      Note: The syncing operation can take minutes to hours to complete, based on the size of the dataset stored in your S3 bucket. During the sync operation, Amazon Bedrock downloads documents in your S3 bucket, divides them into chunks (we opted for the default strategy in this post), generates the vector embedding, and stores the embedding in your OpenSearch Serverless vector collection. When the initial sync is complete, the data source status will change to Ready.
  11. Now you can use your knowledge base. We use the Test knowledge base feature of Amazon Bedrock, choose the Anthropic Claude 2.1 model, and ask it a question about a sample customer.

We’ve demonstrated how to use Knowledge Bases for Amazon Bedrock and conversationally query the data using the knowledge base test feature. The query operation can also be done programmatically through the knowledge base API and AWS SDK integrations from within a generative AI application.
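As a sketch of the programmatic route, the following calls the RetrieveAndGenerate API through the Boto3 bedrock-agent-runtime client. The helper names are hypothetical, and the knowledge base ID and model ARN are values you supply from your own account; the optional client argument is there only so the function can be exercised without AWS credentials.

```python
def build_rag_request(question, knowledge_base_id, model_arn):
    """Build the keyword arguments for a RetrieveAndGenerate call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def ask_knowledge_base(question, knowledge_base_id, model_arn, client=None):
    """Query the knowledge base and return the generated answer text."""
    if client is None:
        import boto3  # imported lazily; requires AWS credentials at call time

        client = boto3.client("bedrock-agent-runtime")
    request = build_rag_request(question, knowledge_base_id, model_arn)
    response = client.retrieve_and_generate(**request)
    return response["output"]["text"]
```

Running the same question before and after a deletion and sync gives you a repeatable check that the data subject's information is no longer returned.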

Delete customer information

In the sample prompt, we were able to retrieve the customer’s PII, which was stored as part of the source dataset, using the email address. To respond to GDPR right to be forgotten requests, the next sequence of steps demonstrates how deleting customer data at the source removes the information from the generative AI application powered by Knowledge Bases for Amazon Bedrock.

  1. Delete the customer information from the source .csv file and re-upload the file to the S3 bucket. The following snapshot of querying the .csv file using S3 Select shows that the customer information associated with the email attribute art@venere.org was not returned in the results.
  2. Re-sync the knowledge base data source from the Amazon Bedrock console.
  3. After the sync operation is complete and the data source status is Ready, test the knowledge base again using the prompt used earlier to verify if the customer PII information is returned in the response.
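The deletion and re-sync steps above can also be scripted. The following sketch removes the rows for one email address from CSV text and then starts an ingestion job through the Boto3 bedrock-agent client. The helper names are hypothetical, and the column name email is an assumption about the sample file, so adjust it to match your data.

```python
import csv
import io


def remove_data_subject(csv_text, email, email_column="email"):
    """Return the CSV text without rows whose email column matches the data subject."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [row for row in reader if row[email_column].strip().lower() != email.lower()]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()


def resync_data_source(knowledge_base_id, data_source_id, client=None):
    """Start a knowledge base sync and return the ingestion job ID."""
    if client is None:
        import boto3  # imported lazily; requires AWS credentials at call time

        client = boto3.client("bedrock-agent")
    response = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    return response["ingestionJob"]["ingestionJobId"]
```

After you upload the filtered file back to the S3 bucket and the ingestion job completes, you can repeat the earlier test prompt to confirm the information is gone.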

We were able to successfully demonstrate that after the customer PII information was removed from the source in the S3 bucket, the related entries from the knowledge base are automatically deleted after the sync operation. We can also confirm that the associated vector embeddings stored in OpenSearch Serverless collection were cleared by querying from the OpenSearch dashboard using dev tools.

Note: In some RAG-based architectures, session history will be persisted in an external database such as Amazon DynamoDB. It’s important to evaluate if this session history contains PII data and develop a plan to remove the data if necessary.

Audit tracking

To support GDPR compliance efforts, organizations should consider implementing an audit control framework to record right to be forgotten requests. This will help with your audit requests and provide the ability to roll back in case of accidental deletions observed during the quality assurance process. It’s important to maintain the list of users and systems that might be impacted during this process to maintain effective communication. Also consider storing the metadata of the files being loaded in your knowledge bases for effective tracking. Example columns include knowledge base name, File Name, Date of sync, Modified User, PII Check, Delete requested by, and so on. Amazon Bedrock will write API actions to AWS CloudTrail, which can also be used for audit tracking.
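One lightweight way to capture the example columns is a small record type that you serialize to a log or table. The following is an illustrative sketch only; the class, field names, and JSON Lines format are assumptions rather than a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ErasureAuditRecord:
    """One row of audit metadata for a right to be forgotten request."""

    knowledge_base_name: str
    file_name: str
    modified_user: str
    delete_requested_by: str
    pii_check: bool
    date_of_sync: str


def new_audit_record(knowledge_base_name, file_name, modified_user,
                     delete_requested_by, pii_check):
    """Create a record stamped with the current UTC time."""
    return ErasureAuditRecord(
        knowledge_base_name=knowledge_base_name,
        file_name=file_name,
        modified_user=modified_user,
        delete_requested_by=delete_requested_by,
        pii_check=pii_check,
        date_of_sync=datetime.now(timezone.utc).isoformat(),
    )


def to_json_line(record):
    """Serialize a record as one JSON line, suitable for appending to a log file."""
    return json.dumps(asdict(record), sort_keys=True)
```

Avoid storing the data subject's PII in the audit record itself; a request or ticket identifier is usually enough to trace the deletion.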

Some customers might need to persist the Amazon CloudWatch Logs to support their internal policies. By default, request details without prompts or responses are logged in CloudTrail and Amazon CloudWatch. However, customers can enable model invocation logging, which can store PII. You can help safeguard sensitive data that’s ingested by CloudWatch Logs by using log group data protection policies. These policies let you audit and mask sensitive data that appears in log events ingested by the log groups in your account. When you create a data protection policy, sensitive data that matches the data identifiers (for example, PII) you’ve selected is masked at egress points, including CloudWatch Logs Insights, metric filters, and subscription filters. Only users who have the logs:Unmask IAM permission can view unmasked data. You can also use custom data identifiers to create data identifiers tailored to your specific use case. There are many methods customers can employ to detect and purge such data; complete implementation details are beyond the scope of this post.
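To give a sense of what a log group data protection policy looks like, the following sketch builds a policy document with one audit statement and one masking statement, and applies it with the Boto3 CloudWatch Logs client. The helper names are hypothetical, and the policy layout reflects our understanding of the CloudWatch Logs policy syntax, so verify it against the current documentation before use.

```python
import json

# Managed data identifier for email addresses.
EMAIL_IDENTIFIER = "arn:aws:dataprotection::aws:data-identifier/EmailAddress"


def build_data_protection_policy(name, data_identifiers):
    """Build a policy that audits and masks the given data identifiers."""
    identifiers = list(data_identifiers)
    return {
        "Name": name,
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": "audit",
                "DataIdentifier": identifiers,
                # An empty FindingsDestination means findings are not sent anywhere.
                "Operation": {"Audit": {"FindingsDestination": {}}},
            },
            {
                "Sid": "redact",
                "DataIdentifier": identifiers,
                "Operation": {"Deidentify": {"MaskConfig": {}}},
            },
        ],
    }


def apply_policy(log_group_identifier, policy, client=None):
    """Attach the policy to a log group (hypothetical helper)."""
    if client is None:
        import boto3  # imported lazily; requires AWS credentials at call time

        client = boto3.client("logs")
    client.put_data_protection_policy(
        logGroupIdentifier=log_group_identifier,
        policyDocument=json.dumps(policy),
    )
```

Both statements list the same identifiers so that every match that is audited is also masked.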

Data discovery and findability

Findability is an important step of the process. Organizations need mechanisms to find the data under consideration quickly and efficiently so they can respond in time. For more information, refer to the FAIR blog and 5 Actionable steps to GDPR Compliance. In this example, you can use Amazon Macie to identify the PII data in Amazon S3.

Backup and restore

Data from underlying vector stores can be transferred, exported, or copied to different AWS services or outside of the AWS cloud. Organizations should have an effective governance process to detect and remove data to align with the GDPR compliance requirement. However, this is beyond the scope of this post. It’s the responsibility of the customer to remove the data from the underlying backups. It’s good practice to keep the retention period at 29 days (if applicable) so that the backups are cleared after 30 days. Organizations can also set the backup schedule to a certain date (for example, the first of every month). If the policy requires you to remove the data from the backup immediately, you can take a snapshot of the vector store after the deletion of required PII data and then purge the existing backup.

Communication

It’s important to communicate with the users and processes that might be impacted by this deletion. As an example, if the application is powered by single sign-on (SSO) using an identity store such as AWS IAM Identity Center or Okta, then the user profile information can be used for managing stakeholder communications.

Security controls

Maintaining security is of great importance in GDPR compliance. By implementing robust security measures, organizations can help protect personal data from unauthorized access, inadvertent access, and misuse, thereby helping maintain the privacy rights of individuals. AWS offers a comprehensive suite of services and features that can help support GDPR compliance and enhance security measures. To learn more about the shared responsibility between AWS and customers for security and compliance, see the AWS shared responsibility model. The shared responsibility model is a useful approach to illustrate the different responsibilities of AWS (as a data processor or sub processor) and its customers (as either data controllers or data processors) under the GDPR.

AWS offers a GDPR-compliant AWS Data Processing Addendum (AWS DPA), which helps you to comply with GDPR contractual obligations. The AWS DPA is incorporated into the AWS Service Terms.

Article 32 of the GDPR requires that organizations must “…implement appropriate technical and organizational measures to ensure a level of security appropriate to the risk, including …the pseudonymization and encryption of personal data[…].” In addition, organizations must “safeguard against the unauthorized disclosure of or access to personal data.” See the Navigating GDPR Compliance on AWS whitepaper for more details.

Conclusion

We encourage you to take charge of your data privacy today. Prioritizing GDPR compliance and data privacy not only strengthens trust, but can also build customer loyalty and safeguard personal information in the digital era. If you need assistance or guidance, reach out to an AWS representative. AWS has teams of Enterprise Support Representatives, Professional Services Consultants, and other staff to help with GDPR questions. You can contact us with questions. To learn more about GDPR compliance when using AWS services, see the General Data Protection Regulation (GDPR) Center.

Disclaimer: The information provided above is not legal advice. It is intended to showcase commonly followed best practices. It is crucial to consult with your organization’s privacy officer or legal counsel and determine appropriate solutions.


About the Authors

Yadukishore Tatavarthi is a Senior Partner Solutions Architect supporting healthcare and life sciences customers at Amazon Web Services. Over the last 20 years, he has helped customers build enterprise data strategies, advising them on generative AI, cloud implementations, migrations, reference architecture creation, data modeling best practices, and data lake and warehouse architectures.

Krishna Prasad is a Senior Solutions Architect on the Strategic Accounts Solutions Architecture team at AWS. He works with customers to help solve their unique business and technical challenges, providing guidance in focus areas such as distributed compute, security, containers, serverless, artificial intelligence (AI), and machine learning (ML).

Rajakumar Sampathkumar is a Principal Technical Account Manager at AWS, providing customer guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.

CBRE and AWS perform natural language queries of structured data using Amazon Bedrock

This is a guest post co-written with CBRE.

CBRE is the world’s largest commercial real estate services and investment firm, with 130,000 professionals serving clients in more than 100 countries. Services range from financing and investment to property management.

CBRE is unlocking the potential of artificial intelligence (AI) to realize value across the entire commercial real estate lifecycle, from guiding investment decisions to managing buildings. The opportunities to unlock value using AI in the commercial real estate lifecycle start with data at scale. CBRE’s data environment, with 39 billion data points from over 300 sources, combined with a suite of enterprise-grade technology, can deploy a range of AI solutions to enable everything from individual productivity to broadscale transformation. Although CBRE provides customers with curated best-in-class dashboards, CBRE wanted to provide a solution for their customers to quickly make custom queries of their data using only natural language prompts.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon with a single API, along with a broad set of capabilities to build generative AI applications, simplifying development while maintaining privacy and security. With the comprehensive capabilities of Amazon Bedrock, you can experiment with a variety of FMs, privately customize them with your own data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and create managed agents that run complex business tasks—from booking travel and processing insurance claims to creating ad campaigns and managing inventory—all without the need to write code. Because Amazon Bedrock is serverless, you don’t have to manage infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

In this post, we describe how CBRE partnered with AWS Prototyping to develop a custom query environment allowing natural language query (NLQ) prompts by using Amazon Bedrock, AWS Lambda, Amazon Relational Database Service (Amazon RDS), and Amazon OpenSearch Service. AWS Prototyping successfully delivered a scalable prototype, which solved CBRE’s business problem with a high accuracy rate (over 95%), support for reusing embeddings for similar NLQs, and an API gateway for integration into CBRE’s dashboards.

Customer use case

Today, CBRE manages a standardized set of best-in-class client dashboards and reports, powered by various business intelligence (BI) tools, such as Tableau and Microsoft Power BI, and their proprietary UI, enabling CBRE clients to review core metrics and reports on occupancy, rent, energy usage, and more for various properties managed by CBRE.

The company’s Data & Analytics team regularly receives client requests for unique reports, metrics, or insights, which require custom development. CBRE wanted to enable clients to quickly query existing data using natural language prompts, all in a user-friendly environment. The prompts are managed through Lambda functions to use OpenSearch Service and Anthropic Claude 2 on Amazon Bedrock to search the client’s database and generate an appropriate response to the client’s business analysis, including the response in plain English, the reasoning, and the SQL code. A simple UI was developed that encapsulates the complexity and allows users to input questions and retrieve the results directly. This solution can be applied to other dashboards at a later stage.

Key use case and environment requirements

Generative AI is a powerful tool for analyzing and transforming vast datasets into usable summaries and text for end-users. Key requirements from CBRE included:

  • Natural language queries (common questions submitted in English) to be used as primary input
  • A scalable solution using a large language model (LLM) to generate and run SQL queries for business dashboards
  • Queries submitted to the environment that return the following:
    • Result in plain English
    • Reasoning in plain English
    • SQL code generated
  • The ability to reuse existing embeddings of tables, columns, and SQL code if the input NLQ is similar to a previous query
  • Query response time of 3–5 seconds
  • Target 90% “good” responses to queries (based on customer User Acceptance Testing)
  • An API management layer for integration into CBRE’s dashboard
  • A straightforward UI and frontend for User Acceptance Testing (UAT)

Solution overview

CBRE and AWS Prototyping built an environment that allows a user to submit a query to structured data tables using natural language (in English), based on Anthropic Claude 2 on Amazon Bedrock with support for 100,000 maximum tokens. Embeddings were generated using Amazon Titan. The framework for connecting Anthropic Claude 2 and CBRE’s sample database was implemented using LangChain. AWS Prototyping developed an AWS Cloud Development Kit (AWS CDK) stack for deployment following AWS best practices.

The environment was developed over multiple development sprints. CBRE, in parallel, completed UAT to confirm it performed as expected.

The following figure illustrates the core architecture for the NLQ capability.

The workflow for NLQ consists of the following steps:

  1. A Lambda function writes schema JSON and table metadata CSV to an S3 bucket.
  2. A user sends a question (NLQ) as a JSON event.
  3. The Lambda wrapper function searches for similar questions in OpenSearch Service. If it finds one, it reuses the stored SQL query and skips ahead to the step that runs it. If not, it continues with the next step.
  4. The wrapper function reads the table metadata from the S3 bucket.
  5. The wrapper function creates a dynamic prompt template and gets relevant tables using Amazon Bedrock and LangChain.
  6. The wrapper function selects only relevant tables schema from the schema JSON in the S3 bucket.
  7. The wrapper function creates a dynamic prompt template and generates a SQL query using Anthropic Claude 2.
  8. The wrapper function runs the SQL query using psycopg2.
  9. The wrapper function creates a dynamic prompt template to generate an English answer using Anthropic Claude 2.
  10. The wrapper function uses Anthropic Claude 2 and OpenSearch Service to do the following:
    1. It generates embeddings using Amazon Titan.
    2. It stores the question and SQL query as a vector for reuse in the OpenSearch Service index.
  11. The wrapper function consolidates the output and returns the JSON output.
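The control flow of these steps can be sketched roughly as follows. This is a simplified illustration, not the actual wrapper code: the function name, the keys of the steps dictionary, and the stub implementations are hypothetical, and an exact-match cache stands in for the similarity search in OpenSearch Service. It also assumes the English answer is generated for reused queries, which the post does not state explicitly.

```python
def handle_question(question, use_vector_db, steps):
    """Run one NLQ through the workflow.

    `steps` is a dict of callables standing in for the real integrations
    (OpenSearch Service lookup, Amazon Bedrock calls, SQL execution).
    """
    # Look for a previously stored SQL query only when the vector DB is enabled.
    sql = steps["find_similar_sql"](question) if use_vector_db else None
    reused = sql is not None
    if not reused:
        tables = steps["get_relevant_tables"](question)
        sql = steps["generate_sql"](question, tables)
    rows = steps["run_sql"](sql)
    answer = steps["to_english"](question, rows)
    if not reused:
        # Store the new question and SQL pair for later reuse.
        steps["store_embedding"](question, sql)
    return {
        "Question": question,
        "sql_code": sql,
        "SQL_Answer": rows,
        "English_Answer": answer,
    }


# Stub integrations for illustration only.
cache = {}
stub_steps = {
    "find_similar_sql": lambda q: cache.get(q),
    "get_relevant_tables": lambda q: ["properties"],
    "generate_sql": lambda q, tables: "SELECT COUNT(*) FROM " + tables[0],
    "run_sql": lambda sql: [(42,)],
    "to_english": lambda q, rows: "There are %d properties." % rows[0][0],
    "store_embedding": lambda q, sql: cache.update({q: sql}),
}

first = handle_question("How many properties are there?", 1, stub_steps)
second = handle_question("How many properties are there?", 1, stub_steps)
```

In this sketch the second call finds the stored SQL query and skips both model calls that select tables and generate SQL.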

Web UI and API management layer

AWS Prototyping built a web interface and API management layer to enable user testing during development and accelerate integration into CBRE’s existing BI capabilities. The following diagram illustrates the web interface and API management layer.

The workflow includes the following steps:

  1. The user accesses the hosted web portal from their laptop through a web browser.
  2. A low-latency Amazon CloudFront distribution is used to serve the static site protected by an HTTPS certificate issued by AWS Certificate Manager (ACM).
  3. An S3 bucket stores the website-related HTML, CSS, and JavaScript necessary to render the static site. The CloudFront distribution has its origin configured to this S3 bucket and remains in sync to serve the latest version of the site to users.
  4. Amazon Cognito is used as a primary authentication and authorization provider with its user pools to allow user login, access to the API gateway, and access to the website bucket and response bucket.
  5. An Amazon API Gateway endpoint with a REST API stage is secured by Amazon Cognito to only allow authenticated entities access to the Lambda function.
  6. A Lambda function with business logic invokes the primary Lambda function.
  7. An S3 bucket to store the generated response from the primary Lambda function is queried from the frontend periodically to show on the web application.
  8. A VPC endpoint is established to isolate the primary Lambda function.
  9. VPC endpoints for both Lambda and Amazon S3 are imported and configured using the AWS CDK so the frontend stack can have adequate access permissions to reach resources within a VPC.
  10. AWS Identity and Access Management (IAM) enforces the necessary permissions for the frontend application.
  11. Amazon CloudWatch captures run logs across various resources, especially Lambda and API Gateway.

Technical approach

Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage any infrastructure.

Anthropic Claude 2 on Amazon Bedrock, a general-purpose LLM with 100,000 maximum token support, was selected to support the solution. LLMs demonstrate impressive abilities in automatically generating code. Relevant metadata can help guide the model’s output and customize SQL code generation for specific use cases. AWS offers tools like AWS Glue crawlers to automatically extract technical metadata from data sources. Business metadata can be constructed using services like Amazon DataZone. A lightweight approach was taken to quickly build the required technical and business catalogs using custom scripts. The metadata primed the model to generate tailored SQL code aligned with our database schema and business needs.

Input context files are needed for the Anthropic Claude 2 model to generate a SQL query according to the NLQ:

  • meta.csv – This is human-written metadata in a CSV file stored in an S3 bucket, which includes the names of the tables in the schema and a description for each table. The meta.csv file is sent as an input context to the model (refer to steps 3 and 4 in the end-to-end solution architecture diagram) to find the relevant tables according to the input NLQ. The S3 location of meta.csv is as follows:
    s3://<dbSchemaGeneratorBucket>/<DB_Name>/table/meta.csv

  • schema.json – This JSON schema is generated by a Lambda function and stored in Amazon S3. Following steps 5 and 6 in the architecture, the relevant tables schema is sent as input context to the model to generate a SQL query according to the input NLQ. The S3 location of schema.json is as follows:
    s3://<dbSchemaGeneratorBucket>/<DB_Name>/schema/schema.json
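To make the two context files concrete, the following sketch builds their S3 locations from the path patterns above and parses meta.csv content into a table-to-description mapping. The function names, the bucket and database names, and the meta.csv column headers (table_name, description) are assumptions for illustration; the post does not specify the file’s header row.

```python
import csv
import io


def context_file_uris(bucket, db_name):
    """Build the S3 URIs of the two input context files."""
    base = f"s3://{bucket}/{db_name}"
    return {
        "meta": f"{base}/table/meta.csv",
        "schema": f"{base}/schema/schema.json",
    }


def parse_table_descriptions(meta_csv_text):
    """Map each table name to its description from meta.csv content."""
    reader = csv.DictReader(io.StringIO(meta_csv_text))
    return {row["table_name"]: row["description"] for row in reader}


# Hypothetical bucket, database, and metadata rows.
uris = context_file_uris("example-schema-bucket", "demo_db")
sample_meta = (
    "table_name,description\n"
    "properties,One row per managed property\n"
    "leases,Active and expired lease contracts\n"
)
descriptions = parse_table_descriptions(sample_meta)
```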

DB schema generator Lambda function

This function needs to be invoked manually. The following configurable environmental variables are managed by the AWS CDK during the deployment of this Lambda function:

  • dbSchemaGeneratorBucket – S3 bucket for schema.json
  • secretManagerKey – AWS Secrets Manager key for DB credentials
  • secretManagerRegion – AWS Region in which the Secrets Manager key exists

After a successful run, schema.json is written in an S3 bucket.

Lambda wrapper function

This is the core component of the solution, which performs steps 2 through 10 as described in the end-to-end solution architecture. The following figure illustrates its code structure and workflow.

It runs the following scripts:

  • index.py – The Lambda handler (main) handles input/output and runs functions based on keys in the input context
  • langchain_bedrock.py – Get relevant tables, generate SQL queries, and convert SQL to English using Anthropic Claude 2
  • opensearch.py – Retrieve similar embeddings with existing index or generate new embeddings in OpenSearch Service
  • sql.py – Run SQL queries using psycopg2 and the opensearch.py module
  • boto3_bedrock.py – The Boto3 client for Amazon Bedrock
  • utils.py – The utilities function includes the OpenSearch Service client, Secrets Manager client, and formatting the final output response

The Lambda wrapper function has two layers for the dependencies:

  • LangChain layer – pip modules and dependencies of LangChain, boto3, and psycopg2
  • OpenSearch Service layer – OpenSearch Service Python client dependencies

AWS CDK manages the following configurable environmental variables during wrapper function deployment:

  • dbSchemaGeneratorBucket – S3 bucket for schema.json
  • opensearchDomainEndpoint – OpenSearch Service endpoint
  • opensearchMasterUserSecretKey – Secret key name for OpenSearch Service credentials
  • secretManagerKey – Secret key name for Amazon RDS credentials
  • secretManagerRegion – Region in which Secrets Manager key exists

The following code illustrates the JSON format for an input event:

{
  "useVectorDB": <0 or 1>, 
  "input_queries": [
    <Question 1>,
    <Question 2>,
    <Question 3>
  ],
  "S3OutBucket": <Output response bucket>,
  "S3OutPrefix": <Output S3 Prefix>
}

It contains the following parameters:

  • input_queries is a list of one or more NLQ questions. If there is more than one NLQ, the additional questions are treated as follow-up questions to the first NLQ.
  • The useVectorDB key defines if OpenSearch Service is to be used as the vector database. If 0, it will run the end-to-end workflow without searching for similar embeddings in OpenSearch Service. If 1, it searches for similar embeddings. If similar embeddings are available, it directly runs the SQL code, otherwise it performs inference with the model. By default, useVectorDB is set to 1, and therefore this key is optional.
  • The S3OutBucket and S3OutPrefix keys are optional. These keys represent the S3 output location of the JSON response. These are primarily used by the frontend in asynchronous mode.
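A handler might apply these defaults along the following lines. This is a minimal sketch rather than the actual index.py logic; the function name normalize_event and the keys question and follow_ups are hypothetical.

```python
def normalize_event(event):
    """Validate an input event and fill in the documented defaults."""
    queries = event.get("input_queries") or []
    if not queries:
        raise ValueError("input_queries must contain at least one question")
    return {
        # useVectorDB is optional and defaults to 1 (search OpenSearch Service).
        "useVectorDB": int(event.get("useVectorDB", 1)),
        "question": queries[0],
        # Any additional questions are follow-ups to the first one.
        "follow_ups": list(queries[1:]),
        # Both output keys are optional; None means no S3 output location was given.
        "S3OutBucket": event.get("S3OutBucket"),
        "S3OutPrefix": event.get("S3OutPrefix"),
    }


normalized = normalize_event(
    {"input_queries": ["What is the average rent?", "Break it down by city."]}
)
```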

The following code illustrates the JSON format for an output response:

[
    statusCode: <200 or 400>,
    {
        "Question": <Input NLQ>,
        "sql_code": <SQL Query generated by Amazon Bedrock>,
        "SQL_Answer": <SQL Response>,
        "English_Answer": <English Answer>
    }
]

statusCode 200 indicates a successful run of the Lambda function; statusCode 400 indicates a failure with error.

Performance tuning approach

Performance tuning is an iterative approach across multiple layers. In this section, we discuss a performance tuning approach for this solution.

Input context for RAG

LLMs are mostly trained on general domain corpora, making them less effective on domain-specific tasks. In this scenario, when the expectation is to generate SQL queries based on a PostgreSQL DB schema, the schema becomes our input context to an LLM to generate a context-specific SQL query. In our solution, two input context files are critical for the best output, performance, and cost:

  • Get relevant tables – Because the entire PostgreSQL DB schema’s context length is high (over 16,000 tokens for our demo database), it’s necessary to include only the relevant tables in the schema rather than the entire DB schema with all tables to reduce the input context length of the model, which impacts not only the quality of the generated content, but also performance and cost. Because choosing the right tables according to the NLQ is a crucial step, it’s highly recommended to describe the tables in detail in meta.csv.
  • DB schema – schema.json is generated by the schema generator Lambda function, saved in Amazon S3, and passed as input context. It includes column names, data types, distinct values, relationships, and more. The output quality of the LLM-generated SQL query is highly dependent on the detailed schema. The input context length for each table’s schema in the demo is between 2,000–4,000 tokens. A more detailed schema may provide better results, but it’s also necessary to optimize the context length for performance and cost. As part of our solution, we already optimized the DB schema generator Lambda function to balance detailed schema and input context length. If required, you can further optimize the function depending on the complexity of the SQL query to be generated to include more details (for example, column metadata).
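The idea of passing only the relevant tables can be sketched as a simple filter over the schema. The structure of schema.json shown here (a dictionary keyed by table name) and the table and column names are assumptions for illustration; the real file contains more detail, such as distinct values and relationships.

```python
import json


def select_relevant_schema(full_schema, relevant_tables):
    """Keep only the schema entries for the tables the model marked relevant."""
    missing = [name for name in relevant_tables if name not in full_schema]
    if missing:
        raise KeyError(f"Tables not found in schema: {missing}")
    return {name: full_schema[name] for name in relevant_tables}


# Hypothetical schema.json content, keyed by table name.
full_schema = {
    "properties": {"columns": {"property_id": "integer", "city": "text"}},
    "leases": {
        "columns": {"lease_id": "integer", "property_id": "integer", "rent": "numeric"}
    },
    "energy_usage": {"columns": {"property_id": "integer", "kwh": "numeric"}},
}

context = select_relevant_schema(full_schema, ["properties", "leases"])
context_text = json.dumps(context)
full_text = json.dumps(full_schema)
```

The filtered context is shorter than the full schema, which is what reduces the input context length sent to the model.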

Prompt engineering and instruction tuning

Prompt engineering allows you to design the input to an LLM in order to generate an optimized output. A dynamic prompt template is created according to the input NLQ using LangChain (refer to steps 4, 6, and 8 in the end-to-end solution architecture). We combine the input NLQ (prompt) along with a set of instructions for the model to generate the content. It is necessary to optimize both the input NLQ and the instructions within the dynamic prompt template:

  • With prompt tuning, it’s vital to be descriptive of newer NLQs for the model to understand and generate a relevant SQL query.
  • For instruction tuning, the functions dyn_prompt_get_table, gen_sql_query, and sql_to_english in langchain_bedrock.py of the Lambda wrapper function have a set of purpose-specific instructions. These instructions are optimized for best performance and can be further optimized depending on the complexity of the SQL query to be generated.
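A dynamic prompt template of this kind could look roughly like the following. This sketch uses plain string formatting instead of LangChain, and the template wording, instructions, and function name build_sql_prompt are hypothetical; only the Human/Assistant turn markers follow the prompt format that Anthropic Claude 2 text completions expect.

```python
SQL_PROMPT_TEMPLATE = (
    "\n\nHuman: You are a PostgreSQL expert.\n"
    "Use only the tables and columns in this schema:\n"
    "<schema>\n{schema}\n</schema>\n"
    "Follow these instructions:\n{instructions}\n"
    "Question: {question}"
    "\n\nAssistant:"
)


def build_sql_prompt(question, schema_text, instructions):
    """Fill the template with the NLQ, the relevant schema, and the instructions."""
    instruction_lines = "\n".join(f"- {item}" for item in instructions)
    return SQL_PROMPT_TEMPLATE.format(
        schema=schema_text,
        instructions=instruction_lines,
        question=question,
    )


prompt = build_sql_prompt(
    "What is the average rent per city?",
    '{"leases": {"columns": {"rent": "numeric"}}}',
    ["Return a single SQL query.", "Do not modify any data."],
)
```

Keeping the instructions as a separate list makes it easier to tune them independently of the NLQ.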

Inference parameters

Refer to Inference parameters for foundation models for more information on model inference parameters to influence the response generated by the model. We’ve used the following parameters specific to different inference steps to control maximum tokens to sample, randomness, probability distribution, and cutoff based on the sum of probabilities of the potential choices.

The following parameters are used to get relevant tables and to generate the SQL-to-English response:

inf_var_table = {
    "max_tokens_to_sample": 4096,
    "temperature": 1,
    "top_k": 250,
    "top_p": 0.999,
    "stop_sequences": ["\n\nHuman"],
}

The following parameters generate the SQL query:

inf_var_sql = {
    "max_tokens_to_sample": 4096,
    "temperature": 0.3,
    "top_k": 250,
    "top_p": 0.3,
    "stop_sequences": ["\n\nHuman"],
}
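To show where these parameters end up, the following sketch merges them with a prompt into a JSON request body for the Anthropic Claude 2 text completions format on Amazon Bedrock. The helper name build_claude_request is hypothetical, and the stop sequence is written here with explicit newline escapes, which is an assumption about the intended value.

```python
import json

# Copy of the SQL-generation parameters, with the stop sequence written
# using explicit newline escapes (assumed intended value).
inf_var_sql = {
    "max_tokens_to_sample": 4096,
    "temperature": 0.3,
    "top_k": 250,
    "top_p": 0.3,
    "stop_sequences": ["\n\nHuman"],
}


def build_claude_request(prompt, inference_params):
    """Combine a prompt and inference parameters into a request body string."""
    body = dict(inference_params)  # copy so the shared dict is not mutated
    body["prompt"] = prompt
    return json.dumps(body)


request_body = build_claude_request(
    "\n\nHuman: Write a SQL query that counts leases.\n\nAssistant:",
    inf_var_sql,
)

# With boto3, the body could then be sent to the model, for example:
#   client = boto3.client("bedrock-runtime")
#   response = client.invoke_model(
#       modelId="anthropic.claude-v2",
#       body=request_body,
#       contentType="application/json",
#       accept="application/json",
#   )
#   completion = json.loads(response["body"].read())["completion"]
```

The lower temperature and top_p for SQL generation make the output more deterministic than the settings used for table selection and the English answer.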

Monitoring

You can monitor the solution components through Amazon CloudWatch logs and metrics. For example, the Lambda wrapper’s logs are available on the Log groups page of the CloudWatch console (cbre-wrapper-lambda-<account ID>-us-east-1), and provide step-by-step logs throughout the workflow. Similarly, Amazon Bedrock metrics are available by navigating to Metrics, Bedrock on the CloudWatch console. These metrics include input/output tokens count, invocation metrics, and errors.

AWS CDK stacks

We used the AWS CDK to provision all the resources mentioned. The AWS CDK defines the AWS Cloud infrastructure in a general-purpose programming language. Currently, the AWS CDK supports TypeScript, JavaScript, Python, Java, C#, and Go. We used TypeScript for the AWS CDK stacks and constructs.

AWS CodeCommit

The first AWS Cloud resource is an AWS CodeCommit repository. CodeCommit is a secure, highly scalable, fully managed source control service that hosts private Git repositories. The entire code base of this prototyping engagement resides in the CodeCommit repo provisioned by the AWS CDK in the us-east-1 Region.

Amazon Bedrock roles

A dedicated IAM policy is created to allow other AWS Cloud services to access Amazon Bedrock within the target AWS account. We used IAM to create a policy document and add the necessary roles. The roles and policy define the access constraints to Amazon Bedrock from other AWS services in the customer account.

It’s recommended to follow the AWS Well-Architected Framework’s principle of least privilege for a production-ready security posture.

Amazon VPC

The prototype infrastructure was built within a virtual private cloud (VPC), which enables you to launch AWS resources in a logically isolated virtual network that you’ve defined.

Amazon Virtual Private Cloud (Amazon VPC) also isolates other resources, including publicly accessible AWS services like Secrets Manager, Amazon S3, and Lambda. A VPC endpoint enables you to privately connect to supported AWS services and VPC endpoint services powered by AWS PrivateLink. VPC endpoints create dynamic, scalable, and privately routable network connections between the VPC and supported AWS services. There are two types of VPC endpoints: interface endpoints and gateway endpoints. The following endpoints were created using the AWS CDK:

  • An Amazon S3 gateway endpoint to access several S3 buckets needed for this prototype
  • An Amazon VPC endpoint to allow private communication between AWS Cloud resources within the VPC and Amazon Bedrock with a policy to allow listing of FMs and to invoke an FM
  • An Amazon VPC endpoint to allow private communication between AWS Cloud resources within the VPC and the secrets stored in Secrets Manager only within the AWS account and the specific target Region of us-east-1

Provision OpenSearch Service clusters

OpenSearch Service makes it straightforward to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open source, distributed search and analytics suite derived from Elasticsearch. OpenSearch Service offers the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), as well as visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5 to 7.10 versions). OpenSearch Service currently has tens of thousands of active customers with hundreds of thousands of clusters under management, processing hundreds of trillions of requests per month.

The first step was setting up an OpenSearch Service security group that is restricted to only allow HTTPS connectivity to the index. Then we added this security group to the newly created VPC endpoints for Secrets Manager to allow OpenSearch Service to store and retrieve the credentials necessary to access the clusters. As a best practice, we don’t reuse or import a primary user; instead, we create a primary user with a unique user name and password automatically using the AWS CDK upon deployment. Because the OpenSearch Service security group is allowed on the VPC endpoint, the primary user credentials are stored directly in Secrets Manager when the AWS CDK stack is deployed.

The number of data nodes must be a multiple of the number of Availability Zones configured for the domain, so a list of three subnets from all the available VPC subnets is maintained.
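The constraint above can be illustrated with a small sketch that picks one subnet per Availability Zone and checks a data node count against the number of zones. The prototype’s stacks were written in TypeScript with the AWS CDK; this Python version only illustrates the logic, and the function names, subnet IDs, and zone names are hypothetical.

```python
def pick_subnets_per_az(subnets, az_count=3):
    """Pick one subnet per Availability Zone, up to az_count zones."""
    chosen = {}
    for subnet in subnets:
        if subnet["az"] not in chosen:
            chosen[subnet["az"]] = subnet["id"]
        if len(chosen) == az_count:
            break
    return list(chosen.values())


def is_valid_data_node_count(node_count, az_count):
    """Data nodes must be a positive multiple of the number of zones."""
    return node_count > 0 and node_count % az_count == 0


# Hypothetical VPC subnets.
vpc_subnets = [
    {"id": "subnet-a1", "az": "us-east-1a"},
    {"id": "subnet-a2", "az": "us-east-1a"},
    {"id": "subnet-b1", "az": "us-east-1b"},
    {"id": "subnet-c1", "az": "us-east-1c"},
    {"id": "subnet-d1", "az": "us-east-1d"},
]
selected = pick_subnets_per_az(vpc_subnets)
```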

Lambda wrapper function design and deployment

The Lambda wrapper function is the central Lambda function, which connects to every other AWS resource such as Amazon Bedrock, OpenSearch Service, Secrets Manager, and Amazon S3.

The first step is setting up two Lambda layers, one for LangChain and the other for OpenSearch Service dependencies. A Lambda layer is a .zip file archive that contains supplementary code or data. Layers usually contain library dependencies, a custom runtime, or configuration files.

Using the provided RDS database, the security groups were imported and linked to the Lambda wrapper function for Lambda to then reach out to the RDS instance. We used Amazon RDS Proxy to create a proxy to obscure the original domain details of the RDS instance. This RDS proxy interface was manually created from the AWS Management Console and not from the AWS CDK.

DB schema generator Lambda function

An S3 bucket is then created to store the RDS DB schema file, configured to block public access and to use Amazon S3 managed encryption, although customer managed key (CMK) backed encryption is recommended for enhanced security for production workloads.

The Lambda function was created with access to Amazon RDS using an RDS proxy endpoint. The credentials of the RDS instance are manually stored in Secrets Manager and access to the DB schema S3 bucket can be gained by adding an IAM policy to the Amazon S3 VPC endpoint (created earlier in the stack).

Website dashboard

The frontend provides an interface where users can log in and enter natural language prompts to get AI-generated responses. The various resources deployed through the website stack are as follows.

Imports

The website stack communicates with the infrastructure stack to deploy the resources within a VPC and trigger the Lambda wrapper function. The VPC and Lambda function objects were imported into this stack. This is the only link between the two stacks so they remain loosely coupled.

Auth stack

The auth stack is responsible for setting up Amazon Cognito user pools, identity pools, and the authenticated and un-authenticated IAM roles. User sign-in settings and password policies were set up with an email as our primary authentication mechanism to help prevent new users from signing up from the web application itself. New users must be manually created from the console.

Bucket stack

The bucket stack is responsible for setting up the S3 bucket to store the response from the Lambda wrapper function. The Lambda wrapper function is smart enough to understand if it was invoked directly from the console or the website. The frontend code will reach out to this response bucket to pull the response for the respective natural language prompt. The S3 bucket endpoint is configured with an allow list to limit the I/O traffic of this bucket within the VPC only.

API stack

The API stack is responsible for setting up an API Gateway endpoint that is protected by Amazon Cognito to allow authenticated and authorized user entities. Also, a REST API stage was added, which then invokes the website Lambda function.

The website Lambda function is allowed to invoke the Lambda wrapper function. Invoking a Lambda function within a VPC by a non-VPC Lambda function is allowed but is not recommended for a production system.

The API Gateway endpoint is protected by an AWS WAF configuration. AWS WAF helps you protect against common web exploits and bots that can affect availability, compromise security, or consume excessive resources.

Hosting stack

The hosting stack uses CloudFront to serve the frontend website code (HTML, CSS, and JavaScript) stored in a dedicated S3 bucket. CloudFront is a content delivery network (CDN) service built for high performance, security, and developer convenience. When you serve static content that is hosted on AWS, the recommended approach is to use an S3 bucket as the origin and use CloudFront to distribute the content. There are two primary benefits of this solution. The first is the convenience of caching static content at edge locations. The second is that you can define web access control lists (ACLs) for the CloudFront distribution, which helps you secure requests to the content with minimal configuration and administrative overhead.

Users can visit the CloudFront distribution endpoint from their preferred web browser to access the login screen.

Home page

The home page has three sections to it. The first section is the NLQ prompt section, where you can add up to three user prompts and delete prompts as needed.

The prompts are then translated into a prompt input that will be sent to the Lambda wrapper function. This section is non-editable and only for reference. You can opt to use the OpenSearch Service vector DB store to get preprocessed queries for faster responses. Only prompts that were processed earlier and stored in the vector DB will return a valid response. For newer queries, we recommend leaving the switch in its default off position.

If you choose Get Response, you may see a progress bar, which waits for approximately 100 seconds for the Lambda wrapper function to finish. If the response times out for reasons such as unexpected service delays with Amazon Bedrock or Lambda, you will see a timeout message and the prompts are reset.

When the Lambda wrapper function is complete, it outputs the AI-generated response.

Conclusion

CBRE has taken pragmatic steps to adopt transformative AI technologies that enhance their business offerings and extend their leadership in the market. CBRE and the AWS Prototyping team developed an NLQ environment using Amazon Bedrock, Lambda, Amazon RDS, and OpenSearch Service, demonstrating outputs with a high accuracy rate (more than 95%), supported reuse of embeddings, and an API gateway.

This project is a great starting point for organizations looking to break ground with generative AI in data analytics. CBRE stands poised and ready to continue using their intimate knowledge of their customers and the real estate industry to build the real estate solutions of tomorrow.


About the Authors

  • Surya Rebbapragada is the VP of Digital & Technology at CBRE
  • Edy Setiawan is the Director of Digital & Technology at CBRE
  • Naveena Allampalli is a Sr. Principal Enterprise Architect at CBRE
  • Chakra Nagarajan is a Sr. Principal ML Prototyping Solutions Architect at AWS
  • Tamil Jayakumar is a Sr. Prototyping Engineer at AWS
  • Shane Madigan is a Sr. Engagement Manager at AWS
  • Maran Chandrasekaran is a Sr. Solutions Architect at AWS
  • VB Bakre is an Account Manager at AWS

Dynamic video content moderation and policy evaluation using AWS generative AI services

Organizations across media and entertainment, advertising, social media, education, and other sectors require efficient solutions to extract information from videos and apply flexible evaluations based on their policies. Generative artificial intelligence (AI) has unlocked fresh opportunities for these use cases. In this post, we introduce the Media Analysis and Policy Evaluation solution, which uses AWS AI and generative AI services to provide a framework to streamline video extraction and evaluation processes.

Popular use cases

Advertising tech companies own video content like ad creatives. When it comes to video analysis, priorities include brand safety, regulatory compliance, and engaging content. This solution, powered by AWS AI and generative AI services, meets these needs. Advanced content moderation makes sure ads appear alongside safe, compliant content, building trust with consumers. You can use the solution to evaluate videos against content compliance policies. You can also use it to create compelling headlines and summaries, boosting user engagement and ad performance.

Educational tech companies manage large inventories of training videos. An efficient way to analyze videos will help them evaluate content against industry policies, index videos for efficient search, and perform dynamic detection and redaction tasks, such as blurring student faces in a Zoom recording.

The solution is available on the GitHub repository and can be deployed to your AWS account using an AWS Cloud Development Kit (AWS CDK) package.

Solution overview

  • Media extraction – After a video is uploaded, the app starts preprocessing by extracting image frames from the video. Each frame is analyzed using Amazon Rekognition and Amazon Bedrock for metadata extraction. In parallel, the system extracts the audio transcription from the uploaded content using Amazon Transcribe.
  • Policy evaluation – Using the extracted metadata from the video, the system conducts LLM evaluation. This allows you to take advantage of the flexibility of LLMs to evaluate video against dynamic policies.

The following diagram illustrates the solution workflow and architecture.

Overall workflow diagram

The solution adopts microservice design principles, with loosely coupled components that can be deployed together to serve the video analysis and policy evaluation workflow, or independently to integrate into existing pipelines. The following diagram illustrates the microservice architecture.

The microservice workflow consists of the following steps:

  1. Users access the frontend static website via Amazon CloudFront distribution. The static content is hosted on Amazon Simple Storage Service (Amazon S3).
  2. Users log in to the frontend web application and are authenticated by an Amazon Cognito user pool.
  3. Users upload videos to Amazon S3 directly from their browser using multi-part pre-signed Amazon S3 URLs.
  4. The frontend UI interacts with the extract microservice through a RESTful interface provided by Amazon API Gateway. This interface offers CRUD (create, read, update, delete) features for video task extraction management.
  5. An AWS Step Functions state machine oversees the analysis process. It transcribes audio using Amazon Transcribe, samples image frames from video using moviepy, and analyzes each image using Anthropic Claude Sonnet image summarization. It also generates text embedding and multimodal embedding on the frame level using Amazon Titan models.
  6. An Amazon OpenSearch Service cluster stores the extracted video metadata and facilitates users’ search and discovery needs. The UI constructs evaluation prompts and sends them to Amazon Bedrock LLMs, retrieving evaluation results synchronously.
  7. Using the solution UI, users select existing template prompts, customize them, and start the policy evaluation using Amazon Bedrock. The solution runs the evaluation workflow and displays the results to the user.

In the following sections, we discuss the key components and microservices of the solution in more detail.

Website UI

The solution features a website that lets users browse videos and manage the uploading process through a user-friendly interface. It offers details of the extracted video information and includes a lightweight analytics UI for dynamic LLM analysis. The following screenshots show some examples.

Extract information from videos

The solution includes a backend extraction service to manage video metadata extraction asynchronously. This involves extracting information from both the visual and audio components, including identifying objects, scenes, text, and human faces. The audio component is particularly important for videos with active narratives and conversations, because it often contains valuable information.

Building a robust solution to extract information from videos poses challenges from both machine learning (ML) and engineering perspectives. From the ML standpoint, our goal is to achieve generic extraction of information to serve as factual data for downstream analysis. On the engineering side, it takes intensive effort to manage video sampling with concurrency, provide high availability and flexible configuration options, and maintain an extendable architecture that supports additional ML model plugins.

The extraction service uses Amazon Transcribe to convert the audio portion of the video into text in subtitle formats. For visual extraction, there are a few major techniques involved:

  • Frame sampling – The classic method for analyzing the visual aspect of a video uses a sampling technique. This involves capturing screenshots at specific intervals and then applying ML models to extract information from each image frame. Our solution uses sampling with the following considerations:
    • The solution supports a configurable interval for the fixed sampling rate.
    • It also offers an advanced smart sampling option, which uses the Amazon Titan Multimodal Embeddings model to conduct similarity search against frames sampled from the same video. This process identifies similar images and discards redundant ones to optimize performance and cost.
  • Extract information from image frames – The solution will iterate through images sampled from a video and process them concurrently. For each image, it will apply the following ML features to extract information:

The following diagram illustrates how the extraction service is implemented.

The extraction service uses Amazon Simple Queue Service (Amazon SQS) and Step Functions to manage concurrent video processing, allowing configurable settings. You can specify how many videos can be processed in parallel and how many frames for each video can be processed concurrently, based on your account’s service quota limits and performance requirements.
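The fixed-interval and smart sampling options described earlier in this section can be sketched in a few lines of Python. This is a minimal illustration, not the solution's implementation: the function names, the 0.9 similarity threshold, and the choice to compare each frame only with the last kept frame are assumptions, and the embeddings are assumed to have already been generated (for example, by the Amazon Titan Multimodal Embeddings model).

```python
import math
from typing import List, Sequence


def fixed_interval_timestamps(duration_s: float, interval_s: float) -> List[float]:
    """Timestamps (in seconds) at which to capture frames, starting at 0."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    count = int(math.floor(duration_s / interval_s)) + 1
    return [round(i * interval_s, 3) for i in range(count)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_redundant_frames(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical value; tune for your content
) -> List[int]:
    """Return the indices of frames to keep.

    A frame is kept only if its similarity to the most recently kept frame
    is below the threshold (a simplification of a full similarity search).
    """
    kept: List[int] = []
    for idx, emb in enumerate(embeddings):
        if not kept or cosine_similarity(embeddings[kept[-1]], emb) < threshold:
            kept.append(idx)
    return kept
```

Discarding near-duplicate frames before the per-frame analysis reduces the number of downstream model calls, which is where the performance and cost savings come from.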

Search the videos

Efficiently identifying videos within your inventory is a priority, and an effective search capability is critical for video analysis tasks. Traditional video search methods rely on full-text keyword searches. With the introduction of text embedding and multimodal embedding, new search methods based on semantics and images have emerged.

The solution offers search functionality via the extraction service, available as a UI feature. It generates vector embeddings at the image frame level as part of the extraction process to serve video search. You can search videos and their underlying frames either through the built-in web UI or via the RESTful API interface directly. There are three search options you can choose from:

  • Full text search – Powered by OpenSearch Service, it uses a search index generated by text analyzers that is ideal for keyword search.
  • Semantic search – Powered by the Amazon Titan Text Embeddings model. The embeddings are generated from the transcription and image metadata extracted at the frame level.
  • Image search – Powered by the Amazon Titan Multimodal Embeddings model. The embeddings are generated from the image frame together with the same text used for the text embedding. This feature is suitable for image search, allowing you to provide an image and find similar frames in videos.
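As a rough sketch of how the three search options could map to OpenSearch request bodies, the following builds a `match` query for full text search and a k-NN query for the two embedding-based options. The field names (`transcript`, `text_embedding`, `mm_embedding`) and the function name are hypothetical and not taken from the solution; the query vector is assumed to come from the corresponding Amazon Titan embeddings model.

```python
from typing import Any, Dict, List, Optional

# Hypothetical index field names, used only for illustration.
TEXT_FIELD = "transcript"
TEXT_VECTOR_FIELD = "text_embedding"
MM_VECTOR_FIELD = "mm_embedding"


def build_search_body(
    mode: str,
    *,
    text: Optional[str] = None,
    vector: Optional[List[float]] = None,
    k: int = 10,
) -> Dict[str, Any]:
    """Build an OpenSearch request body for one of the three search modes."""
    if mode == "full_text":
        if text is None:
            raise ValueError("full_text search needs query text")
        # Keyword search against the analyzed text field.
        return {"size": k, "query": {"match": {TEXT_FIELD: text}}}

    if mode == "semantic":
        field = TEXT_VECTOR_FIELD
    elif mode == "image":
        field = MM_VECTOR_FIELD
    else:
        raise ValueError(f"unknown search mode: {mode}")

    if vector is None:
        raise ValueError(f"{mode} search needs a query vector")
    # Approximate k-NN search against the chosen vector field.
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}
```

The only difference between semantic and image search in this sketch is which vector field is queried and which model produced the query vector.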

The following screenshot of the UI showcases the use of multimodal embedding to search for videos containing the AWS logo. The web UI displays three videos with frames that have a high similarity score when compared with the provided AWS logo image. You can also find the other two text search options on the dropdown menu, giving you the flexibility to switch among search options.

Analyze the videos

After gathering rich insights from the videos, you can analyze the data. The solution features a lightweight UI, implemented as a static React web application, powered by a backend microservice called the evaluation service. This service acts as a proxy atop the Amazon Bedrock LLMs to provide real-time evaluation. You can use this as a sandbox feature to test out LLM prompts for dynamic video analysis. The web UI contains a few sample prompt templates to show how you can analyze video for different use cases, including the following:

  • Content moderation – Flag unsafe scenes, text, or speech that violate your trust and safety policy
  • Video summarization – Summarize the video into a concise description based on its audio or visual content cues
  • IAB classification – Classify the video content into advertising IAB categories for better organization and understanding

You can also choose from a collection of LLMs offered by Amazon Bedrock to test the evaluation results and find the most suitable one for your workload. LLMs can use the extraction data and perform analysis based on your instructions, making them flexible and extendable analytics tools that can support various use cases. The following are some examples of the prompt templates for video analysis. The placeholders enclosed in ## markers (for example, ##TRANSCRIPTION##) will be replaced by the corresponding video-extracted data at runtime.

The first example shows how to moderate a video based on audio transcription and object and moderation labels detected by Amazon Rekognition. This sample includes a basic inline policy. You can extend this section to add more rules. You can integrate longer trust and safety policy documents and runbooks in a Retrieval Augmented Generation (RAG) pattern using Knowledge Bases for Amazon Bedrock.

You are a specialist responsible for reviewing content to ensure compliance with company policies. 
Your task involves evaluating videos. 
The transcription of the video is within the <transcription> tag. 
The detected label from the video is located in the <label> tag, and the moderation detection label is within the <moderation> tag. 
You can find the company policy in the <policy> tag. 

<transcription>##TRANSCRIPTION##</transcription> 
<label>##LABEL##</label> 
<moderation>##MODERATION##</moderation> 
<policy>The content must not contain nudity, violence, suggestive content, hate symbols, hate speech, and more. Any content involving alcohol or smoking violates the policy.</policy> 

Does the video violate the trust and safety policy? 
Please consider and provide your analysis in the <analysis> tag, keeping the analysis within 100 words. Respond in the <answer> tag with either 'Y' or 'N'. 
'Y' indicates that the video violates the policy, while 'N' means the video does not violate the policy. 
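Because the prompt asks the model to respond inside `<analysis>` and `<answer>` tags, the response can be parsed with a small helper. The following is a minimal sketch under the assumption that the model follows the requested format; the function name is hypothetical, and missing tags are returned as `None` rather than guessed.

```python
import re
from typing import Optional, Tuple


def parse_moderation_response(text: str) -> Tuple[Optional[str], Optional[bool]]:
    """Extract the analysis text and the Y/N answer from an LLM response.

    Returns (analysis, answer_is_yes). Either element is None when the
    corresponding tag is missing or malformed.
    """
    analysis_match = re.search(r"<analysis>(.*?)</analysis>", text, re.DOTALL)
    answer_match = re.search(r"<answer>\s*([YN])\s*</answer>", text, re.IGNORECASE)

    analysis = analysis_match.group(1).strip() if analysis_match else None
    answer_is_yes = None
    if answer_match:
        answer_is_yes = answer_match.group(1).upper() == "Y"
    return analysis, answer_is_yes
```

Treating a missing answer as `None` lets the caller decide whether to retry the request or route the video to manual review.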

Summarizing videos into shorter descriptions is another popular use case. With the flexibility of the solution, you can instruct the LLMs to summarize the video based on selected extracted metadata. The following sample demonstrates a prompt that summarizes the video based on audio transcription and image frame captions:

Summarize the video using image frame descriptions and transcription subtitles.

The image descriptions and timestamps (in seconds) are provided here: ##IMAGECAPTION##.
The transcription subtitles are provided here: ##SUBTITLE##.

Classifying videos into IAB categories used to be challenging before generative AI became popular. It typically involved custom-trained text and image classification ML models, which often faced accuracy challenges. The following sample prompt uses the Anthropic Claude 3 Sonnet model on Amazon Bedrock, which has built-in knowledge of the IAB taxonomy. Therefore, you don’t even need to include the taxonomy definitions as part of the LLM prompt.

Classify the video into IAB categories.

Transcription: ##TRANSCRIPTION##
Label: ##LABEL##
Text extracted from image frames:##TEXT##
Moderation categories: ##MODERATION##
Celebrities: ##CELEBRITY##
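The runtime substitution of the ##...## placeholders used in these templates can be sketched as follows. This is an illustrative helper, not the solution's code: the function name is hypothetical, and raising an error for a placeholder with no extracted value is a design assumption.

```python
import re
from typing import Mapping

# Matches placeholders such as ##TRANSCRIPTION## or ##IMAGECAPTION##.
PLACEHOLDER = re.compile(r"##([A-Z_]+)##")


def fill_template(template: str, values: Mapping[str, str]) -> str:
    """Replace each ##NAME## placeholder with the matching extracted value."""

    def replace(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no extracted data for placeholder {key}")
        return values[key]

    return PLACEHOLDER.sub(replace, template)


# Example usage with made-up extraction data.
template = "Classify the video into IAB categories.\n\nTranscription: ##TRANSCRIPTION##\nLabel: ##LABEL##"
prompt = fill_template(
    template,
    {"TRANSCRIPTION": "Welcome to the show.", "LABEL": "Stage, Microphone"},
)
```

Failing loudly on a missing value avoids sending a prompt to the LLM with a literal placeholder still in it.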

Summary

Video analysis presents challenges that span technical difficulties in both ML and engineering. This solution provides a user-friendly UI to streamline the video analysis and policy evaluation processes. The backend components can serve as building blocks for integration into your existing analysis workflow, allowing you to focus on analytics tasks with greater business impact.

You can deploy the solution into your AWS account using the AWS CDK package available on the GitHub repo. For deployment details, refer to the step-by-step instructions.


About the Authors

Lana Zhang is a Senior Solutions Architect at AWS World Wide Specialist Organization AI Services team, specializing in AI and generative AI with a focus on use cases including content moderation and media analysis. With her expertise, she is dedicated to promoting AWS AI and generative AI solutions, demonstrating how generative AI can transform classic use cases with advanced business value. She assists customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, media, advertising, and marketing.

Negin Rouhanizadeh is a Solutions Architect at AWS focusing on AI/ML in Advertising and Marketing. Beyond crafting solutions for her customers, Negin enjoys painting, coding, spending time with family and her furry boys, Simba and Huchi.

Vitech uses Amazon Bedrock to revolutionize information access with AI-powered chatbot

This post is co-written with Murthy Palla and Madesh Subbanna from Vitech.

Vitech is a global provider of cloud-centered benefit and investment administration software. Vitech helps group insurance, pension fund administration, and investment clients expand their offerings and capabilities, streamline their operations, and gain analytical insights. To serve their customers, Vitech maintains a repository of information that includes product documentation (user guides, standard operating procedures, runbooks), which is currently scattered across multiple internal platforms (for example, Confluence sites and SharePoint folders). The lack of a centralized and easily navigable knowledge system led to several issues, including:

  • Low productivity due to the lack of an efficient retrieval system, which often led to information overload
  • Inconsistent information access because there was no singular, unified source of truth

To address these challenges, Vitech used generative artificial intelligence (AI) with Amazon Bedrock to build VitechIQ, an AI-powered chatbot for Vitech employees to access an internal repository of documentation.

For customers looking to build an AI-driven chatbot that interacts with an internal repository of documents, AWS offers Knowledge Bases for Amazon Bedrock, a fully managed capability that can implement the entire Retrieval Augmented Generation (RAG) workflow, from ingestion to retrieval and prompt augmentation, without having to build any custom integrations to data sources or manage data flows. Alternatively, open source technologies like LangChain can be used to orchestrate the end-to-end flow.

In this post, we walk through the architectural components, the evaluation criteria for the components selected by Vitech, and the process flow of user interaction within VitechIQ.

Technical components and evaluation criteria

In this section, we discuss the key technical components and evaluation criteria for the components involved in building the solution.

Hosting large language models

Vitech explored the option of hosting large language models (LLMs) using Amazon SageMaker. Vitech needed a fully managed and secure experience to host LLMs and eliminate the undifferentiated heavy lifting associated with hosting third-party models. Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so one can choose from a wide range of FMs to find the model that is best suited for their use case. With Bedrock’s serverless experience, one can get started quickly, privately customize FMs with their own data, and easily integrate and deploy them into applications using the AWS tools without having to manage any infrastructure. Vitech thereby selected Amazon Bedrock to host LLMs and integrate seamlessly with their existing infrastructure.

Retrieval Augmented Generation vs. fine tuning

Traditional LLMs don’t have an understanding of Vitech’s processes and flow, making it imperative to augment the power of LLMs with Vitech’s knowledge base. Fine-tuning would allow Vitech to train the model on a small sample set, thereby allowing the model to provide responses using Vitech’s vocabulary. However, for this use case, the complexity associated with fine-tuning and the costs were not warranted. Instead, Vitech opted for Retrieval Augmented Generation (RAG), in which the LLM can use vector embeddings to perform a semantic search and provide a more relevant answer to users when interacting with the chatbot.

Data store

Vitech’s product documentation is largely available in .pdf format, making it the standard format used by VitechIQ. In cases where a document is available in other formats, users preprocess this data and convert it into .pdf format. These documents are uploaded and stored in Amazon Simple Storage Service (Amazon S3), making it the centralized data store.

Data chunking

Chunking is the process of breaking down large text documents into smaller, more manageable segments (such as paragraphs or sections). Vitech chose a recursive chunking method that involves dynamically dividing text based on its inherent structure like chapters and sections, offering a more natural division of text. A chunk size of 1,000 tokens with a 200-token overlap provided the best results.
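To make the chunk size and overlap arithmetic concrete, the following sketch splits a token list into windows of 1,000 tokens that each share 200 tokens with the previous window. It is a simplified fixed-window stand-in, not the recursive, structure-aware method Vitech used, and it assumes the text has already been tokenized (the function name is hypothetical).

```python
from typing import List


def chunk_tokens(
    tokens: List[str], chunk_size: int = 1000, overlap: int = 200
) -> List[List[str]]:
    """Split tokens into overlapping windows.

    Each chunk starts (chunk_size - overlap) tokens after the previous one,
    so consecutive chunks share `overlap` tokens.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks
```

With these settings, a 2,500-token document yields three chunks starting at tokens 0, 800, and 1,600. The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.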

Large language models

VitechIQ uses two key models to address the business challenge of providing efficient and accurate information retrieval:

  • Vector embedding – This process converts the documents into a numerical representation, making sure semantic relationships are captured (similar documents are represented numerically closer to each other), allowing for an efficient search. Vitech explored multiple vector embeddings models and selected the Amazon Titan Embeddings text model offered by Amazon Bedrock.
  • Question answering – The core functionality of VitechIQ is to provide concise and trustworthy answers to user queries based on the retrieved context. Vitech chose the Anthropic Claude model, available from Amazon Bedrock, for this purpose. The high token limit of 200,000 (approximately 150,000 words) allows the model to process extensive context and maintain awareness of the ongoing conversation, enabling it to provide more accurate and relevant responses. Additionally, VitechIQ includes metadata from the vector database (for example, document URLs) in the model’s output, providing users with source attribution and enhancing trust in the generated answers.

Prompt engineering

Prompt engineering is crucial for the knowledge retrieval system. The prompt guides the LLM on how to respond and interact based on the user question. Prompts also help ground the model. As part of prompt engineering, VitechIQ configured the prompt with a set of instructions for the LLM to keep the conversations relevant and eliminate discriminatory remarks, and guided it on how to respond to open-ended conversations. The following is an example of a prompt used in VitechIQ:

"""You are Jarvis, a chatbot designed to assist and engage in conversations with humans. 
Your primary functions are:
1. Friendly Greeting: Respond with a warm greeting when users initiate a conversation by 
greeting you.
2. Open-Ended Conversations: Acknowledge and inquire when users provide random context or 
open-ended statements to better understand their intent.
3. Honesty: If you don't know the answer to a user's question, simply state that you don't know, 
and avoid making up answers.
Your name is Jarvis, and you should maintain a friendly and helpful tone throughout the 
conversation.
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer
{context} 
{chat_history}
Human: {human_input}
Chatbot:"""

Vector store

Vitech explored vector stores like OpenSearch and Redis. However, Vitech has expertise in handling and managing Amazon Aurora PostgreSQL-Compatible Edition databases for their enterprise applications. Amazon Aurora PostgreSQL provides support for the open source pgvector extension to process vector embeddings, and Amazon Aurora Optimized Reads offers a cost-effective and performant option. These factors led to the selection of Amazon Aurora PostgreSQL as the store for vector embeddings.

Processing framework

LangChain offered seamless machine learning (ML) model integration, allowing Vitech to build custom automated AI components and be model agnostic. LangChain’s out-of-the-box chain and agents libraries have empowered Vitech to adopt features like prompt templates and memory management, accelerating the overall development process. Vitech used Python virtual environments to freeze a stable version of the LangChain dependencies and seamlessly move it from development to production environments. With the support of the LangChain ConversationBufferMemory library, VitechIQ stores conversation information using a stateful session to maintain relevance in the conversation. The state is deleted after a configurable idle timeout elapses.
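The idea of a stateful session whose history is deleted after an idle timeout can be illustrated with a small plain-Python class. This is a stand-in written for this post, not LangChain's ConversationBufferMemory and not VitechIQ's code; the class name, method names, and injectable clock are assumptions.

```python
import time
from typing import Callable, Dict, List, Tuple


class SessionMemory:
    """Per-session chat history that expires after an idle timeout."""

    def __init__(
        self, idle_timeout_s: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._timeout = idle_timeout_s
        self._clock = clock
        # session_id -> (time of last write, list of conversation turns)
        self._sessions: Dict[str, Tuple[float, List[str]]] = {}

    def _expire(self) -> None:
        """Delete sessions that have been idle longer than the timeout."""
        now = self._clock()
        stale = [
            sid for sid, (last, _) in self._sessions.items()
            if now - last > self._timeout
        ]
        for sid in stale:
            del self._sessions[sid]

    def append(self, session_id: str, human: str, chatbot: str) -> None:
        """Record one question/answer exchange and refresh the idle timer."""
        self._expire()
        _, turns = self._sessions.get(session_id, (0.0, []))
        turns = turns + [f"Human: {human}", f"Chatbot: {chatbot}"]
        self._sessions[session_id] = (self._clock(), turns)

    def history(self, session_id: str) -> str:
        """Return the history as text suitable for a {chat_history} slot."""
        self._expire()
        _, turns = self._sessions.get(session_id, (0.0, []))
        return "\n".join(turns)
```

Passing the clock in as a parameter keeps the expiry logic testable without waiting for real time to pass.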

Multiple LangChain libraries were used across VitechIQ; the following are a few notable libraries and their usage:

  • langchain.llms (Bedrock) – Interact with LLMs provided by Amazon Bedrock
  • langchain.embeddings (BedrockEmbeddings) – Create embeddings
  • langchain.chains.question_answering (load_qa_chain) – Perform Q&A
  • langchain.prompts (PromptTemplate) – Create prompt templates
  • langchain.vectorstores.pgvector (PGVector) – Create vector embeddings and perform semantic search
  • langchain.text_splitter (RecursiveCharacterTextSplitter) – Split documents into chunks
  • langchain.memory (ConversationBufferMemory) – Manage conversational memory

They used the following versions:

User interface

The VitechIQ user interface is built using Streamlit. Streamlit offers a user-friendly experience to quickly build interactive and easily deployable solutions using the Python library (used widely at Vitech). The Streamlit app is hosted on an Amazon Elastic Compute Cloud (Amazon EC2) instance fronted with Elastic Load Balancing (ELB), allowing Vitech to scale as traffic increases.

Optimizing search results

To reduce hallucination and optimize the token size and search results, VitechIQ performs semantic search using the value k in the search function (similarity_search_with_score). VitechIQ filters embedding responses to the top 10 results and then limits the dataset to records that have a score less than 0.48 (indicating close correlation), thereby identifying the most relevant response and eliminating noise.
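The two-stage filter described above (top 10 results, then a score cutoff of 0.48) can be sketched as a post-processing step over (document, score) pairs, such as those returned by a similarity search with scores. The sketch assumes the score is a distance, so lower means closer, which is consistent with the "less than 0.48" rule; the function name is hypothetical.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def filter_results(
    scored: List[Tuple[T, float]], k: int = 10, max_score: float = 0.48
) -> List[Tuple[T, float]]:
    """Keep the k closest results, then drop any with score >= max_score.

    Scores are treated as distances: a lower score means a closer match.
    """
    top_k = sorted(scored, key=lambda pair: pair[1])[:k]
    return [(doc, score) for doc, score in top_k if score < max_score]
```

If no result passes the cutoff, the function returns an empty list, and the chatbot can answer that it doesn't know rather than passing weak context to the LLM.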

Amazon Bedrock VPC interface endpoints

Vitech wanted to make sure all communication is kept private and doesn’t traverse the public internet. VitechIQ uses an Amazon Bedrock VPC interface endpoint to make sure the connectivity is secured end to end.

Monitoring

VitechIQ application logs are sent to Amazon CloudWatch. This helps Vitech management get insights on current usage and trends on topics. Additionally, Vitech uses Amazon Bedrock runtime metrics to measure latency, performance, and number of tokens.

“We noted that the combination of Amazon Bedrock and Claude not only matched, but in some cases surpassed, in performance and quality and it conforms to Vitech security standards compared to what we saw with a competing generative AI solution.”

– Madesh Subbanna, VP Databases & Analytics at Vitech

Solution overview

Let’s look at how all these components come together to illustrate the end-user experience. The following diagram shows the solution architecture.

Solution Architecture

The VitechIQ user experience can be split into two process flows: document repository, and knowledge retrieval.

Document repository flow

This step involves the curation and collection of documents that will comprise the knowledge base. Internally, Vitech stakeholders conduct due diligence to review and approve a document before it is uploaded to VitechIQ. For each document uploaded to VitechIQ, the user provides an internal reference link (Confluence or SharePoint), to make sure any future revisions can be tracked and the most up-to-date information is available on VitechIQ. As new document versions become available, VitechIQ updates the embeddings so the recommendations remain relevant and up to date.

Vitech stakeholders conduct a manual review on a weekly basis of the documents and revisions that are being requested to be uploaded. As a result, the documents have a 1-week turnaround to be available in VitechIQ for user consumption.

The following screenshot illustrates the VitechIQ interface to upload documents.

Upload document

The upload procedure includes the following steps:

  1. The domain stakeholder uploads the documents to VitechIQ.
  2. LangChain uses recursive chunking to parse the document and send it to the Amazon Titan Embeddings model.
  3. The Amazon Titan Embeddings model generates vector embeddings.
  4. These vector embeddings are stored in an Aurora PostgreSQL database.
  5. The user receives notification of the success (or failure) of the upload.

Knowledge retrieval flow

In this flow, the user interacts with the VitechIQ chatbot, which provides a summarized and accurate response to their question. VitechIQ also provides source document attribution in response to the user question (it uses the URL of the document uploaded in the previous process flow).

The following screenshot illustrates a user interaction with VitechIQ.

User interaction

The process includes the following steps:

  1. The user interacts with VitechIQ by asking a question in natural language.
  2. The question is sent by the Amazon Bedrock interface endpoint to the Amazon Titan Embeddings model.
  3. The Amazon Titan Embeddings model converts the question and generates vector embeddings.
  4. The vector embeddings are sent to Amazon Aurora PostgreSQL to perform a semantic search on the knowledge base documents.
  5. Using RAG, the prompt is enhanced with context and relevant documents, and then sent to Amazon Bedrock (Anthropic Claude) for summarization.
  6. Amazon Bedrock generates a summarized response according to the prompt instructions and sends the response back to the user.
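The steps above can be sketched as a single function that takes the embedding, search, and generation steps as injected callables. This keeps the sketch independent of any particular SDK: in VitechIQ these roles are played by the Amazon Titan Embeddings model, Aurora PostgreSQL with pgvector, and Anthropic Claude on Amazon Bedrock, but the function name, the callable signatures, and the shortened prompt text here are assumptions for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures for the three external steps.
Embed = Callable[[str], List[float]]                                 # question -> vector
Search = Callable[[List[float], int], List[Tuple[str, str, float]]]  # -> (text, url, score)
Generate = Callable[[str], str]                                      # prompt -> answer

PROMPT = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\n"
    "{chat_history}\n"
    "Human: {human_input}\n"
    "Chatbot:"
)


def answer_question(
    question: str,
    chat_history: str,
    embed: Embed,
    search: Search,
    generate: Generate,
    k: int = 10,
) -> Tuple[str, List[str]]:
    """Run one retrieval-augmented turn and return (answer, source URLs)."""
    query_vector = embed(question)                       # steps 2-3
    hits = search(query_vector, k)                       # step 4
    context = "\n\n".join(text for text, _, _ in hits)   # step 5: augment prompt
    prompt = PROMPT.format(
        context=context, chat_history=chat_history, human_input=question
    )
    answer = generate(prompt)                            # step 6
    sources = sorted({url for _, url, _ in hits})        # source attribution
    return answer, sources
```

Returning the source URLs alongside the answer is what allows the UI to show document attribution for each response.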

As the user asks additional questions, the conversation context is passed back into the prompt, making the model aware of the ongoing conversation.

Benefits offered by VitechIQ

By using the power of generative AI, VitechIQ has successfully addressed the critical challenges of information fragmentation and inaccessibility. The following are the key achievements and innovative impact of VitechIQ:

  • Centralized knowledge hub – This helps streamline the process of information retrieval, resulting in over 50% reduction in inquiries to product teams.
  • Enhanced productivity and efficiency – Users are provided quick and accurate access. VitechIQ is used on average by 50 users daily, which amounts to approximately 2,000 queries on a monthly basis.
  • Continuous evolution and learning – Vitech is able to expand its knowledge base on new domains. Vitech’s API documentation (spanning 35,000 documents with a document size up to 3 GB) was uploaded to VitechIQ, enabling development teams to seamlessly search for documentation.

Conclusion

VitechIQ stands as a testament to the company’s commitment to harnessing the power of AI for operational excellence and the capabilities offered by Amazon Bedrock. As Vitech iterates on the solution, a few of the top-priority roadmap items include using the LangChain Expression Language (LCEL), modernizing the Streamlit application to host on Docker, and automating the document upload process. Additionally, Vitech is exploring opportunities to build similar capability for their external customers. The success of VitechIQ is a stepping stone for further technological advancements, setting a new standard for how technology can augment human capabilities in the corporate world. Vitech continues to innovate by partnering with AWS on programs like the Generative AI Innovation Center and identifying additional customer-facing implementations. To learn more, visit Amazon Bedrock.


About the Authors

Samit Kumbhani is an AWS Senior Solutions Architect in the New York City area with over 18 years of experience. He currently collaborates with Independent Software Vendors (ISVs) to build highly scalable, innovative, and secure cloud solutions. Outside of work, Samit enjoys playing cricket, traveling, and biking.

Murthy Palla is a Technical Manager at Vitech with 9 years of extensive experience in data architecture and engineering. Holding certifications as an AWS Solutions Architect and AI/ML Engineer from the University of Texas at Austin, he specializes in advanced Python, databases like Oracle and PostgreSQL, and Snowflake. In his current role, Murthy leads R&D initiatives to develop innovative data lake and warehousing solutions. His expertise extends to applying generative AI in business applications, driving technological advancement and operational excellence within Vitech.

Madesh Subbanna is the Vice President at Vitech, where he leads the database team and has been a foundational figure since the early stages of the company. With two decades of technical and leadership experience, he has significantly contributed to the evolution of Vitech’s architecture, performance, and product design. Madesh has been instrumental in integrating advanced database solutions, DataInsight, AI, and ML technologies into the V3locity platform. His role transcends technical contributions, encompassing project management and strategic planning with senior management to ensure seamless project delivery and innovation. Madesh’s career at Vitech, marked by a series of progressive leadership positions, reflects his deep commitment to technological excellence and client success.

Ameer Hakme is an AWS Solutions Architect based in Pennsylvania. He collaborates with Independent Software Vendors (ISVs) in the Northeast region, assisting them in designing and building scalable and modern platforms on the AWS Cloud. An expert in AI/ML and generative AI, Ameer helps customers unlock the potential of these cutting-edge technologies. In his leisure time, he enjoys riding his motorcycle and spending quality time with his family.
