Build a self-service digital assistant using Amazon Lex and Knowledge Bases for Amazon Bedrock

Organizations strive to implement efficient, scalable, cost-effective, and automated customer support solutions without compromising the customer experience. Generative artificial intelligence (AI)-powered chatbots play a crucial role in delivering human-like interactions by providing responses from a knowledge base without the involvement of live agents. These chatbots can be efficiently utilized for handling generic inquiries, freeing up live agents to focus on more complex tasks.

Amazon Lex provides advanced conversational interfaces using voice and text channels. It features natural language understanding capabilities that enable more accurate identification of user intent, so it can fulfill that intent faster.

Amazon Bedrock simplifies the process of developing and scaling generative AI applications powered by large language models (LLMs) and other foundation models (FMs). It offers access to a diverse range of FMs from leading providers such as Anthropic, AI21 Labs, Cohere, and Stability AI, as well as Amazon’s proprietary Amazon Titan models. Additionally, Knowledge Bases for Amazon Bedrock empowers you to develop applications that harness the power of Retrieval Augmented Generation (RAG), an approach in which retrieving relevant information from data sources enhances the model’s ability to generate contextually appropriate and informed responses.

The generative AI capability of QnAIntent in Amazon Lex lets you securely connect FMs to company data for RAG. QnAIntent provides an interface to use enterprise data and FMs on Amazon Bedrock to generate relevant, accurate, and contextual responses. You can use QnAIntent with new or existing Amazon Lex bots to automate FAQs through text and voice channels, such as Amazon Connect.

With this capability, you no longer need to create variations of intents, sample utterances, slots, and prompts to predict and handle a wide range of FAQs. You can simply connect QnAIntent to company knowledge sources and the bot can immediately handle questions using the allowed content.

In this post, we demonstrate how you can build a chatbot with QnAIntent that connects to a knowledge base in Amazon Bedrock (powered by Amazon OpenSearch Serverless as a vector database) and delivers rich, self-service, conversational experiences to your customers.

Solution overview

The solution uses Amazon Lex, Amazon Simple Storage Service (Amazon S3), and Amazon Bedrock in the following steps:

  1. Users interact with the chatbot through a prebuilt Amazon Lex web UI.
  2. Each user request is processed by Amazon Lex to determine user intent through a process called intent recognition.
  3. Amazon Lex provides the built-in generative AI feature QnAIntent, which can be directly attached to a knowledge base to fulfill user requests.
  4. Knowledge Bases for Amazon Bedrock uses the Amazon Titan embeddings model to convert the user query to a vector, then queries the knowledge base to find the chunks that are semantically similar to the query. The user prompt is augmented with the results returned from the knowledge base as additional context and sent to the LLM to generate a response.
  5. The generated response is returned through QnAIntent and sent back to the user in the chat application through Amazon Lex.
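Behind the scenes, steps 4 and 5 correspond to the Bedrock RetrieveAndGenerate API, which QnAIntent drives for you. The following sketch shows what that call looks like if you invoke it directly; the knowledge base ID and model ARN are placeholders for values from your own account, and you don't need to write this code to use QnAIntent.

```python
def build_rag_request(query: str, kb_id: str, model_arn: str) -> dict:
    """Assemble the RetrieveAndGenerate request that pairs a query
    with a knowledge base and a generation model."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def ask(query: str, kb_id: str, model_arn: str) -> str:
    """Retrieve relevant chunks and generate a grounded answer in one call."""
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        **build_rag_request(query, kb_id, model_arn)
    )
    return response["output"]["text"]
```

For example, `ask("What were the key themes of the 2023 letter?", "EXAMPLEKBID", "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0")` returns the generated answer text grounded in the retrieved documents.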

The following diagram illustrates the solution architecture and workflow.

In the following sections, we look at the key components of the solution in more detail and the high-level steps to implement the solution:

  1. Create a knowledge base in Amazon Bedrock for OpenSearch Serverless.
  2. Create an Amazon Lex bot.
  3. Create a new generative AI-powered intent in Amazon Lex using the built-in QnAIntent and point it to the knowledge base.
  4. Deploy the sample Amazon Lex web UI available in the GitHub repo. Use the provided AWS CloudFormation template in your preferred AWS Region and configure the bot.

Prerequisites

To implement this solution, you need the following:

  1. An AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies.
  2. Familiarity with AWS services such as Amazon S3, Amazon Lex, Amazon OpenSearch Service, and Amazon Bedrock.
  3. Access enabled for the Amazon Titan Embeddings G1 – Text model and Anthropic Claude 3 Haiku on Amazon Bedrock. For instructions, see Model access.
  4. A data source in Amazon S3. For this post, we use the Amazon shareholder letters for 2022 and 2023 as the data source to hydrate the knowledge base.

Create a knowledge base

To create a new knowledge base in Amazon Bedrock, complete the following steps. For more information, refer to Create a knowledge base.

  1. On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
  2. Choose Create knowledge base.
  3. On the Provide knowledge base details page, enter a knowledge base name and configure the IAM permissions and tags.
  4. Choose Next.
  5. For Data source name, Amazon Bedrock prepopulates an auto-generated data source name; you can change it to suit your requirements.
  6. Keep the data source location as the same AWS account and choose Browse S3.
  7. Select the S3 bucket where you uploaded the Amazon shareholder documents and choose Choose.
    This will populate the S3 URI, as shown in the following screenshot.
  8. Choose Next.
  9. Select the embedding model to vectorize the documents. For this post, we select Titan Embeddings G1 – Text v1.2.
  10. Select Quick create a new vector store to create a default vector store with OpenSearch Serverless.
  11. Choose Next.
  12. Review the configurations and create your knowledge base.
    After the knowledge base is successfully created, you should see a knowledge base ID, which you need when creating the Amazon Lex bot.
  13. Choose Sync to index the documents.
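If you prefer to script the sync step rather than use the console, the same ingestion can be started with the bedrock-agent API. This is an optional sketch; the knowledge base and data source IDs are placeholders you would copy from your own account.

```python
import time

# Ingestion jobs end in one of these states.
TERMINAL_STATES = {"COMPLETE", "FAILED"}


def is_terminal(status: str) -> bool:
    """Return True when an ingestion job has finished (successfully or not)."""
    return status in TERMINAL_STATES


def sync_knowledge_base(kb_id: str, data_source_id: str, poll_seconds: int = 10) -> str:
    """Start an ingestion job (the console's Sync button) and wait for it to finish."""
    import boto3

    agent = boto3.client("bedrock-agent")
    job = agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=data_source_id)
    job_id = job["ingestionJob"]["ingestionJobId"]
    while True:
        status = agent.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job_id,
        )["ingestionJob"]["status"]
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
```

Re-run `sync_knowledge_base` whenever you add or update documents in the S3 bucket so the index stays current.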

Create an Amazon Lex bot

Complete the following steps to create your bot:

  1. On the Amazon Lex console, choose Bots in the navigation pane.
  2. Choose Create bot.
  3. For Creation method, select Create a blank bot.
  4. For Bot name, enter a name (for example, FAQBot).
  5. For Runtime role, select Create a new IAM role with basic Amazon Lex permissions to access other services on your behalf.
  6. Configure the remaining settings based on your requirements and choose Next.
  7. On the Add language to bot page, you can choose from different languages supported.
    For this post, we choose English (US).
  8. Choose Done.

    After the bot is successfully created, you’re redirected to create a new intent.
  9. Add utterances for the new intent and choose Save intent.

Add QnAIntent to your intent

Complete the following steps to add QnAIntent:

  1. On the Amazon Lex console, navigate to the intent you created.
  2. On the Add intent dropdown menu, choose Use built-in intent.
  3. For Built-in intent, choose AMAZON.QnAIntent – GenAI feature.
  4. For Intent name, enter a name (for example, QnABotIntent).
  5. Choose Add.

    After you add the QnAIntent, you’re redirected to configure the knowledge base.
  6. For Select model, choose Anthropic as the provider and Claude 3 Haiku as the model.
  7. For Choose a knowledge store, select Knowledge base for Amazon Bedrock and enter your knowledge base ID.
  8. Choose Save intent.
  9. After you save the intent, choose Build to build the bot.
    You should see a Successfully built message when the build is complete.
    You can now test the bot on the Amazon Lex console.
  10. Choose Test to launch a draft version of your bot in a chat window within the console.
  11. Enter questions to get responses.
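Beyond the console chat window, you can exercise the draft bot programmatically through the Lex V2 runtime API, which is also how a custom client would integrate. The bot ID and alias ID below are placeholders; the draft version of a bot uses the test alias shown on the Lex console.

```python
import uuid


def new_session_id() -> str:
    """Each distinct session ID keeps conversation state separate per user."""
    return str(uuid.uuid4())


def ask_bot(bot_id: str, bot_alias_id: str, text: str, locale_id: str = "en_US") -> list:
    """Send one utterance to the bot and return the reply messages."""
    import boto3

    client = boto3.client("lexv2-runtime")
    response = client.recognize_text(
        botId=bot_id,
        botAliasId=bot_alias_id,
        localeId=locale_id,
        sessionId=new_session_id(),
        text=text,
    )
    return [m.get("content", "") for m in response.get("messages", [])]
```

For example, `ask_bot("BOTID12345", "TSTALIASID", "What did the 2023 shareholder letter say about AI?")` returns the answer QnAIntent generated from the knowledge base.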

Deploy the Amazon Lex web UI

The Amazon Lex web UI is a prebuilt fully featured web client for Amazon Lex chatbots. It eliminates the heavy lifting of recreating a chat UI from scratch. You can quickly deploy its features and minimize time to value for your chatbot-powered applications. Complete the following steps to deploy the UI:

  1. Follow the instructions in the GitHub repo.
  2. Before you deploy the CloudFormation template, update the LexV2BotId and LexV2BotAliasId values in the template based on the chatbot you created in your account.
  3. After the CloudFormation stack is deployed successfully, copy the WebAppUrl value from the stack Outputs tab.
  4. Navigate to the web UI to test the solution in your browser.

Clean up

To avoid incurring unnecessary future charges, clean up the resources you created as part of this solution:

  1. Delete the Amazon Bedrock knowledge base and the data in the S3 bucket if you created one specifically for this solution.
  2. Delete the Amazon Lex bot you created.
  3. Delete the CloudFormation stack.

Conclusion

In this post, we discussed the significance of generative AI-powered chatbots in customer support systems. We then provided an overview of the new Amazon Lex feature, QnAIntent, designed to connect FMs to your company data. Finally, we demonstrated a practical use case of setting up a Q&A chatbot to analyze Amazon shareholder documents. This implementation not only provides prompt and consistent customer service, but also empowers live agents to dedicate their expertise to resolving more complex issues.

Stay up to date with the latest advancements in generative AI and start building on AWS. If you’re seeking assistance on how to begin, check out the Generative AI Innovation Center.


About the Authors

Supriya Puragundla is a Senior Solutions Architect at AWS. She has over 15 years of IT experience in software development, design, and architecture. She helps key customer accounts on their data, generative AI, and AI/ML journeys. She is passionate about data-driven AI and has deep expertise in ML and generative AI.

Manjula Nagineni is a Senior Solutions Architect with AWS based in New York. She works with major financial service institutions, architecting and modernizing their large-scale applications while adopting AWS Cloud services. She is passionate about designing cloud-centered big data workloads. She has over 20 years of IT experience in software development, analytics, and architecture across multiple domains such as finance, retail, and telecom.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors of the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.


Identify idle endpoints in Amazon SageMaker

Amazon SageMaker is a machine learning (ML) platform designed to simplify the process of building, training, deploying, and managing ML models at scale. With a comprehensive suite of tools and services, SageMaker offers developers and data scientists the resources they need to accelerate the development and deployment of ML solutions.

In today’s fast-paced technological landscape, efficiency and agility are essential for businesses and developers striving to innovate. AWS plays a critical role in enabling this innovation by providing a range of services that abstract away the complexities of infrastructure management. By handling tasks such as provisioning, scaling, and managing resources, AWS allows developers to focus more on their core business logic and iterate quickly on new ideas.

As developers deploy and scale applications, unused resources such as idle SageMaker endpoints can accumulate unnoticed, leading to higher operational costs. This post addresses the issue of identifying and managing idle endpoints in SageMaker. We explore methods to monitor SageMaker endpoints effectively and distinguish between active and idle ones. Additionally, we walk through a Python script that automates the identification of idle endpoints using Amazon CloudWatch metrics.

Identify idle endpoints with a Python script

To effectively manage SageMaker endpoints and optimize resource utilization, we use a Python script that uses the AWS SDK for Python (Boto3) to interact with SageMaker and CloudWatch. This script automates the process of querying CloudWatch metrics to determine endpoint activity and identifies idle endpoints based on the number of invocations over a specified time period.

Let’s break down the key components of the Python script and explain how each part contributes to the identification of idle endpoints:

  • Global variables and AWS client initialization – The script begins by importing necessary modules and initializing global variables such as NAMESPACE, METRIC, LOOKBACK, and PERIOD. These variables define parameters for querying CloudWatch metrics and SageMaker endpoints. Additionally, AWS clients for interacting with SageMaker and CloudWatch services are initialized using Boto3.
from datetime import datetime, timedelta
import boto3
import logging

# AWS clients initialization
cloudwatch = boto3.client("cloudwatch")
sagemaker = boto3.client("sagemaker")

# Global variables
NAMESPACE = "AWS/SageMaker"
METRIC = "Invocations"
LOOKBACK = 1  # Number of days to look back for activity
PERIOD = 86400  # We opt for a granularity of 1 Day to reduce the volume of metrics retrieved while maintaining accuracy.

# Calculate time range for querying CloudWatch metrics
ago = datetime.utcnow() - timedelta(days=LOOKBACK)
now = datetime.utcnow()
  • Identify idle endpoints – Based on the CloudWatch metrics data, the script determines whether an endpoint is idle or active. If an endpoint has received no invocations over the defined period, it’s flagged as idle. In this case, we select a cautious default threshold of zero invocations over the analyzed period. However, depending on your specific use case, you can adjust this threshold to suit your requirements.
# Helper function to extract endpoint name from CloudWatch metric

def get_endpoint_name_from_metric(metric):
    for d in metric["Dimensions"]:
        if d["Name"] in ("EndpointName", "InferenceComponentName"):
            yield d["Value"]

# Helper function to list all Invocations metrics published in the AWS/SageMaker namespace

def list_metrics():
    paginator = cloudwatch.get_paginator("list_metrics")
    response_iterator = paginator.paginate(Namespace=NAMESPACE, MetricName=METRIC)
    return [m for r in response_iterator for m in r["Metrics"]]


# Helper function to check if endpoint is in use based on CloudWatch metrics

def is_endpoint_busy(metric):
    metric_values = cloudwatch.get_metric_data(
        MetricDataQueries=[{
            "Id": "metricname",
            "MetricStat": {
                "Metric": {
                    "Namespace": metric["Namespace"],
                    "MetricName": metric["MetricName"],
                    "Dimensions": metric["Dimensions"],
                },
                "Period": PERIOD,
                "Stat": "Sum",
                "Unit": "None",
            },
        }],
        StartTime=ago,
        EndTime=now,
        ScanBy="TimestampAscending",
        MaxDatapoints=24 * (LOOKBACK + 1),
    )
    return sum(metric_values.get("MetricDataResults", [{}])[0].get("Values", [])) > 0

# Helper function to log endpoint activity

def log_endpoint_activity(endpoint_name, is_busy):
    status = "BUSY" if is_busy else "IDLE"
    log_message = f"{datetime.utcnow()} - Endpoint {endpoint_name} {status}"
    print(log_message)
  • Main function – The main() function serves as the entry point to run the script. It orchestrates the process of retrieving SageMaker endpoints, querying CloudWatch metrics, and logging endpoint activity.
# Main function to identify idle endpoints and log their activity status
def main():
    endpoints = sagemaker.list_endpoints()["Endpoints"]
    
    if not endpoints:
        print("No endpoints found")
        return

    existing_endpoints_name = []
    for endpoint in endpoints:
        existing_endpoints_name.append(endpoint["EndpointName"])
    
    for metric in list_metrics():
        for endpoint_name in get_endpoint_name_from_metric(metric):
            if endpoint_name in existing_endpoints_name:
                is_busy = is_endpoint_busy(metric)
                log_endpoint_activity(endpoint_name, is_busy)
            else:
                print(f"Endpoint {endpoint_name} not active")

if __name__ == "__main__":
    main()

By following along with the explanation of the script, you’ll gain a deeper understanding of how to automate the identification of idle endpoints in SageMaker, paving the way for more efficient resource management and cost optimization.

Permissions required to run the script

Before you run the provided Python script to identify idle endpoints in SageMaker, make sure your AWS Identity and Access Management (IAM) user or role has the necessary permissions. The permissions required for the script include:

  • CloudWatch permissions – The IAM entity running the script must have permissions for the CloudWatch actions cloudwatch:GetMetricData and cloudwatch:ListMetrics
  • SageMaker permissions – The IAM entity must have permissions to list SageMaker endpoints using the sagemaker:ListEndpoints action
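A minimal policy covering those actions can be built as follows. This is a sketch expressed as a Python dict so you can serialize it for IAM; the `Resource: "*"` scope is a broad illustration that you should narrow for production use.

```python
import json

# Minimal IAM policy covering exactly the actions the idle-endpoint script calls.
# Resource "*" is intentionally broad for illustration; scope it down in production.
IDLE_ENDPOINT_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData",
                "cloudwatch:ListMetrics",
                "sagemaker:ListEndpoints",
            ],
            "Resource": "*",
        }
    ],
}

# Serialize to the JSON document you attach to the IAM user or role.
print(json.dumps(IDLE_ENDPOINT_POLICY, indent=2))
```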

Run the Python script

You can run the Python script using various methods, including:

  • The AWS CLI – Make sure the AWS Command Line Interface (AWS CLI) is installed and configured with the appropriate credentials.
  • AWS Cloud9 – If you prefer a cloud-based integrated development environment (IDE), AWS Cloud9 provides an IDE with preconfigured settings for AWS development. Simply create a new environment, clone the script repository, and run the script within the Cloud9 environment.

In this post, we demonstrate running the Python script through the AWS CLI.

Actions to take after identifying idle endpoints

After you’ve successfully identified idle endpoints in your SageMaker environment using the Python script, you can take proactive steps to optimize resource utilization and reduce operational costs. The following are some actionable measures you can implement:

  • Delete or scale down endpoints – For endpoints that consistently show no activity over an extended period, consider deleting or scaling them down to minimize resource wastage. SageMaker allows you to delete idle endpoints through the AWS Management Console or programmatically using the AWS SDK.
  • Review and refine the model deployment strategy – Evaluate the deployment strategy for your ML models and assess whether all deployed endpoints are necessary. Sometimes, endpoints may become idle due to changes in business requirements or model updates. By reviewing your deployment strategy, you can identify opportunities to consolidate or optimize endpoints for better efficiency.
  • Implement auto scaling policies – Configure auto scaling policies for active endpoints to dynamically adjust the compute capacity based on workload demand. SageMaker supports auto scaling, allowing you to automatically increase or decrease the number of instances serving predictions based on predefined metrics such as CPU utilization or inference latency.
  • Explore serverless inference options – Consider using SageMaker serverless inference as an alternative to traditional endpoint provisioning. Serverless inference eliminates the need for manual endpoint management by automatically scaling compute resources based on incoming prediction requests. This can significantly reduce idle capacity and optimize costs for intermittent or unpredictable workloads.
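The deletion step can be scripted as a follow-up to the idle-endpoint report. The sketch below is hedged: the `protected` allowlist and the `should_delete` guard are illustrative helpers (not part of the original script), and deleting the endpoint config by the name returned from `describe_endpoint` assumes you want to remove it together with the endpoint.

```python
def should_delete(is_busy: bool, protected: set, name: str) -> bool:
    """Guard against deleting busy endpoints or explicitly protected ones."""
    return (not is_busy) and (name not in protected)


def delete_idle_endpoint(endpoint_name: str) -> None:
    """Delete an endpoint and the endpoint config it was deployed from."""
    import boto3

    sm = boto3.client("sagemaker")
    config_name = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    sm.delete_endpoint(EndpointName=endpoint_name)
    sm.delete_endpoint_config(EndpointConfigName=config_name)
```

Pairing `should_delete` with the `is_endpoint_busy` result from the main script lets you automate cleanup while keeping a manual allowlist of endpoints that must never be removed.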

Conclusion

In this post, we discussed the importance of identifying idle endpoints in SageMaker and provided a Python script to help automate this process. By implementing proactive monitoring solutions and optimizing resource utilization, SageMaker users can effectively manage their endpoints, reduce operational costs, and maximize the efficiency of their machine learning workflows.

Get started with the techniques demonstrated in this post to automate cost monitoring for SageMaker inference. Explore AWS re:Post for valuable resources on optimizing your cloud infrastructure and maximizing AWS services.

About the authors

Pablo Colazurdo is a Principal Solutions Architect at AWS where he enjoys helping customers to launch successful projects in the Cloud. He has many years of experience working on varied technologies and is passionate about learning new things. Pablo grew up in Argentina but now enjoys the rain in Ireland while listening to music, reading or playing D&D with his kids.

Ozgur Canibeyaz is a Senior Technical Account Manager at AWS with 8 years of experience. Ozgur helps customers optimize their AWS usage by navigating technical challenges, exploring cost-saving opportunities, achieving operational excellence, and building innovative services using AWS products.


Indian language RAG with Cohere multilingual embeddings and Anthropic Claude 3 on Amazon Bedrock

Media and entertainment companies serve multilingual audiences with a wide range of content catering to diverse audience segments. These enterprises have access to massive amounts of data collected over their many years of operations. Much of this data is unstructured text and images. Conventional approaches to analyzing unstructured data for generating new content rely on the use of keyword or synonym matching. These approaches don’t capture the full semantic context of a document, making them less effective for users’ search, content creation, and several other downstream tasks.

Text embeddings use machine learning (ML) capabilities to capture the essence of unstructured data. These embeddings are generated by language models that map natural language text into their numerical representations and, in the process, encode contextual information in the natural language document. Generating text embeddings is the first step to many natural language processing (NLP) applications powered by large language models (LLMs) such as Retrieval Augmented Generation (RAG), text generation, entity extraction, and several other downstream business processes.

Converting text to embeddings using the Cohere multilingual embedding model

Despite the rising popularity and capabilities of LLMs, the language most often used to converse with the LLM, often through a chat-like interface, is English. And although progress has been made in adapting open source models to comprehend and respond in Indian languages, such efforts fall short of the English language capabilities displayed among larger, state-of-the-art LLMs. This makes it difficult to adopt such models for RAG applications based on Indian languages.

In this post, we showcase a RAG application that can search and query across multiple Indian languages using the Cohere Embed – Multilingual model and Anthropic Claude 3 on Amazon Bedrock. This post focuses on Indian languages, but you can use the approach with other languages that are supported by the LLM.

Solution overview

We use the Flores dataset [1], a benchmark dataset for machine translation between English and low-resource languages. This also serves as a parallel corpus, which is a collection of texts that have been translated into one or more languages.

With the Flores dataset, we can demonstrate that the embeddings and, subsequently, the documents retrieved from the retriever, are relevant for the same question being asked in multiple languages. However, given the sparsity of the dataset (approximately 1,000 lines per language from more than 200 languages), the nature and number of questions that can be asked against the dataset is limited.

After you have downloaded the data, load the data into the pandas data frame for processing. For this demo, we are restricting ourselves to Bengali, Kannada, Malayalam, Tamil, Telugu, Hindi, Marathi, and English. If you are looking to adopt this approach for other languages, make sure the language is supported by both the embedding model and the LLM that’s being used in the RAG setup.

Load the data with the following code:

import pandas as pd

df_ben = pd.read_csv('./data/Flores/dev/dev.ben_Beng', sep='\t')
df_kan = pd.read_csv('./data/Flores/dev/dev.kan_Knda', sep='\t')
df_mal = pd.read_csv('./data/Flores/dev/dev.mal_Mlym', sep='\t')
df_tam = pd.read_csv('./data/Flores/dev/dev.tam_Taml', sep='\t')
df_tel = pd.read_csv('./data/Flores/dev/dev.tel_Telu', sep='\t')
df_hin = pd.read_csv('./data/Flores/dev/dev.hin_Deva', sep='\t')
df_mar = pd.read_csv('./data/Flores/dev/dev.mar_Deva', sep='\t')
df_eng = pd.read_csv('./data/Flores/dev/dev.eng_Latn', sep='\t')
# Choose fewer/more languages if needed

df_all_Langs = pd.concat([df_ben, df_kan, df_mal, df_tam, df_tel, df_hin, df_mar,df_eng], axis=1)
df_all_Langs.columns = ['Bengali', 'Kannada', 'Malayalam', 'Tamil', 'Telugu', 'Hindi', 'Marathi','English']

df_all_Langs.shape #(996,8)


df = df_all_Langs
stacked_df = df.stack().reset_index() # for ease of handling

# select only the required columns, rename them
stacked_df = stacked_df.iloc[:,[1,2]]
stacked_df.columns = ['language','text'] 

The Cohere multilingual embedding model

Cohere is a leading enterprise artificial intelligence (AI) platform that builds world-class LLMs and LLM-powered solutions that allow computers to search, capture meaning, and converse in text. They provide ease of use and strong security and privacy controls.

The Cohere Embed – Multilingual model generates vector representations of documents for over 100 languages and is available on Amazon Bedrock. With Amazon Bedrock, you can access the embedding model through an API call, which eliminates the need to manage the underlying infrastructure and makes sure sensitive information remains securely managed and protected.

The multilingual embedding model groups text with similar meanings by assigning them positions in the semantic vector space that are close to each other. Developers can process text in multiple languages without switching between different models. This makes processing more efficient and improves performance for multilingual applications.

Text embeddings turn unstructured data into a structured form. This allows you to objectively compare, dissect, and derive insights from all these documents. Cohere’s new embedding models take a required input parameter, input_type, which must be set for every API call to one of the following four values, aligned with the most frequent use cases for text embeddings:

  • input_type=”search_document” – Use this for texts (documents) you want to store in your vector database
  • input_type=”search_query” – Use this for search queries to find the most relevant documents in your vector database
  • input_type=”classification” – Use this if you use the embeddings as input for a classification system
  • input_type=”clustering” – Use this if you use the embeddings for text clustering

Using these input types provides the highest possible quality for the respective tasks. If you want to use the embeddings for multiple use cases, we recommend using input_type="search_document".

Prerequisites

To use the Anthropic Claude 3 Sonnet LLM and the Cohere multilingual embeddings model on this dataset, make sure you have enabled access to both models in your AWS account on the Model access page of the Amazon Bedrock console, then install the following packages. The following code has been tested with the Amazon SageMaker Data Science 3.0 image, backed by an ml.t3.medium instance.

! apt-get update 
! apt-get install build-essential -y # for the hnswlib package below
! pip install hnswlib

Create a search index

With all of the prerequisites in place, you can now convert the multilingual corpus into embeddings and store them in hnswlib, a header-only C++ implementation of Hierarchical Navigable Small World (HNSW) graphs with Python bindings that supports insertions and updates. HNSWLib is an in-memory vector store that can be saved to a file, which is sufficient for the small dataset we are working with. Use the following code:

import hnswlib
import os
import json
import botocore
import boto3

boto3_bedrock = boto3.client('bedrock')
bedrock_runtime = boto3.client('bedrock-runtime')

# Create a search index
index = hnswlib.Index(space='ip', dim=1024)
index.init_index(max_elements=10000, ef_construction=512, M=64)

all_text = stacked_df['text'].to_list()
all_text_lang = stacked_df['language'].to_list()

Embed and index documents

To embed and store the small multilingual dataset, use the Cohere embed-multilingual-v3.0 model, which creates embeddings with 1,024 dimensions, using the Amazon Bedrock runtime API:

modelId="cohere.embed-multilingual-v3"
contentType= "application/json"
accept = "*/*"


df_chunk_size = 80
chunk_embeddings = []
for i in range(0,len(all_text), df_chunk_size):
    chunk = all_text[i:i+df_chunk_size]
    body=json.dumps(
            {"texts":chunk,"input_type":"search_document"} # search documents
    ) 
    response = bedrock_runtime.invoke_model(body=body, 
                                            modelId=modelId,
                                            accept=accept,
                                            contentType=contentType)
    response_body = json.loads(response.get('body').read())
    index.add_items(response_body['embeddings'])

Verify that the embeddings work

To test the solution, write a function that takes a query as input, embeds it, and finds the top N documents most closely related to it:

# Retrieval of closest N docs to query
def retrieval(query, num_docs_to_return=10):
    modelId="cohere.embed-multilingual-v3"
    contentType= "application/json"
    accept = "*/*"
    body=json.dumps(
            {"texts":[query],"input_type":"search_query"} # search query
    ) 
    response = bedrock_runtime.invoke_model(body=body, 
                                            modelId=modelId,
                                            accept=accept,
                                            contentType=contentType)
    response_body = json.loads(response.get('body').read())
    doc_ids = index.knn_query(response_body['embeddings'], 
                              k=num_docs_to_return)[0][0] 
    print(f"Query: {query} \n")
    retrieved_docs = []

    for doc_id in doc_ids:
        # Append results
        retrieved_docs.append(all_text[doc_id]) # original vernacular language docs

        # Print results
        print(f"Original Flores Text {all_text[doc_id]}")
        print("-"*30)

    print("END OF RESULTS \n\n")
    return retrieved_docs   

You can explore what the RAG stack does with a couple of queries in different languages, such as Hindi:

queries = [
    "मुझे सिंधु नदी घाटी सभ्यता के बारे में बताइए",
]
# translation: tell me about Indus Valley Civilization
for query in queries:
    retrieval(query)

The index returns documents relevant to the search query from across languages:

Query: मुझे सिंधु नदी घाटी सभ्यता के बारे में बताइए 

Original Flores Text सिंधु घाटी सभ्यता उत्तर-पश्चिम भारतीय उपमहाद्वीप में कांस्य युग की सभ्यता थी जिसमें आस-पास के आधुनिक पाकिस्तान और उत्तर पश्चिम भारत और उत्तर-पूर्व अफ़गानिस्तान के कुछ क्षेत्र शामिल थे.
------------------------------
Original Flores Text सिंधु नदी के घाटों में पनपी सभ्यता के कारण यह इसके नाम पर बनी है.
------------------------------
Original Flores Text यद्यपि कुछ विद्वानों का अनुमान है कि चूंकि सभ्यता अब सूख चुकी सरस्वती नदी के घाटियों में विद्यमान थी, इसलिए इसे सिंधु-सरस्वती सभ्यता कहा जाना चाहिए, जबकि 1920 के दशक में हड़प्पा की पहली खुदाई के बाद से कुछ इसे हड़प्पा सभ्यता कहते हैं।
------------------------------
Original Flores Text సింధు నది పరీవాహక ప్రాంతాల్లో నాగరికత విలసిల్లింది.
------------------------------
Original Flores Text सिंधू संस्कृती ही वायव्य भारतीय उपखंडातील कांस्य युग संस्कृती होती ज्यामध्ये  आधुनिक काळातील पाकिस्तान, वायव्य भारत आणि ईशान्य अफगाणिस्तानातील काही प्रदेशांचा समावेश होता.
------------------------------
Original Flores Text সিন্ধু সভ্যতা হল উত্তর-পশ্চিম ভারতীয় উপমহাদেশের একটি তাম্রযুগের সভ্যতা যা আধুনিক-পাকিস্তানের অধিকাংশ ও উত্তর-পশ্চিম ভারত এবং উত্তর-পূর্ব আফগানিস্তানের কিছু অঞ্চলকে ঘিরে রয়েছে।
-------------------------
 .....

You can now use these documents retrieved from the index as context when calling the Anthropic Claude 3 Sonnet model on Amazon Bedrock. In production settings, with datasets several orders of magnitude larger than the Flores dataset, you can make the search results from the index even more relevant by using Cohere’s Rerank models.
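To make the reranking step concrete, here is a minimal sketch of how a Rerank request body could be assembled before being passed to `invoke_model`. The field names (`query`, `documents`, `top_n`, `api_version`) follow Cohere's Rerank API and are an assumption here, as is the model ID; check the Amazon Bedrock model documentation for the exact request schema before using this in production:

```python
import json

# Hypothetical helper: builds the JSON request body for a Cohere Rerank
# call on Amazon Bedrock. The field names follow Cohere's Rerank API
# but are an assumption -- verify against the Bedrock documentation.
def build_rerank_request(query, documents, top_n=3):
    return json.dumps({
        "query": query,          # the user's search query
        "documents": documents,  # candidate docs from the vector index
        "top_n": top_n,          # how many reranked docs to return
        "api_version": 2,
    })

body = build_rerank_request(
    "मुझे सिंधु नदी घाटी सभ्यता के बारे में बताइए",
    ["doc one", "doc two", "doc three", "doc four"],
    top_n=2,
)
print(json.loads(body)["top_n"])  # → 2
```

The resulting `body` would then be sent with `bedrock_runtime.invoke_model(...)`, analogous to the embedding calls shown earlier, and the reranked document order used in place of the raw k-NN order.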

Use the system prompt to outline how you want the LLM to process your query:

# Retrieval of docs relevant to the query
def context_retrieval(query, num_docs_to_return=10):

    modelId="cohere.embed-multilingual-v3"
    contentType= "application/json"
    accept = "*/*"
    body=json.dumps(
            {"texts":[query],"input_type":"search_query"} # search query
    ) 
    response = bedrock_runtime.invoke_model(body=body, 
                                            modelId=modelId,
                                            accept=accept,
                                            contentType=contentType)
    response_body = json.loads(response.get('body').read())
    doc_ids = index.knn_query(response_body['embeddings'], 
                              k=num_docs_to_return)[0][0] 
    retrieved_docs = []
    
    for doc_id in doc_ids:
        retrieved_docs.append(all_text[doc_id])
    return " ".join(retrieved_docs)

def query_rag_bedrock(query, model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'):

    system_prompt = '''
    You are a helpful, empathetic, multilingual assistant.
    Identify the language of the user query, and respond to the user query in the same language.

    For example:
    if the user query is in English, your response will be in English,
    if the user query is in Malayalam, your response will be in Malayalam,
    if the user query is in Tamil, your response will be in Tamil,
    and so on...

    If you cannot identify the language, say that you cannot identify the language.

    You will use only the data provided within the <context> </context> tags that matches the language of the user's query to answer the user's query.
    If there is no data provided within the <context> </context> tags, say that you do not have enough information to answer the question.

    Restrict your response to a single paragraph of fewer than 400 words, and avoid bullet points.
    '''
    max_tokens = 1000

    messages  = [{"role": "user", "content": f'''
                    query : {query}
                    <context>
                    {context_retrieval(query)}
                    </context>
                '''}]

    body=json.dumps(
            {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "system": system_prompt,
                "messages": messages
            }  
        )  


    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())
    return response_body['content'][0]['text']
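The last line of `query_rag_bedrock` assumes the response shape of the Anthropic Messages API on Amazon Bedrock, where the generated text lives at `content[0]['text']`. A small helper with a mocked payload (the real payload would come from `invoke_model`) illustrates that shape and guards against an empty `content` list:

```python
def extract_claude_text(response_body):
    """Pull the generated text out of a Claude Messages API response
    dict, returning an empty string if the content list is missing
    or empty instead of raising an IndexError."""
    content = response_body.get("content") or []
    return content[0].get("text", "") if content else ""

# Mocked payload mirroring the shape invoke_model returns for
# Claude 3 models (a real call would come from Amazon Bedrock)
mock_response = {
    "content": [{"type": "text", "text": "The Indus Valley Civilization..."}],
    "stop_reason": "end_turn",
}
print(extract_claude_text(mock_response))  # → The Indus Valley Civilization...
```

Using a guarded accessor like this keeps the bot from crashing on the rare response where the model returns no content blocks.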

Let’s pass in the same query in multiple Indian languages:

queries = ["tell me about the indus river valley civilization",
           "मुझे सिंधु नदी घाटी सभ्यता के बारे में बताइए",
           "मला सिंधू नदीच्या संस्कृतीबद्दल सांगा",
           "సింధు నది నాగరికత గురించి చెప్పండి",
           "ಸಿಂಧೂ ನದಿ ಕಣಿವೆ ನಾಗರಿಕತೆಯ ಬಗ್ಗೆ ಹೇಳಿ", 
           "সিন্ধু নদী উপত্যকা সভ্যতা সম্পর্কে বলুন",
           "சிந்து நதி பள்ளத்தாக்கு நாகரிகத்தைப் பற்றி சொல்",
           "സിന്ധു നദീതാഴ്വര നാഗരികതയെക്കുറിച്ച് പറയുക"] 

for query in queries:
    print(query_rag_bedrock(query))
    print('_'*20)

The query is in English, so I will respond in English.

The Indus Valley Civilization, also known as the Harappan Civilization, was a Bronze Age civilization that flourished in the northwestern regions of the Indian subcontinent, primarily in the basins of the Indus River and its tributaries. It encompassed parts of modern-day Pakistan, northwest India, and northeast Afghanistan. While some scholars suggest calling it the Indus-Sarasvati Civilization due to its presence in the now-dried-up Sarasvati River basin, the name "Indus Valley Civilization" is derived from its development along the Indus River valley. This ancient civilization dates back to around 3300–1300 BCE and was one of the earliest urban civilizations in the world. It was known for its well-planned cities, advanced drainage systems, and a writing system that has not yet been deciphered.
____________________
सिंधु घाटी सभ्यता एक प्राचीन नगर सभ्यता थी जो उत्तर-पश्चिम भारतीय उपमहाद्वीप में फैली हुई थी। यह लगभग 3300 से 1300 ईसा पूर्व की अवधि तक विकसित रही। इस सभ्यता के केंद्र वर्तमान पाकिस्तान के सिंध और पंजाब प्रांतों में स्थित थे, लेकिन इसके अवशेष भारत के राजस्थान, गुजरात, मध्य प्रदेश, महाराष्ट्र और उत्तर प्रदेश में भी मिले हैं। सभ्यता का नाम सिंधु नदी से लिया गया है क्योंकि इसके प्रमुख स्थल इस नदी के किनारे स्थित थे। हालांकि, कुछ विद्वानों का अनुमान है कि सरस्वती नदी के किनारे भी इस सभ्यता के स्थल विद्यमान थे इसलिए इसे सिंधु-सरस्वती सभ्यता भी कहा जाता है। यह एक महत्वपूर्ण शहरी समाज था जिसमें विकसित योजना बनाने की क्षमता, नगरीय संरचना और स्वच्छ जलापूर्ति आदि प्रमुख विशेषताएं थीं।
____________________
सिंधू संस्कृती म्हणजे सिंधू नदीच्या पट्टीकेतील प्राचीन संस्कृती होती. ही संस्कृती सुमारे ई.पू. ३३०० ते ई.पू. १३०० या कालखंडात फुलणारी होती. ती भारतातील कांस्ययुगीन संस्कृतींपैकी एक मोठी होती. या संस्कृतीचे अवशेष आजच्या पाकिस्तान, भारत आणि अफगाणिस्तानमध्ये आढळून आले आहेत. या संस्कृतीत नगररचना, नागरी सोयी सुविधांचा विकास झाला होता. जलवाहिनी, नगरदेवालय इत्यादी अद्भुत बाबी या संस्कृतीत होत्या. सिंधू संस्कृतीत लिपीसुद्धा विकसित झाली होती परंतु ती अजूनही वाचण्यास आलेली नाही. सिंधू संस्कृती ही भारतातील पहिली शहरी संस्कृती मानली जाते.
____________________
సింధు నది నాగరికత గురించి చెప్పుతూ, ఈ నాగరికత సింధు నది పరిసర ప్రాంతాల్లో ఉన్నదని చెప్పవచ్చు. దీనిని సింధు-సరస్వతి నాగరికత అనీ, హరప్ప నాగరికత అనీ కూడా పిలుస్తారు. ఇది ఉత్తర-ఆర్య భారతదేశం, ఆధునిక పాకిస్తాన్, ఉత్తర-పశ్చిమ భారతదేశం మరియు ఉత్తర-ఆర్థిక అఫ్గానిస్తాన్ కు చెందిన తామ్రయుగపు నాగరికత. సరస్వతి నది పరీవాహక ప్రాంతాల్లోనూ నాగరికత ఉందని కొందరు పండితులు అభిప్రాయపడ్డారు. దీని మొదటి స్థలాన్ని 1920లలో హరప్పాలో త్రవ్వారు. ఈ నాగరికతలో ప్రశస్తమైన బస్తీలు, నగరాలు, మలిచ్చి రంగులతో నిర్మించిన భవనాలు, పట్టణ నిర్మాణాలు ఉన్నాయి.
____________________
ಸಿಂಧೂ ಕಣಿವೆ ನಾಗರಿಕತೆಯು ವಾಯುವ್ಯ ಭಾರತದ ಉಪಖಂಡದಲ್ಲಿ ಕಂಚಿನ ಯುಗದ ನಾಗರಿಕತೆಯಾಗಿದ್ದು, ಪ್ರಾಚೀನ ಭಾರತದ ಇತಿಹಾಸದಲ್ಲಿ ಮುಖ್ಯವಾದ ಪಾತ್ರವನ್ನು ವಹಿಸಿದೆ. ಈ ನಾಗರಿಕತೆಯು ಆಧುನಿಕ-ದಿನದ ಪಾಕಿಸ್ತಾನ ಮತ್ತು ವಾಯುವ್ಯ ಭಾರತದ ಭೂಪ್ರದೇಶಗಳನ್ನು ಹಾಗೂ ಈಶಾನ್ಯ ಅಫ್ಘಾನಿಸ್ತಾನದ ಕೆಲವು ಪ್ರದೇಶಗಳನ್ನು ಒಳಗೊಂಡಿರುವುದರಿಂದ ಅದಕ್ಕೆ ಸಿಂಧೂ ನಾಗರಿಕತೆ ಎಂದು ಹೆಸರಿಸಲಾಗಿದೆ. ಸಿಂಧೂ ನದಿಯ ಪ್ರದೇಶಗಳಲ್ಲಿ ಈ ನಾಗರಿಕತೆಯು ವಿಕಸಿತಗೊಂಡಿದ್ದರಿಂದ ಅದಕ್ಕೆ ಸಿಂಧೂ ನಾಗರಿಕತೆ ಎಂದು ಹೆಸರಿಸಲಾಗಿದೆ. ಈಗ ಬತ್ತಿ ಹೋದ ಸರಸ್ವತಿ ನದಿಯ ಪ್ರದೇಶಗಳಲ್ಲಿ ಸಹ ನಾಗರೀಕತೆಯ ಅಸ್ತಿತ್ವವಿದ್ದಿರಬಹುದೆಂದು ಕೆಲವು ಪ್ರಾಜ್ಞರು ಶಂಕಿಸುತ್ತಾರೆ. ಆದ್ದರಿಂದ ಈ ನಾಗರಿಕತೆಯನ್ನು ಸಿಂಧೂ-ಸರಸ್ವತಿ ನಾಗರಿಕತೆ ಎಂದು ಸೂಕ್ತವಾಗಿ ಕರೆ
____________________
সিন্ধু নদী উপত্যকা সভ্যতা ছিল একটি প্রাচীন তাম্রযুগীয় সভ্যতা যা বর্তমান পাকিস্তান এবং উত্তর-পশ্চিম ভারত ও উত্তর-পূর্ব আফগানিস্তানের কিছু অঞ্চলকে নিয়ে গঠিত ছিল। এই সভ্যতার নাম সিন্ধু নদীর অববাহিকা অঞ্চলে এটির বিকাশের কারণে এরকম দেওয়া হয়েছে। কিছু পণ্ডিত মনে করেন যে সরস্বতী নদীর ভূমি-প্রদেশেও এই সভ্যতা বিদ্যমান ছিল, তাই এটিকে সিন্ধু-সরস্বতী সভ্যতা বলা উচিত। আবার কেউ কেউ এই সভ্যতাকে হরপ্পা পরবর্তী হরপ্পান সভ্যতা নামেও অবিহিত করেন। যাই হোক, সিন্ধু সভ্যতা ছিল প্রাচীন তাম্রযুগের এক উল্লেখযোগ্য সভ্যতা যা সিন্ধু নদী উপত্যকার এলাকায় বিকশিত হয়েছিল।
____________________
சிந்து நதிப் பள்ளத்தாக்கில் தோன்றிய நாகரிகம் சிந்து நாகரிகம் என்றழைக்கப்படுகிறது. சிந்து நதியின் படுகைகளில் இந்த நாகரிகம் மலர்ந்ததால் இப்பெயர் வழங்கப்பட்டது. ஆனால், தற்போது வறண்டுபோன சரஸ்வதி நதிப் பகுதியிலும் இந்நாகரிகம் இருந்திருக்கலாம் என சில அறிஞர்கள் கருதுவதால், சிந்து சரஸ்வதி நாகரிகம் என்று அழைக்கப்பட வேண்டும் என்று வாதிடுகின்றனர். மேலும், இந்நாகரிகத்தின் முதல் தளமான ஹரப்பாவின் பெயரால் ஹரப்பா நாகரிகம் என்றும் அழைக்கப்படுகிறது. இந்த நாகரிகம் வெண்கலயுக நாகரிகமாக கருதப்படுகிறது. இது தற்கால பாகிஸ்தானின் பெரும்பகுதி, வடமேற்கு இந்தியா மற்றும் வடகிழக்கு ஆப்கானிஸ்தானின் சில பகுதிகளை உள்ளடக்கியது.
____________________
സിന്ധു നദീതട സംസ്കാരം അഥവാ ഹാരപ്പൻ സംസ്കാരം ആധുനിക പാകിസ്ഥാൻ, വടക്ക് പടിഞ്ഞാറൻ ഇന്ത്യ, വടക്ക് കിഴക്കൻ അഫ്ഗാനിസ്ഥാൻ എന്നിവിടങ്ങളിൽ നിലനിന്ന ഒരു വെങ്കല യുഗ സംസ്കാരമായിരുന്നു. ഈ സംസ്കാരത്തിന്റെ അടിസ്ഥാനം സിന്ധു നദിയുടെ തടങ്ങളായതിനാലാണ് ഇതിന് സിന്ധു നദീതട സംസ്കാരം എന്ന പേര് ലഭിച്ചത്. ചില പണ്ഡിതർ ഇപ്പോൾ വറ്റിപ്പോയ സരസ്വതി നദിയുടെ തടങ്ങളിലും ഈ സംസ്കാരം നിലനിന്നിരുന്നതിനാൽ സിന്ധു-സരസ്വതി നദീതട സംസ്കാരമെന്ന് വിളിക്കുന്നത് ശരിയായിരിക്കുമെന്ന് അഭിപ്രായപ്പെടുന്നു. എന്നാൽ ചിലർ 1920കളിൽ ആദ്യമായി ഉത്ഖനനം നടത്തിയ ഹാരപ്പ എന്ന സ്ഥലത്തെ പേര് പ്രകാരം ഈ സംസ്കാരത്തെ ഹാരപ്പൻ സംസ്കാരമെന്ന് വിളിക്കുന്നു.

Conclusion

This post presented a walkthrough for using Cohere’s multilingual embedding model along with Anthropic Claude 3 Sonnet on Amazon Bedrock. In particular, we showed how the same question, asked in multiple Indian languages, is answered using relevant documents retrieved from a multilingual vector store.

Cohere’s multilingual embedding model supports over 100 languages. It removes the complexity of building applications that require working with a corpus of documents in different languages. The Cohere Embed model is trained to deliver results in real-world applications. It handles noisy data as inputs, adapts to complex RAG systems, and delivers cost-efficiency from its compression-aware training method.

Start building with Cohere’s multilingual embedding model and Anthropic Claude 3 Sonnet on Amazon Bedrock today.

References

[1] Flores Dataset: https://github.com/facebookresearch/flores/tree/main/flores200


About the Author

Rony K Roy is a Sr. Specialist Solutions Architect, Specializing in AI/ML. Rony helps partners build AI/ML solutions on AWS.
