How IDIADA optimized its intelligent chatbot with Amazon Bedrock

This post is co-written with Xavier Vizcaino, Diego Martín Montoro, and Jordi Sánchez Ferrer from Applus+ Idiada.

In 2021, Applus+ IDIADA, a global partner to the automotive industry with over 30 years of experience supporting customers in product development activities through design, engineering, testing, and homologation services, established the Digital Solutions department. This strategic move aimed to drive innovation by using digital tools and processes. Since then, we have optimized data strategies, developed customized solutions for customers, and prepared for the technological revolution reshaping the industry.

AI now plays a pivotal role in the development and evolution of the automotive sector, in which Applus+ IDIADA operates. Within this landscape, we developed an intelligent chatbot, AIDA (Applus Idiada Digital Assistant), an Amazon Bedrock powered virtual assistant serving as a versatile companion to IDIADA’s workforce.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

With Amazon Bedrock, AIDA assists with a multitude of tasks, from addressing inquiries to tackling complex technical challenges spanning code, mathematics, and translation. Its capabilities are truly boundless.

With AIDA, we take another step towards our vision of providing global and integrated digital solutions that add value for our customers. Its internal deployment strengthens our leadership in developing data analysis, homologation, and vehicle engineering solutions. Additionally, in the medium term, IDIADA plans to offer AIDA as an integrable product for customers’ environments and develop “light” versions seamlessly integrable into existing systems.

In this post, we showcase the research process undertaken to develop a classifier for human interactions in this AI-based environment using Amazon Bedrock. The objective was to accurately identify the type of interaction received by the intelligent agent to route the request to the appropriate pipeline, providing a more specialized and efficient service.

The challenge: Optimize intelligent chatbot responses, allocate resources more effectively, and enhance the overall user experience

Built on a flexible and secure architecture, AIDA offers a versatile environment for integrating multiple data sources, including structured data from enterprise databases and unstructured data from internal sources like Amazon Simple Storage Service (Amazon S3). It boasts advanced capabilities like chat with data, advanced Retrieval Augmented Generation (RAG), and agents, enabling complex tasks such as reasoning, code execution, or API calls.

As AIDA’s interactions with humans proliferated, a pressing need emerged to establish a coherent system for categorizing these diverse exchanges.

Initially, users were making simple queries to AIDA, but over time, they started to request more specific and complex tasks. These included document translations, inquiries about IDIADA’s internal services, file uploads, and other specialized requests.

The main reason for this categorization was to develop distinct pipelines that could more effectively address various types of requests. By sorting interactions into categories, AIDA could be optimized to handle specific kinds of tasks more efficiently. This approach allows for tailored responses and processes for different types of user needs, whether it’s a simple question, a document translation, or a complex inquiry about IDIADA’s services.

The primary objective is to offer a more specialized service through the creation of dedicated pipelines for various contexts, such as conversation, document translation, and services to provide more accurate, relevant, and efficient responses to users’ increasingly diverse and specialized requests.

Solution overview

By categorizing the interactions into three main groups—conversation, services, and document translation—the system can better understand the user’s intent and respond accordingly. The Conversation class encompasses general inquiries and exchanges, the Services class covers requests for specific functionalities or support, and the Document_Translation class handles text translation needs.
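The following is a minimal sketch of how such a routing layer could look once the classifier is in place. The classify_interaction function refers to the classifier developed later in this post, and the three pipeline handlers are hypothetical placeholders for AIDA's internal pipelines.

def route_interaction(question):
    # Classify the incoming interaction and dispatch it to the matching pipeline
    predicted_class = classify_interaction(question)
    if predicted_class == 'Document_Translation':
        return run_document_translation_pipeline(question)  # hypothetical handler
    elif predicted_class == 'Services':
        return run_services_pipeline(question)  # hypothetical handler
    # Conversation is the default and most frequent class
    return run_conversation_pipeline(question)  # hypothetical handler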

The specialized pipelines, designed specifically for each use case, allow for a significant increase in efficiency and accuracy of AIDA’s responses. This is achieved in several ways:

  • Enhanced efficiency – By having dedicated pipelines for specific types of tasks, AIDA can process requests more quickly. Each pipeline is optimized for its particular use case, which reduces the computation time needed to generate an appropriate response.
  • Increased accuracy – The specialized pipelines are equipped with specific tools and knowledge for each type of task. This allows AIDA to provide more accurate and relevant responses, because it uses the most appropriate resources for each type of request.
  • Optimized resource allocation – By classifying interactions, AIDA can allocate computational resources more efficiently, directing the appropriate processing power to each type of task.
  • Improved response time – The combination of greater efficiency and optimized resource allocation results in faster response times for users.
  • Enhanced adaptability – This system allows AIDA to better adapt to different types of requests, from simple queries to complex tasks such as document translations or specialized inquiries about IDIADA services.

The research and development of this large language model (LLM) based classifier is an important step in the continuous improvement of the intelligent agent’s capabilities within the Applus IDIADA environment.

For this work, we use a set of 1,668 pre-classified human interactions, divided into 666 examples for training and 1,002 for testing. This 40/60 split deliberately gives significant weight to the test set.
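For reference, a split with these proportions can be reproduced with scikit-learn. The following sketch assumes the full labeled set lives in a single file (the file name is a hypothetical placeholder), and the stratification is an assumption rather than something stated in the post.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file holding all 1,668 labeled interactions (columns: sample, agent)
df = pd.read_excel('coordinator_dataset/casos_coordinador_full.xlsx')

# 40% train / 60% test, stratified so each class keeps its original proportions
df_train, df_test = train_test_split(df, train_size=0.4, stratify=df['agent'], random_state=42)
print(len(df_train), len(df_test))  # approximately 666 / 1,002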

The following table shows some examples.

Sample | Class
Can you make a summary of this text? “Legislation for the Australian Government’s …” | Conversation
No, only focus on this sentence: Braking technique to enable maximum brake application speed | Conversation
In a factory give me synonyms of a limiting resource of activities | Conversation
We need a translation of the file “Company_Bylaws.pdf” into English, could you handle it? | Document_Translation
Please translate the file “Product_Manual.xlsx” into English | Document_Translation
Could you convert the document “Data_Privacy_Policy.doc” into English, please? | Document_Translation
Register my username in the IDIADA’s human resources database | Services
Send a mail to random_user@mail.com to schedule a meeting for the next weekend | Services
Book an electric car charger for me at IDIADA | Services

We present three different classification approaches: two based on LLMs and one using a classic machine learning (ML) algorithm. The aim is to understand which approach is most suitable for addressing the presented challenge.

LLM-based classifier: Simple prompt

In this case, we developed an LLM-based classifier to categorize inputs into three classes: Conversation, Services, and Document_Translation. Instead of relying on predefined, rigid definitions, our approach follows the principle of understanding a set. This principle involves analyzing the common characteristics and patterns present in the examples or instances that belong to each class. By studying the shared traits of inputs within a class, we can derive an understanding of the class itself, without being constrained by preconceived notions.

It’s important to note that the learned definitions might differ from common expectations. For instance, the Conversation class encompasses not only typical conversational exchanges but also tasks like text summarization, which share similar linguistic and contextual traits with conversational inputs.

By following this data-driven approach, the classifier can accurately categorize new inputs based on their similarity to the learned characteristics of each class, capturing the nuances and diversity within each category.

The code consists of the following key components: libraries, a prompt, model invocation, and an output parser.

Libraries

The programming language used in this code is Python, complemented by the LangChain module, which is specifically designed to facilitate the integration and use of LLMs. This module provides a comprehensive set of tools and abstractions that streamline the process of incorporating and deploying these advanced AI models.

To take advantage of the power of these language models, we use Amazon Bedrock. The integration with Amazon Bedrock is achieved through the Boto3 Python module, which serves as an interface to AWS services, enabling seamless interaction with Amazon Bedrock and the deployment of the classification model.

Prompt

The task is to assign one of three classes (Conversation, Services, or Document_Translation) to a given sentence, represented by question:

  • Conversation class – This class encompasses casual messages, summarization requests, general questions, affirmations, greetings, and similar types of text. It also includes requests for text translation, summarization, or explicit inquiries about the meaning of words or sentences in a specific language.
  • Services class – Texts belonging to this class consist of explicit requests for services such as room reservations, hotel bookings, dining services, cinema information, tourism-related inquiries, and similar service-oriented requests.
  • Document_Translation class – This class is characterized by requests for the translation of a document to a specific language. Unlike the Conversation class, these requests don’t involve summarization. Additionally, the name of the document to be translated and the target language are specified.

The prompt suggests a hierarchical approach to the classification process. First, the sentence should be evaluated to determine if it can be classified as a conversation. If the sentence doesn’t fit the Conversation class, one of the other two classes (Services or Document_Translation) should be assigned.

The priority for the Conversation class stems from the fact that 99% of the interactions are actually simple questions regarding various matters.

Model invocation

We use Anthropic’s Claude 3 Sonnet model for the natural language processing task. This LLM has a context window of 200,000 tokens, enabling it to handle different languages and produce highly accurate answers. We use two key parameters:

  • max_tokens – This parameter limits the maximum number of tokens (words or subwords) that the language model can generate in its output (set to 20 in the simple prompt version shown below and 50 in the example augmented version).
  • temperature – This parameter controls the randomness of the language model’s output. A temperature of 0.0 means that the model will produce the most likely output according to its training, without introducing randomness.

Output parser

Another important component is the output parser, which allows us to gather the desired information in JSON format. To achieve this, we use LangChain’s output_parsers module, specifically the StructuredOutputParser class.

The following code illustrates a simple prompt approach:

import json

import boto3
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# Boto3 client for the Amazon Bedrock runtime
bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')

def classify_interaction(question):
    # Define the expected JSON output: a single "class" field
    response_schemas = [
        ResponseSchema(name="class", description="the assigned class")
    ]
    output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
    format_instructions = output_parser.get_format_instructions()
    prompt = f"""
    We have 3 classes Conversation (for example asking for assistance), Services and Document_Translation.
    Conversation: text consists of casual messages, summarization requests, general questions, affirmations, greetings,
    and similar. Requests for text translation, text summarisation or explicit text translation requests,
    questions about the meaning of words or sentences in a concrete language.
    Services: the text consists of explicit requests for rooms, hotels, eating services, cinema, tourism, and similar.
    Document_Translation: A translation of a document to a specific language is requested, and a summary is not requested.
    The length of the document is specified.
    Assign a class to the following sentence.
    {question}
    Try to understand the sentence as a Conversation one, if you can't, then assign one of the other classes.
    {format_instructions}
    """
    # Invoke Anthropic's Claude 3 Sonnet through the Bedrock runtime
    response = bedrock_runtime.invoke_model(
            modelId='anthropic.claude-3-sonnet-20240229-v1:0',
            body=json.dumps(
                {
                    "anthropic_version": "bedrock-2023-05-31",
                    "max_tokens": 20,
                    "temperature": 0,
                    "messages": [
                        {
                            "role": "user",
                            "content": [{"type": "text", "text": prompt}],
                        }
                    ],
                }
            ),
        )
    result_message = json.loads(response.get("body").read())
    texto = result_message['content'][0]['text']
    try:
        # Normalize quotes so the structured parser receives valid JSON
        output_dict = output_parser.parse(texto.replace("'", '"'))['class']
    except Exception:
        # Fall back to the most frequent class if parsing fails
        output_dict = 'Conversation'
    return output_dict

LLM-based classifier: Example augmented inference

We use RAG techniques to enhance the model’s response capabilities. Instead of relying solely on compressed definitions, we provide the model with a quasi-definition by extension. Specifically, we present the model with a diverse set of examples for each class, allowing it to learn the inherent characteristics and patterns that define each category. For instance, in addition to a concise definition of the Conversation class, the model is exposed to various conversational inputs, enabling it to identify common traits such as informal language, open-ended questions, and back-and-forth exchanges. This example-driven approach complements the initial descriptions provided, allowing the model to capture the nuances and diversity within each class. By combining concise definitions with representative examples, the RAG technique helps the model develop a more comprehensive understanding of the classes, enhancing its ability to accurately categorize new inputs based on their inherent nature and characteristics.

The following code provides examples in JSON format for RAG:

{
    "Conversation":[
       "Could you give me examples of how to solve it?",
       "cool but anything short and sweet",
       "..."
    ],
    "Services":[
       "make a review of my investments in the eBull.com platform",
       "I need a room in IDIADA",
       "schedule a meeting with",
       "..."
    ],
    "Document_Translation":[
       "Translate the file into Catalan",
       "Could you translate the document I added earlier into Swedish?",
       "Translate the Guía_Rápida.doc file into Romanian",
       "..."
    ]
}

The total number of examples provided for each class is as follows:

  • Conversation – 500 examples. This is the most common class, but only 500 samples are given to the model because the sheer volume of information could overwhelm the infrastructure (very high delays, throttling, dropped connections). This is a crucial point to note because it represents a significant bottleneck. Providing more examples to this approach could potentially improve performance, but the question remains: How many examples? Surely, a substantial amount would be required.
  • Services – 26 examples. This is the least common class, and in this case, all available training data has been used.
  • Document_Translation – 140 examples. Again, all available training data has been used for this class.

One of the key challenges with this approach is scalability. Although the model’s performance improves with more training examples, the computational demands quickly become overwhelming for our current infrastructure. The sheer volume of data required can lead to quota issues with Amazon Bedrock and unacceptably long response times. Rapid response times are essential for providing a satisfactory user experience, and this approach falls short in that regard.

In this case, we need to modify the code so that the prompt carries all the examples. The following code shows the changes applied to the first version of the classifier: the examples are included in JSON format under the “Here you have some examples” section.

# Imports and the bedrock_runtime client are the same as in the previous example
def classify_interaction(question, agent_examples):
    response_schemas = [
        ResponseSchema(name="class", description="the assigned class")
    ]
    output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
    format_instructions = output_parser.get_format_instructions()
    prompt = f"""
    We have 3 classes Conversation (for example asking for assistance), Services and Document_Translation.
    Conversation: text consists of casual messages, summarization requests, general questions, affirmations, greetings,
    and similar. Requests for text translation, text summarisation or explicit text translation requests,
    questions about the meaning of words or sentences in a concrete language.
    Services: the text consists of explicit requests for rooms, hotels, eating services, cinema, tourism, and similar.
    Document_Translation: A translation of a document to a specific language is requested, and a summary is not requested.
    The length of the document is specified.

    Here you have some examples:
    {agent_examples}

    Assign a class to the following sentence.
    {question}

    Try to understand the sentence as a Conversation one, if you can't, then assign one of the other classes.
    {format_instructions}
    """

    # Invoke Anthropic's Claude 3 Sonnet through the Bedrock runtime
    response = bedrock_runtime.invoke_model(
            modelId='anthropic.claude-3-sonnet-20240229-v1:0',
            body=json.dumps(
                {
                    "anthropic_version": "bedrock-2023-05-31",
                    "max_tokens": 50,
                    "messages": [
                        {
                            "role": "user",
                            "content": [{"type": "text", "text": prompt}],
                        }
                    ],
                }
            ),
        )

    result_message = json.loads(response.get("body").read())
    texto = result_message['content'][0]['text']
    # Normalize quotes so the structured parser receives valid JSON
    output_dict = output_parser.parse(texto.replace("'", '"'))['class']

    return output_dict

K-NN-based classifier: Amazon Titan Embeddings

In this case, we take a different approach by recognizing that despite the multitude of possible interactions, they often share similarities and repetitive patterns. Instead of treating each input as entirely unique, we can use a distance-based approach like k-nearest neighbors (k-NN) to assign a class based on the most similar examples surrounding the input. To make this work, we need to transform the textual interactions into a format that allows algebraic operations. This is where embeddings come into play. Embeddings are vector representations of text that capture semantic and contextual information. By converting interactions into these vector representations and measuring the distance between them in the embedding space, we can calculate their semantic similarity.

To accommodate this approach, we need to modify the code accordingly:

import boto3
import pandas as pd
from langchain.embeddings import BedrockEmbeddings
from sklearn.metrics import classification_report
from sklearn.neighbors import KNeighborsClassifier

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-west-2'
)

# Amazon Titan Text Embeddings G1 (1,536-dimensional vectors)
bedrock_embedding = BedrockEmbeddings(
    client=bedrock_runtime,
    model_id="amazon.titan-embed-text-v1",
)

df_train = pd.read_excel('coordinator_dataset/casos_coordinador_train.xlsx')
df_test = pd.read_excel('coordinator_dataset/casos_coordinador_test.xlsx')

# Embed the interactions and extract the labels
X_train_emb = bedrock_embedding.embed_documents(df_train['sample'].values.tolist())
X_test_emb = bedrock_embedding.embed_documents(df_test['sample'].values.tolist())
y_train = df_train['agent'].values.tolist()
y_test = df_test['agent'].values.tolist()

# k-NN classifier with k=3 on the embedding vectors
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X_train_emb, y_train)
y_pred = neigh.predict(X_test_emb)
print(classification_report(y_test, y_pred, target_names=['Conversation', 'Document_Translation', 'Services']))

We used the Amazon Titan Text Embeddings G1 model, which generates vectors of 1,536 dimensions. This model is trained to accept multiple languages while retaining the semantic meaning of the embedded phrases.

For the classifier, we employed a classic ML algorithm, k-NN, using the scikit-learn Python module. The key hyperparameter of this method is the number of neighbors (k), which we set to 3.

The following figure illustrates the F1 scores for each class plotted against the number of neighbors (k) used in the k-NN algorithm. As the graph shows, the optimal value for k is 3, which yields the highest F1 score for the most prevalent class, Document_Translation. Although it’s not the absolute highest score for the Services class, Document_Translation is significantly more common, making k=3 the best overall choice to maximize performance across all classes.
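The figure can be reproduced with a simple sweep over k. The following is a minimal sketch, reusing the embeddings and labels computed in the previous code block, that collects the per-class F1 score for each candidate value of k.

from sklearn.metrics import f1_score

# Per-class F1 score for k = 1..15, reusing X_train_emb, X_test_emb, y_train, y_test
classes = ['Conversation', 'Document_Translation', 'Services']
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_emb, y_train)
    scores = f1_score(y_test, knn.predict(X_test_emb), labels=classes, average=None)
    print(k, dict(zip(classes, scores.round(2))))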

Figure: F1 score per class as a function of the number of neighbors (k), Amazon Titan embeddings

K-NN-based classifier: Cohere’s multilingual embeddings model

In the previous section, we used the popular Amazon Titan Text Embeddings G1 model to generate text embeddings. However, other models might offer different advantages. In this section, we explore the use of Cohere’s multilingual model on Amazon Bedrock for generating embeddings. We chose the Cohere model due to its excellent capability in handling multiple languages without compromising the vectorization of phrases. As we will demonstrate, this model doesn’t introduce significant differences in the generated vectors compared to other models, making it more suitable for use in a multilingual environment like AIDA.

To use the Cohere model, we need to change the model_id:

import boto3
import pandas as pd
from langchain.embeddings import BedrockEmbeddings
from sklearn.metrics import classification_report
from sklearn.neighbors import KNeighborsClassifier

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-west-2'
)

# Cohere's multilingual embeddings model (1,024-dimensional vectors)
bedrock_embedding = BedrockEmbeddings(
    client=bedrock_runtime,
    model_id="cohere.embed-multilingual-v3",
)

df_train = pd.read_excel('coordinator_dataset/casos_coordinador_train.xlsx')
df_test = pd.read_excel('coordinator_dataset/casos_coordinador_test.xlsx')

# The Cohere model limits the input size, so interactions are truncated to 1,500 characters
data_train = [s[:1500] for s in df_train['sample']]
data_test = [s[:1500] for s in df_test['sample']]

y_train = df_train['agent'].values.tolist()
y_test = df_test['agent'].values.tolist()
X_test = df_test['sample'].values.tolist()

X_train_emb = bedrock_embedding.embed_documents(data_train)
X_test_emb = bedrock_embedding.embed_documents(data_test)

# k-NN classifier with k=11 on the embedding vectors
neigh = KNeighborsClassifier(n_neighbors=11)
neigh.fit(X_train_emb, y_train)
y_pred = neigh.predict(X_test_emb)
print(classification_report(y_test, y_pred, target_names=['Conversation', 'Document_Translation', 'Services']))

We use Cohere’s multilingual embeddings model to generate vectors with 1,024 dimensions. This model is trained to accept multiple languages and retain the semantic meaning of the embedded phrases.

For the classifier, we employ k-NN, using the scikit-learn Python module. In this case, we set the number of neighbors (k) to 11.

The following figure illustrates the F1 scores for each class plotted against the number of neighbors used. As depicted, the optimal point is k=11, achieving the highest value for Document_Translation and the second-highest for Services. In this instance, the trade-off between Document_Translation and Services is favorable.

Figure: F1 score per class as a function of the number of neighbors (k), Cohere’s multilingual embeddings

Amazon Titan Embeddings vs. Cohere’s multilingual embeddings model

In this section, we delve deeper into the embeddings generated by both models, aiming to understand their nature and consequently comprehend the results obtained. To achieve this, we have performed dimensionality reduction to visualize the vectors obtained in both cases in 2D.

Cohere’s multilingual embeddings model has a limitation on the size of the text it can vectorize, posing a significant constraint. Therefore, in the implementation showcased in the previous section, we truncated each interaction to its first 1,500 characters before embedding it.
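The post doesn’t state which dimensionality reduction technique was used, so the following sketch uses PCA as an illustrative, assumed choice to project the embeddings to 2D, reusing the embedding matrices and labels computed in the previous sections.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

def plot_embeddings_2d(embeddings, labels, title):
    # Project the high-dimensional embeddings onto their first two principal components
    coords = PCA(n_components=2).fit_transform(np.array(embeddings))
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        plt.scatter(coords[idx, 0], coords[idx, 1], s=8, label=cls)
    plt.title(title)
    plt.legend()
    plt.show()

# X_train_emb and y_train refer to the Titan or Cohere embeddings and labels computed earlier
plot_embeddings_2d(X_train_emb, y_train, '2D projection of the training embeddings')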

The following figure illustrates the vector spaces generated in each case.

Figure: 2D projection of the vector spaces generated by each embeddings model

As we can observe, the generated vector spaces are relatively similar, initially appearing to be analogous spaces with a rotation between one another. However, upon closer inspection, it becomes evident that the direction of maximum variance in the case of Cohere’s multilingual embeddings model is distinct (deducible from observing the relative position and shape of the different groups). This type of situation, where high class overlap is observed, presents an ideal case for applying algorithms such as k-NN.

As mentioned in the introduction, most human interactions with AI are very similar to each other within the same class. This would explain why k-NN-based models outperform LLM-based models.

SVM-based classifier: Amazon Titan Embeddings

In this scenario, it is likely that user interactions belonging to the three main categories (Conversation, Services, and Document_Translation) form distinct clusters or groups within the embedding space. Each category possesses particular linguistic and semantic characteristics that are reflected in the geometric structure of the embedding vectors. The previous visualization of the embeddings space displayed only a 2D transformation of this space. This doesn’t imply that the clusters couldn’t be highly separable in higher dimensions.

Classification algorithms like support vector machines (SVMs) are especially well-suited to use this implicit geometry of the data. SVMs seek to find the optimal hyperplane that separates the different groups or classes in the embedding space, maximizing the margin between them. This ability of SVMs to use the underlying geometric structure of the data makes them an intriguing option for this user interaction classification problem.

Furthermore, SVMs are a robust and efficient algorithm that can effectively handle high-dimensional datasets, such as text embeddings. This makes them particularly suitable for this scenario, where the embedding vectors of the user interactions are expected to have a high dimensionality.

The following code illustrates the implementation:

import boto3
import pandas as pd
from langchain.embeddings import BedrockEmbeddings
from sklearn import svm
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='eu-central-1'
)

# Amazon Titan Text Embeddings G1 (1,536-dimensional vectors)
bedrock_embedding = BedrockEmbeddings(
    client=bedrock_runtime,
    model_id="amazon.titan-embed-text-v1",
)

df_train = pd.read_excel('coordinator_dataset/casos_coordinador_train.xlsx')
df_test = pd.read_excel('coordinator_dataset/casos_coordinador_test.xlsx')

y_train = df_train['agent'].values.tolist()
y_test = df_test['agent'].values.tolist()
X_test = df_test['sample'].values.tolist()

X_train_emb = bedrock_embedding.embed_documents(df_train['sample'].values.tolist())
X_test_emb = bedrock_embedding.embed_documents(df_test['sample'].values.tolist())

# Grid search over the SVM hyperparameters, scored with the weighted multi-class F1
f1 = make_scorer(f1_score, average='weighted')
parameters = {'kernel': ('linear', 'rbf', 'poly', 'sigmoid'),
              'C': [1, 2, 4, 6, 8, 10],
              'class_weight': [None, 'balanced']}

svc = svm.SVC()
clf = GridSearchCV(svc, parameters, cv=10, n_jobs=-1, scoring=f1)
clf.fit(X_train_emb, y_train)

y_pred = clf.predict(X_test_emb)

We use Amazon Titan Text Embeddings G1. This model generates vectors of 1,536 dimensions, and is trained to accept several languages and to retain the semantic meaning of the phrases embedded.

To implement the classifier, we employed a classic ML algorithm, SVM, using the scikit-learn Python module. The SVM algorithm requires the tuning of several parameters to achieve optimal performance. To determine the best parameter values, we conducted a grid search with 10-fold cross-validation, using the F1 multi-class score as the evaluation metric. This systematic approach allowed us to identify the following set of parameters that yielded the highest performance for our classifier:

  • C – We set this parameter to 1. This parameter controls the trade-off between allowing training errors and forcing rigid margins. It acts as a regularization parameter. A higher value of C (for example, 10) indicates a higher penalty for misclassification errors. This results in a more complex model that tries to fit the training data more closely. A higher C value can be beneficial when the classes in the data are well separated, because it allows the algorithm to create a more intricate decision boundary to accurately classify the samples. On the other hand, a C value of 1 indicates a reasonable balance between fitting the training set and the model’s generalization ability. This value might be appropriate when the data has a simple structure, and a more flexible model isn’t necessary to capture the underlying relationships. In our case, the selected C value of 1 suggests that the data has a relatively simple structure, and a balanced model with moderate complexity is sufficient for accurate classification.
  • class_weight – We set this parameter to None. This parameter adjusts the weights of each class during the training process. Setting class_weight to balanced automatically adjusts the weights inversely proportional to the class frequencies in the input data. This is particularly useful when dealing with imbalanced datasets, where one class is significantly more prevalent than the others. In our case, the value of None for the class_weight parameter suggests that the minor classes don’t have much relevance or impact on the overall classification task. This choice implies that the implicit geometry or decision boundaries learned by the model might not be optimized for separating the different classes effectively.
  • kernel – We set this parameter to linear. This parameter specifies the type of kernel function to be used by the SVC algorithm. The linear kernel is a simple and efficient choice because it assumes that the decision boundary between classes can be represented by a linear hyperplane in the feature space. This value suggests that, in a higher-dimensional vector space, the categories could be linearly separated by a hyperplane.
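As a quick check, the parameter combination selected by the grid search can be inspected directly on the fitted GridSearchCV object; a minimal sketch follows, reusing the clf object and test data from the previous code block.

from sklearn.metrics import classification_report

# Best hyperparameters and cross-validated score found by the grid search
print(clf.best_params_)  # expected: {'C': 1, 'class_weight': None, 'kernel': 'linear'}
print(clf.best_score_)   # best weighted F1 obtained during cross-validation

# Held-out evaluation with the refitted best estimator
print(classification_report(y_test, clf.predict(X_test_emb)))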

SVM-based classifier: Cohere’s multilingual embeddings model

The implementation details of the classifier are presented in the following code:

import boto3
import pandas as pd
from langchain.embeddings import BedrockEmbeddings
from sklearn import svm
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-west-2'
)

# Cohere's multilingual embeddings model (1,024-dimensional vectors)
bedrock_embedding = BedrockEmbeddings(
    client=bedrock_runtime,
    model_id="cohere.embed-multilingual-v3",
)

df_train = pd.read_excel('coordinator_dataset/casos_coordinador_train.xlsx')
df_test = pd.read_excel('coordinator_dataset/casos_coordinador_test.xlsx')

# Truncate interactions to 1,500 characters to respect the model's input limit
data_train = [s[:1500] for s in df_train['sample']]
data_test = [s[:1500] for s in df_test['sample']]

y_train = df_train['agent'].values.tolist()
y_test = df_test['agent'].values.tolist()

X_train_emb = bedrock_embedding.embed_documents(data_train)
X_test_emb = bedrock_embedding.embed_documents(data_test)

# Grid search over the SVM hyperparameters, scored with the weighted multi-class F1
f1 = make_scorer(f1_score, average='weighted')

parameters = {'kernel': ('linear', 'rbf', 'poly', 'sigmoid'),
              'C': [1, 2, 4, 6, 8, 10],
              'class_weight': [None, 'balanced']}

svc = svm.SVC()
clf = GridSearchCV(svc, parameters, cv=10, n_jobs=-1, scoring=f1)
clf.fit(X_train_emb, y_train)

y_pred = clf.predict(X_test_emb)

In this case, we use Cohere’s multilingual embeddings model, which generates vectors of 1,024 dimensions. This model is trained to accept multiple languages and retain the semantic meaning of the embedded phrases.

For the classifier, we employ SVM, using the scikit-learn Python module. To obtain the optimal parameters, we performed a grid search with 10-fold cross-validation based on the multi-class F1 score, resulting in the following selected parameters (as detailed in the previous section):

  • C – We set this parameter to 1, which indicates a reasonable balance between fitting the training set and the model’s generalization ability. This setting suggests that the data has a simple structure and that a more flexible model might not be necessary to capture the underlying relationships.
  • class_weight – We set this parameter to None. A value of None suggests that the minor classes don’t have much relevance, which in turn implies that the implicit geometry might not be suitable for separating the different classes.
  • kernel – We set this parameter to linear. This value suggests that in a higher-dimensional vector space, the categories could be linearly separated by a hyperplane.

ANN-based classifier: Amazon Titan and Cohere’s multilingual embeddings model

Given the promising results obtained with SVMs, we decided to explore another geometry-based method by employing an Artificial Neural Network (ANN) approach.

In this case, we normalized the input vectors to take advantage of the benefits of normalization when using neural networks. Normalizing the input data is a crucial step when working with ANNs, because it can help improve the model’s performance during training. We applied min/max scaling for normalization.

The use of an ANN-based approach provides the ability to capture complex non-linear relationships in the data, which might not be easily modeled using traditional linear methods like SVMs. The combination of the geometric insights and the normalization of inputs can potentially lead to improved predictive accuracy compared to the previous SVM results.

This approach consists of the following parameters:

  • Model definition – We define a sequential deep learning model using the Keras library from TensorFlow.
  • Model architecture – The model consists of three densely connected layers. The first layer has 16 neurons and uses the ReLU activation function. The second layer has 8 neurons and employs the ReLU activation function. The third layer has 3 neurons and uses the softmax activation function.
  • Model compilation – We compile the model using the categorical_crossentropy loss function, the Adam optimizer with a learning rate of 0.01, and the categorical_accuracy metric. We incorporate an EarlyStopping callback to stop the training if the categorical_accuracy metric doesn’t improve for 25 epochs.
  • Model training – We train the model for a maximum of 500 epochs using the training set and validate it on the test set. The batch size is set to 64. The performance metric used is the maximum classification accuracy (categorical_accuracy) obtained during the training.

We applied the same methodology, but using the embeddings generated by Cohere’s multilingual embeddings model after being normalized through min/max scaling. In both cases, we employed the same preprocessing steps:

import boto3
import pandas as pd
from langchain.embeddings import BedrockEmbeddings
from sklearn.preprocessing import MinMaxScaler

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-west-2'
)

bedrock_embedding = BedrockEmbeddings(
    client=bedrock_runtime,
    model_id="cohere.embed-multilingual-v3",
)

df_train = pd.read_excel('coordinator_dataset/casos_coordinador_train.xlsx')
df_test = pd.read_excel('coordinator_dataset/casos_coordinador_test.xlsx')

# Truncate interactions to 1,500 characters to respect the Cohere model's input limit
df_train['sample'] = [s[:1500] for s in df_train['sample']]
df_test['sample'] = [s[:1500] for s in df_test['sample']]

X_train_emb = bedrock_embedding.embed_documents(df_train['sample'].values.tolist())
X_test_emb = bedrock_embedding.embed_documents(df_test['sample'].values.tolist())
y_train = df_train['agent'].values.tolist()

# One-hot encode the training labels and index-encode the test labels
y_train_ohe = [[int(y == 'Conversation'), int(y == 'Document_Translation'), int(y == 'Services')] for y in y_train]
y_test = df_test['agent'].values.tolist()
y_test = [['Conversation', 'Document_Translation', 'Services'].index(y) for y in y_test]
X_test = df_test['sample'].values.tolist()

# Min/max scaling of the embeddings, as described above
# (scikit-learn's MinMaxScaler is used here as an illustrative implementation choice)
scaler = MinMaxScaler().fit(X_train_emb)
X_train_emb_norm = scaler.transform(X_train_emb)
X_test_emb_norm = scaler.transform(X_test_emb)

To help avoid ordinal assumptions, we employed a one-hot encoding representation for the output of the network. One-hot encoding doesn’t make any assumptions about the inherent order or hierarchy among the categories. This is particularly useful when the categorical variable doesn’t have a clear ordinal relationship, because the model can learn the relationships without being biased by any assumed ordering.

The following code illustrates the implementation:

import threading

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

def train_model(X, y, n_hebras=10, reps=30, train_size=0.7, tipo_optimizacion="low"):
    # Train the model `reps` times across `n_hebras` threads and keep the best run
    reps_por_hebra = int(reps / n_hebras)
    hebras = [0] * n_hebras
    results = [0] * reps
    models = [0] * reps

    for i in range(len(hebras)):
        hebras[i] = threading.Thread(target=eval_model_rep_times,
            args=(X, y, train_size, reps_por_hebra, i * reps_por_hebra, models, results))
        hebras[i].start()

    for i in range(len(hebras)):
        hebras[i].join()

    # Keep the model with the lowest or highest metric, depending on tipo_optimizacion
    if tipo_optimizacion == "low":
        result = models[np.argmin(results)], min(results)
    else:
        result = models[np.argmax(results)], max(results)
    return result

def eval_model_rep_times(X, y, train_size, reps, index, models, results):
    # Repeatedly resample a train/validation split and fit a fresh model on each
    for rep in range(reps):
        X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=train_size)
        model, metric = create_and_fit_model(X_train, y_train, X_test, y_test)
        models[index + rep] = model
        results[index + rep] = metric

def create_and_fit_model(X_train, y_train, X_test, y_test):
    ### DEFINITION GOES HERE ###
    model = Sequential()
    model.add(Dense(16, input_shape=(len(X_train[0]),), activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.01), metrics=['categorical_accuracy'])
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor="categorical_accuracy", patience=25, mode='max')
    ### DEFINITION GOES HERE ###

    ### TRAINING GOES HERE ###
    history = model.fit(np.array(X_train),
              np.array(y_train),
              epochs=500,
              validation_data=(np.array(X_test), np.array(y_test)),
              batch_size=64,
              callbacks=[early_stopping],
              verbose=0)
    ### TRAINING GOES HERE ###

    metrica = max(history.history['categorical_accuracy'])

    # ALWAYS RETURN THE MODEL
    return model, metrica

# Keep the run with the highest categorical accuracy
model, metric = train_model(X_train_emb_norm, y_train_ohe, 5, 20, tipo_optimizacion="high")
y_pred = [est.argmax() for est in model.predict(X_test_emb_norm)]

Results

We conducted a comparative analysis using the previously presented code and data. The models were assessed based on their F1 scores for the conversation, services, and document translation tasks, as well as their runtimes. The following table summarizes our results.
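As a reference, the per-class F1 scores and inference runtimes in the following table can be gathered with a small helper like this sketch. The timing approach (wall-clock time around the predict call) is an assumption, because the post doesn’t specify the exact measurement setup.

import time

from sklearn.metrics import f1_score

classes = ['Conversation', 'Services', 'Document_Translation']

def evaluate(predict_fn, X, y_true):
    # Measure wall-clock inference time and per-class F1 for any fitted classifier
    start = time.perf_counter()
    y_pred = predict_fn(X)
    runtime = time.perf_counter() - start
    scores = f1_score(y_true, y_pred, labels=classes, average=None)
    return dict(zip(classes, scores.round(2))), round(runtime, 2)

# Example with the fitted k-NN model, the Titan test embeddings, and the string labels
# from the k-NN section
print(evaluate(neigh.predict, X_test_emb, y_test))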

Model | Conversation F1 | Services F1 | Document_Translation F1 | Runtime (seconds)
LLM | 0.81 | 0.22 | 0.46 | 1.2
LLM with examples | 0.86 | 0.13 | 0.68 | 18
k-NN – Amazon Titan Embeddings | 0.98 | 0.57 | 0.88 | 0.35
k-NN – Cohere Embeddings | 0.96 | 0.72 | 0.72 | 0.35
SVM – Amazon Titan Embeddings | 0.98 | 0.69 | 0.82 | 0.3
SVM – Cohere Embeddings | 0.99 | 0.80 | 0.93 | 0.3
ANN – Amazon Titan Embeddings | 0.98 | 0.60 | 0.87 | 0.15
ANN – Cohere Embeddings | 0.99 | 0.77 | 0.96 | 0.15

As illustrated in the table, the SVM and ANN models using Cohere’s multilingual embeddings model demonstrated the strongest overall performance. The SVM with Cohere’s multilingual embeddings model achieved the highest F1 scores in two out of three tasks, reaching 0.99 for Conversation, 0.80 for Services, and 0.93 for Document_Translation. Similarly, the ANN with Cohere’s multilingual embeddings model also performed exceptionally well, with F1 scores of 0.99, 0.77, and 0.96 for the respective tasks.

In contrast, the LLM exhibited relatively lower F1 scores, particularly for the services (0.22) and document translation (0.46) tasks. However, the performance of the LLM improved when provided with examples, with the F1 score for document translation increasing from 0.46 to 0.68.

Regarding runtime, the k-NN, SVM, and ANN models demonstrated significantly faster inference times compared to the LLM. The k-NN and SVM models with both Amazon Titan and Cohere’s multilingual embeddings model had runtimes of approximately 0.3–0.35 seconds. The ANN models were even faster, with runtimes of approximately 0.15 seconds. In contrast, the LLM required approximately 1.2 seconds for inference, and the LLM with examples took around 18 seconds.

These results suggest that the SVM and ANN models using Cohere’s multilingual embeddings model offer the best balance of performance and efficiency for the given tasks. The superior F1 scores of these models, coupled with their faster runtimes, make them promising candidates for application. The potential benefits of providing examples to the LLM model are also noteworthy, because this approach can help improve its performance on specific tasks.

Conclusion

The optimization of AIDA, Applus IDIADA’s intelligent chatbot powered by Amazon Bedrock, has been a resounding success. By developing dedicated pipelines to handle different types of user interactions—from general conversations to specialized service requests and document translations—AIDA has significantly improved its efficiency, accuracy, and overall user experience. The innovative use of LLMs, embeddings, and advanced classification algorithms has allowed AIDA to adapt to the evolving needs of IDIADA’s workforce, providing a versatile and reliable virtual assistant. AIDA now handles over 1,000 interactions per day, with a 95% accuracy rate in routing requests to the appropriate pipeline and driving a 20% increase in team productivity.

Looking ahead, IDIADA plans to offer AIDA as an integrated product for customer environments, further expanding the reach and impact of this transformative technology.

Amazon Bedrock offers a comprehensive approach to security, compliance, and responsible AI development that empowers IDIADA and other customers to harness the full potential of generative AI without compromising on safety and trust. As this advanced technology continues to rapidly evolve, Amazon Bedrock provides the transparent framework needed to build innovative applications that inspire confidence.

Unlock new growth opportunities by creating custom, secure AI models tailored to your organization’s unique needs. Take the first step in your generative AI transformation—connect with an AWS expert today to begin your journey.


About the Authors

Xavier Vizcaino is the head of the DataLab, in the Digital Solutions department of Applus IDIADA. DataLab is the unit focused on the development of solutions for generating value from the exploitation of data through artificial intelligence.

Diego Martín Montoro is an AI Expert and Machine Learning Engineer at Applus+ Idiada Datalab. With a Computer Science degree and a Master’s in Data Science, Diego has built his career in the field of artificial intelligence and machine learning. His experience includes roles as a Machine Learning Engineer at companies like AppliedIT and Applus+ IDIADA, where he has worked on developing advanced AI systems and anomaly detection solutions.

Jordi Sánchez Ferrer is the current Product Owner of the Datalab at Applus+ Idiada. A Computer Engineer with a Master’s degree in Data Science, Jordi’s trajectory includes roles as a Business Intelligence developer, Machine Learning engineer, and lead developer in Datalab. In his current role, Jordi combines his technical expertise with product management skills, leading strategic initiatives that align data science and AI projects with business objectives at Applus+ Idiada.

Daniel Colls is a professional with more than 25 years of experience who has lived through the digital transformation and the transition from the on-premises model to the cloud from different perspectives in the IT sector. For the past 3 years, as a Solutions Architect at AWS, he has made this experience available to his customers, helping them successfully implement or move their workloads to the cloud.

Read More

Accelerate IaC troubleshooting with Amazon Bedrock Agents

Troubleshooting infrastructure as code (IaC) errors often consumes valuable time and resources. Developers can spend multiple cycles searching for solutions across forums, troubleshooting repetitive issues, or trying to identify the root cause. These delays can lead to missed security errors or compliance violations, especially in complex, multi-account environments.

This post demonstrates how you can use Amazon Bedrock Agents to create an intelligent solution to streamline the resolution of Terraform and AWS CloudFormation code issues through context-aware troubleshooting. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock Agents is a fully managed service that helps developers create AI agents that can break down complex tasks into steps and execute them using FMs and APIs to accomplish specific business objectives.

Our solution uses Amazon Bedrock Agents to analyze error messages and code context, generating detailed troubleshooting steps for IaC errors. In organizations with multi-account AWS environments, teams often maintain a centralized AWS environment for developers to deploy applications. This setup makes sure that AWS infrastructure deployments using IaC align with organizational security and compliance measures. For specific IaC errors related to these compliance measures, such as those involving service control policies (SCPs) or resource-based policies, our solution intelligently directs developers to contact appropriate teams like Security or Enablement. This targeted guidance maintains security protocols and makes sure that sensitive issues are handled by the right experts. The solution is flexible and can be adapted for similar use cases beyond these examples.

Although we focus on Terraform Cloud workspaces in this example, the same principles apply to GitLab CI/CD pipelines or other continuous integration and delivery (CI/CD) approaches executing IaC code. By automating initial error analysis and providing targeted solutions or guidance, you can improve operational efficiency and focus on solving complex infrastructure challenges within your organization’s compliance framework.

Solution overview

Before we dive into the deployment process, let’s walk through the key steps of the architecture as illustrated in the following figure.

Figure: Architecture of the Terraform troubleshooting solution with Amazon Bedrock Agents

The workflow for the Terraform solution is as follows:

  1. Initial input through the Amazon Bedrock Agents chat console – The user begins by entering details about their Terraform error into the chat console for Amazon Bedrock Agents. This typically includes the Terraform Cloud workspace URL where the error occurred, and optionally, a Git repository URL and branch name if additional context is needed.
  2. Error retrieval and context gathering – The Amazon Bedrock agent forwards these details to an action group that invokes the first AWS Lambda function (see the following Lambda function code). This function invokes another Lambda function (see the following Lambda function code) which retrieves the latest error message from the specified Terraform Cloud workspace. If a Git repository URL is provided, it also retrieves relevant Terraform files from the repository. This contextual information is then sent back to the first Lambda function.
  3. Error analysis and response generation – The first Lambda function then constructs a detailed prompt that includes the error message, repository files (if available), and specific use case instructions. It then uses the Amazon Bedrock model to analyze the error and generate either troubleshooting steps or guidance to contact specific teams.
  4. Interaction and user guidance – The agent displays the generated response to the user. For most Terraform errors, this includes detailed troubleshooting steps. For specific cases related to organizational policies (for example, service control policies or resource-based policies), the response directs the user to contact the appropriate team, such as Security or Enablement.
  5. Continuous improvement – The solution can be continually updated with new specific use cases and organizational guidelines, making sure that the troubleshooting advice stays current with the organization’s evolving infrastructure and compliance requirements. For example:
    1. SCP or IAM policy violations – Guides developers when they encounter permission issues due to SCPs or strict AWS Identity and Access Management (IAM) boundaries, offering alternatives or escalation paths.
    2. VPC and networking restrictions – Flags non-compliant virtual private cloud (VPC) or subnet configurations (such as public subnets) and suggests security-compliant adjustments.
    3. Encryption requirements – Detects missing or incorrect encryption for Amazon Simple Storage Service (Amazon S3) or Amazon Elastic Block Store (Amazon EBS) resources and recommends the appropriate configurations to align with compliance standards.

The following diagram illustrates the step-by-step process of how the solution works.

Figure: Step-by-step flow diagram of the solution

This solution streamlines the process of resolving Terraform errors, providing immediate, context-aware guidance to developers while making sure that sensitive or complex issues are directed to the appropriate teams. By using the capabilities of Amazon Bedrock Agents, it offers a scalable and intelligent approach to managing IaC challenges in large, multi-account AWS environments.

Prerequisites

To implement the solution, you need the following:

Create the Amazon Bedrock agent

To create and configure the Amazon Bedrock agent, complete the following steps:

  1. On the Amazon Bedrock console, choose Agents in the navigation pane.
  2. Choose Create agent.
  3. Provide agent details, including agent name and description (optional).
  4. Grant the agent permissions to AWS services through the IAM service role. This gives your agent access to required services, such as Lambda.
  5. Select an FM from Amazon Bedrock (such as Anthropic’s Claude 3 Sonnet).
  6. For troubleshooting Terraform errors through Amazon Bedrock Agents, attach the following instruction to the agent. This instruction makes sure that the agent gathers the required input from the user and executes the action group to provide detailed troubleshooting steps.

“You are a terraform code error specialist. Greet the user and ask for terraform workspace url, branch name, code repository url. Once received, trigger troubleshooting action group. Provide the troubleshooting steps to the user.”

Configure the Lambda function for the action group

After you configure the initial agent and add the preceding instruction to the agent, you need to create two Lambda functions:

  • The first Lambda function will be added to the action group, which is invoked by the Amazon Bedrock agent, and will subsequently trigger the second Lambda function using the invoke method. Refer to the Lambda function code for more details. Make sure the LAMBDA_2_FUNCTION_NAME environment variable is set.
  • The second Lambda function will handle fetching the Terraform workspace error and the associated Terraform code from GitLab. Refer to the Lambda function code. Make sure that the TERRAFORM_API_URL, TERRAFORM_SECRET_NAME, and VCS_SECRET_NAME environment variables are set.

After the Terraform workspace error and code details are retrieved, these details will be passed back to the first Lambda function, which will use the Amazon Bedrock API with an FM to generate and provide the appropriate troubleshooting steps based on the error and code information.
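The exact Lambda implementations are linked from the original post rather than reproduced here. The following is a minimal, illustrative sketch (not the authors’ code) of how the first function could orchestrate this flow; the payload fields returned by the second function, the prompt wording, and the simplified Bedrock Agents response shape are assumptions to be adapted to your setup, while the LAMBDA_2_FUNCTION_NAME environment variable matches the configuration described above.

import json
import os

import boto3

lambda_client = boto3.client('lambda')
bedrock_runtime = boto3.client('bedrock-runtime')

def lambda_handler(event, context):
    # Parameters elicited by the Amazon Bedrock agent (names match the action group definition)
    params = {p['name']: p['value'] for p in event.get('parameters', [])}

    # Ask the second Lambda function for the Terraform error and repository context
    context_response = lambda_client.invoke(
        FunctionName=os.environ['LAMBDA_2_FUNCTION_NAME'],
        Payload=json.dumps(params),
    )
    tf_context = json.loads(context_response['Payload'].read())  # assumed keys below

    # Single-shot prompt: error message plus repository contents in one request
    prompt = (
        "Analyze this Terraform error and provide troubleshooting steps. "
        "If the error is caused by an SCP or resource-based policy, tell the user "
        "to contact the Security or Enablement team.\n"
        f"Error: {tf_context.get('error_message')}\n"
        f"Repository files: {tf_context.get('repo_files')}"
    )
    model_response = bedrock_runtime.invoke_model(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
        }),
    )
    answer = json.loads(model_response['body'].read())['content'][0]['text']

    # Return the answer to the agent (response structure simplified; see the
    # Amazon Bedrock Agents documentation for the full action group format)
    return {
        'messageVersion': '1.0',
        'response': {
            'actionGroup': event.get('actionGroup'),
            'function': event.get('function'),
            'functionResponse': {'responseBody': {'TEXT': {'body': answer}}},
        },
    }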

Add the action group to the Amazon Bedrock agent

Complete the following steps to add the action group to the Amazon Bedrock agent:

  1. Add an action group to the Amazon Bedrock agent.
  2. Assign a descriptive name (for example, troubleshooting) to the action group and provide a description. This helps clarify the purpose of the action group within the workflow.
  3. For Action group type, select Define with function details.

For more details, see Define function details for your agent’s action groups in Amazon Bedrock.

  4. For Action group invocation, choose the first Lambda function that you created previously.

This function runs the business logic required when an action is invoked. Make sure to choose the correct version of the first Lambda function. For more details on how to configure Lambda functions for action groups, see Configure Lambda functions to send information that an Amazon Bedrock agent elicits from the user.

  5. For Action group function 1, provide a name and description.
  6. Add the following parameters.

Name | Description | Type | Required
workspace_url | Terraform workspace URL | string | True
repo_url | Code repository URL | string | True
branch_name | Code repository branch name | string | True

Test the solution

The following example is of a Terraform error due to a service control policy. The troubleshooting steps provided are aligned to address those specific constraints. The action group triggers the Lambda function, which follows structured single-shot prompting by passing the complete context—such as the error message and repository contents—in a single input to the Amazon Bedrock model to generate precise troubleshooting steps.

Example 1: The following screenshot shows an example of a Terraform error caused by an SCP limitation managed by the security team.

Screenshot: Terraform error caused by an SCP limitation

The following screenshot shows an example of the user interaction with Amazon Bedrock Agents and the troubleshooting steps provided.

Screenshot: Agent conversation with SCP-related troubleshooting guidance

Example 2: The following screenshot shows an example of a Terraform error due to a missing variable value.

Screenshot: Terraform error due to a missing variable value

The following screenshot shows an example of the user interaction with Amazon Bedrock Agents and the troubleshooting steps provided.

Screenshot: Agent conversation with troubleshooting steps for the missing variable

Clean up

The services used in this demo can incur costs. Complete the following steps to clean up your resources:

  1. Delete the Lambda functions if they are no longer required.
  2. Delete the action group and Amazon Bedrock agent you created.

Conclusion

IaC offers flexibility for managing cloud environments, but troubleshooting code errors can be time-consuming, especially in environments with strict organizational guardrails. This post demonstrated how Amazon Bedrock Agents, combined with action groups and generative AI models, streamlines and accelerates the resolution of Terraform errors while maintaining compliance with environment security and operational guidelines.

Using the capabilities of Amazon Bedrock Agents, developers can receive context-aware troubleshooting steps tailored to environment-related issues such as SCP or IAM violations, VPC restrictions, and encryption policies. The solution provides specific guidance based on the error’s context and directs users to the appropriate teams for issues that require further escalation. This reduces the time spent on IaC errors, improves developer productivity, and maintains organizational compliance.

Are you ready to streamline your cloud deployment process with the generative AI of Amazon Bedrock? Start by exploring the Amazon Bedrock User Guide to see how it can facilitate your organization’s transition to the cloud. For specialized assistance, consider engaging with AWS Professional Services to maximize the efficiency and benefits of using Amazon Bedrock.


About the Authors

Akhil Raj Yallamelli is a Cloud Infrastructure Architect at AWS, specializing in architecting cloud infrastructure solutions for enhanced data security and cost efficiency. He is experienced in integrating technical solutions with business strategies to create scalable, reliable, and secure cloud environments. Akhil enjoys developing solutions focusing on customer business outcomes, incorporating generative AI (Gen AI) technologies to drive innovation and cloud enablement. He holds an MS degree in Computer Science. Outside of his professional work, Akhil enjoys watching and playing sports.

Ebbey Thomas is a Senior Generative AI Specialist Solutions Architect at AWS. He designs and implements generative AI solutions that address specific customer business problems. He is recognized for simplifying complexity and delivering measurable business outcomes for clients. Ebbey holds a BS in Computer Engineering and an MS in Information Systems from Syracuse University.

Read More

Derive generative AI powered insights from Alation Cloud Services using Amazon Q Business Custom Connector

Derive generative AI powered insights from Alation Cloud Services using Amazon Q Business Custom Connector

This blog post is co-written with Gene Arnold from Alation.

To build a generative AI-based conversational application integrated with relevant data sources, an enterprise needs to invest time, money, and people. First, you would need to build connectors to the data sources. Next, you need to index this data to make it available for a Retrieval Augmented Generation (RAG) approach, where relevant passages are delivered with high accuracy to a large language model (LLM). To do this, you need to select an index that provides the capabilities to index the content for semantic and vector search, build the infrastructure to retrieve data, rank the answers, and build a feature-rich web application. Additionally, you might need to hire and staff a large team to build, maintain, and manage such a system.

Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take actions using the data and expertise found in your company’s information repositories, code, and enterprise systems. To do this, Amazon Q Business provides out-of-the-box native data source connectors that can index content into a built-in retriever and uses an LLM to provide accurate, well-written answers. A data source connector is a component of Amazon Q Business that helps to integrate and synchronize data from multiple repositories into one index. Amazon Q Business offers multiple prebuilt connectors to a large number of data sources, including ServiceNow, Atlassian Confluence, Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, and many more. For a full list of supported data source connectors, see Amazon Q Business connectors.

However, many organizations store relevant information in the form of unstructured data on company intranets or within file systems on corporate networks that are inaccessible to Amazon Q Business using its native data source connectors. You can now use the custom data source connector within Amazon Q Business to upload content to your index from a wider range of data sources.

Using an Amazon Q Business custom data source connector, you can gain insights into your organization’s third party applications with the integration of generative AI and natural language processing. This post shows how to configure an Amazon Q Business custom connector and derive insights by creating a generative AI-powered conversation experience on AWS using Amazon Q Business while using access control lists (ACLs) to restrict access to documents based on user permissions.

Alation is a data intelligence company serving more than 600 global enterprises, including 40% of the Fortune 100. Customers rely on Alation to realize the value of their data and AI initiatives. Headquartered in Redwood City, California, Alation is an AWS Specialization Partner and AWS Marketplace Seller with Data and Analytics Competency. Organizations trust Alation’s platform for self-service analytics, cloud transformation, data governance, and AI-ready data, fostering innovation at scale. In this post, we will showcase a sample of how Alation’s business policies can be integrated with an Amazon Q Business application using a custom data source connector.

Finding accurate answers from content in custom data sources using Amazon Q Business

After you integrate Amazon Q Business with data sources such as Alation, users can ask questions based on the document descriptions. For example:

  1. What are the top sections of the HR benefits policies?
  2. Who are the data stewards for my proprietary database sources?

Overview of a custom connector

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. Amazon Q Business offers multiple pre-built data source connectors that can connect to your data sources and help you create your generative AI solution with minimal configuration. However, if you have valuable data residing in repositories for which those pre-built connectors cannot be used, you can use a custom connector.

When you connect Amazon Q Business to a data source and initiate the data synchronization process, Amazon Q Business crawls and adds documents from the data source to its index.

You would typically use an Amazon Q Business custom connector when you have a repository that Amazon Q Business doesn’t yet provide a data source connector for. Amazon Q Business only provides metric information that you can use to monitor your data source sync jobs. You must create and run the crawler that determines the documents your data source indexes. A simple architectural representation of the steps involved is shown in the following figure.

Architecture Diagram

Solution overview

The solution shown here, which integrates Alation’s business policies, is for demonstration purposes only. We recommend running similar scripts only on your own data sources after consulting with the team who manages them, or be sure to follow the terms of service for the sources that you’re trying to fetch data from. The steps involved for other custom data sources are very similar, except for the part where we connect to Alation and fetch data from it. To crawl and index contents in Alation, you configure an Amazon Q Business custom connector as a data source in your Amazon Q Business application.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Configure your Alation connection

In your Alation cloud account, create an OAuth2 client application that can be consumed from an Amazon Q Business application.

  1. In Alation, sign in as a user with administrator privileges, navigate to the settings page, and choose Authentication (https://[[your-domain]].alationcloud.com/admin/auth/).

Alation Admin Settings

  1. In the OAuth Client Applications section, choose Add.

Alation OAuth Client Applications

  1. Enter an easily identifiable application name, and choose Save.

Create OAuth Client Application

  1. Take note of the OAuth client application data—the Client ID and the Client Secret—created and choose Close.

OAuth Client ID

  1. As a security best practice, we recommend storing the client application data in AWS Secrets Manager. On the AWS Management Console, navigate to Secrets Manager and add a new secret, entering the Client_Id and Client_Secret values copied from the previous step (a scripted alternative is sketched after these console steps).

AWS Secrets Manager - Create Secret1

  1. Provide a name and description for the secret and choose Next.

AWS Secrets Manager - Create Secret2

  1. Leave the defaults and choose Next.

AWS Secrets Manager - Create Secret3

  1. On the last page, choose Store.

AWS Secrets Manager - Create Secret4
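If you prefer to script this step instead of using the console, the following is a minimal sketch using boto3. The secret name alation_test and the Client_Id and Client_Secret key names are assumptions chosen to match the values read back later in this walkthrough.

    import json
    
    import boto3
    
    secrets_manager_client = boto3.client("secretsmanager")
    
    # Store the Alation OAuth client credentials under the key names the sync code expects.
    secrets_manager_client.create_secret(
        Name="alation_test",
        Description="Alation OAuth client application credentials",
        SecretString=json.dumps({
            "Client_Id": "<your-client-id>",
            "Client_Secret": "<your-client-secret>",
        }),
    )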

Create sample Alation policies

In our example, you create three sets of Alation policies for a fictional organization named Unicorn Rentals. Grouped as Workplace, HR, and Regulatory, each policy contains a rough two-page summary of crucial organizational items of interest. You can find details on how to create policies in the Alation documentation.

Alation Create Policies

On the Amazon Q Business side, let’s assume that we want to ensure that the following access policies are enforced. Users and access are set up via code illustrated in later sections.

# | First name | Last name | Policies authorized for access
1 | Alejandro | Rosalez | Workplace, HR, and Regulatory
2 | Sofia | Martinez | Workplace and HR
3 | Diego | Ramirez | Workplace and Regulatory

Create an Amazon Q Business application

  1. Sign in to the AWS Management Console and navigate to Amazon Q Business from the search bar at the top.
  2. On the Amazon Q Business console, choose Get Started.

Amazon Q Business Console

  1. On the Applications page, choose Create application.

Amazon Q Business List Applications

  1. In the first step of the Create application wizard, enter the default values. Additionally, you need to choose a list of users who require access to the Amazon Q Business application by including them through the IAM Identity Center settings.

Q Business Create Application1

  1. On the access management settings page, create and add users via AWS IAM Identity Center.

Q Business Create Application2

  1. Once all users are added, choose Create.

Q Business Create Application3

  1. After the application is created, take note of the Application ID value from the landing page.

Q Business Create Application4

  1. Next, choose an index type for the Amazon Q Business application. Choose the native retriever option.

Q Business Create Index1

Q Business Create Index2

  1. After the index is created, verify that its status has changed to Active, then take note of the Index ID.

Q Business Create Index3

  1. The next step is to add the custom data source.

Q Business Add Data Source1

  1. Search for Custom data source and choose the plus sign next to it.

Q Business Add Data Source2

  1. Provide a name and description for the custom data source.

Q Business Add Data Source2

  1. Once done, choose Add data source.

Q Business Add Data Source4

  1. After the data source is added and its status is Active, take note of the Data source ID (you can also look up these IDs programmatically, as sketched after these steps).

Q Business Add Data Source4
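If you would rather look these values up programmatically than copy them from the console, the following is a minimal boto3 sketch. It assumes the application, index, and data source created above are the first ones returned by the qbusiness list APIs.

    import boto3
    
    qbusiness_client = boto3.client("qbusiness")
    
    # Look up the application, index, and data source IDs created in the previous steps.
    application_id = qbusiness_client.list_applications()["applications"][0]["applicationId"]
    index_id = qbusiness_client.list_indices(applicationId=application_id)["indices"][0]["indexId"]
    data_source_id = qbusiness_client.list_data_sources(
        applicationId=application_id, indexId=index_id
    )["dataSources"][0]["dataSourceId"]
    
    print(application_id, index_id, data_source_id)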

Load policy data from Alation to Amazon Q Business using the custom connector

Now let’s load the Alation data into Amazon Q Business using the correct access permissions. The code examples that follow are also available on the accompanying GitHub code repository.

  1. With the connector ready, move over to the SageMaker Studio notebook and perform data synchronization operations by invoking Amazon Q Business APIs.
  2. To start, retrieve the Alation OAuth client application credentials stored in Secrets Manager.
    import json
    
    import boto3
    from botocore.exceptions import ClientError
    
    secrets_manager_client = boto3.client('secretsmanager')
    secret_name = "alation_test"
    
    try:
        get_secret_value_response = secrets_manager_client.get_secret_value(
            SecretId=secret_name
        )
        # Parse the JSON secret string rather than calling eval() on it
        secret = json.loads(get_secret_value_response['SecretString'])
    except ClientError as e:
        raise e

  1. Next, initiate the connection using the OAuth client application credentials from Alation.
    import requests
    from requests.auth import HTTPBasicAuth
    
    base_url = "https://[[your-domain]].alationcloud.com"
    token_url = "/oauth/v2/token/"
    introspect_url = "/oauth/v2/introspect/"
    jwks_url = "/oauth/v2/.well-known/jwks.json/"
    
    # Request an access token using the client credentials grant.
    api_url = base_url + token_url
    data = {
            "grant_type": "client_credentials",
           }
    client_id = secret['Client_Id']
    client_secret = secret['Client_Secret']
    
    auth = HTTPBasicAuth(username=client_id, password=client_secret)
    response = requests.post(url=api_url, data=data, auth=auth)
    print(response.json())
    
    # Introspect the token to confirm it is valid before using it against the API.
    access_token = response.json().get('access_token','')
    api_url = base_url + introspect_url + "?verify_token=true"
    data = {
            "token": access_token,
           }
    response = requests.post(url=api_url, data=data, auth=auth)
    

  1. You then configure policy-type-level user access. This section can be customized based on how user access information is stored in your data sources. Here, we assume preset access based on the users’ email IDs.
    primary_principal_list = []
    workplace_policy_principals = []
    hr_policy_principals = []
    regulatory_policy_principals = []
    
    principal_user_email_ids = ['alejandro_rosalez@example.com', 'sofia_martinez@example.com', 'diego_ramirez@example.com']
    
    workplace_policy_email_ids = ['alejandro_rosalez@example.com', 'sofia_martinez@example.com', 'diego_ramirez@example.com']
    hr_policy_email_ids = ['alejandro_rosalez@example.com', 'sofia_martinez@example.com']
    regulatory_policy_email_ids = ['alejandro_rosalez@example.com', 'diego_ramirez@example.com']
    
    for workplace_policy_member in workplace_policy_email_ids:
        workplace_policy_members_dict = { 'user': { 'id': workplace_policy_member, 'access': 'ALLOW', 'membershipType': 'DATASOURCE' }}
        workplace_policy_principals.append(workplace_policy_members_dict)
        if workplace_policy_member not in primary_principal_list:
            primary_principal_list.append(workplace_policy_member)
    
    for hr_policy_member in hr_policy_email_ids:
        hr_policy_members_dict = { 'user': { 'id': hr_policy_member, 'access': 'ALLOW', 'membershipType': 'DATASOURCE' }}
        hr_policy_principals.append(hr_policy_members_dict)
        if hr_policy_member not in primary_principal_list:
            primary_principal_list.append(hr_policy_member)
            
    for regulatory_policy_member in regulatory_policy_email_ids:
        regulatory_policy_members_dict = { 'user': { 'id': regulatory_policy_member, 'access': 'ALLOW', 'membershipType': 'DATASOURCE' }}
        regulatory_policy_principals.append(regulatory_policy_members_dict)
        if regulatory_policy_member not in primary_principal_list:
            primary_principal_list.append(regulatory_policy_member)

  1. You then pull individual policy details from Alation. This step can be repeated for all three policy types: Workplace, HR, and Regulatory.
    import json
    import re
    
    import requests
    
    # Helper assumed by this walkthrough: strips HTML tags from policy titles and descriptions.
    def cleanhtml(raw_html):
        return re.sub(r'<[^>]+>', '', raw_html)
    
    url = "https://[[your-domain]].alationcloud.com/integration/v1/business_policies/?limit=200&skip=0&search=[[Workplace/HR/Regulatory]]&deleted=false"
    
    headers = {
        "accept": "application/json",
        "TOKEN": access_token
    }
    
    response = requests.get(url, headers=headers)
    policy_data = ""
    
    # Concatenate the title and description of each policy into one text blob.
    for policy in json.loads(response.text):
        if policy["title"] is not None:
            policy_title = cleanhtml(policy["title"])
        else:
            policy_title = "None"
        if policy["description"] is not None:
            policy_description = cleanhtml(policy["description"])
        else:
            policy_description = "None"
        temp_data = policy_title + ":\n" + policy_description + "\n\n"
        policy_data += temp_data
    

  1. The next step is to define the Amazon Q Business application, index, and data source information that you created in the previous steps.
    qbusiness_client = boto3.client('qbusiness')
    application_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    index_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    data_source_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

  1. Now you explicitly create the users in Amazon Q Business. Individual user access to different policy type data sets is configured later.
    for principal in primary_principal_list:
        create_user_response = qbusiness_client.create_user(
            applicationId=application_id,
            userId=principal,
            userAliases=[
                {
                    'indexId': index_id,
                    'dataSourceId': data_source_id,
                    'userId': principal
                },
            ],
        )
    
    for principal in primary_principal_list:
        get_user_response = qbusiness_client.get_user(
            applicationId=application_id,
            userId=principal
        )
        for user_alias in get_user_response['userAliases']:
            if "dataSourceId" in user_alias:
                print(user_alias['userId'])

  1. For each policy type data set (Workplace, HR, and Regulatory), we execute the following three steps.
    1. Start an Amazon Q Business data source sync job.
      start_data_source_sync_job_response = qbusiness_client.start_data_source_sync_job(
          dataSourceId = data_source_id,
          indexId = index_id,
          applicationId = application_id
      )
      job_execution_id = start_data_source_sync_job_response['executionId']

    1. Encode and batch upload data with user access mapping.
      import hashlib
      
      # Derive a stable document ID from the policy content (repeat for each policy type).
      workplace_policy_document_id = hashlib.shake_256(policy_data.encode('utf-8')).hexdigest(128)
      docs = [ {
          "id": workplace_policy_document_id,
          "content" : {
              'blob': policy_data.encode('utf-8')
          },
          "contentType": "PLAIN_TEXT",
          "title": "Unicorn Rentals – Workplace/HR/Regulatory Policy",
          "accessConfiguration" : { 'accessControls': [ { 'principals': [[xx]]_policy_principals } ] }
      } ]
      
      batch_put_document_response = qbusiness_client.batch_put_document(
          applicationId = application_id,
          indexId = index_id,
          dataSourceSyncId = job_execution_id,
          documents = docs,
      )

    1. Stop the data source sync job and wait for the data set to be indexed.
      import time
      
      stop_data_source_sync_job_response = qbusiness_client.stop_data_source_sync_job(
          dataSourceId = data_source_id,
          indexId = index_id,
          applicationId = application_id
      )
      
      # Poll for up to one hour until the uploaded document reaches a terminal status.
      max_time = time.time() + 1*60*60
      found = False
      try:
          while time.time() < max_time and not found:
              list_documents_response = qbusiness_client.list_documents(
                  applicationId=application_id,
                  indexId=index_id
              )
              if list_documents_response:
                  for document in list_documents_response["documentDetailList"]:
                      if document["documentId"] == workplace_policy_document_id:
                          status = document["status"]
                          print(status)
                          if status in ("INDEXED", "FAILED", "DOCUMENT_FAILED_TO_INDEX", "UPDATED"):
                              found = True
                          else:
                              time.sleep(10)
      except Exception:
          print("Exception when calling API")

  1. Go back to the Amazon Q Business console and see if the data uploads were successful.

Q Business Sync Status1

  1. Find and open the custom data source from the list of data sources.

Q Business Sync Status2

  1. Ensure the ingested documents are added in the Sync history tab and are in the Completed status.

Q Business Sync Status3

  1. Also ensure the Last sync status for the custom data source connector is Completed (you can also verify this programmatically, as sketched after these steps).

Q Business Sync Status5
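The following is a minimal sketch of checking the sync job status programmatically, assuming the list_data_source_sync_jobs operation of the boto3 qbusiness client and reusing the IDs noted earlier (placeholder values shown).

    import boto3
    
    qbusiness_client = boto3.client("qbusiness")
    
    # IDs noted in the earlier steps (placeholder values).
    application_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    index_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    data_source_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    
    # Print the execution ID and status of recent sync jobs for the custom data source.
    sync_jobs = qbusiness_client.list_data_source_sync_jobs(
        applicationId=application_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    )
    for job in sync_jobs.get("history", []):
        print(job["executionId"], job["status"])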

Run queries with the Amazon Q Business web experience

Now that the data synchronization is complete, you can start exploring insights from Amazon Q Business. With the newly created Amazon Q Business application, select the Web Application settings tab and navigate to the auto-created URL. This will open a new tab with a preview of the user interface and options that you can customize to fit your use case.

Q Business Web Experience

  1. Sign in as user Alejandro Rosalez. As you might recall, Alejandro has access to all three policy type data sets (Workplace, HR, and Regulatory).
    1. Start by asking a question about HR policy, such as “Per the HR Payroll Policy of Unicorn Rentals, what are some additional voluntary deductions taken from employee paychecks?” Note how Amazon Q Business provides an answer and also shows where it pulled the answer from.

    Q Business Web Experience

    1. Next, ask a question about a Regulatory policy: “Per the PCI DSS compliance policy of Unicorn Rentals, how is the third-party service provider access to cardholder information protected?” The result includes the summarized answer on PCI DSS compliance and also shows sources where it gathered the data from.

    Q Business Web Experience

    1. Lastly, see how Amazon Q Business responds when asked a question about generic workplace policy. “What does Unicorn Rentals do to protect information of children under the age of 13.” In this case, the application returns the answer and marks it as a Workplace policy question.

    Q Business Web Experience

  1. Let’s next sign in as Sofia Martinez. Sofia has access to HR and Workplace policy types, but not to Regulatory policies.
    1. Start by asking a question about HR policy: “Per the HR Payroll Policy of Unicorn Rentals, list the additional voluntary deductions taken from employee paychecks.” Note how Amazon Q Business lists the deductions and cites the policy the answer is gathered from.

    Q Business Web Experience

    1. Next, ask a Regulatory policy question: “What are the record keeping requirements mentioned in the ECOA compliance policy of Unicorn Rentals?” Note how Amazon Q Business contextually answers the question, mentioning that Sofia does not have access to that data.

    Q Business Web Experience

  1. Finally, sign in as Diego Ramirez. Diego has access to Workplace and Regulatory policies but not to HR policies.
    1. Start by asking the same Regulatory policy question: “Per the PCI DSS compliance policy of Unicorn Rentals, how is third-party service provider access to cardholder information protected?” Because Diego has access to Regulatory policy data, the expected answer is generated.

    Q Business Web Experience

    1. Next, Diego asks a question about an HR policy: “Per the HR Compensation Policy of Unicorn Rentals, how is job pricing determined?” Note how Amazon Q Business contextually answers the question, mentioning that Diego does not have access to that data.

    Q Business Web Experience

Troubleshooting

If you’re unable to get answers to any of your questions and get the message “Sorry, I could not find relevant information to complete your request,” check to see if any of the following issues apply:

  • No permissions: The ACLs applied to your account don’t allow you to query certain data sources. If this is the case, reach out to your application administrator to ensure your ACLs are configured to access the data sources.
  • EmailID not matching UserID: In rare scenarios, a user might have a different email ID associated with the Amazon Q Business Identity Center connection than is associated in the data source’s user profile. Make sure that the Amazon Q Business user profile is updated to recognize the email ID using the update-user CLI command or the related API call (see the sketch after this list).
  • Data connector sync failed: The data connector failed to synchronize information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to help ensure that the synchronization is successful.
  • Empty or private data sources: Private or empty projects will not be crawled during the synchronization run.
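The following is a minimal sketch of that API call using boto3, assuming the update_user operation of the qbusiness client and placeholder IDs and email addresses; adjust the alias to the email ID stored in your data source.

    import boto3
    
    qbusiness_client = boto3.client("qbusiness")
    
    # Placeholder values for the application, index, and data source.
    application_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    index_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    data_source_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    
    # Align the user's data source alias with the email ID used in the source system.
    qbusiness_client.update_user(
        applicationId=application_id,
        userId="alejandro_rosalez@example.com",            # ID known to IAM Identity Center
        userAliasesToUpdate=[
            {
                "indexId": index_id,
                "dataSourceId": data_source_id,
                "userId": "alejandro.rosalez@example.com",  # email ID stored in the data source
            }
        ],
    )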

If none of the above apply, open a support case to get the issue resolved.

Clean up

To avoid incurring future charges, clean up any resources created as part of this solution. Delete the Amazon Q Business custom connector data source and client application created in Alation and the Amazon Q Business application. Next, delete the Secrets Manager secret with Alation OAuth client application credential data. Also, delete the user management setup in IAM Identity Center and the SageMaker Studio domain.

Conclusion

In this post, we discussed how to configure the Amazon Q Business custom connector to crawl and index tasks from Alation as a sample. We showed how you can use Amazon Q Business generative AI-based search to enable your business leaders and agents to discover insights from your enterprise data.

To learn more about the Amazon Q Business custom connector, see the Amazon Q Business developer guide. Alation Data Catalog is available for purchase through AWS Marketplace; speak to your Alation account representative for custom purchase options. For any additional information, contact your Alation business partner.

AWS Partner Network Alation

Alation – AWS Partner Spotlight

Alation is an AWS Specialization Partner that has pioneered the modern data catalog and is making the leap into a full-service source for data intelligence. Alation is passionate about helping enterprises create thriving data cultures where anyone can find, understand, and trust data.

Contact Alation | Partner Overview | AWS Marketplace


About the Authors

Gene Arnold is a Product Architect with Alation’s Forward Deployed Engineering team. A curious learner with over 25 years of experience, Gene focuses on how to sharpen selling skills and constantly explores new product lines.

Prabhakar Chandrasekaran is a Senior Technical Account Manager with AWS Enterprise Support. Prabhakar enjoys helping customers build cutting-edge AI/ML solutions on the cloud. He also works with enterprise customers providing proactive guidance and operational assistance, helping them improve the value of their solutions when using AWS. Prabhakar holds eight AWS and seven other professional certifications. With over 21 years of professional experience, Prabhakar was a data engineer and a program leader in the financial services space prior to joining AWS.

Sindhu Jambunathan is a Senior Solutions Architect at AWS, specializing in supporting ISV customers in the data and generative AI vertical to build scalable, reliable, secure, and cost-effective solutions on AWS. With over 13 years of industry experience, she joined AWS in May 2021 after a successful tenure as a Senior Software Engineer at Microsoft. Sindhu’s diverse background includes engineering roles at Qualcomm and Rockwell Collins, complemented by a Master’s of Science in Computer Engineering from the University of Florida. Her technical expertise is balanced by a passion for culinary exploration, travel, and outdoor activities.

Prateek Jain is a Sr. Solutions Architect with AWS, based out of Atlanta, Georgia. He is passionate about GenAI and helping customers build amazing solutions on AWS. In his free time, he enjoys spending time with family and playing tennis.

Read More

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models’ compliance with layered instructions in generating accurate responses that satisfy specific requested patterns. Evaluation results from a wide array of state-of-the-art MLLMs reveal significant variations in performance, highlighting areas for improvement in instruction fidelity. Additionally, we create extra training data and…Apple Machine Learning Research

Mistral-Small-24B-Instruct-2501 is now available on SageMaker Jumpstart and Amazon Bedrock Marketplace

Mistral-Small-24B-Instruct-2501 is now available on SageMaker Jumpstart and Amazon Bedrock Marketplace

Today, we’re excited to announce that Mistral-Small-24B-Instruct-2501—a twenty-four billion parameter large language model (LLM) from Mistral AI that’s optimized for low latency text generation tasks—is available for customers through Amazon SageMaker JumpStart and Amazon Bedrock Marketplace. Amazon Bedrock Marketplace is a new capability in Amazon Bedrock that developers can use to discover, test, and use over 100 popular, emerging, and specialized foundation models (FMs) alongside the current selection of industry-leading models in Amazon Bedrock. These models are in addition to the industry-leading models that are already available on Amazon Bedrock. You can also use this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference. In this post, we walk through how to discover, deploy, and use Mistral-Small-24B-Instruct-2501.

Overview of Mistral Small 3 (2501)

Mistral Small 3 (2501), a latency-optimized 24B-parameter model released under Apache 2.0, maintains a balance between performance and computational efficiency. Mistral offers both the pretrained (Mistral-Small-24B-Base-2501) and instruction-tuned (Mistral-Small-24B-Instruct-2501) checkpoints of the model under Apache 2.0. Mistral Small 3 (2501) features a 32K-token context window. According to Mistral, the model demonstrates strong performance in code, math, general knowledge, and instruction following compared to its peers. Mistral Small 3 (2501) is designed for the 80% of generative AI tasks that require robust language and instruction-following performance with very low latency. The instruction-tuning process is focused on improving the model’s ability to follow complex directions, maintain coherent conversations, and generate accurate, context-aware responses. The 2501 version follows previous iterations (Mistral-Small-2409 and Mistral-Small-2402) released in 2024, incorporating improvements in instruction following and reliability. Currently, the instruct version of this model, Mistral-Small-24B-Instruct-2501, is available for customers to deploy and use on SageMaker JumpStart and Amazon Bedrock Marketplace.

Optimized for conversational assistance

Mistral Small 3 (2501) excels in scenarios where quick, accurate responses are critical, such as virtual assistants where users expect immediate feedback and near real-time interactions. It can handle rapid function execution when used as part of automated or agentic workflows. According to Mistral, the architecture is designed to typically respond in less than 100 milliseconds, making it ideal for customer service automation, interactive assistance, live chat, and content moderation.

Performance metrics and benchmarks

According to Mistral, the instruction-tuned version of the model achieves over 81% accuracy on Massive Multitask Language Understanding (MMLU) with 150 tokens per second latency, making it currently the most efficient model in its category. In third-party evaluations conducted by Mistral, the model demonstrates competitive performance against larger models such as Llama 3.3 70B and Qwen 32B. Notably, Mistral claims that the model performs at the same level as Llama 3.3 70B instruct and is more than three times faster on the same hardware.

SageMaker JumpStart overview

SageMaker JumpStart is a fully managed service that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is model hubs, which offer a vast catalog of pre-trained models, such as Mistral, for a variety of tasks.

You can now discover and deploy Mistral models in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with Amazon SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in a secure AWS environment and under your VPC controls, helping to support data security for enterprise security needs.

Prerequisites

To try Mistral-Small-24B-Instruct-2501 in SageMaker JumpStart, you need the following prerequisites:

Amazon Bedrock Marketplace overview

To get started, in the AWS Management Console for Amazon Bedrock, select Model catalog in the Foundation models section of the navigation pane. Here, you can search for models that help you with a specific use case or language. The results of the search include both serverless models and models available in Amazon Bedrock Marketplace. You can filter results by provider, modality (such as text, image, or audio), or task (such as classification or text summarization).

Deploy Mistral-Small-24B-Instruct-2501 in Amazon Bedrock Marketplace

To access Mistral-Small-24B-Instruct-2501 in Amazon Bedrock, complete the following steps:

  1. On the Amazon Bedrock console, select Model catalog under Foundation models in the navigation pane.

At the time of writing this post, you can use the InvokeModel API to invoke the model. It doesn’t support Converse APIs or other Amazon Bedrock tooling.

  1. Filter for Mistral as a provider and select the Mistral-Small-24B-Instruct-2501 model.

The model detail page provides essential information about the model’s capabilities, pricing structure, and implementation guidelines. You can find detailed usage instructions, including sample API calls and code snippets for integration.

The page also includes deployment options and licensing information to help you get started with Mistral-Small-24B-Instruct-2501 in your applications.

  1. To begin using Mistral-Small-24B-Instruct-2501, choose Deploy.
  2. You will be prompted to configure the deployment details for Mistral-Small-24B-Instruct-2501. The model ID will be pre-populated.
    1. For Endpoint name, enter an endpoint name (up to 50 alphanumeric characters).
    2. For Number of instances, enter a number between 1 and 100.
    3. For Instance type, select your instance type. For optimal performance with Mistral-Small-24B-Instruct-2501, a GPU-based instance type such as ml.g6.12xlarge is recommended.
    4. Optionally, you can configure advanced security and infrastructure settings, including virtual private cloud (VPC) networking, service role permissions, and encryption settings. For most use cases, the default settings will work well. However, for production deployments, you might want to review these settings to align with your organization’s security and compliance requirements.
  3. Choose Deploy to begin using the model.

When the deployment is complete, you can test Mistral-Small-24B-Instruct-2501 capabilities directly in the Amazon Bedrock playground.

  1. Choose Open in playground to access an interactive interface where you can experiment with different prompts and adjust model parameters such as temperature and maximum length.

When using Mistral-Small-24B-Instruct-2501 with the Amazon Bedrock InvokeModel API and the Playground console, use the model’s chat template for optimal results. For example, <s>[INST] content for inference [/INST].

This is an excellent way to explore the model’s reasoning and text generation abilities before integrating it into your applications. The playground provides immediate feedback, helping you understand how the model responds to various inputs and letting you fine-tune your prompts for optimal results.

You can quickly test the model in the playground through the UI. However, to invoke the deployed model programmatically with Amazon Bedrock APIs, you need to get the endpoint Amazon Resource Name (ARN).
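The following is a minimal sketch of that programmatic call, assuming the endpoint ARN from your Marketplace deployment is passed as the modelId and that the deployed container accepts the same messages-style payload shown in the SageMaker examples later in this post (the ARN below is a placeholder).

    import json
    
    import boto3
    
    bedrock_runtime = boto3.client("bedrock-runtime")
    
    # Placeholder endpoint ARN from the Marketplace deployment.
    endpoint_arn = "arn:aws:sagemaker:us-east-1:111122223333:endpoint/mistral-small-24b-instruct-2501"
    
    payload = {
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 1000,
        "temperature": 0.1,
        "top_p": 0.9,
    }
    
    response = bedrock_runtime.invoke_model(
        modelId=endpoint_arn,
        body=json.dumps(payload),
    )
    body = json.loads(response["body"].read())
    print(body["choices"][0]["message"]["content"])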

Discover Mistral-Small-24B-Instruct-2501 in SageMaker JumpStart

You can access Mistral-Small-24B-Instruct-2501 through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform ML development steps, from preparing data to building, training, and deploying your ML models. For more information about how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.

  1. In the SageMaker Studio console, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
  2. Select HuggingFace.
  3. From the SageMaker JumpStart landing page, search for Mistral-Small-24B-Instruct-2501 using the search box.
  4. Select a model card to view details about the model such as license, data used to train, and how to use the model. Choose Deploy to deploy the model and create an endpoint.

Deploy Mistral-Small-24B-Instruct-2501 with the SageMaker SDK

Deployment starts when you choose Deploy. After deployment finishes, you will see that an endpoint is created. Test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.

  1. To deploy using the SDK, start by selecting the Mistral-Small-24B-Instruct-2501 model, specified by the model_id with the value huggingface-llm-mistral-small-24b-instruct-2501, and deploy it on SageMaker using the following code. You can deploy other JumpStart models the same way by using their respective model IDs.
    from sagemaker.jumpstart.model import JumpStartModel 
    
    accept_eula = True 
    
    model = JumpStartModel(model_id="huggingface-llm-mistral-small-24b-instruct-2501") 
    predictor = model.deploy(accept_eula=accept_eula)

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. The EULA value must be explicitly defined as True to accept the end-user license agreement (EULA). See AWS service quotas for how to request a service quota increase.
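For example, the following is a minimal sketch of overriding the default instance type and endpoint name (both values here are illustrative, not required settings):

    from sagemaker.jumpstart.model import JumpStartModel
    
    # Override the default instance type and endpoint name; accept the EULA explicitly.
    model = JumpStartModel(
        model_id="huggingface-llm-mistral-small-24b-instruct-2501",
        instance_type="ml.g6.12xlarge",
    )
    predictor = model.deploy(
        accept_eula=True,
        endpoint_name="mistral-small-24b-instruct-2501",
    )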

  1. After the model is deployed, you can run inference against the deployed endpoint through the SageMaker predictor:
    prompt = "Hello!"
    payload = {
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "max_tokens": 4000,
        "temperature": 0.1,
        "top_p": 0.9,
    }
        
    response = predictor.predict(payload)
    print(response['choices'][0]['message']['content']) 

Retail math example

Here’s an example of how Mistral-Small-24B-Instruct-2501 can break down a common shopping scenario. In this case, you ask the model to calculate the final price of a shirt after applying multiple discounts—a situation many of us face while shopping. Notice how the model provides a clear, step-by-step solution to follow.

prompt = "A store is having a 20% off sale, and you have an additional 10% off coupon. If you buy a shirt that originally costs $50, how much will you pay?"
payload = {
    "messages": [
        {
            "role": "user",
            "content": prompt
        }
    ],
    "max_tokens": 1000,
    "temperature": 0.1,
    "top_p": 0.9,
}
    
response = predictor.predict(payload)
print(response['choices'][0]['message']['content']) 

The following is the output:

First, we'll apply the 20% off sale discount to the original price of the shirt.

20% of $50 is calculated as:
0.20 * $50 = $10

So, the price after the 20% discount is:
$50 - $10 = $40

Next, we'll apply the additional 10% off coupon to the new price of $40.

10% of $40 is calculated as:
0.10 * $40 = $4

So, the price after the additional 10% discount is:
$40 - $4 = $36

Therefore, you will pay $36 for the shirt.

The response shows clear step-by-step reasoning without introducing incorrect information or hallucinated facts. Each mathematical step is explicitly shown, making it simple to verify the accuracy of the calculations.

Clean up

To avoid unwanted charges, complete the following steps in this section to clean up your resources.

Delete the Amazon Bedrock Marketplace deployment

If you deployed the model using Amazon Bedrock Marketplace, complete the following steps:

  1. On the Amazon Bedrock console, under Foundation models in the navigation pane, select Marketplace deployments.
  2. In the Managed deployments section, locate the endpoint you want to delete.
  3. Select the endpoint, and on the Actions menu, select Delete.
  4. Verify the endpoint details to make sure you’re deleting the correct deployment:
    1. Endpoint name
    2. Model name
    3. Endpoint status
  5. Choose Delete to delete the endpoint.
  6. In the deletion confirmation dialog, review the warning message, enter confirm, and choose Delete to permanently remove the endpoint.

Delete the SageMaker JumpStart predictor

After you’re done running the notebook, make sure to delete all resources that you created in the process to avoid additional billing. For more details, see Delete Endpoints and Resources.

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Mistral-Small-24B-Instruct-2501 in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.

For more Mistral resources on AWS, check out the Mistral-on-AWS GitHub repo.


About the Authors

Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His area of focus is AWS AI accelerators (AWS Neuron). He holds a Bachelor’s degree in Computer Science and Bioinformatics.

Preston Tuggle is a Sr. Specialist Solutions Architect working on generative AI.

Shane Rai is a Principal Generative AI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AI/ML services offered by AWS, including model offerings from top tier foundation model providers.

Avan Bala is a Solutions Architect at AWS. His area of focus is AI for DevOps and machine learning. He holds a bachelor’s degree in Computer Science with a minor in Mathematics and Statistics from the University of Maryland. Avan is currently working with the Enterprise Engaged East Team and likes to specialize in projects about emerging AI technologies.

Banu Nagasundaram leads product, engineering, and strategic partnerships for Amazon SageMaker JumpStart, the machine learning and generative AI hub provided by SageMaker. She is passionate about building solutions that help customers accelerate their AI journey and unlock business value.

Read More

NVIDIA Earth-2 Features First Gen AI to Power Weather Super-Resolution for Continental US

NVIDIA Earth-2 Features First Gen AI to Power Weather Super-Resolution for Continental US

To better prepare communities for extreme weather, forecasters first need to see exactly where it’ll land.

That’s why weather agencies and climate scientists around the world are harnessing NVIDIA CorrDiff, a generative AI weather model that enables kilometer-scale forecasts of wind, temperature, and precipitation type and amount. It’s part of the NVIDIA Earth-2 platform for simulating weather and climate conditions.

The paper behind CorrDiff was featured today in Communications Earth and Environment, part of the Nature portfolio of scientific journals. Available as an easy-to-deploy NVIDIA NIM microservice, the model is already being used by weather technology companies, researchers and government agencies to enhance their forecasts.

With the rising frequency of extreme weather events, fast, high-resolution predictions of weather phenomena could help mitigate risks to people, communities and economies by supporting risk assessment, evacuation planning, disaster management and the development of climate-resilient infrastructure.

Weather agencies and startups across the globe are adopting CorrDiff and other Earth-2 tools to improve the resolution and precision of forecasts for extreme weather phenomena, renewable energy management and agricultural planning.

High-Fidelity Forecasts on the Horizon

CorrDiff uses generative AI to sharpen the precision of coarse-resolution weather models — resolving atmospheric data from 25-kilometer scale down to 2 kilometers using diffusion modeling, the same kind of AI model architecture that powers today’s text-to-image generation services.

In addition to boosting image resolution, CorrDiff can also predict related variables that weren’t present in the input data — such as radar reflectivity, which is used as an indicator of rain location and intensity.

CorrDiff was trained on the Weather Research and Forecasting model’s numerical simulations to generate weather patterns at 12x higher resolution.

The initial CorrDiff model, announced at NVIDIA GTC 2024 and described in the Communications Earth and Environment paper, was optimized on Taiwan weather data in collaboration with its Central Weather Administration.

NVIDIA researchers and engineers next worked to efficiently scale the model to cover a larger section of the globe. The version released as an NVIDIA NIM microservice at Supercomputing 2024 covers the continental United States — trained on U.S. weather data, with sample datasets of real-world natural disasters including hurricanes, floods, winter storms, tornados and cold waves.

The optimized CorrDiff NIM microservice for U.S. data is 500x faster and 10,000x more energy-efficient than traditional high-resolution numerical weather prediction using CPUs.

The research team behind CorrDiff continues to advance the model’s capabilities, and has released additional generative AI diffusion models showing how the model could be enhanced to more robustly resolve small-scale details in different environments — and better capture rare or extreme weather events.

CorrDiff could also help with downwash prediction — when strong winds funnel down to street level, damaging buildings and affecting pedestrians — in urban areas.

Weather Agencies Put CorrDiff on the Map

Meteorological agencies and companies around the globe are tapping CorrDiff to accelerate predictions with applications in regional forecasting, renewable energy and disaster management.

Taiwan’s National Science and Technology Center for Disaster Reduction, for instance, has deployed CorrDiff to support disaster alerts in the region, enabling an estimated gigawatt-hour of energy savings thanks to the efficiency of CorrDiff running on the NVIDIA AI platform. CorrDiff predictions are embedded in the center’s disaster monitoring site, helping Taiwan forecasters better prepare for typhoons.

Discover Earth-2 at NVIDIA GTC 

Learn more about AI applications using Earth-2 at NVIDIA GTC, the global AI conference taking place March 17-21 in San Jose, California. Relevant sessions include:

Read More

How Rocket Companies modernized their data science solution on AWS

How Rocket Companies modernized their data science solution on AWS

This post was written with Dian Xu and Joel Hawkins of Rocket Companies.

Rocket Companies is a Detroit-based FinTech company with a mission to “Help Everyone Home”. With the current housing shortage and affordability concerns, Rocket simplifies the homeownership process through an intuitive and AI-driven experience. This comprehensive framework streamlines every step of the homeownership journey, empowering consumers to search, purchase, and manage home financing effortlessly. Rocket integrates home search, financing, and servicing in a single environment, providing a seamless and efficient experience.

The Rocket brand is a synonym for offering simple, fast, and trustworthy digital solutions for complex transactions. Rocket is dedicated to helping clients realize their dream of homeownership and financial freedom. Since its inception, Rocket has grown from a single mortgage lender to a network of businesses that creates new opportunities for its clients.

Rocket takes a complicated process and uses technology to make it simpler. Applying for a mortgage can be complex and time-consuming. That’s why we use advanced technology and data analytics to streamline every step of the homeownership experience, from application to closing. By analyzing a wide range of data points, we’re able to quickly and accurately assess the risk associated with a loan, enabling us to make more informed lending decisions and get our clients the financing they need.

Our goal at Rocket is to provide a personalized experience for both our current and prospective clients. Rocket’s diverse product offerings can be customized to meet specific client needs, while our team of skilled bankers must match with the best client opportunities that align with their skills and knowledge. Maintaining strong relationships with our large, loyal client base and hedge positions to cover financial obligations is key to our success. With the volume of business we do, even small improvements can have a significant impact.

In this post, we share how we modernized Rocket’s data science solution on AWS to increase the speed to delivery from eight weeks to under one hour, improve operational stability and support by reducing incident tickets by over 99% in 18 months, power 10 million automated data science and AI decisions made daily, and provide a seamless data science development experience.

Rocket’s legacy data science environment challenges

Rocket’s previous data science solution was built around Apache Spark and combined the use of a legacy version of the Hadoop environment and vendor-provided Data Science Experience development tools. The Hadoop environment was hosted on Amazon Elastic Compute Cloud (Amazon EC2) servers, managed in-house by Rocket’s technology team, while the data science experience infrastructure was hosted on premises. Communication between the two systems was established through Kerberized Apache Livy (HTTPS) connections over AWS PrivateLink.

Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks. Apache Hive was used to provide a tabular interface to data stored in HDFS, and to integrate with Apache Spark SQL. Apache HBase was employed to offer real-time key-based access to data. Model training and scoring was performed either from Jupyter notebooks or through jobs scheduled by Apache’s Oozie orchestration tool, which was part of the Hadoop implementation.

Despite the benefits of this architecture, Rocket faced challenges that limited its effectiveness:

  • Accessibility limitations: The data lake was stored in HDFS and only accessible from the Hadoop environment, hindering integration with other data sources. This also led to a backlog of data that needed to be ingested.
  • Steep learning curve for data scientists: Many of Rocket’s data scientists did not have experience with Spark, which had a more nuanced programming model compared to other popular ML solutions like scikit-learn. This created a challenge for data scientists to become productive.
  • Responsibility for maintenance and troubleshooting: Rocket’s DevOps/Technology team was responsible for all upgrades, scaling, and troubleshooting of the Hadoop cluster, which was installed on bare EC2 instances. This resulted in a backlog of issues with both vendors that remained unresolved.
  • Balancing development vs. production demands: Rocket had to manage work queues between development and production, which were always competing for the same resources.
  • Deployment challenges: Rocket sought to support more real-time and streaming inferencing use cases, but this was limited by the capabilities of MLeap for real-time models and Spark Streaming for streaming use cases, which were still experimental at that time.
  • Inadequate data security and DevOps support: The previous solution lacked robust security measures, and there was limited support for development and operations of the data science work.

Rocket’s legacy data science architecture is shown in the following diagram.

The diagram depicts the flow; the key components are detailed below:

  1. Data Ingestion: Data is ingested into the system using Attunity data ingestion in Spark SQL.
  2. Data Storage and Processing: All compute is done as Spark jobs inside of a Hadoop cluster using Apache Livy and Spark. Data is stored in HDFS and is accessed via Hive, which provides a tabular interface to the data and integrates with Spark SQL. HBase is employed to offer real-time key-based access to data.
  3. Model Development: Data exploration and model development are conducted using tools such as Jupyter or Orchestration, which communicate with the Spark server over Kerberized Livy connection.
  4. Model Training and Scoring: Model training and scoring is performed either from Jupyter notebooks or through jobs scheduled by Apache’s Oozie orchestration tool, which is part of the Hadoop implementation.

Rocket’s migration journey

At Rocket, we believe in the power of continuous improvement and constantly seek out new opportunities. One such opportunity is using data science solutions, but to do so, we must have a strong and flexible data science environment.

To address the legacy data science environment challenges, Rocket decided to migrate its ML workloads to the Amazon SageMaker AI suite. This would allow us to deliver more personalized experiences and understand our customers better. To promote the success of this migration, we collaborated with the AWS team to create automated and intelligent digital experiences that demonstrated Rocket’s understanding of its clients and kept them connected.

We implemented an AWS multi-account strategy, standing up Amazon SageMaker Studio in a build account using a network-isolated Amazon VPC. This allows us to separate development and production environments, while also improving our security stance.

We moved our new work to SageMaker Studio and our legacy Hadoop workloads to Amazon EMR, connecting to the old Hadoop cluster using Livy and SageMaker notebooks to ease the transition. This gives us access to a wider range of tools and technologies, enabling us to choose the most appropriate ones for each problem we’re trying to solve.

In addition, we moved our data from HDFS to Amazon Simple Storage Service (Amazon S3), and now use Amazon Athena and AWS Lake Formation to provide proper access controls to production data. This makes it easier to access and analyze the data, and to integrate it with other systems. The team also provides secure interactive integration through Amazon Elastic Kubernetes Service (Amazon EKS), further improving the company’s security stance.

SageMaker AI has been instrumental in empowering our data science community with the flexibility to choose the most appropriate tools and technologies for each problem, resulting in faster development cycles and higher model accuracy. With SageMaker Studio, our data scientists can seamlessly develop, train, and deploy models without the need for additional infrastructure management.

As a result of this modernization effort, SageMaker AI enabled Rocket to scale our data science solution across Rocket Companies and integrate using a hub-and-spoke model. The ability of SageMaker AI to automatically provision and manage instances has allowed us to focus on our data science work rather than infrastructure management, increasing the number of models in production by five times and data scientists’ productivity by 80%.

Our data scientists are empowered to use the most appropriate technology for the problem at hand, and our security stance has improved. Rocket can now compartmentalize data and compute, as well as compartmentalize development and production. Additionally, we are able to provide model tracking and lineage using Amazon SageMaker Experiments and artifacts discoverable using the SageMaker model registry and Amazon SageMaker Feature Store. All the data science work has now been migrated onto SageMaker, and all the old Hadoop work has been migrated to Amazon EMR.

Overall, SageMaker AI has played a critical role in enabling Rocket’s modernization journey by building a more scalable and flexible ML framework, reducing operational burden, improving model accuracy, and accelerating deployment times.

The successful modernization allowed Rocket to overcome our previous limitations and better support our data science efforts. We were able to improve our security stance, make work more traceable and discoverable, and give our data scientists the flexibility to choose the most appropriate tools and technologies for each problem. This has helped us better serve our customers and drive business growth.

Rocket’s new data science solution architecture on AWS is shown in the following diagram.

The solution consists of the following components:

  1. Data ingestion: Data is ingested into the data account from on-premises and external sources.
  2. Data refinement: Raw data is refined into consumable layers (raw, processed, conformed, and analytical) using a combination of AWS Glue extract, transform, and load (ETL) jobs and EMR jobs.
  3. Data access: Refined data is registered in the data account’s AWS Glue Data Catalog and exposed to other accounts via Lake Formation. Analytic data is stored in Amazon Redshift. Lake Formation makes this data available to both the build and compute accounts. For the build account, access to production data is restricted to read-only.
  4. Development: Data science development is done using SageMaker Studio. Data engineering development is done using AWS Glue Studio. Both disciplines have access to Amazon EMR for Spark development. Data scientists have access to the entire SageMaker ecosystem in the build account.
  5. Deployment: SageMaker trained models developed in the build account are registered with an MLFlow instance. Code artifacts for both data science activities and data engineering activities are stored in Git. Deployment initiation is controlled as part of CI/CD.
  6. Workflows: We have a number of workflow triggers. For online scoring, we typically provide an external-facing endpoint using Amazon EKS with Istio. We have numerous jobs that are launched by AWS Lambda functions that in turn are triggered by timers or events. Processes that run may include AWS Glue ETL jobs, EMR jobs for additional data transformations or model training and scoring activities, or SageMaker pipelines and jobs performing training or scoring activities.
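For example, a minimal sketch of the event-driven pattern in the last item might look like the following Lambda handler, which starts a SageMaker pipeline execution. The pipeline name and parameter are hypothetical placeholders; the actual jobs and triggers are defined as part of our CI/CD and workflow configuration.

import json
import boto3

# SageMaker client used to start pipeline executions; the same handler shape
# works whether the trigger is a timer or an event
sagemaker = boto3.client("sagemaker")

def lambda_handler(event, context):
    # "nightly-scoring-pipeline" and its parameter are hypothetical placeholders
    response = sagemaker.start_pipeline_execution(
        PipelineName="nightly-scoring-pipeline",
        PipelineParameters=[
            {"Name": "InputDataS3Uri",
             "Value": event.get("input_s3_uri", "s3://example-bucket/input/")}
        ]
    )
    return {"statusCode": 200,
            "body": json.dumps({"executionArn": response["PipelineExecutionArn"]})}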

Migration impact

We’ve come a long way in modernizing our infrastructure and workloads. We started our journey supporting six business channels and 26 models in production, with dozens in development. Deployment times stretched for months and required a team of three system engineers and four ML engineers to keep everything running smoothly. Despite the support of our internal DevOps team, our issue backlog with the vendor was an unenviable 200+.

Today, we are supporting nine organizations and over 20 business channels, with a whopping 210+ models in production and many more in development. Our average deployment time has gone from months to just weeks—sometimes even down to mere days! With just one part-time ML engineer for support, our average issue backlog with the vendor is practically non-existent. We now support over 120 data scientists, ML engineers, and analytical roles. Our framework mix has expanded to include 50% SparkML models and a diverse range of other ML frameworks, such as PyTorch and scikit-learn. These advancements have given our data science community the power and flexibility to tackle even more complex and challenging projects with ease.

The following table compares some of our metrics before and after migration.

Metric | Before Migration | After Migration
Speed to Delivery | New data ingestion project took 4–8 weeks | Data-driven ingestion takes under one hour
Operation Stability and Supportability | Over a hundred incidents and tickets in 18 months | Fewer incidents: one per 18 months
Data Science | Data scientists spent 80% of their time waiting on their jobs to run | Seamless data science development experience
Scalability | Unable to scale | Powers 10 million automated data science and AI decisions made daily

Lessons learned

Throughout the journey of modernizing our data science solution, we’ve learned valuable lessons that we believe could be of great help to other organizations that are planning to undertake similar endeavors.

First, we’ve come to realize that managed services can be a game changer in optimizing your data science operations.

The isolation of development into its own account while providing read-only access to production data is a highly effective way of enabling data scientists to experiment and iterate on their models without putting your production environment at risk. This is something that we’ve achieved through the combination of SageMaker AI and Lake Formation.

Another lesson we learned is the importance of training and onboarding for teams, particularly teams moving to a new environment like SageMaker AI. It’s crucial to understand best practices for using SageMaker AI resources and features, and to have a solid understanding of how to move from notebooks to jobs.

Lastly, we found that although Amazon EMR still requires some tuning and optimization, the administrative burden is much lighter compared to hosting directly on Amazon EC2. This makes Amazon EMR a more scalable and cost-effective solution for organizations that need to manage large data processing workloads.

Conclusion

This post provided an overview of the successful partnership between AWS and Rocket Companies. Through this collaboration, Rocket Companies migrated many ML workloads and implemented a scalable ML framework. In its ongoing work with AWS, Rocket Companies remains committed to innovation and to staying at the forefront of customer satisfaction.

Don’t let legacy systems hold back your organization’s potential. Discover how AWS can assist you in modernizing your data science solution and achieving remarkable results, similar to those achieved by Rocket Companies.


About the Authors

Dian Xu is the Senior Director of Engineering in Data at Rocket Companies, where she leads transformative initiatives to modernize enterprise data platforms and foster a collaborative, data-first culture. Under her leadership, Rocket’s data science, AI & ML platforms power billions of automated decisions annually, driving innovation and industry disruption. A passionate advocate for Gen AI and cloud technologies, Xu is also a sought-after speaker at global forums, inspiring the next generation of data professionals. Outside of work, she channels her love of rhythm into dancing, embracing styles from Bollywood to Bachata as a celebration of cultural diversity.

Joel Hawkins is a Principal Data Scientist at Rocket Companies, where he is responsible for the data science and MLOps platform. Joel has decades of experience developing sophisticated tooling and working with data at large scales. A driven innovator, he works hand in hand with data science teams to ensure that we have the latest technologies available to provide cutting edge solutions. In his spare time, he is an avid cyclist and has been known to dabble in vintage sports car restoration.

Venkata Santosh Sajjan Alla is a Senior Solutions Architect at AWS Financial Services. He partners with North American FinTech companies like Rocket and other financial services organizations to drive cloud and AI strategy, accelerating AI adoption at scale. With deep expertise in AI & ML, Generative AI, and cloud-native architecture, he helps financial institutions unlock new revenue streams, optimize operations, and drive impactful business transformation. Sajjan collaborates closely with Rocket Companies to advance its mission of building an AI-fueled homeownership platform to Help Everyone Home. Outside of work, he enjoys traveling, spending time with his family, and is a proud father to his daughter.

Alak Eswaradass is a Principal Solutions Architect at AWS based in Chicago, IL. She is passionate about helping customers design cloud architectures using AWS services to solve business challenges and is enthusiastic about solving a variety of ML use cases for AWS customers. When she’s not working, Alak enjoys spending time with her daughters and exploring the outdoors with her dogs.

Read More

AWS and DXC collaborate to deliver customizable, near real-time voice-to-voice translation capabilities for Amazon Connect

AWS and DXC collaborate to deliver customizable, near real-time voice-to-voice translation capabilities for Amazon Connect

Providing effective multilingual customer support in global businesses presents significant operational challenges. Through collaboration between AWS and DXC Technology, we’ve developed a scalable voice-to-voice (V2V) translation prototype that transforms how contact centers handle multi-lingual customer interactions.

In this post, we discuss how AWS and DXC used Amazon Connect and other AWS AI services to deliver near real-time V2V translation capabilities.

Challenge: Serving customers in multiple languages

In Q3 2024, DXC Technology approached AWS with a critical business challenge: its global contact centers needed to serve customers in multiple languages without the exponential cost of hiring language-specific agents for lower-volume languages. Previously, DXC had explored several existing alternatives but found limitations in each approach, from communication constraints to infrastructure requirements that impacted reliability, scalability, and operational costs. DXC and AWS decided to organize a focused hackathon in which their Solutions Architects collaborated to:

  • Define essential requirements for real-time translation
  • Establish latency and accuracy benchmarks
  • Create seamless integration paths with existing systems
  • Develop a phased implementation strategy
  • Prepare and test an initial proof of concept setup

Business impact

For DXC, this prototype served as an enabler, allowing it to maximize technical talent, transform operations, and reduce costs through:

  • Best technical expertise delivery – Hiring and matching agents based on technical knowledge rather than spoken language, making sure customers get top technical support regardless of language barriers
  • Global operational flexibility – Removing geographical and language constraints in hiring, placement, and support delivery while maintaining consistent service quality across all languages
  • Cost reduction – Eliminating multi-language expertise premiums, specialized language training, and infrastructure costs through a pay-per-use translation model
  • Similar experience to native speakers – Maintaining natural conversation flow with near real-time translation and audio feedback, while delivering premium technical support in the customer’s preferred language

Solution overview

The Amazon Connect V2V translation prototype uses AWS advanced speech recognition and machine translation technologies to enable real-time conversation translation between agents and customers, allowing them to speak in their preferred languages while having natural conversations. It consists of the following key components:

  • Speech recognition – The customer’s spoken language is captured and converted into text using Amazon Transcribe, which serves as the speech recognition engine. The transcript (text) is then fed into the machine translation engine.
  • Machine translation – Amazon Translate, the machine translation engine, translates the customer’s transcript into the agent’s preferred language in near real time. The translated transcript is converted back into speech using Amazon Polly, which serves as the text-to-speech engine.
  • Bidirectional translation – The process is reversed for the agent’s response, translating their speech into the customer’s language and delivering the translated audio to the customer.
  • Seamless integration – The V2V translation sample project integrates with Amazon Connect, enabling agents to handle customer interactions in multiple languages without any additional effort or training, using the Amazon Connect Streams JS and Amazon Connect RTC JS libraries.
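To make the speech recognition, translation, and synthesis steps above concrete, the following is a minimal Python sketch using the AWS SDK (boto3). It assumes the transcript has already been produced by Amazon Transcribe streaming (which the sample project handles in the browser through the Amazon Connect libraries); the language codes and voice below are placeholders, not the project’s configuration.

import boto3

translate = boto3.client("translate")
polly = boto3.client("polly")

def translate_and_synthesize(transcript, source_lang="es", target_lang="en", voice_id="Joanna"):
    # Machine translation of the transcript produced by Amazon Transcribe
    translation = translate.translate_text(
        Text=transcript,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang
    )
    # Text-to-speech for the translated text
    speech = polly.synthesize_speech(
        Text=translation["TranslatedText"],
        OutputFormat="mp3",
        VoiceId=voice_id
    )
    # AudioStream is a streaming body; the prototype plays it back to the listener
    return speech["AudioStream"].read()

In the actual prototype, these steps run client side against streaming APIs to keep latency low; the sketch only illustrates the service calls involved.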

The prototype can be extended with other AWS AI services to further customize the translation capabilities. It’s open source and ready for customization to meet your specific needs.

The following diagram illustrates the solution architecture.

The following screenshot illustrates a sample agent web application.

The user interface consists of three sections:

  • Contact Control Panel – A softphone client using Amazon Connect
  • Customer Controls – Customer-to-agent interaction controls, including Transcribe Customer Voice, Translate Customer Voice, and Synthesize Customer Voice
  • Agent controls – Agent-to-customer interaction controls, including Transcribe Agent Voice, Translate Agent Voice, and Synthesize Agent Voice

Challenges when implementing near real-time voice translation

The Amazon Connect V2V sample project was designed to minimize the audio processing time from the moment the customer or agent finishes speaking until the translated audio stream starts. However, even with the shortest audio processing time, the user experience still doesn’t match that of a real conversation in which both parties speak the same language. This is due to the specific pattern of the customer only hearing the agent’s translated speech, and the agent only hearing the customer’s translated speech. The following diagram displays that pattern.

The example workflow consists of the following steps:

  1. The customer starts speaking in their own language, and speaks for 10 seconds.
  2. Because the agent only hears the customer’s translated speech, the agent first hears 10 seconds of silence.
  3. When the customer finishes speaking, the audio processing time takes 1–2 seconds, during which time both the customer and agent hear silence.
  4. The customer’s translated speech is streamed to the agent. During that time, the customer hears silence.
  5. When the customer’s translated speech playback is complete, the agent starts speaking, and speaks for 10 seconds.
  6. Because the customer only hears the agent’s translated speech, the customer hears 10 seconds of silence.
  7. When the agent finishes speaking, the audio processing time takes 1–2 seconds, during which time both the customer and agent hear silence.
  8. The agent’s translated speech is streamed to the customer. During that time, the agent hears silence.

In this scenario, the customer hears a single block of 22–24 seconds of complete silence, from the moment they finished speaking until they hear the agent’s translated voice. This creates a suboptimal experience, because the customer might not be certain what is happening during these 22–24 seconds, for instance, whether the agent was able to hear them, or whether there was a technical issue.

Audio streaming add-ons

In a face-to-face conversation between two people who don’t speak the same language, they might have a third person act as a translator or interpreter. An example workflow consists of the following steps:

  1. Person A speaks in their own language, which is heard by Person B and the translator.
  2. The translator translates what Person A said to Person B’s language. The translation is heard by Person B and Person A.

Essentially, Person A and Person B hear each other speaking their own language, and they also hear the translation (from the translator). There’s no waiting in silence, which is even more important in non-face-to-face conversations (such as contact center interactions).

To optimize the customer/agent experience, the Amazon Connect V2V sample project implements audio streaming add-ons to simulate a more natural conversation experience. The following diagram illustrates an example workflow.

The workflow consists of the following steps:

  1. The customer starts speaking in their own language, and speaks for 10 seconds.
  2. The agent hears the customer’s original voice, at a lower volume (“Stream Customer Mic to Agent” enabled).
  3. When the customer finishes speaking, the audio processing time takes 1–2 seconds. During that time, the customer and agent hear subtle audio feedback—contact center background noise—at a very low volume (“Audio Feedback” enabled).
  4. The customer’s translated speech is then streamed to the agent. During that time, the customer hears their translated speech, at a lower volume (“Stream Customer Translation to Customer” enabled).
  5. When the customer’s translated speech playback is complete, the agent starts speaking, and speaks for 10 seconds.
  6. The customer hears the agent’s original voice, at a lower volume (“Stream Agent Mic to Customer” enabled).
  7. When the agent finishes speaking, the audio processing time takes 1–2 seconds. During that time, the customer and agent hear subtle audio feedback—contact center background noise—at a very low volume (“Audio Feedback” enabled).
  8. The agent’s translated speech is then streamed to the customer. During that time, the agent hears their translated speech, at a lower volume (“Stream Agent Translation to Agent” enabled).

In this scenario, the customer hears two short blocks (1–2 seconds) of subtle audio feedback, instead of a single block of 22–24 seconds of complete silence. This pattern is much closer to a face-to-face conversation that includes a translator.

The audio streaming add-ons provide additional benefits, including:

  • Voice characteristics – In cases when the agent and customer only hear their translated and synthesized speech, the actual voice characteristics are lost. For instance, the agent can’t hear if the customer was talking slow or fast, if the customer was upset or calm, and so on. The translated and synthesized speech doesn’t carry over that information.
  • Quality assurance – In cases when call recording is enabled, only the customer’s original voice and the agent’s synthesized speech are recorded, because the translation and synthesis are done on the agent (client) side. This makes it difficult for QA teams to properly evaluate and audit the conversations, including the many silent blocks within them. Instead, when the audio streaming add-ons are enabled, there are no silent blocks, and the QA team can hear the agent’s original voice, the customer’s original voice, and their respective translated and synthesized speech, all in a single audio file.
  • Transcription and translation accuracy – Having both the original and translated speech available in the call recording makes it straightforward to detect specific words that would improve transcription accuracy (by using Amazon Transcribe custom vocabularies) or translation accuracy (using Amazon Translate custom terminologies), to make sure that your brand names, character names, model names, and other unique content are transcribed and translated to the desired result.
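As a sketch of the last point, Amazon Translate can apply a custom terminology at request time so that brand names, model names, and other unique content translate consistently. The terminology name below is a hypothetical placeholder for one you would create beforehand from source/target term pairs.

import boto3

translate = boto3.client("translate")

# "contact-center-terms" is a hypothetical custom terminology created in advance
result = translate.translate_text(
    Text="My AnyCompany X100 router keeps rebooting.",
    SourceLanguageCode="en",
    TargetLanguageCode="es",
    TerminologyNames=["contact-center-terms"]
)
print(result["TranslatedText"])

Amazon Transcribe custom vocabularies work similarly on the transcription side, by referencing the vocabulary name in the transcription request.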

Get started with Amazon Connect V2V

Ready to transform your contact center’s communication? Our Amazon Connect V2V sample project is now available on GitHub. We invite you to explore, deploy, and experiment with this powerful prototype. You can use it as a foundation for developing innovative multi-lingual communication solutions in your own contact center, through the following key steps:

  1. Clone the GitHub repository.
  2. Test different configurations for audio streaming add-ons.
  3. Review the sample project’s limitations in the README.
  4. Develop your implementation strategy:
    1. Implement robust security and compliance controls that meet your organization’s standards.
    2. Collaborate with your customer experience team to define your specific use case requirements.
    3. Balance between automation and the agent’s manual controls (for example, use an Amazon Connect contact flow to automatically set contact attributes for preferred languages and audio streaming add-ons).
    4. Use your preferred transcription, translation, and text-to-speech engines, based on specific language support requirements and business, legal, and regional preferences.
    5. Plan a phased rollout, starting with a pilot group, then iteratively optimize your transcription custom vocabularies and translation custom terminologies.

Conclusion

The Amazon Connect V2V sample project demonstrates how Amazon Connect and advanced AWS AI services can break down language barriers, enhance operational flexibility, and reduce support costs. Get started now and revolutionize how your contact center communicates across language barriers!


About the Authors

Milos Cosic is a Principal Solutions Architect at AWS.

EJ Ferrell is a Senior Solutions Architect at AWS.

Adam El Tanbouli is a Technical Program Manager for Prototyping and Support Services at DXC Modern Workplace.

Read More

Orchestrate an intelligent document processing workflow using tools in Amazon Bedrock

Orchestrate an intelligent document processing workflow using tools in Amazon Bedrock

Generative AI is revolutionizing enterprise automation, enabling AI systems to understand context, make decisions, and act independently. Generative AI foundation models (FMs), with their ability to understand context and make decisions, are becoming powerful partners in solving sophisticated business problems. At AWS, we’re using the power of models in Amazon Bedrock to drive automation of complex processes that have traditionally been challenging to streamline.

In this post, we focus on one such complex workflow: document processing. This serves as an example of how generative AI can streamline operations that involve diverse data types and formats.

Challenges with document processing

Document processing often involves handling three main categories of documents:

  • Structured – For example, forms with fixed fields
  • Semi-structured – Documents that have a predictable set of information but might vary in layout or presentation
  • Unstructured – For example, paragraphs of text or notes

Traditionally, processing these varied document types has been a pain point for many organizations. Rule-based systems or specialized machine learning (ML) models often struggle with the variability of real-world documents, especially when dealing with semi-structured and unstructured data.

We demonstrate how generative AI along with external tool use offers a more flexible and adaptable solution to this challenge. Through a practical use case of processing a patient health package at a doctor’s office, you will see how this technology can extract and synthesize information from all three document types, potentially improving data accuracy and operational efficiency.

Solution overview

This intelligent document processing solution uses Amazon Bedrock FMs to orchestrate a sophisticated workflow for handling multi-page healthcare documents with mixed content types. The solution uses the FM’s tool use capabilities, accessed through the Amazon Bedrock Converse API. This enables the FMs to not just process text, but to actively engage with various external tools and APIs to perform complex document analysis tasks.

The solution employs a strategic multi-model approach, optimizing for both performance and cost by selecting the most appropriate model for each task:

  • Anthropic’s Claude 3 Haiku – Serves as the workflow orchestrator due to its low latency and cost-effectiveness. This model’s strong reasoning and tool use abilities make it ideal for the following:
    • Coordinating the overall document processing pipeline
    • Making routing decisions for different document types
    • Invoking appropriate processing functions
    • Managing the workflow state
  • Anthropic’s Claude 3.5 Sonnet (v2) – Used for its advanced reasoning and notably strong visual processing abilities, particularly its skill at interpreting charts and graphs. Its key strengths include:
    • Interpreting complex document layouts and structure
    • Extracting text from tables and forms
    • Processing medical charts and handwritten notes
    • Converting unstructured visual information into structured data
Through the Amazon Bedrock Converse API’s standardized tool use (function calling) interface, these models can work together seamlessly to invoke document processing functions, call external APIs for data validation, trigger storage operations, and execute content transformation tasks. The API serves as the foundation for this intelligent workflow, providing a unified interface for model communication while maintaining conversation state throughout the processing pipeline. The API’s standardized approach to tool definition and function calling provides consistent interaction patterns across different processing stages. For more details on how tool use works, refer to The complete tool use workflow.

The solution incorporates Amazon Bedrock Guardrails to implement robust content filtering policies and sensitive information detection, making sure that personal health information (PHI) and personally identifiable information (PII) data is appropriately protected through automated detection and masking capabilities while maintaining industry standard compliance throughout the document processing workflow.
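As a minimal sketch (not the project’s code), a guardrail is applied at inference time by passing a guardrail configuration to the Converse API. The identifier, version, and prompt below are placeholders; use the guardrail you create in the deployment steps.

import boto3

bedrock = boto3.client("bedrock-runtime")

# Placeholder guardrail identifier and version; substitute your own guardrail
guardrail_config = {
    "guardrailIdentifier": "abcd1234efgh",
    "guardrailVersion": "1",
    "trace": "enabled"
}

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user",
               "content": [{"text": "Summarize the attached patient intake form."}]}],
    guardrailConfig=guardrail_config  # PII/PHI detected by the guardrail is masked in the response
)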

Prerequisites

You need the following prerequisites before you can proceed with this solution. For this post, we use the us-west-2 AWS Region. For details on available Regions, see Amazon Bedrock endpoints and quotas.

Use case and dataset

For our example use case, we examine a patient intake process at a healthcare institution. The workflow processes a patient health information package containing three distinct document types:

  • Structured document – A new patient intake form with standardized fields for personal information, medical history, and current symptoms. This form follows a consistent layout with clearly defined fields and check boxes, making it an ideal example of a structured document.
  • Semi-structured document – A health insurance card that contains essential coverage information. Although insurance cards generally contain similar information (policy number, group ID, coverage dates), they come from different providers with varying layouts and formats, showing the semi-structured nature of these documents.
  • Unstructured document – A handwritten doctor’s note from an initial consultation, containing free-form observations, preliminary diagnoses, and treatment recommendations. This represents the most challenging category of unstructured documents, where information isn’t confined to any predetermined format or structure.

The example document can be downloaded from the following GitHub repo.

This healthcare use case is particularly relevant because it encompasses common challenges in document processing: the need for high accuracy, compliance with healthcare data privacy requirements, and the ability to handle multiple document formats within a single workflow. The variety of documents in this patient package demonstrates how a modern intelligent document processing solution must be flexible enough to handle different levels of document structure while maintaining consistency and accuracy in data extraction.

The following diagram illustrates the solution workflow.

IDP flow using external tool calling

This self-orchestrated workflow demonstrates how modern generative AI solutions can balance capability, performance, and cost-effectiveness in transforming traditional document processing workflows in healthcare settings.

Deploy the solution

  1. Create an Amazon SageMaker domain. For instructions, see Use quick setup for Amazon SageMaker AI.
  2. Launch SageMaker Studio, then create and launch a JupyterLab space. For instructions, see Create a space.
  3. Create a guardrail. Focus on adding sensitive information filters that would mask PII or PHI.
  4. Clone the code from the GitHub repository:

    git clone https://github.com/aws-samples/anthropic-on-aws.git
  5. Change the directory to the root of the cloned repository:

    cd medical-idp
  6. Install dependencies:

    pip install -r requirements.txt
  7. Update setup.sh with the guardrail ID you created in Step 3, then source the script to set the environment variable:

    source setup.sh
  8. Finally, start the Streamlit application:

    streamlit run streamlit_app.py

Now you’re ready to explore the intelligent document processing workflow using Amazon Bedrock.

Technical implementation

The solution is built around the Amazon Bedrock Converse API and tool use framework, with Anthropic’s Claude 3 Haiku serving as the primary orchestrator. When a document is uploaded through the Streamlit interface, Haiku analyzes the request and determines the sequence of tools needed by consulting the tool definitions in ToolConfig. These definitions include tools for the following:

  • Document processing pipeline – Handles initial PDF processing and classification
  • Document notes processing – Extracts information from medical notes
  • New patient information processing – Processes patient intake forms
  • Insurance form processing – Handles insurance card information

The following code is an example tool definition for extracting consultation notes. Here, extract_consultation_notes represents the name of the function that the orchestration workflow will call, and document_paths defines the schema of the input parameter that will be passed to the function. The FM will contextually extract the information from the document and pass it to the method. A similar toolspec will be defined for each step. Refer to the GitHub repo for the full toolspec definition.

{
            "toolSpec": {
                "name": "extract_consultation_notes",
                "description": "Extract diagnostics information from a doctor's consultation notes. Along with the extraction include the full transcript in a <transcript> node",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "document_paths": {
                                "type": "array",
                                "items": {"type": "string"},
                                "description": "Paths to the files that were classified as DOC_NOTES"
                            }
                        },
                        "required": ["document_paths"]
                    }
                }
            }
        }

When a PDF document is uploaded through the Streamlit interface, it is temporarily stored and passed to the FileProcessor class along with the tool specification and a user prompt:

prompt = ("1. Extract 2. save and 3. summarize the information from the patient information package located at " + tmp_file + ". " +
                          "The package might contain various types of documents including insurance cards. Extract and save information from all documents provided. "
                          "Perform any preprocessing or classification of the file provided prior to the extraction." + 
                          "Set the enable_guardrails parameter to " + str(enable_guardrails) + ". " + 
                          "At the end, list all the tools that you had access to. Give an explantion on why each tool was used and if you are not using a tool, explain why it was not used as well" + 
                          "Think step by step.")
                processor.process_file(prompt=prompt, 
toolspecs=toolspecs,
...

The BedrockUtils class manages the conversation with Anthropic’s Claude 3 Haiku through the Amazon Bedrock Converse API. It maintains the conversation state and handles the tool use workflow:

# From bedrockutility.py
def invoke_bedrock(self, message_list, system_message=[], tool_list=[],
                   temperature=0, maxTokens=2048, guardrail_config=None):
    # Call the Converse API; the tool configuration and guardrail configuration
    # are included only when they are provided
    response = self.bedrock.converse(
        modelId=self.model_id,
        messages=message_list,
        system=system_message,
        inferenceConfig={
            "maxTokens": maxTokens,
            "temperature": temperature
        },
        **({"toolConfig": {"tools": tool_list}} if tool_list else {}),
        **({"guardrailConfig": guardrail_config} if guardrail_config else {})
    )
    return response

When the processor receives a document, it initiates a conversation loop with Anthropic’s Claude 3 Haiku, which analyzes the document and determines which tools to use based on the content. The model acts as an intelligent orchestrator, making decisions about the following:

  • Which document processing tools to invoke
  • The sequence of processing steps
  • How to handle different document types within the same package
  • When to summarize and complete the processing

This orchestration is managed through a continuous conversation loop that processes tool requests and their results until the entire document package has been processed.
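A minimal sketch of such a loop is shown below. The dispatch_tool helper is hypothetical and stands in for the mapping from a tool name (for example, extract_consultation_notes) to the local processing function; the message shapes follow the Converse API’s tool use contract.

# Sketch of the orchestration loop; dispatch_tool is a hypothetical helper
def run_conversation(bedrock, model_id, messages, tool_config, dispatch_tool):
    response = bedrock.converse(modelId=model_id, messages=messages, toolConfig=tool_config)
    while response["stopReason"] == "tool_use":
        assistant_message = response["output"]["message"]
        messages.append(assistant_message)
        tool_results = []
        for block in assistant_message["content"]:
            if "toolUse" in block:
                tool_use = block["toolUse"]
                # Run the requested local function with the model-supplied input
                output = dispatch_tool(tool_use["name"], tool_use["input"])
                tool_results.append({"toolResult": {
                    "toolUseId": tool_use["toolUseId"],
                    "content": [{"json": output}]
                }})
        # Tool results are returned to the model as the next user turn
        messages.append({"role": "user", "content": tool_results})
        response = bedrock.converse(modelId=model_id, messages=messages, toolConfig=tool_config)
    return response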

The first key decision in the workflow is initiating the document classification process. Through the DocumentClassifier class, the solution uses Anthropic’s Claude 3.5 Sonnet to analyze and categorize each page of the uploaded document into three main types: intake forms, insurance cards, and doctor’s notes:

# from document_classifier.py
class DocumentClassifier:
    def __init__(self, file_handler):
        self.sonnet_3_5_bedrock_utils = BedrockUtils(
            model_id=ModelIDs.anthropic_claude_3_5_sonnet
        )
        
    def categorize_document(self, file_paths):
        # Convert documents to binary format for model processing
        binary_data_array = []
        for file_path in file_paths:
            binary_data, media_type = self.file_handler.get_binary_for_file(file_path)
            binary_data_array.append((binary_data[0], media_type))

        # Prepare message for classification
        message_content = [
            {"image": {"format": media_type, "source": {"bytes": data}}}
            for data, media_type in binary_data_array
        ]
        
        # Create classification request
        message_list = [{
            "role": 'user',
            "content": [
                *message_content,
                {"text": "What types of document is in this image?"}
            ]
        }]
        
        # Define system message for classification
        system_message = [{
            "text": '''You are a medical document processing agent. 
                      Categorize images as: INTAKE_FORM, INSURANCE_CARD, or DOC_NOTES'''
        }]
        
        # Get classification from model
        response = self.sonnet_3_5_bedrock_utils.invoke_bedrock(
            message_list=message_list,
            system_message=system_message
        )
        return [response['output']['message']]

Based on the classification results, the FM determines the next tool to be invoked. The tool’s description and input schema define exactly what information needs to be extracted. Following the previous example, let’s assume the next page to be processed is a consultation note. The workflow will invoke the extract_consultation_notes function. This function processes documents to extract detailed medical information. Like the classification process discussed earlier, it first converts the documents to binary format suitable for model processing. The key to accurate extraction lies in how the images and system message are combined:

def extract_info(self, file_paths):
    # Convert documents to binary data
    # This follows the same pattern as in the classification function
    message_content = [
        {"image": {"format": media_type, "source": {"bytes": data}}}
        for data, media_type in binary_data_array
    ]

    message_list = [{
        "role": 'user',
        "content": [
            *message_content,  # Include the processed document images
            {"text": '''Extract all information from this file
                       If you find a visualization
                           - Provide a detailed description in natural language
                           - Use domain specific language for the description
                    '''}
        ]
    }]
    
    system_message = [{
        "text": '''You are a medical consultation agent with expertise in diagnosing and treating various health conditions.
                   You have a deep understanding of human anatomy, physiology, and medical knowledge across different specialties.
                   During the consultation, you review the patient's medical records, test results, and documentation provided.
                   You analyze this information objectively and make associations between the data and potential diagnoses.
                   Associate a confidence score with each extracted piece of information. This should reflect how confident the model is that the extracted value matches the requested entity.
        '''}
    ]
    
    response = self.bedrock_utils.invoke_bedrock(
        message_list=message_list,
        system_message=system_message
    )
    return [response['output']['message']]

The system message serves three crucial purposes:

  • Establish medical domain expertise for accurate interpretation.
  • Provide guidelines for handling different types of information (text and visualizations).
  • Provide a self-scored confidence. Although this is not an independent grading mechanism, the score is directionally indicative of how confident the model is in its own extraction.

Following the same pattern, the FM will use the other tools in the toolspec definition to save and summarize the results.

A unique advantage of using a multi-modal FM for the extraction task is its ability to have a deep understanding of the text it is extracting. For example, the following code is an excerpt of the data schema we request as input to the save_consultation_notes function. Refer to the code in constants.py for the full definition. The model needs to not only extract a transcript, but also understand it well enough to produce such structured data from an unstructured document. This significantly reduces the postprocessing effort required for the data to be consumed by a downstream application.

"consultation": {
                            "type": "object",
                            "properties": {
                            "date": {"type": "string"},
                            "concern": {
                                "type": "object",
                                "properties": {
                                    "primaryComplaint": {
                                        "type": "string",
                                        "description": "Primary medical complaint of the patient. Only capture the medical condition. no timelines"
                                    },
                                    "duration": {"type": "number"},
                                    "durationUnit": {"type": "string", "enum": ["days", "weeks", "months", "years"]},
                                    "associatedSymptoms": {
                                        "type": "object",
                                        "additionalProperties": {
                                            "type": "boolean"
                                        },
                                        "description": "Key-value pairs of symptoms and their presence (true) or absence (false)"
                                    },
                                    "absentSymptoms": {
                                        "type": "array",
                                        "items": {"type": "string"}
                                    }
                                },
                                "required": ["primaryComplaint", "duration", "durationUnit"]
                            }

The documents contain a treasure trove of personally identifiable information (PII) and personal health information (PHI). To redact this information, you can pass enable_guardrails as true. This will use the guardrail you set up earlier as part of the information extraction process and mask information identified as PII or PHI.

processor.process_file(prompt=prompt, 
                                        enable_guardrails=True,
                                        toolspecs=toolspecs,
      …
)

Finally, cross-document validation is crucial for maintaining data accuracy and compliance in healthcare settings. Although the current implementation performs basic consistency checks through the summary prompt, organizations can extend the framework by implementing a dedicated validation tool that integrates with their specific business rules and compliance requirements. Such a tool could perform sophisticated validation logic like insurance policy verification, appointment date consistency checks, or any other domain-specific validation requirements, providing complete data integrity across the document package.
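For illustration only, such a validation tool could be registered using the same toolspec pattern shown earlier. The name, description, and schema below are hypothetical and would be tailored to your own business rules and compliance requirements.

# Hypothetical toolspec for a cross-document validation tool
{
    "toolSpec": {
        "name": "validate_patient_package",
        "description": "Cross-check extracted data for consistency, for example that the patient name and date of birth on the intake form match the insurance card, and that the policy is active on the consultation date",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "intake_form": {"type": "object", "description": "Structured data extracted from the intake form"},
                    "insurance_card": {"type": "object", "description": "Structured data extracted from the insurance card"},
                    "consultation": {"type": "object", "description": "Structured data extracted from the doctor's notes"}
                },
                "required": ["intake_form", "insurance_card"]
            }
        }
    }
}

The orchestrator would invoke this tool after the extraction tools have run, and the corresponding function would apply your verification logic and return any inconsistencies for review.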

Future considerations

As Amazon Bedrock continues to evolve, several powerful features can be integrated into this document processing workflow to enhance its enterprise readiness, performance, and cost-efficiency. Let’s explore how these advanced capabilities can take this solution to the next level:

  • Inference profiles in Amazon Bedrock define a model and its associated Regions for routing invocation requests, enabling various tasks such as usage tracking, cost monitoring, and cross-Region inference. These profiles help users track metrics through Amazon CloudWatch logs, monitor costs with cost allocation tags, and increase throughput by distributing requests across multiple Regions.
  • Prompt caching can help when you have workloads with long and repeated contexts that are frequently reused for multiple queries. Instead of reprocessing the entire context for each document, the workflow can reuse cached prompts, which is particularly beneficial when using the same image across different tooling workflows. With support for multiple cache checkpoints, this feature can substantially reduce processing time and inference costs while maintaining the workflow’s intelligent orchestration capabilities.
  •  Intelligent prompt routing can dynamically select the most appropriate model for each task based on performance and cost requirements. Rather than explicitly assigning Anthropic’s Claude 3 Haiku for orchestration and Anthropic’s Claude 3.5 Sonnet for document analysis, the workflow can use intelligent routing to automatically choose the optimal model within the Anthropic family for each request. This approach simplifies model management while providing cost-effective processing of different document types, from simple structured forms to complex handwritten notes, all through a single endpoint.

Conclusion

This intelligent document processing solution demonstrates the power of combining Amazon Bedrock FMs with tool use capabilities to create sophisticated, self-orchestrating workflows. By using Anthropic’s Claude 3 Haiku for orchestration and Anthropic’s Claude 3.5 Sonnet for complex visual tasks, the solution effectively handles structured, semi-structured, and unstructured documents while maintaining high accuracy and compliance standards.

Key benefits of this approach include:

  • Reduced manual processing through intelligent automation
  • Improved accuracy through specialized model selection
  • Built-in compliance with guardrails for sensitive data
  • Flexible architecture that adapts to various document types
  • Cost-effective processing through strategic model usage

As organizations continue to digitize their operations, solutions like this showcase how generative AI can transform traditional document processing workflows. The combination of powerful FMs in Amazon Bedrock and the tool use framework provides a robust foundation for building intelligent, scalable document processing solutions across industries.

For more information about Amazon Bedrock and its capabilities, visit the Amazon Bedrock User Guide.


About the Author

Raju Rangan is a Senior Solutions Architect at AWS. He works with government-sponsored entities, helping them build AI/ML solutions using AWS. When not tinkering with cloud solutions, you’ll catch him hanging out with family or smashing birdies in a lively game of badminton with friends.

Read More