Automate derivative confirms processing using AWS AI services for the capital markets industry

Capital markets operation teams face numerous challenges throughout the post-trade lifecycle, including delays in trade settlements, booking errors, and inaccurate regulatory reporting. Derivative trades are even more challenging. Timely settlement is onerous because trades involve different counterparties and the documents containing commercial terms (such as trade date, value date, and counterparties) vary widely. Capital market organizations commonly apply screen scraping solutions with OCR, but these applications are inflexible and high-maintenance.

Artificial intelligence and machine learning (AI/ML) technologies can help capital market organizations overcome these challenges. Intelligent document processing (IDP) applies AI/ML techniques to automate data extraction from documents. Using IDP can reduce or eliminate the requirement for time-consuming human reviews. IDP has the power to transform the way capital market back-office operations work. It has the potential to boost employee efficiency, enhance cash flow by speeding up trade settlements, and minimize operational and regulatory risks.

In this post, we show how you can automate and intelligently process derivative confirms at scale using AWS AI services. The solution combines Amazon Textract, a fully managed ML service to effortlessly extract text, handwriting, and data from scanned documents, and AWS Serverless technologies, a suite of fully managed event-driven services for running code, managing data, and integrating applications, all without managing servers.

Solution overview

The lifecycle of a derivative trade involves multiple phases, from trade research to execution, to clearing and settlement. The solution showcased in this post focuses on the trade clearing and settlement phase of the derivative trade lifecycle. During this phase, counterparties to the trade and their agents determine and verify the exact commercial terms of the transaction and prepare for settlement.

The following figure shows a sample derivative confirmation document.

Sample Derivative Confirmation document with commercial terms

We built the solution using event-driven principles, as depicted in the following diagram. The derivative confirmation documents received from customers are stored in Amazon Simple Storage Service (Amazon S3). An event notification on S3 object upload completion places a message in an Amazon Simple Queue Service (Amazon SQS) queue to invoke an AWS Lambda function. The function invokes the Amazon Textract API and performs a fuzzy match using the document schema mappings stored in Amazon DynamoDB. A web-based human-in-the-loop UI lets users review the document processing pipeline and update schemas to train services for new formats. The web UI uses Amazon Cognito for authentication and access control.

The Solution architecture diagram demonstrating the integration of various AWS services and a process flow

The process flow includes the following steps:

  1. The user or business application uploads an image or PDF to the designated S3 bucket.
  2. An event notification on S3 object upload completion places a message in an SQS queue.
  3. An event on message receipt invokes a Lambda function that in turn invokes the Amazon Textract StartDocumentAnalysis API for information extraction.
    1. This call starts an asynchronous analysis of the document for detecting items within the document such as key-value pairs, tables, and forms.
    2. The call returns the ID of the asynchronous job, and the function saves the job ID and Amazon S3 document key to a DynamoDB table.
  4. Upon job completion, Amazon Textract sends a message to an Amazon Simple Notification Service (Amazon SNS) topic and places the resultant JSON in the designated S3 bucket for classification analysis.
  5. A Lambda function receives the Amazon SQS payload and performs a fuzzy match using Sørensen–Dice analysis between the Amazon Textract JSON results and DynamoDB document configuration mappings. The Sørensen–Dice analysis step compares the two texts and computes a number between 0 and 1, where 0 indicates no match at all and 1 an exact match.
  6. Upon analysis completion, a Lambda function writes a merged and cleansed JSON result to the original S3 bucket and inserts the analysis results back into the DynamoDB table.
  7. Amazon API Gateway endpoints facilitate the interaction with the web-based UI.
  8. The web-based UI lets reviewers analyze the document processing pipeline and intervene as needed to update the document configuration mappings.
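Steps 1–3 of the process flow can be sketched as a Lambda handler. This is a minimal sketch, not the solution's actual code: it assumes the S3 event notification is delivered as the SQS message body, and the environment variable names (`JOBS_TABLE`, `SNS_TOPIC_ARN`, `TEXTRACT_ROLE_ARN`, `OUTPUT_BUCKET`), the DynamoDB attribute names, and the output prefix are hypothetical. The Textract client and DynamoDB table can be injected so the parsing logic can be exercised without AWS access.

```python
import json
import os
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_event):
    """Return (bucket, key) pairs from SQS records that wrap S3 event notifications."""
    objects = []
    for record in sqs_event.get("Records", []):
        s3_event = json.loads(record["body"])
        # Messages without "Records" (for example, S3 test events) are skipped.
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def build_analysis_request(bucket, key, sns_topic_arn, role_arn, output_bucket):
    """Build keyword arguments for the Textract StartDocumentAnalysis call."""
    return {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["FORMS", "TABLES"],
        "NotificationChannel": {"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
        # Hypothetical prefix for the JSON results written by Textract.
        "OutputConfig": {"S3Bucket": output_bucket, "S3Prefix": "textract-output"},
    }


def handler(event, context, textract=None, table=None):
    """Start one asynchronous Textract job per uploaded document and record it."""
    if textract is None:
        import boto3  # imported lazily so the helpers above run without the SDK
        textract = boto3.client("textract")
    if table is None:
        import boto3
        table = boto3.resource("dynamodb").Table(os.environ["JOBS_TABLE"])

    job_ids = []
    for bucket, key in extract_s3_objects(event):
        request = build_analysis_request(
            bucket,
            key,
            os.environ["SNS_TOPIC_ARN"],
            os.environ["TEXTRACT_ROLE_ARN"],
            os.environ["OUTPUT_BUCKET"],
        )
        job_id = textract.start_document_analysis(**request)["JobId"]
        # Hypothetical item layout: job ID plus the S3 document key.
        table.put_item(Item={"job_id": job_id, "document_key": key})
        job_ids.append(job_id)
    return job_ids
```

Keeping request construction separate from the handler makes it easier to change the requested feature types without touching the event parsing.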

A human-in-the-loop process was applied to visually compare the reconciled results with their locations in the input documents. End-users can verify the accuracy of the results and either accept or reject the findings. When new counterparties and formats are introduced, ML helps the users create new schema mappings in the human-in-the-loop UI for further processing.
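The fuzzy match from step 5, whose scores feed this review, can be illustrated with a small Sørensen–Dice implementation. This is a sketch under assumptions: it compares character bigrams after lowercasing and removing whitespace, and the `best_match` helper and its 0.6 threshold are hypothetical choices for illustration, not values from the solution.

```python
from collections import Counter


def normalize(text):
    """Lowercase and strip all whitespace before comparison."""
    return "".join(text.lower().split())


def bigrams(text):
    """Return the character bigrams of the normalized text."""
    normalized = normalize(text)
    return [normalized[i:i + 2] for i in range(len(normalized) - 1)]


def dice_coefficient(a, b):
    """Sørensen–Dice score in [0, 1]: 2 * |overlap| / (|A| + |B|) over bigrams."""
    a_grams, b_grams = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(a_grams.values()) + sum(b_grams.values())
    if total == 0:
        # Both strings are too short to form bigrams; fall back to equality.
        return 1.0 if normalize(a) == normalize(b) else 0.0
    overlap = sum((a_grams & b_grams).values())
    return 2.0 * overlap / total


def best_match(label, schema_fields, threshold=0.6):
    """Return (field, score) for the closest schema field, or (None, score)
    when nothing reaches the threshold and a reviewer should decide."""
    scored = [(dice_coefficient(label, field), field) for field in schema_fields]
    score, field = max(scored, default=(0.0, None))
    return (field, score) if score >= threshold else (None, score)
```

A label that falls below the threshold returns no field, which is the kind of case a reviewer would resolve by adding or correcting a schema mapping.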

What is human-in-the-loop?

A human-in-the-loop process combines supervised ML with human involvement in training and testing an algorithm. This practice of uniting human and machine intelligence creates an iterative feedback loop that allows the algorithm to produce better results.

You can apply human-in-the-loop to all types of deep learning AI projects, including natural language processing (NLP), computer vision, and transcription. Additionally, you can use human-in-the-loop in conjunction with AI content moderation systems to quickly and effectively analyze user-generated content. We refer to this as human-in-the-loop decision-making, where content is flagged by the AI and human moderators review what has been flagged.

The harmonious relationship between people and AI has several benefits, including:

  • Accuracy – In the context of document processing, there are limitations to how much of the analysis can be automated. AI can miss content that should be flagged (a false negative), and it can also incorrectly flag content that may be harmless (a false positive). Humans are essential in the content moderation process because they can interpret things such as context and multilingual text.
  • Increased efficiency – Machine intelligence can save significant time and cost by sifting through and trimming down large amounts of data. The task can then be passed on to humans to complete a final sort. Although you can’t automate the entirety of the process, you can automate a significant portion, saving time.

Looking forward: The art of the possible

Amazon Textract is an AWS service that uses ML to automatically extract text, handwriting, and data from any document.

Amazon Textract can extract information from a large variety of documents, including scanned paper records, forms, IDs, invoices, reports, certificates, legal documents, letters, bank statements, tables, handwritten notes, and more. Supported formats include common file types like PNG, JPEG, PDF, and TIFF. For formats like Word or Excel, you can convert them into images before sending them to Amazon Textract. The content is extracted within seconds and then indexed for search through a simple-to-use API.

The Queries feature within the Amazon Textract Analyze Document API provides you the flexibility to specify the data you need to extract from documents. Queries extract information from a variety of documents, like paystubs, vaccination cards, mortgage notes, and insurance cards. You don’t need to know the data structure in the document (table, form, nested data) or worry about variations across document versions and formats. The flexibility that Queries provides reduces the need to implement postprocessing and reliance on manual review of extracted data.
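As an illustration of the Queries feature, the following sketch builds an AnalyzeDocument request with a `QueriesConfig` and reads the answers back from the response blocks. The aliases and question texts passed in are up to the caller and are not taken from the solution. The parsing assumes the response links each `QUERY` block to its `QUERY_RESULT` blocks through an `ANSWER` relationship.

```python
def build_queries_request(bucket, key, queries):
    """Build keyword arguments for Textract AnalyzeDocument with Queries.

    `queries` maps an alias (for example, "trade_date") to a natural
    language question about the document.
    """
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias} for alias, text in queries.items()
            ]
        },
    }


def parse_query_answers(response):
    """Map each query alias to (answer text, confidence), or None if unanswered."""
    blocks = {block["Id"]: block for block in response.get("Blocks", [])}
    answers = {}
    for block in blocks.values():
        if block["BlockType"] != "QUERY":
            continue
        query = block["Query"]
        alias = query.get("Alias", query["Text"])
        answers[alias] = None
        for relationship in block.get("Relationships", []):
            if relationship["Type"] != "ANSWER":
                continue
            for result_id in relationship["Ids"]:
                result = blocks.get(result_id)
                if result is not None:
                    answers[alias] = (result["Text"], result["Confidence"])
    return answers
```

With AWS credentials configured, the request dictionary can be passed to `boto3.client("textract").analyze_document(**request)` and the response handed to `parse_query_answers`.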

Conclusion

The automation of derivatives confirmation boosts the capacity of the operations team by saving processing time. In this post, we showcased common challenges in derivatives confirms processing and how you can use AWS intelligent document processing services to overcome them. A large part of capital markets’ back-office operations involves document processing. The approach shown in this post sets a pattern for many back-office document processing use cases, benefiting the capital markets industry by reducing costs and enhancing staff productivity.

We recommend a thorough review of Security in Amazon Textract and strict adherence to the guidelines provided. To learn more about the pricing of the solution, review the pricing details of Amazon Textract, Lambda, and Amazon S3.


“Using Amazon Textract and Serverless services, we have been able to build an end-to-end digital workflow for derivatives processing. We are expecting straight-through processing rates to increase to over 90%, reducing operational risks and costs associated with manual interventions. This automation provides the resilience and flexibility required to adapt to evolving market structures like T+1 settlement timeframes.”

– Stephen Kim, CIO, Head of Corporate Technology, Jefferies


About the Authors

Vipul Parekh is a senior customer solutions manager at AWS guiding our Capital Markets customers in accelerating their business transformation journey on the cloud. He is a GenAI ambassador and a member of the AWS AI/ML technical field community. Prior to AWS, Vipul played various roles at top investment banks, leading transformations spanning front office, back office, and regulatory compliance areas.

Raj Talasila is a senior technical program manager at AWS. He comes to AWS with 30+ years of experience in the Financial Services, Media and Entertainment, and CPG industries.

Saby Sahoo is a senior solutions architect at AWS. Saby has 20+ years of experience in the design and implementation of IT solutions, data analytics, and AI/ML/GenAI.

Sovik Kumar Nath is an AI/ML solution architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. Sovik has published articles and holds a patent in ML model monitoring. He has double master’s degrees from the University of South Florida and the University of Fribourg, Switzerland, and a bachelor’s degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.

AI-powered assistants for investment research with multi-modal data: An application of Agents for Amazon Bedrock

This post is a follow-up to Generative AI and multi-modal agents in AWS: The key to unlocking new value in financial markets. It is part of the series Generative AI and AI/ML in Capital Markets and Financial Services.

Financial analysts and research analysts in capital markets distill business insights from financial and non-financial data, such as public filings, earnings call recordings, market research publications, and economic reports, using a variety of tools for data mining. They face many challenges because of the increasing variety of tools and amount of data. They must synthesize massive amounts of data from multiple sources, qualitative and quantitative, to provide insights and recommendations. Analysts need to learn new tools and even some programming languages such as SQL (with different variations). To add to these challenges, they must think critically under time pressure and perform their tasks quickly to keep up with the pace of the market.

Investment research is the cornerstone of successful investing, and involves gathering and analyzing relevant information about potential investment opportunities. Through thorough research, analysts come up with a hypothesis, test the hypothesis with data, and understand the effect before portfolio managers make investment decisions and mitigate the risks associated with their investments. Artificial intelligence (AI)-powered assistants can boost the productivity of financial analysts, research analysts, and quantitative traders in capital markets by automating many of their tasks, freeing them to focus on high-value creative work. AI-powered assistants can amplify an analyst’s productivity by searching for relevant information in the customer’s own database as well as online, and by conducting qualitative and quantitative analysis on structured and unstructured data, enabling analysts to work faster and with greater accuracy.

In this post, we introduce a solution using Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock that can help financial analysts use various sources of multifaceted financial data (text, audio, and databases) and various tools (detect phrases, portfolio optimization, sentiment analysis, and stock query) to gather financial insights. The interaction shows how AI-powered assistants recognize and plan based on user prompts, come up with steps to retrieve context from data stores, and pass through various tools and an LLM to arrive at a response.

AI-powered assistants for investment research

So, what are AI-powered assistants? AI-powered assistants are advanced AI systems, powered by generative AI and large language models (LLMs), which use AI technologies to understand goals from natural language prompts, create plans and tasks, complete these tasks, and orchestrate the results from the tasks to reach the goal. Generative AI agents, which form the backbone of AI-powered assistants, can orchestrate interactions between foundation models, data sources, software applications, and users. As AI technology advances, the abilities of generative AI agents are expected to grow, providing more opportunities to gain a competitive advantage.

Leading this evolution is Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon using a single API, along with a broad set of capabilities to build and scale generative AI applications with security, privacy, and responsible AI.

You can now use Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock to build specialized agents and AI-powered assistants that run actions based on natural language input prompts and your organization’s data. These managed agents act as intelligent orchestrators, coordinating interactions between foundation models, API integrations, user questions and instructions, and knowledge sources loaded with your proprietary data. At runtime, the agent intelligently handles and orchestrates the user inputs throughout a dynamic number of steps.

The following video demonstrates an AI-powered assistant in Agents for Amazon Bedrock in action.

Solution overview

A key component of an AI-powered assistant is Agents for Amazon Bedrock. An agent consists of the following components:

  • Foundation model – The agent invokes an FM to interpret user input, generate subsequent prompts in its orchestration process, and generate responses.
  • Instructions – Instructions telling the agent what it’s designed to do and how to do it.
  • Action groups – Action groups are interfaces that an agent uses to interact with the different underlying components such as APIs and databases. An agent uses action groups to carry out actions, such as making an API call to another tool.
  • Knowledge base – The knowledge base is a link to an existing knowledge base, consisting of customer documents (such as PDF files and text files), that allows the agent to query for extra context for the prompts.

Both the action groups and knowledge base are optional and not required for the agent itself.

In this post, an AI-powered assistant for investment research can use both structured and unstructured data for providing context to the LLM using a Retrieval Augmented Generation (RAG) architecture, as illustrated in the following diagram.

For the AI-powered assistant, the following action groups are associated:

  • Detect-phrases – Useful for when you need to detect key phrases in financial reports
  • Portfolio-optimization – Useful for when you need to build an optimal allocation portfolio from a list of stock symbols using Python functions
  • Sentiment-analysis – Useful for when you need to analyze the sentiment of an excerpt from a financial report
  • Stock-query – Useful for when you need to answer any question about historical stock prices

Depending on the prompts, the AI-powered assistant for investment research uses different types of structured and unstructured data. The agent can find insights from different modalities of financial data:

  • Unstructured data – This includes annual 10K and quarterly 10Q earnings reports, which are converted into vectors using Amazon Titan Embeddings models and stored as vectors in an Amazon OpenSearch Serverless vector database, all orchestrated using a knowledge base
  • Structured data – This includes tabular stock data, which is stored in Amazon Simple Storage Service (Amazon S3) and queried using Amazon Athena
  • Other data modalities – This includes audio files of quarterly earnings calls, which are converted into unstructured data using Amazon Textract and Amazon Transcribe

When the AI-powered assistant receives a prompt from a business user, it follows a number of steps as part of its orchestration:

  1. Break down the prompt into a number of steps using an LLM within Amazon Bedrock.
  2. Follow chain-of-thought reasoning and instructions, and complete the steps using appropriate action groups.
  3. As part of the process, depending on the prompt, search and identify relevant context for RAG.
  4. Pass the results with the prompt to an LLM within Amazon Bedrock.
  5. Generate the final response and respond to the user in English with relevant data.

The following diagram illustrates this workflow.

Technical architecture and key steps

The multi-modal agent orchestrates various steps based on natural language prompts from business users to generate insights. For unstructured data, the agent uses AWS Lambda functions with AI services such as Amazon Comprehend for natural language processing (NLP). For structured data, the agent uses the SQL Connector and SQLAlchemy to analyze the database through Athena. The agent also uses the selected LLM for computations and quantitative modeling, and the context session equips the agent with conversation history. The multi-modal agent is implemented using Agents for Amazon Bedrock and coordinates the different actions and knowledge bases based on prompts from business users through the AWS Management Console, although it can also be invoked through the AWS API.
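Invoking the agent through the AWS API can be sketched as follows. This is a minimal sketch rather than the solution's code: it assumes a `bedrock-agent-runtime` client is passed in, that the agent ID and alias ID come from your own deployment, and that the response's `completion` stream yields `chunk` events (with a `bytes` payload) and, when tracing is enabled, `trace` events.

```python
import uuid


def collect_agent_response(event_stream):
    """Join the answer chunks of an InvokeAgent completion stream and
    gather any trace events emitted along the way."""
    payloads, traces = [], []
    for event in event_stream:
        if "chunk" in event:
            payloads.append(event["chunk"]["bytes"])
        elif "trace" in event:
            traces.append(event["trace"])
    # Decode once at the end so multi-byte characters split across chunks survive.
    return b"".join(payloads).decode("utf-8"), traces


def ask_agent(client, agent_id, agent_alias_id, prompt, session_id=None):
    """Send one prompt to the agent and return (answer, traces, session_id).

    Passing the returned session_id on the next call continues the same
    conversation; omitting it starts a new session.
    """
    session_id = session_id or str(uuid.uuid4())
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        inputText=prompt,
        enableTrace=True,
    )
    answer, traces = collect_agent_response(response["completion"])
    return answer, traces, session_id
```

With credentials configured, `client` would typically be created with `boto3.client("bedrock-agent-runtime")`.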

The following diagram illustrates the technical architecture.

The key components of the technical architecture are as follows:

  • Data storage and analytics – The quarterly earnings call recordings as audio files, annual financial reports as PDF files, and S&P stock data as CSV files are hosted on Amazon S3. Data exploration on stock data is done using Athena.
  • Large language models – The LLMs available to be used by Agents for Amazon Bedrock are Anthropic Claude Instant v1, v2.0, and v2.1.
  • Agents – We use Agents for Amazon Bedrock to build and configure autonomous agents. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations. Depending on the user input, the agent decides the action or knowledge base to call to answer the question. We created the following purpose-built agent actions using Lambda and Agents for Amazon Bedrock for our scenario:
    • Stocks querying – To query S&P stocks data using Athena and SQLAlchemy.
    • Portfolio optimization – To build a portfolio based on the chosen stocks.
    • Sentiment analysis – To identify and score sentiments on a topic using Amazon Comprehend.
    • Detect phrases – To find key phrases in recent quarterly reports using Amazon Comprehend.
  • Knowledge base – To search for financial earnings information stored in multi-page PDF files, we use a knowledge base (using an OpenSearch Serverless vector store).

To dive deeper into the solution and code for all the steps, see the GitHub repo.

Benefits and lessons learned in migrating from LangChain agents to Agents for Amazon Bedrock

Agents for Amazon Bedrock and LangChain agents both use an LLM to interpret user input and prompts in their orchestration processes. The LLM acts as a reasoning engine to determine next actions. Agents for Amazon Bedrock offers several benefits when implementing an agent-based solution.

  • Serverless

Agents for Amazon Bedrock is serverless, meaning you can build agents without managing any infrastructure.

  • Conversation history and session management

By default, LangChain agents are stateless, meaning they don’t remember previous interactions or keep a history of the conversation. LangChain supports either a simple memory system that recalls the most recent conversations or complex memory structures that analyze historical messages to return the most relevant results. In our previous post, we deployed a persistent storage solution using Amazon DynamoDB.

Agents for Amazon Bedrock provides a short-term memory for conversations by default, allowing the user to interact with the agent continuously during the session.

  • RAG support

Knowledge Bases for Amazon Bedrock provides an out-of-the-box RAG solution. It enables a faster time-to-market by abstracting the heavy lifting of building a pipeline and offers a persistent solution for keeping large data as vector embeddings in vector databases, thereby reducing latency to RAG systems.

A knowledge base simplifies the setup and implementation of RAG by automating several steps in this process:

  • Preprocessing data – Split the documents into manageable chunks for efficient retrieval. The chunks are then converted to embeddings and written to a vector index while maintaining a mapping to the original document.
  • Runtime processing – Embed user queries into vectors. Compare vector embeddings of user queries and document chunks to find semantically similar matches. Augment user prompts with context from matched chunks.
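The preprocessing and runtime steps that a knowledge base automates can be illustrated with a toy, dependency-free sketch. It is not how Knowledge Bases for Amazon Bedrock is implemented: the word-count "embedding" below is a stand-in for a real embedding model, the in-memory list is a stand-in for a vector index, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: a sparse word-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents, max_words=40):
    """Preprocessing: chunk each document, embed the chunks, and keep a
    mapping from every chunk back to its source document."""
    index = []
    for doc_id, text in documents.items():
        for chunk in chunk_text(text, max_words):
            index.append({"doc_id": doc_id, "chunk": chunk, "vector": embed(chunk)})
    return index


def retrieve(index, query, k=2):
    """Runtime: embed the query and return the k most similar chunks."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vector, entry["vector"]), reverse=True)
    return ranked[:k]


def augment(query, matches):
    """Runtime: prepend the matched chunks to the user prompt as context."""
    context = "\n".join(f"[{m['doc_id']}] {m['chunk']}" for m in matches)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The managed service replaces each of these pieces (chunking, embedding, vector storage, and retrieval) so that only the data source and the prompt need to be supplied.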

Knowledge Bases for Amazon Bedrock supports popular databases for vector storage, including the vector engine for OpenSearch Serverless, Pinecone, Redis Enterprise Cloud, Amazon Aurora (coming soon), and MongoDB (coming soon).

  • Compatibility

Most functions (tools) from our previous multi-modal agent can be migrated to Amazon Bedrock using action groups. Action groups define agent actions by providing an OpenAPI schema to define invocable APIs, as well as a Lambda function specifying input and output. Lambda natively supports Java, Go, PowerShell, Node.js, C#, Python, and Ruby code. LangChain’s supported languages do not include PowerShell and Node.js.
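A Lambda function behind an action group can be sketched as follows. This sketch assumes the event and response shapes used for action groups defined with an OpenAPI schema (an event carrying `actionGroup`, `apiPath`, `httpMethod`, and a `parameters` list, and a response wrapped under `messageVersion` and `response`). Verify these against the current Agents for Amazon Bedrock documentation before relying on them. The `/stock-query` path and the `stock_query` placeholder are hypothetical and return no real data.

```python
import json


def stock_query(params):
    """Hypothetical placeholder for a tool migrated from the earlier agent.
    A real implementation would query the stock data store."""
    return {"symbol": params.get("symbol"), "note": "placeholder result"}


# Map (HTTP method, API path) pairs from the OpenAPI schema to Python functions.
ROUTES = {("GET", "/stock-query"): stock_query}


def lambda_handler(event, context):
    """Dispatch an agent action call to the matching tool function."""
    # Parameters arrive as a list of {"name", "type", "value"} entries.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    route = ROUTES.get((event["httpMethod"], event["apiPath"]))
    if route is None:
        status, body = 404, {"error": f"No handler for {event['apiPath']}"}
    else:
        status, body = 200, route(params)
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": status,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }
```

Routing on the method and path keeps each migrated tool as a plain function, so existing logic can be reused with only a thin adapter.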

  • Simple prompt

A key element to get optimal results in our LangChain agent was using a good and clear prompt. In our previous multi-modal agent, we used the following prompt:

You are a Minimization Solutionist with a set of tools at your disposal.
You would be presented with a problem. First understand the problem and devise a plan to solve the problem.
Please output the plan starting with the header 'Plan:' and then followed by a numbered list of steps.
Ensure the plan has the minimum amount of steps needed to solve the problem. Do not include unnecessary steps.
<instructions>
These are guidance on when to use a tool to solve a task, follow them strictly:
1. For the tool that specifically focuses on stock price data, use "Stock Query Tool".
2......
</instructions>\n\nAssistant:"""

The prompt provided detailed information to give the agent as much guidance as possible to respond to a question.

With Agents for Amazon Bedrock, we used simple instructions for the agent to obtain the same results. With a shorter prompt (“You are a financial analyst with a set of tools at your disposal”), we were able to answer the same questions with the same quality.

  • Editability of base prompts

Agents for Amazon Bedrock also exposes the four default base prompt templates that are used during the preprocessing, orchestration, knowledge base response generation, and postprocessing. You can optionally edit these base prompt templates to customize your agent’s behavior at each step of its sequence.

  • Traceability

Each response from an Amazon Bedrock agent is accompanied by a trace that details the steps being orchestrated by the agent. The trace provides information about the inputs to the action groups that the agent invokes and the knowledge bases that it queries to respond to the user. In addition, the trace provides information about the outputs that the action groups and knowledge bases return.

  • Security

You can securely connect LLMs to your company data sources using Agents for Amazon Bedrock. With a knowledge base, you can use agents to give LLMs in Amazon Bedrock access to additional data that helps the model generate more relevant, context-specific, and accurate responses without continually retraining the LLM.

Dive deeper into the solution

To dive deeper into the solution and the code shown in this post, see the GitHub repo. The repo contains instructions for the end-to-end solution, including setting up the agents, associated action groups, unstructured data (earnings reports PDF files, earnings call audio files), and structured data (stocks time series).

In the appendix at the end of this post, we share different questions asked by a financial analyst, the agent tools invoked, and the answer from the multi-modal agent.

Clean up

After you run the multi-modal agent, make sure to clean up any resources manually that won’t be used later:

  1. Delete your agent and the knowledge base associated with your agent.
  2. Delete the vector index containing the data embeddings.
  3. Delete the S3 buckets created by AWS CloudFormation and then delete the CloudFormation stack.

Conclusion

The solution presented in this post uses Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock to assist financial analysts in navigating the complexities of multifaceted financial data. By integrating various data sources, including text, audio, and databases, this AI-powered assistant can plan and complete tasks based on user prompts, retrieving relevant information, processing it through various tools, and ultimately providing insightful conclusions. Agents for Amazon Bedrock and LangChain agents both use LLMs for interpreting user input and determining subsequent actions, but Agents for Amazon Bedrock offers several advantages, including serverless deployment, built-in stateful conversation history tracking, and out-of-the-box support for RAG with knowledge base integration. Additionally, Amazon Bedrock simplifies the agent development process by enabling migration of existing functions, requiring only basic prompts for optimal results, and providing traceability for agent actions and knowledge base queries. For further insights into generative AI, foundation models, and LLMs, Amazon Bedrock offers a wealth of use cases and resources. To learn more, see Generative AI and multi-modal agents in AWS: The key to unlocking new value in financial markets, which demonstrates how multi-modal agents use AWS services to efficiently gather financial insights from an organization’s text, speech, and tabular data sources for a financial analyst’s query, using LangChain agents as an alternative implementation.

Appendix

The following table contains example questions provided by a financial analyst to ask the AI-powered assistant for investment research based on structured and unstructured data.

Question purpose Question asked by financial analyst to AI-powered assistant Action group or knowledge base invoked Intermediate steps from AI-powered assistant chain-of-thought reasoning Answer from AI-powered assistant
Prompt the agent to analyze the company’s financial health and performance. Analyze Amazon’s financial health based on the 10k report. Calculate important financial ratios. Please limit your answer to 5 sentences. Financial-Data-Explorer

Step1: To analyze Amazon’s financial health and calculate important financial ratios based on their 10K report, I will need to search the knowledge base for relevant information from Amazon’s 10K filing. I can then extract key financial data like revenues, expenses, assets, liabilities, etc. and use that to calculate ratios like current ratio, debt-to-equity ratio, return on assets, etc. to assess their financial position and performance.

Step2: The search results provide some key financial data from Amazon’s recent 10K filing, including net sales, operating income, net income, and context around factors impacting their financials like foreign exchange rates and severance costs. To analyze their financial health and calculate ratios, I will need to extract additional data points like current assets, current liabilities, total assets, total liabilities, etc. from the 10K report.
Step3: With the additional balance sheet data from Amazon’s 10K report, I now have the key financial figures needed to calculate some important ratios to analyze their financial health:nnCurrent Ratio = Total Current Assets / Total Current Liabilitiesn= $146.8 billion / $155.4 billion = 0.94nnThis current ratio slightly below 1 indicates Amazon may have a bit of a liquidity crunch in the short-term to cover current obligations.nnDebt-to-Equity Ratio = Total Liabilities / Total Shareholders’ Equity n= ($155.4 billion + $161.2 billion) / ($462.7 billion – $316.6 billion)n= $316.6 billion / $146.1 billion = 2.17nnThe relatively high debt-to-equity ratio over 2 shows Amazon has a highly leveraged capital structure, though this may be expected given their large investments in infrastructure and growth initiatives.nnReturn on Assets = Net Income / Total Assetsn= $9.9 billion / ($146.8 billion + $315.9 billion) n= $9.9 billion / $462.7 billion = 2.1%nnThe low single-digit return on assets indicates Amazon is not very efficient at generating profits from its asset base currently, likely due to heavy investments and spending on growth.

The current ratio of 0.94 indicates Amazon may face some liquidity challenges in covering short-term obligations.[1]

The debt-to-equity ratio of 2.17 shows a highly leveraged capital structure, likely due to major investments in growth initiatives.[2]

The return on assets of 2.1% is relatively low, suggesting Amazon is not very efficient at generating profits from its asset base currently.[3]

Prompt the agent to analyze the company’s financial performance from the company’s earnings reports

What were the main drivers of AWS’s revenue growth in the Q1 2023 quarter? Please limit your answers to 5 sentences.

Financial-Data-Explorer

To determine the main drivers of AWS’s revenue growth in Q1 2023, I will need to search the knowledge base for any information specifically related to AWS’s performance and growth factors during that quarter.

The main drivers of AWS’s revenue growth of 16% in Q1 2023 were increased customer usage of AWS services, partially offset by pricing changes primarily driven by long-term customer contracts. Higher customer adoption and consumption of AWS offerings fueled the topline growth, though pricing concessions on long-term deals moderated the growth rate.

Prompt the agent to analyze the company’s management team and their plans for the future from the company’s earnings reports

What are Amazon’s top priorities for 2023? What are the biggest risks facing the company? Please limit your answers to 5 sentences.

Financial-Data-Explorer

To determine Amazon’s top priorities and biggest risks for 2023, I will search the knowledge base for any relevant information from their recent earnings reports, investor presentations, or other public statements outlining their strategic focus areas and potential risk factors.

Amazon’s top priorities for 2023 include mitigating impacts from macroeconomic headwinds, managing foreign exchange volatility, and accurately forecasting customer demand amid economic uncertainty. Some of the biggest risks facing the company are inventory management challenges, rising payments processing costs and fraud risks, reliance on third-party payment providers, and a potential slowdown in enterprise AWS adoption due to customer cost optimization efforts.

Prompt the agent to perform sentiment analysis on the earnings call recordings

What is the sentiment around inflation in Amazon’s earnings call?

Sentiment Analysis Tool,

Financial-Data-Explorer

Step 1: To determine the sentiment around inflation expressed in Amazon’s earnings call, I will need to search the knowledge base for any relevant excerpts or commentary from the earnings call transcript related to inflation.

Step 2: The search results provide some relevant excerpts from Amazon’s earnings guidance and commentary related to inflation and macroeconomic factors. To better assess the overall sentiment, I will use the Sentiment Analysis tool on these excerpts.

The sentiment around inflation in Amazon’s earnings call is decidedly negative. Amazon expresses concerns that ongoing inflationary pressures, rising costs, and recessionary fears are driving cautious consumer spending, shifts to lower-priced products, and challenges in forecasting demand. The company expects these macroeconomic factors to continue impacting operations, customer demand, and revenue growth rates across segments like AWS in the coming quarters.

Prompt the agent to build an optimal portfolio using the collected data

What are the closing prices of stocks AAAA, WWW, DDD in year 2018? Can you build an optimized portfolio using these three stocks on how to allocate 10,000 USD?

Stocks Querying Tool,
Portfolio Optimization Tool

Step 1: To get the closing prices of the stocks AAAA, WWW, and DDD in 2018, I will use the Stock-Query tool.
Step 2: The Stock-Query tool provided the closing prices for the requested stocks in 2018. To build an optimized portfolio allocation with $10,000 using these stocks, I will use the Portfolio-Optimization tool.

Based on the closing prices of stocks AAAA, WWW, and DDD in 2018 provided by the <REDACTED> tool, and the optimized portfolio allocation from the <REDACTED> tool, here is how to allocate $10,000 across these three stocks:
With $10,000:
– Buy 48 shares of AAAA stock at $172.44 per share, investing $8,415
– Buy 1 share of WWW stock at $85.91 per share, investing $165.80
– Buy 14 shares of DDD stock at $9.82 per share, investing $1,419.20
This optimized portfolio allocation maximizes returns while minimizing risk based on the 2018 stock prices.

About the Authors

Omar Ashton is a Partner Solutions Architect in Colombia with 17 years of experience in software design and development, IT solution architecture, technology strategy, and leadership of development teams.

Sovik Kumar Nath is an AI/ML and GenAI specialist senior solution architect with AWS working with financial services and capital markets customers. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. Sovik has published articles and holds a patent in ML model monitoring. He holds master’s degrees from the University of South Florida and the University of Fribourg, Switzerland, and a bachelor’s degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.

Jose Rojas is a Partner Solutions Architect at AWS. He helps Partners to increase productivity, efficiency and revenue by adopting and creating solutions on AWS. Before joining AWS, Jose worked at Cisco Meraki helping customers adopt cloud networking solutions. Outside work, he enjoys traveling with his family, swimming and cycling.

Mohan Musti is a Principal Technical Account Manager based out of Dallas. Mohan helps customers architect and optimize applications on AWS. Mohan has a degree in Computer Science and Engineering from JNT University, India. In his spare time, he enjoys spending time with his family and camping.

Jia (Vivian) Li is a Senior Solutions Architect at AWS, specializing in AI/ML. She currently supports customers in the financial industry. Prior to joining AWS in 2022, she had 7 years of experience helping enterprise customers use AI/ML in the cloud to drive business results. Vivian has a BS from Peking University and a PhD from the University of Southern California. In her spare time, she enjoys water activities and hiking in the beautiful mountains of her home state, Colorado.

Uchenna Egbe is an AI/ML and GenAI specialist Solutions Architect who enjoys building reusable AI/ML solutions. Uchenna has an MS from the University of Alaska Fairbanks. He spends his free time researching herbs, teas, and superfoods, and how to incorporate them into his daily diet.

Research Focus: Week of June 24, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Research Focus: June 24, 2024

Towards Energy Efficient 5G vRAN Servers

Virtualized radio access networks (vRANs), which run the cellular radio stack on commodity servers instead of specialized hardware, are increasingly used in modern cellular networks (e.g., 5G), owing to advantages such as a multi-vendor ecosystem, easier maintenance, and faster feature upgrades. In a recent paper: Towards Energy Efficient 5G vRAN Servers, researchers from Microsoft and external colleagues present RENC, a system that saves energy by adjusting CPU frequency in response to sub-second variations in cellular workloads, using three techniques. First, despite large fluctuations in vRAN CPU load at sub-ms timescales, RENC establishes safe low-load intervals, e.g., by coupling media access control (MAC) layer rate limiting with CPU frequency changes. This prevents high traffic during low-power operation, which would otherwise hurt performance. Second, they design techniques to compute CPU frequencies that are safe for these low-load intervals, achieved by measuring the slack in vRAN threads’ deadlines using Linux eBPF hooks, or minor binary rewriting of the vRAN software. Third, they demonstrate the need to handle CPU load spikes triggered by control operations, such as new users attaching to the network. Their evaluation in a state-of-the-art vRAN testbed shows that their techniques reduce a vRAN server’s CPU power consumption by up to 45% (29% server-wide).

RENC is purely a research project and there are no current plans to incorporate RENC into a product.


The CoExplorer Technology Probe: A generative AI-powered adaptive interface to support intentionality in planning and running video meetings

Video meetings have enabled a new era of distributed work, but running effective meetings can be challenging. Traditional videoconferencing systems offer little support for reducing the effort of planning and conducting a video meeting. Generative AI has the potential to radically redefine meetings by augmenting intentional meeting behaviors.

In a recent paper: The CoExplorer Technology Probe: A Generative AI-Powered Adaptive Interface to Support Intentionality in Planning and Running Video Meetings, researchers from Microsoft present a novel adaptive meeting prototype. It preemptively generates (1) likely phases that meetings would undergo, (2) tools that allow capturing attendees’ thoughts before the meeting, and (3) appropriate files and applications for each phase of the meeting and their window layout. Using CoExplorer as a technology probe in a guided walkthrough, their study findings suggest that generative AI has the potential to keep meetings on track and reduce workload. The researchers present some design implications of their findings, and discuss some concerns, e.g., about users’ agency, trust, and possible disruption to traditional meeting norms.



Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs

Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detecting such game bugs are still insufficient.

In a recent paper: Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs, accepted for presentation at the Association for Computational Linguistics (ACL) 2024 conference, researchers from Microsoft and external colleagues propose a systematic LLM-based method for automatically identifying such bugs from player game logs, eliminating the need for collecting additional data such as post-play surveys. Applied to a text-based game, DejaBoom!, their approach identifies bugs inherent in LLM-powered interactive games, surpassing unstructured LLM-powered bug-catching methods and filling the gap in automated detection of logical and design flaws.


MAIRA-2: Grounded Radiology Report Generation

Radiology reporting is a complex task that requires detailed image understanding, integration of multiple inputs, including comparison with prior imaging, and precise language generation. This makes it ideal for the development and use of generative multimodal models. In a recent preprint: MAIRA-2: Grounded Radiology Report Generation, researchers from Microsoft extend report generation to include the localization of individual findings on the image – or grounded report generation. Prior work indicates that grounding helps clarify image understanding and interpret AI-generated text. Therefore, grounded reporting should improve the utility and transparency of automated report drafting. 

To enable evaluation of grounded reporting, the researchers propose a novel framework – RadFact – leveraging the reasoning capabilities of LLMs. RadFact assesses the factuality of individual generated sentences, as well as correctness of generated spatial localizations, when present. The researchers introduce MAIRA-2, a large multimodal model combining a radiology-specific image encoder with an LLM, which is trained for the new task of grounded report generation on chest x-rays. MAIRA-2 uses more comprehensive inputs than explored previously: the current frontal image, the current lateral image, the prior frontal image and prior report, as well as the Indication, Technique and Comparison sections of the current report. These additions significantly improve report quality and reduce model hallucinations, establishing a new state of the art on findings generation (without grounding) on MIMIC-CXR, while demonstrating the feasibility of grounded reporting as a novel and richer task.

Microsoft Research in the news


Microsoft technology could help store “insane” supply of new data 

BBC | June 11, 2024

Project Silica uses powerful lasers to enable a piece of glass about the size of a DVD to store more than seven terabytes of data, helping to manage the rapidly growing supply.


Microsoft’s secret weapon – research leader Peter Lee 

The JoongAng | June 13, 2024

Peter Lee, president of Microsoft Research, is a leading force in Microsoft’s leap forward in the era of generative AI.

The post Research Focus: Week of June 24, 2024 appeared first on Microsoft Research.

Thinking Outside the Blox: How Roblox Is Using Generative AI to Enhance User Experiences

Roblox is a colorful online platform that aims to reimagine the way that people come together — now that vision is being augmented by generative AI. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with Anupam Singh, vice president of AI and growth engineering at Roblox, on how the company is using the technology to enhance virtual experiences with features such as automated chat filters and real-time text translation, which help build inclusivity and user safety. Singh also discusses how generative AI can be used to power coding assistants that help creators focus more on creative expression, rather than spending time manually scripting world-building features.

Time Stamps

1:49: Background on Roblox and user interactions within the platform
6:38: Singh’s insight on AI and machine learning’s role in Roblox’s growth
15:51: Using generative AI to enhance user self-expression
20:04: How generative AI simplifies content creation
24:26: What’s next for Roblox

You Might Also Like:

Media.Monks’ Lewis Smithingham on Enhancing Media and Marketing With AI – Ep. 222

In this episode, Lewis Smithingham, senior vice president of innovation and special operations at Media.Monks, discusses AI’s potential to enhance the media and entertainment industry. Smithingham delves into Media.Monks’ platform for entertainment and speaks to its vision where AI enhances creativity and allows for more personalized, scalable content creation.

The Case for Generative AI in the Legal Field – Ep. 210

AI-driven digital solutions enable law practitioners to search laws and cases intelligently — automating the time-consuming process of drafting and analyzing legal documents. In this episode, Thomson Reuters Chief Product Officer David Wong discusses AI’s potential to help deliver better access to justice.

Anima Anandkumar on Using Generative AI to Tackle Global Challenges – Ep. 203

Generative AI-based models can not only learn and understand natural languages — they can learn the very language of nature itself, presenting new possibilities for scientific research. Anima Anandkumar, senior director of AI research at NVIDIA, discusses generative AI’s potential to make splashes in the scientific community.

Deepdub’s Ofir Krakowski on Redefining Dubbing from Hollywood to Bollywood – Ep. 202

Deepdub acts as a digital bridge, providing access to content by using generative AI to break down language and cultural barriers in the entertainment landscape. In this episode, Deepdub co-founder and CEO Ofir Krakowski speaks on how AI-driven dubbing helps entertainment companies boost efficiency and increase accessibility.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Cut the Noise: NVIDIA Broadcast Supercharges Livestreaming, Remote Work

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC users.

AI has changed computing forever. The spotlight has most recently been on generative AI, but AI-accelerated, NVIDIA RTX-powered tools have also been key in improving gaming, content creation and productivity over the years.

The NVIDIA Broadcast app is one example, using Tensor Cores on a local RTX GPU to seamlessly improve audio and video quality. Paired with the NVIDIA encoder (NVENC) built into GeForce RTX and NVIDIA RTX GPUs, the app makes it easy to get started as a livestreamer or to look professional during video conference calls.

The Stream Dream

High-quality livestreaming traditionally required expensive hardware. Many livestreamers relied on software CPU encoding using the x264 software library, which often impacted gameplay quality. This led many to use a dual-PC setup, with one PC focused on gaming and content and the other on encoding the stream. It was complicated to assemble, difficult to troubleshoot and often cost-prohibitive for budding livestreamers.

NVENC is here to help. It’s a dedicated hardware video encoder on NVIDIA GPUs that processes the encoding, freeing up the rest of the system to focus on game and content performance. Industry-leading streaming apps like Open Broadcaster Software (OBS) are adding support for NVENC, paving the way for a new generation of broadcasters on popular platforms like Twitch and YouTube.

Meanwhile, NVIDIA Maxine helps solve the issue of expensive equipment. It includes free, AI-powered features like virtual green screens and webcam-based augmented reality tracking that eliminate the need for special equipment like physical green screens or motion-capture suits. Broadcasters first got to experience the technology at TwitchCon 2019, where they tested OBS live on the show floor with an AI-accelerated green screen on a GeForce RTX 2080 GPU.

Maxine’s AI-powered effects debuted for RTX users in the RTX Voice beta, and moved into the NVIDIA Broadcast app.

Now Showing: NVIDIA Broadcast

NVIDIA Broadcast offers AI-powered features that improve audio and video quality for a variety of use cases. It’s user-friendly, works in any app and is a breeze to set up.

It includes:

  • Noise and Acoustic Echo Removal: AI eliminates unwanted background noise from both the mic and inbound audio at the touch of a button.
  • Virtual Backgrounds: Features like Background Removal, Replacement and Blur help customize backgrounds without the need for expensive equipment or complex lighting setups.
  • Eye Contact: AI helps make it appear as though a streamer is looking directly at the camera, even when they’re glancing off camera or taking notes.
  • Auto Frame: Dynamically tracks movements in real time, automatically cropping and zooming moving objects regardless of their position.
  • Vignette: AI applies a darkening effect to the corners of camera images, providing visual contrast to draw attention to the center of the video and adding stylistic flair.
  • Video Noise Removal: Removes visual noise from low-light situations for a cleaner picture.

NVIDIA Broadcast works by creating a virtual camera, microphone or speaker in Windows so that users can set up their devices once and use them in any broadcasting, video conferencing or voice chat apps, including Discord, Google Meet, Microsoft Teams, OBS Studio, Slack, Webex and Zoom.

Those with an NVIDIA GeForce RTX, TITAN RTX, NVIDIA RTX or Quadro RTX GPU can use their GPU’s dedicated Tensor Cores to help the app’s AI networks run in real time.

The same AI-powered technology in NVIDIA Broadcast is also available to app developers as a software development kit. Audiovisual technology company Elgato includes Maxine’s AI audio noise removal technology in its Wave Link software, while VTube Studio — a popular app for connecting a 3D model to a webcam for streaming as an animated character — offers an RTX-accelerated model tracker plug-in as a free download. Independent developer Xaymar uses NVIDIA Maxine in his VoiceFX plug-in.

Content creators can use this plug-in or Elgato’s virtual studio technology (VST) filter to clean up noise and echo from recordings in post-processing in video editing suites like Adobe Premiere Pro or in digital audio workstations like Ableton Live and Adobe Audition.

(Not) Hearing Is Believing

Since its release, NVIDIA Broadcast has been used by millions.

“I’ve utilized the video noise removal and background replacement the most,” said Mr_Vudoo, a Twitch personality and broadcaster. “The eye contact feature was very interesting and quite honestly took me by surprise at how well it worked.”

Unmesh Dinda, host of the YouTube channel PiXimperfect, demonstrated NVIDIA Broadcast’s noise-canceling and echo-removal AI features in an extreme scenario. He set an electric fan whirring directly into his microphone and donned a helmet that was intensely hammered on. Even with these loud sounds in the background, Dinda could be heard crystal clear with Broadcast’s noise-removal feature turned on. The video has racked up more than 12 million views.

NVIDIA Broadcast is also a useful tool for the growing remote workforce. In an article, Tom’s Hardware editor-in-chief Avram Piltch detailed his testing of the app’s noise reduction features against noisy air conditioners, lawn-mowing neighbors and even a robot-wielding, tantrum-throwing child. Broadcast’s AI audio filters prevailed every time:

“I got my eight-year-old to fake throwing a fit right behind me and, once I enabled noise removal, every whine of ‘I’m not going to bed’ went silent (at least on the recording),” said Piltch. “To double the challenge, we had him throw a tantrum while carrying around a robot car with whirring treads. Once again, NVIDIA Broadcast removed all of the unwanted sound.”

Even everyday scenarios like video calls with a medical professional benefit from NVIDIA Broadcast’s AI-powered background removal.

Download NVIDIA Broadcast for free on any RTX-powered desktop or laptop.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models

Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora. Recent advances stem from the confluence of several factors, such as large-scale training datasets, deep learning techniques, and the rise of large language models. High-quality datasets are used to train models on realistic scenarios and enable the evaluation of the system on potentially unseen data. Standardized metrics facilitate comparisons between different ODQA systems, allowing researchers to objectively track advancements…

Apple Machine Learning Research

AI21 Labs Jamba-Instruct model is now available in Amazon Bedrock

We are excited to announce the availability of the Jamba-Instruct large language model (LLM) in Amazon Bedrock. Jamba-Instruct is built by AI21 Labs, and most notably supports a 256,000-token context window, making it especially useful for processing large documents and complex Retrieval Augmented Generation (RAG) applications.

What is Jamba-Instruct

Jamba-Instruct is an instruction-tuned version of the Jamba base model, previously open sourced by AI21 Labs, which combines a production-grade model, Structured State Space (SSM) technology, and Transformer architecture. With the SSM approach, Jamba-Instruct is able to achieve the largest context window length in its model size class while also delivering the performance that traditional Transformer-based models provide. These models yield a performance boost over AI21’s previous generation of models, the Jurassic-2 family. For more information about the hybrid SSM/Transformer architecture, refer to the Jamba: A Hybrid Transformer-Mamba Language Model whitepaper.

Get started with Jamba-Instruct

To get started with Jamba-Instruct models in Amazon Bedrock, first you need to get access to the model.

  1. On the Amazon Bedrock console, choose Model access in the navigation pane.
  2. Choose Modify model access.
  3. Select the AI21 Labs models you want to use and choose Next.
  4. Choose Submit to request model access.

For more information, refer to Model access.

Next, you can test the model in either the Amazon Bedrock Text or Chat playground.

Example use cases for Jamba-Instruct

Jamba-Instruct’s long context length is particularly well-suited for complex Retrieval Augmented Generation (RAG) workloads, or potentially complex document analysis. For example, it would be suitable for detecting contradictions between different documents or analyzing one document in the context of another. The following is an example prompt suitable for this use case:

You are an expert research assistant; 
you are to note any contradictions between the first document and second document provided: 

Document 1: 
{the document content} 

Document 2: 
{the document content} 

Contradictions:

You can also use Jamba for query augmentation, a technique where an original query is transformed into related queries, for purposes of optimizing RAG applications. For example:

You are a curious and novel researcher, 
who is highly interested in getting all the relevant information on a specific topic. 
Given an original query, you would like to generate up to 10 related queries. 
These queries should be grounded in the original query, but nevertheless new:

Original Query:
{Original Query}

New Queries:
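
As a minimal sketch of how the query augmentation template above might be used in practice, the following Python helpers fill in the template and split a model reply into individual queries. The function names and the assumption that the model returns one query per line are illustrative only; they are not part of the AI21 or Amazon Bedrock APIs.

```python
import re

# Template text copied from the query augmentation prompt above.
QUERY_AUGMENTATION_TEMPLATE = (
    "You are a curious and novel researcher,\n"
    "who is highly interested in getting all the relevant information on a specific topic.\n"
    "Given an original query, you would like to generate up to 10 related queries.\n"
    "These queries should be grounded in the original query, but nevertheless new:\n"
    "\n"
    "Original Query:\n"
    "{original_query}\n"
    "\n"
    "New Queries:\n"
)


def build_augmentation_prompt(original_query: str) -> str:
    """Fill the template with the user's original query."""
    return QUERY_AUGMENTATION_TEMPLATE.format(original_query=original_query.strip())


def parse_related_queries(completion: str, limit: int = 10) -> list[str]:
    """Split a completion into queries, assuming one query per line.

    Leading list markers such as "1.", "2)" or "-" are stripped.
    """
    queries = []
    for line in completion.splitlines():
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries[:limit]
```

The parsed queries can then be sent to the retriever alongside the original query; how the results are merged depends on the RAG application.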

You can also use Jamba for standard LLM operations, such as summarization and entity extraction.

Prompt guidance for Jamba-Instruct can be found in the AI21 model documentation. For more information about Jamba-Instruct, including relevant benchmarks, refer to Built for the Enterprise: Introducing AI21’s Jamba-Instruct Model.

Programmatic access

You can also access Jamba-Instruct through an API, using Amazon Bedrock and AWS SDK for Python (Boto3). For installation and setup instructions, refer to the quickstart. The following is an example code snippet:

import json

import boto3

# Create a client for the Amazon Bedrock runtime API
bedrock = boto3.client(service_name="bedrock-runtime")

prompt = "INSERT YOUR PROMPT HERE"

# Jamba-Instruct expects a chat-style list of messages rather than a raw prompt string
body = json.dumps({
    "messages": [{"role": "user", "content": prompt}],
    "max_tokens": 256,
    "top_p": 0.8,
    "temperature": 0.7,
})

modelId = "ai21.jamba-instruct-v1:0"

accept = "application/json"
contentType = "application/json"

response = bedrock.invoke_model(
    body=body,
    modelId=modelId,
    accept=accept,
    contentType=contentType
)

# The response body is a stream; read it and decode the JSON payload
print(json.loads(response.get('body').read()))
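
As an alternative to invoke_model, the Amazon Bedrock Converse API accepts the same message structure across models. The following is a hedged sketch, assuming Jamba-Instruct is enabled for the Converse API in your account and Region; the helper names (build_converse_request, extract_text, ask_jamba) are hypothetical and only the converse call itself belongs to Boto3.

```python
MODEL_ID = "ai21.jamba-instruct-v1:0"  # model ID used in the snippet above


def build_converse_request(prompt: str, max_tokens: int = 256,
                           temperature: float = 0.7, top_p: float = 0.8) -> dict:
    """Build keyword arguments for the bedrock-runtime converse() call."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "maxTokens": max_tokens,
            "temperature": temperature,
            "topP": top_p,
        },
    }


def extract_text(response: dict) -> str:
    """Concatenate the text blocks of a converse() response."""
    blocks = response["output"]["message"]["content"]
    return "".join(block.get("text", "") for block in blocks)


def ask_jamba(prompt: str) -> str:
    """Send one user turn to Jamba-Instruct; requires AWS credentials and model access."""
    import boto3  # imported lazily so the helpers above work without the SDK installed

    client = boto3.client("bedrock-runtime")
    return extract_text(client.converse(**build_converse_request(prompt)))
```

Separating request construction from the network call keeps the inference parameters easy to inspect and unit test without calling the service.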

Conclusion

AI21 Labs Jamba-Instruct in Amazon Bedrock is well-suited for applications where a long context window (up to 256,000 tokens) is required, like producing summaries or answering questions that are grounded in long documents, avoiding the need to manually segment document sections to fit the smaller context windows of other LLMs. The new SSM/Transformer hybrid architecture also provides benefits in model throughput. It can provide a performance boost of up to three times more tokens per second for context window lengths exceeding 128,000 tokens, compared to other models in a similar size class.

AI21 Labs Jamba-Instruct in Amazon Bedrock is available in the US East (N. Virginia) AWS Region and can be accessed through the on-demand consumption model. To learn more, refer to Supported foundation models in Amazon Bedrock. To get started with AI21 Labs Jamba-Instruct in Amazon Bedrock, visit the Amazon Bedrock console.


About the Authors

Joshua Broyde, PhD, is a Principal Solution Architect at AI21 Labs. He works with customers and AI21 partners across the generative AI value chain, including enabling generative AI at an enterprise level, using complex LLM workflows and chains for regulated and specialized environments, and using LLMs at scale.

Fernando Espigares Caballero is a Senior Partner Solutions Architect at AWS. He creates joint solutions with strategic Technology Partners to deliver value to customers. He has more than 25 years of experience working in IT platforms, data centers, and cloud and internet-related services, holding multiple Industry and AWS certifications. He is currently focusing on generative AI to unlock innovation and creation of novel solutions that solve specific customer needs.

Scale and simplify ML workload monitoring on Amazon EKS with AWS Neuron Monitor container

Amazon Web Services is excited to announce the launch of the AWS Neuron Monitor container, an innovative tool designed to enhance the monitoring capabilities of AWS Inferentia and AWS Trainium chips on Amazon Elastic Kubernetes Service (Amazon EKS). This solution simplifies the integration of advanced monitoring tools such as Prometheus and Grafana, enabling you to set up and manage your machine learning (ML) workflows with AWS AI Chips. With the new Neuron Monitor container, you can visualize and optimize the performance of your ML applications, all within a familiar Kubernetes environment. The Neuron Monitor container can also run on Amazon Elastic Container Service (Amazon ECS), but for the purpose of this post, we primarily discuss Amazon EKS deployment.

In addition to the Neuron Monitor container, the release of CloudWatch Container Insights (for Neuron) provides further benefits. This extension provides a robust monitoring solution, offering deeper insights and analytics tailored specifically for Neuron-based applications. With Container Insights, you can now access more granular data and comprehensive analytics, making it effortless for developers to maintain high performance and operational health of their ML workloads.

Solution overview

The Neuron Monitor container solution provides a comprehensive monitoring framework for ML workloads on Amazon EKS, using the power of Neuron Monitor in conjunction with industry-standard tools like Prometheus, Grafana, and Amazon CloudWatch. By deploying the Neuron Monitor DaemonSet across EKS nodes, developers can collect and analyze performance metrics from ML workload pods.

In one flow, metrics gathered by Neuron Monitor are integrated with Prometheus, which is configured using a Helm chart for scalability and ease of management. These metrics are then visualized through Grafana, offering you detailed insights into your applications’ performance for effective troubleshooting and optimization.

Alternatively, metrics can be directed to CloudWatch through the CloudWatch Observability EKS add-on or a Helm chart for a deeper integration with AWS services in a single step. The add-on helps automatically discover critical health metrics from the AWS Trainium and AWS Inferentia chips in the Amazon EC2 Trn1 and Amazon EC2 Inf2 instances, as well as from Elastic Fabric Adapter, the network interface for EC2 instances. This integration can help you better understand the traffic impact on your distributed deep learning algorithms.

This architecture has many benefits:

  • Highly targeted and intentional monitoring on Container Insights
  • Real-time analytics and greater visibility into ML workload performance on Neuron
  • Native support for your existing Amazon EKS infrastructure

Neuron Monitor provides flexibility and depth in monitoring within the Kubernetes environment.

The following diagram illustrates the solution architecture:

Fig.1 Solution Architecture Diagram

In the following sections, we demonstrate how to use Container Insights for enhanced observability, and how to set up Prometheus and Grafana for this solution.

Configure Container Insights for enhanced observability

In this section, we walk through the steps to configure Container Insights.

Set up the CloudWatch Observability EKS add-on

Refer to Install the Amazon CloudWatch Observability EKS add-on for instructions to create the amazon-cloudwatch-observability add-on in your EKS cluster. This process involves deploying the necessary resources for monitoring directly within CloudWatch.

After you set up the add-on, check the health of the add-on with the following command:

aws eks describe-addon --cluster-name <value> --addon-name amazon-cloudwatch-observability

The output should contain the following property value:

"status": "ACTIVE",

For details about confirming the output, see Retrieve addon version compatibility.
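
The same health check can be scripted. The following is a minimal sketch using the AWS SDK for Python (Boto3) describe_addon call, which returns the add-on details under the addon key; the helper names addon_is_active and check_cloudwatch_addon are hypothetical, and the cluster name is supplied by the caller.

```python
def addon_is_active(describe_addon_response: dict) -> bool:
    """Return True when the add-on reports the ACTIVE status."""
    return describe_addon_response.get("addon", {}).get("status") == "ACTIVE"


def check_cloudwatch_addon(cluster_name: str) -> bool:
    """Query Amazon EKS for the add-on status; requires AWS credentials."""
    import boto3  # imported lazily so addon_is_active works without the SDK installed

    eks = boto3.client("eks")
    response = eks.describe_addon(
        clusterName=cluster_name,
        addonName="amazon-cloudwatch-observability",
    )
    return addon_is_active(response)
```

A deployment script could poll check_cloudwatch_addon until it returns True before moving on to the Container Insights console.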

Once the add-on is active, you can then directly view metrics in Container Insights.

View CloudWatch metrics

Navigate to the Container Insights console, where you can visualize metrics and telemetry about your whole Amazon EKS environment, including your Neuron device metrics. The enhanced Container Insights page looks similar to the following screenshot, with the high-level summary of your clusters, along with kube-state and control-plane metrics. The Container Insights dashboard also shows cluster status and alarms. It uses predefined thresholds for CPU, memory, and NeuronCores to quickly identify which resources have higher consumption, and enables proactive actions to avoid performance impact.

Fig.2 CloudWatch Container Insights Dashboard

The out-of-the-box opinionated performance dashboards and troubleshooting UI enable you to see your Neuron metrics at multiple granularities, from an aggregated cluster level to the per-container and per-NeuronCore levels. With the Container Insights default configuration, you can also qualify and correlate your Neuron metrics against the other aspects of your infrastructure such as CPU, memory, disk, Elastic Fabric Adapter devices, and more.

When you navigate to any of the clusters based on their criticality, you can view the Performance monitoring dashboard, as shown in the following screenshot.

Fig.3 Performance Monitoring Dashboard Views

This monitoring dashboard provides various views to analyze performance, including:

  • Cluster-wide performance dashboard view – Provides an overview of resource utilization across the entire cluster
  • Node performance view – Visualizes metrics at the individual node level
  • Pod performance view – Focuses on pod-level metrics for CPU, memory, network, and so on
  • Container performance view – Drills down into utilization metrics for individual containers

This landing page has now been enhanced with Neuron metrics, including top 10 graphs, which help you identify unhealthy components in your environments even without alarms and take proactive action before application performance is impacted. For a more in-depth analysis of what is delivered on this landing page, refer to Announcing Amazon CloudWatch Container Insights with Enhanced Observability for Amazon EKS on EC2.

Prometheus and Grafana

In this section, we walk through the steps to set up Prometheus and Grafana.

Prerequisites

You should have an EKS cluster set up with AWS Inferentia or Trainium worker nodes.

Set up the Neuron Monitoring container

The Neuron Monitoring container is hosted on Amazon ECR Public. Although you can pull it from there immediately, pulling it directly for production workloads is not recommended because pulls can be throttled. For production environments, copy the Neuron Monitoring container to your private Amazon Elastic Container Registry (Amazon ECR) repository, where the Amazon ECR pull through cache feature can manage synchronization effectively. For more information on this and on setting up a pull through cache, see the Neuron Monitor User Guide.

Set up Kubernetes for Neuron Monitoring

You can use the following YAML configuration snippet to set up Neuron Monitoring in your Kubernetes cluster. This setup includes a DaemonSet to deploy the monitoring container on each suitable node in namespace neuron-monitor:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: neuron-monitor
  namespace: neuron-monitor
  labels:
    app: neuron-monitor
    version: v1
spec:
  selector:
    matchLabels:
      app: neuron-monitor
  template:
    metadata:
      labels:
        app: neuron-monitor
        version: v1
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - linux
                  - key: node.kubernetes.io/instance-type
                    operator: In
                    values:
                      - trn1.2xlarge
                      - trn1.32xlarge
                      - trn1n.32xlarge
                      - inf1.xlarge
                      - inf1.2xlarge
                      - inf1.6xlarge
                      - inf2.xlarge
                      - inf2.8xlarge
                      - inf2.24xlarge
                      - inf2.48xlarge
      containers:
        - name: neuron-monitor
          image: public.ecr.aws/neuron/neuron-monitor:1.0.1
          ports:
            - containerPort: 8000
          command:
            - "/opt/bin/entrypoint.sh"
          args:
            - "--port"
            - "8000"
          resources:
            limits:
              cpu: 500m
              memory: 256Mi
            requests:
              cpu: 256m
              memory: 128Mi
          env:
          - name: GOMEMLIMIT
            value: 160MiB
          securityContext:
            privileged: true

To apply this YAML file, complete the following steps:

  1. Replace the image value (public.ecr.aws/neuron/neuron-monitor:1.0.1) with the URI of the Neuron Monitoring container image in your ECR repository.
  2. Run the YAML file with the Kubernetes command line tool with the following code:
kubectl apply -f <filename>.yaml
  3. Verify the Neuron Monitor container is running as a DaemonSet:
kubectl get daemonset -n neuron-monitor
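To automate this verification, you can compare the number of pods the DaemonSet wants to schedule with the number that are ready. The following is a minimal sketch that assumes kubectl is configured for your cluster; the function names are illustrative, and the status fields (desiredNumberScheduled, numberReady) come from the standard Kubernetes DaemonSet status.

```python
import json
import subprocess


def daemonset_ready(daemonset: dict) -> bool:
    """Return True when every scheduled Neuron Monitor pod is ready.

    A desired count of 0 usually means no node matched the nodeAffinity rules
    (no Inferentia or Trainium instances), so it is treated as not ready.
    """
    status = daemonset.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    return desired > 0 and ready == desired


def get_daemonset(name: str = "neuron-monitor",
                  namespace: str = "neuron-monitor") -> dict:
    """Fetch the DaemonSet as JSON using kubectl (assumed to be on PATH)."""
    result = subprocess.run(
        ["kubectl", "get", "daemonset", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

For example, daemonset_ready(get_daemonset()) returns False while pods are still starting, or when the cluster has no matching worker nodes.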

Set up Amazon Managed Service for Prometheus

To utilize Amazon Managed Service for Prometheus with your EKS cluster, you must first configure Prometheus to scrape metrics from Neuron Monitor pods and forward them to the managed service.

The Prometheus server stores its data on persistent volumes, which require a Container Storage Interface (CSI) driver in the EKS cluster. You can use eksctl to set up the necessary components.

  1. Create an AWS Identity and Access Management (IAM) service account with appropriate permissions:
eksctl create iamserviceaccount --name ebs-csi-controller-sa --namespace kube-system --cluster <cluster-name> --role-name <role name> --role-only --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy --approve
  2. Install the Amazon Elastic Block Store (Amazon EBS) CSI driver add-on:
eksctl create addon --name aws-ebs-csi-driver --cluster <cluster-name> --service-account-role-arn <role-arn> --force
  3. Verify the add-on installation:
eksctl get addon --name aws-ebs-csi-driver --cluster <cluster-name>

Now you’re ready to set up your Amazon Managed Service for Prometheus workspace.

  1. Create a workspace using the AWS Command Line Interface (AWS CLI) and confirm its active status:
aws amp create-workspace --alias <alias>
aws amp list-workspaces --alias <alias>
  2. Set up the required service roles following the AWS guidelines to facilitate the ingestion of metrics from your EKS clusters. This includes creating an IAM role specifically for Prometheus ingestion. After you create the role, confirm that it exists:
aws iam get-role --role-name amp-iamproxy-ingest-role
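If you prefer to check the workspace from Python, the following sketch confirms the workspace status and derives the remote write endpoint from the workspace ID and AWS Region, following the URL pattern used in the Helm values in this section. The helper names are illustrative, and the sketch assumes boto3 is installed and credentials are configured; boto3 is imported inside the function so that the pure helpers can be used without it.

```python
def workspace_is_active(describe_response: dict) -> bool:
    """Return True when a describe-workspace response reports ACTIVE."""
    status = describe_response.get("workspace", {}).get("status", {})
    return status.get("statusCode") == "ACTIVE"


def remote_write_url(region: str, workspace_id: str) -> str:
    """Build the remote write endpoint for an Amazon Managed Service for Prometheus workspace."""
    return (
        f"https://aps-workspaces.{region}.amazonaws.com"
        f"/workspaces/{workspace_id}/api/v1/remote_write"
    )


def describe_workspace(workspace_id: str, region: str) -> dict:
    """Look up the workspace with boto3 (assumed installed and configured)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("amp", region_name=region)
    return client.describe_workspace(workspaceId=workspace_id)
```

The Region passed to remote_write_url should be the Region where you created the workspace, and the same value belongs in the sigv4 section of the remote write configuration.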

Next, you install Prometheus in your EKS cluster using a Helm chart, configuring it to scrape metrics from Neuron Monitor and forward them to your Amazon Managed Service for Prometheus workspace. The following is an example of the Helm chart .yaml file to override the necessary configs:

serviceAccounts:
    server:
        name: "amp-iamproxy-ingest-service-account"
        annotations:
            eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/amp-iamproxy-ingest-role"
server:
    remoteWrite:
        - url: https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write
          sigv4:
            region: <region>
          queue_config:
            max_samples_per_send: 1000
            max_shards: 200
            capacity: 2500
extraScrapeConfigs: |
  - job_name: neuron-monitor-stats
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: neuron-monitor
    - source_labels: [__meta_kubernetes_pod_container_port_number]
      action: keep
      regex: 8000

This file has the following key sections:

  • serviceAccounts – Configures the service account used by Prometheus with the necessary IAM role for permissions to ingest metrics
  • remoteWrite – Specifies the endpoint for writing metrics to Amazon Managed Service for Prometheus, including AWS Region-specific details and batch-writing configurations
  • extraScrapeConfigs – Defines additional configurations for scraping metrics from Neuron Monitor pods, including selecting pods based on labels and making sure only relevant metrics are captured
  3. Install Prometheus in your EKS cluster using the Helm command and specifying the .yaml file:
helm install prometheus prometheus-community/prometheus -n prometheus --create-namespace -f values.yaml
  4. Verify the installation by checking that all Prometheus pods are running:
kubectl get pods -n prometheus

This confirms that Prometheus is correctly set up to collect metrics from the Neuron Monitor container and forward them to Amazon Managed Service for Prometheus.
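To build intuition for how the two keep rules in extraScrapeConfigs select targets, the following Python sketch emulates them outside of Prometheus. It is an approximation for illustration only: Prometheus anchors relabel regular expressions to the full label value, which the sketch mimics with re.fullmatch, and a missing label is treated as an empty string. The function name is our own.

```python
import re

# The two `keep` rules from extraScrapeConfigs: (source label, regex).
KEEP_RULES = [
    ("__meta_kubernetes_pod_label_app", "neuron-monitor"),
    ("__meta_kubernetes_pod_container_port_number", "8000"),
]


def is_scraped(target_labels: dict) -> bool:
    """Return True when a discovered pod port survives every keep rule.

    Each regex must match the whole label value, so port 18000 does not
    match the pattern 8000.
    """
    return all(
        re.fullmatch(regex, target_labels.get(label, "")) is not None
        for label, regex in KEEP_RULES
    )
```

A pod labeled app=neuron-monitor that exposes container port 8000 is kept; any other pod, or any other port on the same pod, is dropped before scraping.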

Integrate Amazon Managed Grafana

When Prometheus is operational, complete the following steps:

  1. Set up Amazon Managed Grafana. For instructions, see Getting started with Amazon Managed Grafana.
  2. Configure it to use Amazon Managed Service for Prometheus as a data source. For details, see Use AWS data source configuration to add Amazon Managed Service for Prometheus as a data source.
  3. Import the example Neuron Monitor dashboard from GitHub to quickly visualize your metrics.

The following screenshot shows your dashboard integrated with Amazon Managed Grafana.

Fig.4 Integrating Amazon Managed Grafana

Clean up

To make sure none of the resources created in this walkthrough are left running, complete the following cleanup steps:

  1. Delete the Amazon Managed Grafana workspace.
  2. Uninstall Prometheus from the EKS cluster:
helm uninstall prometheus -n prometheus
  3. Remove the Amazon Managed Service for Prometheus workspace ID from the trust policy of the role amp-iamproxy-ingest-role or delete the role.
  4. Delete the Amazon Managed Service for Prometheus workspace:
aws amp delete-workspace --workspace-id <workspace-id>
  5. Clean up the CSI:
eksctl delete addon --cluster <cluster-name> --name aws-ebs-csi-driver
eksctl delete iamserviceaccount --name ebs-csi-controller-sa --namespace kube-system --cluster <cluster-name>
  6. Delete the Neuron Monitor DaemonSet from the EKS cluster:
kubectl delete daemonset neuron-monitor -n neuron-monitor

Conclusion

The release of the Neuron Monitor container marks a significant enhancement in the monitoring of ML workloads on Amazon EKS, specifically tailored for AWS Inferentia and Trainium chips. This solution simplifies the integration of powerful monitoring tools like Prometheus, Grafana, and CloudWatch, so you can effectively manage and optimize your ML applications with ease and precision.

To explore the full capabilities of this monitoring solution, refer to Deploy Neuron Container on Elastic Kubernetes Service (EKS). Refer to Amazon EKS and Kubernetes Container Insights metrics to learn more about setting up the Neuron Monitor container and using Container Insights to fully harness the capabilities of your ML infrastructure on Amazon EKS. Additionally, engage with our community through our GitHub repo to share experiences and best practices, so you stay at the forefront of ML operations on AWS.


About the Authors

Niithiyn Vijeaswaran is a Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.

Emir Ayar is a Senior Tech Lead Solutions Architect with the AWS Prototyping team. He specializes in assisting customers with building ML and generative AI solutions, and implementing architectural best practices. He supports customers in experimenting with solution architectures to achieve their business objectives, emphasizing agile innovation and prototyping. He lives in Luxembourg and enjoys playing synthesizers.

Ziwen Ning is a software development engineer at AWS. He currently focuses on enhancing the AI/ML experience through the integration of AWS Neuron with containerized environments and Kubernetes. In his free time, he enjoys challenging himself with badminton, swimming and other various sports, and immersing himself in music.

Rohit Talluri is a Generative AI GTM Specialist (Tech BD) at Amazon Web Services (AWS). He is partnering with top generative AI model builders, strategic customers, key AI/ML partners, and AWS Service Teams to enable the next generation of artificial intelligence, machine learning, and accelerated computing on AWS. He was previously an Enterprise Solutions Architect, and the Global Solutions Lead for AWS Mergers & Acquisitions Advisory.

Albert Opher is a Solutions Architect Intern at AWS. He is a rising senior at the University of Pennsylvania pursuing Dual Bachelor’s Degrees in Computer Information Science and Business Analytics in the Jerome Fisher Management and Technology Program. He has experience with multiple programming languages, AWS cloud services, AI/ML technologies, product and operations management, pre and early seed start-up ventures, and corporate finance.

Geeta Gharpure is a senior software developer on the Annapurna ML engineering team. She is focused on running large-scale AI/ML workloads on Kubernetes. She lives in Sunnyvale, CA, and enjoys listening to Audible in her free time.


Build an automated insight extraction framework for customer feedback analysis with Amazon Bedrock and Amazon QuickSight


Extracting valuable insights from customer feedback presents several significant challenges. Manually analyzing and categorizing large volumes of unstructured data, such as reviews, comments, and emails, is a time-consuming process prone to inconsistencies and subjectivity. Scalability becomes an issue as the amount of feedback grows, hindering the ability to respond promptly and address customer concerns. In addition, capturing granular insights, such as specific aspects mentioned and associated sentiments, is difficult. Inefficient routing and prioritization of customer inquiries or issues can lead to delays and dissatisfaction. These pain points highlight the need to streamline the process of extracting insights from customer feedback, enabling businesses to make data-driven decisions and enhance the overall customer experience.

Large language models (LLMs) have transformed the way we engage with and process natural language. These powerful models can understand, generate, and analyze text, unlocking a wide range of possibilities across various domains and industries. From customer service and ecommerce to healthcare and finance, the potential of LLMs is being rapidly recognized and embraced. Businesses can use LLMs to gain valuable insights, streamline processes, and deliver enhanced customer experiences. Unlike traditional natural language processing (NLP) approaches, such as classification methods, LLMs offer greater flexibility in adapting to dynamically changing categories and improved accuracy by using pre-trained knowledge embedded within the model.

Amazon Bedrock, a fully managed service designed to facilitate the integration of LLMs into enterprise applications, offers a choice of high-performing LLMs from leading artificial intelligence (AI) companies like Anthropic, Mistral AI, Meta, and Amazon through a single API. It provides a broad set of capabilities like model customization through fine-tuning, knowledge base integration for contextual responses, and agents for running complex multi-step tasks across systems. With Amazon Bedrock, developers can experiment, evaluate, and deploy generative AI applications without worrying about infrastructure management. Its enterprise-grade security, privacy controls, and responsible AI features enable secure and trustworthy generative AI innovation at scale.

To create and share customer feedback analysis without the need to manage underlying infrastructure, Amazon QuickSight provides a straightforward way to build visualizations, perform one-time analysis, and quickly gain business insights from customer feedback, anytime and on any device. In addition, the generative business intelligence (BI) capabilities of QuickSight allow you to ask questions about customer feedback using natural language, without the need to write SQL queries or learn a BI tool. This user-friendly approach to data exploration and visualization empowers users across the organization to analyze customer feedback and share insights quickly and effortlessly.

In this post, we explore how to integrate LLMs into enterprise applications to harness their generative capabilities. We delve into the technical aspects of workflow implementation and provide code samples that you can quickly deploy or modify to suit your specific requirements. Whether you’re a developer seeking to incorporate LLMs into your existing systems or a business owner looking to take advantage of the power of NLP, this post can serve as a quick jumpstart.

Advantages of adopting generative approaches for NLP tasks

For customer feedback analysis, you might wonder if traditional NLP classifiers such as BERT or fastText would suffice. Although these traditional machine learning (ML) approaches might perform decently in terms of accuracy, there are several significant advantages to adopting generative AI approaches. The following table compares the generative approach (generative AI) with the discriminative approach (traditional ML) across multiple aspects.

Aspect | Generative AI (LLMs) | Traditional ML
Accuracy | Achieves competitive accuracy by using knowledge acquired during pre-training and utilizing the semantic similarity between category names and customer feedback. Particularly beneficial if you don’t have much labeled data. | Can achieve high accuracy given sufficient labeled data, but performance may degrade if you don’t have much labeled data and rely solely on predefined features, because it lacks the ability to capture semantic similarities effectively.
Acquiring labeled data | Uses pre-training on large text corpora, enabling zero-shot or few-shot learning. No labeled data is needed. | Requires labeled data for all categories of interest, which can be time-consuming and expensive to obtain.
Model generalization | Benefits from exposure to diverse text genres and domains during pre-training, enhancing generalization to new tasks. | Relies on a large volume of task-specific labeled data to improve generalization, limiting its ability to adapt to new domains.
Operational efficiency | Uses prompt engineering, reducing the need for extensive fine-tuning when new categories are introduced. | Requires retraining the model whenever new categories are added, leading to increased computational costs and longer deployment times.
Handling rare categories and imbalanced data | Can generate text for rare or unseen categories by using its understanding of context and language semantics. | Struggles with rare categories or imbalanced classes due to limited labeled examples, often resulting in poor performance on infrequent classes.
Explainability | Provides explanations for its predictions through generated text, offering insights into its decision-making process. | Explanations are often limited to feature importance or decision rules, lacking the nuance and context provided by generated text.

Generative AI models offer advantages with pre-trained language understanding, prompt engineering, and reduced need for retraining on label changes, saving time and resources compared to traditional ML approaches. You can further fine-tune a generative AI model to tailor the model’s performance to your specific domain or task. For more information, see Customize models in Amazon Bedrock with your own data using fine-tuning and continued pre-training.

In this post, we primarily focus on the zero-shot and few-shot capabilities of LLMs for customer feedback analysis. Zero-shot learning in LLMs refers to their ability to perform tasks without any task-specific examples, whereas few-shot learning involves providing a small number of examples to improve performance on a new task. These capabilities have gained significant attention due to their ability to strike a balance between accuracy and operational efficiency. By using the pre-trained knowledge of LLMs, zero-shot and few-shot approaches enable models to perform NLP with minimal or no labeled data. This eliminates the need for extensive data annotation efforts and allows for quick adaptation to new tasks.
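To make the distinction concrete, the following sketch builds a zero-shot prompt and a few-shot prompt for the same feedback. It is an illustration only, not the prompt used in our solution; the function name, wording, categories, and examples are all hypothetical.

```python
def build_prompt(feedback: str, categories: list, examples=()) -> str:
    """Build a classification prompt.

    With no examples this is a zero-shot prompt; passing a few
    (feedback, category) pairs turns it into a few-shot prompt.
    """
    lines = [
        "Classify the customer feedback into exactly one of these categories:",
        ", ".join(categories),
        "",
    ]
    for example_text, example_label in examples:
        lines += [f"Feedback: {example_text}", f"Category: {example_label}", ""]
    lines += [f"Feedback: {feedback}", "Category:"]
    return "\n".join(lines)


categories = ["shipping", "billing", "product quality"]  # hypothetical categories

zero_shot = build_prompt("My parcel arrived two weeks late.", categories)
few_shot = build_prompt(
    "My parcel arrived two weeks late.",
    categories,
    examples=[
        ("I was charged twice for one order.", "billing"),
        ("The zipper broke after a day.", "product quality"),
    ],
)
```

Adding a new category in either case only requires changing the categories list (and optionally adding an example), with no retraining.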

Solution overview

Our solution presents an end-to-end generative AI application for customer review analysis. When the automated content processing steps are complete, you can use the output for downstream tasks, such as to invoke different components in a customer service backend application, or to insert the generated tags into metadata of each document for product recommendation.

The following diagram illustrates the architecture and workflow of the proposed solution.

Reference architecture

The customer review analysis workflow consists of the following steps:

  1. A user uploads a file to a dedicated data repository within your Amazon Simple Storage Service (Amazon S3) data lake, invoking the processing using AWS Step Functions.
  2. The Step Functions workflow starts. In the first step, an AWS Lambda function reads and validates the file, and extracts the raw data.
  3. The raw data is processed by an LLM using a preconfigured user prompt. The LLM generates output based on the user prompt.
  4. The processed output is stored in a database or data warehouse, such as Amazon Relational Database Service (Amazon RDS).
  5. The stored data is visualized in a BI dashboard using QuickSight.
  6. The user receives a notification when the results are ready and can access the BI dashboard to view and analyze the results.
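As an illustration of step 2, the following sketch shows the kind of validation and extraction logic a Lambda function might run after reading the uploaded object from Amazon S3. It assumes a CSV file with title and feedback columns; these column names and the function name are hypothetical, so check the GitHub repo for the format the deployed solution expects.

```python
import csv
import io

# Hypothetical column names; the deployed solution may use a different schema.
REQUIRED_COLUMNS = {"title", "feedback"}


def extract_rows(csv_text: str) -> list:
    """Validate the file header and return the non-empty feedback rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for row in reader:
        feedback = (row.get("feedback") or "").strip()
        if feedback:  # skip rows that carry no feedback text
            rows.append({"title": (row.get("title") or "").strip(),
                         "feedback": feedback})
    return rows
```

Rejecting malformed files at this stage keeps invalid input from reaching the LLM step, where each processed row incurs model invocation cost.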

The project is available on GitHub and provides AWS Cloud Development Kit (AWS CDK) code to deploy. The AWS CDK is an open source software development framework for defining cloud infrastructure in code (IaC) and provisioning it through AWS CloudFormation. This provides an automated deployment experience on your AWS account. We highly suggest you follow the GitHub README and deployment guidance to get started.

In the following sections, we highlight the key components to explain this automated framework for insight discovery: workflow orchestration with Step Functions, prompt engineering for the LLM, and visualization with QuickSight.

Prerequisites

This post is intended for developers with a basic understanding of LLMs and prompt engineering. Although no advanced technical knowledge is required, familiarity with Python and AWS Cloud services will be beneficial if you want to explore our sample code on GitHub.

Workflow orchestration with Step Functions

To manage and coordinate multi-step workflows and processes, we take advantage of Step Functions. Step Functions is a visual workflow service that enables developers to build distributed applications, automate processes, orchestrate microservices, and create data and ML pipelines using AWS services. It can automate extract, transform, and load (ETL) processes, so multiple long-running ETL jobs run in order and complete successfully without manual orchestration. By combining multiple Lambda functions, Step Functions allows you to create responsive serverless applications and orchestrate microservices. Moreover, it can orchestrate large-scale parallel workloads, enabling you to iterate over and process large datasets, such as security logs, transaction data, or image and video files. The definition of our end-to-end orchestration is detailed in the GitHub repo.

Step Functions invokes multiple Lambda functions for the end-to-end workflow.

Step Functions uses the Map state processing modes to orchestrate large-scale parallel workloads. You can modify the Step Functions state machine to adapt to your own workflow, or modify the Lambda function for your own processing logic.

Step function
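For readers new to the Map state, the following sketch builds a small Amazon States Language definition in Python that fans out over a list of feedback items and invokes a Lambda function for each one. It is not the state machine from our GitHub repo; the state names, the $.items input path, and the function ARN are placeholders chosen for illustration.

```python
import json


def build_map_state_machine(lambda_arn: str, max_concurrency: int = 10) -> dict:
    """Return a state machine definition with a single inline Map state."""
    return {
        "Comment": "Sketch: categorize each feedback item in parallel",
        "StartAt": "ProcessFeedbackItems",
        "States": {
            "ProcessFeedbackItems": {
                "Type": "Map",
                "ItemsPath": "$.items",          # placeholder input path
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "CategorizeFeedback",
                    "States": {
                        "CategorizeFeedback": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {
                                "FunctionName": lambda_arn,
                                "Payload.$": "$",
                            },
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


definition = build_map_state_machine(
    "arn:aws:lambda:<region>:<account-id>:function:<function-name>"
)
definition_json = json.dumps(definition, indent=2)
```

Inline mode suits modest batch sizes. For very large datasets, the Map state also offers a distributed mode, which requires additional ProcessorConfig settings not shown here.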

Prompt engineering

To invoke Amazon Bedrock, you can follow our code sample that uses the Python SDK. A prompt is natural language text describing the task that an AI should perform. Prompt engineering may involve phrasing a query, specifying a style, providing relevant context, or assigning a role to the AI, such as “You are a helpful assistant.” We provide a prompt example for feedback categorization. For more information, refer to Prompt engineering. You can modify the prompt to adapt to your own workflow.

This framework uses a sample prompt to generate tags for user feedback from the predefined tags listed. You can engineer the prompt based on your user feedback style and business requirements.

You are tasked with selecting an appropriate tag from the given lists based on user feedback content and feedback title enclosed within the `<feedback>` and `<title>` XML tag. 

Here is the list of potential tags: 
<tags> 
$tags 
</tags> 

<title> 
$title 
</title>

<feedback> 
$feedback 
</feedback> 

Please choose only one from tag list and response to the user’s questions within <tag></tag> tags. If none of the tags above are suitable for the feedback or information is not enough, return "unknown". No explanation is required. No need to echo tag list and feedback. 
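The $tags, $title, and $feedback placeholders in the preceding prompt follow the syntax of Python’s string.Template, so one way to fill them and read back the answer is sketched below. This is an illustration under assumptions, not the exact code in our GitHub repo: the function names are hypothetical, the Bedrock call uses the Converse API with a model ID that you supply, and any output that is not in the tag list is mapped to "unknown".

```python
import re
from string import Template

PROMPT_TEMPLATE = Template("""You are tasked with selecting an appropriate tag from the given lists based on user feedback content and feedback title enclosed within the `<feedback>` and `<title>` XML tag.

Here is the list of potential tags:
<tags>
$tags
</tags>

<title>
$title
</title>

<feedback>
$feedback
</feedback>

Please choose only one from tag list and response to the user's questions within <tag></tag> tags. If none of the tags above are suitable for the feedback or information is not enough, return "unknown". No explanation is required. No need to echo tag list and feedback.
""")


def render_prompt(tags: list, title: str, feedback: str) -> str:
    """Fill the placeholders, listing one candidate tag per line."""
    return PROMPT_TEMPLATE.substitute(
        tags="\n".join(tags), title=title, feedback=feedback
    )


def parse_tag(model_output: str, allowed_tags: list) -> str:
    """Extract the tag from <tag></tag>; fall back to "unknown" if it is not allowed."""
    match = re.search(r"<tag>(.*?)</tag>", model_output, re.DOTALL)
    candidate = match.group(1).strip() if match else model_output.strip()
    return candidate if candidate in allowed_tags else "unknown"


def categorize(tags: list, title: str, feedback: str, model_id: str) -> str:
    """Send the prompt to Amazon Bedrock (requires boto3 and model access)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId=model_id,  # supply a model ID enabled in your account
        messages=[{"role": "user",
                   "content": [{"text": render_prompt(tags, title, feedback)}]}],
        inferenceConfig={"temperature": 0, "maxTokens": 50},
    )
    return parse_tag(response["output"]["message"]["content"][0]["text"], tags)
```

Validating the model output against the tag list guards against responses that drift from the predefined categories, which keeps the stored labels consistent for downstream dashboards.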

Visualization with QuickSight

We have successfully used an LLM to categorize the feedback into predefined categories. After the data is categorized and stored in Amazon RDS, you can use QuickSight to generate an overview and visualize the insights from the dataset. For deployment guidance, refer to GitHub Repository: Result Visualization Guide.

We use an LLM from Amazon Bedrock to generate a category label for each piece of feedback. This generated label is stored in the label_llm field. To analyze the distribution of these labels, select the label_llm field along with other relevant fields and visualize the data using a pie chart. This will provide an overview of the different categories and their proportions within the feedback dataset, as shown in the following screenshot.

Category pie chart
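If you want to sanity-check the proportions shown in the pie chart, you can compute the same distribution from the stored rows. The following is a minimal sketch that assumes each row is a dictionary containing the label_llm field; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def label_distribution(rows: list) -> dict:
    """Return each label's share of the rows as a percentage, largest first."""
    counts = Counter(row["label_llm"] for row in rows)
    total = sum(counts.values())
    return {
        label: round(100 * count / total, 1)
        for label, count in counts.most_common()
    }


sample_rows = [  # hypothetical labels for illustration
    {"label_llm": "shipping"},
    {"label_llm": "shipping"},
    {"label_llm": "shipping"},
    {"label_llm": "billing"},
]
distribution = label_distribution(sample_rows)
```

For the sample rows, shipping accounts for 75.0 percent and billing for 25.0 percent, which should match what QuickSight displays for the same data.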

In addition to the category overview, you can also generate a trend analysis of the feedback or issues over time. The following screenshot demonstrates a trend where the number of issues peaked in March but then showed immediate improvement, with a reduction in the number of issues in subsequent months.

QuickSight analysis sample

Sometimes, you may need to create paginated reports to present to a company management team about customer feedback. You can use Amazon QuickSight Paginated Reports to create highly formatted multi-page reports from the insight extracted by LLMs, define report layouts and formatting, and schedule report generation and distribution.

Clean up

If you followed the GitHub deployment guide and want to clean up afterwards, delete the stack customer-service-dev on the CloudFormation console or run the command cdk destroy customer-service-dev. You can also refer to the cleanup section in the GitHub deployment guide.

Applicable real-world applications and scenarios

You can use this automated architecture for content processing for various real-world applications and scenarios:

  • Customer feedback categorization and sentiment classification – In the context of modern application services, customers often leave comments and reviews to share their experiences. To effectively utilize this valuable feedback, you can use LLMs to analyze and categorize the comments. The LLM extracts specific aspects mentioned in the feedback, such as food quality, service, ambiance, and other relevant factors. Additionally, it determines the sentiment associated with each aspect, classifying it as positive, negative, or neutral. With LLMs, businesses can gain valuable insights into customer satisfaction levels and identify areas that require improvement, enabling them to make data-driven decisions to enhance their offerings and overall customer experience.
  • Email categorization for customer service – When customers reach out to a company’s customer service department through email, they often have various inquiries or issues that need to be addressed promptly. To streamline the customer service process, you can use LLMs to analyze the content of each incoming email. By examining the email’s content and understanding the nature of the inquiry, the LLM categorizes the email into predefined categories such as billing, technical support, product information, and more. This automated categorization allows the emails to be efficiently routed to the appropriate departments or teams for further handling and response. By implementing this system, companies can make sure customer inquiries are promptly addressed by the relevant personnel, improving response times and enhancing customer satisfaction.
  • Web data analysis for product information extraction – In the realm of ecommerce, extracting accurate and comprehensive product information from webpages is crucial for effective data management and analysis. You can use an LLM to scan and analyze product pages on an ecommerce website, extracting key details such as the product title, pricing information, promotional status (such as on sale or limited-time offer), product description, and other relevant attributes. The LLM’s ability to understand and interpret the structured and unstructured data on these pages allows for the efficient extraction of valuable information. The extracted data is then organized and stored in a database, enabling further utilization for various purposes, including product comparison, pricing analysis, or generating comprehensive product feeds. By using the power of an LLM for web data analysis, ecommerce businesses can improve the accuracy and completeness of their product information, facilitating improved decision-making and enhancing the overall customer experience.
  • Product recommendation with tagging – To enhance the product recommendation system and improve search functionality on an online website, implementing a tagging mechanism is highly beneficial. You can use LLMs to generate relevant tags for each product based on its title, description, and other available information. The LLM can generate two types of tags: predefined tags and free tags. Predefined tags are assigned from a predetermined set of categories or attributes that are relevant to the products, providing consistency and structured organization. Free tags are open-ended and generated by the LLM to capture specific characteristics or features of the products, providing a more nuanced and detailed representation. These tags are then associated with the corresponding products in the database. When users search for products or browse recommendations, the tags serve as powerful matching criteria, enabling the system to suggest highly relevant products based on user preferences and search queries. By incorporating an LLM-powered tagging system, online websites can significantly improve the user experience, increase the likelihood of successful product discovery, and ultimately drive higher customer engagement and satisfaction.

Conclusion

In this post, we explored how you can seamlessly integrate LLMs into enterprise applications to take advantage of their powerful generative AI capabilities. With AWS services such as Amazon Bedrock, Step Functions, and QuickSight, businesses can create intelligent workflows that automate processes, generate insights, and enhance decision-making.

We have provided a comprehensive overview of the technical aspects involved in implementing such a workflow, along with code samples that you can deploy or customize to meet your organization’s specific needs. By following the step-by-step guide and using the provided resources, you can quickly incorporate this generative AI application into your current workload. We encourage you to check out the GitHub repository, deploy the solution to your AWS environment, and modify it according to your own user feedback and business requirements.

Embracing LLMs and integrating them into your enterprise applications can unlock a new level of efficiency, innovation, and competitiveness. You can learn from AWS Generative AI Customer Stories how others harness the power of generative AI to drive their business forward, and check out our AWS Generative AI blogs for the latest technology updates in today’s rapidly evolving technological landscape.


About the Authors

Jacky Wu is a Senior Solutions Architect at AWS. Before AWS, he implemented front-to-back cross-asset trading systems for large financial institutions, and developed high-frequency trading systems for KRX KOSPI options and long-short strategies for APJ equities. He is passionate about how technology can solve capital market challenges and deliver beneficial outcomes using the latest AWS services and best practices. Outside of work, Jacky enjoys 10 km runs and traveling.

Yanwei Cui, PhD, is a Senior Machine Learning Specialist Solutions Architect at AWS. He started machine learning research at IRISA (Research Institute of Computer Science and Random Systems), and has several years of experience building AI-powered industrial applications in computer vision, natural language processing, and online user behavior prediction. At AWS, he shares his domain expertise and helps customers unlock business potentials and drive actionable outcomes with machine learning at scale. Outside of work, he enjoys reading and traveling.

Michelle Hong, PhD, works as a Prototyping Solutions Architect at Amazon Web Services, where she helps customers build innovative applications using a variety of AWS components. She applies her expertise in machine learning, particularly in natural language processing, to develop data-driven solutions that optimize business processes and improve customer experiences.
