Alida gains deeper understanding of customer feedback with Amazon Bedrock

This post is co-written with Sherwin Chu from Alida.

Alida helps the world’s biggest brands create highly engaged research communities to gather feedback that fuels better customer experiences and product innovation.

Alida’s customers receive tens of thousands of engaged responses for a single survey, so the Alida team opted to use machine learning (ML) to serve their customers at scale. However, they found that traditional natural language processing (NLP) models struggled to fully understand the nuanced feedback found in open-ended survey responses. The models often captured only surface-level topics and sentiment, and missed crucial context that would allow for more accurate and meaningful insights.

In this post, we learn how Anthropic’s Claude Instant model on Amazon Bedrock enabled the Alida team to quickly build a scalable service that more accurately determines the topic and sentiment within complex survey responses. The new service achieved a 4-6 times improvement in topic assertion by tightly clustering on several dozen key topics instead of hundreds of noisy NLP keywords.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies, such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Using Amazon Bedrock allowed Alida to bring their service to market faster than if they had used other ML providers or vendors.

The challenge

Surveys with a combination of multiple-choice and open-ended questions allow market researchers to get a more holistic view by capturing both quantitative and qualitative data points.

Multiple-choice questions are easy to analyze at scale, but lack nuance and depth. Set response options may also lead to biasing or priming participant responses.

Open-ended survey questions allow responders to provide context and unanticipated feedback. These qualitative data points deepen researchers’ understanding beyond what multiple-choice questions can capture alone. The challenge with the free-form text is that it can lead to complex and nuanced answers that are difficult for traditional NLP to fully understand. For example:

“I recently experienced some of life’s hardships and was really down and disappointed. When I went in, the staff were always very kind to me. It’s helped me get through some tough times!”

Traditional NLP methods will identify topics as “hardships,” “disappointed,” “kind staff,” and “get through tough times.” They can’t distinguish between the responder’s overall negative life experience and the specific positive store experience.

Alida’s existing solution automatically processes large volumes of open-ended responses, but they wanted their customers to gain better contextual comprehension and high-level topic inference.

Amazon Bedrock

Prior to the introduction of LLMs, the way forward for Alida to improve upon their existing single-model solution was to work closely with industry experts and develop, train, and refine new models specifically for each of the industry verticals that Alida’s customers operated in. This was both a time- and cost-intensive endeavor.

One of the breakthroughs that make LLMs so powerful is the use of attention mechanisms. LLMs use self-attention mechanisms that analyze the relationships between words in a given prompt. This allows LLMs to better handle the topic and sentiment in the earlier example and presents an exciting new technology that can be used to address the challenge.
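As a point of reference, the standard scaled dot-product attention from the Transformer literature (the formulation behind these self-attention mechanisms, not anything specific to Claude) is:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value matrices derived from the input tokens and $d_k$ is the key dimension; the softmax weights capture how strongly each word attends to every other word in the prompt.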

With Amazon Bedrock, teams and individuals can immediately start using foundation models without having to worry about provisioning infrastructure or setting up and configuring ML frameworks. You can get started with the following steps:

  1. Verify that your user or role has permission to create or modify Amazon Bedrock resources. For details, see Identity-based policy examples for Amazon Bedrock.
  2. Log in to the Amazon Bedrock console.
  3. On the Model access page, review the EULA and enable the FMs you’d like in your account.
  4. Start interacting with the FMs through the console playgrounds or the API; a minimal SDK example follows.
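
For example, once model access is enabled, a minimal Python sketch like the following calls Claude Instant through the Bedrock runtime SDK (the prompt text is illustrative only):

import json
import boto3

# Bedrock runtime client in a Region where model access is enabled
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude Instant (text completion) expects the Human/Assistant prompt format
body = json.dumps({
    "prompt": "\n\nHuman: Summarize the sentiment of this survey response: "
              "\"The staff were always very kind to me.\"\n\nAssistant:",
    "max_tokens_to_sample": 200,
    "temperature": 0,
})

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-instant-v1",
    body=body,
)
print(json.loads(response["body"].read())["completion"])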

Alida’s executive leadership team was eager to be an early adopter of Amazon Bedrock because they recognized its ability to help their teams bring new generative AI-powered solutions to market faster.

Vincy William, the Senior Director of Engineering at Alida who leads the team responsible for building the topic and sentiment analysis service, says,

“LLMs provide a big leap in qualitative analysis and do things (at a scale that is) humanly not possible to do. Amazon Bedrock is a game changer, it allows us to leverage LLMs without the complexity.”

The engineering team experienced the immediate ease of getting started with Amazon Bedrock. They could select from various foundation models and start focusing on prompt engineering instead of spending time on right-sizing, provisioning, deploying, and configuring resources to run the models.

Solution overview

Sherwin Chu, Alida’s Chief Architect, shared Alida’s microservices architecture approach. Alida built the topic and sentiment classification as a service, with survey response analysis as its first application. With this approach, common LLM implementation challenges such as the complexity of managing prompts, token limits, request constraints, and retries are abstracted away, and consuming applications get a simple and stable API to work with. This abstraction layer also enables the service owners to continually improve internal implementation details and minimize API-breaking changes. Finally, the service approach provides a single point at which to implement any data governance and security policies that evolve as AI governance matures in the organization.

The following diagram illustrates the solution architecture and flow.

Alida microservice architecture

Alida evaluated LLMs from various providers, and found Anthropic’s Claude Instant to be the right balance between cost and performance. Working closely with the prompt engineering team, Chu advocated to implement a prompt chaining strategy as opposed to a single monolith prompt approach.

Prompt chaining enables you to do the following:

  • Break down your objective into smaller, logical steps
  • Build a prompt for each step
  • Provide the prompts sequentially to the LLM

This creates additional points of inspection, which has the following benefits:

  • It’s straightforward to systematically evaluate changes you make to the input prompt
  • You can implement more detailed tracking and monitoring of the accuracy and performance at each step

Key considerations with this strategy include the increase in the number of requests made to the LLM and the resulting increase in the overall time it takes to complete the objective. To offset these effects, Alida chose to batch a collection of open-ended responses in a single prompt to the LLM, as shown in the sketch below.
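
The following is a minimal sketch of this batched prompt-chaining pattern with Claude Instant on Amazon Bedrock; the topic list, prompts, and helper function are illustrative assumptions, not Alida’s production prompts:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def ask_claude(prompt: str) -> str:
    """Send one prompt to Claude Instant and return the completion text."""
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 500,
        "temperature": 0,
    })
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-instant-v1", body=body
    )
    return json.loads(response["body"].read())["completion"]

# Batch several open-ended responses into each request to offset the
# extra round trips that chaining introduces.
responses = [
    "I almost exclusively order my drinks through the app...",
    "The app works pretty good, the only complaint I have is...",
]
batch = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(responses))

# Step 1: assign each response to one of a fixed set of topics.
topics = ask_claude(
    "Assign each numbered survey response to exactly one topic from this "
    "list: Mobile Ordering Convenience, Mobile Order Fulfillment Speed, "
    f"Rewards.\n\n{batch}"
)

# Step 2: feed step 1's output back to the model and ask for sentiment.
sentiment = ask_claude(
    "For each topic assignment below, label the sentiment as positive, "
    f"negative, or neutral.\n\n{topics}"
)
print(sentiment)

Each step’s output can be logged and evaluated independently, which is what makes the inspection points described above possible.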

NLP vs. LLM

Alida’s existing NLP solution relies on clustering algorithms and statistical classification to analyze open-ended survey responses. When applied to sample feedback for a coffee shop’s mobile app, it extracted topics based on word patterns but lacked true comprehension. The following table includes some examples comparing NLP responses vs. LLM responses.

| Survey Response | Existing Traditional NLP: Topics | Amazon Bedrock with Claude Instant: Topic | Amazon Bedrock with Claude Instant: Sentiment |
|---|---|---|---|
| I almost exclusively order my drinks through the app bc of convenience and it’s less embarrassing to order super customized drinks lol. And I love earning rewards! | [‘app bc convenience’, ‘drink’, ‘reward’] | Mobile Ordering Convenience | positive |
| The app works pretty good the only complaint I have is that I can’t add Any number of money that I want to my gift card. Why does it specifically have to be $10 to refill?! | [‘complaint’, ‘app’, ‘gift card’, ‘number money’] | Mobile Order Fulfillment Speed | negative |

The example results show how the existing solution was able to extract relevant keywords, but wasn’t able to achieve a more generalized topic group assignment.

In contrast, using Amazon Bedrock and Anthropic Claude Instant, the LLM with in-context training is able to assign the responses to pre-defined topics and assign sentiment.

In addition to delivering better answers for Alida’s customers, for this particular use case, pursuing a solution using an LLM over traditional NLP methods saved a vast amount of time and effort in training and maintaining a suitable model. The following table compares training a traditional NLP model vs. in-context training of an LLM.

| | Data Requirement | Training Process | Model Adaptability |
|---|---|---|---|
| Training a traditional NLP model | Thousands of human-labeled examples | Combination of automated and manual feature engineering; iterative train and evaluate cycles | Slower turnaround due to the need to retrain the model |
| In-context training of an LLM | Several examples | Trained on the fly within the prompt; limited by context window size | Faster iterations by modifying the prompt; limited retention due to context window size |
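
To illustrate the difference in data requirements, an in-context (few-shot) prompt embeds a handful of labeled examples directly in the prompt; the examples below are invented for illustration:

# A few labeled examples "train" the model on the fly within the prompt;
# no model weights change, and capacity is bounded by the context window.
FEW_SHOT_PROMPT = """\
Classify each survey response into a topic and a sentiment.

Response: "I love earning rewards with every order!"
Topic: Rewards | Sentiment: positive

Response: "Why does the gift card refill have to be exactly $10?"
Topic: Mobile Order Fulfillment Speed | Sentiment: negative

Response: "{new_response}"
Topic:"""

At inference time, {new_response} is replaced with the unlabeled survey response and the model continues the established pattern.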

Conclusion

Alida’s use of Anthropic’s Claude Instant model on Amazon Bedrock demonstrates the powerful capabilities of LLMs for analyzing open-ended survey responses. Alida was able to build a superior service that was 4-6 times more precise at topic analysis when compared to their NLP-powered service. Additionally, using in-context prompt engineering for LLMs significantly reduced development time, because they didn’t need to curate thousands of human-labeled data points to train a traditional NLP model. This ultimately allows Alida to give their customers richer insights sooner!

If you’re ready to start building your own foundation model innovation with Amazon Bedrock, check out Set up Amazon Bedrock. If you’re interested in reading about other intriguing Amazon Bedrock applications, see the Amazon Bedrock specific section of the AWS Machine Learning Blog.


About the authors

Kinman Lam is an ISV/DNB Solution Architect for AWS. He has 17 years of experience in building and growing technology companies in the smartphone, geolocation, IoT, and open source software space. At AWS, he uses his experience to help companies build robust infrastructure to meet the increasing demands of growing businesses, launch new products and services, enter new markets, and delight their customers.

Sherwin Chu is the Chief Architect at Alida, helping product teams with architectural direction, technology choice, and complex problem-solving. He is an experienced software engineer, architect, and leader with over 20 years in the SaaS space for various industries. He has built and managed numerous B2B and B2C systems on AWS and GCP.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML and generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, AWS’ flagship generative AI offering for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS certifications, including the ML Specialty Certification.

Read More

Unlocking Innovation: AWS and Anthropic push the boundaries of generative AI together

Amazon Bedrock is the best place to build and scale generative AI applications with large language models (LLMs) and other foundation models (FMs). It enables customers to leverage a variety of high-performing FMs, such as the Claude family of models by Anthropic, to build custom generative AI applications. Looking back to 2021, when Anthropic first started building on AWS, no one could have envisioned how transformative the Claude family of models would be. We have been making state-of-the-art generative AI models accessible and usable for businesses of all sizes through Amazon Bedrock. In just a few short months since Amazon Bedrock became generally available on September 28, 2023, more than 10,000 customers have been using it, and many of them are using Claude. Customers such as ADP, Broadridge, Cloudera, Dana-Farber Cancer Institute, Genesys, Genomics England, GoDaddy, Intuit, M1 Finance, Perplexity AI, Proto Hologram, Rocket Companies and more are using Anthropic’s Claude models on Amazon Bedrock to drive innovation in generative AI and to build transformative customer experiences. And today, we are announcing an exciting milestone with the next generation of Claude coming to Amazon Bedrock: Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku.

Introducing Anthropic’s Claude 3 models

Anthropic is unveiling its next generation of Claude with three advanced models optimized for different use cases. Haiku is the fastest and most cost-effective model on the market, a compact model built for near-instant responsiveness. For the vast majority of workloads, Sonnet is 2x faster than Claude 2 and Claude 2.1, with higher levels of intelligence. It excels at intelligent tasks demanding rapid responses, like knowledge retrieval or sales automation, and strikes the ideal balance between intelligence and speed, qualities especially critical for enterprise use cases. Opus is the most advanced, capable, state-of-the-art FM, with deep reasoning, advanced math, and coding abilities and top-level performance on highly complex tasks. It can navigate open-ended prompts and novel scenarios with remarkable fluency, including task automation, hypothesis generation, and analysis of charts, graphs, and forecasts. Sonnet is the first of the three models available on Amazon Bedrock today. Current evaluations from Anthropic suggest that the Claude 3 model family outperforms comparable models on the math word problem solving (MATH) and multilingual math (MGSM) benchmarks, critical benchmarks used today for LLMs.

  1. Vision capabilities – Claude 3 models have been trained to understand structured and unstructured data across different formats, not just language, but also images, charts, diagrams, and more. This lets businesses build generative AI applications integrating diverse multimedia sources and solving truly cross-domain problems. For instance, pharmaceutical companies can query drug research papers alongside protein structure diagrams to accelerate discovery. Media organizations can generate image captions or video scripts automatically.
  2. Best-in-class benchmarks – Claude 3 exceeds existing models on standardized evaluations such as math problems, programming exercises, and scientific reasoning. Customers can optimize domain specific experimental procedures in manufacturing, or audit financial reports based on contextual data, in an automated way and with high accuracy using AI-driven responses.

    Specifically, Opus outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and more. It exhibits high levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.

  3. Reduced hallucination – Businesses require predictive, controllable outputs from AI systems directing automated processes or customer interactions. Claude 3 models mitigate hallucination through constitutional AI techniques that provide transparency into the model’s reasoning, as well as improve accuracy. Claude 3 Opus shows an estimated 2x gain in accuracy over Claude 2.1 on difficult open-ended questions, reducing the likelihood of faulty responses. As enterprise customers rely on Claude across industries like healthcare, finance, and legal research, reducing hallucinations is essential for safety and performance. The Claude 3 family sets a new standard for reliable generative AI output.

Benefits of Anthropic Claude 3 FMs on Amazon Bedrock

Through Amazon Bedrock, customers will get easy access to build with Anthropic’s newest models. This includes not only natural language models but also their expanded range of multimodal AI models capable of advanced reasoning across text, images, charts, and more. Our collaboration has already helped customers accelerate generative AI adoption and delivered business value to them. Here are a few ways customers have been using Anthropic’s Claude models on Amazon Bedrock:

“We are developing a generative AI solution on AWS to help customers plan epic trips and create life-changing experiences with personalized travel itineraries. By building with Claude on Amazon Bedrock, we reduced itinerary generation costs by nearly 80% when we quickly created a scalable, secure AI platform that can organize our book content in minutes to deliver cohesive, highly accurate travel recommendations. Now we can repackage and personalize our content in various ways on our digital platforms, based on customer preference, all while highlighting trusted local voices–just like Lonely Planet has done for 50 years.”

— Chris Whyde, Senior VP of Engineering and Data Science, Lonely Planet

“We are working with AWS and Anthropic to host our custom, fine-tuned Anthropic Claude model on Amazon Bedrock to support our strategy of rapidly delivering generative AI solutions at scale and with cutting-edge encryption, data privacy, and safe AI technology embedded in everything we do. Our new Lexis+ AI platform technology features conversational search, insightful summarization, and intelligent legal drafting capabilities, which enable lawyers to increase their efficiency, effectiveness, and productivity.”

— Jeff Reihl, Executive VP and CTO, LexisNexis Legal & Professional

“At Broadridge, we have been working to automate the understanding of regulatory reporting requirements to create greater transparency and increase efficiency for our customers operating in domestic and global financial markets. With use of Claude on Amazon Bedrock, we’re thrilled to get even higher accuracy in our experiments with processing and summarizing capabilities. With Amazon Bedrock, we have choice in our use of LLMs, and we value the performance and integration capabilities it offers.”

— Saumin Patel, VP Engineering generative AI, Broadridge

The Claude 3 model family caters to various needs, allowing customers to choose the model best suited for their specific use case, which is key to developing a successful prototype and later production systems that can deliver real impact—whether for a new product, feature or process that boosts the bottom line. Keeping customer needs top of mind, Anthropic and AWS are delivering where it matters most to organizations of all sizes:

  1. Improved performance – Claude 3 models are significantly faster for real-time interactions thanks to optimizations across hardware and software.
  2. Increased accuracy and reliability – Through massive scaling as well as new self-supervision techniques, expected gains of 2x in accuracy for complex questions over long contexts mean AI that’s even more helpful, safe, and honest.
  3. Simpler and secure customization – Customization capabilities, like retrieval-augmented generation (RAG), simplify training models on proprietary data and building applications backed by diverse data sources, so customers get AI tuned for their unique needs. In addition, proprietary data is never exposed to the public internet, never leaves the AWS network, is securely transferred through VPC, and is encrypted in transit and at rest.

And AWS and Anthropic are continuously reaffirming our commitment to advancing generative AI in a responsible manner. By constantly improving model capabilities and committing to frameworks like Constitutional AI and the White House voluntary commitments on AI, we can accelerate the safe, ethical development and deployment of this transformative technology.

The future of generative AI

Looking ahead, customers will build entirely new categories of generative AI-powered applications and experiences with the latest generation of models. We’ve only begun to tap generative AI’s potential to automate complex processes, augment human expertise, and reshape digital experiences. We expect to see unprecedented levels of innovation as customers choose Anthropic’s models augmented with multimodal skills leveraging all the tools they need to build and scale generative AI applications on Amazon Bedrock. Imagine sophisticated conversational assistants providing fast and highly-contextual responses, picture personalized recommendation engines that seamlessly blend in relevant images, diagrams and associated knowledge to intuitively guide decisions. Envision scientific research turbocharged by generative AI able to read experiments, synthesize hypotheses, and even propose novel areas for exploration. There are so many possibilities that will be realized by taking full advantage of all generative AI has to offer through Amazon Bedrock. Our collaboration ensures enterprises and innovators worldwide will have the tools to reach the next frontier of generative AI-powered innovation responsibly, and for the benefit of all.

Conclusion

It’s still early days for generative AI, but strong collaboration and a focus on innovation are ushering in a new era of generative AI on AWS. We can’t wait to see what customers build next.

Resources

Check out the following resources to learn more about this announcement:


About the author

Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, visualize, and predict.

Read More

Knowledge Bases for Amazon Bedrock now supports hybrid search

At AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for fully managed Retrieval Augmented Generation (RAG).

In a previous post, we described how Knowledge Bases for Amazon Bedrock manages the end-to-end RAG workflow for you and shared details about some of the recent feature launches.

For RAG-based applications, the accuracy of the generated response from large language models (LLMs) is dependent on the context provided to the model. Context is retrieved from the vector database based on the user query. Semantic search is widely used because it is able to understand more human-like questions—a user’s query is not always directly related to the exact keywords in the content that answers it. Semantic search helps provide answers based on the meaning of the text. However, it has limitations in capturing all the relevant keywords, and its performance relies on the quality of the word embeddings used to represent the meaning of the text. To overcome such limitations, combining semantic search with keyword search (hybrid search) gives better results.

In this post, we discuss the new feature of hybrid search, which you can select as a query option alongside semantic search.

Hybrid search overview

Hybrid search takes advantage of the strengths of multiple search algorithms, integrating their unique capabilities to enhance the relevance of returned search results. For RAG-based applications, semantic search capabilities are commonly combined with traditional keyword-based search to improve the relevance of search results. It enables searching over both the content of documents and their underlying meaning. For example, consider the following query:

What is the cost of the book "<book_name>" on <website_name>?

In this query for a book name and website name, a keyword search will give better results, because we want the cost of the specific book. However, the term “cost” might have synonyms such as “price,” so it’s also helpful to use semantic search, which understands the meaning of the text. Hybrid search brings the best of both approaches: the precision of semantic search and the coverage of keywords. It works great for RAG-based applications where the retriever has to handle a wide variety of natural language queries. The keywords help cover specific entities in the query such as product name, color, and price, while semantic search better captures the meaning and intent within the query. For example, if you want to build a chatbot for an ecommerce website to handle customer queries such as the return policy or details of the product, hybrid search will be most suitable.

Use cases for hybrid search

The following are some common use cases for hybrid search:

  • Open domain question answering – This involves answering questions on a wide variety of topics. This requires searching over large collections of documents with diverse content, such as website data, which can include various topics such as sustainability, leadership, financial results, and more. Semantic search alone can’t generalize well for this task, because it lacks the capacity for lexical matching of unseen entities, which is important for handling out-of-domain examples. Therefore, combining keyword-based search with semantic search can help narrow down the scope and provide better results for open domain question answering.
  • Contextual-based chatbots – Conversations can rapidly change direction and cover unpredictable topics. Hybrid search can better handle such open-ended dialogs.
  • Personalized search – Web-scale search over heterogeneous content benefits from a hybrid approach. Semantic search handles popular head queries, while keywords cover rare long-tail queries.

Although hybrid search offers wider coverage by combining two approaches, semantic search has precision advantages when the domain is narrow and semantics are well-defined, or when there is little room for misinterpretation, like factoid question answering systems.

Benefits of hybrid search

Both keyword and semantic search will return a separate set of results along with their relevancy scores, which are then combined to return the most relevant results. Knowledge Bases for Amazon Bedrock currently supports four vector stores: Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL-Compatible Edition, Pinecone, and Redis Enterprise Cloud. As of this writing, the hybrid search feature is available for OpenSearch Serverless, with support for other vector stores coming soon.
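
Knowledge Bases performs this combination for you, but as a rough illustration of the general idea, a hybrid retriever might merge the two ranked lists along the following lines (a simplified sketch, not Amazon Bedrock’s actual algorithm):

def hybrid_merge(semantic_hits, keyword_hits, k=5, alpha=0.5):
    """Naive score fusion: min-max normalize each result set's relevancy
    scores, then blend them. Each argument maps doc_id -> score."""
    def normalize(hits):
        if not hits:
            return {}
        lo, hi = min(hits.values()), max(hits.values())
        span = (hi - lo) or 1.0
        return {doc: (score - lo) / span for doc, score in hits.items()}

    sem, kw = normalize(semantic_hits), normalize(keyword_hits)
    combined = {
        doc: alpha * sem.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(sem) | set(kw)
    }
    # Return the k documents with the highest blended scores
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)[:k]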

The following are some of the benefits of using hybrid search:

  • Improved accuracy – The accuracy of the generated response from the FM is directly dependent on the relevancy of retrieved results. Based on your data, it can be challenging to improve the accuracy of your application only using semantic search. The key benefit of using hybrid search is to get improved quality of retrieved results, which in turn helps the FM generate more accurate answers.
  • Expanded search capabilities – Keyword search casts a wider net and finds documents that may be relevant but might not contain semantic structure throughout the document. It allows you to search on keywords as well as the semantic meaning of the text, thereby expanding the search capabilities.

In the following sections, we demonstrate how to use hybrid search with Knowledge Bases for Amazon Bedrock.

Use hybrid search and semantic search options via SDK

When you call the Retrieve API, Knowledge Bases for Amazon Bedrock selects the right search strategy to give you the most relevant results. You have the option to override it and use either hybrid or semantic search in the API.

Retrieve API

The Retrieve API is designed to fetch relevant search results by providing the user query, knowledge base ID, and number of results that you want the API to return. This API converts user queries into embeddings, searches the knowledge base using either hybrid search or semantic (vector) search, and returns the relevant results, giving you more control to build custom workflows on top of the search results. For example, you can add postprocessing logic to the retrieved results or add your own prompt and connect with any FM provided by Amazon Bedrock for generating answers.

To show you an example of switching between hybrid and semantic (vector) search options, we have created a knowledge base using the Amazon 10K document for 2023. For more details on creating a knowledge base, refer to Build a contextual chatbot application using Knowledge Bases for Amazon Bedrock.

To demonstrate the value of hybrid search, we use the following query:

As of December 31st 2023, what is the leased square footage for physical stores in North America?

The answer for the preceding query involves a few keywords, such as the date, physical stores, and North America. The correct response is 22,871 thousand square feet. Let’s observe the difference in the search results for both hybrid and semantic search.

The following code shows how to use hybrid or semantic (vector) search using the Retrieve API with Boto3:

import boto3

bedrock_agent_runtime = boto3.client(
    service_name = "bedrock-agent-runtime"
)

def retrieve(query, kbId, numberOfResults=5):
    return bedrock_agent_runtime.retrieve(
        retrievalQuery= {
            'text': query
        },
        knowledgeBaseId=kbId,
        retrievalConfiguration= {
            'vectorSearchConfiguration': {
                'numberOfResults': numberOfResults,
                'overrideSearchType': "HYBRID", # optional; or "SEMANTIC"
            }
        }
    )
response = retrieve("As of December 31st 2023, what is the leased square footage for physical stores in North America?", "<knowledge base id>")["retrievalResults"]

The overrideSearchType option in retrievalConfiguration offers the choice to use either HYBRID or SEMANTIC. By default, the service selects the right strategy to give you the most relevant results; if you want to override the default, you can set the value to HYBRID or SEMANTIC. The output of the Retrieve API includes the retrieved text chunks, the location type and URI of the source data, and the relevancy scores of the retrievals. The scores help determine which chunks best match the query.

The following are the results for the preceding query using hybrid search (with some of the output redacted for brevity):

[
  {
    "content": {
      "text": "... Description of Use Leased Square Footage (1).... Physical stores (2) 22,871  ..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.6389407
  },
  {
    "content": {
      "text": "Property and equipment, net by segment is as follows (in millions): December 31, 2021 2022 2023 North America $ 83,640 $ 90,076 $ 93,632 International 21,718 23,347 24,357 AWS 43,245 60,324 72,701 Corporate 1.."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.6389407
  },
  {
    "content": {
      "text": "..amortization of property and equipment acquired under finance leases of $9.9 billion, $6.1 billion, and $5.9 billion for 2021, 2022, and 2023. 54 Table of Contents Note 4 — LEASES We have entered into non-cancellable operating and finance leases for fulfillment network, data center, office, and physical store facilities as well as server and networking equipment, aircraft, and vehicles. Gross assets acquired under finance leases, ..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.61908984
  }
]

The following are the results for semantic search (with some of the output redacted for brevity):

[
  {
    "content": {
      "text": "Property and equipment, net by segment is as follows (in millions):    December 31,    2021 2022 2023   North America $ 83,640 $ 90,076 $ 93,632  International 21,718 23,347 24,357  AWS 43,245 60,324 72,701.."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.6389407
  },
  {
    "content": {
      "text": "Depreciation and amortization expense on property and equipment was $22.9 billion, $24.9 billion, and $30.2 billion which includes amortization of property and equipment acquired under finance leases of $9.9 billion, $6.1 billion, and $5.9 billion for 2021, 2022, and 2023.   54        Table of Contents   Note 4 — LEASES We have entered into non-cancellable operating and finance leases for fulfillment network, data center, office, and physical store facilities as well a..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.61908984
  },
  {
    "content": {
      "text": "Incentives that we receive from property and equipment   vendors are recorded as a reduction to our costs. Property includes buildings and land that we own, along with property we have acquired under build-to-suit lease arrangements when we have control over the building during the construction period and finance lease arrangements..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.61353767
  }
]

As you can see in the results, hybrid search was able to retrieve the search result with the leased square footage for physical stores in North America mentioned in the user query. The main reason is that hybrid search combined the results from keywords in the query, such as the date, physical stores, and North America, whereas semantic search did not. Therefore, when the search results are augmented with the user query and the prompt, the FM can’t provide the correct response in the case of semantic search.
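
Because the Retrieve API returns the raw chunks, you can also assemble the final answer yourself instead of using the managed RetrieveAndGenerate flow shown next. The following is a sketch of that custom workflow, reusing the retrieve() helper defined earlier (the prompt wording is an assumption):

import json
import boto3

def answer_with_context(query, kb_id):
    # Fetch the most relevant chunks with the retrieve() helper from above
    chunks = retrieve(query, kb_id)["retrievalResults"]
    context = "\n\n".join(chunk["content"]["text"] for chunk in chunks)

    # Augment the user query with the retrieved context and let the FM answer
    body = json.dumps({
        "prompt": f"\n\nHuman: Use only the following context to answer.\n"
                  f"Context:\n{context}\n\nQuestion: {query}\n\nAssistant:",
        "max_tokens_to_sample": 300,
    })
    bedrock_runtime = boto3.client("bedrock-runtime")
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-instant-v1", body=body
    )
    return json.loads(response["body"].read())["completion"]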

Now let’s look at the RetrieveAndGenerate API with hybrid search to understand the final response generated by the FM.

RetrieveAndGenerate API

The RetrieveAndGenerate API queries a knowledge base and generates a response based on the retrieved results. You specify the knowledge base ID as well as the FM to generate a response from the results. Amazon Bedrock converts the queries into embeddings, queries the knowledge base based on the search type, and then augments the FM prompt with the search results as context information and returns the FM-generated response.

Let’s use the query “As of December 31st 2023, what is the leased square footage for physical stores in North America?” and ask the RetrieveAndGenerate API to generate the response using our query:

def retrieveAndGenerate(input, kbId):
    return bedrock_agent_runtime.retrieve_and_generate(
        input={
            'text': input
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1',
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'overrideSearchType': 'HYBRID', # optional; or 'SEMANTIC'
                    }
                }
            }
        }
    )
response = retrieveAndGenerate("As of December 31st 2023, what is the leased square footage for physical stores in North America?", "<knowledge base id>")["output"]["text"]

The following are the results using hybrid search:

22,871 thousand leased square feet

The following are the results using semantic search:

The search results do not contain any information about the leased square footage for physical stores in North America for 2023.

The actual answer for the query is 22,871 thousand leased square feet, which hybrid search produced. The retrieved search results for hybrid search included the information about the leased square footage for physical stores in North America, whereas semantic search wasn’t able to fetch the right information from the vector store because embedding similarity alone didn’t surface the relevant chunk. Therefore, the FM couldn’t provide the correct response because it didn’t have the correct and most relevant search results.

However, for more generic questions that don’t involve entities such as physical stores or North America, both hybrid and semantic search give similar results.

The following are sample responses from a few queries demonstrating cases when both hybrid and semantic search yield similar results.

| Question | Semantic Search: RAG API | Hybrid Search: RAG API |
|---|---|---|
| How does Amazon serve the developers and enterprises? | We serve developers and enterprises of all sizes, including start-ups, government agencies, and academic institutions, through AWS, which offers a broad set of on-demand technology services, including compute, storage, database, analytics, and machine learning, and other services. | We serve developers and enterprises of all sizes, including start-ups, government agencies, and academic institutions, through AWS, which offers a broad set of on-demand technology services, including compute, storage, database, analytics, and machine learning, and other services. |
| Who are the Executive Officers and Directors for Amazon as of January 24, 2024? | The executive officers of Amazon as of 2024 include Andrew R. Jassy as President and Chief Executive Officer, Douglas J. Herrington as CEO Worldwide Amazon Stores, Brian T. Olsavsky as Senior Vice President and Chief Financial Officer, Shelley L. Reynolds as Vice President, Worldwide Controller, and Principal Accounting Officer, Adam N. Selipsky as CEO Amazon Web Services, and David A. Zapolsky as Senior Vice President, Global Public Policy and General Counsel. | As of 2024, Jeffrey P. Bezos serves as Executive Chair of Amazon.com. Andrew R. Jassy serves as President and Chief Executive Officer. Other executive officers include Douglas J. Herrington as CEO Worldwide Amazon Stores, Brian T. Olsavsky as Senior Vice President and Chief Financial Officer, Shelley L. Reynolds as Vice President, Worldwide Controller, and Principal Accounting Officer, and Adam N. Selipsky as CEO Amazon Web Services. David A. Zapolsky serves as Senior Vice President, Global Public Policy and General Counsel. |

Use hybrid search and semantic search options via the Amazon Bedrock console

To use hybrid and semantic search options on the Amazon Bedrock console, complete the following steps:

  1. On the Amazon Bedrock console, choose Knowledge base in the navigation pane.
  2. Choose the knowledge base you created.
  3. Choose Test knowledge base.
  4. Choose the configurations icon.
  5. For Search type, select Hybrid search (semantic & text).

By default, you can choose an FM to get a generated response for your query. If you want to see only the retrieved results, you can toggle Generate response off to get only retrieved results.

Conclusion

In this post, we covered the new query feature in Knowledge Bases for Amazon Bedrock, which enables hybrid search. We learned how to configure the hybrid search option in the SDK and the Amazon Bedrock console. This helps overcome some of the limitations of relying solely on semantic search, especially for searching over large collections of documents with diverse content. The use of hybrid search depends on the document type and the use case that you are trying to implement.

For additional resources, refer to the following:

References

Improving Retrieval Performance in RAG Pipelines with Hybrid Search


About the Authors

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for Women in Manufacturing Education Foundation Board. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Pallavi Nargund is a Principal Solutions Architect at AWS. In her role as a cloud technology enabler, she works with customers to understand their goals and challenges, and give prescriptive guidance to achieve their objective with AWS offerings. She is passionate about women in technology and is a core member of Women in AI/ML at Amazon. She speaks at internal and external conferences such as AWS re:Invent, AWS Summits, and webinars. Outside of work she enjoys volunteering, gardening, cycling and hiking.

Read More

Expedite your Genesys Cloud Amazon Lex bot design with the Amazon Lex automated chatbot designer

The rise of artificial intelligence (AI) has created opportunities to improve the customer experience in the contact center space. Machine learning (ML) technologies continually improve and power the contact center customer experience by providing solutions for capabilities like self-service bots, live call analytics, and post-call analytics. Self-service bots integrated with your call center can help you achieve decreased wait times, intelligent routing, decreased time to resolution through self-service functions or data collection, and improved net promoter scores (NPS). Some examples include a customer calling to check on the status of an order and receiving an update from a bot, or a customer needing to submit a renewal for a license and the chatbot collecting the necessary information, which it hands over to an agent for processing.

With Amazon Lex bots, you can use conversational AI capabilities to enable these capabilities within your call center. Amazon Lex uses automatic speech recognition (ASR) and natural language understanding (NLU) to understand the customer’s needs and assist them on their journey.

Genesys Cloud (an omni-channel orchestration and customer relationship platform) provides a contact center platform in a public cloud model that enables quick and simple integration of AWS Contact Center Intelligence (AWS CCI) to transform the modern contact center from a cost center into a profit center. As part of AWS CCI, Genesys Cloud integrates with Amazon Lex, which enables self-service, intelligent routing, and data collection capabilities.

When exploring AWS CCI capabilities with Amazon Lex and Genesys Cloud, you may be unsure of where to start on your bot design journey. To assist those who may be starting with a blank canvas, Amazon Lex provides the Amazon Lex automated chatbot designer. The automated chatbot designer uses ML to provide an initial bot design that you can then refine and launch conversational experiences faster based on your current call transcripts. With the automated chatbot designer, Amazon Lex customers and partners have a straightforward and intuitive way of designing chatbots and can reduce bot design time from weeks to hours. However, the automated chatbot designer requires transcripts to be in a certain format that is not aligned to Genesys Cloud transcript exports.

In this post, we show how you can implement an architecture using Amazon EventBridge, Amazon Simple Storage Service (Amazon S3), and AWS Lambda to automatically collect, transform, and load your Genesys call transcripts in the required format for the Amazon Lex automated chatbot designer. You can then run the automated chatbot designer on your transcripts, be given recommendations for bot design, and streamline your bot design journey.

Solution overview

The following diagram illustrates the solution architecture.

The solution workflow consists of the following steps:

  1. Genesys Cloud sends iterative transcript events to your EventBridge event bus.
  2. Lambda receives the iterative transcripts from EventBridge, determines when a conversation is complete, invokes the Transcript API in Genesys Cloud, and drops the full transcript in an S3 bucket.
  3. When a new full transcript is uploaded to Amazon S3, Lambda converts the Genesys Cloud formatted transcript into the required format for the Amazon Lex automated chatbot designer and copies it to an S3 bucket (see the sketch after this list).
  4. The Amazon Lex automated chatbot designer uses ML to build an initial bot design based on the provided Genesys Cloud transcripts.
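
The transformation in step 3 might look like the following Lambda handler sketch. The Genesys-side field names (transcripts, participantPurpose, text) and the environment variable are assumptions for illustration; the target shape follows the Contact Lens-style transcript JSON that the Amazon Lex automated chatbot designer expects. Treat this as a starting point rather than the deployed code:

import json
import os
import urllib.parse
import uuid

import boto3

s3 = boto3.client("s3")
TARGET_BUCKET = os.environ["TRANSFORMED_BUCKET"]  # assumed env var

def handler(event, context):
    # Triggered by S3 when a full Genesys transcript lands in the raw bucket
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])
    genesys = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

    # Map Genesys utterances onto the Contact Lens-style shape. The field
    # names on the Genesys side are assumptions for this sketch.
    transcript = [
        {
            "ParticipantId": "AGENT" if u.get("participantPurpose") == "agent" else "CUSTOMER",
            "Id": str(uuid.uuid4()),
            "Content": u.get("text", ""),
        }
        for u in genesys.get("transcripts", [])
    ]
    output = {
        "Participants": [
            {"ParticipantId": "AGENT", "ParticipantRole": "AGENT"},
            {"ParticipantId": "CUSTOMER", "ParticipantRole": "CUSTOMER"},
        ],
        "Version": "1.1.0",
        "ContentMetadata": {"RedactionTypes": ["PII"], "Output": "Raw"},
        "CustomerMetadata": {"ContactId": key},
        "Transcript": transcript,
    }
    s3.put_object(
        Bucket=TARGET_BUCKET,
        Key=f"transformed/{key}.json",
        Body=json.dumps(output),
    )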

Prerequisites

Before you deploy the solution, you must complete the following prerequisites:

  1. Set up your Genesys Cloud CX account and make sure that you are able to log in. For more information on setting up your account, refer to the Genesys documentation.
  2. Make sure that the right permissions are set for enabling and publishing transcripts from Genesys. For more information on setting up the required permissions, refer to Roles and permissions overview.
  3. If PCI and PII encryption is required for transcription, make sure it is set up in Genesys. For more information on setting up the required permissions, refer to Are interaction transcripts encrypted when stored in the cloud.
  4. Set up an AWS account with the appropriate permissions.

Deploy the Genesys EventBridge integration

To enable the EventBridge integration with Genesys Cloud, complete the following steps:

  1. Log in to the Genesys Cloud environment.
  2. Choose Admin, Integrations, Add Integrations, and Amazon EventBridge Source.
  3. On the Configuration tab, provide the following information:
    1. For AWS Account ID, enter your AWS account ID.
    2. For AWS Account Region, enter the Region where you want EventBridge to be set up.
    3. For Event Source Suffix, enter a suffix (for example, genesys-eb-poc-demo).
  4. Save your configuration.
  5. On the EventBridge console, choose Integration in the navigation pane, then choose Partner event sources.

There should be an event source listed with a name like aws.partner/genesys.com/…/genesys-eb-poc-demo.

  1. Select the partner event source and choose Associate with event bus.

The status changes from Pending to Active. This sets up the EventBridge configuration for Genesys.

Next, you set up OAuth2 credentials in Genesys Cloud for authorizing the API call to get the final transcript.

  1. Navigate to the Genesys Cloud instance.
  2. Choose Admin, Integrations, and OAuth.
  3. Choose Add Client.
  4. On the Client Details tab, provide the following information:
    1. For App Name, enter a name (for example, TranscriptInvoke-creds).
    2. For Grant Types, select Client Credentials.

Make sure you’re using the right role that has access to invoke the Transcript API.

  1. Choose Save.

This generates new values for Client ID and Client Secret. Copy these values to use in the next section, where you configure the template for the solution.

Deploy the solution

After you have set up the Genesys EventBridge integration, you can deploy an AWS Serverless Application Model (AWS SAM) template, which deploys the remainder of the architecture. To deploy the solution in your account, complete the following steps:

  1. Install AWS SAM if not installed already. For instructions, refer to Installing the AWS SAM CLI.
  2. Download the GitHub repo and unzip to your directory.
  3. Navigate to the genesys-to-lex-automated-chatbot-designer folder and run the following commands:
    sam build --use-container
    sam deploy --guided

The first command builds the source of your application. The second command packages and deploys your application to AWS, with a series of prompts:

  • Stack Name – Enter the name of the stack to deploy to AWS CloudFormation. This should be unique to your account and Region; a good starting point is something matching your project name.
  • AWS Region – Enter the Region you want to deploy your app to. Make sure it is deployed in the same Region as the EventBridge event bus.
  • Parameter GenesysBusname – Enter the bus name created when you configured the Genesys integration. The pattern of the bus name should look like aws.partner/genesys.com/*.
  • Parameter ClientId – Enter the client ID you copied earlier.
  • Parameter ClientSecret – Enter the client secret you copied earlier.
  • Parameter FileNamePrefix – Change the default file name prefix for the target transcript file in the raw S3 bucket or keep the default.
  • Parameter GenCloudEnv – Enter the cloud environment for the specific Genesys organization. Genesys is available in more than 15 Regions worldwide as of this writing, so this value is mandatory and should point to the environment where your organization is created in Genesys (for example, usw2.pure.cloud).
  • Confirm changes before deploy – If set to yes, any change sets will be shown to you before deployment for manual review. If set to no, the AWS SAM CLI will automatically deploy application changes.
  • Allow SAM CLI IAM role creation – Many AWS SAM templates, including this example, create AWS Identity and Access Management (IAM) roles required for the Lambda functions included to access AWS services. By default, these are scoped down to the minimum required permissions. To deploy a CloudFormation stack that creates or modifies IAM roles, you must provide the CAPABILITY_IAM value for capabilities. If permission isn’t provided through this prompt, to deploy this example, you must explicitly pass --capabilities CAPABILITY_IAM to the sam deploy command.
  • Save arguments to samconfig.toml – If set to yes, your choices will be saved to a configuration file inside the project, so that in the future you can rerun sam deploy without parameters to deploy changes to your application.

After you deploy your AWS SAM application in your account, you can test that Genesys transcripts are being sent to your account and being transformed into the required format for the Amazon Lex automated chatbot designer.

Make a test call to validate the solution

After you have set up the Genesys EventBridge integration and deployed the preceding AWS SAM template, you can make test calls and validate that files are ending up in the S3 bucket for transformed files. At a high level, you need to perform the following steps:

  1. Make a test call to your Genesys instance to create a transcript.
  2. Wait a few minutes and check the TransformedTranscript bucket for the output.

Run the automated chatbot designer

After you have a few days’ worth of transcripts saved in Amazon S3, you can run the automated chatbot designer through the Amazon Lex console using the steps in this section. For more information about the minimum and maximum amount of turns for the service, refer to Prepare transcripts.

  1. On the Amazon Lex V2 console, choose Bots in the navigation pane.
  2. Choose Create bot.
  3. Select Start with transcripts as the creation method.
  4. Give the bot a name (for this example, InsuranceBot) and provide an optional description.
  5. Select Create a role with basic Amazon Lex permissions and use this as your runtime role.
  6. After you fill out the other fields, choose Next to proceed to the language configuration.
  7. Choose the language and voice for your interaction.
  8. Specify the Amazon S3 location of the transcripts that the solution has converted for you.
  9. Add additional local paths if you have a specific folder structure within your S3 bucket.
  10. Apply a filter (date range) for your input transcripts.
  11. Choose Done.

You can use the status bar on the Amazon Lex console to track the analysis. Within a few hours, the automated chatbot designer surfaces a chatbot design that includes user intents, sample phrases associated with those intents, and a list of all the information required to fulfill them. The amount of time it takes to complete training depends on several factors, including the volume of transcripts and the complexity of the conversations. Typically, 600 lines of transcript are analyzed every minute.

  1. Choose Review to view the intents and slot types discovered by the automated chatbot designer.

The Intents tab lists all the intents along with sample phrases and slots, and the Slot types tab provides a list of all the slot types along with slot type values.

  1. Choose any of the intents to review the sample utterances and slots. For example, in the following screenshot, we choose ChangePassword to view the utterances.
  2. Choose the Associated transcripts tab to review the conversations used to identify the intents.
  3. After you review the results, select the intents and slot types relevant to your use case and choose Add.

This adds the selected intents and slot types to the bot. You can now iterate on this design by making changes such as adding prompts, merging intents or slot types, and renaming slots.

You have now used the Amazon Lex automated chatbot designer to identify common intents, utterances mapped to those intents, and information that the chatbot needs to collect to fulfill certain business functions.

Clean up

When you’re finished, clean up your resources by using the following command within the AWS SAM CLI:

sam delete

Conclusion

This post showed you how to use the Genesys Cloud CX and EventBridge integration to send your Genesys CX transcripts to your AWS account, transform them, and use them with the Amazon Lex automated chatbot designer to create sample bots, intents, utterances, and slots. This architecture can help first-time AWS CCI users and current AWS CCI users onboard more chatbots using the Genesys CX and Amazon Lex integration, or in continuous improvement opportunities where you may want to compare your current intent design to that outputted by the Amazon Lex automated chatbot designer. For more information about other AWS CCI capabilities, see Contact Center Intelligence.


About the Authors

Joe Morotti is a Solutions Architect at Amazon Web Services (AWS), helping Enterprise customers across the Midwest US. He has held a wide range of technical roles and enjoys showing customers the art of the possible. In his free time, he enjoys spending quality time with his family, exploring new places, and overanalyzing his sports team’s performance.

Anand Bose is a Senior Solutions Architect at Amazon Web Services, supporting ISV partners who build business applications on AWS. He is passionate about creating differentiated solutions that unlock customers for cloud adoption. Anand lives in Dallas, Texas and enjoys travelling.

Teri Ferris is responsible for architecting great customer experiences alongside business partners, leveraging Genesys technology solutions that enable Experience Orchestration for contact centers. In her role she advises on solution architecture, integrations, IVR, routing, reporting analytics, self-service, AI, outbound, mobile capabilities, omnichannel, social channels, digital, unified communications (UCaaS), and analytics and how they can streamline the customer experience. Before Genesys, she held senior leadership roles at Human Resources, Payroll, and Learning Management companies, including overseeing the Contact Center.

Read More

Use RAG for drug discovery with Knowledge Bases for Amazon Bedrock

Amazon Bedrock provides a broad range of models from Amazon and third-party providers, including Anthropic, AI21, Meta, Cohere, and Stability AI, and covers a wide range of use cases, including text and image generation, embedding, chat, high-level agents with reasoning and orchestration, and more. Knowledge Bases for Amazon Bedrock allows you to build performant and customized Retrieval Augmented Generation (RAG) applications on top of AWS and third-party vector stores using both AWS and third-party models. Knowledge Bases for Amazon Bedrock automates synchronization of your data with your vector store, including diffing the data when it’s updated, document loading, and chunking, as well as semantic embedding. It allows you to seamlessly customize your RAG prompts and retrieval strategies—we provide the source attribution, and we handle memory management automatically. Knowledge Bases is completely serverless, so you don’t need to manage any infrastructure, and when using Knowledge Bases, you’re only charged for the models, vector databases and storage you use.

RAG is a popular technique that combines the use of private data with large language models (LLMs). RAG starts with an initial step to retrieve relevant documents from a data store (most commonly a vector index) based on the user’s query. It then employs a language model to generate a response by considering both the retrieved documents and the original query.
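To make these two steps concrete, the following toy snippet sketches the flow with made-up three-dimensional vectors standing in for real embeddings; in practice, an embedding model produces the vectors and a vector index performs the similarity search at scale.

import numpy as np

# Hypothetical document store: each entry pairs a text snippet with a
# made-up "embedding" vector (real embeddings have hundreds of dimensions)
docs = {
    "doc-1": ("Lithium can cause nausea and thirst.", np.array([0.9, 0.1, 0.0])),
    "doc-2": ("The study lasts one year with monthly visits.", np.array([0.1, 0.8, 0.2])),
}
query_vector = np.array([0.85, 0.15, 0.05])  # stand-in for the embedded user query

def cosine(a, b):
    # Cosine similarity between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 1: retrieve the document most similar to the query
best_id = max(docs, key=lambda doc_id: cosine(docs[doc_id][1], query_vector))

# Step 2: combine the retrieved context with the original query in a prompt
prompt = f"Context: {docs[best_id][0]}\n\nQuestion: What are lithium's side effects?"
print(prompt)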

In this post, we demonstrate how to build a RAG workflow using Knowledge Bases for Amazon Bedrock for a drug discovery use case.

Overview of Knowledge Bases for Amazon Bedrock

Knowledge Bases for Amazon Bedrock supports a broad range of common file types, including .txt, .docx, .pdf, .csv, and more. To enable effective retrieval from private data, a common practice is to first split these documents into manageable chunks. Knowledge Bases has implemented a default chunking strategy that works well in most cases to allow you to get started faster. If you want more control, Knowledge Bases lets you control the chunking strategy through a set of preconfigured options. You can control the maximum token size and the amount of overlap to be created across chunks to provide coherent context to the embedding. Knowledge Bases for Amazon Bedrock manages the process of synchronizing data from your Amazon Simple Storage Service (Amazon S3) bucket, splits it into smaller chunks, generates vector embeddings, and stores the embeddings in a vector index. This process comes with intelligent diffing, throughput, and failure management.
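If you prefer to configure these chunking options programmatically rather than through the console, the following sketch shows how they might be supplied when creating a data source with the AWS SDK for Python (Boto3); the knowledge base ID, data source name, and bucket ARN are placeholders you would replace with your own values.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Attach an S3 data source to an existing knowledge base with fixed-size
# chunking; maxTokens and overlapPercentage mirror the console settings
response = bedrock_agent.create_data_source(
    knowledgeBaseId="<YOUR_KNOWLEDGE_BASE_ID>",
    name="clinical-trial-documents",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::<YOUR_BUCKET>"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 300,
                "overlapPercentage": 20,
            },
        }
    },
)
print(response["dataSource"]["dataSourceId"])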

At runtime, an embedding model is used to convert the user’s query to a vector. The vector index is then queried to find documents similar to the user’s query by comparing document vectors to the user query vector. In the final step, semantically similar documents retrieved from the vector index are added as context for the original user query. When generating a response for the user, the semantically similar documents are included in the prompt to the text model, together with source attribution for traceability.

Knowledge Bases for Amazon Bedrock supports multiple vector databases, including Amazon OpenSearch Serverless, Amazon Aurora, Pinecone, and Redis Enterprise Cloud. The Retrieve and RetrieveAndGenerate APIs allow your applications to directly query the index using a unified and standard syntax without having to learn separate APIs for each different vector database, reducing the need to write custom index queries against your vector store. The Retrieve API takes the incoming query, converts it into an embedding vector, and queries the backend store using the algorithms configured at the vector database level; the RetrieveAndGenerate API uses a user-configured LLM provided by Amazon Bedrock and generates the final answer in natural language. The native traceability support informs the requesting application about the sources used to answer a question. For enterprise implementations, Knowledge Bases supports AWS Key Management Service (AWS KMS) encryption, AWS CloudTrail integration, and more.

In the following sections, we demonstrate how to build a RAG workflow using Knowledge Bases for Amazon Bedrock, backed by the OpenSearch Serverless vector engine, to analyze an unstructured clinical trial dataset for a drug discovery use case. This data is information rich but can be vastly heterogeneous. Proper handling of specialized terminology and concepts in different formats is essential to detect insights and ensure analytical integrity. With Knowledge Bases for Amazon Bedrock, you can access detailed information through simple, natural queries.

Build a knowledge base for Amazon Bedrock

In this section, we demonstrate the process of creating a knowledge base for Amazon Bedrock via the console. Complete the following steps:

  1. On the Amazon Bedrock console, under Orchestration in the navigation pane, choose Knowledge base.
  2. Choose Create knowledge base.

  3. In the Knowledge base details section, enter a name and an optional description.
  4. In the IAM permissions section, select Create and use a new service role.
  5. For Service role name, enter a name for your role, which must start with AmazonBedrockExecutionRoleForKnowledgeBase_.
  6. Choose Next.

  7. In the Data source section, enter a name for your data source and the S3 URI where the dataset sits. Knowledge Bases supports the following file formats:
    • Plain text (.txt)
    • Markdown (.md)
    • HyperText Markup Language (.html)
    • Microsoft Word document (.doc/.docx)
    • Comma-separated values (.csv)
    • Microsoft Excel spreadsheet (.xls/.xlsx)
    • Portable Document Format (.pdf)
  8. Under Additional settings, choose your preferred chunking strategy (for this post, we choose Fixed size chunking) and specify the chunk size and overlap percentage. Alternatively, you can use the default settings.
  9. Choose Next.

  10. In the Embeddings model section, choose the Titan Embeddings model from Amazon Bedrock.
  11. In the Vector database section, select Quick create a new vector store, which manages the process of setting up a vector store.
  12. Choose Next.

  13. Review the settings and choose Create knowledge base.

  14. Wait for the knowledge base creation to complete and confirm its status is Ready.
  15. In the Data source section (or on the banner at the top of the page or the popup in the test window), choose Sync to load the data from the S3 bucket, split it into chunks of the size you specified, generate vector embeddings using the selected text embedding model, and store them in the vector store managed by Knowledge Bases for Amazon Bedrock.

The sync function supports ingesting, updating, and deleting the documents from the vector index based on changes to documents in Amazon S3. You can also use the StartIngestionJob API to trigger the sync via the AWS SDK.
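For example, the following minimal snippet triggers the same sync programmatically; the knowledge base and data source IDs are placeholders for your own resources.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Start an ingestion job, which syncs documents from Amazon S3 into the vector index
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="<YOUR_KNOWLEDGE_BASE_ID>",
    dataSourceId="<YOUR_DATA_SOURCE_ID>",
)
print(job["ingestionJob"]["status"])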

When the sync is complete, the Sync history shows status Completed.

Query the knowledge base

In this section, we demonstrate how to access detailed information in the knowledge base through straightforward and natural queries. We use an unstructured synthetic dataset consisting of PDF files of 10–100 pages each, simulating a clinical trial plan of a proposed new medicine, including statistical analysis methods and participant consent forms. We use the Knowledge Bases for Amazon Bedrock retrieve_and_generate and retrieve APIs with the Amazon Bedrock LangChain integration.

Before you can write scripts that use the Amazon Bedrock API, you’ll need to install the appropriate version of the AWS SDK in your environment. For Python scripts, this will be the AWS SDK for Python (Boto3):

pip install langchain
pip install boto3

Additionally, enable access to the Amazon Titan Embeddings model and Anthropic Claude v2 or v1. For more information, refer to Model access.

Generate questions using Amazon Bedrock

We can use Anthropic Claude 2.1 on Amazon Bedrock to propose a list of questions to ask about the clinical trial dataset:

import boto3
from langchain.llms.bedrock import Bedrock

bedrock_client = boto3.client("bedrock-runtime")

# Start with the query
prompt = "For medical research trial consent forms to sign, what are the top 5 questions that can be asked?"

claude_llm = Bedrock(
    model_id="anthropic.claude-v2:1",
    model_kwargs={"temperature": 0, "top_k": 10, "max_tokens_to_sample": 3000},
    client=bedrock_client,
)

# Provide the prompt to the LLM to generate an answer to the query without any additional context provided
response = claude_llm(prompt)
questions = [
    item.split(".")[1].strip() for item in response.strip().split("\n\n")[1:-1]
]
questions
>>> answer:
'What is the purpose of the study? Make sure you understand the goals of the research and what the study procedures will entail',
'What are the risks and potential benefits? The form should explain all foreseeable risks, side effects, or discomforts you might experience from participating',
'What will participation involve? Get details on what tests, medications, lifestyle changes, or procedures you will go through, how much time it will take, and how long the study will last',
'Are there any costs or payments? Ask if you will be responsible for any costs related to the study or get paid for participating',
'How will my privacy be protected? The form should explain how your personal health information will be kept confidential before, during, and after the trial'

Use the Amazon Bedrock RetrieveAndGenerate API

For a fully managed RAG experience, you can use the native Knowledge Bases for Amazon Bedrock RetrieveAndGenerate API to obtain the answers directly:

bedrock_agent_client = boto3.client("bedrock-agent-runtime")

kb_id = "<YOUR_KNOWLEDGE_BASE_ID>"

def retrieveAndGenerate(
    input: str,
    kbId: str,
    region: str = "us-east-1",
    sessionId: str = None,
    model_id: str = "anthropic.claude-v2:1",
):
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/{model_id}"

    request = {
        "input": {"text": input},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kbId,
                "modelArn": model_arn,
            },
        },
    }
    if sessionId:
        request["sessionId"] = sessionId

    return bedrock_agent_client.retrieve_and_generate(**request)

response = retrieveAndGenerate(
    "What are the potential risks and benefits of participating?", kb_id
)

generated_text = response["output"]["text"]
>>> "The potential risks include side effects from the study medication lithium such as nausea, loose stools, thirst, urination changes, shakiness, headaches, sweating, fatigue, decreased concentration, and skin rash. There is also a risk of lithium interaction with other medications. For women, there is a risk of birth defects if lithium is taken during pregnancy. There are no guaranteed benefits, but possible benefits include new information that could help the participant from the interviews and tests conducted during the study."

The cited information source can be obtained via the following code (with some of the output redacted for brevity):

response["citations"]

>>> [
    {
        "generatedResponsePart": {
            "textResponsePart": {
                "text": " The potential risks include side effects from the study...",
                "span": {"start": 0, "end": 361},
            }
        },
        "retrievedReferences": [
            {
                "content": {
                    "text": "590 ICF#2 Page 7 of 19 The primary risks and discomforts of participation…"
                },
                "location": {"type": "S3", "s3Location": {"uri": "s3://XXXX/XXXX.pdf"}},
            },
            {
                "content": {
                    "text": "N/A CSP 590 ICF#2 Page 10 of 19 Risks associated with suddenly stopping study medications..."
                },
                "location": {"type": "S3", "s3Location": {"uri": "s3://XXXX/XXXX.pdf"}},
            },
        ],
    },
    {
        "generatedResponsePart": {
            "textResponsePart": {
                "text": " There are no guaranteed benefits, but possible benefits include...",
                "span": {"start": 363, "end": 531},
            }
        },
        "retrievedReferences": [
            {
                "content": {
                    "text": "research, not usual clinical care. After these are done we ask..."
                },
                "location": {"type": "S3", "s3Location": {"uri": "s3://XXXX/XXXX.pdf"}},
            }
        ],
    },
]

By passing the session ID returned by the RetrieveAndGenerate API, you can preserve the conversation context and ask follow-up questions. For example, without the context, if you ask for more details about the previous answer, it may not be able to answer correctly:

retrieveAndGenerate("elaborate more on the first side effect", kb_id, sessionId=None)["output"]["text"]
>>> "The search results do not provide additional details about the mild nausea side effect that would allow me to elaborate further on it."

But by passing the session ID, the RAG pipeline is able to identify the corresponding context and return relevant answers:

retrieveAndGenerate("elaborate more on the first side effect", kb_id, sessionId=response["sessionId"])["output"]["text"]
>>> "The search results provide details that nausea from taking lithium is usually mild and goes away after days or weeks for most people. Specifically, up to 75% of people may experience mild nausea when first starting lithium, but this goes away in 90-99% of people who continue taking it."

The following are the retrieved answers to all the corresponding questions.

Question: What is the purpose of the study? Make sure you understand the goals of the research and what the study procedures will entail.
Answer: The purpose of the study is to test whether lithium is effective at preventing repeated suicidal self-directed violence in patients with depression or bipolar disorder.

Question: What are the risks and potential benefits? The form should explain all foreseeable risks, side effects, or discomforts you might experience from participating.
Answer: The possible risks or discomforts include: the interview questions causing discomfort, side effects from the lithium medication such as nausea, loose stools, thirst, urination changes, shakiness, headaches, sweating, fatigue, decreased concentration, skin rash, thyroid changes, worsening acne/psoriasis, lithium toxicity, and risks if the medication is suddenly stopped. The potential benefits are that the tests may lead to new information to help the participant, and lithium may help prevent repeated suicidal self-directed violence for those with depression or bipolar disorder.

Question: What will participation involve? Get details on what tests, medications, lifestyle changes, or procedures you will go through, how much time it will take, and how long the study will last.
Answer: Participation will involve completing an interview and questionnaires covering thinking, behaviors, mental health treatment, medications, alcohol and drug use, home and social supports, and understanding of the research study. This takes about two hours and can be done in multiple sessions, in person and by phone. If eligible for the full study, there will be about 20 study visits over one year. This will involve taking study medication, having vital signs checked, completing questionnaires, reviewing side effects, and continuing normal medical and mental health care.

Question: Are there any costs or payments? Ask if you will be responsible for any costs related to the study or get paid for participating.
Answer: Yes, there are costs and payments discussed in the search results. You will not be charged for any treatments or procedures that are part of the study. However, you will still have to pay any usual VA co-payments for care and medications not related to the study. You will not be paid for participation, but the study will reimburse expenses related to participation like transportation, parking, etc. Reimbursement amounts and process are provided.

Question: How will my privacy be protected? The form should explain how your personal health information will be kept confidential before, during, and after the trial.
Answer: Your privacy will be protected by conducting interviews in private, keeping written notes in locked files and offices, storing electronic information in encrypted and password protected files, and obtaining a Confidentiality Certificate from the Department of Health and Human Services to prevent disclosing information that identifies you. Information that identifies you may be shared with doctors responsible for your care or for audits and evaluations by government agencies, but talks and papers about the study will not identify you.

Query using the Amazon Bedrock Retrieve API

To customize your RAG workflow, you can use the Retrieve API to fetch the relevant chunks based on your query and pass them to any LLM provided by Amazon Bedrock. To use the Retrieve API, define it as follows:

def retrieve(query: str, kbId: str, numberOfResults: int = 5):
    return bedrock_agent_client.retrieve(
        retrievalQuery={"text": query},
        knowledgeBaseId=kbId,
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": numberOfResults}
        },
    )

Retrieve the corresponding context (with some of the output redacted for brevity):

query = "What is the purpose of the medical research study?"
response = retrieve(query, kb_id, 3)
retrievalResults = response["retrievalResults"]
>>> [
    {
        "content": {"text": "You will not be charged for any procedures that..."},
        "location": {"type": "S3", "s3Location": {"uri": "s3://XXXXX/XXXX.pdf"}},
        "score": 0.6552521,
    },
    {
        "content": {"text": "and possible benefits of the study. You have been..."},
        "location": {"type": "S3", "s3Location": {"uri": "s3://XXXX/XXXX.pdf"}},
        "score": 0.6581577,
    },
    ...,
]

Extract the context for the prompt template:

def get_contexts(retrievalResults):
    contexts = []
    for retrievedResult in retrievalResults:
        contexts.append(retrievedResult["content"]["text"])
    return " ".join(contexts)

contexts = get_contexts(retrievalResults)

Import the Python modules and set up the in-context question answering prompt template, then generate the final answer:

from langchain.prompts import PromptTemplate

PROMPT_TEMPLATE = """
Human: You are an AI system working on medical trial research that provides answers to questions
by using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

<context>
{context_str}
</context>

<question>
{query_str}
</question>

The response should be specific and use statistics or numbers when possible.

Assistant:"""

claude_prompt = PromptTemplate(
    template=PROMPT_TEMPLATE, input_variables=["context_str", "query_str"]
)

prompt = claude_prompt.format(context_str=contexts, query_str=query)
response = claude_llm(prompt)
>>> "Based on the context provided, the purpose of this medical research study is to evaluate the efficacy of lithium compared to a placebo in preventing suicide over a 1 year period. Specifically, participants will be randomly assigned to receive either lithium or a placebo pill for 1 year, with their doctors and the participants themselves not knowing which treatment they receive (double-blind). Blood lithium levels will be monitored and doses adjusted over the first 6-8 visits, then participants will be followed monthly for 1 year to assess outcomes."

Query using Amazon Bedrock LangChain integration

To create an end-to-end customized Q&A application, Knowledge Bases for Amazon Bedrock provides integration with LangChain. To set up the LangChain retriever, provide the knowledge base ID and specify the number of results to return from the query:

from langchain.retrievers.bedrock import AmazonKnowledgeBasesRetriever

retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=kb_id,
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 4}},
)

Now set up LangChain RetrievalQA and generate answers from the knowledge base:

from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=claude_llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": claude_prompt},
)

[qa(q)["result"] for q in questions]

This will generate corresponding answers similar to the ones listed in the earlier table.

Clean up

Make sure to delete the following resources to avoid incurring additional charges:

  • The knowledge base you created, along with its associated vector store (for example, the Amazon OpenSearch Serverless collection created by the Quick create option)
  • The dataset files in your S3 bucket, if you no longer need them

Conclusion

Amazon Bedrock provides a broad set of deeply integrated services to power RAG applications of all scales, making it straightforward to get started with analyzing your company data. Knowledge Bases for Amazon Bedrock integrates with Amazon Bedrock foundation models to build scalable document embedding pipelines and document retrieval services to power a wide range of internal and customer-facing applications. We are excited about the future ahead, and your feedback will play a vital role in guiding the progress of this product. To learn more about the capabilities of Amazon Bedrock and knowledge bases, refer to Knowledge base for Amazon Bedrock.


About the Authors

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS Certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning (ML) projects in various domains such as computer vision, natural language processing, and generative AI. She helps customers build, train, and deploy large machine learning models at scale. She speaks at internal and external conferences such as re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Dr. Baichuan Sun, currently serving as a Sr. AI/ML Solution Architect at AWS, focuses on generative AI and applies his knowledge in data science and machine learning to provide practical, cloud-based business solutions. With experience in management consulting and AI solution architecture, he addresses a range of complex challenges, including robotics computer vision, time series forecasting, and predictive maintenance, among others. His work is grounded in a solid background of project management, software R&D, and academic pursuits. Outside of work, Dr. Sun enjoys the balance of traveling and spending time with family and friends.

Derrick Choo is a Senior Solutions Architect at AWS focused on accelerating customers’ journeys to the cloud and transforming their business through the adoption of cloud-based solutions. His expertise is in full stack application and machine learning development. He helps customers design and build end-to-end solutions covering frontend user interfaces, IoT applications, API and data integrations, and machine learning models. In his free time, he enjoys spending time with his family and experimenting with photography and videography.

Frank Winkler is a Senior Solutions Architect and Generative AI Specialist at AWS based in Singapore, focusing on machine learning and generative AI. He works with global digital native companies to architect scalable, secure, and cost-effective products and services on AWS. In his free time, he spends time with his son and daughter, and travels to enjoy the waves across ASEAN.

Nihir Chadderwala is a Sr. AI/ML Solutions Architect in the Global Healthcare and Life Sciences team. His expertise is in building big data and AI-powered solutions to customer problems, especially in the biomedical, life sciences, and healthcare domains. He is also excited about the intersection of quantum information science and AI and enjoys learning and contributing to this space. In his spare time, he enjoys playing tennis, traveling, and learning about cosmology.

Read More

Unlock personalized experiences powered by AI using Amazon Personalize and Amazon OpenSearch Service

Unlock personalized experiences powered by AI using Amazon Personalize and Amazon OpenSearch Service

OpenSearch is a scalable, flexible, and extensible open source software suite for search, analytics, security monitoring, and observability applications, licensed under the Apache 2.0 license. Amazon OpenSearch Service is a fully managed service that makes it straightforward to deploy, scale, and operate OpenSearch in the AWS Cloud.

OpenSearch uses a probabilistic ranking framework called BM25 to calculate relevance scores. If a distinctive keyword appears more frequently in a document, BM25 assigns a higher relevance score to that document. This framework, however, doesn’t consider user behavior like click-through or purchase data, which can further improve relevance for individual users.
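For reference, the standard BM25 scoring function for a document D and query Q with terms q_1, …, q_n is the following, where f(q_i, D) is the term frequency of q_i in D, |D| is the document length, avgdl is the average document length in the collection, and k_1 and b are tuning parameters:

\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}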

Improving the functionality of search is an integral aspect of enhancing the overall user experience and engagement on a website or application. Search traffic is considered high intent because users are actively seeking a particular item, and they have been found to convert up to two times more than non-site search visitors on average. By using user interaction data such as clicks, likes, and purchases, businesses can improve search relevancy to capitalize on this traffic and reduce instances of users abandoning their sessions due to difficulties in finding the desired items. By refining the quality of search results, businesses can significantly improve their customer engagement, satisfaction, and loyalty, as well as increase their conversion rates, ultimately leading to greater profitability and success.

Amazon Personalize allows you to add sophisticated personalization capabilities to your applications by using the same machine learning (ML) technology used on Amazon.com for over 20 years. No ML expertise is required.

Amazon Personalize supports the automatic adjustment of recommendations based on contextual information about your user, such as device type, location, time of day, or other information you provide. You supply Amazon Personalize with historical data about your users and their interactions within your application, such as purchase history, ratings, and likes. You can add data to Amazon Personalize in bulk by importing large historical datasets all at once from an Amazon Simple Storage Service (Amazon S3) CSV file, using a format required by Amazon Personalize. You can also add data incrementally by importing records using the Amazon Personalize console or API. After your historical data is imported, you can continue to provide new data in real time by sending user interaction events. Based on the use case you want to address, such as product recommendations, you select a pre-built recipe that is optimized for that goal. Amazon Personalize analyzes your data and trains a custom ML model based on the parameters in the recipe to generate personalized recommendations optimized for your users and application. After the model is trained, you can generate real-time personalized recommendations for your users.
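As an illustration of the real-time path, the following sketch sends a single interaction event with the PutEvents API; the event tracker ID, user, session, and item values are placeholders, and the event fields must match your interactions dataset.

import time

import boto3

personalize_events = boto3.client("personalize-events")

# Record one user interaction in real time; trackingId comes from the event
# tracker created for your dataset group
personalize_events.put_events(
    trackingId="<YOUR_EVENT_TRACKER_ID>",
    userId="12",
    sessionId="session-1",
    eventList=[
        {
            "eventType": "click",
            "itemId": "movie-123",
            "sentAt": int(time.time()),
        }
    ],
)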

With the newly launched Amazon Personalized Search Plugin for Amazon OpenSearch Service, you can use your users’ interaction histories and interests to enhance their search results. By utilizing an Amazon Personalize recipe such as Personalized-Ranking, you can boost search results for relevant items based on a user’s interests at the time they retrieve search results from OpenSearch Service.

This post explains how to integrate the Amazon Personalize Search Ranking plugin with OpenSearch Service to enable personalized search experiences. To build Amazon Personalize artifacts in this post, we use a dataset from IMDb, the world’s most authoritative source for movie, TV, and celebrity content, available on AWS Marketplace, as well as the MovieLens dataset prepared by GroupLens research at the University of Minnesota, consisting of user rankings for various movies.

Solution overview

The following diagram illustrates the solution architecture.

The workflow includes the following steps:

  1. A user issues a search request through their website or portal. This search request is sent to OpenSearch Service.
  2. The top N search results are returned from the OpenSearch Service index and sent to the plugin to preprocess and prepare the input for an Amazon Personalize campaign.
  3. The request is sent to Amazon Personalize to get the re-ranked search results.
  4. Amazon Personalize returns the personalized ranking of the search results with the relevant score for each result.
  5. The plugin returns the reranked hits to OpenSearch Service, applying a weighting between the OpenSearch Service relevance score and the Amazon Personalize personalized ranking score. You specify a weight parameter (between 0.0–1.0) that controls this balance: a higher weight gives more influence to the Amazon Personalize ranking scores vs. the OpenSearch Service scores. This allows you to customize how much the personalized recommendations affect the final search results ranking returned to the user (see the illustrative sketch after this list).
  6. The user gets personalized search results based on their preferences and interactions.
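The plugin’s exact blending logic is internal to the personalized_search_ranking processor, but conceptually the weight behaves like a linear mix of the two scores, as in this illustrative sketch:

# Conceptual illustration only; the plugin's actual score combination is
# implemented inside the personalized_search_ranking processor
def blended_score(opensearch_score: float, personalize_score: float, weight: float) -> float:
    return (1.0 - weight) * opensearch_score + weight * personalize_score

# weight=0.0 keeps the OpenSearch Service ranking; weight=1.0 defers to Amazon Personalize
print(blended_score(opensearch_score=0.72, personalize_score=0.95, weight=0.3))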

Prerequisites

You should have the following prerequisites:

  • An AWS account.
  • An AWS Identity and Access Management (IAM) role with appropriate access permissions. We provide AWS CloudFormation templates and Jupyter notebooks to help set up the required IAM role and access.
  • To enable personalization in OpenSearch Service, you need to set up the required Amazon Personalize resources, including a dataset group, solution version, and campaign. We have provided a Jupyter notebook that creates all the Amazon Personalize resources, taking advantage of the fully managed Jupyter notebook instance capabilities of Amazon SageMaker.

Deploy the CloudFormation stack

The CloudFormation stack automates the deployment of the OpenSearch Service domain and SageMaker Notebook instance. Complete the following steps to deploy the stack:

  1. Sign in to the AWS Management Console with your credentials in the account where you want to deploy the CloudFormation stack.
  2. Launch the CloudFormation stack directly.
  3. On the Specify details page, provide any parameters required by the template, such as OpenSearch Service and SageMaker instance sizes.
  4. On the Configure stack options page, specify a stack name and any other options you want to set.
  5. Complete creating the stack and monitor the status on the stack details page.
  6. After the stack is created, open the SageMaker notebook instance from the console.

The notebook instance will already be preloaded with the required notebooks.

Set up and complete the Amazon Personalize workflow

Open the 1.Configure_Amazon_Personalize.ipynb notebook to set up the Amazon Personalize artifacts. This notebook walks you through the following steps:

  1. Download the dataset and preprocess the data to create the required input files for creating the datasets.
  2. Create a dataset group.
  3. Create datasets and schemas.
  4. Prepare and import data.
  5. Create a solution and a solution version.
  6. Create a campaign for the solution version.

Install the Amazon Personalize Search Ranking plugin using a Jupyter notebook

Open the 2.Configure_Amazon_OpenSearch.ipynb notebook and run through the instructions. This notebook walks you through the following steps:

  1. Ingest sample index data into the OpenSearch Service instance. Populating the index with representative data facilitates thorough testing and validation of the plugin.
  2. Install the plugin package in the OpenSearch Service domain. This integrates the personalization capabilities into the OpenSearch environment.
  3. Set up search pipelines to activate the plugin’s functionality. Search pipelines contain request preprocessors and response postprocessors that transform queries and results. When constructing a pipeline, specify the Amazon Personalize campaign ARN created earlier in a personalized_search_ranking postprocessor to enable personalized re-ranking. This configures the plugin to retrieve real-time personalization results from Amazon Personalize for application during result processing. Defining pipelines allows the plugin to augment search relevance based on user preferences.

Install the Amazon Personalize Search Ranking plugin using the console

You can also set up the Amazon Personalize search plugin from the console. You only need to do this if you have not installed the plugin using the Jupyter notebook from earlier.

To install the Amazon Personalize Search Ranking plugin on OpenSearch Service, complete the following steps:

  1. On the OpenSearch Service console, navigate to your domain.
  2. On the Packages tab, choose Associate package to associate the Amazon Personalize Search Ranking plugin with your OpenSearch Service domain. The plugin version must match the OpenSearch Service domain version.

The Amazon Personalize Search Ranking plugin can be installed on OpenSearch Service versions 2.9 and above.

  3. Locate the Amazon Personalize Search Ranking plugin in the list of available plugins.
  4. Choose Associate next to the plugin to install it and associate it with your existing OpenSearch Service domain.

After you associate the plugin, it appears in the list of packages as a plugin type, and the installation is finished.

Enable the Amazon Personalize Search Ranking plugin

The Amazon Personalize Search Ranking plugin uses the search-pipeline feature of OpenSearch Service, released starting with version 2.9. The plugin depends on the search-pipeline feature to apply Amazon Personalize ranking to search results provided by OpenSearch Service, and needs to be set up as a search-pipeline response processor. The pipeline definition contains the configuration for the Amazon Personalize plugin, which includes the Amazon Personalize campaign to call for ranking, the IAM role used to access Amazon Personalize resources, and the parameters defined in the following list.

  • campaign (required; default: none) – The ARN of the Amazon Personalize campaign to use to personalize results.
  • recipe (required; default: none) – The name of the Amazon Personalize recipe to use. As of this writing, aws-personalized-ranking is the only supported value.
  • item_id_field (optional; default: “_id”) – If the _id field for an indexed document in OpenSearch doesn’t correspond with your Amazon Personalize itemId, specify the name of the field that does.
  • weight (required; default: none) – The emphasis that the response processor puts on personalization when it re-ranks results, within a range of 0.0–1.0. The closer it is to 1.0, the more likely it is that results from Amazon Personalize rank higher. If you specify 0.0, no personalization occurs and OpenSearch Service takes precedence.
  • tag (optional; default: none) – An identifier for the processor.
  • iam_role_arn (required; default: none) – The IAM role to access Amazon Personalize resources. This is required for OpenSearch Service, and optional for open source OpenSearch.
  • aws_region (required; default: none) – The AWS Region where you created your Amazon Personalize campaign.
  • ignore_failure (optional; default: none) – Whether the plugin ignores any processor failures; specify true or false. For production environments, we recommend that you specify true to avoid any interruptions for query responses. For test environments, you can specify false to view any errors that the plugin generates.
  • external_account_iam_role_arn (optional; default: none) – If you use OpenSearch Service and your Amazon Personalize and OpenSearch Service resources exist in different accounts, the ARN of the role that has permission to access Amazon Personalize.

Define search pipeline for personalized ranking

You can use the following Python code to create a search pipeline with a personalized_search_ranking response processor on an OpenSearch Service domain. You run this step one time as part of the notebook that accompanies this post. Replace the domain endpoint with your domain endpoint URL (for example, https://<domain name>.<AWS region>.es.amazonaws.com).

import requests
from requests_auth_aws_sigv4 import AWSSigV4

domain_endpoint = 'domain endpoint'
pipeline_name = 'pipeline name'
url = f'{domain_endpoint}/_search/pipeline/{pipeline_name}'
auth = AWSSigV4('es')

headers = {'Content-Type': 'application/json'}

body = {
  "description": "A pipeline to apply custom re-ranking from Amazon Personalize",
  "response_processors": [
    {
      "personalized_search_ranking" : {
        "campaign_arn" : "<Replace with Amazon Personalize Campaign ARN>",
        "item_id_field" : "itemId",
        "recipe" : "aws-personalized-ranking",
        "weight" : "0.3",
        "tag" : "personalize-processor",
        "iam_role_arn": "<Replace with Role ARN>",
        "aws_region": "<Replace with AWS region>",
        "ignore_failure": true
    }
  ]
}
try:
    response = requests.put(url, auth=auth, json=body, headers=headers)
    print(response.text)
except Exception as e:
    print(f"Error: {e}")

Apply a search pipeline to an individual query

After you configure a search pipeline with a personalized_search_ranking response processor, you can apply the Amazon Personalize Search Ranking plugin to your OpenSearch queries and view the re-ranked results. Update the code to specify your domain endpoint, your OpenSearch Service index, the name of the pipeline you configured earlier, and your query (we use “Tom Cruise” as the query). For user_id, specify the ID of the user that you’re getting search results for. This user must be in the data that you used to create your Amazon Personalize solution version.

import requests
from requests_auth_aws_sigv4 import AWSSigV4

domain_endpoint = 'domain endpoint'
index = 'index name'
url = f'{domain_endpoint}/{index}/_search/'

auth = AWSSigV4('es')
headers = {'Content-Type': 'application/json'}
params = {"search_pipeline": "<Replace with pipeline-name>"}
body = {
    "query": {
        "multi_match": {
            "query": "Tom Cruise",
            "fields": ["title", "plot", "genres", "directedBy", "starring"]
        }
    },
    "ext": {
        "personalize_request_parameters": {
            "user_id": "<Replace with USER ID>"
        }
    }
}
try:
    response = requests.post(url, auth=auth, params=params, json=body, headers=headers)
    print(response)
except Exception as e:
    print(f"Error: {e}")

Evaluate the results

Open the 3.Testing.ipynb notebook and walk through the steps to test and compare the results for queries that use personalization and those that don’t. The Amazon Personalize Search Ranking plugin re-ranks the search results in the OpenSearch Service query response. It considers both the ranking from Amazon Personalize and the ranking from OpenSearch Service. This notebook walks you through the following steps:

  1. Define the necessary connection parameters to establish a connection with your OpenSearch Service domain. This involves specifying the domain endpoint, authentication credentials, and any additional configuration settings required for your specific OpenSearch Service setup.
  2. Create a set of sample queries, including queries with personalization parameters and queries without personalization parameters. These queries will be used to evaluate the impact of personalization on the search results.
  3. Run and compare the results for queries that use personalization and those that do not.

For our example, we used the query “Tom Cruise” and, for the personalization parameter, a user with a recent history of viewing drama and romance film genres. The subsequent search results show how the plugin tailors and prioritizes results based on this user’s observed viewing behavior, delivering a customized, curated experience that reflects individual preferences and engagement patterns.

Personalized vs. non-personalized results

Let’s consider personalizing results for a user with ID 12. First, we check this user’s recent interactions by running the code in the 3.Testing.ipynb notebook to retrieve their interaction history. This allows us to see what types of movies this user has reviewed recently, which can inform how we personalize recommendations for them.
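If you want to inspect the history outside the notebook, a quick way is to filter the interactions file that was prepared for Amazon Personalize; this sketch assumes a CSV with the standard USER_ID, ITEM_ID, and TIMESTAMP columns, which may differ from the notebook’s exact implementation.

import pandas as pd

# Load the interactions prepared for Amazon Personalize and show the ten
# most recent interactions for user 12
interactions = pd.read_csv("interactions.csv")
user_history = (
    interactions[interactions["USER_ID"] == 12]
    .sort_values("TIMESTAMP", ascending=False)
    .head(10)
)
print(user_history)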

In this example, we see that the user has expressed interest in drama, romance, and thriller movie genres. To provide personalized recommendations, we first run queries with personalization parameters enabled, utilizing the user’s genre preferences. We then run the same queries without personalization enabled, for comparison. The following results show the difference between the non-personalized and personalized recommendation outputs.

The first two columns display the default OpenSearch Service results for the query “Tom Cruise” on a movies index, showing a variety of Tom Cruise films across different genres. The next two columns showcase personalized OpenSearch Service results for the same “Tom Cruise” query, but customized for a user interested in drama, romance, and thriller genres. Compared to the generic results, the personalized results prominently feature Tom Cruise movies in the user’s preferred drama, romance, and thriller genres. The delta highlights how the personalized results have been re-ranked relative to the non-personalized results, prioritizing films that match the user’s genre preferences. This demonstrates how personalization can tailor OpenSearch Service results to individual users’ tastes and interests.

This comparison demonstrates how Amazon Personalize can customize OpenSearch Service movie results to match an individual user’s interests. Although standard OpenSearch Service aims to universally serve relevant movie results for Tom Cruise, Amazon Personalize tailors the results to focus on Tom Cruise films it predicts this user will enjoy based on their unique viewing history and preferences.

The side-by-side results illustrate how Amazon Personalize provides a more targeted, user-centric search experience by personalizing the movie results to the individual.

Clean up

Complete the following steps to clean up your resources:

  1. Follow the steps in the 4.Cleanup.ipynb notebook to clean up the resources created through the notebook.
  2. On the AWS CloudFormation console, delete the stack that you created.

Conclusion

The Amazon Personalize Search Ranking plugin integrates seamlessly with OpenSearch Service to enable personalized search experiences. By using user behavior data and the ML capabilities of Amazon Personalize, the plugin can reorder OpenSearch Service result rankings to boost relevance for each unique user. This creates a custom-tailored search experience that surfaces the most relevant content higher in the results. The plugin is configurable to balance personalization with OpenSearch Service native scoring to fit diverse use cases. Overall, the Amazon Personalize Search Ranking plugin is a powerful way to enhance OpenSearch Service search relevance and engagement by factoring in the individual interests and preferences of your users. With just a few configuration steps, you can start serving hyper-relevant results that resonate strongly with your users.


About the Authors

James Jory is a Principal Solutions Architect in Applied AI with AWS. He has a special interest in personalization and recommender systems and a background in ecommerce, marketing technology, and customer data analytics. In his spare time, he enjoys camping and auto racing simulations.

Reagan Rosario is a Solutions Architect at AWS, specializing in building scalable, highly available, and secure cloud solutions for education technology companies. With over 10 years of experience in software engineering and architecture roles, Reagan loves using his technical knowledge to help AWS customers architect robust cloud solutions that leverage the breadth and depth of AWS.

Read More

Automate Amazon SageMaker Pipelines DAG creation

Automate Amazon SageMaker Pipelines DAG creation

Creating scalable and efficient machine learning (ML) pipelines is crucial for streamlining the development, deployment, and management of ML models. In this post, we present a framework for automating the creation of a directed acyclic graph (DAG) for Amazon SageMaker Pipelines based on simple configuration files. The framework code and examples presented here only cover model training pipelines, but can be readily extended to batch inference pipelines as well.

This dynamic framework uses configuration files to orchestrate preprocessing, training, evaluation, and registration steps for both single-model and multi-model use cases. The configuration covers user-defined Python scripts; infrastructure needs, including Amazon Virtual Private Cloud (Amazon VPC) subnets and security groups, AWS Identity and Access Management (IAM) roles, AWS Key Management Service (AWS KMS) keys, the container registry, and instance types; input and output Amazon Simple Storage Service (Amazon S3) paths; and resource tags. Configuration files (YAML and JSON) allow ML practitioners to specify undifferentiated code for orchestrating training pipelines using declarative syntax. This enables data scientists to quickly build and iterate on ML models, and empowers ML engineers to run through continuous integration and continuous delivery (CI/CD) ML pipelines faster, decreasing time to production for models.

Solution overview

The proposed framework code starts by reading the configuration files. It then dynamically creates a SageMaker Pipelines DAG based on the steps declared in the configuration files and the interactions and dependencies among steps. This orchestration framework caters to both single-model and multi-model use cases, and provides a smooth flow of data and processes. The following are the key benefits of this solution:

  • Automation – The entire ML workflow, from data preprocessing to model registry, is orchestrated with no manual intervention. This reduces the time and effort required for model experimentation and operationalization.
  • Reproducibility – With a predefined configuration file, data scientists and ML engineers can reproduce the entire workflow, achieving consistent results across multiple runs and environments.
  • Scalability – Amazon SageMaker is used throughout the pipeline, enabling ML practitioners to process large datasets and train complex models without infrastructure concerns.
  • Flexibility – The framework is flexible and can accommodate a wide range of ML use cases, ML frameworks (such as XGBoost and TensorFlow), multi-model training, and multi-step training. Every step of the training DAG can be customized via the configuration file.
  • Model governance – The Amazon SageMaker Model Registry integration allows for tracking model versions, and therefore promoting them to production with confidence.

The following architecture diagram depicts how you can use the proposed framework during both experimentation and operationalization of ML models. During experimentation, you can clone the framework code repository provided in this post and your project-specific source code repositories into Amazon SageMaker Studio, and set up your virtual environment (detailed later in this post). You can then iterate on preprocessing, training, and evaluation scripts, as well as configuration choices. To create and run a SageMaker Pipelines training DAG, you can call the framework’s entry point, which will read all the configuration files, create the necessary steps, and orchestrate them based on the specified step ordering and dependencies.

During operationalization, the CI pipeline clones the framework code repository and project-specific training repositories into an AWS CodeBuild job, where the framework’s entry point script is called to create or update the SageMaker Pipelines training DAG, and then run it.

Repository structure

The GitHub repository contains the following directories and files:

  • /framework/conf/ – This directory contains a configuration file that is used to set common variables across all modeling units such as subnets, security groups, and IAM role at the runtime. A modeling unit is a sequence of up to six steps for training an ML model.
  • /framework/createmodel/ – This directory contains a Python script that creates a SageMaker model object based on model artifacts from a SageMaker Pipelines training step. The model object is later used in a SageMaker batch transform job for evaluating model performance on a test set.
  • /framework/modelmetrics/ – This directory contains a Python script that creates an Amazon SageMaker Processing job for generating a model metrics JSON report for a trained model based on results of a SageMaker batch transform job performed on test data.
  • /framework/pipeline/ – This directory contains Python scripts that use Python classes defined in other framework directories to create or update a SageMaker Pipelines DAG based on the specified configurations. The model_unit.py script is used by pipeline_service.py to create one or more modeling units. Each modeling unit is a sequence of up to six steps for training an ML model: process, train, create model, transform, metrics, and register model. Configurations for each modeling unit should be specified in the model’s respective repository. The pipeline_service.py script also sets dependencies among SageMaker Pipelines steps (how steps within and across modeling units are sequenced or chained) based on the sagemakerPipeline section, which should be defined in the configuration file of one of the model repositories (the anchor model). This allows you to override default dependencies inferred by SageMaker Pipelines. We discuss the configuration file structure later in this post.
  • /framework/processing/ – This directory contains a Python script that creates a SageMaker Processing job based on the specified Docker image and entry point script.
  • /framework/registermodel/ – This directory contains a Python script for registering a trained model along with its calculated metrics in SageMaker Model Registry.
  • /framework/training/ – This directory contains a Python script that creates a SageMaker training job.
  • /framework/transform/ – This directory contains a Python script that creates a SageMaker batch transform job. In the context of model training, this is used to calculate the performance metric of a trained model on test data.
  • /framework/utilities/ – This directory contains utility scripts for reading and joining configuration files, as well as logging.
  • /framework_entrypoint.py – This file is the entry point of the framework code. It calls a function defined in the /framework/pipeline/ directory to create or update a SageMaker Pipelines DAG and run it.
  • /examples/ – This directory contains several examples of how you can use this automation framework to create simple and complex training DAGs.
  • /env.env – This file allows you to set common variables such as subnets, security groups, and IAM role as environment variables.
  • /requirements.txt – This file specifies Python libraries that are required for the framework code.

Prerequisites

You should have the following prerequisites before deploying this solution:

  • An AWS account
  • SageMaker Studio
  • A SageMaker role with Amazon S3 read/write and AWS KMS encrypt/decrypt permissions
  • An S3 bucket for storing data, scripts, and model artifacts
  • Optionally, the AWS Command Line Interface (AWS CLI)
  • Python3 (Python 3.7 or greater) and the following Python packages:
    • boto3
    • sagemaker
    • PyYAML
  • Additional Python packages used in your custom scripts

Deploy the solution

Complete the following steps to deploy the solution:

  1. Organize your model training repository according to the following structure:
    <MODEL-DIR-REPO>
     .
    ├── <MODEL-DIR>
    |    ├── conf
    |    |   └── conf.yaml
    |    └── scripts
    |        ├── preprocess.py
    |        ├── train.py
    |        ├── transform.py
    |        └── evaluate.py
    └── README.md
    

  2. Clone the framework code and your model source code from the Git repositories:
    • Clone dynamic-sagemaker-pipelines-framework repo into a training directory. In the following code, we assume the training directory is called aws-train:
      git clone https://github.com/aws-samples/dynamic-sagemaker-pipelines-framework.git aws-train

    • Clone the model source code under the same directory. For multi-model training, repeat this step for as many models as you need to train.
      git clone https:<MODEL-DIR-REPO>.git aws-train

For single-model training, your directory should look like the following:

<aws-train>  
.  
├── framework
└── <MODEL-DIR>

For multi-model training, your directory should look like the following:

<aws-train>  
.  
├── framework
└── <MODEL-DIR-1>
└── <MODEL-DIR-2>
└── <MODEL-DIR-3>
  3. Set up the following environment variables. Asterisks indicate environment variables that are required; the rest are optional:
    • SMP_ACCOUNTID* – The AWS account where the SageMaker pipeline is run
    • SMP_REGION* – The AWS Region where the SageMaker pipeline is run
    • SMP_S3BUCKETNAME* – The S3 bucket name
    • SMP_ROLE* – The SageMaker role
    • SMP_MODEL_CONFIGPATH* – The relative path of the single-model or multi-model configuration files
    • SMP_SUBNETS – Subnet IDs for the SageMaker networking configuration
    • SMP_SECURITYGROUPS – Security group IDs for the SageMaker networking configuration

For single-model use cases, SMP_MODEL_CONFIGPATH will be <MODEL-DIR>/conf/conf.yaml. For multi-model use cases, SMP_MODEL_CONFIGPATH will be */conf/conf.yaml, which allows you to find all conf.yaml files using Python’s glob module and combine them to form a global configuration file. During experimentation (local testing), you can specify environment variables inside the env.env file and then export them by running the following command in your terminal:

source env.env

Note that the values of environment variables in env.env should be placed inside quotation marks (for example, SMP_REGION="us-east-1"). During operationalization, these environment variables should be set by the CI pipeline.
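To illustrate how a wildcard SMP_MODEL_CONFIGPATH can be resolved, the following simplified sketch expands the pattern and merges the per-model sections; the framework’s actual merge logic may differ in details such as how key collisions are handled.

import glob

import yaml

# Expand the wildcard path and combine each model's configuration into one tree
merged_models = {}
for path in glob.glob("*/conf/conf.yaml"):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    # Assumes each file nests its modeling units under conf/models
    merged_models.update(cfg.get("conf", {}).get("models", {}))
print(list(merged_models))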

  4. Create and activate a virtual environment by running the following commands:
    python -m venv .venv
    
    source .venv/bin/activate

  5. Install the required Python packages by running the following command:
    pip install -r requirements.txt

  6. Edit your model training conf.yaml files. We discuss the configuration file structure in the next section.
  7. From the terminal, call the framework’s entry point to create or update the SageMaker Pipelines training DAG and run it:
    python framework/framework_entrypoint.py

  8. View and debug the SageMaker Pipelines run on the Pipelines tab of the SageMaker Studio UI.

Configuration file structure

There are two types of configuration files in the proposed solution: framework configuration and model configuration. In this section, we describe each in detail.

Framework configuration

The /framework/conf/conf.yaml file sets the variables that are common across all modeling units. This includes SMP_S3BUCKETNAME, SMP_ROLE, SMP_MODEL_CONFIGPATH, SMP_SUBNETS, SMP_SECURITYGROUPS, and SMP_MODELNAME. Refer to Step 3 of deployment instructions for descriptions of these variables and how to set them via environment variables.

Model configuration

For each model in the project, we need to specify the following in the <MODEL-DIR>/conf/conf.yaml file (asterisks indicate required sections; the rest are optional):

  • /conf/models* – In this section, you can configure one or more modeling units. When the framework code is run, it will automatically read all configuration files during runtime and append them to the config tree. Theoretically, you can specify all modeling units in the same conf.yaml file, but it’s recommended to specify each modeling unit configuration in its respective directory or Git repository to minimize errors. The units are as follows:
    • {model-name}* – The name of the model.
    • source_directory* – A common source_dir path to use for all steps within the modeling unit.
    • preprocess – This section specifies preprocessing parameters.
    • train* – This section specifies training job parameters.
    • transform* – This section specifies SageMaker Transform job parameters for making predictions on the test data.
    • evaluate – This section specifies SageMaker Processing job parameters for generating a model metrics JSON report for the trained model.
    • registry* – This section specifies parameters for registering the trained model in SageMaker Model Registry.
  • /conf/sagemakerPipeline* – This section defines the SageMaker Pipelines flow, including dependencies among steps. For single-model use cases, this section is defined at the end of the configuration file. For multi-model use cases, the sagemakerPipeline section only needs to be defined in the configuration file of one of the models (any of the models). We refer to this model as the anchor model. The parameters are as follows:
    • pipelineName* – Name of the SageMaker pipeline.
    • models* – Nested list of modeling units:
      • {model-name}* – Model identifier, which should match a {model-name} identifier in the /conf/models section.
        • steps*
          • step_name* – Step name to be displayed in the SageMaker Pipelines DAG.
          • step_class* – (Union[Processing, Training, CreateModel, Transform, Metrics, RegisterModel])
          • step_type* – This parameter is only required for preprocessing steps, for which it should be set to preprocess. This is needed to distinguish preprocess and evaluate steps, both of which have a step_class of Processing.
          • enable_cache – (Union[True, False]) Indicates whether to enable SageMaker Pipelines caching for this step.
          • chain_input_source_step – (list[step_name]) You can use this to set the channel outputs of another step as input to this step.
          • chain_input_additional_prefix – This is only allowed for steps of the Transform step_class, and can be used in conjunction with the chain_input_source_step parameter to pinpoint the file that should be used as the input to the transform step.
    • dependencies – This section specifies the sequence in which the SageMaker Pipelines steps should be run. We have adapted the Apache Airflow notation for this section (for example, {step_name} >> {step_name}). If this section is left blank, explicit dependencies specified by the chain_input_source_step parameter or implicit dependencies define the SageMaker Pipelines DAG flow.

Note that we recommend having one training step per modeling unit. If multiple training steps are defined for a modeling unit, the subsequent steps implicitly take the last training step to create the model object, calculate metrics, and register the model. If you need to train multiple models, it’s recommended to create multiple modeling units.
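To make this structure concrete, the following schematic conf.yaml sketch shows how these sections fit together for a hypothetical modeling unit. The model name, step names, and values are illustrative only; the complete schema is in the GitHub repo examples.

    conf:
      models:
        demo-model:                        # {model-name}
          source_directory: demo-model/scripts
          train:
            # training job parameters (elided)
          transform:
            # SageMaker Transform job parameters (elided)
          registry:
            # SageMaker Model Registry parameters (elided)
      sagemakerPipeline:
        pipelineName: demo-pipeline
        models:
          demo-model:
            steps:
              - step_name: demo-preprocess
                step_class: Processing
                step_type: preprocess
              - step_name: demo-train
                step_class: Training
                enable_cache: True
              - step_name: demo-transform
                step_class: Transform
                chain_input_source_step: [demo-preprocess]
        dependencies:
          - demo-preprocess >> demo-train >> demo-transform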

Examples

In this section, we demonstrate three examples of ML model training DAGs created using the presented framework.

Single-model training: LightGBM

This is a single-model example for a classification use case where we use LightGBM in script mode on SageMaker. The dataset consists of categorical and numerical variables to predict the binary label Revenue (whether or not the subject makes a purchase). The preprocessing script prepares the data for training and testing and then stages it in an S3 bucket. The S3 paths are then provided to the training step in the configuration file.

When the training step runs, SageMaker loads the files onto the container at /opt/ml/input/data/{channelName}/, accessible via the environment variable SM_CHANNEL_{channelName} on the container (channelName = ‘train’ or ‘test’). The training script does the following:

  1. Load the files from the local container paths using the NumPy load module.
  2. Set hyperparameters for the training algorithm.
  3. Save the trained model at the local container path /opt/ml/model/.

SageMaker takes the content under /opt/ml/model/ to create a tarball that is used to deploy the model to SageMaker for hosting.
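A minimal training entry point consistent with these steps might look like the following sketch; the file names and hyperparameters are hypothetical, and the actual script is in the GitHub repo.

    import os

    import joblib
    import lightgbm as lgb
    import numpy as np

    if __name__ == "__main__":
        # 1. Load the files from the local container paths (file names hypothetical)
        train_dir = os.environ["SM_CHANNEL_TRAIN"]  # /opt/ml/input/data/train
        x_train = np.load(os.path.join(train_dir, "x_train.npy"))
        y_train = np.load(os.path.join(train_dir, "y_train.npy"))

        # 2. Set hyperparameters for the training algorithm (values illustrative)
        model = lgb.LGBMClassifier(n_estimators=100, learning_rate=0.1)
        model.fit(x_train, y_train)

        # 3. Save the trained model; SageMaker packs everything under
        #    /opt/ml/model/ into a tarball after the job finishes
        model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
        joblib.dump(model, os.path.join(model_dir, "model.joblib"))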

The transform step takes the staged test file and the trained model as inputs and makes predictions on the test data. The output of the transform step is chained to the metrics step to evaluate the model against the ground truth, which is explicitly supplied to the metrics step. Finally, the output of the metrics step is implicitly chained to the register step to register the model in SageMaker Model Registry with information about the model’s performance produced in the metrics step. The following figure shows a visual representation of the training DAG. You can refer to the scripts and configuration file for this example in the GitHub repo.

Single-model training: LLM fine-tuning

This is another single-model training example, where we orchestrate fine-tuning of a Falcon-40B large language model (LLM) from Hugging Face Hub for a text summarization use case. The preprocessing script loads the samsum dataset from Hugging Face, loads the tokenizer for the model, and processes the train/test data splits for fine-tuning the model on this domain data in the falcon-text-summarization-preprocess step.

The output is chained to the falcon-text-summarization-tuning step, where the training script loads the Falcon-40B LLM from Hugging Face Hub and starts accelerated fine-tuning using LoRA on the train split. The model is evaluated in the same step after fine-tuning, and this evaluation acts as a gatekeeper: if the evaluation loss is too high, the falcon-text-summarization-tuning step fails, which causes the SageMaker pipeline to stop before it can register the fine-tuned model. Otherwise, the step completes successfully and the model is registered in SageMaker Model Registry. The following figure shows a visual representation of the LLM fine-tuning DAG. The scripts and configuration file for this example are available in the GitHub repo.
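The gatekeeping logic inside the training script can be as simple as the following sketch; the threshold value is a hypothetical placeholder, and the actual acceptance criterion lives in the repo.

    # Hypothetical threshold for the acceptable evaluation loss
    EVAL_LOSS_THRESHOLD = 1.5

    def gatekeep(eval_loss: float) -> None:
        # Raising here fails the tuning step, which stops the pipeline
        # before the register step can run
        if eval_loss > EVAL_LOSS_THRESHOLD:
            raise RuntimeError(
                f"Evaluation loss {eval_loss:.3f} exceeds {EVAL_LOSS_THRESHOLD}; "
                "failing the step to block model registration."
            )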

Multi-model training

This is a multi-model training example where a principal component analysis (PCA) model is trained for dimensionality reduction, and a TensorFlow Multilayer Perceptron model is trained for California Housing Price prediction. The TensorFlow model’s preprocessing step uses a trained PCA model to reduce dimensionality of its training data. We add a dependency in the configuration to ensure the TensorFlow model is registered after PCA model registration. The following figure shows a visual representation of the multi-model training DAG example. The scripts and configuration files for this example are available in the GitHub repo.

Clean up

Complete the following steps to clean up your resources:

  1. Use the AWS CLI to list and remove any remaining pipelines that are created by the Python scripts.
  2. Optionally, delete other AWS resources such as the S3 bucket or IAM role created outside SageMaker Pipelines.

Conclusion

In this post, we presented a framework for automating SageMaker Pipelines DAG creation based on configuration files. The proposed framework offers a forward-looking solution to the challenge of orchestrating complex ML workloads. By driving orchestration from configuration files, the framework provides the flexibility to build SageMaker Pipelines DAGs with minimal code, so you can streamline the process of creating and managing both single-model and multi-model pipelines. This approach not only saves time and resources, but also promotes MLOps best practices, contributing to the overall success of ML initiatives. For more information about implementation details, review the GitHub repo.


About the Authors

Luis Felipe Yepez Barrios is a Machine Learning Engineer with AWS Professional Services, focused on scalable distributed systems and automation tooling to expedite scientific innovation in the field of machine learning (ML). He also assists enterprise clients in optimizing their machine learning solutions through AWS services.

Jinzhao Feng is a Machine Learning Engineer at AWS Professional Services. He focuses on architecting and implementing large-scale generative AI and classical ML pipeline solutions. He specializes in FMOps, LLMOps, and distributed training.

Harsh Asnani is a Machine Learning Engineer at AWS. His background is in applied data science with a focus on operationalizing machine learning workloads in the cloud at scale.

Hasan Shojaei is a Sr. Data Scientist with AWS Professional Services, where he helps customers across different industries solve their business challenges through the use of big data, machine learning, and cloud technologies. Prior to this role, Hasan led multiple initiatives to develop novel physics-based and data-driven modeling techniques for top energy companies. Outside of work, Hasan is passionate about books, hiking, photography, and history.

Alec Jenab is a Machine Learning Engineer who specializes in developing and operationalizing machine learning solutions at scale for enterprise customers. Alec is passionate about bringing innovative solutions to market, especially in areas where machine learning can meaningfully improve end user experience. Outside of work, he enjoys playing basketball, snowboarding, and discovering hidden gems in San Francisco.

Read More

Accelerating large-scale neural network training on CPUs with ThirdAI and AWS Graviton

Accelerating large-scale neural network training on CPUs with ThirdAI and AWS Graviton

This guest post is written by Vihan Lakshman, Tharun Medini, and Anshumali Shrivastava from ThirdAI.

Large-scale deep learning has recently produced revolutionary advances in a vast array of fields. Although this progress in artificial intelligence is remarkable, the financial costs and energy consumption required to train these models have emerged as a critical bottleneck due to the need for specialized hardware like GPUs. Traditionally, even modestly sized neural models have required costly hardware accelerators for training, which limits the number of organizations with the financial means to take full advantage of this technology.

Founded in 2021, ThirdAI Corp. is a startup dedicated to the mission of democratizing artificial intelligence technologies through algorithmic and software innovations that fundamentally change the economics of deep learning. We have developed a sparse deep learning engine, known as BOLT, that is specifically designed for training and deploying models on standard CPU hardware as opposed to costly and energy-intensive accelerators like GPUs. Many of our customers have reported strong satisfaction with ThirdAI’s ability to train and deploy deep learning models for critical business problems on cost-effective CPU infrastructure.

In this post, we investigate the potential of the AWS Graviton3 processor to accelerate neural network training for ThirdAI’s unique CPU-based deep learning engine.

The benefits of high-performance CPUs

At ThirdAI, we achieve these breakthroughs in efficient neural network training on CPUs through proprietary dynamic sparse algorithms that activate only a subset of neurons for a given input (see the following figure), thereby side-stepping the need for full dense computations. Unlike other approaches to sparse neural network training, ThirdAI uses locality-sensitive hashing (LSH) to dynamically select neurons for a given input, as shown by the bold lines in the following figure. In certain cases, we have even observed that our sparse CPU-based models train faster than the comparable dense architecture on GPUs.

Dense Neural architecture with bold lines showing which neurons are selected
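The following toy NumPy sketch illustrates the underlying idea with a random-hyperplane (SimHash) scheme; it is a drastic simplification for intuition only, not ThirdAI’s proprietary BOLT implementation, and the dimensions are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_neurons, n_bits = 128, 4096, 12

    W = rng.standard_normal((n_neurons, d))    # neuron weight vectors
    planes = rng.standard_normal((n_bits, d))  # random hyperplanes for SimHash

    def simhash(v: np.ndarray) -> int:
        # Sign pattern against the hyperplanes becomes an integer bucket ID
        bits = (planes @ v) > 0
        return int(bits.dot(1 << np.arange(n_bits)))

    # Pre-hash every neuron into buckets (done once, offline)
    buckets: dict[int, list[int]] = {}
    for i in range(n_neurons):
        buckets.setdefault(simhash(W[i]), []).append(i)

    # At training/inference time, only neurons colliding with the input's
    # bucket are activated, side-stepping the full dense computation
    x = rng.standard_normal(d)
    active = buckets.get(simhash(x), [])
    activations = W[active] @ x  # sparse computation over a few neurons only
    print(f"active neurons: {len(active)} / {n_neurons}")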

Given that many of our target customers operate in the cloud—and among those, the majority use AWS—we were excited to try out the AWS Graviton3 processor to see if the impressive price-performance improvements of Amazon’s silicon innovation would translate to our unique workload of sparse neural network training and thereby provide further savings for customers. Although both the research community and the AWS Graviton team have delivered exciting advances in accelerating neural network inference on CPU instances, we at ThirdAI are, to our knowledge, the first to seriously study how to train neural models on CPUs efficiently.

As shown in our results, we observed a significant training speedup with AWS Graviton3 over the comparable Intel and NVIDIA instances on several representative modeling workloads.

Instance types

For our evaluation, we considered two comparable AWS CPU instances: a c6i.8xlarge machine powered by Intel’s Ice Lake processor and a c7g.8xlarge powered by AWS Graviton3, as well as a g5g.8xlarge GPU instance for the GPU baselines. The following table summarizes the details of each instance.

Instance | vCPU | RAM (GB) | Processor | On-Demand Price (us-east-1)
c7g.8xlarge | 32 | 64 | AWS Graviton3 | $1.1562/hr
c6i.8xlarge | 32 | 64 | Intel Ice Lake | $1.36/hr
g5g.8xlarge (GPU) | 32 | 64 (plus 16 GB GPU memory) | AWS Graviton2 with 1 NVIDIA T4G GPU | $1.3720/hr

Evaluation 1: Extreme classification

For our first evaluation, we focus on the problem of extreme multi-label classification (XMC), an increasingly popular machine learning (ML) paradigm with a number of practical applications in search and recommendations (including at Amazon). For our evaluation, we focus on the public Amazon-670K product recommendation task, which, given an input product, identifies similar products from a collection of over 670,000 items.

In this experiment, we benchmark ThirdAI’s BOLT engine against TensorFlow 2.11 and PyTorch 2.0 on the aforementioned hardware choices: Intel Ice Lake, AWS Graviton3, and an NVIDIA T4G GPU. For our experiments on Intel and AWS Graviton, we use the AWS Deep Learning AMI (Ubuntu 18.04) version 59.0. For our GPU evaluation, we use the NVIDIA GPU-Optimized Arm64 AMI, available via the AWS Marketplace. For this evaluation, we use the SLIDE model architecture, which achieves both competitive performance on this extreme classification task and strong training performance on CPUs. For our TensorFlow and PyTorch comparisons, we implement the analogous version of the SLIDE multi-layer perceptron (MLP) architecture with dense matrix multiplications. We train each model for five epochs (full passes through the training dataset) with a fixed batch size of 256 and learning rate of 0.001. We observed that all models achieved the same test accuracy of 33.6%.
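For reference, the dense MLP baseline described above is conceptually similar to the following PyTorch sketch; the layer sizes are approximate, and the real benchmark uses sparse feature handling and a proper data loader.

    import torch
    import torch.nn as nn

    # Approximate Amazon-670K dimensions; exact values differ in the benchmark
    n_features, n_labels = 135_909, 670_091

    model = nn.Sequential(
        nn.Linear(n_features, 128),  # SLIDE-style hidden layer width
        nn.ReLU(),
        nn.Linear(128, n_labels),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lr from the post
    loss_fn = nn.CrossEntropyLoss()

    def train_epoch(loader):
        # One full pass through the data with the post's batch size of 256
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()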

The following chart compares the training time of ThirdAI’s BOLT to TensorFlow 2.11 and PyTorch 2.0 on the Amazon-670K extreme classification benchmark. We observe that AWS Graviton3 accelerates the performance of BOLT out of the box by approximately 40%, with no customizations needed. ThirdAI’s BOLT on AWS Graviton3 also achieves considerably faster training than the TensorFlow or PyTorch models trained on the GPU. Note that there is no ThirdAI result on the NVIDIA GPU benchmark because BOLT is designed to run on CPUs. We do not include TensorFlow and PyTorch CPU benchmarks because of the prohibitively long training time.

Amazon 670k Training time Bar chart comparing instances c6i.8xlarge vs c7g.8xlarge

The following table summarizes the training time and test accuracy on each processor (CPU or GPU).

Processor | Engine | Training Time (s) | Test Accuracy (%)
Intel Ice Lake (c6i.8xlarge) | BOLT | 1470 | 33.6
AWS Graviton3 (c7g.8xlarge) | BOLT | 935 | 33.6
NVIDIA T4G (g5g.8xlarge) | TensorFlow | 7550 | 33.6
NVIDIA T4G (g5g.8xlarge) | PyTorch | 5130 | 33.6

Evaluation 2: Yelp Polarity sentiment analysis

For our second evaluation, we focus on the popular Yelp Polarity sentiment analysis benchmark, which involves classifying a review as positive or negative. For this evaluation, we compare ThirdAI’s Universal Deep Transformers (UDT) model against a fine-tuned DistilBERT network, a compressed pre-trained language model that achieves near-state-of-the-art performance with reduced inference latency. Because fine-tuning DistilBERT models on a CPU would take a prohibitively long time (at least several days), we benchmark ThirdAI’s CPU-based models against DistilBERT fine-tuned on a GPU. We train all models with a batch size of 256 for a single pass through the data (one epoch). We note that we can achieve slightly higher accuracy with BOLT with additional passes through the data, but we restrict ourselves to a single pass in this evaluation for consistency.

As shown in the following figure, AWS Graviton3 again accelerates ThirdAI’s UDT model training considerably. Furthermore, UDT is able to achieve comparable test accuracy to DistilBERT with a fraction of the training time and without the need for a GPU. We note that there has also been recent work in optimizing the fine-tuning of Yelp Polarity on CPUs. Our models, however, still achieve greater efficiency gains and avoid the cost of pre-training, which is substantial and requires the use of hardware accelerators like GPUs.

Training time on Yelp Polarity C7g vs c6i

The following table summarizes the training time, test accuracy, and inference latency.

Processor | Engine | Model | Training Time (s) | Test Accuracy (%) | Inference Latency (ms)
Intel Ice Lake (c6i.8xlarge) | BOLT | UDT | 47 | 93.2 | <1
AWS Graviton3 (c7g.8xlarge) | BOLT | UDT | 29 | 92.9 | <1
NVIDIA T4G (g5g.8xlarge) | TensorFlow | DistilBERT | 4200 | 93.3 | 8.7
NVIDIA T4G (g5g.8xlarge) | PyTorch | DistilBERT | 3780 | 93.4 | 8.3

Evaluation 3: Multi-class text classification (DBPedia)

For our final evaluation, we focus on the problem of multi-class text classification, which involves assigning a label to a given input text from a set of more than two output classes. We focus on the DBPedia benchmark, which consists of 14 possible output classes. Again, we see that AWS Graviton3 accelerates UDT performance over the comparable Intel instance by roughly 40%. We also see that BOLT achieves comparable results to the DistilBERT transformer-based model fine-tuned on a GPU while achieving sub-millisecond latency.

ThirdAI BOLT training time on c7g vs c6i

The following table summarizes the training time, test accuracy, and inference latency.

Processor | Engine | Model | Training Time (s) | Test Accuracy (%) | Inference Latency (ms)
Intel Ice Lake (c6i.8xlarge) | BOLT | UDT | 23 | 98.23 | <1
AWS Graviton3 (c7g.8xlarge) | BOLT | UDT | 14 | 98.10 | <1
NVIDIA T4G (g5g.8xlarge) | TensorFlow | DistilBERT | 4320 | 99.23 | 8.6
NVIDIA T4G (g5g.8xlarge) | PyTorch | DistilBERT | 3480 | 99.29 | 8

Get started with ThirdAI on AWS Graviton

We have designed our BOLT software for compatibility with all major CPU architectures, including AWS Graviton3. In fact, we didn’t have to make any customizations to our code to run on AWS Graviton3. Therefore, you can use ThirdAI for model training and deployment on AWS Graviton3 with no additional effort. In addition, as detailed in our recent research whitepaper, we have developed a set of novel mathematical techniques to automatically tune the specialized hyperparameters associated with our sparse models, allowing our models to work well immediately out of the box.

We also note that our models primarily work well for search, recommendation, and natural language processing tasks that typically feature large, high-dimensional output spaces and a requirement of extremely low inference latency. We are actively working on extending our methods to additional domains, such as computer vision, but be aware that our efficiency improvements do not translate to all ML domains at this time.

Conclusion

In this post, we investigated the potential of the AWS Graviton3 processor to accelerate neural network training for ThirdAI’s unique CPU-based deep learning engine. Our benchmarks on search, text classification, and recommendation tasks suggest that AWS Graviton3 can accelerate ThirdAI’s model training workloads by 30–40% over the comparable x86 instances, with a price-performance improvement of nearly 50%. Furthermore, because AWS Graviton3 instances are available at a lower cost than the analogous Intel and NVIDIA machines and enable shorter training and inference times, you can further unlock the value of the AWS pay-as-you-go usage model by using lower-cost machines for shorter durations of time.
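To make the price-performance comparison concrete with the earlier Amazon-670K numbers: the BOLT training run costs roughly 935 s × $1.1562/hr ≈ $0.30 on c7g.8xlarge versus 1470 s × $1.36/hr ≈ $0.56 on c6i.8xlarge, a cost reduction of approximately 46%, consistent with the nearly 50% price-performance figure above.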

We are very excited by the price and performance savings of AWS Graviton3 and will look to pass on these improvements to our customers so they can enjoy faster ML training and inference with improved performance on low-cost CPUs. As customers of AWS ourselves, we are delighted by the speed at which AWS Graviton3 allows us to experiment with our models, and we look forward to using more cutting-edge silicon innovation from AWS going forward. The Graviton Technical Guide is a good resource to consider while evaluating your ML workloads to run on Graviton. You can also try AWS Graviton-based t4g instances through the free trial.

The content and opinions in this post are those of the third-party author, and AWS is not responsible for the content or accuracy of this post. At the time of writing, the most current instances were c6i, and hence the comparison was done with c6i instances.


About the Author

Vihan Lakshman – Vihan Lakshman is a research scientist at ThirdAI Corp. focused on developing systems for resource-efficient deep learning. Prior to ThirdAI, he worked as an Applied Scientist at Amazon and received undergraduate and master’s degrees from Stanford University. Vihan is also a recipient of a National Science Foundation research fellowship.

Tharun Medini – Tharun Medini is the co-founder and CTO of ThirdAI Corp. He did his PhD in “Hashing Algorithms for Search and Information Retrieval” at Rice University. Prior to ThirdAI, Tharun worked at Amazon and Target. Tharun is the recipient of numerous awards for his research, including the Ken Kennedy Institute BP Fellowship, the American Society of Indian Engineers Scholarship, and a Rice University Graduate Fellowship.

Anshumali Shrivastava – Anshumali Shrivastava is an associate professor in the computer science department at Rice University. He is also the Founder and CEO of ThirdAI Corp, a company that is democratizing AI to commodity hardware through software innovations. His broad research interests include probabilistic algorithms for resource-frugal deep learning. In 2018, Science News named him one of the top 10 scientists under 40 to watch. He is a recipient of the National Science Foundation CAREER Award, a Young Investigator Award from the Air Force Office of Scientific Research, a machine learning research award from Amazon, and a Data Science Research Award from Adobe. He has won numerous paper awards, including Best Paper Awards at NIPS 2014 and MLSys 2022, as well as the Most Reproducible Paper Award at SIGMOD 2019. His work on efficient machine learning technologies on CPUs has been covered by popular press outlets including the Wall Street Journal, the New York Times, TechCrunch, and NDTV.

Read More

Supercharge your AI team with Amazon SageMaker Studio: A comprehensive view of Deutsche Bahn’s AI platform transformation

Supercharge your AI team with Amazon SageMaker Studio: A comprehensive view of Deutsche Bahn’s AI platform transformation

AI’s growing influence in large organizations brings crucial challenges in managing AI platforms. These include developing a scalable and operationally efficient platform that adheres to organizational compliance and security standards. Amazon SageMaker Studio offers a comprehensive set of capabilities for machine learning (ML) practitioners and data scientists. These include a fully managed integrated development environment (IDE) that simplifies the end-to-end ML workflow. Its collaborative capabilities, such as real-time coediting and notebook sharing within the team, ensure smooth teamwork, while its scalability and high-performance training cater to large datasets. With built-in security, cost-effectiveness, and a range of pre-built tools like Amazon SageMaker Autopilot, Amazon SageMaker JumpStart, and Amazon SageMaker Feature Store, SageMaker Studio is a powerful platform for accelerating AI projects and empowering data scientists at every level of expertise.

Deutsche Bahn is a leading transportation organization in Germany with a revenue of 56.3 billion EUR (in 2022), a workforce of 336,884 employees (including 221,343 employees in Germany), and operations spanning 130 countries. They offer a wide range of services, including public and regional transport, freight services, and rail infrastructure. Through the integrated operation of traffic and railway infrastructure, as well as the economically and ecologically intelligent connection of all modes of transport, Deutsche Bahn moves people and goods. Deutsche Bahn has been at the forefront in adopting AI, using SageMaker Studio as a key AI platform. At Deutsche Bahn, a dedicated AI platform team manages and operates the SageMaker Studio platform, and multiple data analytics teams within the organization use the platform to develop, train, and run various analytics and ML activities.

The AI platform team’s key objective is to ensure seamless access to Workbench services and SageMaker Studio for all Deutsche Bahn teams and projects, with a primary focus on data scientists and ML engineers. This platform helps Deutsche Bahn realize a spectrum of use cases, ranging from railway maintenance and forecasting to future applications in generative AI.

The AI platform managed service, built on SageMaker Studio, seamlessly aligns with Deutsche Bahn’s group-wide platform strategy. It meets the company’s compliance requirements, enables a swift project initiation for the team by provisioning a SageMaker domain, and reduces maintenance overhead due to an overarching operating model. Major benefits include high scalability of the service, in large part due to automation and a self-service model, and an attractive pricing model that’s primarily based on resource consumption.

“SageMaker Studio provided us a common platform that is scalable, security compliant, and addresses the development needs of data scientists from multiple data analytics teams within the DB organization. Before this, each team managed and operated their own JupyterLab notebooks, which was not efficient or cost-effective. Within 8 weeks, we onboarded over 120 developers, provisioned 25 SageMaker domains, and quickly got started using this platform.”

– Emmanuel Drosos, product owner at DB Systel.

In this post, we explore how Deutsche Bahn scaled and operated their AI platform using SageMaker Studio for multiple teams, while ensuring robust security and oversight.

Solution overview

The architecture at Deutsche Bahn consists of a central platform account managed by a platform team responsible for managing infrastructure and operations for SageMaker Studio. SageMaker Studio resources are grouped by SageMaker domains, each consisting of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations. At Deutsche Bahn, data scientists from various teams use SageMaker domains for their ML activities; each team has a dedicated SageMaker domain that they use for developing and testing ML models and collaborate using features such as notebook sharing.

From an infrastructure perspective, the VPC provisioned in the AI platform account as shown in the following figure has no outbound internet connectivity to ensure security and compliance. For high availability, multiple identical private isolated subnets are provisioned. The SageMaker Studio domains are deployed in VPC-only mode, which creates an elastic network interface for communication between the SageMaker service account (AWS service account) and the platform account’s VPC. VPC endpoints for the SageMaker API, SageMaker Studio, and SageMaker notebooks facilitate secure and reliable communication between the platform account’s VPC and the SageMaker domain managed by AWS in the SageMaker service account.

Each data analytics team is able to request one or multiple SageMaker domains through the company’s internal self-service portal. This process of ordering a SageMaker domain is orchestrated through a separate workflow process (via AWS Step Functions). During this orchestration flow, an Azure Active Directory (AD) group for the data analytics team is provisioned with the AD group name corresponding to the domain name. The orchestration leads to a continuous integration and continuous deployment (CI/CD) pipeline deploying an AWS Cloud Development Kit (AWS CDK) app consisting of a SageMaker domain for the respective team.

In addition to the SageMaker domain, a customized AWS Identity and Access Management (IAM) role (SageMaker-execution-role), an Amazon Simple Storage Service (Amazon S3) bucket (data-bucket), a customer managed key (CMK), and other AWS resources are provisioned during the deployment process by the AWS CDK app, as illustrated in the following figure. The AD group contains the data scientists who need access to their team’s SageMaker domain. The AD group name corresponds to the SageMaker domain’s name and is primarily used during the authorization process.

Client separation is implemented at the level of SageMaker domains by using IAM authentication mode. A domain-specific IAM role (SageMaker-execution-role) that follows the principle of least privilege is attached to each domain and is assumed by the data analytics team during the login process. This role grants data scientists in the team the ability to perform various activities, such as running processing jobs, hyperparameter tuning jobs, transformation jobs, and experiments, as well as creating models. These ML activities are run on behalf of the user by SageMaker using the IAM pass role permission. However, certain actions like creating S3 buckets, modifying IAM roles, updating SageMaker domains, and provisioning large instances are restricted for security, compliance, and cost control reasons. The associated IAM policy makes sure that the data analytics team only has access to the relevant S3 bucket and CMK for their authorized domain, as depicted in the following figure. Additionally, the SageMaker-execution-role allows the team members to assume roles in other accounts within the Deutsche Bahn organization from SageMaker Studio, providing them with the flexibility to access resources like Amazon Relational Database Service (Amazon RDS), other S3 buckets, and Amazon Athena. The IAM policy uses the aws:RequestTag and aws:ResourceTag condition keys for fine-grained access control during SageMaker activities, like processing jobs, training jobs, and creating models. These tags also help track associated costs for the domain. For more information, refer to Actions, resources, and condition keys for Amazon SageMaker.

IAM policy restricting each data analytics team to its own S3 bucket and CMK
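The following AWS CDK (Python) fragment is a hedged sketch of how such a tag-scoped permission could be expressed; the tag key domain, the action list, and the team name team1 are illustrative assumptions, not Deutsche Bahn’s actual policy.

    from aws_cdk import aws_iam as iam

    # Hypothetical tag-scoped statement: jobs may only be created when tagged
    # with this team's domain name, which also enables per-domain cost tracking
    create_jobs = iam.PolicyStatement(
        actions=["sagemaker:CreateTrainingJob", "sagemaker:CreateProcessingJob"],
        resources=["*"],
        conditions={"StringEquals": {"aws:RequestTag/domain": "team1"}},
    )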

The CMK encrypts both the SageMaker domain’s file system contents stored in Amazon EFS and the contents of the S3 bucket (data-bucket) that is provisioned to store data for SageMaker processing and transformation jobs. In addition, resource-based policies, such as the bucket policy and CMK policy, provide an extra layer of security, restricting both access to only authorized AI team members and permitted actions on these resources.

The AI team does not have AWS Management Console access to the AI platform team’s account. To access SageMaker Studio, as illustrated in the following figure, data scientists from the data analytics teams authenticate through an Amazon Cognito based custom login application and receive an OAuth access token that contains information such as the AD group name. The user then requests SageMaker domain access through the UI, which triggers an Amazon API Gateway call to generate a presigned URL. API Gateway invokes the PreSignUrlGenerator AWS Lambda function and uses an Amazon Cognito authorizer to validate the OAuth access token in the request header. The PreSignUrlGenerator function validates the user’s access permissions for the requested SageMaker domain by comparing the AD group name in the access token against the requested SageMaker domain. Upon successful authorization, the function creates a SageMaker user profile on first login and generates a presigned URL response. The custom login application then redirects the user to the requested SageMaker domain.

Login flow with Amazon Cognito authentication, authorization, and presigned URL generation
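As a rough sketch of this flow, the PreSignUrlGenerator function might look like the following; the query parameter name, the claim handling, and the resolve_domain_id helper are assumptions for illustration, not the production code, and user-profile creation on first login is omitted for brevity.

    import json

    import boto3

    sagemaker = boto3.client("sagemaker")

    def resolve_domain_id(domain_name: str) -> str:
        # Hypothetical lookup of the SageMaker domain ID by its name
        for domain in sagemaker.list_domains()["Domains"]:
            if domain["DomainName"] == domain_name:
                return domain["DomainId"]
        raise ValueError(f"unknown domain {domain_name}")

    def handler(event, context):
        # Claims are injected by the Amazon Cognito authorizer on API Gateway
        claims = event["requestContext"]["authorizer"]["claims"]
        domain_name = event["queryStringParameters"]["domain"]  # assumed parameter name

        # Authorize: the group name in the token must match the requested domain
        if domain_name not in claims.get("cognito:groups", ""):
            return {"statusCode": 403, "body": "not authorized for this domain"}

        response = sagemaker.create_presigned_domain_url(
            DomainId=resolve_domain_id(domain_name),
            UserProfileName=claims["cognito:username"],
            ExpiresInSeconds=300,
        )
        return {"statusCode": 200, "body": json.dumps({"url": response["AuthorizedUrl"]})}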

AWS CDK

The solution at Deutsche Bahn uses AWS CDK as infrastructure as code (IaC) to provision a SageMaker domain along with resources like S3 buckets and a CMK. The following figure illustrates the stacks and associated resources used for SageMaker deployment. The infrastructure stack takes care of setting up essential resources like the VPC, subnets, and multiple SageMaker endpoints. Resources such as the VPC, subnets, and service control policies (SCPs) are managed by a central cloud team through a different stack (but are shown here for simplicity). The SagemakerStudioStack is primarily responsible for provisioning a SageMaker domain, a dedicated data bucket, a CMK, and the dedicated IAM role SageMaker-execution-role. Notably, each SageMaker domain is provisioned through its own SagemakerStudioStack instance.

AWS CDK stacks and the resources they provision for SageMaker deployment
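A condensed AWS CDK (Python) sketch of the per-team SagemakerStudioStack might look like the following; the property values are placeholders, and the real stack also provisions the CMK, data bucket, and execution role described earlier.

    from aws_cdk import Stack, aws_sagemaker as sagemaker
    from constructs import Construct

    class SagemakerStudioStack(Stack):
        def __init__(self, scope: Construct, construct_id: str, *,
                     vpc_id: str, subnet_ids: list[str],
                     execution_role_arn: str, **kwargs):
            super().__init__(scope, construct_id, **kwargs)
            sagemaker.CfnDomain(
                self, "Domain",
                domain_name="team1",                # matches the team's AD group
                auth_mode="IAM",                    # client separation via IAM
                app_network_access_type="VpcOnly",  # no outbound internet
                vpc_id=vpc_id,
                subnet_ids=subnet_ids,
                default_user_settings=sagemaker.CfnDomain.UserSettingsProperty(
                    execution_role=execution_role_arn,
                ),
            )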

The solution uses a purpose-built L3 construct (SageMaker Studio domain), as shown in the following figure, for the SageMaker domain resource. SageMaker Studio has a lifecycle configuration feature that enables specific initializations during the startup of JupyterLab or KernelGateway apps.

Purpose-built L3 construct for the SageMaker Studio domain

Deutsche Bahn uses a lifecycle configuration to automatically detect and shut down idle instances in the SageMaker domain, reducing unnecessary costs. Due to restricted outbound connectivity, the data analytics teams use internally hosted images and third-party libraries from the company’s internal artifactory. The lifecycle configuration script for KernelGateway configures the pip and conda package managers to redirect downloads to the internally hosted artifactory location. As of this writing, there is no AWS CDK construct for the lifecycle configuration resource; therefore, the team uses a custom CDK resource to provision and manage the LifeCycleConfig script. Custom resources in AWS CDK offer the ability to provision and manage resources not directly supported by AWS CloudFormation or AWS CDK constructs.
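The following boto3 sketch shows one way such a KernelGateway lifecycle configuration could be registered; the configuration name, script content, and artifactory URL are placeholders for the internal mirror described above, not Deutsche Bahn’s actual setup.

    import base64

    import boto3

    SCRIPT = """#!/bin/bash
    # Redirect pip to the internally hosted artifactory (placeholder URL)
    pip config set global.index-url https://artifactory.example.internal/api/pypi/simple
    """

    sm = boto3.client("sagemaker")
    sm.create_studio_lifecycle_config(
        StudioLifecycleConfigName="kernel-gateway-setup",
        # The API expects the script content to be base64-encoded
        StudioLifecycleConfigContent=base64.b64encode(SCRIPT.encode()).decode(),
        StudioLifecycleConfigAppType="KernelGateway",
    )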

Installation

The sample AWS CDK application demonstrates how various components, including the SageMaker domain, lifecycle configuration, Amazon Cognito authentication, and an IAM role with least privileges, function together. Within the application, the SagemakerStudioStack class handles the provisioning of the SageMaker domain, the IAM role (sagemaker-execution-role) that users assume, the CMK, the lifecycle configuration, the SageMaker user profile, the S3 bucket for data processing, and the Amazon Cognito user group. The SagemakerLoginStack, on the other hand, is responsible for deploying the Amazon Cognito user pool, the Lambda function, and the API Gateway for generating presigned URLs. The CognitoUserStack primarily focuses on deploying a user within the Amazon Cognito user pool.

You can run the following commands to compile, synthesize, and deploy the application. You should adjust the account, user, and password in the sample code for your application. The password should be at least 8 characters, with uppercase characters and numbers. The user parameter is the SageMaker domain user that will be authenticated by Amazon Cognito.

  1. Download the source code from the GitHub repo.
  2. Bootstrap the AWS account. In the following code, adjust the account number and Region as needed:
    cdk bootstrap aws://11111111111/eu-central-1

  3. Install the packages and compile the code:
    npm install
    npm run build

  4. Synthesize the AWS CDK application:
    npx cdk synth -c account=11111111111 -c region='eu-central-1' -c domain-name=team1 -c user=demo-user -c password=<your password>

  5. Deploy the application with all stacks into the account and Region of your choice:
    npx cdk deploy --all -c account=11111111111 -c region='eu-central-1' -c domain-name=team1 -c user=demo-user -c password=<password>

  6. Download the Postman app to make an API call.

If you don’t have a Postman account, create a free account with your email; otherwise, sign in to your existing account.

  7. On the File menu, choose Import and import the Postman environment JSON file included in the GitHub repo.
  8. On the Environments tab in Postman, locate the environment called SageMaker.
  9. Add the following environment variables, which you see as part of the stack deployment output from SagemakerLoginStack:
    ..... output from the cdk deploy .....
    
    //PreSignedURLApi
    
    SageMaker-login-stack.PreSignedURLApiEndpointXXXX= https://xxxxxxx.execute-api.eu-central-1.amazonaws.com/prod/
    
    //UserPoolClientId
    
    SageMaker-login-stack.UserPoolUserPoolClientIdFXXXX = xxxxxxxxxxxxxxxx
    
    //UserPoolClientSecret
    
    SageMaker-login-stack.UserPoolUserPoolClientSecretC1D088A5 = xxxxxxxxxxxxxxx
    
    //CognitoSigninDomain
    
    SageMaker-login-stack.UserPoolCognitoSigninDomainD3B08161 = https://SageMaker-login-xxxxx.auth.eu-central-1.amazoncognito.com/oauth2

Use the following parameters (fetch the values from the output during cdk deploy):

    • domainName – The domain name parameter you passed in cdk deploy, for example, team1
    • client-id – The Amazon Cognito client ID
    • client-secret – The Amazon Cognito client secret
    • SageMaker-presigned-api – The URL of the API Gateway created by AWS CDK, which generates the presigned URL
    • cognito-signin-endpoint – The endpoint URL of the Amazon Cognito domain where the client app (in this case, Postman) authenticates by providing the credentials of the user (demo-user)

The next step is to generate an OAuth2 token.

    10. On the Authorization tab, choose the SageMaker environment and choose Generate New Access Token.

All the values on this tab should be prefilled.

    11. Update the environment variables and choose Get New Access Token.


  12. In the pop-up window that opens, log in to Amazon Cognito with the user name (demo-user) and password you used earlier.

Upon successful authentication, a new access token is generated.

  13. Choose Use Token.
  14. Make sure you have selected the right environment (SageMaker) in the drop-down list.
  15. Choose GeneratePresignedUrlDemo in the Postman SageMaker collections and choose Send.

This makes a REST API call to API Gateway and generates a presigned URL to access the SageMaker domain. You can see this URL in the response body.

  16. Copy this URL and enter it in the browser window.

A new SageMaker Studio session will be launched with your user profile.

This demo application supports SageMaker features like training jobs, processing jobs, and model endpoints. Note that features like Amazon SageMaker Canvas, SageMaker JumpStart, and SageMaker Feature Store are not activated.

Clean up

Complete the following steps to clean up your resources:

  1. On the SageMaker console, in the navigation pane, choose Domain, User Profile, and Apps.
  2. Delete all running apps (KernelGateway or JupyterLab) from this solution.
  3. Delete all the SageMaker user profiles you created during the login step.
  4. On the Amazon EFS console, delete the EFS file system created for this post.
  5. Run the following command to delete the resources created with the AWS CDK:
    npx cdk destroy --all

Conclusion

This post highlighted how Deutsche Bahn effectively used SageMaker Studio to revamp its AI platform, resulting in a scalable, automated, and manageable solution that supports its diverse data analytics teams. The architecture features a central platform account, a self-service domain ordering process, and infrastructure provisioning using AWS CDK. The deployment process incorporates a CI/CD pipeline, ensuring the smooth delivery of SageMaker domains.

Overall, the transformation brought about by SageMaker Studio has empowered Deutsche Bahn to construct a robust platform for their AI initiatives, catering to over 100 developers and managing 20 SageMaker domains within a single AWS account.

Lastly, we extend our sincere appreciation to Nico Seegert (d-fine) and Philipp Vollmer (Deutsche Bahn), whose invaluable contributions were instrumental in shaping this architecture.


About the authors

Prasanna Tuladhar is a Cloud Infrastructure Architect at AWS Professional Services in Munich, Germany. Specializing in cloud infrastructure, workload migration, and DevOps on the AWS platform, he empowers customers to achieve their business objectives. Outside of work, he enjoys jogging, hiking, and quality time with his family.

Emmanuel Drosos is a Product Owner for the AI platform at DB Systel, a subsidiary of Deutsche Bahn (DB) Germany. With a passion for innovation and technology, Emmanuel spearheads initiatives aimed at leveraging the power of the cloud to drive the AI platform at DB. The AI.Platform is one of DB’s group-wide development platforms; it includes AI services and tools for the development of AI (machine learning) models as well as directly usable AI services, and it is simple, integrated, and scalable. He works closely with other DB customers to unlock the full potential of the AI platform, enabling them to achieve their business objectives efficiently and effectively. Outside of his professional activities, Emmanuel enjoys traveling and is an enthusiastic nature and hiking lover.

Vishwanath Bhat is a DevOps Architect at AWS Professional Services, based in Germany. He helps customers get the full benefit of the cloud and achieve their business goals with the AWS Cloud. When not working, he likes to swim in alpine lakes, hike, read, or play football.

Kumudhan Cherarajan is a DevOps Consultant at AWS Professional Services, based in Switzerland. He is passionate about helping customers adopt processes and services that increase their efficiency in their cloud journey. When not working, he likes to play cricket and music.
