Introducing three new NVIDIA GPU-based Amazon EC2 instances


The Amazon Elastic Compute Cloud (Amazon EC2) accelerated computing portfolio offers the broadest choice of accelerators to power your artificial intelligence (AI), machine learning (ML), graphics, and high performance computing (HPC) workloads. We are excited to announce the expansion of this portfolio with three new instances featuring the latest NVIDIA GPUs: Amazon EC2 P5e instances powered by NVIDIA H200 GPUs, Amazon EC2 G6 instances featuring NVIDIA L4 GPUs, and Amazon EC2 G6e instances powered by NVIDIA L40S GPUs. All three instances will be available in 2024, and we look forward to seeing what you can do with them.

AWS and NVIDIA have collaborated for over 13 years and have pioneered large-scale, highly performant, and cost-effective GPU-based solutions for developers and enterprises across the spectrum. We have combined NVIDIA's powerful GPUs with differentiated AWS technologies such as the AWS Nitro System, 3,200 Gbps of Elastic Fabric Adapter (EFA) v2 networking, hundreds of GB/s of data throughput with Amazon FSx for Lustre, and exascale computing with Amazon EC2 UltraClusters to deliver the most performant infrastructure for AI/ML, graphics, and HPC. Coupled with other managed services such as Amazon Bedrock, Amazon SageMaker, and Amazon Elastic Kubernetes Service (Amazon EKS), these instances provide developers with the industry's best platform for building and deploying generative AI, HPC, and graphics applications.

High-performance and cost-effective GPU-based instances for AI, HPC, and graphics workloads

To power the development, training, and inference of the largest large language models (LLMs), EC2 P5e instances will feature NVIDIA's latest H200 GPUs, which offer 141 GB of HBM3e GPU memory that is 1.7 times larger and 1.4 times faster than that of H100 GPUs. This boost in GPU memory, along with up to 3,200 Gbps of EFA networking enabled by the AWS Nitro System, will enable you to continue to build, train, and deploy your cutting-edge models on AWS.

EC2 G6e instances, featuring NVIDIA L40S GPUs, are built to provide developers with a broadly available option for training and inference of publicly available LLMs, as well as to support the increasing adoption of small language models (SLMs). They are also optimal for digital twin applications that use NVIDIA Omniverse to describe and simulate across 3D tools and applications, and for creating virtual worlds and advanced workflows for industrial digitalization.

EC2 G6 instances, featuring NVIDIA L4 GPUs, will deliver a lower-cost, energy-efficient solution for deploying ML models for natural language processing, language translation, video and image analysis, speech recognition, and personalization as well as graphics workloads, such as creating and rendering real-time, cinematic-quality graphics and game streaming.


About the Author

Chetan Kapoor is the Director of Product Management for the Amazon EC2 Accelerated Computing Portfolio.


Boost inference performance for LLMs with new Amazon SageMaker containers


Today, Amazon SageMaker launches a new version (0.25.0) of Large Model Inference (LMI) Deep Learning Containers (DLCs) and adds support for NVIDIA's TensorRT-LLM library. With these upgrades, you can effortlessly access state-of-the-art tooling to optimize large language models (LLMs) on SageMaker and achieve price-performance benefits: the Amazon SageMaker LMI TensorRT-LLM DLC reduces latency by 33% on average and improves throughput by 60% on average for the Llama2-70B, Falcon-40B, and CodeLlama-34B models compared to the previous version.

LLMs have seen an unprecedented growth in popularity across a broad spectrum of applications. However, these models are often too large to fit on a single accelerator or GPU device, making it difficult to achieve low-latency inference and scale. SageMaker offers LMI DLCs to help you maximize the utilization of available resources and improve performance. The latest LMI DLCs offer continuous batching support for inference requests to improve throughput, efficient inference collective operations to improve latency, Paged Attention V2 (which improves the performance of workloads with longer sequence lengths), and the latest TensorRT-LLM library from NVIDIA to maximize performance on GPUs. LMI DLCs offer a low-code interface that simplifies compilation with TensorRT-LLM by just requiring the model ID and optional model parameters; all of the heavy lifting required with building a TensorRT-LLM optimized model and creating a model repo is managed by the LMI DLC. In addition, you can use the latest quantization techniques—GPTQ, AWQ, and SmoothQuant—that are available with LMI DLCs. As a result, with LMI DLCs on SageMaker, you can accelerate time-to-value for your generative AI applications and optimize LLMs for the hardware of your choice to achieve best-in-class price-performance.

In this post, we dive deep into the new features with the latest release of LMI DLCs, discuss performance benchmarks, and outline the steps required to deploy LLMs with LMI DLCs to maximize performance and reduce costs.

New features with SageMaker LMI DLCs

In this section, we discuss three new features with SageMaker LMI DLCs.

SageMaker LMI now supports TensorRT-LLM

SageMaker now offers NVIDIA’s TensorRT-LLM as part of the latest LMI DLC release (0.25.0), enabling state-of-the-art optimizations like SmoothQuant, FP8, and continuous batching for LLMs when using NVIDIA GPUs. TensorRT-LLM opens the door to ultra-low latency experiences that can greatly improve performance. The TensorRT-LLM SDK supports deployments ranging from single-GPU to multi-GPU configurations, with additional performance gains possible through techniques like tensor parallelism. To use the TensorRT-LLM library, choose the TensorRT-LLM DLC from the available LMI DLCs and set engine=MPI among other settings such as option.model_id. The following diagram illustrates the TensorRT-LLM tech stack.
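
For reference, a minimal serving.properties for the TensorRT-LLM DLC might look like the following sketch; the model ID shown is a placeholder, and tensor_parallel_degree should match the number of GPUs on your instance:

engine=MPI
option.model_id=meta-llama/Llama-2-7b-hf
option.tensor_parallel_degree=4
option.max_rolling_batch_size=64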

Efficient inference collective operations

In a typical deployment of LLMs, model parameters are spread across multiple accelerators to accommodate the requirements of a large model that can't fit on a single accelerator. This enhances inference speed by enabling each accelerator to carry out partial calculations in parallel. Afterward, a collective operation consolidates these partial results at the end of the process and redistributes them among the accelerators.

For P4d instance types, SageMaker implements a new collective operation that speeds up communication between GPUs. As a result, you get lower latency and higher throughput with the latest LMI DLCs compared to previous versions. Furthermore, this capability works out of the box with LMI DLCs; you don't need to configure anything because it's embedded in the SageMaker LMI DLCs and is exclusively available on Amazon SageMaker.

Quantization support

SageMaker LMI DLCs now support the latest quantization techniques, including pre-quantized models with GPTQ, Activation-aware Weight Quantization (AWQ), and just-in-time quantization like SmoothQuant.

GPTQ allows LMI to run popular INT3 and INT4 models from Hugging Face. It offers some of the smallest possible model weights that can fit on a single GPU or across multiple GPUs. LMI DLCs also support AWQ inference, which enables faster inference. Finally, LMI DLCs now support SmoothQuant, which allows INT8 quantization to reduce the memory footprint and computational cost of models with minimal loss in accuracy. Currently, we allow you to do just-in-time conversion for SmoothQuant models without any additional steps. GPTQ and AWQ models need to be quantized with a dataset before they can be used with LMI DLCs. You can also pick up popular pre-quantized GPTQ and AWQ models to use with LMI DLCs. To use SmoothQuant, set option.quantize=smoothquant with engine=DeepSpeed in serving.properties. A sample notebook using SmoothQuant for hosting GPT-NeoX on ml.g5.12xlarge is located on GitHub.
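
As a hedged illustration, a serving.properties that enables just-in-time SmoothQuant quantization with the DeepSpeed engine might look like the following; the model ID and tensor parallel degree are placeholders to adapt to your model and instance:

engine=DeepSpeed
option.model_id=EleutherAI/gpt-neox-20b
option.tensor_parallel_degree=4
option.quantize=smoothquant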

Using SageMaker LMI DLCs

You can deploy your LLMs on SageMaker using the new LMI DLCs 0.25.0 without any changes to your code. SageMaker LMI DLCs use DJL serving to serve your model for inference. To get started, you just need to create a configuration file that specifies settings like model parallelization and inference optimization libraries to use. For instructions and tutorials on using SageMaker LMI DLCs, refer to Model parallelism and large model inference and our list of available SageMaker LMI DLCs.

The DeepSpeed container includes a library called the LMI Distributed Inference Library (LMI-Dist). LMI-Dist is an inference library used to run large model inference with the best optimizations drawn from different open-source libraries, including the vLLM, Text-Generation-Inference (up to version 0.9.4), FasterTransformer, and DeepSpeed frameworks. This library incorporates popular open-source technologies like FlashAttention, PagedAttention, FusedKernel, and efficient GPU communication kernels to accelerate the model and reduce memory consumption.

TensorRT-LLM is an open-source library released by NVIDIA in October 2023. We optimized the TensorRT-LLM library for inference speedup and created a toolkit that simplifies the user experience by supporting just-in-time model conversion. This toolkit enables users to provide a Hugging Face model ID and deploy the model end-to-end. It also supports continuous batching with streaming. You can expect approximately 1–2 minutes to compile the Llama-2 7B and 13B models, and around 7 minutes for the 70B model. If you want to avoid this compilation overhead during SageMaker endpoint setup and instance scaling, we recommend using ahead-of-time (AOT) compilation with our tutorial to prepare the model. We also accept any TensorRT-LLM model built for Triton Server that can be used with LMI DLCs.

Performance benchmarking results

We compared the performance of the latest SageMaker LMI DLCs version (0.25.0) to the previous version (0.23.0). We conducted experiments on the Llama-2 70B, Falcon 40B, and CodeLlama 34B models to demonstrate the performance gain with TensorRT-LLM and efficient inference collective operations (available on SageMaker).

SageMaker LMI containers come with a default handler script to load and host models, providing a low-code option. You also have the option to bring your own script if you need to do any customizations to the model loading steps. You need to pass the required parameters in a serving.properties file. This file contains the required configurations for the Deep Java Library (DJL) model server to download and host the model. The following code is the serving.properties used for our deployment and benchmarking:

engine=MPI
option.use_custom_all_reduce=true 
option.model_id={{s3url}}
option.tensor_parallel_degree=8
option.output_formatter=json
option.max_rolling_batch_size=64
option.model_loading_timeout=3600

The engine parameter is used to define the runtime engine for the DJL model server. We can specify the Hugging Face model ID or the Amazon Simple Storage Service (Amazon S3) location of the model using the model_id parameter. The task parameter is used to define the natural language processing (NLP) task. The tensor_parallel_degree parameter sets the number of devices over which the tensor parallel modules are distributed. The use_custom_all_reduce parameter is set to true for GPU instances that have NVLink enabled to speed up model inference. You can set this for P4d, P4de, P5, and other GPU instances that have NVLink connected. The output_formatter parameter sets the output format. The max_rolling_batch_size parameter sets the limit for the maximum number of concurrent requests. The model_loading_timeout parameter sets the timeout value for downloading and loading the model to serve inference. For more details on the configuration options, refer to Configurations and settings.
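
To show how this configuration is typically used, the following is a minimal deployment sketch with the SageMaker Python SDK; the image URI, S3 path, and endpoint name are placeholders, and model_data is assumed to be a tarball containing the serving.properties shown above:

import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()
session = sagemaker.Session()

# Placeholders: the LMI TensorRT-LLM DLC image URI for your Region and a tarball with serving.properties
image_uri = "<lmi-tensorrtllm-dlc-image-uri>"
model_data = "s3://<your-bucket>/llama2-70b/mymodel.tar.gz"

model = Model(image_uri=image_uri, model_data=model_data, role=role, sagemaker_session=session)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",  # 8 GPUs, matching tensor_parallel_degree=8 above
    endpoint_name="lmi-trtllm-llama2-70b",
    container_startup_health_check_timeout=3600,  # allow time for model download and compilation
)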

Llama-2 70B

The following are the performance comparison results for Llama-2 70B. Latency decreased by 28% and throughput increased by 44% at a concurrency of 16 with the new LMI TensorRT-LLM DLC.

Falcon 40B

The following figures compare Falcon 40B. Latency decreased by 36% and throughput increased by 59% at a concurrency of 16 with the new LMI TensorRT-LLM DLC.

CodeLlama 34B

The following figures compare CodeLlama 34B. Latency decreased by 36% and throughput increased by 77% at a concurrency of 16 with the new LMI TensorRT-LLM DLC.

Recommended configuration and container for hosting LLMs

With the latest release, SageMaker provides two containers: 0.25.0-deepspeed and 0.25.0-tensorrtllm. The DeepSpeed container contains DeepSpeed and the LMI Distributed Inference Library (LMI-Dist). The TensorRT-LLM container includes NVIDIA's TensorRT-LLM library to accelerate LLM inference.

We recommend the deployment configuration illustrated in the following diagram.

To get started, refer to the sample notebooks.

Conclusion

In this post, we showed how you can use SageMaker LMI DLCs to optimize LLMs for your business use case and achieve price-performance benefits. To learn more about LMI DLC capabilities, refer to Model parallelism and large model inference. We’re excited to see how you use these new capabilities from Amazon SageMaker.


About the authors

Michael Nguyen is a Senior Startup Solutions Architect at AWS, specializing in leveraging AI/ML to drive innovation and develop business solutions on AWS. Michael holds 12 AWS certifications and has a BS/MS in Electrical/Computer Engineering and an MBA from Penn State University, Binghamton University, and the University of Delaware.

Rishabh Ray Chaudhury is a Senior Product Manager with Amazon SageMaker, focusing on Machine Learning inference. He is passionate about innovating and building new experiences for Machine Learning customers on AWS to help scale their workloads. In his spare time, he enjoys traveling and cooking. You can find him on LinkedIn.

Qing Lan is a Software Development Engineer at AWS. He has worked on several challenging products at Amazon, including high-performance ML inference solutions and a high-performance logging system. Qing's team successfully launched the first billion-parameter model in Amazon Advertising with very low latency requirements. Qing has in-depth knowledge of infrastructure optimization and deep learning acceleration.

Jian Sheng is a Software Development Engineer at Amazon Web Services who has worked on several key aspects of machine learning systems. He has been a key contributor to the SageMaker Neo service, focusing on deep learning compilation and framework runtime optimization. Recently, he has directed his efforts and contributed to optimizing the machine learning system for large model inference.

Vivek Gangasani is an AI/ML Startup Solutions Architect for generative AI startups at AWS. He helps emerging GenAI startups build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of large language models. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.

Harish Tummalacherla is a Software Engineer with the Deep Learning Performance team at SageMaker. He works on performance engineering for serving large language models efficiently on SageMaker. In his spare time, he enjoys running, cycling, and ski mountaineering.


Simplify data prep for generative AI with Amazon SageMaker Data Wrangler


Generative artificial intelligence (generative AI) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential. Most real-world data exists in unstructured formats like PDFs, which requires preprocessing before it can be used effectively.

According to IDC, unstructured data accounts for over 80% of all business data today. This includes formats like emails, PDFs, scanned documents, images, audio, video, and more. While this data holds valuable insights, its unstructured nature makes it difficult for AI algorithms to interpret and learn from it. According to a 2019 survey by Deloitte, only 18% of businesses reported being able to take advantage of unstructured data.

As AI adoption continues to accelerate, developing efficient mechanisms for digesting and learning from unstructured data becomes even more critical. This could involve better preprocessing tools, semi-supervised learning techniques, and advances in natural language processing. Companies that use their unstructured data most effectively will gain significant competitive advantages from AI. Clean data is important for good model performance. Extracted texts still contain large amounts of gibberish and boilerplate text (e.g., raw HTML). Scraped data from the internet often contains a lot of duplication. Data from social media, reviews, or any user-generated content can also contain toxic and biased content, and you may need to filter it out using some preprocessing steps. There could also be a lot of low-quality content or bot-generated text, which can be filtered out using accompanying metadata (e.g., filter out customer service responses that received low customer ratings).

Data preparation is important at multiple stages in Retrieval Augmented Generation (RAG) models. The knowledge source documents need preprocessing, like cleaning text and generating semantic embeddings, so they can be efficiently indexed and retrieved. The user’s natural language query also requires preprocessing, so it can be encoded into a vector and compared to document embeddings. After retrieving relevant contexts, they may need additional preprocessing, like truncation, before being concatenated to the user’s query to create the final prompt for the foundation model.
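
To make the last step concrete, the following is a minimal, framework-agnostic sketch of assembling the final prompt from retrieved contexts, with a simple character-based truncation budget; the budget and prompt template are illustrative assumptions:

def build_rag_prompt(query, contexts, max_context_chars=6000):
    """Concatenate retrieved contexts (truncated to a character budget) with the user query."""
    selected, used = [], 0
    for ctx in contexts:  # contexts are assumed to be ordered by relevance
        remaining = max_context_chars - used
        if remaining <= 0:
            break
        snippet = ctx[:remaining]
        selected.append(snippet)
        used += len(snippet)
    context_block = "\n\n".join(selected)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )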

Solution overview

In this post, we work with a PDF documentation dataset—the Amazon Bedrock user guide. Further, we show how to preprocess a dataset for RAG. Specifically, we clean the data and create RAG artifacts to answer questions about the content of the dataset. Consider the following machine learning (ML) problem: a user asks a large language model (LLM) the question “How to filter and search models in Amazon Bedrock?”. The LLM has not seen the documentation during the training or fine-tuning stage, so it wouldn't be able to answer the question and would most probably hallucinate. Our goal with this post is to find a relevant piece of text from the PDF (i.e., RAG) and attach it to the prompt, thus enabling the LLM to answer questions specific to this document.

Below, we show how you can do all these main preprocessing steps from Amazon SageMaker Data Wrangler:

  1. Extract text from a PDF document (powered by Amazon Textract).
  2. Remove sensitive information (powered by Amazon Comprehend).
  3. Chunk the text into pieces.
  4. Create embeddings for each piece (powered by Amazon Bedrock); steps 3 and 4 are sketched after this list.
  5. Upload the embeddings to a vector database (powered by Amazon OpenSearch Service).
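
As a rough sketch of what steps 3 and 4 can look like outside of the built-in Data Wrangler code snippets, the following uses a simple fixed-size chunker and the Amazon Titan Embeddings model through the Bedrock runtime; the chunk size, overlap, and model ID are assumptions to adapt to your setup:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into overlapping fixed-size chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed_chunk(chunk):
    """Generate an embedding for one chunk with the Amazon Titan Embeddings model."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",  # assumed model ID; returns 1536-dimensional vectors
        body=json.dumps({"inputText": chunk}),
    )
    return json.loads(response["body"].read())["embedding"]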

Prerequisites

For this walkthrough, you should have the following:

Note: Create OpenSearch Service domains following the instructions here. For simplicity, let's pick the option with a master username and password for fine-grained access control. Once the domain is created, create a vector index with the following mappings; the vector dimension of 1536 aligns with Amazon Titan embeddings:

PUT knowledge-base-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "text_content": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "text_content_v": {
        "type": "knn_vector",
        "dimension": 1536
      }
    }
  }
}

Walkthrough

Build a data flow

In this section, we cover how we can build a data flow to extract text and metadata from PDFs, clean and process the data, generate embeddings using Amazon Bedrock, and index the data in Amazon OpenSearch.

Launch SageMaker Canvas

To launch SageMaker Canvas, complete the following steps:

  1. On the Amazon SageMaker Console, choose Domains in the navigation pane.
  2. Choose your domain.
  3. On the launch menu, choose Canvas.

Create a dataflow

Complete the following steps to create a data flow in SageMaker Canvas:

  1. On the SageMaker Canvas home page, choose Data preparation.
  2. Choose Create on the right side of the page, then give the data flow a name and select Create.
  3. This takes you to a data flow page.
  4. Choose Import data and select Tabular data.

Now let’s import the data from Amazon S3 bucket:

  1. Choose Import data and select Tabular from the drop-down list.
  2. For Data Source, select Amazon S3 from the drop-down list.
  3. Navigate to the metadata file with the PDF file locations, and choose the file.
  4. Now the metadata file is loaded into the data preparation data flow, and we can proceed to add the next steps to transform the data and index it into Amazon OpenSearch. In this case, the file has the following metadata, with the location of each file in an Amazon S3 directory.

To add a new transform, complete the following steps:

  1. Choose the plus sign and choose Add Transform.
  2. Choose Add Step and choose Custom Transform.
  3. You can create a custom transform using Pandas, PySpark, Python user-defined functions, and SQL PySpark. Choose Python (PySpark) for this use case.
  4. Enter a name for the step. From the example code snippets, browse and select extract text from pdf. Make necessary changes to code snippet and select Add.

  5. Let's add a step to redact personally identifiable information (PII) from the extracted data by leveraging Amazon Comprehend. Choose Add Step and choose Custom Transform. And select Python (PySpark).

From the example code snippets, browse and select mask PII. Make necessary changes to code snippet and select Add.

  1. The next step is to chunk the text content. Choose Add Step and choose Custom Transform. And select Python (PySpark).

From the example code snippets, browse and select Chunk text. Make necessary changes to code snippet and select Add.

  1. Let’s convert the text content to vector embeddings using the Amazon Bedrock Titan Embeddings model. Choose Add Step and choose Custom Transform. And select Python (PySpark).

From the example code snippets, browse and select Generate text embedding with Bedrock. Make necessary changes to code snippet and select Add.

  1. Now we have vector embeddings available for the PDF file contents. Let's go ahead and index the data into Amazon OpenSearch. Choose Add Step and choose Custom Transform. And select Python (PySpark). You're free to rewrite the following code to use your preferred vector database. For simplicity, we are using the master username and password to access the OpenSearch APIs; for production workloads, select an option according to your organization's policies.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType
    import json
    import requests
    
    # OpenSearch domain credentials; placeholders here, use AWS Secrets Manager for production workloads
    master_user = "<master-username>"
    master_pass = "<master-password>"
    
    headers = {"Content-Type": "application/json", "kbn-xsrf": "true", "osd-xsrf": "true", "security_tenant": "global"}
    index_name = 's3_vector_data_v1'
    
    
    def index_data(text_redacted_chunks, text_redacted_chunks_embedding):
        # Index the last chunk and its embedding as a document in the OpenSearch vector index
        input_json = json.dumps({"text_content": text_redacted_chunks[-1],
                                 "text_content_v": text_redacted_chunks_embedding[-1]})
        response = requests.request(method="POST",
                                    url=f'https://search-canvas-vector-db-domain-dt3yq3b4cykwuvc6t7rnkvmnka.us-west-2.es.amazonaws.com/{index_name}/_doc',
                                    headers=headers,
                                    data=input_json,
                                    auth=(master_user, master_pass),
                                    timeout=30)
        return response.text
    
    
    indexing_udf = udf(index_data, StringType())
    df = df.withColumn('index_response',
                       indexing_udf(col("text_redacted_chunks"), col("text_redacted_chunks_embedding")))

Finally, the dataflow created would be as follows:

With this dataflow, the data from the PDF file has been read and indexed with vector embeddings in Amazon OpenSearch. Now it's time for us to create a file with queries to query the indexed data and save it to an Amazon S3 location. We'll point our search data flow to that file and output the corresponding results to a new file in an Amazon S3 location.

Preparing a prompt

After we create a knowledge base out of our PDF, we can test it by searching the knowledge base for a few sample queries. We’ll process each query as follows:

  1. Generate an embedding for the query (powered by Amazon Bedrock).
  2. Query the vector database for the nearest-neighbor context (powered by Amazon OpenSearch Service).
  3. Combine the query and the context into the prompt.
  4. Query the LLM with the prompt (powered by Amazon Bedrock); a minimal sketch of this step follows the list.
  5. To build this search data flow, on the SageMaker Canvas home page, choose Data preparation.
  6. Choose Create on the right side of the page, then give the data flow a name and select Create.
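
For reference, the following is a hedged sketch of step 4 using an Anthropic Claude model on Amazon Bedrock; the model ID, prompt template, and inference parameters are assumptions, and the walkthrough itself uses the built-in Query Bedrock with context snippet for this step:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def answer_with_context(question, context):
    """Combine the retrieved context with the user question and query Claude on Bedrock."""
    prompt = (
        "\n\nHuman: Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\n\nAssistant:"
    )
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-v2",  # assumed model ID
        body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 512, "temperature": 0.2}),
    )
    return json.loads(response["body"].read())["completion"]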

Now let’s load the user questions and then create a prompt by combining the question and the similar documents. This prompt is provided to the LLM for generating an answer to the user question.

  1. Let's load a CSV file with user questions. Choose Import Data and select Tabular from the drop-down list.
  2. For Data Source, select Amazon S3 from the drop-down list. Alternatively, you can choose to upload a file with user queries.
  3. Let’s add a custom transformation to convert the data into vector embeddings, followed by searching related embeddings from Amazon OpenSearch, before sending a prompt to Amazon Bedrock with the query and context from knowledge base. To generate embeddings for the query, you can use the same example code snippet Generate text embedding with Bedrock mentioned in Step #7 above.

Let’s invoke the Amazon OpenSearch API to search relevant documents for the generated vector embeddings. Add a custom transform with Python (PySpark).

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType
import json
import requests

text_column = "Queries_embedding"
output_column = text_column + "_response"

# OpenSearch domain credentials; placeholders here, use AWS Secrets Manager for production workloads
master_user = "<master-username>"
master_pass = "<master-password>"

headers = {"Content-Type": "application/json", "kbn-xsrf": "true", "osd-xsrf": "true", "security_tenant": "global"}
index_name = 's3_vector_data_v1'

def search_data(text_column_embedding):
    # k-NN query against the vector field for the query embedding
    input_json = {'size': 20,
                  'query': {'knn': {'text_content_v': {'vector': text_column_embedding, 'k': 5}}},
                  'fields': ['text_content']}
    response = requests.request(method="GET",
                                url=f'https://search-canvas-vector-db-domain-dt3yq3b4cykwuvc6t7rnkvmnka.us-west-2.es.amazonaws.com/{index_name}/_search',
                                headers=headers,
                                json=input_json,
                                auth=(master_user, master_pass),
                                timeout=30)
    return response.text

search_udf = udf(search_data, StringType())
df = df.withColumn(output_column, search_udf(col(text_column)))

Let’s add a custom transform to call the Amazon Bedrock API for query response, passing the documents from the Amazon OpenSearch knowledge base. From the example code snippets, browse and select Query Bedrock with context. Make necessary changes to code snippet and select Add.

In summary, the RAG-based question answering data flow is as follows:

ML practitioners spend a lot of time crafting feature engineering code, applying it to their initial datasets, training models on the engineered datasets, and evaluating model accuracy. Given the experimental nature of this work, even the smallest project leads to multiple iterations. The same feature engineering code is often run again and again, wasting time and compute resources on repeating the same operations. In large organizations, this can cause an even greater loss of productivity because different teams often run identical jobs or even write duplicate feature engineering code because they have no knowledge of prior work. To avoid the reprocessing of features, we’ll export our data flow to an Amazon SageMaker pipeline. Let’s select the + button to the right of the query. Select export data flow and choose Run SageMaker Pipeline (via Jupyter notebook).

Cleaning up

To avoid incurring future charges, delete or shut down the resources you created while following this post. Refer to Logging out of Amazon SageMaker Canvas for more details.

Conclusion

In this post, we showed Amazon SageMaker Canvas's end-to-end capabilities by assuming the role of a data professional preparing data for an LLM. The interactive data preparation enabled quick cleaning, transforming, and analyzing of the data to engineer informative features. By removing coding complexities, SageMaker Canvas allowed rapid iteration to create a high-quality training dataset. This accelerated workflow led directly into building, training, and deploying a performant machine learning model for business impact. With its comprehensive data preparation and unified experience from data to insights, SageMaker Canvas empowers users to improve their ML outcomes.

We encourage you to learn more by exploring Amazon SageMaker Data Wrangler, Amazon SageMaker Canvas, Amazon Titan models, Amazon Bedrock, and Amazon OpenSearch Service to build a solution using the sample implementation provided in this post and a dataset relevant to your business. If you have questions or suggestions, then please leave a comment.


About the Authors

Ajjay Govindaram is a Senior Solutions Architect at AWS. He works with strategic customers who are using AI/ML to solve complex business problems. His experience lies in providing technical direction as well as design assistance for modest to large-scale AI/ML application deployments. His knowledge ranges from application architecture to big data, analytics, and machine learning. He enjoys listening to music while resting, experiencing the outdoors, and spending time with his loved ones.

Nikita Ivkin is a Senior Applied Scientist at Amazon SageMaker Data Wrangler with interests in machine learning and data cleaning algorithms.


Democratize ML on Salesforce Data Cloud with no-code Amazon SageMaker Canvas


This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI.

This is the third post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker.

In Part 1 and Part 2, we show how the Salesforce Data Cloud and Einstein Studio integration with SageMaker allows businesses to access their Salesforce data securely using SageMaker and use its tools to build, train, and deploy models to endpoints hosted on SageMaker. SageMaker endpoints can be registered to the Salesforce Data Cloud to activate predictions in Salesforce.

In this post, we demonstrate how business analysts and citizen data scientists can create machine learning (ML) models, without any code, in Amazon SageMaker Canvas and deploy trained models for integration with Salesforce Einstein Studio to create powerful business applications. SageMaker Canvas provides a no-code experience to access data from Salesforce Data Cloud and build, test, and deploy models using just a few clicks. SageMaker Canvas also enables you to understand your predictions using feature importance and SHAP values, making it straightforward for you to explain predictions made by ML models.

SageMaker Canvas

SageMaker Canvas enables business analysts and data science teams to build and use ML and generative AI models without having to write a single line of code. SageMaker Canvas provides a visual point-and-click interface to generate accurate ML predictions for classification, regression, forecasting, natural language processing (NLP), and computer vision (CV). In addition, you can access and evaluate foundation models (FMs) from Amazon Bedrock or public FMs from Amazon SageMaker JumpStart for content generation, text extraction, and text summarization to support generative AI solutions. SageMaker Canvas allows you to bring ML models built anywhere and generate predictions directly in SageMaker Canvas.

Salesforce Data Cloud and Einstein Studio

Salesforce Data Cloud is a data platform that provides businesses with real-time updates of their customer data from any touch point.

Einstein Studio is a gateway to AI tools on Salesforce Data Cloud. With Einstein Studio, admins and data scientists can effortlessly create models with a few clicks or using code. Einstein Studio’s bring your own model (BYOM) experience provides the capability to connect custom or generative AI models from external platforms such as SageMaker to Salesforce Data Cloud.

Solution overview

To demonstrate how you can build ML models using data in Salesforce Data Cloud using SageMaker Canvas, we create a predictive model to recommend a product. This model uses the features stored in Salesforce Data Cloud such as customer demographics, marketing engagements, and purchase history. The product recommendation model is built and deployed using the SageMaker Canvas no-code user interface using data in Salesforce Data Cloud.

We use the following sample dataset stored in Amazon Simple Storage Service (Amazon S3). To use this dataset in Salesforce Data Cloud, refer to Create Amazon S3 Data Stream in Data Cloud. The following attributes are needed to create the model:

  • Club Member – If the customer is a club member
  • Campaign – The campaign the customer is a part of
  • State – The state or province the customer resides in
  • Month – The month of purchase
  • Case Count – The number of cases raised by the customer
  • Case Type Return – Whether the customer returned any product within the last year
  • Case Type Shipment Damaged – Whether the customer had any shipments damaged in the last year
  • Engagement Score – The level of engagement the customer has (response to mailing campaigns, logins to the online store, and so on)
  • Tenure – The tenure of the customer relationship with the company
  • Clicks – The average number of clicks the customer has made within a week prior to purchase
  • Pages Visited – The average number of pages the customer visited within a week prior to purchase
  • Product Purchased – The actual product purchased

The following steps give an overview of how to use the Salesforce Data Cloud connector launched in SageMaker Canvas to access your enterprise data and build a predictive model:

  1. Configure the Salesforce connected app to register the SageMaker Canvas domain.
  2. Set up OAuth for Salesforce Data Cloud in SageMaker Canvas.
  3. Connect to Salesforce Data Cloud data using the built-in SageMaker Canvas Salesforce Data Cloud connector and import the dataset.
  4. Build and train models in SageMaker Canvas.
  5. Deploy the model in SageMaker Canvas and make predictions.
  6. Deploy an Amazon API Gateway endpoint as a front-end connection to the SageMaker inference endpoint.
  7. Register the API Gateway endpoint in Einstein Studio. For instructions, refer to Bring Your Own AI Models to Data Cloud.

The following diagram illustrates the solution architecture.

Prerequisites

Before you get started, complete the following prerequisite steps to create a SageMaker domain and enable SageMaker Canvas:

  1. Create an Amazon SageMaker Studio domain. For instructions, refer to Onboard to Amazon SageMaker Domain.
  2. Note down the domain ID and execution role that are created and will be used by your user profile. You add permissions to this role in subsequent steps.

The following screenshot shows the domain we created for this post.

  1. Next, go to the user profile and choose Edit.
  2. Navigate to the Amazon SageMaker Canvas settings section and select Enable Canvas base permissions.
  3. Select Enable direct deployments of Canvas models and Enable model registry permissions for all users.

This allows SageMaker Canvas to deploy models to endpoints on the SageMaker console. These settings can be configured at the domain or user profile level. User profile settings take precedence over domain settings.

Create or update the Salesforce connected app

Next, we create a Salesforce connected app to enable the OAuth flow from SageMaker Canvas to Salesforce Data Cloud. Complete the following steps:

  1. Log in to Salesforce and navigate to Setup.
  2. Search for App Manager and create a new connected app.
  3. Provide the following inputs:
    1. For Connected App Name, enter a name.
    2. For API Name, leave as default (it’s automatically populated).
    3. For Contact Email, enter your contact email address.
    4. Select Enable OAuth Settings.
    5. For Callback URL, enter https://<domain-id>.studio.<region>.sagemaker.aws/canvas/default/lab, and provide the domain ID and Region from your SageMaker domain.
  4. Configure the following scopes on your connected app:
    1. Manage user data via APIs (api).
    2. Perform requests at any time (refresh_token, offline_access).
    3. Perform ANSI SQL queries on Salesforce Data Cloud data (Data Cloud_query_api).
    4. Manage Data Cloud profile data (Data Cloud_profile_api).
    5. Access the identity URL service (id, profile, email, address, phone).
    6. Access unique user identifiers (openid).
  5. Set your connected app IP Relaxation setting to Relax IP restrictions.

Configure OAuth settings for the Salesforce Data Cloud connector

SageMaker Canvas uses AWS Secrets Manager to securely store connection information from the Salesforce connected app. SageMaker Canvas allows administrators to configure OAuth settings for an individual user profile or at the domain level. Note that you can add a secret to both a domain and user profile, but SageMaker Canvas looks for secrets in the user profile first.

To configure your OAuth settings, complete the following steps:

  1. Navigate to edit Domain or User Profile Settings in SageMaker Console.
  2. Choose Canvas Settings in the navigation pane.
  3. Under OAuth settings, for Data Source, choose Salesforce Data Cloud.
  4. For Secret setup, you can create a new secret or use an existing secret. For this example, we create a new secret and input the client ID and client secret from the Salesforce connected app.

For more details on enabling OAuth in SageMaker Canvas, refer to Set up OAuth for Salesforce Data Cloud.

This completes the setup to enable data access from Salesforce Data Cloud to SageMaker Canvas to build AI and ML models.

Import data from Salesforce Data Cloud

To import your data, complete the following steps:

  1. From the user profile you created with your SageMaker domain, choose Launch and select Canvas.

The first time you access your Canvas app, it will take about 10 minutes to create.

  1. Choose Data Wrangler in the navigation pane.
  2. On the Create menu, choose Tabular to create a tabular dataset.
  3. Name the dataset and choose Create.
  4. For Data Source, choose Salesforce Data Cloud and Add Connection to import the data lake object.

If you’ve previously configured a connection to Salesforce Data Cloud, you will see an option to use that connection instead of creating a new one.

  1. Provide a name for a new Salesforce Data Cloud connection and choose Add connection.

It will take a few minutes to complete.

  1. You will be redirected to the Salesforce login page to authorize the connection.

After the login is successful, the request will be redirected back to SageMaker Canvas with the data lake object listing.

  1. Select the dataset that contains the features for model training that was uploaded via Amazon S3.
  2. Drag and drop the file, then choose Edit in SQL.

Salesforce adds a "__c" suffix to all the Data Cloud object fields. As per the SageMaker Canvas naming convention, "__" is not allowed in field names.

  1. Edit the SQL to rename the columns and drop metadata that isn’t relevant for model training. Replace the table name with your object name.
    SELECT "state__c" as state, 
    "case_type_shipment_damaged__c" as case_type_shipment_damaged, 
    "campaign__c" as campaign, 
    "engagement_score__c" as engagement_score, 
    "case_count__c" as case_count, 
    "case_type_return__c" as case_type_return, 
    "club_member__c" as club_member, 
    "pages_visited__c" as pages_visited, 
    "product_purchased__c" as product_purchased, 
    "clicks__c" as clicks, 
    "tenure__c" as tenure, 
    "month__c" as month FROM product_recommendation__dlm;

  2. Choose Run SQL and then Create dataset.
  3. Select the dataset and choose Create a model.

  4. To create a model to predict a product recommendation, provide a model name, choose Predictive analysis for Problem type, and choose Create.

Build and train the model

Complete the following steps to build and train your model:

  1. After the model is launched, set the target column to product_purchased.

SageMaker Canvas displays key statistics and correlations of each column to the target column. SageMaker Canvas provides you with tools to preview your model and validate data before you begin building.

  1. Use the preview model feature to see the accuracy of your model and validate your dataset to prevent issues while building the model.
  2. After reviewing your data and making any changes to your dataset, choose your build type. The Quick build option may be faster, but it will only use a subset of your data to build a model. For the purpose of this post, we selected the Standard build option.

A standard build can take 2–4 hours to complete.

SageMaker Canvas automatically handles missing values in your dataset while it builds the model. It will also apply other data prep transformations for you to get the data ready for ML.

  1. After your model begins building, you can leave the page.

When the model shows as Ready on the My models page, it’s ready for analysis and predictions.

  1. After the model is built, navigate to My models, choose View to view the model you created, and choose the most recent version.
  2. Go to the Analyze tab to see the impact of each feature on the prediction.
  3. For additional information on the model’s predictions, navigate to the Scoring tab.
  4. Choose Predict to initiate a product prediction.

Deploy the model and make predictions

Complete the following steps to deploy your model and start making predictions:

  1. You can choose to make either batch or single predictions. For the purpose of this post, we choose Single prediction.

When you choose Single prediction, SageMaker Canvas displays the features that you can provide inputs for.

  1. You can change the values by choosing Update and view the real-time prediction.

The accuracy of the model as well as the impact of each feature for that specific prediction will be displayed.

  1. To deploy the model, provide a deployment name, select an instance type and instance count, and choose Deploy.

Model deployment will take a few minutes.

Model status is updated to In Service after the deployment is successful.

SageMaker Canvas provides an option to test the deployment.

  1. Choose View details.

The Details tab provides the model endpoint details. Instance type, count, input format, response content, and endpoint are some of the key details displayed.

  1. Choose Test deployment to test the deployed endpoint.

Similar to single prediction, the view displays the input features and provides an option to update and test the endpoint in real time.

The new prediction along with the endpoint invocation result is returned to the user.

Create an API to expose the SageMaker endpoint

To generate predictions that power business applications in Salesforce, you need to expose the SageMaker inference endpoint created by your SageMaker Canvas deployment via API Gateway and register it in Salesforce Einstein.

The request and response formats vary between Salesforce Einstein and SageMaker inference endpoint. You could either use API Gateway to perform the transformation or use AWS Lambda to transform the request and map the response. Refer to Call an Amazon SageMaker model endpoint using Amazon API Gateway and AWS Lambda to expose a SageMaker endpoint via Lambda and API Gateway.

The following code snippet is a Lambda function to transform the request and the response:

import json
import boto3
import os

# SageMaker runtime client and configuration
client = boto3.client("runtime.sagemaker")
endpoint = os.environ['SAGEMAKER_ENDPOINT_NAME']
prediction_label = 'product_purchased__c'

def lambda_handler(event, context):
    # Input sample: {"instances": [{"features": ["Washington", 1, "New Colors", 1, 1, 1, 1, 1, 1, 1, 1]}, {"features": ["California", 1, "Web", 100, 1, 1, 100, 1, 10, 1, 1]}]}
    features = []
    for instance in event["instances"]:
        features.append(','.join(map(str, instance["features"])))
    # One CSV row per instance, separated by newlines
    body = '\n'.join(features)
    response = client.invoke_endpoint(EndpointName=endpoint, ContentType="text/csv", Body=body, Accept="application/json")
    response = json.loads(response['Body'].read().decode('utf-8'))
    # Map the SageMaker response to the format expected by Einstein Studio
    prediction_response = {"predictions": []}
    for prediction in response.get('predictions'):
        prediction_response['predictions'].append({prediction_label: prediction['predicted_label']})
    return prediction_response

Update the endpoint and prediction_label values in the Lambda function based on your configuration.

  1. Add an environment variable SAGEMAKER_ENDPOINT_NAME to capture the SageMaker inference endpoint.
  2. Set the prediction label to match the model output JSON key that is registered in Einstein Studio.

The default timeout for a Lambda function is 3 seconds. Depending on the prediction request input size, the SageMaker real-time inference API may take more than 3 seconds to respond.

  1. Increase the Lambda function timeout but keep it below the API Gateway default integration timeout, which is 29 seconds.

Register the model in Salesforce Einstein Studio

To register the API Gateway endpoint in Einstein Studio, refer to Bring Your Own AI Models to Data Cloud.

Conclusion

In this post, we explained how you can use SageMaker Canvas to connect to Salesforce Data Cloud and generate predictions through automated ML features without writing a single line of code. We demonstrated the SageMaker Canvas model build capability to conduct an early preview of your model performance before running the standard build that trains the model with the full dataset. We also showcased post-model creation activities like using the single predictions interface within SageMaker Canvas and understanding your predictions using feature importance. Next, we used the SageMaker endpoint created in SageMaker Canvas and made it available as an API so you can integrate it with Salesforce Einstein Studio and create powerful Salesforce applications.

In an upcoming post, we will show you how to use data from Salesforce Data Cloud in SageMaker Canvas to make data insights and preparation even more straightforward by using a visual interface and simple natural language prompts.

To get started with SageMaker Canvas, see SageMaker Canvas immersion day and refer to Getting started with Amazon SageMaker Canvas.


About the authors

Daryl Martis is the Director of Product for Einstein Studio at Salesforce Data Cloud. He has over 10 years of experience in planning, building, launching, and managing world-class solutions for enterprise customers, including AI/ML and cloud solutions. He has previously worked in the financial services industry in New York City. Follow him on Linkedin.

Rachna Chadha is a Principal Solutions Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.

Ife Stewart is a Principal Solutions Architect in the Strategic ISV segment at AWS. She has been engaged with Salesforce Data Cloud over the last 2 years to help build integrated customer experiences across Salesforce and AWS. Ife has over 10 years of experience in technology. She is an advocate for diversity and inclusion in the technology field.

Ravi Bhattiprolu is a Sr. Partner Solutions Architect at AWS. Ravi works with strategic partners, Salesforce and Tableau, to deliver innovative and well-architected products and solutions that help joint customers realize their business objectives.

Miriam Lebowitz is a Solutions Architect in the Strategic ISV segment at AWS. She is engaged with teams across Salesforce, including Salesforce Data Cloud, and specializes in data analytics. Outside of work, she enjoys baking, traveling, and spending quality time with friends and family.


GPT-4’s potential in shaping the future of radiology


This research paper is being presented at the 2023 Conference on Empirical Methods in Natural Language Processing (opens in new tab) (EMNLP 2023), the premier conference on natural language processing and artificial intelligence.


In recent years, AI has been increasingly integrated into healthcare, bringing about new areas of focus and priority, such as diagnostics, treatment planning, and patient engagement. While AI's contribution in certain fields like image analysis and drug interaction is widely recognized, its potential in natural language tasks within these newer areas presents an intriguing research opportunity.

One notable advancement in this area involves GPT-4’s impressive performance (opens in new tab) on medical competency exams and benchmark datasets. GPT-4 has also demonstrated potential utility (opens in new tab) in medical consultations, providing a promising outlook for healthcare innovation.

Progressing radiology AI for real problems

Our paper, “Exploring the Boundaries of GPT-4 in Radiology (opens in new tab),” which we are presenting at EMNLP 2023 (opens in new tab), further explores GPT-4’s potential in healthcare, focusing on its abilities and limitations in radiology—a field that is crucial in disease diagnosis and treatment through imaging technologies like x-rays, computed tomography (CT) and magnetic resonance imaging (MRI). We collaborated with our colleagues at Nuance (opens in new tab), a Microsoft company, whose solution, PowerScribe, is used by more than 80 percent of US radiologists. Together, we aimed to better understand technology’s impact on radiologists’ workflow.

Our research included a comprehensive evaluation and error analysis framework to rigorously assess GPT-4’s ability to process radiology reports, including common language understanding and generation tasks in radiology, such as disease classification and findings summarization. This framework was developed in collaboration with a board-certified radiologist to tackle more intricate and challenging real-world scenarios in radiology and move beyond mere metric scores.

We also explored various effective zero-, few-shot, and chain-of-thought (CoT) prompting techniques for GPT-4 across different radiology tasks and experimented with approaches to improve the reliability of GPT-4 outputs. For each task, GPT-4 performance was benchmarked against prior GPT-3.5 models and respective state-of-the-art radiology models. 

We found that GPT-4 demonstrates new state-of-the-art performance in some tasks, achieving about a 10-percent absolute improvement over existing models, as shown in Table 1. Surprisingly, we found radiology report summaries generated by GPT-4 to be comparable and, in some cases, even preferred over those written by experienced radiologists, with one example illustrated in Table 2.

Table 1: Results overview. GPT-4 either outperforms or is on par with previous state-of-the-art (SOTA) multimodal LLMs.
Table 2. Examples where GPT-4 findings summaries are favored over existing manually written ones on the Open-i dataset. In both examples, GPT-4 outputs are more faithful and provide more complete details on the findings.

Another encouraging prospect for GPT-4 is its ability to automatically structure radiology reports, as schematically illustrated in Figure 1. These reports, which are based on a radiologist's interpretation of medical images like x-rays and include the patient's clinical history, are often complex and unstructured, making them difficult to interpret. Research shows that structuring these reports can improve standardization and consistency in disease descriptions, making them easier to interpret by other healthcare providers and more easily searchable for research and quality improvement initiatives. Additionally, using GPT-4 to structure and standardize radiology reports can further support efforts to augment real-world data (RWD) and its use for real-world evidence (RWE). This can complement more robust and comprehensive clinical trials and, in turn, accelerate the application of research findings into clinical practice.

Figure 1. Radiology report findings are input into GPT-4, which structures the findings into a knowledge graph and performs tasks such as disease classification, disease progression classification, or impression generation.

Beyond radiology, GPT-4’s potential extends to translating medical reports into more empathetic (opens in new tab) and understandable formats for patients and other health professionals. This innovation could revolutionize patient engagement and education, making it easier for them and their carers to actively participate in their healthcare.



A promising path toward advancing radiology and beyond

When used with human oversight, GPT-4 also has the potential to transform radiology by assisting professionals in their day-to-day tasks. As we continue to explore this cutting-edge technology, there is great promise in improving our evaluation results of GPT-4 by investigating how it can be verified more thoroughly and finding ways to improve its accuracy and reliability. 

Our research highlights GPT-4’s potential in advancing radiology and other medical specialties, and while our results are encouraging, they require further validation through extensive research and clinical trials. Nonetheless, the emergence of GPT-4 heralds an exciting future for radiology. It will take the entire medical community working alongside other stakeholders in technology and policy to determine the appropriate use of these tools and responsibly realize the opportunity to transform healthcare. We eagerly anticipate its transformative impact towards improving patient care and safety.

Learn more about this work by visiting the Project MAIRA (opens in new tab) (Multimodal AI for Radiology Applications) page.

Acknowledgements 

We'd like to thank our coauthors: Qianchu Liu, Stephanie Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Maria Teodora Wetscherek, Robert Tinn, Harshita Sharma, Fernando Perez-Garcia, Anton Schwaighofer, Pranav Rajpurkar, Sameer Tajdin Khanna, Hoifung Poon, Naoto Usuyama, Anja Thieme, and Aditya V. Nori, Ozan Oktay.



AWS AI services enhanced with FM-powered capabilities

AWS AI services enhanced with FM-powered capabilities

Artificial intelligence (AI) continues to transform how we do business and serve our customers. AWS offers a range of pre-trained AI services that provide ready-to-use intelligence for your applications. In this post, we explore the new AI service capabilities and how they are enhanced using foundation models (FMs).

We focus on the following major updates in this post across key AI services:

  • Amazon Transcribe now offers FM-powered language support across over 100 languages to unlock rich insights.
  • Amazon Transcribe Call Analytics now offers a new generative AI-powered summarization capability (in preview) that automates post-call summarization to improve contact center agent and manager productivity.
  • Amazon Personalize now uses an FM to generate more compelling content and product recommendations.
  • Amazon Lex now uses large language models (LLMs) to provide accurate and conversational responses to FAQs (in preview), going beyond task-oriented dialogue.

Amazon Transcribe expands language support and supercharges customer service productivity using FMs

In order to build global and inclusive speech-enabled applications that cater to users from diverse linguistic backgrounds, customers seek a truly global AI service that can understand and transcribe a wide array of languages with high accuracy. To help you scale globally, Amazon Transcribe now offers a speech FM-powered automatic speech recognition (ASR) system that expands support to over 100 languages.

FM-powered Amazon Transcribe delivers significant accuracy improvements of between 20% and 50% across most languages. Apart from accuracy improvements, the new ASR system delivers several differentiating features across all supported languages (over 100) related to ease of use, customization, user safety, and privacy, such as automatic punctuation, custom vocabulary, automatic language identification, speaker diarization, word-level confidence scores, and custom vocabulary filters. Enabled by the high accuracy of Amazon Transcribe across different accents and noise conditions, its support for a large number of languages, and its breadth of value-added features, thousands of enterprises will be empowered to unlock rich insights from their audio content and to increase the accessibility and discoverability of their audio and video content across various domains. All existing and new customers using Amazon Transcribe can experience the performance improvements out of the box, without any API changes.

Carbyne is a software company that develops cloud-based, mission-critical contact center solutions for emergency call responders. Carbyne’s mission is to help emergency responders save lives, and language cannot come in the way of their goals.

“AI-powered Carbyne Live Audio Translation is directly aimed at helping improve emergency response for the 68 million Americans who speak a language other than English at home, in addition to the up to 79 million foreign visitors to the country annually. By leveraging Amazon Transcribe’s new multilingual foundation model powered ASR, Carbyne will be even better equipped to democratize life-saving emergency services, because Every. Person. Counts.”

– Alex Dizengof, Co-Founder and CTO of Carbyne.

In a contact center, agents spend precious time after each call manually summarizing notes, which can impact their productivity and increase call wait times. Managers who have limited time to investigate calls and agent performance spend a significant amount of time listening to call recordings or reading entire transcripts while investigating caller issues. Amazon Transcribe Call Analytics now offers generative call summarization, a generative AI-powered capability that can automatically condense the entire interaction into a concise summary. For example, the following is a sample summary of a 10-minute phone call: “Customer reported that they didn’t receive their order even after 10 days from expected delivery date. The agent offered the customer a free replacement and $10 credit for future purchases. The agent will follow up with the customer in 2 days to confirm the receipt of the replacement order.”

This capability allows agents to spend more time talking to callers waiting in the queue rather than engaging in after-call work, thereby improving customer experience. Managers can review the call summary to quickly understand the context of an interaction without reading the whole transcript.

“With the AWS post call analytics solution, Principal can currently conduct large-scale historical analytics to understand where customer experiences can be improved, generate actionable insights, and prioritize where to act. We look forward to exploring the post call summarization feature using generative AI in Amazon Transcribe Call Analytics in order to enable our agents to focus their time and resources engaging with customers, rather than manual after-contact work.”

– Miguel Antonio Sanchez, Regional Chief Data Officer, Principal Financial Group.

The following screenshots illustrate how to enable generative call summarization on the Amazon Transcribe console, and an example of a summarized transcript.
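
If you prefer to enable generative call summarization programmatically rather than on the console, the following is a minimal boto3 sketch using the StartCallAnalyticsJob API. The bucket, role, and job names are illustrative assumptions, and the Summarization settings block reflects our understanding of the preview capability, so confirm the exact parameter shape against the Amazon Transcribe Call Analytics documentation.

import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

# Start a Call Analytics job with generative call summarization enabled.
# The Summarization settings block is an assumption about how the preview is exposed.
transcribe.start_call_analytics_job(
    CallAnalyticsJobName="example-call-with-summary",  # hypothetical job name
    Media={"MediaFileUri": "s3://example-bucket/calls/call-001.wav"},  # hypothetical S3 URI
    DataAccessRoleArn="arn:aws:iam::111122223333:role/TranscribeCallAnalyticsRole",  # hypothetical role
    ChannelDefinitions=[
        {"ChannelId": 0, "ParticipantRole": "AGENT"},
        {"ChannelId": 1, "ParticipantRole": "CUSTOMER"},
    ],
    Settings={"Summarization": {"GenerateAbstractiveSummary": True}},
)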

Amazon Personalize enables hyper-personalization with FMs

Customers across industries such as retail and media and entertainment are increasingly looking to make content and recommended products more tailored to user interests in order to drive higher engagement. For instance, on streaming platforms, users see the standard “Because you watched” recommendations, and on ecommerce websites, “frequently bought together” is used as a generic tagline. To offer more personalized browsing experiences with titles such as “Rise and Shine” and “Love, laughter, and hijinks,” companies need to allocate resources to generate compelling taglines manually. This is tedious and time-consuming.

To help address this challenge, Amazon Personalize now offers the Content Generator—a new FM-powered capability that uses natural language to craft simple and engaging text that describes the thematic connections between recommended items. This enables companies to automatically generate engaging titles or email subject lines, to invite customers to click on videos or purchase items.

In addition, Amazon Personalize now offers Personalize on LangChain to power the journey of customers who want to build their own FM-based applications. With this integration, you can invoke Amazon Personalize, retrieve recommendations for a campaign or recommender, and seamlessly feed it into your FM-powered applications within the LangChain ecosystem.

“We are integrating generative AI with Amazon Personalize in order to deliver hyper-personalized experiences to our users. Amazon Personalize has helped us achieve high levels of automation in content customization. For instance, FOX Sports experienced a 400% increase in viewership content starts post-event when applied. Now, we are augmenting generative AI with Amazon Bedrock to our pipeline in order to help our content editors generate themed collections. We look forward to exploring features such as Amazon Personalize Content Generator and Personalize on Langchain to further personalize those collections for our users.”

– Daryl Bowden, Executive Vice President, Technology, Fox Corporation.

Amazon Lex offers FM-powered capabilities to build bots faster and improve containment

Driven by rising consumer demand for automated self-service, companies are prioritizing investments in conversational AI to optimize customer experience. To that end, AWS recently previewed Conversational FAQ (CFAQ), a new capability from Amazon Lex that answers frequently asked customer questions intelligently and at scale. Powered by FMs from Amazon Bedrock and approved knowledge sources, CFAQ enables companies to provide accurate, automated responses to common customer inquiries in a natural and engaging way. With this innovation, brands can deliver seamless self-service experiences that strengthen customer satisfaction and loyalty.

CFAQ simplifies bot development by eliminating the need to manually create intents, sample utterances, slots, and prompts to handle a wide range of frequently asked questions. It does so with a new intent type called QnAIntent that securely connects to knowledge sources like Amazon Bedrock, Amazon OpenSearch Service, and Amazon Kendra knowledge bases to retrieve the most relevant information to answer a question. Developers maintain control over response content, with the option to summarize retrieved information or use the authorized text as is. This allows highly regulated industries like financial services and healthcare to use CFAQ, ensuring that responses use only compliant language. By streamlining access to relevant knowledge, CFAQ reduces the effort to build bots that handle common customer questions naturally and accurately.

Conclusion

AWS is constantly innovating on behalf of our customers. The latest set of advancements in AI services allow us to deliver more impactful capabilities that help organizations work smarter and provide personalized and intuitive experiences. To learn more about these launches, refer to the following:


About the author

Bratin Saha is the Vice President of Artificial Intelligence and Machine Learning at AWS.

Read More

Elevate your self-service assistants with new generative AI features in Amazon Lex

Elevate your self-service assistants with new generative AI features in Amazon Lex

In this post, we talk about how generative AI is changing the conversational AI industry by providing new customer and bot builder experiences, and the new features in Amazon Lex that take advantage of these advances.

As the demand for conversational AI continues to grow, developers are seeking ways to enhance their chatbots with human-like interactions and advanced capabilities such as FAQ handling. Recent breakthroughs in generative AI are leading to significant improvements in natural language understanding that make conversational systems more intelligent. By training large neural network models on datasets with trillions of tokens, AI researchers have developed techniques that allow bots to understand more complex questions, provide nuanced and more natural human-sounding responses, and handle a wide range of topics. With these new generative AI innovations, you can create virtual assistants that feel more natural, intuitive, and helpful during text- or voice-based self-service interactions. The rapid progress in generative AI is bringing automated chatbots and virtual assistants significantly closer to the goal of having truly intelligent, free-flowing conversations. With further advances in deep learning and neural network techniques, conversational systems are poised to become even more flexible, relatable, and human-like. This new generation of AI-powered assistants can provide seamless self-service experiences across a multitude of use cases.

How Amazon Bedrock is changing the landscape of conversational AI

Amazon Bedrock is a user-friendly way to build and scale generative AI applications with foundation models (FMs). Amazon Bedrock offers an array of FMs from leading providers, so AWS customers have flexibility and choice to use the best models for their specific use case.

In today’s fast-paced world, we expect quick and efficient customer service from every business. However, providing excellent customer service can be significantly challenging when the volume of inquiries outpaces the human resources employed to address them. Businesses can overcome this challenge efficiently while also providing personalized customer service by taking advantage of advancements in generative AI powered by large language models (LLMs).

Over the years, AWS has invested in democratizing access to—and amplifying the understanding of—AI, machine learning (ML), and generative AI. LLMs can be highly useful in contact centers by providing automated responses to frequently asked questions, analyzing customer sentiment and intents to route calls appropriately, generating summaries of conversations to help agents, and even automatically generating emails or chat responses to common customer inquiries. By handling repetitive tasks and gaining insights from conversations, LLMs allow contact center agents to focus on delivering higher value through personalized service and resolving complex issues.

Improving the customer experience with conversational FAQs

Generative AI has tremendous potential to provide quick, reliable answers to commonly asked customer questions in a conversational manner. With access to authorized knowledge sources and LLMs, your existing Amazon Lex bot can provide helpful, natural, and accurate responses to FAQs, going beyond task-oriented dialogue. Our Retrieval Augmented Generation (RAG) approach allows Amazon Lex to harness both the breadth of knowledge available in repositories as well as the fluency of LLMs. You can simply ask your question in free-form, conversational language, and receive a natural, tailored response within seconds. The new conversational FAQ feature in Amazon Lex allows bot developers and conversation designers to focus on defining business logic rather than designing exhaustive FAQ-based conversation flows within a bot.

We are introducing a built-in QnAIntent that uses an LLM to query an authorized knowledge source and provide a meaningful and contextual response. In addition, developers can configure the QnAIntent to point to specific knowledge base sections, ensuring that only specific portions of the knowledge content are queried at runtime to fulfill user requests. This capability fulfills the need for highly regulated industries, such as financial services and healthcare, to only provide responses in compliant language. The conversational FAQ feature in Amazon Lex allows organizations to improve containment rates while avoiding the high costs of missed queries and human representative transfers.

Building an Amazon Lex bot using the descriptive bot builder

Building conversational bots from scratch is a time-consuming process that requires deep knowledge of how users interact with bots in order to anticipate potential requests and code appropriate responses. Today, conversation designers and developers spend many days writing code to handle all possible user actions (intents), the various ways users phrase their requests (utterances), and the information needed from the user to complete those actions (slots).

The new descriptive bot building feature in Amazon Lex uses generative AI to accelerate the bot building process. Instead of writing code, conversation designers and bot developers can now describe in plain English what they want the bot to accomplish (for example, “Take reservations for my hotel using name and contact info, travel dates, room type, and payment info”). Using only this simple prompt, Amazon Lex will automatically generate intents, training utterances, slots, prompts, and a conversational flow to bring the described bot to life. By providing a baseline bot design, this feature immensely reduces the time and complexity of building conversational chatbots, allowing the builder to reprioritize effort on fine-tuning the conversational experience.

By tapping into the power of generative AI with LLMs, Amazon Lex enables developers and non-technical users to build bots simply by describing their goal. Rather than meticulously coding intents, utterances, slots, and so on, developers can provide a natural language prompt and Amazon Lex will automatically generate a basic bot flow ready for further refinement. This capability is initially only available in English, but developers can further customize the AI-generated bot as needed before deployment, saving many hours of manual development work.

Improving the user experience with assisted slot resolution

As consumers become more familiar with chatbots and interactive voice response (IVR) systems, they expect higher levels of intelligence baked into self-service experiences. Disambiguating conversational responses is imperative to success, because users expect more natural, human-like experiences. With rising consumer confidence in chatbot capabilities, there is also an expectation of elevated performance from natural language understanding (NLU). In the likely scenario that a semantically simple or complex utterance is not resolved properly to a slot, user confidence can dwindle. In such instances, an LLM can dynamically assist the existing Amazon Lex NLU model and ensure accurate slot resolution even when the user utterance is beyond the bounds of the slot model. In Amazon Lex, the assisted slot resolution feature provides bot developers with yet another tool to increase containment.

During runtime, when the NLU fails to resolve a slot during a conversational turn, Amazon Lex will call the LLM selected by the bot developer to assist with resolving the slot. If the LLM is able to provide a value upon slot retry, the user can continue with the conversation as normal. For example, if upon slot retry, a bot asks “What city does the policy holder reside in?” and the user responds “I live in Springfield,” the LLM will be able to resolve the value to “Springfield.” The supported slot types for this feature include AMAZON.City, AMAZON.Country, AMAZON.Number, AMAZON.Date, AMAZON.AlphaNumeric (without regex), AMAZON.PhoneNumber, and AMAZON.Confirmation. This feature is only available in English at the time of writing.

Improving the builder experience with training utterance generation

One of the pain points that bot builders and conversational designers often encounter is anticipating the variation and diversity of responses when invoking an intent or soliciting slot information. When a bot developer creates a new intent, sample utterances must be provided to train the ML model on the types of responses it can and should accept. It can often be difficult to anticipate the permutations of verbiage and syntax used by customers. With utterance generation, Amazon Lex uses foundation models such as Amazon Titan to generate training utterances with just one click, without the need for any prompt engineering.

Utterance generation uses the intent name, existing utterances, and optionally the intent description to generate new utterances with an LLM. Bot developers and conversational designers can edit or delete the generated utterances before accepting them. This feature works with both new and existing intents.

Conclusion

Recent advancements in generative AI have undoubtedly made automated consumer experiences better. With Amazon Lex, we are committed to infusing generative AI into every aspect of the builder and user experience. The features mentioned in this post are just the beginning—and we can’t wait to show you what is to come.

To learn more, refer to Amazon Lex Documentation, and try these features out on the Amazon Lex console.


About the authors

Anuradha Durfee is a Senior Product Manager on the Amazon Lex team and has more than 7 years of experience in conversational AI. She is fascinated by voice user interfaces and making technology more accessible through intuitive design.

Sandeep Srinivasan is a Senior Product Manager on the Amazon Lex team. As a keen observer of human behavior, he is passionate about customer experience. He spends his waking hours at the intersection of people, technology, and the future.

Read More

Amazon Transcribe announces a new speech foundation model-powered ASR system that expands support to over 100 languages

Amazon Transcribe announces a new speech foundation model-powered ASR system that expands support to over 100 languages

Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that makes it straightforward for you to add speech-to-text capabilities to your applications. Today, we are happy to announce a next-generation multi-billion parameter speech foundation model-powered system that expands automatic speech recognition to over 100 languages. In this post, we discuss some of the benefits of this system, how companies are using it, and how to get started. We also provide an example of the transcription output below.

Transcribe’s speech foundation model is trained using best-in-class, self-supervised algorithms to learn the inherent universal patterns of human speech across languages and accents. It is trained on millions of hours of unlabeled audio data from over 100 languages. The training recipes are optimized through smart data sampling to balance the training data between languages, ensuring that traditionally under-represented languages also reach high accuracy levels.

Carbyne is a software company that develops cloud-based, mission-critical contact center solutions for emergency call responders. Carbyne’s mission is to help emergency responders save lives, and language can’t get in the way of their goals. Here is how they use Amazon Transcribe to pursue their mission:

“AI-powered Carbyne Live Audio Translation is directly aimed at helping improve emergency response for the 68 million Americans who speak a language other than English at home, in addition to the up to 79 million foreign visitors to the country annually. By leveraging Amazon Transcribe’s new multilingual foundation model powered ASR, Carbyne will be even better equipped to democratize life-saving emergency services, because Every. Person. Counts.”

– Alex Dizengof, Co-Founder and CTO of Carbyne.

By leveraging the speech foundation model, Amazon Transcribe delivers significant accuracy improvements of between 20% and 50% across most languages. On telephony speech, which is a challenging and data-scarce domain, the accuracy improvement is between 30% and 70%. In addition to the substantial accuracy improvement, this large ASR model also delivers improvements in readability with more accurate punctuation and capitalization. With the advent of generative AI, thousands of enterprises are using Amazon Transcribe to unlock rich insights from their audio content. With significantly improved accuracy and support for over 100 languages, Amazon Transcribe will positively impact all such use cases. All existing and new customers using Amazon Transcribe in batch mode can access speech foundation model-powered speech recognition without needing any change to either the API endpoint or input parameters.

The new ASR system delivers several key features across all the 100+ languages related to ease of use, customization, user safety, and privacy. These include features such as automatic punctuation, custom vocabulary, automatic language identification, speaker diarization, word-level confidence scores, and custom vocabulary filter. The system’s expanded support for different accents, noise environments, and acoustic conditions enables you to produce more accurate outputs and thereby helps you effectively embed voice technologies in your applications.

Enabled by the high accuracy of Amazon Transcribe across different accents and noise conditions, its support for a large number of languages, and its breadth of value-added feature sets, thousands of enterprises will be empowered to unlock rich insights from their audio content, as well as increase the accessibility and discoverability of their audio and video content across various domains. For instance, contact centers transcribe and analyze customer calls to identify insights and subsequently improve customer experience and agent productivity. Content producers and media distributors automatically generate subtitles using Amazon Transcribe to improve content accessibility.

Get started with Amazon Transcribe

You can use the AWS Command Line Interface (AWS CLI), AWS Management Console, and various AWS SDKs for batch transcriptions and continue to use the same StartTranscriptionJob API to get performance benefits from the enhanced ASR model without needing to make any code or parameter changes on your end. For more information about using the AWS CLI and the console, refer to Transcribing with the AWS CLI and Transcribing with the AWS Management Console, respectively.

The first step is to upload your media files into an Amazon Simple Storage Service (Amazon S3) bucket, an object storage service built to store and retrieve any amount of data from anywhere. Amazon S3 offers industry-leading durability, availability, performance, security, and virtually unlimited scalability at very low cost. You can choose to save your transcript in your own S3 bucket, or have Amazon Transcribe use a secure default bucket. To learn more about using S3 buckets, see Creating, configuring, and working with Amazon S3 buckets.
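
As a concrete starting point, the following is a minimal boto3 sketch of the same StartTranscriptionJob flow; the bucket, job name, and region are illustrative assumptions.

import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

# Start a batch transcription job against a media file already uploaded to Amazon S3.
# No new parameters are needed to benefit from the speech foundation model.
transcribe.start_transcription_job(
    TranscriptionJobName="example-multilingual-job",  # hypothetical job name
    Media={"MediaFileUri": "s3://example-bucket/audio/interview.mp3"},  # hypothetical S3 URI
    IdentifyLanguage=True,  # let Amazon Transcribe detect the spoken language
    OutputBucketName="example-bucket",  # optional: write the transcript to your own bucket
)

# Check job status (simplified; production code should poll with backoff or use EventBridge)
status = transcribe.get_transcription_job(TranscriptionJobName="example-multilingual-job")
print(status["TranscriptionJob"]["TranscriptionJobStatus"])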

Transcription output

Amazon Transcribe uses JSON representation for its output. It provides the transcription result in two different formats: text format and itemized format. Nothing changes with respect to the API endpoint or input parameters.

The text format provides the transcript as a block of text, whereas the itemized format provides the transcript as chronologically ordered transcribed items, along with additional metadata per item. Both formats exist in parallel in the output file.

Depending on the features you select when creating the transcription job, Amazon Transcribe creates additional and enriched views of the transcription result. See the following example code:

{
    "jobName": "2x-speakers_2x-channels",
    "accountId": "************",
    "results": {
        "transcripts": [
            {
                "transcript": "Hi, welcome."
            }
        ],
        "speaker_labels": [
            {
                "channel_label": "ch_0",
                "speakers": 2,
                "segments": []
            },
            {
                "channel_label": "ch_1",
                "speakers": 2,
                "segments": []
            }
        ],
        "channel_labels": {
            "channels": [],
            "number_of_channels": 2
        },
        "items": [],
        "segments": []
    },
    "status": "COMPLETED"
}

The views are as follows (a short parsing sketch follows the list):

  • Transcripts – Represented by the transcripts element, it contains only the text format of the transcript. In multi-speaker, multi-channel scenarios, concatenation of all transcripts is provided as a single block.
  • Speakers – Represented by the speaker_labels element, it contains the text and itemized formats of the transcript grouped by speaker. It’s available only when the multi-speakers feature is enabled.
  • Channels – Represented by the channel_labels element, it contains the text and itemized formats of the transcript, grouped by channel. It’s available only when the multi-channels feature is enabled.
  • Items – Represented by the items element, it contains only the itemized format of the transcript. In multi-speaker, multi-channel scenarios, items are enriched with additional properties, indicating speaker and channel.
  • Segments – Represented by the segments element, it contains the text and itemized formats of the transcript, grouped by alternative transcription. It’s available only when the alternative results feature is enabled.
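
To make these views concrete, the following is a minimal parsing sketch. It assumes the transcription output file has been downloaded locally and follows the example structure shown above; optional views are only present when the corresponding feature was enabled.

import json

# Load a local copy of the transcription output (hypothetical file name)
with open("transcript.json") as f:
    results = json.load(f)["results"]

# Transcripts view: the full transcript as a single block of text
for t in results["transcripts"]:
    print(t["transcript"])

# Speakers view: per-channel speaker counts and segments (only if diarization was enabled)
for channel in results.get("speaker_labels", []):
    print(channel["channel_label"], "speakers:", channel["speakers"])

# Items view: time-ordered items with per-item metadata
for item in results.get("items", []):
    print(item)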

Conclusion

At AWS, we are constantly innovating on behalf of our customers. By extending the language support in Amazon Transcribe to over 100 languages, we enable our customers to serve users from diverse linguistic backgrounds. This not only enhances accessibility, but also opens up new avenues for communication and information exchange on a global scale. To learn more about the features discussed in this post, check out the Amazon Transcribe features page and the What’s New post.


About the authors

Sumit Kumar is a Principal Product Manager, Technical on the AWS AI Language Services team. He has 10 years of product management experience across a variety of domains and is passionate about AI/ML. Outside of work, Sumit loves to travel and enjoys playing cricket and lawn tennis.

Vivek Singh is a Senior Manager, Product Management on the AWS AI Language Services team. He leads the Amazon Transcribe product team. Prior to joining AWS, he held product management roles across various other Amazon organizations such as consumer payments and retail. Vivek lives in Seattle, WA, and enjoys running and hiking.

Read More

Drive hyper-personalized customer experiences with Amazon Personalize and generative AI

Drive hyper-personalized customer experiences with Amazon Personalize and generative AI

Today, we are excited to announce three launches that will help you enhance personalized customer experiences using Amazon Personalize and generative AI. Whether you’re looking for a managed solution or build your own, you can use these new capabilities to power your journey.

Amazon Personalize is a fully managed machine learning (ML) service that makes it easy for developers to deliver personalized experiences to their users. It enables you to improve customer engagement by powering personalized product and content recommendations in websites, applications, and targeted marketing campaigns, with no ML expertise required. Using recipes (algorithms prepared for specific use cases) provided by Amazon Personalize, you can offer diverse personalization experiences like “recommend for you”, “frequently bought together”, guidance on next best actions, and targeted marketing campaigns with user segmentation.

Generative AI is quickly transforming how enterprises do business. Gartner predicts that “by 2026, more than 80% of enterprises will have used generative AI APIs or models, or deployed generative AI-enabled applications in production environments, up from less than 5% in 2023.” While generative AI can quickly create content, it alone is not enough to provide a higher degree of personalization that adapts to the ever-changing and nuanced preferences of individual users. Many companies are actively seeking solutions to enhance the user experience using Amazon Personalize and generative AI.

FOX Corporation (FOX) produces and distributes news, sports, and entertainment content.

“We are integrating generative AI with Amazon Personalize in order to deliver hyper-personalized experiences to our users. Amazon Personalize has helped us achieve high levels of automation in content customization. For instance, FOX Sports experienced a 400% increase in viewership content starts post-event when applied. Now, we are augmenting generative AI with Amazon Bedrock to our pipeline in order to help our content editors generate themed collections. We look forward to exploring features such as Amazon Personalize Content Generator and Personalize on LangChain to further personalize those collections for our users.”

– Daryl Bowden, Executive Vice President of Technology Platforms.

Announcing Amazon Personalize Content Generator to make recommendations more compelling

Amazon Personalize has launched Content Generator, a new generative AI-powered capability that helps companies make recommendations more compelling by identifying thematic connections between the recommended items. This capability can elevate the recommendation experience beyond standard phrases like “People who bought this also bought…” to more engaging taglines such as “Rise and Shine” for a breakfast food collection, enticing users to click and purchase.

To explore the impact of Amazon Personalize Content Generator in detail, let’s look at two examples.

Use case 1: Carousel titles for movie collections

A micro-genre is a specialized subcategory within a broader genre of film, music, or other forms of media. Streaming platforms use micro-genres to enhance user experience by allowing viewers or listeners to discover content that aligns with their specific tastes and interests. By recommending media content with micro-genres, streaming platforms cater to diverse preferences, ultimately increasing user engagement and satisfaction.

Now you can use Amazon Personalize Content Generator to create carousel titles for micro-genre collections. First, import your datasets of users’ interactions and items into Amazon Personalize for training. You upload a list of itemId values as your seed items. Next, create a batch inference job selecting Themed recommendations with Content Generator on the Amazon Personalize console or setting batch-inference-job-mode to THEME_GENERATION in the API configuration.
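
If you are using the API, the following is a minimal boto3 sketch of creating the batch inference job in theme generation mode. The ARNs, S3 paths, and the column name passed to fieldsForThemeGeneration are illustrative assumptions; adjust them to match your datasets and roles.

import boto3

personalize = boto3.client("personalize", region_name="us-west-2")

# Create a batch inference job that generates themes alongside similar-item recommendations
personalize.create_batch_inference_job(
    jobName="movie-collections-with-themes",  # hypothetical job name
    solutionVersionArn="arn:aws:personalize:us-west-2:111122223333:solution/movies/<version>",  # hypothetical ARN
    roleArn="arn:aws:iam::111122223333:role/PersonalizeBatchRole",  # hypothetical role
    jobInput={"s3DataSource": {"path": "s3://example-bucket/batch/seed-items.json"}},
    jobOutput={"s3DataDestination": {"path": "s3://example-bucket/batch/output/"}},
    batchInferenceJobMode="THEME_GENERATION",
    themeGenerationConfig={
        "fieldsForThemeGeneration": {"itemName": "TITLE"}  # item metadata column used to describe items
    },
)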

As the batch inference output, you will get a set of similar items and a theme for each seed item. We also provide item-theme relevance scores that you can use to set a threshold and show only items that are strongly related to the theme. The following is an example of the output:

{"input":{"itemId":"40"},"output":{
"recommendedItems":["36","50","44","22","21","29","3","1","2","39"],
"theme":"Movies with a strong female lead",
"itemsThemeRelevanceScores":[0.19994527,0.183059963,0.17478035,0.1618133,0.1574806,0.15468733,0.1499242,0.14353688,0.13531424,0.10291852]}}

{"input":{"itemId":"43"},"output":{
"recommendedItems":["50","21","36","3","17","2","39","1","10","5"],
"theme":"Romantic movies for a cozy night in",
"itemsThemeRelevanceScores":[0.184988,0.1795761,0.11143453,0.0989443,0.08258403,0.07952615,0.07115086,0.0621634,-0.138913,-0.188913]}}
...

Subsequently, you can replace the generic phrase “More like X” with the theme generated by Amazon Personalize Content Generator to make the recommendations more compelling.
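
Because the batch output is delivered as JSON Lines (as shown above), applying the relevance-score threshold is straightforward. The following is a minimal sketch that assumes the output file has been downloaded locally; the file name and threshold value are illustrative.

import json

THRESHOLD = 0.15  # illustrative cutoff; tune it for your catalog

# Read the batch inference output and keep only items strongly related to each theme
with open("batch-output.json.out") as f:  # hypothetical local copy of the job output
    for line in f:
        record = json.loads(line)
        theme = record["output"]["theme"]
        items = record["output"]["recommendedItems"]
        scores = record["output"]["itemsThemeRelevanceScores"]
        on_theme = [item for item, score in zip(items, scores) if score >= THRESHOLD]
        print(theme, on_theme)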

Use case 2: Subject lines for marketing emails

Email marketing, although cost-effective, often struggles with low open rates and high unsubscribe rates. The decision to open an email critically depends on how attractive the subject line is, because it’s the first thing recipients see along with the sender’s name. However, scripting appealing subject lines can often be tedious and time-consuming.

Now with Amazon Personalize Content Generator, you can create compelling subject lines or headlines in the email body more efficiently, further personalizing your email campaigns. You follow the same process of data ingestion, training, and creating a batch inference job as in the previous use case. The following is an example of a marketing email that incorporates output from Amazon Personalize using Content Generator, including a set of recommended items and a generated subject line:

Subject: Cleaning Products That Will Make Your Life Sparkle!

Dear <user name>,
Are you ready to transform your cleaning routine into an effortless and enjoyable experience? Explore our top-tier selections:
Robot Vacuum Cleaners <picture>
Window Cleaning Kits <picture>
Scrub Brushes with Ergonomic Handles <picture>
Microfiber Cloths <picture>
Eco-Friendly Cleaning Sprays <picture>

These examples showcase how Amazon Personalize Content Generator can assist you in creating a more engaging browsing experience or a more effective marketing campaign. For more detailed instructions, refer to Themed batch recommendations.

Announcing LangChain integration to seamlessly integrate Amazon Personalize with the LangChain framework

LangChain is a powerful open-source framework that allows for integration with large language models (LLMs). LLMs are typically versatile but may struggle with domain-specific tasks where deeper context and nuanced responses are needed. LangChain empowers developers in such scenarios to build modules (agents/chains) for their specific generative AI tasks. They can also introduce context and memory into LLMs by connecting and chaining LLM prompts to solve for varying use cases.

We are excited to launch LangChain integration. With this new capability, builders can use the Amazon Personalize custom chain on LangChain to seamlessly integrate Amazon Personalize with generative AI solutions. Adding a personalized touch to generative AI solutions helps you create more tailored and relevant interactions with end-users. The following code snippet demonstrates how you can invoke Amazon Personalize, retrieve recommendations for a campaign or recommender, and seamlessly feed it into your generative AI applications within the LangChain ecosystem. You can also use this for sequential chains.

from langchain.utilities import AmazonPersonalize
from langchain.chains import AmazonPersonalizeChain
from langchain.llms.bedrock import Bedrock

# Wrap an existing Amazon Personalize recommender (or campaign) as a LangChain client
recommender_arn = "<insert_arn>"
client = AmazonPersonalize(
    recommender_arn=recommender_arn,
    credentials_profile_name="default",
    region_name="us-west-2",
)

# Use a Bedrock-hosted LLM to turn the retrieved recommendations into generated content
bedrock_llm = Bedrock(model_id="anthropic.claude-v2", region_name="us-west-2")

# Create the Amazon Personalize chain and fetch recommendations for a user
chain = AmazonPersonalizeChain.from_llm(llm=bedrock_llm, client=client)
response = chain({'user_id': '1'})

You can use this capability to craft personalized marketing copies, generate concise summaries for recommended content, recommend products or content in chatbots, and build next-generation customer experiences with your creativity.

Amazon Personalize now enables you to return metadata in inference response to improve generative AI workflow

Amazon Personalize now improves your generative AI workflow by enabling you to return item metadata as part of the inference output. Including metadata alongside recommendations makes it more convenient to provide additional context to LLMs. This additional context, such as genre and product description, can help the models gain a deeper understanding of item attributes and generate more relevant content.

Amazon Personalize supports this capability for both custom recipes and domain optimized recommenders. When creating a campaign or a recommender, you can enable the option to return metadata with recommendation results, or adjust the setting by updating the campaign or recommender. You can select up to 10 metadata fields and 50 recommendation results to return metadata during an inference call, either through the Amazon Personalize API or the Amazon Personalize console.

The following is an example in the API:

## Create campaign with enabled metadata
example_name = 'metadata_response_enabled_campaign'
create_campaign_response = personalize.create_campaign(
    name = example_name,
    solutionVersionArn = example_solution_version_arn,
    minProvisionedTPS = 1,
    campaignConfig = {"enableMetadataWithRecommendations": True}
)

## GetRecommendations with metadata columns
metadataMap = {"ITEMS": ["genres", "num"]}
response = personalize_runtime.get_recommendations(campaignArn=example_campaign_arn,
     userId="0001", itemId="0002", metadataColumns=metadataMap, numResults=2)
     
## Example response with metadata
'itemList':
[
    {
        'itemId': '356',
        'metadata': {'genres': 'Comedy', 'num': '0.6103248'}
    },
    {
        'itemId': '260',
        'metadata': {'genres': 'Action|Adventure', 'num': '0.074548'}
    }
]

Conclusion

At AWS, we are constantly innovating on behalf of our customers. By introducing these new launches powered by Amazon Personalize and Amazon Bedrock, we will enrich every aspect of the builder and user experience, elevating efficiency and end-user satisfaction. To learn more about the capabilities discussed in this post, check out Amazon Personalize features and the Amazon Personalize Developer Guide.


About the Authors

Jingwen Hu is a Senior Technical Product Manager working with AWS AI/ML on the Amazon Personalize team. In her spare time, she enjoys traveling and exploring local food.

Pranav Agarwal is a Senior Software Engineer with AWS AI/ML and works on architecting software systems and building AI-powered recommender systems at scale. Outside of work, he enjoys reading, running, and ice-skating.

Rishabh Agrawal is a Senior Software Engineer working on AI services at AWS. In his spare time, he enjoys hiking, traveling, and reading.

Ashish Lal is a Senior Product Marketing Manager who leads product marketing for AI services at AWS. He has 9 years of marketing experience and has led the product marketing effort for intelligent document processing. He got his master’s in Business Administration at the University of Washington.

Read More

Build brand loyalty by recommending actions to your users with Amazon Personalize Next Best Action

Build brand loyalty by recommending actions to your users with Amazon Personalize Next Best Action

Amazon Personalize is excited to announce the new Next Best Action (aws-next-best-action) recipe, which helps you determine the best actions to suggest to your individual users so that you can increase brand loyalty and conversion.

Amazon Personalize is a fully managed machine learning (ML) service that makes it effortless for developers to deliver highly personalized user experiences in real time. It enables you to improve customer engagement by powering personalized product and content recommendations in websites, applications, and targeted marketing campaigns. You can get started without any prior ML experience, using APIs to easily build sophisticated personalization capabilities in a few clicks. All your data is encrypted to be private and secure.

In this post, we show you how to use the Next Best Action recipe to personalize action recommendations based on each user’s past interactions, needs, and behavior.

Solution overview

With the rapid growth of digital channels and technology advances that make hyper-personalization more accessible, brands struggle to determine what actions will maximize engagement for each individual user. Brands either show the same actions to all users or rely on traditional user segmentation approaches to recommend actions to each user cohort. However, these approaches are no longer sufficient, because every user expects a unique experience and tends to abandon brands that don’t understand their needs. Furthermore, brands are unable to update the action recommendations in real time due to the manual nature of the process.

With Next Best Action, you can determine the actions that have the highest likelihood of engaging each individual user based on their preferences, needs, and history. Next Best Action takes the in-session interests of each user into account and provides action recommendations in real time. You can recommend actions such as enrolling in loyalty programs, signing up for a newsletter or magazine, exploring a new category, downloading an app, and other actions that encourage conversion. This will enable you to improve each user’s experience by providing them with recommendations on actions across their user journey that will help promote long-term brand engagement and revenue. It will also help improve your return on marketing investment by recommending the action that each user has a high likelihood of taking.

AWS Partners like Credera are excited by the personalization possibilities that the Amazon Personalize Next Best Action will unlock for their customers.

“Amazon Personalize is a world-class machine learning solution that enables companies to create meaningful customer experiences across a wide array of use cases without extensive rework or up-front implementation cost that is typically required of these types of solutions. We are really excited about the addition of the Next Best Action capability that will enable customers to provide personalized action recommendations, significantly improving their digital experiences and driving additional business value. Specifically, we expect anyone working within the retail or content space to see an improved experience for their customers and higher conversions as a direct result of using Amazon Personalize. We are extremely thrilled to be a launch partner with AWS on this release and looking forward to empowering businesses to drive ML-based personalized solutions with Next Best Action.”

– Jason Goth, Partner and Chief Technology Officer, Credera.

Example use cases

To explore the impact of this new feature in greater detail, let’s review an example by taking three users: A (User_id 11999), B (User_id 17141), and C (User_id 8103), who are in different stages of their user journey while making purchases on a website. We then see how Next Best Action suggests the optimal actions for each user based on their past interactions and preferences.

First, we look at the action interactions dataset to understand how users have interacted with actions in the past. The following example shows the three users and their different shopping patterns. User A is a frequent buyer and has shopped mostly in the “Beauty & Grooming” and “Jewelry” categories in the past. User B is a casual buyer who has made a few purchases in the “Electronics” category in the past, and User C is a new user on the website who has made their first purchase in the “Clothing” category.

User Type User_id Actions Action_Event_Type Timestamp
User A 11999 Purchase in “Beauty & Grooming” category taken 2023-09-17 20:03:05
User A 11999 Purchase in “Beauty & Grooming” category taken 2023-09-18 19:28:38
User A 11999 Purchase in “Beauty & Grooming” category taken 2023-09-20 17:49:52
User A 11999 Purchase in “Jewelry” category taken 2023-09-26 18:36:16
User A 11999 Purchase in “Beauty & Grooming” category taken 2023-09-30 19:21:05
User A 11999 Download the mobile app taken 2023-09-30 19:29:35
User A 11999 Purchase in “Jewelry” category taken 2023-10-01 19:35:47
User A 11999 Purchase in “Beauty & Grooming” category taken 2023-10-04 19:19:34
User A 11999 Purchase in “Jewelry” category taken 2023-10-06 20:38:55
User A 11999 Purchase in “Beauty & Grooming” category taken 2023-10-10 20:17:07
User B 17141 Purchase in “Electronics” category taken 2023-09-29 20:17:49
User B 17141 Purchase in “Electronics” category taken 2023-10-02 00:38:08
User B 17141 Purchase in “Electronics” category taken 2023-10-07 11:04:56
User C 8103 Purchase in “Clothing” category taken 2023-09-26 18:30:56

Traditionally, brands either show the same actions to all users or employ user segmentation strategies to recommend actions to their user base. The following table is an example of a brand showing the same set of actions to all users. These actions may or may not be relevant to the users, reducing their engagement with the brand.

User Type User_id Action Recommendations Rank of Action
User A 11999 Subscribe to Loyalty Program 1
User A 11999 Download the mobile app 2
User A 11999 Purchase in “Electronics” category 3
User B 17141 Subscribe to Loyalty Program 1
User B 17141 Download the mobile app 2
User B 17141 Purchase in “Electronics” category 3
User C 8103 Subscribe to Loyalty Program 1
User C 8103 Download the mobile app 2
User C 8103 Purchase in “Electronics” category 3

Now let’s use Next Best Action to recommend actions for each user. After you define the actions eligible for recommendations, the aws-next-best-action recipe returns a ranked list of actions, personalized for each user, based on user propensity (the probability of a user taking a particular action, ranging from 0.0 to 1.0) and the value of that action, if provided. For the purposes of this post, we only consider user propensity.

In the following example, we see that for User A (frequent buyer), Subscribe to Loyalty Program is the top recommended action with a propensity score of 1.00, which means that this user is most likely to enroll in the loyalty program because they have made numerous purchases. Therefore, recommending the action Subscribe to Loyalty Program to User A has a high probability of increasing User A’s engagement.

User Type User_id Action Recommendations Rank of Action Propensity Score
User A 11999 Subscribe to Loyalty Program 1 1.00
User A 11999 Purchase in “Jewelry” category 2 0.86
User A 11999 Purchase in “Beauty & Grooming” category 3 0.85
User B 17141 Purchase in “Electronics” category 1 0.78
User B 17141 Subscribe to Loyalty Program 2 0.71
User B 17141 Purchase in “Smart Homes” category 3 0.66
User C 8103 Purchase in “Handbags & Shoes” category 1 0.60
User C 8103 Download the mobile app 2 0.48
User C 8103 Purchase in “Clothing” category 3 0.46

Similarly, User B (casual buyer persona) has a higher probability of continuing to purchase in the “Electronics” category and of buying new products in a similar category, “Smart Homes”. Therefore, Next Best Action recommends prioritizing the actions Purchase in “Electronics” category and Purchase in “Smart Homes” category. This means that if you prompt User B to buy products in these two categories, it can lead to greater engagement. We also notice that the action Subscribe to Loyalty Program is recommended to User B, but with a lower propensity score of 0.71 compared to User A, whose propensity score is 1.00. This is because users who have a deeper history and are further along their shopping journey benefit more from loyalty programs because of the added benefits and are highly likely to interact more.

Finally, we see that the next best action for User C is purchasing in the “Handbags & Shoes” category, which is similar to their previous action of purchasing in the “Clothing” category. We also see that the propensity score for Download the mobile app is relatively lower (0.48) than that of another action, Purchase in “Handbags & Shoes” category, which has a higher propensity score of 0.60. This means that if you recommend that User C purchase products in a complementary category (“Handbags & Shoes”) rather than download the mobile app, they are more likely to stick with your brand and continue shopping in the future.
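
At runtime, you can retrieve these ranked actions for a user through the Amazon Personalize runtime API. The following is a minimal boto3 sketch; the campaign ARN is an illustrative assumption, and the exact shape of the response may vary, so check the API reference for details.

import boto3

personalize_runtime = boto3.client("personalize-runtime", region_name="us-west-2")

# Get ranked action recommendations from a campaign built with the aws-next-best-action recipe
response = personalize_runtime.get_action_recommendations(
    campaignArn="arn:aws:personalize:us-west-2:111122223333:campaign/next-best-action",  # hypothetical ARN
    userId="11999",  # User A from the example above
    numResults=3,
)

for action in response.get("actionList", []):
    print(action)  # each entry includes the action ID and its propensity-based score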

For more details on how to implement the Next Best Action (aws-next-best-action) recipe, refer to the documentation.

Conclusion

The new Next Best Action recipe in Amazon Personalize helps you recommend the right actions to the right user in real time based on their individual behavior and needs. This will enable you to maximize user engagement and lead to greater conversion rates.

For more information about Amazon Personalize, see the Amazon Personalize Developer Guide.


About the Authors

Shreeya Sharma is a Sr. Technical Product Manager working with AWS AI/ML on Amazon Personalize. She has a background in computer science engineering, technology consulting, and data analytics. In her spare time, she enjoys traveling, performing theatre, and trying out new adventures.

Pranesh Anubhav is a Senior Software Engineer for Amazon Personalize. He is passionate about designing machine learning systems to serve customers at scale. Outside of his work, he loves playing soccer and is an avid follower of Real Madrid.

Aniket Deshmukh is an Applied Scientist in AWS AI labs supporting Amazon Personalize. Aniket works in the general area of recommendation systems, contextual bandits, and multi-modal deep learning.

Read More