Deploy large language models on AWS Inferentia2 using large model inference containers


You don’t have to be an expert in machine learning (ML) to appreciate the value of large language models (LLMs). Better search results, image recognition for the visually impaired, creating novel designs from text, and intelligent chatbots are just some examples of how these models are facilitating various applications and tasks.

ML practitioners keep improving the accuracy and capabilities of these models. As a result, these models grow in size and generalize better, such as in the evolution of transformer models. We explained in a previous post how you can use Amazon SageMaker deep learning containers (DLCs) to deploy these kinds of large models using a GPU-based instance.

In this post, we take the same approach but host the model on AWS Inferentia2. We use the AWS Neuron software development kit (SDK) to access the Inferentia device and benefit from its high performance. We then use a large model inference container powered by Deep Java Library (DJLServing) as our model serving solution. We demonstrate how these three layers work together by deploying an OPT-13B model on an Amazon Elastic Compute Cloud (Amazon EC2) inf2.48xlarge instance.

The three pillars

The following image represents the layers of hardware and software working together to help you unlock the best price-performance for your large language models. AWS Neuron and transformers-neuronx are the SDKs used to run deep learning workloads on AWS Inferentia. Lastly, DJLServing is the serving solution that is integrated into the container.

Hardware: Inferentia

AWS Inferentia, specifically designed for inference by AWS, is a high-performance and low-cost ML inference accelerator. In this post, we use AWS Inferentia2 (available via Inf2 instances), the second generation purpose-built ML inference accelerator.

Each EC2 Inf2 instance is powered by up to 12 Inferentia2 devices, and you can choose among four instance sizes.

Amazon EC2 Inf2 supports NeuronLink v2, a low-latency and high-bandwidth chip-to-chip interconnect, which enables high performance collective communication operations such as AllReduce and AllGather. This efficiently shards models across AWS Inferentia2 devices (such as via Tensor Parallelism), and therefore optimizes latency and throughput. This is particularly useful for large language models. For benchmark performance figures, refer to AWS Neuron Performance.

At the heart of the Amazon EC2 Inf2 instance are AWS Inferentia2 devices, each containing two NeuronCores-v2. Each NeuronCore-v2 is an independent, heterogeneous compute unit with four main engines: Tensor, Vector, Scalar, and GPSIMD. It includes on-chip, software-managed SRAM memory to maximize data locality. The following diagram shows the internal workings of the AWS Inferentia2 device architecture.

Neuron and transformers-neuronx

Above the hardware layer are the software layers used to interact with AWS Inferentia. AWS Neuron is the SDK used to run deep learning workloads on AWS Inferentia and AWS Trainium based instances. It enables an end-to-end ML development lifecycle: building new models, training and optimizing them, and deploying them for production. AWS Neuron includes a deep learning compiler, runtime, and tools that are natively integrated with popular frameworks like TensorFlow and PyTorch.

transformers-neuronx is an open-source library built by the AWS Neuron team that helps run transformer decoder inference workflows using the AWS Neuron SDK. Currently, it has examples for the GPT2, GPT-J, and OPT model types and different model sizes, whose forward functions have been re-implemented in a compiled language to enable extensive code analysis and optimization. Customers can implement other model architectures based on the same library. AWS Neuron-optimized transformer decoder classes have been re-implemented in XLA HLO (High Level Operations) using a syntax called PyHLO. The library also implements tensor parallelism to shard the model weights across multiple NeuronCores.

Tensor parallelism is needed because the models are so large that they don’t fit into a single accelerator’s HBM memory. The support for tensor parallelism in the AWS Neuron runtime, used by transformers-neuronx, makes heavy use of collective operations such as AllReduce. The following are some principles for setting the tensor parallelism degree (the number of NeuronCores participating in sharded matrix multiply operations) for AWS Neuron-optimized transformer decoder models; a small validation sketch follows the list:

  • The number of attention heads needs to be divisible by the tensor parallelism degree
  • The total data size of model weights and key-value caches needs to be smaller than 16 GB times the tensor parallelism degree
  • Currently, the Neuron runtime supports tensor parallelism degrees 1, 2, 8, and 32 on Trn1 and supports tensor parallelism degrees 1, 2, 4, 8, and 24 on Inf2
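
The following is a minimal sketch (an illustrative helper, not part of the Neuron SDK or transformers-neuronx) that applies these rules to a candidate configuration:

# Illustrative helper: check a candidate tensor parallelism degree against the
# constraints listed above (attention head divisibility, per-core memory budget,
# and the degrees supported by the Neuron runtime on Inf2).
def check_tp_degree(tp_degree, num_attention_heads, weights_and_kv_cache_gb,
                    supported_degrees=(1, 2, 4, 8, 24)):
    if num_attention_heads % tp_degree != 0:
        return False, "attention heads not divisible by tensor parallelism degree"
    if weights_and_kv_cache_gb >= 16 * tp_degree:
        return False, "weights plus KV cache exceed 16 GB times the degree"
    if tp_degree not in supported_degrees:
        return False, "degree not supported by the Neuron runtime on Inf2"
    return True, "ok"

# Example: OPT-13B has 40 attention heads, so a degree of 8 passes but 12 fails.
print(check_tp_degree(8, 40, 30))   # (True, 'ok')
print(check_tp_degree(12, 40, 30))  # fails the divisibility check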

DJLServing

DJLServing is a high-performance model server that added support for AWS Inferentia2 in March 2023. The AWS Model Server team offers a container image that supports LLM/AIGC use cases. DJL is also part of Rubikon support for Neuron, which includes the integration between DJLServing and transformers-neuronx. The DJLServing model server and the transformers-neuronx library are the core components of the container built to serve the LLMs supported through the transformers library. This container and the subsequent DLCs can load the models on the AWS Inferentia chips of an Amazon EC2 Inf2 host, along with the installed AWS Inferentia drivers and toolkit. In this post, we explain two ways of running the container.

The first way is to run the container without writing any additional code. You can use the default handler for a seamless user experience and pass in one of the supported model names and any load time configurable parameters. This will compile and serve an LLM on an Inf2 instance. The following code shows an example:

engine=Python
option.entryPoint=djl_python.transformers_neuronx
option.task=text-generation
option.model_id=facebook/opt-1.3b
option.tensor_parallel_degree=2

Alternatively, you can write your own model.py file, but that requires implementing the model loading and inference methods to serve as a bridge between the DJLServing APIs and, in this case, the transformers-neuronx APIs. You can also provide configurable parameters in a serving.properties file to be picked up during model loading. For the full list of configurable parameters, refer to All DJL configuration options.

The following code is a sample model.py file. The serving.properties file is similar to the one shown earlier.

def load_model(properties):
    """
    Load a model based on the framework-provided APIs
    :param properties: configurable properties for model loading
                       specified in serving.properties
    :return: model and other artifacts required for inference
    """
    batch_size = int(properties.get("batch_size", 2))
    tp_degree = int(properties.get("tensor_parallel_degree", 2))
    amp = properties.get("dtype", "f16")
    
    model_id = "facebook/opt-13b"
    model = OPTForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True)
    ...
    
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = OPTForSampling.from_pretrained(load_path,
                                           batch_size=batch_size,
                                           amp=amp,
                                           tp_degree=tp_degree)
    model.to_neuron()
    return model, tokenizer, batch_size

Let’s see what this all looks like on an Inf2 instance.

Launch the Inferentia hardware

We first need to launch an inf2.48xlarge instance to host our OPT-13B model. We use the Deep Learning AMI Neuron PyTorch 1.13.0 (Ubuntu 20.04) 20230226 Amazon Machine Image (AMI) because it already includes the Docker image and the necessary drivers for the AWS Neuron runtime.

We increase the storage of the instance to 512 GB to accommodate large language models.

Install necessary dependencies and create the model

We set up a Jupyter notebook server with our AMI to make it easier to view and manage our directories and files. When we’re in the desired directory, we set subdirectories for logs and models and create a serving.properties file.

We can use the standalone model provided by the DJL Serving container. This means we don’t have to define a model, but we do need to provide a serving.properties file. See the following code:

option.model_id=facebook/opt-13b
option.batch_size=2
option.tensor_parallel_degree=2
option.n_positions=256
option.dtype=f16
option.model_loading_timeout=600
engine=Python
option.entryPoint=djl_python.transformers_neuronx
#option.s3url=s3://djl-llm/opt-1.3b/

#You can also specify which device to load on.
#engine=Python, because the handlers are implemented in Python.

This instructs the DJL model server to use the OPT-13B model. We set the batch size to 2 and dtype=f16 so that the model fits on the Neuron devices. DJL Serving supports dynamic batching, and by setting tensor_parallel_degree we distribute inference across multiple NeuronCores to increase the throughput of inference requests. We also set n_positions=256, which is the maximum sequence length we expect the model to handle.

Our instance has 12 AWS Neuron devices, or 24 NeuronCores, while our OPT-13B model requires 40 attention heads. For example, setting tensor_parallel_degree=8 means every group of 8 NeuronCores hosts one model instance. If you divide the required attention heads (40) by the number of NeuronCores in a group (8), you get 5 attention heads allocated to each NeuronCore, or 10 on each AWS Neuron device.
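
As a quick sanity check, you can reproduce this sharding arithmetic in a few lines of Python (the values are the OPT-13B and inf2.48xlarge figures assumed in the text above):

# Sharding arithmetic for OPT-13B on inf2.48xlarge (assumed values from the text).
attention_heads = 40
tp_degree = 8                # NeuronCores per model copy
cores_per_device = 2         # each Inferentia2 device has two NeuronCores
heads_per_core = attention_heads // tp_degree          # 5
heads_per_device = heads_per_core * cores_per_device   # 10
model_copies = 24 // tp_degree                         # 3 copies fit on 24 NeuronCores
print(heads_per_core, heads_per_device, model_copies)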

You can use the following sample model.py file, which defines the model and creates the handler function. You can edit it to meet your needs, but be sure it can be supported on transformers-neuronx.

cat serving.properties
option.tensor_parallel_degree=2 
option.batch_size=2 
option.dtype=f16 
engine=Python
cat model.py
import torch
import tempfile
import os

from transformers.models.opt import OPTForCausalLM
from transformers import AutoTokenizer
from transformers_neuronx import dtypes
from transformers_neuronx.module import save_pretrained_split
from transformers_neuronx.opt.model import OPTForSampling
from djl_python import Input, Output

model = None

def load_model(properties):
    batch_size = int(properties.get("batch_size", 2))
    tp_degree = int(properties.get("tensor_parallel_degree", 2))
    amp = properties.get("dtype", "f16")
    model_id = "facebook/opt-13b"
    load_path = os.path.join(tempfile.gettempdir(), model_id)
    model = OPTForCausalLM.from_pretrained(model_id,
                                           low_cpu_mem_usage=True)
    dtype = dtypes.to_torch_dtype(amp)
    for block in model.model.decoder.layers:
        block.self_attn.to(dtype)
        block.fc1.to(dtype)
        block.fc2.to(dtype)
    model.lm_head.to(dtype)
    save_pretrained_split(model, load_path)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = OPTForSampling.from_pretrained(load_path,
                                           batch_size=batch_size,
                                           amp=amp,
                                           tp_degree=tp_degree)
    model.to_neuron()
    return model, tokenizer, batch_size

def infer(seq_length, prompt):
    with torch.inference_mode():
        input_ids = torch.as_tensor([tokenizer.encode(text) for text in prompt])
        generated_sequence = model.sample(input_ids,
                                          sequence_length=seq_length)
        outputs = [tokenizer.decode(gen_seq) for gen_seq in generated_sequence]
    return outputs

def handle(inputs: Input):
    global model, tokenizer, batch_size
    if not model:
        model, tokenizer, batch_size = load_model(inputs.get_properties())

    if inputs.is_empty():
        # Model server makes an empty call to warmup the model on startup
        return None

    data = inputs.get_as_json()
    seq_length = data["seq_length"]
    prompt = data["text"]
    outputs = infer(seq_length, prompt)
    result = {"outputs": outputs}
    return Output().add_as_json(result)
mkdir -p models/opt13b logs
mv serving.properties model.py models/opt13b

Run the serving container

The last steps before inference are to pull the Docker image for the DJL serving container and run it on our instance:

docker pull deepjavalibrary/djl-serving:0.21.0-pytorch-inf2

After you pull the container image, run the following command to deploy your model. Make sure you’re in the directory that contains the logs and models subdirectories, because the command maps them to directories under /opt/ in the container.

docker run -it --rm --network=host \
           -v `pwd`/models:/opt/ml/model \
           -v `pwd`/logs:/opt/djl/logs \
           -u djl --device /dev/neuron0 --device /dev/neuron10 --device /dev/neuron2 --device /dev/neuron4 --device /dev/neuron6 --device /dev/neuron8 --device /dev/neuron1 --device /dev/neuron11 \
           -e MODEL_LOADING_TIMEOUT=7200 \
           -e PREDICT_TIMEOUT=360 \
           deepjavalibrary/djl-serving:0.21.0-pytorch-inf2 serve

Run inference

Now that we’ve deployed the model, let’s test it out with a simple CURL command to pass some JSON data to our endpoint. Because we set a batch size of 2, we pass along the corresponding number of inputs:

curl -X POST "http://127.0.0.1:8080/predictions/opt13b" \
     -H 'Content-Type: application/json' \
     -d '{"seq_length":2048,
          "text":[
                    "Hello, I am a language model,",
                    "Welcome to Amazon Elastic Compute Cloud,"
                  ]
          }'


The preceding command generates a response in the command line. The model is quite chatty but its response validates our model. We were able to run inference on our LLM thanks to Inferentia!
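
If you prefer to call the endpoint from Python instead of curl, the following sketch sends the same payload; it assumes the container is serving locally on port 8080 under the model name opt13b, as in the preceding example:

# Invoke the local DJLServing endpoint with the same two-prompt batch as the curl call.
import requests

payload = {
    "seq_length": 2048,
    "text": [
        "Hello, I am a language model,",
        "Welcome to Amazon Elastic Compute Cloud,",
    ],
}
response = requests.post(
    "http://127.0.0.1:8080/predictions/opt13b",
    json=payload,
    timeout=360,
)
print(response.json()["outputs"])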

Clean up

Don’t forget to delete your EC2 instance when you’re done to save costs.

Conclusion

In this post, we deployed an Amazon EC2 Inf2 instance to host an LLM and ran inference using a large model inference container. You learned how AWS Inferentia and the AWS Neuron SDK interact to allow you to easily deploy LLMs for inference at an optimal price-to-performance ratio. Stay tuned for updates on more capabilities and new innovations with Inferentia. For more examples about Neuron, see aws-neuron-samples.


About the Authors

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Peter Chung is a Solutions Architect for AWS, and is passionate about helping customers uncover insights from their data. He has been building solutions to help organizations make data-driven decisions in both the public and private sectors. He holds all AWS certifications as well as two GCP certifications. He enjoys coffee, cooking, staying active, and spending time with his family.

Aaqib Ansari is a Software Development Engineer with the Amazon SageMaker Inference team. He focuses on helping SageMaker customers accelerate model inference and deployment. In his spare time, he enjoys hiking, running, photography and sketching.

Qing Lan is a Software Development Engineer in AWS. He has been working on several challenging products in Amazon, including high performance ML inference solutions and high performance logging system. Qing’s team successfully launched the first Billion-parameter model in Amazon Advertising with very low latency required. Qing has in-depth knowledge on the infrastructure optimization and Deep Learning acceleration.

Frank Liu is a Software Engineer for AWS Deep Learning. He focuses on building innovative deep learning tools for software engineers and scientists. In his spare time, he enjoys hiking with friends and family.


Deploy pre-trained models on AWS Wavelength with 5G edge using Amazon SageMaker JumpStart


With the advent of high-speed 5G mobile networks, enterprises are more easily positioned than ever with the opportunity to harness the convergence of telecommunications networks and the cloud. As one of the most prominent use cases to date, machine learning (ML) at the edge has allowed enterprises to deploy ML models closer to their end-customers to reduce latency and increase responsiveness of their applications. As an example, smart venue solutions can use near-real-time computer vision for crowd analytics over 5G networks, all while minimizing investment in on-premises hardware networking equipment. Retailers can deliver more frictionless experiences on the go with natural language processing (NLP), real-time recommendation systems, and fraud detection. Even ground and aerial robotics can use ML to unlock safer, more autonomous operations.

To reduce the barrier to entry of ML at the edge, we wanted to demonstrate an example of deploying a pre-trained model from Amazon SageMaker to AWS Wavelength, all in less than 100 lines of code. In this post, we demonstrate how to deploy a SageMaker model to AWS Wavelength to reduce model inference latency for 5G network-based applications.

Solution overview

Across AWS’s rapidly expanding global infrastructure, AWS Wavelength brings the power of cloud compute and storage to the edge of 5G networks, unlocking more performant mobile experiences. With AWS Wavelength, you can extend your virtual private cloud (VPC) to Wavelength Zones corresponding to the telecommunications carrier’s network edge in 29 cities across the globe. The following diagram shows an example of this architecture.

AWS Wavelength Reference Architecture

You can opt in to the Wavelength Zones within a given Region via the AWS Management Console or the AWS Command Line Interface (AWS CLI). To learn more about deploying geo-distributed applications on AWS Wavelength, refer to Deploy geo-distributed Amazon EKS clusters on AWS Wavelength.

Building on the fundamentals discussed in this post, we look to ML at the edge as a sample workload with which to deploy to AWS Wavelength. As our sample workload, we deploy a pre-trained model from Amazon SageMaker JumpStart.

SageMaker is a fully managed ML service that allows developers to easily deploy ML models into their AWS environments. Although AWS offers a number of options for model training, from AWS Marketplace models to SageMaker built-in algorithms, there are a number of techniques to deploy open-source ML models.

JumpStart provides access to hundreds of built-in algorithms with pre-trained models that can be seamlessly deployed to SageMaker endpoints. From predictive maintenance and computer vision to autonomous driving and fraud detection, JumpStart supports a variety of popular use cases with one-click deployment on the console.

Because SageMaker is not natively supported in Wavelength Zones, we demonstrate how to extract the model artifacts from the Region and re-deploy to the edge. To do so, you use Amazon Elastic Kubernetes Service (Amazon EKS) clusters and node groups in Wavelength Zones, followed by creating a deployment manifest with the container image generated by JumpStart. The following diagram illustrates this architecture.

Reference architecture for Amazon SageMaker JumpStart on AWS Wavelength

Prerequisites

To make this as easy as possible, ensure that your AWS account has Wavelength Zones enabled. Note that this integration is only available in us-east-1 and us-west-2, and we use us-east-1 for the duration of the demo.

To opt in to AWS Wavelength, complete the following steps (an equivalent SDK call is sketched after the list):

  1. On the Amazon VPC console, choose Zones under Settings and choose US East (Verizon) / us-east-1-wl1.
  2. Choose Manage.
  3. Select Opted in.
  4. Choose Update zones.
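
If you prefer to opt in programmatically, the following sketch performs the same zone-group opt-in with the EC2 ModifyAvailabilityZoneGroup API via Boto3 (it assumes your credentials are configured for us-east-1):

# Opt in to the US East (Verizon) Wavelength Zone group from code.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.modify_availability_zone_group(
    GroupName="us-east-1-wl1",   # zone group shown on the Amazon VPC console
    OptInStatus="opted-in",
)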

Create AWS Wavelength infrastructure

Before converting the local SageMaker model inference endpoint to a Kubernetes deployment, create an EKS cluster in a Wavelength Zone. To do so, deploy an Amazon EKS cluster with an AWS Wavelength node group. To learn more, you can visit this guide on the AWS Containers Blog or Verizon’s 5GEdgeTutorials repository for one such example.

Next, using an AWS Cloud9 environment or the integrated development environment (IDE) of your choice, download the requisite SageMaker packages and Docker Compose, a key dependency of SageMaker local mode.

pip install sagemaker
pip install 'sagemaker[local]' --upgrade
sudo curl -L "https://github.com/docker/compose/releases/download/1.23.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version

Create model artifacts using JumpStart

First, make sure that you have an AWS Identity and Access Management (IAM) execution role for SageMaker. To learn more, visit SageMaker Roles.

  1. Using this example, create a file called train_model.py that uses the SageMaker Software Development Kit (SDK) to retrieve a pre-built model (replace <your-sagemaker-execution-role> with the Amazon Resource Name (ARN) of your SageMaker execution role). In this file, you deploy a model locally using the instance_type attribute in the model.deploy() function, which starts a Docker container within your IDE using all requisite model artifacts you defined:
#train_model.py
from sagemaker import image_uris, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base
import sagemaker, boto3, json
from sagemaker import get_execution_role

aws_role = "<your-sagemaker-execution-role>"
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

# model_version="*" fetches the latest version of the model.
infer_model_id = "tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2"
infer_model_version= "*"
endpoint_name = name_from_base(f"jumpstart-example-{infer_model_id}")

# Retrieve the inference docker container uri.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=infer_model_id,
    model_version=infer_model_version,
    instance_type="local",
)
# Retrieve the inference script uri.
deploy_source_uri = script_uris.retrieve(
    model_id=infer_model_id, model_version=infer_model_version, script_scope="inference"
)
# Retrieve the base model uri.
base_model_uri = model_uris.retrieve(
    model_id=infer_model_id, model_version=infer_model_version, model_scope="inference"
)
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)
print(deploy_image_uri, deploy_source_uri, base_model_uri)
# deploy the Model.
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type="local",
    endpoint_name=endpoint_name,
)
  2. Next, set infer_model_id to the ID of the SageMaker model that you would like to use.

For a complete list, refer to Built-in Algorithms with pre-trained Model Table. In our example, we use the Bidirectional Encoder Representations from Transformers (BERT) model, commonly used for natural language processing.

  3. Run the train_model.py script to retrieve the JumpStart model artifacts and deploy the pre-trained model to your local machine:
python train_model.py

Should this step succeed, your output may resemble the following:

763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu
s3://jumpstart-cache-prod-us-east-1/source-directory-tarballs/tensorflow/inference/tc/v2.0.0/sourcedir.tar.gz
s3://jumpstart-cache-prod-us-east-1/tensorflow-infer/v2.0.0/infer-tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2.tar.gz

In the output, you will see three artifacts in order: the base image for TensorFlow inference, the inference script that serves the model, and the artifacts containing the trained model. Although you could create a custom Docker image with these artifacts, another approach is to let SageMaker local mode create the Docker image for you. In the subsequent steps, we extract the container image running locally and deploy to Amazon Elastic Container Registry (Amazon ECR) as well as push the model artifact separately to Amazon Simple Storage Service (Amazon S3).

Convert local mode artifacts to remote Kubernetes deployment

Now that you have confirmed that SageMaker is working locally, let’s extract the deployment manifest from the running container. Complete the following steps:

Identify the location of the SageMaker local mode deployment manifest: To do so, search the root directory for any files named docker-compose.yaml.

docker_manifest=$( find /tmp/tmp* -name "docker-compose.yaml" -printf '%T+ %p\n' | sort | tail -n 1 | cut -d' ' -f2-)
echo $docker_manifest

Identify the location of the SageMaker local mode model artifacts: Next, find the underlying volume mounted to the local SageMaker inference container, which will be used by each EKS worker node after we upload the artifact to Amazon S3.

model_local_volume=$(grep -A1 -w "volumes:" $docker_manifest | tail -n 1 | tr -d ' ' | awk -F: '{print $1}' | cut -c 2-)
# Returns something like: /tmp/tmpcr4bu_a7

Create a local copy of the running SageMaker inference container: Next, we find the container that is currently running our machine learning inference model and make a local copy of it. This ensures we have our own copy of the container image to push to Amazon ECR and later pull from it.

# Find container ID of running SageMaker Local container
mkdir sagemaker-container
container_id=$(docker ps --format "{{.ID}} {{.Image}}" | grep "tensorflow" | awk '{print $1}')
# Retrieve the files of the container locally
docker cp $container_id:/ sagemaker-container/

Before acting on the model_local_volume, which we’ll push to Amazon S3, push a copy of the running Docker image, now in the sagemaker-container directory, to Amazon Elastic Container Registry. Be sure to replace region, aws_account_id, docker_image_id and my-repository:tag or follow the Amazon ECR user guide. Also, be sure to take note of the final ECR Image URL (aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag), which we will use in our EKS deployment.

aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com
docker build .
docker tag <docker-image-id> aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag
docker push aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag

Now that we have an ECR image corresponding to the inference endpoint, create a new Amazon S3 bucket and copy the SageMaker local artifacts (model_local_volume) to this bucket. In parallel, create an AWS Identity and Access Management (IAM) policy that provides Amazon EC2 instances access to read objects within the bucket. Be sure to replace <unique-bucket-name> with a globally unique name for your Amazon S3 bucket.

# Create S3 Bucket for model artifacts
aws s3api create-bucket --bucket <unique-bucket-name>
aws s3api put-public-access-block --bucket <unique-bucket-name> --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
# Step 2: Create IAM attachment to Node Group
cat > ec2_iam_policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::sagemaker-wavelength-demo-app/*",
        "arn:aws:s3:::sagemaker-wavelength-demo-app"
      ]
    }
  ]
}
EOF

# Create IAM policy
policy_arn=$(aws iam create-policy --policy-name sagemaker-demo-app-s3 --policy-document file://ec2_iam_policy.json --query Policy.Arn)
aws iam attach-role-policy --role-name wavelength-eks-Cluster-wl-workers --policy-arn $policy_arn

# Push model artifacts to S3
cd $model_local_volume
tar -cvf sagemaker_model.tar .
aws s3 cp sagemaker_model.tar s3://<unique-bucket-name>/

Next, to ensure that each EC2 instance pulls a copy of the model artifact on launch, edit the user data for your EKS worker nodes. In your user data script, ensure that each node retrieves the model artifacts using the S3 API at launch. Be sure to replace <unique-bucket-name> with a globally unique name for your Amazon S3 bucket. Given that the node’s user data will also include the EKS bootstrap script, the complete user data may look something like the following.

#!/bin/bash
mkdir /tmp/model
cd /tmp/model
aws s3api get-object --bucket sagemaker-wavelength-demo-app --key sagemaker_model.tar  sagemaker_model.tar
tar -xvf sagemaker_model.tar
set -o xtrace
/etc/eks/bootstrap.sh <your-eks-cluster-id>

Now you can inspect the existing Docker manifest and translate it to Kubernetes-friendly manifest files using Kompose, a well-known conversion tool. Note: if you get a version compatibility error, change the version attribute in line 27 of docker-compose.yml to “2”.

curl -L https://github.com/kubernetes/kompose/releases/download/v1.26.0/kompose-linux-amd64 -o kompose
chmod +x kompose && sudo mv ./kompose /usr/local/bin/kompose
cd "$(dirname "$docker_manifest")"
kompose convert 

After running Kompose, you’ll see four new files: a Deployment object, Service object, PersistentVolumeClaim object, and NetworkPolicy object. You now have everything you need to begin your foray into Kubernetes at the edge!

Deploy SageMaker model artifacts

Make sure you have kubectl and aws-iam-authenticator downloaded to your AWS Cloud9 IDE. If not, follow the respective installation guides.

Now, complete the following steps:

Modify the service/algo-1-ow3nv object to switch the service type from ClusterIP to NodePort. In our example, we have selected port 30007 as our NodePort:

# algo-1-ow3nv-service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.26.0 (40646f47)
  creationTimestamp: null
  labels:
    io.kompose.service: algo-1-ow3nv
  name: algo-1-ow3nv
spec:
  type: NodePort
  ports:
    - name: "8080"
      port: 8080
      targetPort: 8080
      nodePort: 30007
  selector:
    io.kompose.service: algo-1-ow3nv
status:
  loadBalancer: {}

Next, you must allow the NodePort in the security group for your node. To do so, retrieve the security group ID and allow-list the NodePort:

node_group_sg=$(aws ec2 describe-security-groups --filters Name=group-name,Values='wavelength-eks-Cluster*' --query "SecurityGroups[0].GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id $node_group_sg --ip-permissions IpProtocol=tcp,FromPort=30007,ToPort=30007,IpRanges='[{CidrIp=0.0.0.0/0}]'

Next, modify the algo-1-ow3nv-deployment.yaml manifest to mount the /tmp/model hostPath directory to the container. Replace <your-ecr-image> with the ECR image you created earlier:

# algo-1-ow3nv-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.26.0 (40646f47)
  creationTimestamp: null
  labels:
    io.kompose.service: algo-1-ow3nv
  name: algo-1-ow3nv
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: algo-1-ow3nv
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.26.0 (40646f47)
      creationTimestamp: null
      labels:
        io.kompose.network/environment-sagemaker-local: "true"
        io.kompose.service: algo-1-ow3nv
    spec:
      containers:
        - args:
            - serve
          env:
            - name: SAGEMAKER_CONTAINER_LOG_LEVEL
              value: "20"
            - name: SAGEMAKER_PROGRAM
              value: inference.py
            - name: SAGEMAKER_REGION
              value: us-east-1
            - name: SAGEMAKER_SUBMIT_DIRECTORY
              value: /opt/ml/model/code
          image: <your-ecr-image>
          name: sagemaker-test-model
          ports:
            - containerPort: 8080
          resources: {}
          stdin: true
          tty: true
          volumeMounts:
            - mountPath: /opt/ml/model
              name: algo-1-ow3nv-claim0
      restartPolicy: Always
      volumes:
        - name: algo-1-ow3nv-claim0
          hostPath:
            path: /tmp/model
status: {}

With the manifest files you created from Kompose, use kubectl to apply the configs to your cluster:

$ kubectl apply -f algo-1-ow3nv-deployment.yaml -f algo-1-ow3nv-service.yaml
deployment.apps/algo-1-ow3nv created
service/algo-1-ow3nv created

Connect to the 5G edge model

To connect to your model, complete the following steps:

On the Amazon EC2 console, retrieve the carrier IP of the EKS worker node or use the AWS CLI to query the carrier IP address directly:

aws ec2 describe-instances --filters "Name=tag:aws:autoscaling:groupName,Values=eks-EKSNodeGroup*" --query 'Reservations[*].Instances[*].[Placement.AvailabilityZone,NetworkInterfaces[].Association.CarrierIp]' --output text
# Example Output: 155.146.1.12

Now, with the carrier IP address extracted, you can connect to the model directly using the NodePort. Create a file called invoke.py to invoke the BERT model directly by providing a text-based input that will be run against a sentiment-analyzer to determine whether the tone was positive or negative:

import requests

endpoint_name = "jumpstart-example-tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2"
request_body = "simply stupid , irrelevant and deeply , truly , bottomlessly cynical ".encode("utf-8")
r2 = requests.post(
    url="http://155.146.1.12:30007/invocations",
    data=request_body,
    headers={"Content-Type": "application/x-text", "Accept": "application/json;verbose"},
)
print(r2.text)

Your output should resemble the following:

{"probabilities": [0.998723, 0.0012769578], "labels": [0, 1], "predicted_label": 0}

Clean up

To destroy all application resources created, delete the AWS Wavelength worker nodes, the EKS control plane, and all the resources created within the VPC. Additionally, delete the ECR repo used to host the container image, the S3 buckets used to host the SageMaker model artifacts and the sagemaker-demo-app-s3 IAM policy.

Conclusion

In this post, we demonstrated a novel approach to deploying SageMaker models to the network edge using Amazon EKS and AWS Wavelength. To learn about Amazon EKS best practices on AWS Wavelength, refer to Deploy geo-distributed Amazon EKS clusters on AWS Wavelength. Additionally, to learn more about Jumpstart, visit the Amazon SageMaker JumpStart Developer Guide or the JumpStart Available Model Table.


About the Authors

 Robert Belson is a Developer Advocate in the AWS Worldwide Telecom Business Unit, specializing in AWS Edge Computing. He focuses on working with the developer community and large enterprise customers to solve their business challenges using automation, hybrid networking and the edge cloud.

Mohammed Al-Mehdar is a Senior Solutions Architect in the Worldwide Telecom Business Unit at AWS. His main focus is to help enable customers to build and deploy Telco and Enterprise IT workloads on AWS. Prior to joining AWS, Mohammed has been working in the Telco industry for over 13 years and brings a wealth of experience in the areas of LTE Packet Core, 5G, IMS and WebRTC. Mohammed holds a bachelor’s degree in Telecommunications Engineering from Concordia University.

Evan Kravitz is a software engineer at Amazon Web Services, working on SageMaker JumpStart. He enjoys cooking and going on runs in New York City.

Justin St. Arnauld is an Associate Director – Solution Architects at Verizon for the Public Sector with over 15 years of experience in the IT industry. He is a passionate advocate for the power of edge computing and 5G networks and is an expert in developing innovative technology solutions that leverage these technologies. Justin is particularly enthusiastic about the capabilities offered by Amazon Web Services (AWS) in delivering cutting-edge solutions for his clients. In his free time, Justin enjoys keeping up-to-date with the latest technology trends and sharing his knowledge and insights with others in the industry.


Import data from over 40 data sources for no-code machine learning with Amazon SageMaker Canvas


Data is at the heart of machine learning (ML). Including relevant data to comprehensively represent your business problem ensures that you effectively capture trends and relationships so that you can derive the insights needed to drive business decisions. With Amazon SageMaker Canvas, you can now import data from over 40 data sources to be used for no-code ML. Canvas expands access to ML by providing business analysts with a visual interface that allows them to generate accurate ML predictions on their own—without requiring any ML experience or having to write a single line of code. Now, you can import data in-app from popular relational data stores such as Amazon Athena as well as third-party software as a service (SaaS) platforms supported by Amazon AppFlow such as Salesforce, SAP OData, and Google Analytics.

The process of gathering high-quality data for ML can be complex and time-consuming, because the proliferation of SaaS applications and data storage services has created a spread of data across a multitude of systems. For example, you may need to conduct a customer churn analysis using customer data from Salesforce, financial data from SAP, and logistics data from Snowflake. To create a dataset across these sources, you need to log into each application individually, select the desired data, and export it locally, where it can then be aggregated using a different tool. This dataset then needs to be imported into a separate application for ML.

With this launch, Canvas empowers you to capitalize on data stored in disparate sources by supporting in-app data import and aggregation from over 40 data sources. This feature is made possible through new native connectors to Athena and to Amazon AppFlow via the AWS Glue Data Catalog. Amazon AppFlow is a managed service that enables you to securely transfer data from third-party SaaS applications to Amazon Simple Storage Service (Amazon S3) and catalog the data with the Data Catalog with just a few clicks. After your data is transferred, you can simply access the data source within Canvas, where you can view table schemas, join tables within or across data sources, write Athena queries, and preview and import your data. After your data is imported, you can use existing Canvas functionalities such as building an ML model, viewing column impact data, or generating predictions. You can automate the data transfer process in Amazon AppFlow to activate on a schedule to ensure that you always have access to the latest data in Canvas.

Solution overview

The steps outlined in this post provide two examples of how to import data into Canvas for no-code ML. In the first example, we demonstrate how to import data through Athena. In the second example, we show how to import data from a third-party SaaS application via Amazon AppFlow.

Import data from Athena

In this section, we show an example of importing data in Canvas from Athena to conduct a customer segmentation analysis. We create an ML classification model to categorize our customer base into four different classes, with the end goal to use the model to predict which class a new customer will fall into. We follow three major steps: import the data, train a model, and generate predictions. Let’s get started.

Import the data

To import data from Athena, complete the following steps:

  1. On the Canvas console, choose Datasets in the navigation pane, then choose Import.
  2. Expand the Data Source menu and choose Athena.
  3. Choose the correct database and table that you want to import from. You can optionally preview the table by choosing the preview icon.

The following screenshot shows an example of the preview table.

In our example, we segment customers based on the marketing channel through which they have engaged our services. This is specified by the column segmentation, where A is print media, B is mobile, C is in-store promotions, and D is television.

  4. When you’re satisfied that you have the right table, drag the desired table into the Drag and drop datasets to join section.
  5. You can now optionally select or deselect columns, join tables by dragging another table into the Drag and drop datasets to join section, or write SQL queries to specify your data slice. For this post, we use all the data in the table.
  6. To import the data, choose Import data.

Your data is imported into Canvas as a dataset from the specific table in Athena.

Train a model

After your data is imported, it shows up on the Datasets page. At this stage, you can build a model. To do so, complete the following steps:

  1. Select your dataset and choose Create a model.
  2. For Model name, enter your model name (for this post, my_first_model).
  3. Canvas enables you to create models for predictive analysis, image analysis, and text analysis. Because we want to categorize customers, select Predictive analysis for Problem type.
  4. To proceed, choose Create.

On the Build page, you can see statistics about your dataset, such as the percentage of missing values and mean of the data.

  5. For Target column, choose a column (for this post, segmentation).

Canvas offers two types of models that can generate predictions. Quick build prioritizes speed over accuracy, providing a model in 2–15 minutes. Standard build prioritizes accuracy over speed, providing a model in 2–4 hours.

  6. For this post, choose Quick build.
  7. After the model is trained, you can analyze the model accuracy.

The following model categorizes customers correctly 94.67% of the time.

  8. You can optionally also view how each column impacts the categorization. In this example, as a customer’s age increases, the age column has less influence on the categorization. To generate predictions with your new model, choose Predict.

Generate predictions

On the Predict tab, you can generate both batch predictions and single predictions. Complete the following steps:

  1. For this post, choose Single prediction to understand what customer segmentation will result for a new customer.

For our prediction, we want to understand what segmentation a customer will be if they are 32 years old and a lawyer by profession.

  2. Replace the corresponding values with these inputs.
  3. Choose Update.

The updated prediction is displayed in the prediction window. In this example, a 32-year-old lawyer is classified in segment D.

Import data from a third-party SaaS application to AWS

To import data from third-party SaaS applications into Canvas for no-code ML, you must first transfer data from the application to Amazon S3 via Amazon AppFlow. In this example, we transfer manufacturing data from SAP OData.

To transfer your data, complete the following steps:

  1. On the Amazon AppFlow console, choose Create flow.
  2. For Flow name, enter a name.
  3. Choose Next.
  4. For Source name, choose your desired third-party SaaS application (for this post, SAP OData).
  5. Choose Create new connection.
  6. In the Connect to SAP OData pop-up window, fill out the authentication details and choose Connect.
  7. For SAP OData object, choose the object containing your data within SAP OData.
  8. For Destination name, choose Amazon S3.
  9. For Bucket details, specify your S3 bucket details.
  10. Select Catalog your data in the AWS Glue Data Catalog.
  11. For User role, choose the AWS Identity and Access Management (IAM) role that the Canvas user will use to access the data from.
  12. For Flow trigger, select Run on demand.

Alternatively, you can automate the flow transfer by selecting Run flow on schedule.

  13. Choose Next.
  14. Choose how to map the fields and complete the field mapping. For this post, because there is no corresponding destination database to map to, there is no need to specify the mapping.
  15. Choose Next.

  16. Optionally, add filters if necessary to restrict data transferred.
  17. Choose Next.
  18. Review your details and choose Create flow.

When the flow is created, a green ribbon appears at the top of the page, indicating that it was created successfully.

  19. Choose Run flow.

At this stage, you have successfully transferred your data from SAP OData to Amazon S3.

Now you can import the data from within the Canvas app. To import your data from Canvas, follow the same set of steps as described in the Data import section earlier in this post. For this example, on the Data source drop-down menu on the Data import page, you can see SAP OData listed.

You are now able to use all existing Canvas functionalities, such as cleaning your data, building an ML model, viewing column impact data, and generating predictions.

Clean up

To clean up the resources provisioned, log out of the Canvas application by choosing Log out in the navigation pane.

Conclusion

With Canvas, you can now import data for no-code ML from 47 data sources through native connectors with Athena and Amazon AppFlow via the AWS Glue Data Catalog. This process enables you to directly access and aggregate data across data sources within Canvas after data is transferred via Amazon AppFlow. You can automate the data transfer to activate on a schedule, which means that you don’t have to go through the process again to refresh your data. With this process, you can create new datasets with your latest data without having to leave the Canvas app. This feature is now available in all AWS Regions where Canvas is available. To get started with importing your data, navigate to the Canvas console and follow the steps outlined in this post. To learn more, refer to Connect to data sources.


About the authors

Brandon Nair is a Senior Product Manager for Amazon SageMaker Canvas. His professional interest lies in creating scalable machine learning services and applications. Outside of work he can be found exploring national parks, perfecting his golf swing or planning an adventure trip.

Sanjana Kambalapally is a Software Development Manager for AWS Sagemaker Canvas, which aims at democratizing machine learning by building no code ML applications.

Xin Xu is a software development engineer in the Canvas team, where he works on data preparation, among other aspects in no-code machine learning products. In his spare time, he enjoys jogging, reading and watching movies.

Volkan Unsal is a Sr. Frontend Engineer in the Canvas team, where he builds no-code products to make artificial intelligence accessible to humans. In his spare time, he enjoys running, reading, watching e-sports, and martial arts.


Predicting new and existing product sales in semiconductors using Amazon Forecast


This is a joint post by NXP SEMICONDUCTORS N.V. & AWS Machine Learning Solutions Lab (MLSL)

Machine learning (ML) is being used across a wide range of industries to extract actionable insights from data to streamline processes and improve revenue generation. In this post, we demonstrate how NXP, an industry leader in the semiconductor sector, collaborated with the AWS Machine Learning Solutions Lab (MLSL) to use ML techniques to optimize the allocation of the NXP research and development (R&D) budget to maximize their long-term return on investment (ROI).

NXP directs its R&D efforts largely to the development of new semiconductor solutions where they see significant opportunities for growth. To outpace market growth, NXP invests in research and development to extend or create leading market positions, with an emphasis on fast-growing, sizable market segments. For this engagement, they sought to generate monthly sales forecasts for new and existing products across different material groups and business lines. In this post, we demonstrate how the MLSL and NXP employed Amazon Forecast and other custom models for long-term sales predictions for various NXP products.

“We engaged with the team of scientists and experts at [the] Amazon Machine Learning Solutions Lab to build a solution for predicting new product sales and understand if and which additional features could help inform [the] decision-making process for optimizing R&D spending. Within just a few weeks, the team delivered multiple solutions and analyses across some of our business lines, material groups, and on [an] individual product level. MLSL delivered a sales forecast model, which complements our current way of manual forecasting, and helped us model the product lifecycle with novel machine learning approaches using Amazon Forecast and Amazon SageMaker. While keeping a constant collaborative workstream with our team, MLSL helped us with upskilling our professionals when it comes to scientific excellence and best practices on ML development using AWS infrastructure.”

– Bart Zeeman, Strategist and Analyst at CTO office in NXP Semiconductors.

Goals and use case

The goal of the engagement between NXP and the MLSL team is to predict the overall sales of NXP in various end markets. In general, the NXP team is interested in macro-level sales that include the sales of various business lines (BLs), which contain multiple material groups (MAGs). Furthermore, the NXP team is also interested in predicting the product lifecycle of newly introduced products. The lifecycle of a product is divided into four different phases (Introduction, Growth, Maturity, and Decline). The product lifecycle prediction enables the NXP team to identify the revenue generated by each product and to allocate R&D funding to the products generating the highest sales, or to products with the highest potential to maximize the ROI of R&D activity. Additionally, they can predict long-term sales at a micro level, which gives them a bottom-up view of how their revenue changes over time.

In the following sections, we present the key challenges associated with developing robust and efficient models for long-term sales forecasts. We further describe the intuition behind various modeling techniques employed to achieve the desired accuracy. We then present the evaluation of our final models, where we compare the performance of the proposed models in terms of sales prediction with the market experts at NXP. We also demonstrate the performance of our state-of-the-art point cloud-based product lifecycle prediction algorithm.

Challenges

One of the challenges we faced while using fine-grained or micro-level modeling, such as product-level models for sales prediction, was missing sales data. The missing data results from months in which a product recorded no sales. Similarly, for macro-level sales prediction, the length of the historical sales data was limited. Both the missing sales data and the limited length of historical sales data pose significant challenges to model accuracy for long-term sales prediction out to 2026. We observed during the exploratory data analysis (EDA) that as we move from micro-level sales (product level) to macro-level sales (BL level), missing values become less significant. However, the limited length of the historical sales data (a maximum of 140 months) still posed significant challenges in terms of model accuracy.

Modeling techniques

After EDA, we focused on forecasting at the BL and MAG levels and at the product level for one of the largest end markets (the automobile end market) for NXP. However, the solutions we developed can be extended to other end markets. Modeling at the BL, MAG, or product level has its own pros and cons in terms of model performance and data availability. The following table summarizes such pros and cons for each level. For macro-level sales prediction, we employed the Amazon Forecast AutoPredictor for our final solution. Similarly, for micro-level sales prediction, we developed a novel point cloud-based approach.

Macro sales prediction (top-down)

To predict long-term sales values (through 2026) at the macro level, we tested various methods, including Amazon Forecast, GluonTS, and N-BEATS (implemented in GluonTS and PyTorch). Overall, Forecast outperformed all other methods based on a backtesting approach (described in the Evaluation Metrics section later in this post) for macro-level sales prediction. We also compared the accuracy of AutoPredictor against human predictions.

We also proposed using N-BEATS due to its interpretability. N-BEATS uses a very simple but powerful architecture: an ensemble of feedforward networks with residual connections arranged in stacked residual blocks for forecasting. The architecture encodes an inductive bias that makes the time series model capable of extracting trend and seasonality (see the following figure). These interpretations were generated using PyTorch Forecasting.
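
To make the mechanism concrete, the following is a minimal PyTorch sketch of an N-BEATS-style stack: each feedforward block emits a backcast that is subtracted from the residual input and a forecast that is summed across blocks. This is an illustration of the idea only, not the implementation used in this engagement (which relied on GluonTS and PyTorch Forecasting):

import torch
import torch.nn as nn

class NBeatsBlock(nn.Module):
    def __init__(self, backcast_len, forecast_len, hidden=64):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(backcast_len, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.backcast_head = nn.Linear(hidden, backcast_len)
        self.forecast_head = nn.Linear(hidden, forecast_len)

    def forward(self, x):
        h = self.ff(x)
        return self.backcast_head(h), self.forecast_head(h)

class NBeatsStack(nn.Module):
    def __init__(self, backcast_len=24, forecast_len=12, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [NBeatsBlock(backcast_len, forecast_len) for _ in range(n_blocks)]
        )

    def forward(self, x):
        residual, forecast = x, 0.0
        for block in self.blocks:
            backcast, block_forecast = block(residual)
            residual = residual - backcast        # residual connection over the input
            forecast = forecast + block_forecast  # partial forecasts are summed
        return forecast

# Forecast 12 months from 24 months of history for a batch of 8 series.
print(NBeatsStack()(torch.randn(8, 24)).shape)  # torch.Size([8, 12])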

Micro sales prediction (bottom-up)

In this section, we discuss a novel method developed to predict the product lifecycle shown in the following figure while taking cold-start products into consideration. We implemented this method using PyTorch on Amazon SageMaker Studio. The method first converts sales data into a point cloud, where each point represents sales at a certain age of the product. A point cloud-based neural network model is then trained on this data to learn the parameters of the product lifecycle curve (see the following figure). In this approach, we also incorporated additional features, including the product description as a bag of words, to tackle the cold start problem when predicting the product lifecycle curve.

Time series as point cloud-based product lifecycle prediction

We developed a novel point cloud-based approach for product lifecycle prediction and micro-level sales prediction. We also incorporated additional features to further improve the model accuracy for cold-start product lifecycle predictions. These features include product fabrication techniques and other categorical information related to the products. Such additional data can help the model predict sales of a new product even before the product is released on the market (cold start). The following figure demonstrates the point cloud-based approach. The model takes the normalized sales and the age of the product (number of months since the product was launched) as input. Based on these inputs, the model learns parameters during training using gradient descent. During the forecast phase, the parameters, along with the features of a cold-start product, are used to predict the lifecycle. The large number of missing values in the data at the product level negatively impacts nearly all existing time series models. This novel solution is based on the ideas of lifecycle modeling and treating time series data as point clouds to mitigate the missing values.

The following figure demonstrates how our point cloud-based lifecycle method addresses the missing data values and is capable of predicting the product lifecycle with very few training samples. The X-axis represents the age in time, and the Y-axis represents the sales of a product. Orange dots represent the training samples, green dots represent the testing samples, and the blue line demonstrates the predicted lifecycle of a product by the model.
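
The following minimal PyTorch sketch illustrates the general idea only (it is not the actual NXP/MLSL model): observed (age, sales) pairs are treated as points, and the parameters of a simple lifecycle curve are learned with gradient descent, so months with missing sales simply contribute no points instead of breaking a contiguous time series:

import torch

def lifecycle_curve(age, params):
    # A simple bell-shaped curve over product age: rise, peak, and decline.
    scale, peak_age, width = params
    return torch.relu(scale) * torch.exp(-((age - peak_age) ** 2) / (2 * width ** 2))

# Observed points for one product (age in months, normalized sales); gaps are allowed.
ages = torch.tensor([1., 2., 4., 5., 9., 12., 15., 20.])
sales = torch.tensor([0.1, 0.3, 0.8, 0.9, 1.0, 0.8, 0.5, 0.2])

params = torch.tensor([1.0, 10.0, 5.0], requires_grad=True)
optimizer = torch.optim.Adam([params], lr=0.05)
for _ in range(500):
    optimizer.zero_grad()
    loss = torch.mean((lifecycle_curve(ages, params) - sales) ** 2)
    loss.backward()
    optimizer.step()

# Predict the full lifecycle, including months with no observed sales.
with torch.no_grad():
    predicted = lifecycle_curve(torch.arange(1., 37.), params)
print(predicted[:5])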

Methodology

To predict macro-level sales, we employed Amazon Forecast among other techniques. Similarly, for micro sales, we developed a state-of-the-art point cloud-based custom model. Forecast outperformed all other methods in terms of model performance. We used Amazon SageMaker notebook instances to create a data processing pipeline that extracted training examples from Amazon Simple Storage Service (Amazon S3). The training data was further used as input for Forecast to train a model and predict long-term sales.

Training a time series model using Amazon Forecast consists of three main steps. In the first step, we imported the historical data into Amazon S3. Second, a predictor was trained using the historical data. Finally, we deployed the trained predictor to generate the forecast. In this section, we provide a detailed explanation along with code snippets of each step.

We started by extracting the latest sales data. This step included uploading the dataset to Amazon S3 in the correct format. Amazon Forecast takes three columns as inputs: timestamp, item_id, and target_value (sales data). The timestamp column contains the time of sales, which could be formatted as hourly, daily, and so on. The item_id column contains the name of the sold items, and the target_value column contains sales values. Next, we used the path of training data located in Amazon S3, defined the time series dataset frequency (H, D, W, M, Y), defined a dataset name, and identified the attributes of the dataset (mapped the respective columns in the dataset and their data types). Next, we called the create_dataset function from the Boto3 API to create a dataset with attributes such as Domain, DatasetType, DatasetName, DatasetFrequency, and Schema. This function returned a JSON object that contained the Amazon Resource Name (ARN). This ARN was subsequently used in the following steps. See the following code:

import boto3

forecast = boto3.client("forecast")  # Amazon Forecast service client

dataset_path = "PATH_OF_DATASET_IN_S3"
DATASET_FREQUENCY = "M" # Frequency of dataset (H, D, W, M, Y) 
TS_DATASET_NAME = "NAME_OF_THE_DATASET"
TS_SCHEMA = {
   "Attributes":[
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      },
       {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      {
         "AttributeName":"target_value",
         "AttributeType":"float"
      }
   ]
}

create_dataset_response = forecast.create_dataset(Domain="CUSTOM",
                                                  DatasetType='TARGET_TIME_SERIES',
                                                  DatasetName=TS_DATASET_NAME,
                                                  DataFrequency=DATASET_FREQUENCY,
                                                  Schema=TS_SCHEMA)

ts_dataset_arn = create_dataset_response['DatasetArn']

After the dataset was created, it was imported into Amazon Forecast using the Boto3 create_dataset_import_job function. The create_dataset_import_job function takes the job name (a string value), the ARN of the dataset from the previous step, the location of the training data in Amazon S3 from the previous step, and the time stamp format as arguments. It returns a JSON object containing the import job ARN. See the following code:

TIMESTAMP_FORMAT = "yyyy-MM-dd"
TS_IMPORT_JOB_NAME = "SALES_DATA_IMPORT_JOB_NAME"
TIMEZONE = "UTC"  # example value; set this to the time zone of your sales data
ts_s3_path = dataset_path  # S3 path of the training data uploaded earlier
role_arn = "ARN_OF_IAM_ROLE_WITH_FORECAST_S3_ACCESS"

ts_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,
                                       DatasetArn=ts_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": ts_s3_path,
                                             "RoleArn": role_arn
                                         } 
                                       },
                                       TimestampFormat=TIMESTAMP_FORMAT,
                                       TimeZone = TIMEZONE)

ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']

The imported dataset was then used to create a dataset group using the create_dataset_group function. This function takes the domain (string values defining the domain of the forecast), dataset group name, and the dataset ARN as inputs:

DATASET_GROUP_NAME = "SALES_DATA_GROUP_NAME"
DATASET_ARNS = [ts_dataset_arn]

create_dataset_group_response = \
    forecast.create_dataset_group(Domain="CUSTOM",
                                  DatasetGroupName=DATASET_GROUP_NAME,
                                  DatasetArns=DATASET_ARNS)

dataset_group_arn = create_dataset_group_response['DatasetGroupArn']

Next, we used the dataset group to train forecasting models. Amazon Forecast offers various state-of-the-art models; any of these models can be used for training. We used AutoPredictor as our default model. The main advantage of using AutoPredictor is that it automatically generates the item-level forecast, using the optimal model from an ensemble of six state-of-the-art models based on the input dataset. The Boto3 API provides the create_auto_predictor function for training an auto prediction model. The input parameters of this function are PredictorName, ForecastHorizon, and ForecastFrequency; users are responsible for selecting the forecast horizon and frequency. The forecast horizon represents the window size of the future prediction, which can be expressed in hours, days, weeks, months, and so on. Similarly, the forecast frequency represents the granularity of the forecast values, such as hourly, daily, weekly, monthly, or yearly. We mainly focused on predicting monthly sales of NXP for various BLs. See the following code:

PREDICTOR_NAME = "SALES_PREDICTOR"
FORECAST_HORIZON = 24
FORECAST_FREQUENCY = "M"

create_auto_predictor_response = \
    forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
                                   ForecastHorizon = FORECAST_HORIZON,
                                   ForecastFrequency = FORECAST_FREQUENCY,
                                   DataConfig = {
                                       'DatasetGroupArn': dataset_group_arn
                                    })

predictor_arn = create_auto_predictor_response['PredictorArn']

The trained predictor was then used to generate forecast values. Forecasts were generated using the create_forecast function from the previously trained predictor. This function takes the name of the forecast and the ARN of the predictor as inputs and generates the forecast values for the horizon and frequency defined in the predictor:

FORECAST_NAME = "SALES_FORECAST"

create_forecast_response = \
    forecast.create_forecast(ForecastName=FORECAST_NAME,
                             PredictorArn=predictor_arn)
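
After the forecast is created, you can retrieve the predicted values programmatically. The following is a minimal sketch using the Amazon Forecast Query API; the item_id value is a placeholder:

import boto3

forecast_query = boto3.client("forecastquery")

forecast_arn = create_forecast_response["ForecastArn"]
response = forecast_query.query_forecast(
    ForecastArn=forecast_arn,
    Filters={"item_id": "EXAMPLE_ITEM_ID"}  # placeholder item identifier
)
# Predictions are returned per quantile, for example p10, p50, and p90
print(response["Forecast"]["Predictions"].keys())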

Amazon Forecast is a fully managed service that automatically generates training and test datasets and provides various accuracy metrics to evaluate the reliability of the model-generated forecast. However, to build consensus on the predicted data and compare the predicted values with human predictions, we divided our historic data into training data and validation data manually. We trained the model using the training data without exposing the model to validation data and generated the prediction for the length of validation data. The validation data was compared with the predicted values to evaluate the model performance. Validation metrics may include mean absolute percent error (MAPE) and weighted absolute percent error (WAPE), among others. We used WAPE as our accuracy metric, as discussed in the next section.

Evaluation metrics

We first verified the model performance using backtesting to validate the prediction of our forecast model for long term sales forecast (2026 sales). We evaluated the model performance using the WAPE. The lower the WAPE value, the better the model. The key advantage of using WAPE over other error metrics like MAPE is that WAPE weighs the individual impact of each item’s sale. Therefore, it accounts for each product’s contribution to the total sale while calculating the overall error. For example, if you make an error of 2% on a product that generates $30 million and an error of 10% in a product that generates $50,000, your MAPE will not tell the entire story. The 2% error is actually costlier than the 10% error, something you can’t tell by using MAPE. Comparatively, WAPE will account for these differences. We also predicted various percentile values for the sales to demonstrate the upper and lower bounds of the model forecast.
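
To make the comparison concrete, the following quick calculation uses the hypothetical revenue numbers above (a 2% error on a $30 million product and a 10% error on a $50,000 product) to show how MAPE and WAPE treat the two errors differently:

import numpy as np

actual = np.array([30_000_000.0, 50_000.0])
forecast = np.array([30_000_000.0 * 1.02, 50_000.0 * 0.90])

mape = np.mean(np.abs(actual - forecast) / actual)
wape = np.sum(np.abs(actual - forecast)) / np.sum(actual)

print(f"MAPE: {mape:.4f}")  # 0.06, treats both products equally
print(f"WAPE: {wape:.4f}")  # about 0.02, dominated by the high-revenue product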

Macro-level sales prediction model validation

Next, we validated the model performance in terms of WAPE values. We calculated the WAPE value of a model by splitting the data into test and validation sets. For example, in the 2019 WAPE value, we trained our model using sales data between 2011–2018 and predicted sales values for the next 12 months (2019 sale). Next, we calculated the WAPE value using the following formula:
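
\[
\mathrm{WAPE} = \frac{\sum_{i,t} \left| y_{i,t} - \hat{y}_{i,t} \right|}{\sum_{i,t} \left| y_{i,t} \right|}
\]

Here, \(y_{i,t}\) is the actual sale of product \(i\) in month \(t\), and \(\hat{y}_{i,t}\) is the corresponding forecasted value (this is the standard WAPE definition).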

We repeated the same procedure to calculate the WAPE values for 2020 and 2021. We evaluated the WAPE for all BLs in the auto end market for 2019, 2020, and 2021. Overall, we observed that Amazon Forecast achieved a 0.33 WAPE value even for 2020 (during the COVID-19 pandemic). In 2019 and 2021, our model achieved WAPE values below 0.1, demonstrating high accuracy.

Macro-level sales prediction baseline comparison

We compared the performance of the macro sales prediction models developed using Amazon Forecast to three baseline models in terms of WAPE value for 2019, 2020, and 2021 (see the following figure). Amazon Forecast either significantly outperformed the other baseline models or performed on par with them for all 3 years. These results further validate the effectiveness of our final model predictions.

Macro-level sales prediction model vs. human predictions

To further validate the confidence of our macro-level model, we next compared the performance of our model with the human-predicted sales values. At the beginning of the fourth quarter every year, market experts at NXP predict the sales value of each BL, taking into consideration global market trends as well as other global indicators that could potentially impact the sales of NXP products. We compared the percent error of the model predictions and of the human predictions against the actual sales values in 2019, 2020, and 2021. We trained three models using data from 2011–2018 and predicted the sales values until 2021. We then calculated the MAPE against the actual sales values and compared it with the human predictions made by the end of 2018 (testing the model's 1-year-ahead to 3-year-ahead forecasts). We repeated this process for predictions made in 2019 (1-year-ahead and 2-year-ahead forecasts) and 2020 (1-year-ahead forecast). Overall, the model performed on par with the human predictors, or better in some cases. These results demonstrate the effectiveness and reliability of our model.

Micro-level sales prediction and product lifecycle

The following figure depicts how the model behaves using product data while having access to very few observations for each product (namely one or two observations at the input for product lifecycle prediction). The orange dots represent the training data, the green dots represent the testing data, and the blue line represents the model predicted product lifecycle.

The model can be fed more observations for context without the need for re-training as new sales data become available. The following figure demonstrates how the model behaves if it is given more context. Ultimately, more context leads to lower WAPE values.

In addition, we incorporated additional features for each product, including fabrication techniques and other categorical information. These external features helped reduce the WAPE value in the low-context regime (see the following figure). There are two explanations for this behavior. First, in the high-context regime, we need to let the data speak for itself; the additional features can interfere with this process. Second, we need better features. We used 1,000-dimensional one-hot-encoded features (bag of words). The conjecture is that better feature engineering techniques could help reduce WAPE even further.
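
As a concrete sketch of such features, binary bag-of-words vectors capped at 1,000 dimensions can be built as follows (the product descriptions shown are hypothetical):

from sklearn.feature_extraction.text import CountVectorizer

descriptions = [
    "automotive radar transceiver 77 GHz",
    "secure NFC controller for mobile payments",
    "32-bit Arm Cortex-M microcontroller for industrial control",
]

# binary=True yields presence/absence (one-hot style) features;
# max_features caps the vocabulary at 1,000 dimensions
vectorizer = CountVectorizer(max_features=1000, binary=True)
features = vectorizer.fit_transform(descriptions)
print(features.shape)  # (number of products, vocabulary size up to 1,000)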

Such additional data can help the model predict sales of new products even before the product is released on the market. For example, in the following figure, we plot how much mileage we can get only out of external features.

Conclusion

In this post, we demonstrated how the MLSL and NXP teams worked together to predict macro- and micro-level long-term sales for NXP. The NXP team will now explore how to use these sales predictions in their processes, for example, as input for R&D funding decisions to enhance ROI. We used Amazon Forecast to predict the sales for business lines (macro sales), which we referred to as the top-down approach. We also proposed a novel approach using time series as a point cloud to tackle the challenges of missing values and cold start at the product level (micro level). We referred to this approach as bottom-up, where we predicted the monthly sales of each product. We further incorporated external features of each product to enhance the performance of the model for cold start.

Overall, the models developed during this engagement performed on par compared to human prediction. In some cases, the models performed better than human predictions in the long term. These results demonstrate the effectiveness and reliability of our models.

This solution can be employed for any forecasting problem. For further assistance in designing and developing ML solutions, please feel free to get in touch with the MLSL team.


About the authors

Souad Boutane is a data scientist at NXP-CTO, where she is transforming various data into meaningful insights to support business decisions using advanced tools and techniques.

Ben Fridolin is a data scientist at NXP-CTO, where he coordinates on accelerating AI and cloud adoption. He focuses on machine learning, deep learning and end-to-end ML solutions.

Cornee Geenen is a project lead in the Data Portfolio of NXP, supporting the organization in its digital transformation toward becoming data centric.

Bart Zeeman is a strategist with a passion for data and analytics at NXP-CTO, where he is driving better data-driven decisions for more growth and innovation.

Ahsan Ali is an Applied Scientist at the Amazon Machine Learning Solutions Lab, where he works with customers from different domains to solve their urgent and expensive problems using state-of-the-art AI/ML techniques.

Yifu Hu is an Applied Scientist in the Amazon Machine Learning Solutions lab, where he helps design creative ML solutions to address customers’ business problems in various industries.

Mehdi Noori is an Applied Science Manager at Amazon ML Solutions Lab, where he helps develop ML solutions for large organizations across various industries and leads the Energy vertical. He is passionate about using AI/ML to help customers achieve their Sustainability goals.

Huzefa Rangwala is a Senior Applied Science Manager at AIRE, AWS. He leads a team of scientists and engineers to enable machine learning based discovery of data assets. His research interests are in responsible AI, federated learning and applications of ML in health care and life sciences.

Read More

Implement unified text and image search with a CLIP model using Amazon SageMaker and Amazon OpenSearch Service

Implement unified text and image search with a CLIP model using Amazon SageMaker and Amazon OpenSearch Service

The rise of text and semantic search engines has made it easier for consumers of ecommerce and retail businesses to search. Search engines powered by unified text and image search provide extra flexibility in search solutions, because you can use both text and images as queries. For example, suppose you have a folder of hundreds of family pictures on your laptop, and you want to quickly find a picture that was taken when you and your best friend were in front of your old house’s swimming pool. You can use conversational language like “two people stand in front of a swimming pool” as a query in a unified text and image search engine. You don’t need to have the right keywords in the image titles to perform the query.

Amazon OpenSearch Service now supports the cosine similarity metric for k-NN indexes. Cosine similarity measures the cosine of the angle between two vectors, where a smaller cosine angle denotes a higher similarity between the vectors. With cosine similarity, you can measure the orientation between two vectors, which makes it a good choice for some specific semantic search applications.

Contrastive Language-Image Pre-Training (CLIP) is a neural network trained on a variety of image and text pairs. The CLIP neural network is able to project both images and text into the same latent space, which means that they can be compared using a similarity measure, such as cosine similarity. You can use CLIP to encode your products’ images or description into embeddings, and then store them into an OpenSearch Service k-NN index. Then your customers can query the index to retrieve products that they’re interested in.
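
For example, once a text query and a product image are encoded into the same space, their similarity can be computed directly. The following is a minimal NumPy sketch with placeholder embedding values:

import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

text_embedding = [0.12, -0.45, 0.33, 0.90]   # placeholder values
image_embedding = [0.10, -0.40, 0.30, 0.85]  # placeholder values
print(cosine_similarity(text_embedding, image_embedding))  # close to 1.0 for similar items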

You can use CLIP with Amazon SageMaker to perform encoding. Amazon SageMaker Serverless Inference is a purpose-built inference service that makes it easy to deploy and scale machine learning (ML) models. With SageMaker, you can deploy serverless for dev and test, and then move to real-time inference when you go to production. SageMaker serverless helps you save cost by scaling down infrastructure to 0 during idle times. This is perfect for building a POC, where you will have long idle times between development cycles. You can also use Amazon SageMaker batch transform to get inferences from large datasets.

In this post, we demonstrate how to build a search application using CLIP with SageMaker and OpenSearch Service. The code is open source, and it is hosted on GitHub.

Solution overview

OpenSearch Service provides text-matching and embedding k-NN search. We use embedding k-NN search in this solution. You can use both image and text as a query to search items from the inventory. Implementing this unified image and text search application consists of two phases:

  • k-NN reference index – In this phase, you pass a set of corpus documents or product images through a CLIP model to encode them into embeddings. Text and image embeddings are numerical representations of the corpus or images, respectively. You save those embeddings into a k-NN index in OpenSearch Service. The concept underpinning k-NN is that similar data points exist in close proximity in the embedding space. As an example, the text “a red flower,” the text “rose,” and an image of red rose are similar, so these text and image embeddings are close to each other in the embedding space.
  • k-NN index query – This is the inference phase of the application. In this phase, you submit a text search query or image search query through the deep learning model (CLIP) to encode as embeddings. Then, you use those embeddings to query the reference k-NN index stored in OpenSearch Service. The k-NN index returns similar embeddings from the embedding space. For example, if you pass the text of “a red flower,” it would return the embeddings of a red rose image as a similar item.

The following figure illustrates the solution architecture.

Solution Diagram

The workflow steps are as follows:

  1. Create a SageMaker model from a pretrained CLIP model for batch and real-time inference.
  2. Generate embeddings of product images using a SageMaker batch transform job.
  3. Use SageMaker Serverless Inference to encode query image and text into embeddings in real time.
  4. Use Amazon Simple Storage Service (Amazon S3) to store the raw text (product descriptions), images (product images), and the image embeddings generated by the SageMaker batch transform jobs.
  5. Use OpenSearch Service as the search engine to store embeddings and find similar embeddings.
  6. Use a query function to orchestrate encoding the query and perform a k-NN search.

We use Amazon SageMaker Studio notebooks (not shown in the diagram) as the integrated development environment (IDE) to develop the solution.

Set up solution resources

To set up the solution, complete the following steps:

  1. Create a SageMaker domain and a user profile. For instructions, refer to Step 5 of Onboard to Amazon SageMaker Domain Using Quick setup.
  2. Create an OpenSearch Service domain. For instructions, see Creating and managing Amazon OpenSearch Service domains.

You can also use an AWS CloudFormation template by following the GitHub instructions to create a domain.

You can connect Studio to Amazon S3 from Amazon Virtual Private Cloud (Amazon VPC) using an interface endpoint in your VPC, instead of connecting over the internet. By using an interface VPC endpoint (interface endpoint), the communication between your VPC and Studio is conducted entirely and securely within the AWS network. Your Studio notebook can connect to OpenSearch Service over a private VPC to ensure secure communication.

OpenSearch Service domains offer encryption of data at rest, which is a security feature that helps prevent unauthorized access to your data. Node-to-node encryption provides an additional layer of security on top of the default features of OpenSearch Service. Amazon S3 automatically applies server-side encryption (SSE-S3) for each new object unless you specify a different encryption option.
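
For reference, the following is a minimal Boto3 sketch of creating a domain with these encryption options enabled; the domain name, engine version, and sizing are placeholders that you should adjust to your workload:

import boto3

opensearch = boto3.client("opensearch")

response = opensearch.create_domain(
    DomainName="clip-search-demo",  # placeholder name
    EngineVersion="OpenSearch_2.5",
    ClusterConfig={"InstanceType": "r6g.large.search", "InstanceCount": 1},
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp3", "VolumeSize": 10},
    EncryptionAtRestOptions={"Enabled": True},      # encryption of data at rest
    NodeToNodeEncryptionOptions={"Enabled": True},  # node-to-node encryption
    DomainEndpointOptions={"EnforceHTTPS": True},
)
print(response["DomainStatus"]["ARN"])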

In the OpenSearch Service domain, you can attach identity-based policies that define who can access a service, which actions they can perform, and if applicable, the resources on which they can perform those actions.

Encode images and text pairs into embeddings

This section discusses how to encode images and text into embeddings. This includes preparing data, creating a SageMaker model, and performing batch transform using the model.

Data overview and preparation

You can use a SageMaker Studio notebook with a Python 3 (Data Science) kernel to run the sample code.

For this post, we use the Amazon Berkeley Objects Dataset. The dataset is a collection of 147,702 product listings with multilingual metadata and 398,212 unique catalogue images. We only use the item images and item names in US English. For demo purposes, we use approximately 1,600 products. For more details about this dataset, refer to the README. The dataset is hosted in a public S3 bucket. There are 16 files that include product description and metadata of Amazon products in the format of listings/metadata/listings_<i>.json.gz. We use the first metadata file in this demo.

You use pandas to load the metadata, then select products that have US English titles from the data frame. Pandas is an open-source data analysis and manipulation tool built on top of the Python programming language. You use an attribute called main_image_id to identify an image. See the following code:

import pandas as pd

meta = pd.read_json("s3://amazon-berkeley-objects/listings/metadata/listings_0.json.gz", lines=True)
def func_(x):
    us_texts = [item["value"] for item in x if item["language_tag"] == "en_US"]
    return us_texts[0] if us_texts else None
 
meta = meta.assign(item_name_in_en_us=meta.item_name.apply(func_))
meta = meta[~meta.item_name_in_en_us.isna()][["item_id", "item_name_in_en_us", "main_image_id"]]
print(f"#products with US English title: {len(meta)}")
meta.head()

There are 1,639 products in the data frame. Next, link the item names with the corresponding item images. images/metadata/images.csv.gz contains image metadata. This file is a gzip-compressed CSV file with the following columns: image_id, height, width, and path. You can read the metadata file and then merge it with item metadata. See the following code:

image_meta = pd.read_csv("s3://amazon-berkeley-objects/images/metadata/images.csv.gz")
dataset = meta.merge(image_meta, left_on="main_image_id", right_on="image_id")
dataset.head()

data sample

You can use the PIL library, which is built into the SageMaker Studio notebook Python 3 kernel, to view a sample image from the dataset:

from sagemaker.s3 import S3Downloader as s3down
from pathlib import Path
from PIL import Image
 
def get_image_from_item_id(item_id = "B0896LJNLH", return_image=True):
    s3_data_root = "s3://amazon-berkeley-objects/images/small/"
 
    item_idx = dataset.query(f"item_id == '{item_id}'").index[0]
    s3_path = dataset.iloc[item_idx].path
    local_data_root = f'./data/images'
    local_file_name = Path(s3_path).name
 
    s3down.download(f'{s3_data_root}{s3_path}', local_data_root)
 
    local_image_path = f"{local_data_root}/{local_file_name}"
    if return_image:
        img = Image.open(local_image_path)
        return img, dataset.iloc[item_idx].item_name_in_en_us
    else:
        return local_image_path, dataset.iloc[item_idx].item_name_in_en_us
image, item_name = get_image_from_item_id()
print(item_name)
image

glass cup and title

Model preparation

Next, create a SageMaker model from a pretrained CLIP model. The first step is to download the pretrained model weights file, package it into a model.tar.gz file, and upload it to an S3 bucket. The path of the pretrained model can be found in the CLIP repo. We use a pretrained ResNet-50 (RN50) model in this demo. See the following code:

%%writefile build_model_tar.sh
#!/bin/bash
 
MODEL_NAME=RN50.pt
MODEL_NAME_URL=https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt
 
BUILD_ROOT=/tmp/model_path
S3_PATH=s3://<your-bucket>/<your-prefix-for-model>/model.tar.gz
 
 
rm -rf $BUILD_ROOT
mkdir $BUILD_ROOT
cd $BUILD_ROOT && curl -o $BUILD_ROOT/$MODEL_NAME $MODEL_NAME_URL
cd $BUILD_ROOT && tar -czvf model.tar.gz .
aws s3 cp $BUILD_ROOT/model.tar.gz  $S3_PATH

Run the script in a separate notebook cell so that the %%writefile magic only captures the script itself:

!bash build_model_tar.sh

You then need to provide an inference entry point script for the CLIP model. CLIP is implemented using PyTorch, so you use the SageMaker PyTorch framework. PyTorch is an open-source ML framework that accelerates the path from research prototyping to production deployment. For information about deploying a PyTorch model with SageMaker, refer to Deploy PyTorch Models. The inference code accepts two environment variables: MODEL_NAME and ENCODE_TYPE. This helps us switch between different CLIP models easily. We use ENCODE_TYPE to specify whether we want to encode an image or a piece of text. Here, you implement the model_fn, input_fn, predict_fn, and output_fn functions to override the default PyTorch inference handler. See the following code:

!mkdir -p code

Write the inference script to the code directory in its own cell:

%%writefile code/clip_inference.py
 
import io
import torch
import clip
from PIL import Image
import json
import logging
import sys
import os
 
import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms import ToTensor
 
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))
 
MODEL_NAME = os.environ.get("MODEL_NAME", "RN50.pt")
# ENCODE_TYPE could be IMAGE or TEXT
ENCODE_TYPE = os.environ.get("ENCODE_TYPE", "TEXT")
 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
# defining model and loading weights to it.
def model_fn(model_dir):
    model, preprocess = clip.load(os.path.join(model_dir, MODEL_NAME), device=device)
    return {"model_obj": model, "preprocess_fn": preprocess}
 
def load_from_bytearray(request_body):
    # Convert raw request bytes into a PIL image
    image = Image.open(io.BytesIO(request_body))
    return image
 
# data loading
def input_fn(request_body, request_content_type):
    assert request_content_type in (
        "application/json",
        "application/x-image",
    ), f"{request_content_type} is an unknown type."
    if request_content_type == "application/json":
        data = json.loads(request_body)["inputs"]
    elif request_content_type == "application/x-image":
        image_as_bytes = io.BytesIO(request_body)
        data = Image.open(image_as_bytes)
    return data
 
# inference
def predict_fn(input_object, model):
    model_obj = model["model_obj"]
    # for image preprocessing
    preprocess_fn = model["preprocess_fn"]
    assert ENCODE_TYPE in ("TEXT", "IMAGE"), f"{ENCODE_TYPE} is an unknown encode type."
 
    # preprocessing
    if ENCODE_TYPE == "TEXT":
        input_ = clip.tokenize(input_object).to(device)
    elif ENCODE_TYPE == "IMAGE":
        input_ = preprocess_fn(input_object).unsqueeze(0).to(device)
 
    # inference
    with torch.no_grad():
        if ENCODE_TYPE == "TEXT":
            prediction = model_obj.encode_text(input_)
        elif ENCODE_TYPE == "IMAGE":
            prediction = model_obj.encode_image(input_)
    return prediction
  
# Serialize the prediction result into the desired response content type
def output_fn(predictions, content_type):
    assert content_type == "application/json"
    res = predictions.cpu().numpy().tolist()
    return json.dumps(res)

The solution requires additional Python packages during model inference, so you can provide a requirements.txt file to allow SageMaker to install additional packages when hosting models:

%%writefile code/requirements.txt
ftfy
regex
tqdm
git+https://github.com/openai/CLIP.git

You use the PyTorchModel class to create an object to contain the information of the model artifacts’ Amazon S3 location and the inference entry point details. You can use the object to create batch transform jobs or deploy the model to an endpoint for online inference. See the following code:

from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role, Session
 
role = get_execution_role()
shared_params = dict(
    entry_point="clip_inference.py",
    source_dir="code",
    role=role,
    model_data="s3://<your-bucket>/<your-prefix-for-model>/model.tar.gz",
    framework_version="1.9.0",
    py_version="py38",
)
 
clip_image_model = PyTorchModel(
    env={'MODEL_NAME': 'RN50.pt', "ENCODE_TYPE": "IMAGE"},
    name="clip-image-model",
    **shared_params
)
 
clip_text_model = PyTorchModel(
    env={'MODEL_NAME': 'RN50.pt', "ENCODE_TYPE": "TEXT"},
    name="clip-text-model",
    **shared_params
)

Batch transform to encode item images into embeddings

Next, we use the CLIP model to encode item images into embeddings, and use SageMaker batch transform to run batch inference.

Before creating the job, use the following code snippet to copy item images from the Amazon Berkeley Objects Dataset public S3 bucket to your own bucket. The operation takes less than 10 minutes.

from multiprocessing.pool import ThreadPool
import boto3
from tqdm import tqdm
from urllib.parse import urlparse
 
s3_sample_image_root = "s3://<your-bucket>/<your-prefix-for-sample-images>"
s3_data_root = "s3://amazon-berkeley-objects/images/small/"
 
client = boto3.client('s3')
 
def upload_(args):
    client.copy_object(CopySource=args["source"], Bucket=args["target_bucket"], Key=args["target_key"])
 
arguments = []
for idx, record in dataset.iterrows():
    argument = {}
    argument["source"] = (s3_data_root + record.path)[5:]
    argument["target_bucket"] = urlparse(s3_sample_image_root).netloc
    argument["target_key"] = urlparse(s3_sample_image_root).path[1:] + record.path
    arguments.append(argument)
 
with ThreadPool(4) as p:
    r = list(tqdm(p.imap(upload_, arguments), total=len(dataset)))

Next, you perform inference on the item images in a batch manner. The SageMaker batch transform job uses the CLIP model to encode all the images stored in the input Amazon S3 location and uploads output embeddings to an output S3 folder. The job takes around 10 minutes.

batch_input = s3_sample_image_root + "/"
output_path = f"s3://<your-bucket>/inference/output"
 
clip_image_transformer = clip_image_model.transformer(
    instance_count=1,
    instance_type="ml.c5.xlarge",
    strategy="SingleRecord",
    output_path=output_path,
)
 
clip_image_transformer.transform(
    batch_input, 
    data_type="S3Prefix",
    content_type="application/x-image", 
    wait=True,
)

Load embeddings from Amazon S3 to a variable, so you can ingest the data into OpenSearch Service later:

import json

embedding_root_path = "./data/embedding"
s3down.download(output_path, embedding_root_path)
 
embeddings = []
for idx, record in dataset.iterrows():
    embedding_file = f"{embedding_root_path}/{record.path}.out"
    embeddings.append(json.load(open(embedding_file))[0])

Create an ML-powered unified search engine

This section discusses how to create a search engine that uses k-NN search with embeddings. This includes configuring an OpenSearch Service cluster, ingesting item embeddings, and performing free text and image search queries.

Set up the OpenSearch Service domain using k-NN settings

Earlier, you created an OpenSearch cluster. Now you’re going to create an index to store the catalog data and embeddings. You can configure the index settings to enable the k-NN functionality using the following configuration:

index_settings = {
  "settings": {
    "index.knn": True,
    "index.knn.space_type": "cosinesimil"
  },
  "mappings": {
    "properties": {
      "embeddings": {
        "type": "knn_vector",
        "dimension": 1024 #Make sure this is the size of the embeddings you generated, for RN50, it is 1024
      }
    }
  }
}

This example uses the Python Elasticsearch client to communicate with the OpenSearch cluster and create an index to host your data. You can run %pip install elasticsearch in the notebook to install the library. See the following code:

import boto3
import json
from requests_aws4auth import AWS4Auth
from elasticsearch import Elasticsearch, RequestsHttpConnection
 
def get_es_client(host = "<your-opensearch-service-domain-url>",
    port = 443,
    region = "<your-region>",
    index_name = "clip-index"):
 
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key,
                       credentials.secret_key,
                       region,
                       'es',
                       session_token=credentials.token)
 
    headers = {"Content-Type": "application/json"}
 
    es = Elasticsearch(hosts=[{'host': host, 'port': port}],
                       http_auth=awsauth,
                       use_ssl=True,
                       verify_certs=True,
                       connection_class=RequestsHttpConnection,
                       timeout=60 # for connection timeout errors
    )
    return es
index_name = "clip-index"
es = get_es_client()
es.indices.create(index=index_name, body=json.dumps(index_settings))

Ingest image embedding data into OpenSearch Service

You now loop through your dataset and ingest items data into the cluster. The data ingestion for this practice should finish within 60 seconds. It also runs a simple query to verify if the data has been ingested into the index successfully. See the following code:

# ingest_data_into_es
 
for idx, record in tqdm(dataset.iterrows(), total=len(dataset)):
    body = record[['item_name_in_en_us']].to_dict()
    body['embeddings'] = embeddings[idx]
    es.index(index=index_name, id=record.item_id, doc_type='_doc', body=body)
 
# Check that data is indeed in ES
res = es.search(
    index=index_name, body={
        "query": {
                "match_all": {}
    }},
    size=2)
assert len(res["hits"]["hits"]) > 0

Perform a real-time query

Now that you have a working OpenSearch Service index that contains embeddings of item images as our inventory, let’s look at how you can generate embedding for queries. You need to create two SageMaker endpoints to handle text and image embeddings, respectively.

You also create two functions to use the endpoints to encode images and texts. For the encode_text function, you add “this is a” before the item name to turn the item name into a sentence describing the item. memory_size_in_mb is set to 6 GB to serve the underlying Transformer and ResNet models. See the following code:

from sagemaker.serverless import ServerlessInferenceConfig
from sagemaker.serializers import JSONSerializer, IdentitySerializer
from sagemaker.deserializers import JSONDeserializer

text_predictor = clip_text_model.deploy(
    instance_type='ml.c5.xlarge',
    initial_instance_count=1,
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=6144),
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
    wait=True
)
 
image_predictor = clip_image_model.deploy(
    instance_type='ml.c5.xlarge',
    initial_instance_count=1,
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=6144),
    serializer=IdentitySerializer(content_type="application/x-image"),
    deserializer=JSONDeserializer(),
    wait=True
)
 
def encode_image(file_name="./data/images/0e9420c6.jpg"):    
    with open(file_name, "rb") as f:
        payload = f.read()
        payload = bytearray(payload)
    res = image_predictor.predict(payload)
    return res[0]
 
def encode_name(item_name):
    res = text_predictor.predict({"inputs": [f"this is a {item_name}"]})
    return res[0]
 

You can first plot the picture that will be used as the query:

item_image_path, item_name = get_image_from_item_id(item_id = "B0896LJNLH", return_image=False)
feature_vector = encode_image(file_name=item_image_path)
print(len(feature_vector))  # length of the embedding vector
Image.open(item_image_path)

glass cup

Let’s look at the results of a simple query. After retrieving results from OpenSearch Service, you get the list of item names and images from dataset:

import matplotlib.pyplot as plt

def search_products(embedding, k = 3):
    body = {
        "size": k,
        "_source": {
            "exclude": ["embeddings"],
        },
        "query": {
            "knn": {
                "embeddings": {
                    "vector": embedding,
                    "k": k,
                }
            }
        },
    }        
    res = es.search(index=index_name, body=body)
    images = []
    for hit in res["hits"]["hits"]:
        id_ = hit["_id"]
        image, item_name = get_image_from_item_id(id_)
        image.name_and_score = f'{hit["_score"]}:{item_name}'
        images.append(image)
    return images
 
def display_images(
    images, 
    columns=2, width=20, height=8, max_images=15, 
    label_wrap_length=50, label_font_size=8):
 
    if not images:
        print("No images to display.")
        return 
 
    if len(images) > max_images:
        print(f"Showing {max_images} images of {len(images)}:")
        images=images[0:max_images]
 
    height = max(height, int(len(images)/columns) * height)
    plt.figure(figsize=(width, height))
    for i, image in enumerate(images):
 
        plt.subplot(int(len(images) / columns + 1), columns, i + 1)
        plt.imshow(image)
 
        if hasattr(image, 'name_and_score'):
            plt.title(image.name_and_score, fontsize=label_font_size); 
            
images = search_products(feature_vector)
display_images(images)

results

The first item has a score of 1.0, because the two images are the same. Other items are different types of glasses in the OpenSearch Service index.

You can use text to query the index as well:

feature_vector = encode_name("drinkware glass")
images = search_products(feature_vector)
display_images(images)

results

You’re now able to get three pictures of water glasses from the index. You can find the images and text within the same latent space with the CLIP encoder. Another example of this is to search for the word “pizza” in the index:

feature_vector = encode_name("pizza")
images = search_products(feature_vector)
display_images(images)

pizza results

Clean up

With a pay-per-use model, Serverless Inference is a cost-effective option for infrequent or unpredictable traffic patterns. If you have a strict service-level agreement (SLA), or can’t tolerate cold starts, real-time endpoints are a better choice. Using multi-model or multi-container endpoints provides scalable and cost-effective solutions for deploying large numbers of models. For more information, refer to Amazon SageMaker Pricing.

We suggest deleting the serverless endpoints when they are no longer needed. After finishing this exercise, you can remove the resources with the following steps (you can delete these resources from the AWS Management Console, or using the AWS SDK or SageMaker SDK):

  1. Delete the endpoint you created.
  2. Optionally, delete the registered models.
  3. Optionally, delete the SageMaker execution role.
  4. Optionally, empty and delete the S3 bucket.
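
For example, the first two steps can be done with the SageMaker Python SDK; this is a minimal sketch that assumes the text_predictor and image_predictor objects created earlier in this post are still in scope:

text_predictor.delete_model()
text_predictor.delete_endpoint()   # also removes the endpoint configuration

image_predictor.delete_model()
image_predictor.delete_endpoint()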

Summary

In this post, we demonstrated how to create a k-NN search application using SageMaker and OpenSearch Service k-NN index features. We used a pre-trained CLIP model from its OpenAI implementation.

The OpenSearch Service ingestion implementation of the post is only used for prototyping. If you want to ingest data from Amazon S3 into OpenSearch Service at scale, you can launch an Amazon SageMaker Processing job with the appropriate instance type and instance count. For another scalable embedding ingestion solution, refer to Novartis AG uses Amazon OpenSearch Service K-Nearest Neighbor (KNN) and Amazon SageMaker to power search and recommendation (Part 3/4).

CLIP provides zero-shot capabilities, which makes it possible to adopt a pre-trained model directly without using transfer learning to fine-tune a model. This simplifies the application of the CLIP model. If you have pairs of product images and descriptive text, you can fine-tune the model with your own data using transfer learning to further improve the model performance. For more information, see Learning Transferable Visual Models From Natural Language Supervision and the CLIP GitHub repository.


About the Authors

Kevin Du is a Senior Data Lab Architect at AWS, dedicated to assisting customers in expediting the development of their Machine Learning (ML) products and MLOps platforms. With more than a decade of experience building ML-enabled products for both startups and enterprises, his focus is on helping customers streamline the productionalization of their ML solutions. In his free time, Kevin enjoys cooking and watching basketball.

Ananya Roy is a Senior Data Lab Architect specialized in AI and machine learning, based out of Sydney, Australia. She has been working with a diverse range of customers to provide architectural guidance and help them deliver effective AI/ML solutions via Data Lab engagements. Prior to AWS, she worked as a senior data scientist on large-scale ML models across different industries like telco, banking, and fintech. Her experience in AI/ML has allowed her to deliver effective solutions for complex business problems, and she is passionate about leveraging cutting-edge technologies to help teams achieve their goals.

Read More

Promote search content using Featured Results for Amazon Kendra

Promote search content using Featured Results for Amazon Kendra

Amazon Kendra is an intelligent search service powered by machine learning (ML). We are excited to announce the launch of Amazon Kendra Featured Results. This new feature makes specific documents or content appear at the top of the search results page whenever a user issues a certain query. You can use Featured Results to improve the visibility of new documents or to promote certain documents when users enter certain queries.

For example, you can specify that if your users enter the query “new products 2023,” then the documents titled “What’s new” and “Coming soon” will be featured at the top of the search results page. Furthermore, if your users frequently use certain queries, you can specify these queries for Featured Results. For example, if you look at your top queries using Amazon Kendra Analytics and find that specific queries such as “How does kendra semantically rank results?” and “kendra semantic search” are frequently used, then it might be useful for those queries to feature the document titled “Amazon Kendra search 101.”

In this post, we introduce Featured Results and show you how to use them.

Overview of solution

Featured Results enables you to create direct mappings from exact queries to documents in your index, allowing you to bypass the usual Amazon Kendra ranking process. Amazon Kendra naturally handles keyword-type queries to rank the most useful documents in the search results, avoiding excessive featuring of results based on simple keywords. Featured Results are designed for specific queries, rather than queries that are too broad in scope. You can experiment with featuring different documents for different queries, or ensure certain documents get the visibility they deserve.

Prerequisites

To follow along, you need an AWS account and an Amazon Kendra index. You can skip creating a new index if you have a preexisting one to use for this demo.

Add a sample dataset to your index

Complete the following steps to add a sample dataset to your index:

  1. On the Amazon Kendra console, go to your index and choose Data sources.
  2. Choose Add data source.
  3. Under Available data sources, select Sample AWS documentation and choose Add dataset.
  4. Enter a name for your data source (such as sample-aws-data) and choose Add data source.

Search without Featured Results

On the Amazon Kendra console, choose Search indexed content. In the query field, start with a query such as “Kendra S3 connectors”.

In search results, “DataSourceConfiguration – Amazon Kendra” is listed as the top search result based on the ranking process. But if you want to promote “Getting started with an Amazon S3 data source (Console) – Amazon Kendra,” you can bypass the Amazon Kendra ranking process to feature this result at the top of the search results page.

Create a Featured Results set

To feature certain results, you must specify an exact match of a full text query, not a partial match of a query using a keyword or phrase contained within a query. For example, if you only specify the query “Kendra” in a featured result set, queries such as “How does Kendra semantically rank results?” will not render the Featured Results. For more information on limits, see Quotas for Amazon Kendra. To create a Featured Results set, complete the following steps:

  1. In the navigation pane, choose Featured results, under Enrichments.
  2. Choose Create set.

  3. Enter a name for your set (such as kendra_connector_feature) and choose Next.
  4. Enter a keyword to find items to feature (kendra s3 connectors).
  5. Select Getting started with an Amazon S3 data source (Console) – Amazon Kendra from the search results.
  6. Choose Next.
  7. Choose Add query.

  8. Enter a query string (such as kendra s3 connectors) and choose Add.
  9. Choose Next.
  10. On the Review and create page, choose Create.
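
If you prefer to automate this step, Amazon Kendra also provides a CreateFeaturedResultsSet API. The following is a minimal Boto3 sketch; the index ID and document ID are placeholders, and the document ID must match the ID of the indexed document you want to feature:

import boto3

kendra = boto3.client("kendra")

response = kendra.create_featured_results_set(
    IndexId="<your-index-id>",
    FeaturedResultsSetName="kendra_connector_feature",
    QueryTexts=["kendra s3 connectors"],
    FeaturedDocuments=[
        {"Id": "<document-id-to-feature>"}  # placeholder document ID
    ],
)
print(response)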

Your Amazon Kendra index is now ready for natural language queries.

Search with Featured Results

On the Amazon Kendra console, choose Search indexed content. In the query field, enter the query used in the Featured Results set (kendra s3 connectors). Now, you should see Getting started with an Amazon S3 data source (Console) – Amazon Kendra featured as the top result on the search page.

For more information about querying the index, see Querying an Index.

Clean up

To avoid incurring future charges and to clean out unused roles and policies, delete the resources you created:

  1. On the Amazon Kendra index, choose Indexes in the navigation pane.
  2. Select the index you created and on the Actions menu, choose Delete.
  3. To confirm deletion, enter Delete when prompted and choose Delete.

Wait until you get the confirmation message; the process can take up to 15 minutes.

Conclusion

In this post, you learned how to use Amazon Kendra Featured Results to promote content in an enterprise search solution.

There are many additional features that we didn’t cover. For example:

  • You can enable user-based access control for your Amazon Kendra index, and restrict access to documents based on the access controls you have already configured.
  • You can map object attributes to Amazon Kendra index attributes, and enable them for faceting, search, and display in the search results.
  • You can quickly find information from webpages (HTML tables) using Amazon Kendra tabular search.

To learn more about Amazon Kendra, refer to the Amazon Kendra Developer Guide.


About the Authors

Maran Chandrasekaran is a Senior Solutions Architect at Amazon Web Services, working with our enterprise customers. Outside of work, he loves to travel.

Kartik Mittal is a Software Engineer at Amazon Web Services, working on Amazon Kendra, an enterprise search engine. Outside of work, he enjoys hiking and loves to travel.

Surya Ram is a Software Engineer at Amazon Web Services, working on Amazon Kendra. Outside of work, he enjoys chess, basketball and cricket.

Read More

Automatic image cropping with Amazon Rekognition

Automatic image cropping with Amazon Rekognition

Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can.

Many publishers have a large library of stock images that they use for their articles. These images can be reused many times for different stories, especially when the publisher has images of celebrities. Quite often, a journalist may need to crop out a desired celebrity from an image to use for their upcoming story. This is a manual, repetitive task that should be automated. Sometimes, an author may want to use an image of a celebrity, but it contains two people and the primary celebrity needs to be cropped from the image. Other times, celebrity images might need to be reformatted for publishing to a variety of platforms like mobile, social media, or digital news. Additionally, an author may need to change the image aspect ratio or put the celebrity in crisp focus.

In this post, we demonstrate how to use Amazon Rekognition to perform image analysis. Amazon Rekognition makes it easy to add this capability to your applications without any machine learning (ML) expertise and comes with various APIs to fulfill use cases such as object detection, content moderation, face detection and analysis, and text and celebrity recognition, which we use in this example.

The celebrity recognition feature in Amazon Rekognition automatically recognizes tens of thousands of well-known personalities in images and videos using ML. Celebrity recognition can detect not just the presence of the given celebrity but also the location within the image.

Overview of solution

In this post, we demonstrate how we can pass in a photo, a celebrity name, and an aspect ratio for the outputted image to be able to generate a cropped image of the given celebrity capturing their face in the center.

When working with the Amazon Rekognition celebrity detection API, many elements are returned in the response. The following are some key response elements:

  • MatchConfidence – A match confidence score that can be used to control API behavior. We recommend applying a suitable threshold to this score in your application to choose your preferred operating point. For example, by setting a threshold of 99%, you can eliminate false positives but may miss some potential matches.
  • Name, Id, and Urls – The celebrity name, a unique Amazon Rekognition ID, and list of URLs such as the celebrity’s IMDb or Wikipedia link for further information.
  • BoundingBox – Coordinates of the rectangular bounding box location for each recognized celebrity face.
  • KnownGender – Known gender identity for each recognized celebrity.
  • Emotions – Emotion expressed on the celebrity’s face, for example, happy, sad, or angry.
  • Pose – Pose of the celebrity face, using three axes of roll, pitch, and yaw.
  • Smile – Whether the celebrity is smiling or not.

Part of the API response from Amazon Rekognition includes the following code:

{
    "CelebrityFaces":
    [
        {
            "Urls":
            [
                "www.wikidata.org/wiki/Q2536951"
            ],
            "Name": "Werner Vogels",
            "Id": "23iZ1oP",
            "Face":
            {
                "BoundingBox":
                {
                    "Width": 0.10331031680107117,
                    "Height": 0.20054641366004944,
                    "Left": 0.5003396272659302,
                    "Top": 0.07391933351755142
                },
                "Confidence": 99.99765014648438,
...

In this exercise, we demonstrate how to use the bounding box element to identify the location of the face, as shown in the following example image. All of the dimensions are represented as ratios of the overall image size, so the numbers in the response are between 0–1. For example, in the sample API response, the width of the bounding box is 0.1, which implies the face width is 10% of the total width of the image.
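
For example, converting the ratio-based values from the sample response to pixel coordinates for a hypothetical 2000 x 1333 pixel image looks like the following:

img_width, img_height = 2000, 1333  # hypothetical image size
box = {"Width": 0.1033, "Height": 0.2005, "Left": 0.5003, "Top": 0.0739}

left = box["Left"] * img_width       # about 1001 pixels from the left edge
top = box["Top"] * img_height        # about 99 pixels from the top edge
width = box["Width"] * img_width     # face is about 207 pixels, roughly 10% of the image width
height = box["Height"] * img_height  # about 267 pixels

print(left, top, width, height)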

Werner Vogels Bounding box

With this bounding box, we are now able to use logic to make sure that the face remains within the edges of the new image we create. We can apply some padding around this bounding box to keep the face in the center.

In the following sections, we show how to create the following cropped image output with Werner Vogels in crisp focus.

We launch an Amazon SageMaker notebook, which provides a Python environment where you can run the code to pass an image to Amazon Rekognition and then automatically modify the image with the celebrity in focus.

Werner Vogels cropped

The code performs the following high-level steps:

  1. Make a request to the recognize_celebrities API with the given image and celebrity name.
  2. Filter the response for the bounding box information.
  3. Add some padding to the bounding box such that we capture some of the background.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Upload the sample image

Upload your sample celebrity image to your S3 bucket.

Run the code

To run the code, we use a SageMaker notebook; however, any IDE would also work after installing Python, Pillow, and Boto3. We create a SageMaker notebook as well as the AWS Identity and Access Management (IAM) role with the required permissions. Complete the following steps:

  1. Create the notebook and name it automatic-cropping-celebrity.

The default execution role, which was created when you created the SageMaker notebook, has a simple policy that gives the role permissions to interact with Amazon S3.

  2. Update the Resource constraint with the S3 bucket name:
{
    "Version": "2012-10-17",
    "Statement":
    [
        {
            "Effect": "Allow",
            "Action":
            [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket"
            ],
            "Resource":
            [
                "arn:aws:s3::: # your-s3-bucket-name "
            ]
        }
    ]
}
  3. Create another policy to add to the SageMaker notebook IAM role to be able to call the RecognizeCelebrities API:
{
    "Version": "2012-10-17",
    "Statement":
    [
        {
            "Effect": "Allow",
            "Action": "rekognition:RecognizeCelebrities",
            "Resource": "*"
        }
    ]
}

IAM permissions

  4. On the SageMaker console, choose Notebook instances in the navigation pane.
  5. Locate the automatic-cropping-celebrity notebook and choose Open Jupyter.
  6. Choose New and conda_python3 as the kernel for your notebook.

Jupyter notebook

For the following steps, copy the code blocks into your Jupyter notebook and run them by choosing Run.

  1. First, we import helper functions and libraries:
import boto3
from PIL import Image
  2. Set the variables:
bucket = '<YOUR_BUCKET_NAME>'    
file = '<YOUR_FILE_NAME>'
celeb = '<CELEBRITY_NAME>'
aspect_ratio = <ASPECT_RATIO_OF_OUTPUT_IMAGE, e.g. 1 for square>
  3. Create the service clients:
rek = boto3.client('rekognition')
s3 = boto3.client('s3')
  4. Define a function to recognize the celebrities:
def recognize_celebrity(photo):       

    with open(photo, 'rb') as image:
        response = rek.recognize_celebrities(Image={'Bytes': image.read()})

    image=Image.open(photo)
    file_type=image.format.lower()
    path, ext=image.filename.rsplit(".", 1)
    celeb_faces = response['CelebrityFaces']
        
    print(f'Detected {len(celeb_faces)} faces for {photo}')
    
    return celeb_faces, image, path, file_type
    
  5. Define a function to get the bounding box of the given celebrity:
def get_bounding_box(celeb_faces, img_width, img_height, celeb):
    bbox = None
    for celebrity in celeb_faces:
        if celebrity['Name'] == celeb:
                    
            box = celebrity['Face']['BoundingBox']    
            left = img_width * box['Left']    
            top = img_height * box['Top']    
            width = img_width * box['Width']    
            height = img_height * box['Height']              
            
            print('Left: ' + '{0:.0f}'.format(left))    
            print('Top: ' + '{0:.0f}'.format(top))    
            print('Face Width: ' + "{0:.0f}".format(width))    
            print('Face Height: ' + "{0:.0f}".format(height))    
                
            #dimensions of the famous face inside the bounding box
            x1=left    
            y1=top    
            x2=left+width    
            y2=top+height
            
            bbox = [x1,y1,x2,y2]
            print(f'Bbox coordinates: {bbox}')
    if bbox == None:
        raise ValueError(f"{celeb} not found in results")
            
    return bbox
  6. Define a function to add some padding to the bounding box, so we capture some background around the face:
def pad_bbox(bbox, pad_width=0.5, pad_height=0.3):
    x1, y1, x2, y2 = bbox
    width = x2 - x1
    height = y2 - y1
    
    #dimensions of the new image with padding
    x1= max(x1 - (pad_width * width),0)    
    y1= max(y1 - (pad_height * height),0)  
    x2= max(x2 + (pad_width * width),0)
    y2= max(y2 + (pad_height * height),0)                       
            
    #dimensions of new image with the target aspect ratio: 1 is square, 1.5 is 6:4, 0.66 is 4:6
                        
    x1= max(x1-(max((y2-y1)*max(aspect_ratio,1)-(x2-x1),0)/2),0)    
    y1= max(y1-(max((x2-x1)*1/(min((aspect_ratio),1))-(y2-y1),0)/2),0) 
    x2= max(x2+(max((y2-y1)*max((aspect_ratio),1)-(x2-x1),0)/2),0)
    y2= max(y2+(max((x2-x1)*1/(min((aspect_ratio),1))-(y2-y1),0)/2),0)
                        
    print('x1-coordinate after padding: ' + '{0:.0f}'.format(x1))    
    print('y1-coordinate after padding: ' + '{0:.0f}'.format(y1))    
    print('x2-coordinate after padding: ' + "{0:.0f}".format(x2))    
    print('y2-coordinate after padding: ' + "{0:.0f}".format(y2))
    
    return [x1,y1,x2,y2]
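
As a quick sanity check (hypothetical numbers, not part of the original notebook), applying pad_bbox to a 200×200 pixel face box with aspect_ratio = 1 behaves as follows:

# Hypothetical example: a 200x200 face with its top-left corner at (400, 300)
aspect_ratio = 1
sample_bbox = [400, 300, 600, 500]
padded = pad_bbox(sample_bbox)
# The padding step grows the box to [300, 240, 700, 560];
# the aspect ratio step then extends it to roughly [300, 200, 700, 580],
# a near-square region with extra background around the face.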
  7. Function to save the image to the notebook storage and to Amazon S3:
def save_image(roi, image, path, file_type):
    
    x1, y1, x2, y2 = roi
    
    image = image.crop((x1,y1,x2,y2))
    
    image.save(f'{path}-cropped.{file_type}')
            
    s3.upload_file(f'{path}-cropped.{file_type}', bucket, f'{path}-cropped.{file_type}')            
        
    return image
  8. Use the Python main() function to combine the preceding functions to complete the workflow of saving a new cropped image of our celebrity:
def main():
    # Download S3 image to local 
    s3.download_file(bucket, file, './'+file)
    
    #Load photo and recognize celebrity
    celeb_faces, img, file_name, file_type = recognize_celebrity(file)
    width, height = img.size
    
    #Get bounding box
    bbox = get_bounding_box(celeb_faces, width, height, celeb)
    
    #Get padded bounding box
    padded_bbox = pad_bbox(bbox)
     
    #Save result and display  
    result = save_image(padded_bbox, img, file_name, file_type)
    display(result)
    
    
if __name__ == "__main__":
    main()

When you run this code block, you can see that we found Werner Vogels and created a new image with his face in the center.

Werner Vogels cropped

The image will be saved to the notebook and also uploaded to the S3 bucket.

Jupyter notebook output

You could include this solution in a larger workflow; for example, a publishing company might want to publish this capability as an endpoint to reformat and resize images on the fly when publishing articles of celebrities to multiple platforms.

Cleaning up

To avoid incurring future charges, delete the resources:

  1. On the SageMaker console, select your notebook and on the Actions menu, choose Stop.
  2. After the notebook is stopped, on the Actions menu, choose Delete.
  3. On the IAM console, delete the SageMaker execution role you created.
  4. On the Amazon S3 console, delete the input image and any output files from your S3 bucket.

Conclusion

In this post, we showed how we can use Amazon Rekognition to automate an otherwise manual task of modifying images to support media workflows. This is particularly important within the publishing industry where speed matters in getting fresh content out quickly and to multiple platforms.

For more information about working with media assets, refer to Media intelligence just got smarter with Media2Cloud 3.0.


About the Author

Mark Watkins is a Solutions Architect within the Media and Entertainment team. He helps customers create AI/ML solutions that solve their business challenges using AWS. He has worked on several AI/ML projects related to computer vision, natural language processing, personalization, ML at the edge, and more. Away from professional life, he loves spending time with his family and watching his two little ones grow up.

Read More

Automate and implement version control for Amazon Kendra FAQs


Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization.

Amazon Kendra FAQs allow users to upload frequently asked questions with their corresponding answers. This helps to consistently answer common queries among end-users. As of this writing, when you want to update FAQs, you must delete the FAQ and create it again. In this post, we present a simpler, faster approach for updating your Amazon Kendra FAQs (with versioning enabled). Our method eliminates the manual steps of creating and deleting FAQs when you update their contents.

Overview of solution

We use a fully deployable AWS CloudFormation template to create an Amazon Simple Storage Service (Amazon S3) bucket, which becomes the source to store your Amazon Kendra FAQs. Each index-based FAQ is maintained in the folder with a prefix relating to the Amazon Kendra index.

This solution uses an AWS Lambda function that gets triggered by an Amazon S3 event notification. When you upload an FAQ to the S3 folder mapped to a specific Amazon Kendra index, it creates a new version of the FAQ for your index. Older versions of FAQs are deleted only after the new FAQ index version is created, achieving near-zero downtime of index searching.

The following figure shows the workflow of how our method creates and deletes a new version of an Amazon Kendra FAQ.

Architecture for Automated FAQ Update for Amazon Kendra

The workflow steps are as follows:

  1. The user uploads the Amazon Kendra FAQ document to the S3 bucket mapped to the Amazon Kendra index.
  2. The Amazon S3 PutObject event triggers the Lambda function, which reads the event details.
  3. The Lambda function creates a new version of the FAQ for the target index for each uploaded document and deletes the older versions of the FAQ (a simplified sketch of this logic follows the list).
  4. The Lambda function then publishes a message to Amazon Simple Notification Service (Amazon SNS), which sends an email to the user notifying them that the FAQ has been successfully updated.
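
The Lambda logic in steps 2–4 can be sketched with boto3 roughly as follows. This is a simplified illustration rather than the exact code in the GitHub repository; the handler name, environment variables (FAQ_ROLE_ARN, SNS_TOPIC_ARN), and error handling are assumptions.

import os
from datetime import datetime

import boto3

kendra = boto3.client('kendra')
sns = boto3.client('sns')

def handler(event, context):
    # Read the uploaded object from the S3 event notification
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']                  # e.g. faq-<index-id>/demo.json

    prefix, file_name = key.split('/', 1)
    index_id = prefix.replace('faq-', '')          # the folder suffix is the index ID

    # Infer the FAQ file format from the file name
    if file_name.lower().endswith('.json'):
        file_format = 'JSON'
    elif file_name.startswith('header_'):
        file_format = 'CSV_WITH_HEADER'
    else:
        file_format = 'CSV'

    # Remember the FAQs previously created from this document
    doc_prefix = file_name.replace('.', '-')
    old_faqs = [f for f in kendra.list_faqs(IndexId=index_id)['FaqSummaryItems']
                if f['Name'].startswith(doc_prefix)]

    # Create the new FAQ version, named <file-name>-faq-<Date-Time>
    new_faq = kendra.create_faq(
        IndexId=index_id,
        Name=f"{doc_prefix}-faq-{datetime.now().strftime('%d-%m-%Y-%H-%M-%S')}",
        RoleArn=os.environ['FAQ_ROLE_ARN'],        # role that can read the FAQ bucket
        S3Path={'Bucket': bucket, 'Key': key},
        FileFormat=file_format,
    )

    # Delete older versions only after the new FAQ is created
    # (the full solution waits for the new FAQ to become ACTIVE first)
    for faq in old_faqs:
        kendra.delete_faq(Id=faq['Id'], IndexId=index_id)

    # Notify the subscriber via Amazon SNS
    sns.publish(
        TopicArn=os.environ['SNS_TOPIC_ARN'],
        Message=f"FAQ updated for index {index_id} from s3://{bucket}/{key}",
    )
    return {'statusCode': 200, 'faqId': new_faq['Id']}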

Prerequisites

Before you begin the walkthrough, you need an AWS account (if you don’t have one, you can sign up for one). You also need to create the files containing the sample FAQs:

  • basic.csv – The following code is the sample FAQ CSV template:
    How many free clinics are in Spokane WA?, 13, https://www.freeclinics.com/
    How many free clinics are there in Mountain View Missouri?, 7, https://www.freeclinics.com/

  • demo.json – The following code is the sample FAQ JSON template:
    {
      "SchemaVersion": 1,
      "FaqDocuments": [
        {
          "Question": "How many free clinics are in Spokane WA?",
          "Answer": "13"
        },
        {
          "Question": "How many free clinics are there in Mountain View Missouri?",
          "Answer": "7",
          "Attributes": {
            "_source_uri": "https://www.freeclinics.com",
            "_category": "Charitable Clinics"
          }
        }
      ]
    }

  • header_demo.csv – The following code is the sample FAQ CSV template with header:
    _question,_answer,_last_updated_at
    How many free clinics are in Spokane WA?, 13, 2012-03-25T12:30:10+01:00
    How many free clinics are there in Mountain View Missouri?, 7, 2012-03-25T12:30:10+01:00

Deploy the solution

The CloudFormation templates that create the resources used by this solution can be found in the GitHub repository. Follow the instructions in the repository to deploy the solution. AWS CloudFormation creates the following resources in your account:

  • An S3 bucket that will be the source for the Amazon Kendra FAQ.
  • An Amazon Kendra index.
  • An AWS Identity and Access Management (IAM) role for the Amazon Kendra FAQ to read (GetObject) from the S3 bucket.
  • A Lambda function that is configured to get triggered by an Amazon S3 event. The function is created outside of an Amazon VPC.

Note that resource creation can take approximately 30 minutes.

After you run the deployment, you’ll receive an email prompting you to confirm the subscription at the approver email address. Choose Confirm subscription.

Amazon SNS subscription Email

You’re redirected to a page confirming your subscription.

SNS Subscription Confirmation

Verify that the Amazon Kendra index is listed on the Amazon Kendra console. In this post, we named the Amazon Kendra index sample-kendra-index.

Amazon Kendra index as seen from the Amazon Kendra console

Upload a sample FAQ document to Amazon S3

In the previous step, you successfully deployed the CloudFormation stack. We use the output of the stack in the following steps:

  1. On the Outputs tab of the CloudFormation stack, note the values for S3Bucket (kendra-faq-<random-stack-id>) and KendraIndex.
    AWS CloudFormation Output
  2. On the Amazon S3 console, navigate to the S3 bucket created from the CloudFormation stack.
  3. Choose Create folder and create a folder called faq-<index-id>. For index-id, use the KendraIndex value you noted from the stack outputs. After the folder is created, this becomes the prefix for the sample-kendra-index FAQ.
    Create S3 folder prefixed with faq
  4. Upload the demo.json FAQ document to that folder (a scripted alternative follows this list).
    Upload the demo.json FAQ document in that folder
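
If you prefer to script the upload rather than use the Amazon S3 console, a boto3 equivalent might look like the following; the bucket name and index ID placeholders are the values from your stack outputs.

import boto3

s3 = boto3.client('s3')
bucket = 'kendra-faq-<random-stack-id>'   # S3Bucket value from the stack outputs
index_id = '<index-id>'                   # KendraIndex value from the stack outputs

# Uploading into the faq-<index-id> prefix triggers the Lambda function
s3.upload_file('demo.json', bucket, f'faq-{index_id}/demo.json')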

Verify that the index FAQ is created

To confirm that the index FAQ is created, complete the following steps:

  1. On the Amazon Kendra console, navigate to the index sample-kendra-index, which was created as part of the deployment.
  2. Navigate to the FAQs page for this index to check if an FAQ is listed.

The FAQ has the naming convention <file-name>-faq-<Date-Time>.

Resulting FAQ created by the automation solution

When the FAQ is successfully created, you will receive another email informing you about it. You may upload new versions of the FAQ after you have received this email.

Receiving email for successful FAQ creation

Note that the automation identifies the file format to use when creating the FAQ from the uploaded file's extension, with one exception: a CSV document that contains a header row must have the header_ prefix. The target Amazon Kendra index is identified by the S3 folder name, which has the index ID as its suffix; for example, faq-1f01abb8-341c-4921-ad16-139ee517a845.

Upload additional FAQ documents

Amazon Kendra FAQs support three file formats: CSV, CSV_WITH_HEADER, and JSON. When you upload a CSV file that contains a header row, make sure the file name has the header_ prefix. To upload your FAQ documents, complete the following steps:

  1. Upload the header_demo.csv file to the same folder.
    Upload the header_demo.csv FAQ document to that folder
  2. Verify that the FAQ is created on the Amazon Kendra console.
    Verify that the FAQ is created

FAQ creation is case-sensitive with respect to the name of the FAQ document that you upload. For example, if you upload demo.json and demo.JSON, they are treated as unique objects in Amazon S3. Therefore, this action creates two FAQs, such as demo-json-faq-22-09-2022-20-09-11 and demo-JSON-faq-22-09-2022-20-09-11.

  3. Upload demo.JSON.
    demo.json and demo.JSON are uploaded to the S3 bucket
  4. Verify that the FAQ for demo.JSON is created on the Amazon Kendra console.
    Case sensitive file names result in 2 new FAQs created

Create a new version of the index FAQ

Now the solution is self-sufficient and able to work independently whenever you upload a new version of the FAQ document in Amazon S3.

To test this, upload a new updated version of your demo.json FAQ document to the faq-<index-id> folder. When you navigate to the FAQ for the index, there will be an FAQ named <file-name>-faq-<Date-Time>.

This solution creates a new version of the FAQ for the new version of the FAQ document that was uploaded in Amazon S3. When the FAQ is active, it deletes the older version of the FAQ for the same document.

Verify that only the latest version of the FAQ exists in the index

Create an FAQ with a description

This solution also supports creating an FAQ with a description when files are named in a specific manner: <document_name>-desc-<your faq description>.fileformat[json|csv]. For example, demo-desc-hello world.json. Upload this FAQ document to the faq-<index-id> folder.

Upload the file with the description in its name to S3

After you upload the document, the FAQ will be created and it will have the description as mentioned in the file name.

FAQ created with description

Use -desc- only when you need to add a description to an FAQ. If you upload a file with the same document_name prefix, the solution deletes the old FAQ that was created from the document_name.fileformat document and creates a new FAQ with the description.
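
A minimal sketch of how such a file name could be split into a document name, description, and file format (a hypothetical helper, not the repository code):

import os

def parse_faq_file_name(file_name):
    # 'demo-desc-hello world.json' -> ('demo', 'hello world', 'json')
    base, ext = os.path.splitext(file_name)
    if '-desc-' in base:
        document_name, description = base.split('-desc-', 1)
    else:
        document_name, description = base, ''
    return document_name, description, ext.lstrip('.').lower()

print(parse_faq_file_name('demo-desc-hello world.json'))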

Clean up

To clean up, perform the following actions:

  1. Empty the S3 bucket that was created by the CloudFormation stack to store the FAQ documents. For instructions, refer to Emptying a bucket.
  2. Delete the CloudFormation stack. For instructions, refer to Deleting a stack on the AWS CloudFormation console.

Conclusion

In this post, we introduced an automated way to manage your Amazon Kendra FAQs. After implementing this solution, you can create, update, and delete FAQs just by uploading the documents to an S3 bucket. This way, you save time by avoiding repetitive manual changes and the inconsistencies that unexpected operational incidents can cause. You can also audit Amazon Kendra FAQs across your organization with confidence.

Do you have feedback about this post? Submit your comments in the comments section. You can also post questions on the AWS re:Post forum.


About the Author

Debojit is a DevOps consultant who specializes in helping customers deliver secure and reliable solutions using AWS services. He concentrates on infrastructure development and building serverless solutions with AWS and DevOps. Apart from work, Debojit enjoys watching movies and spending time with his family.

Glenn is a Cloud Architect at AWS. He utilizes technology to help customers deliver on their desired outcomes in their cloud adoption journey. His current focus is DevOps and developing open-source software.

Shalabh is a Senior Consultant based in London. His main focus is helping companies deliver secure, reliable, and fast solutions using AWS services. He gets very excited about customers innovating with AWS and DevOps. Outside of work, Shalabh is a cricket fan and a passionate singer.

Read More

Boost your forecast accuracy with time series clustering


Time series are sequences of data points that occur in successive order over some period of time. We often analyze these data points to make better business decisions or gain competitive advantages. An example is Shimamura Music, who used Amazon Forecast to improve shortage rates and increase business efficiency. Another great example is Arneg, who used Forecast to predict maintenance needs.

AWS provides various services catered to time series data that are low code/no code, which both machine learning (ML) and non-ML practitioners can use for building ML solutions. These include libraries and services like AutoGluon, Amazon SageMaker Canvas, Amazon SageMaker Data Wrangler, Amazon SageMaker Autopilot, and Amazon Forecast.

In this post, we seek to separate a time series dataset into individual clusters that exhibit a higher degree of similarity between their data points and reduced noise. The purpose is to improve accuracy by either training a global model that includes the cluster configuration as a feature or training local models specific to each cluster.

We explore how to extract characteristics, also called features, from time series data using the TSFresh library—a Python package for computing a large number of time series characteristics—and perform clustering using the K-Means algorithm implemented in the scikit-learn library.

We use the Time Series Clustering using TSFresh + KMeans notebook, which is available on our GitHub repo. We recommend running this notebook on Amazon SageMaker Studio, a web-based, integrated development environment (IDE) for ML.

Solution overview

Clustering is an unsupervised ML technique that groups items together based on a distance metric. The Euclidean distance is most commonly used for non-sequential datasets. However, because a time series inherently has a sequence (timestamp), the Euclidean distance doesn’t work well when used directly on raw time series: it compares values at identical time indices, so it can’t account for shifts or misalignments along the time dimension. For a more detailed explanation, refer to Time Series Classification and Clustering with Python. A better distance metric that works directly on time series is Dynamic Time Warping (DTW). For an example of clustering based on this metric, refer to Cluster time series data for use with Amazon Forecast.

In this post, we generate features from the time series dataset using the TSFresh Python library for data extraction. TSFresh is a library that calculates a large number of time series characteristics, which include the standard deviation, quantile, and Fourier entropy, among others. This allows us to remove the time dimensionality of the dataset and apply common techniques that work for data with flattened formats. In addition to TSFresh, we also use StandardScaler, which standardizes features by removing the mean and scaling to unit variance, and Principal component analysis (PCA) to perform dimensionality reduction. Scaling reduces the distance between data points, which in turn promotes stability in the model training process, and dimensionality reduction allows the model to learn from fewer features while retaining the major trends and patterns, thereby enabling more efficient training.

Data loading

For this example, we use the UCI Online Retail II Data Set and perform basic data cleansing and preparation steps as detailed in the Data Cleaning and Preparation notebook.

Feature extraction with TSFresh

Let’s start by using TSFresh to extract features from our time series dataset:

from tsfresh import extract_features
extracted_features = extract_features(
    df_final, 
    column_id="StockCode", 
    column_sort="timestamp")

Note that our data has been converted from a time series to a table of StockCode values versus feature values.

feature table

Next, we drop all features with NaN values using the dropna method:

extracted_features_cleaned = extracted_features.dropna(axis=1)

Then we scale the features using StandardScaler. The values in the extracted features consist of both negative and positive values. Therefore, we use StandardScaler instead of MinMaxScaler:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
extracted_features_cleaned_std = scaler.fit_transform(extracted_features_cleaned)

We use PCA to do dimensionality reduction:

from sklearn.decomposition import PCA
pca = PCA()
pca.fit(extracted_features_cleaned_std)

And we determine the optimal number of components for PCA:

import numpy as np
import matplotlib.pyplot as plt

plt.figure(figsize=(20,10))
plt.grid()
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')

The explained variance ratio is the percentage of variance attributed to each of the selected components. Typically, you determine the number of components to include in your model by cumulatively adding the explained variance ratio of each component until you reach 0.8–0.9 to avoid overfitting. The optimal value usually occurs at the elbow.

As shown in the following chart, the elbow value is approximately 100. Therefore, we use 100 as the number of components for PCA.

PCA
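
The clustering code in the next section uses a scores_pca matrix. One plausible way to produce it, assuming the 100 components chosen above, is to refit PCA with that component count and project the scaled features onto it (the notebook in the GitHub repo contains the exact step):

# Keep the first 100 principal components and project the scaled features onto them
pca = PCA(n_components=100)
scores_pca = pca.fit_transform(extracted_features_cleaned_std)
print(scores_pca.shape)  # (number of StockCodes, 100)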

Clustering with K-Means

Now let’s use K-Means with the Euclidean distance metric for clustering. In the following code snippet, we determine the optimal number of clusters. Adding more clusters decreases the inertia value, but it also decreases the information contained in each cluster. Additionally, more clusters means more local models to maintain. Therefore, we want a small number of clusters with a relatively low inertia value. The elbow heuristic works well for finding this optimal number.

from sklearn.cluster import KMeans
wcss = []
for i in range(1,10):
    km = KMeans(n_clusters=i) 
    km.fit(scores_pca)
    wcss.append(km.inertia_)
plt.figure(figsize=(20,10))
plt.grid()
plt.plot(range(1,10),wcss,marker='o',linestyle='--')
plt.xlabel('number of clusters')
plt.ylabel('WCSS')

The following chart visualizes our findings.

Elbow

Based on this chart, we have decided to use two clusters for K-Means. We made this decision because the within-cluster sum of squares (WCSS) decreases at the highest rate between one and two clusters. It’s important to balance ease of maintenance with model performance and complexity, because although WCSS continues to decrease with more clusters, additional clusters increase the risk of overfitting. Furthermore, slight variations in the dataset can unexpectedly reduce accuracy.
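
With the number of clusters fixed at two, a final fit and assignment of each StockCode to a cluster might look like the following sketch; the cluster column name and random_state value are assumptions:

# Fit K-Means with the chosen number of clusters and label each StockCode
km = KMeans(n_clusters=2, random_state=42)
cluster_labels = km.fit_predict(scores_pca)

clusters = extracted_features_cleaned.copy()
clusters['cluster'] = cluster_labels
print(clusters['cluster'].value_counts())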

It’s important to note that both clustering methods, K-Means with Euclidean distance (discussed in this post) and K-Means with DTW, have their strengths and weaknesses. The best approach depends on the nature of your data and the forecasting methods you’re using. Therefore, we highly recommend experimenting with both approaches and comparing their performance to gain a more holistic understanding of your data.

Conclusion

In this post, we discussed the powerful techniques of feature extraction and clustering for time series data. Specifically, we showed how to use TSFresh, a popular Python library for feature extraction, to preprocess your time series data and obtain meaningful features.

When the clustering step is complete, you can train multiple Forecast models for each cluster, or use the cluster configuration as a feature. Refer to the Amazon Forecast Developer Guide for information about data ingestion, predictor training, and generating forecasts. If you have item metadata and related time series data, you can also include these as input datasets for training in Forecast. For more information, refer to Start your successful journey with time series forecasting with Amazon Forecast.



About the Authors

Aleksandr Patrushev is AI/ML Specialist Solutions Architect at AWS, based in Luxembourg. He is passionate about the cloud and machine learning, and the way they could change the world. Outside work, he enjoys hiking, sports, and spending time with his family.

Chong En Lim is a Solutions Architect at AWS. He is always exploring ways to help customers innovate and improve their workflows. In his free time, he loves watching anime and listening to music.

Egor Miasnikov is a Solutions Architect at AWS based in Germany. He is passionate about the digital transformation of our lives, businesses, and the world itself, as well as the role of artificial intelligence in this transformation. Outside of work, he enjoys reading adventure books, hiking, and spending time with his family.

Read More