Plan the locations of green car charging stations with an Amazon SageMaker built-in algorithm

While the fuel economy of new gasoline or diesel-powered vehicles improves every year, green vehicles are considered even more environmentally friendly because they’re powered by alternative fuel or electricity. Hybrid electric vehicles (HEVs), battery electric vehicles (BEVs), fuel cell electric vehicles (FCEVs), hydrogen cars, and solar cars are all considered types of green vehicles.

Charging stations for green vehicles are similar to gas pumps at a gas station. They can be mounted on the ground or on a wall and installed in public buildings (shopping malls, public parking lots, and so on), residential district parking lots, or dedicated charging stations. They can operate at different voltage levels and charge various types of electric vehicles.

As a charging station vendor, you should consider many factors when building a charging station. Choosing locations for charging stations is a complicated problem: customer convenience, the urban setting, and other infrastructure needs are all important considerations.

In this post, we use machine learning (ML) with Amazon SageMaker and Amazon Location Service to provide guidance for charging station vendors looking to choose optimal charging station locations.

Solution overview

In this solution, we use SageMaker training jobs to train the clustering model and a SageMaker endpoint to deploy the model. We use Amazon Location Service to display the map and the clustering results.

We also use Amazon Simple Storage Service (Amazon S3) to store the training data and model artifacts.

The following figure illustrates the architecture of the solution.

Data preparation

GPS data is highly sensitive information because it can be used to track the historical movement of an individual. In this post, we use the tool trip-simulator to generate GPS data that simulates a taxi driver’s driving behavior.

We choose Nashville, Tennessee, as our location. The following script simulates 1,000 agents and generates 14 hours of driving data starting September 15, 2020, 8:00 AM:

trip-simulator \
  --config scooter \
  --pbf nash.osm.pbf \
  --graph nash.osrm \
  --agents 1000 \
  --start 1600128000000 \
  --seconds 50400 \
  --traces ./traces.json \
  --probes ./probes.json \
  --changes ./changes.json \
  --trips ./trips.json

The preceding script generates four output files. We use changes.json, which includes the car’s driving GPS data as well as pickup and drop-off information. The file format looks like the following:

{
  "vehicle_id": "PLC-4375",
  "event_time": 1600128001000,
  "event_type": "available",
  "event_type_reason": "service_start",
  "event_location": {
    "type": "Feature",
    "properties": {},
    "geometry": {
      "type": "Point",
      "coordinates": [
        -86.7967066040155,
        36.17115028383999
      ]
    }
  }
}

The field event_type_reason has four main values:

  • service_start – The driver receives a ride request, and drives to the designated location
  • user_pick_up – The driver picks up a passenger
  • user_drop_off – The driver reaches the destination and drops off the passenger
  • maintenance – The driver is not in service mode and doesn’t receive ride requests

In this post, we only collect the location data with the status user_pick_up and user_drop_off as the algorithm’s input. In real-life situations, you should also consider features such as the passenger’s information and business district information.

Pandas is a Python library for data analysis. The following script uses Pandas to convert the data from JSON format to CSV format:

import pandas as pd

# Load the raw events and flatten the nested GeoJSON event_location field
df = pd.read_json('./data/changes.json', lines=True)
df_event = df.event_location.apply(pd.Series)
df_geo = df_event.geometry.apply(pd.Series)
df_coord = df_geo.coordinates.apply(pd.Series)

# Append the longitude/latitude columns and drop the original nested field
result = pd.concat([df, df_coord], axis=1)
result = result.drop("event_location", axis=1)
result.columns = ["vehicle_id", "event_time", "event_type", "event_reason", "longitude", "latitude"]
result.to_csv('./data/result.csv', index=False, sep=',')
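As noted earlier, only the pickup and drop-off locations are used as the algorithm’s input. The following is a minimal filtering sketch on the converted file (overwriting result.csv is just one option):

import pandas as pd

# Keep only pickup and drop-off events as the clustering input
result = pd.read_csv('./data/result.csv')
stops = result[result['event_reason'].isin(['user_pick_up', 'user_drop_off'])]

# Overwrite result.csv so the later training step reads only these points
stops.to_csv('./data/result.csv', index=False)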

The following table shows our results.

There is noise in the original GPS data; for example, some pickup and drop-off coordinate points are marked in the lake. The generated GPS data follows a uniform distribution and doesn’t account for business districts, no-stop areas, or depopulated zones. In practice, there is no standard process for data preprocessing. You can simplify data preprocessing and feature engineering with Amazon SageMaker Data Wrangler.

Data exploration

To better observe and analyze the simulated track data, we use Amazon Location for data visualization. Amazon Location provides frontend SDKs for Android, iOS, and the web. For more information about Amazon Location, see the Developer Guide.

We start by creating a map on the Amazon Location console.

We use the MapLibre GL JS SDK for our map display. The following script displays a map of Nashville, Tennessee, and renders a specific car’s driving route (or trace) line:

async function initializeMap() {
  // Load credentials and set them up to refresh
  await credentials.getPromise();

  // Initialize the map
  map = new maplibregl.Map({
    container: "map",
    center: [-86.792845, 36.16378], // initial map centerpoint
    zoom: 10, // initial map zoom
    style: mapName,
    transformRequest,
  });
}

map.addSource('route', {
  'type': 'geojson',
  'data': {
    'type': 'Feature',
    'properties': {},
    'geometry': {
      'type': 'LineString',
      'coordinates': [
        [-86.85009051679292, 36.144774042081494],
        [-86.85001827659116, 36.14473133061205],
        [-86.85004741661184, 36.1446756197635],
        [-86.85007975396945, 36.14465452846737],
        [-86.85005249508677, 36.14469518290888]
        ......
      ]
    }
  }
});

The following graph displays a taxi’s 14-hour driving route.

The following script displays the car’s route distribution:

map.addSource('car-location', {
  'type': 'geojson',
  'data': {
    'type': 'FeatureCollection',
    'features': [
      {'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-86.79417828985571, 36.1742558685242]}},
      {'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-86.76932509874324, 36.18006513143749]}},
      ......
      {'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-86.84082991448976, 36.14558741886923]}}
    ]
  }
});

The following map visualization shows our results.

Algorithm selection

K-means is an unsupervised learning algorithm. It attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups.

SageMaker uses a modified version of the web-scale k-means clustering algorithm. Compared to the original version of the algorithm, the version SageMaker uses is more accurate. Like the original algorithm, it scales to massive datasets and delivers improvements in training time. To do this, it streams mini-batches (small, random subsets) of the training data.

The k-means algorithm expects tabular data. In this solution, the GPS coordinate data (longitude, latitude) is the input training data. See the following code:

import io
import os
import boto3
import pandas as pd
import sagemaker.amazon.common as smac

# Use only the coordinate columns as training features
df = pd.read_csv('./data/result.csv', sep=',', header=0, usecols=['longitude', 'latitude'])

# Routine that converts the training data into the protobuf format required by SageMaker k-means
def write_to_s3(bucket, prefix, channel, file_prefix, X):
    buf = io.BytesIO()
    smac.write_numpy_to_dense_tensor(buf, X.astype('float32'))
    buf.seek(0)
    boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, channel, file_prefix + '.data')).upload_fileobj(buf)

# Prepare the training data and save it to Amazon S3
def prepare_train_data(bucket, prefix, file_prefix, save_to_s3=True):
    train_data = df[['longitude', 'latitude']].to_numpy()
    if save_to_s3:
        write_to_s3(bucket, prefix, 'train', file_prefix, train_data)
    return train_data

# Using the dataset
train_data = prepare_train_data(bucket, prefix, 'train', save_to_s3=True)

# SageMaker k-means ECR image URIs per Region
images = {'us-west-2': '174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:latest',
          'us-east-1': '382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:latest',
          'us-east-2': '404615174143.dkr.ecr.us-east-2.amazonaws.com/kmeans:latest',
          'eu-west-1': '438346466558.dkr.ecr.eu-west-1.amazonaws.com/kmeans:latest'}

image = images[boto3.Session().region_name]

Train the model

Before you train your model, consider the following:

  • Data format – Both protobuf recordIO and CSV formats are supported for training. In this solution, we use protobuf format and File mode as the training data input.
  • EC2 instance selection – AWS suggests using an Amazon Elastic Compute Cloud (Amazon EC2) CPU instance when selecting the k-means algorithm. We use two ml.c5.2xlarge instances for training.
  • Hyperparameters – Hyperparameters are closely related to the dataset; you can adjust them according to the actual situation to get the best results:

    • k – The number of required clusters (k). Because we don’t know the number of clusters in advance, we train many models with different values (k).
    • init_method – The method by which the algorithm chooses the initial cluster centers. A valid value is random or kmeans++.
    • epochs – The number of passes done over the training data. We set this to 10.
    • mini_batch_size – The number of observations per mini-batch for the data iterator. We tried 50, 100, 200, 500, 800, and 1,000 in our dataset.

We train our model with the following code. To get results faster, we launch the SageMaker training jobs concurrently; each training job uses two instances. The value of k ranges from 3 to 15, each training job generates a model, and the model artifacts are saved in an S3 bucket.

K = range(3, 16, 1)  # try different values of k, increasing by 1 from 3 up to 15
INSTANCE_COUNT = 2   # use two CPU instances per training job
run_parallel_jobs = True  # set to False to run jobs one at a time, for example if you don't
                          # want to create too many EC2 instances at once and hit service limits
job_names = []

# Launch a training job for each value of k
for k in K:
    print('starting train job:' + str(k))
    output_location = 's3://{}/kmeans_example/output/'.format(bucket) + output_folder
    print('training artifacts will be uploaded to: {}'.format(output_location))
    job_name = output_folder + str(k)
    job_names.append(job_name)

    create_training_params = {
        "AlgorithmSpecification": {
            "TrainingImage": image,
            "TrainingInputMode": "File"
        },
        "RoleArn": role,
        "OutputDataConfig": {
            "S3OutputPath": output_location
        },
        "ResourceConfig": {
            "InstanceCount": INSTANCE_COUNT,
            "InstanceType": "ml.c4.xlarge",
            "VolumeSizeInGB": 20
        },
        "TrainingJobName": job_name,
        "HyperParameters": {
            "k": str(k),
            "feature_dim": "2",
            "epochs": "100",
            "init_method": "kmeans++",
            "mini_batch_size": "800"
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": 60 * 60
        },
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": "s3://{}/{}/train/".format(bucket, prefix),
                        "S3DataDistributionType": "FullyReplicated"
                    }
                },
                "CompressionType": "None",
                "RecordWrapperType": "None"
            }
        ]
    }

    sagemaker = boto3.client('sagemaker')
    sagemaker.create_training_job(**create_training_params)
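Because the jobs run concurrently, the evaluation step needs to wait until they all finish. The following is a minimal sketch using a boto3 waiter, assuming the job_names list populated in the preceding loop:

import boto3

sagemaker = boto3.client('sagemaker')
waiter = sagemaker.get_waiter('training_job_completed_or_stopped')

# Block until every k-means training job has finished (or stopped)
for job_name in job_names:
    waiter.wait(TrainingJobName=job_name)
    status = sagemaker.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
    print('{}: {}'.format(job_name, status))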

Evaluate the model

The number of clusters (k) is the most important hyperparameter in k-means clustering. Because we don’t know the value of k, we can use various methods to find the optimal value of k. In this section, we discuss two methods.

Elbow method

The elbow method is an empirical method to find the optimal number of clusters for a dataset. In this method, we select a range of candidate values of k, then apply k-means clustering using each of the values of k. We find the average distance of each point in a cluster to its centroid, and represent it in a plot. We select the value of k where the average distance falls suddenly. See the following code:

import boto3
import numpy as np
import mxnet as mx
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist

plt.plot()
models = {}
distortions = []
for k in K:
    # Download and extract the model artifact trained for this value of k
    s3_client = boto3.client('s3')
    key = 'kmeans_example/output/' + output_folder + '/' + output_folder + str(k) + '/output/model.tar.gz'
    s3_client.download_file(bucket, key, 'model.tar.gz')
    print("Model for k={} ({})".format(k, key))
    !tar -xvf model.tar.gz
    kmeans_model = mx.ndarray.load('model_algo-1')
    kmeans_numpy = kmeans_model[0].asnumpy()
    print(kmeans_numpy)
    # Average distance of each point to its closest cluster center
    distortions.append(sum(np.min(cdist(train_data, kmeans_numpy, 'euclidean'), axis=1)) / train_data.shape[0])
    models[k] = kmeans_numpy

# Plot the elbow
plt.plot(K, distortions, 'bx-')
plt.xlabel('k')
plt.ylabel('distortion')
plt.title('Elbow graph')
plt.show()

We select a k range of 3–15 and train a model for each value with the built-in k-means clustering algorithm. When the model is fit with 10 clusters, we can see an elbow shape in the graph, which suggests that 10 is an optimal cluster number.

Silhouette method

The silhouette method is another way to find the optimal number of clusters and to interpret and validate consistency within clusters of data. It computes a silhouette coefficient for each point that measures how similar the point is to its own cluster compared to other clusters, providing a succinct graphical representation of how well each object has been classified.

The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette value ranges from -1 to 1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, the clustering configuration is appropriate. If many points have a low or negative value, the clustering configuration may have too many or too few clusters.
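The silhouette computation needs cluster assignments from the trained model, which means the chosen model must be hosted behind a SageMaker real-time endpoint. The endpoint used in the next step already exists in this example; as a minimal sketch (the model data path follows the preceding S3 key convention, and the resource names and instance type are illustrative), an endpoint could be created from a saved model artifact as follows:

import boto3

sagemaker = boto3.client('sagemaker')

model_name = 'kmeans-10'
model_data = 's3://{}/kmeans_example/output/{}/{}{}/output/model.tar.gz'.format(bucket, output_folder, output_folder, 10)

# Register the trained artifact as a SageMaker model
sagemaker.create_model(
    ModelName=model_name,
    PrimaryContainer={'Image': image, 'ModelDataUrl': model_data},
    ExecutionRoleArn=role
)

# Create an endpoint configuration and a real-time endpoint
sagemaker.create_endpoint_config(
    EndpointConfigName=model_name + '-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': model_name,
        'InitialInstanceCount': 1,
        'InstanceType': 'ml.m5.large'
    }]
)
sagemaker.create_endpoint(
    EndpointName=model_name + '-endpoint',
    EndpointConfigName=model_name + '-config'
)

# Wait until the endpoint is in service before invoking it
sagemaker.get_waiter('endpoint_in_service').wait(EndpointName=model_name + '-endpoint')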

With the model deployed to an endpoint, we predict the closest cluster (the y value) for each point as the silhouette input:

import json
import boto3

runtime = boto3.Session().client('runtime.sagemaker')
endpointName = "kmeans-30-2021-08-06-00-48-38-963"

# The request body contains one "longitude,latitude" pair per line;
# in practice, send every point you want to score
response = runtime.invoke_endpoint(EndpointName=endpointName,
                                   ContentType='text/csv',
                                   Body=b"-86.77971153,36.16336978\n-86.77971153,36.16336978")
r = response['Body'].read()
response_json = json.loads(r)

# Collect the closest cluster label for each input point
y_km = []
for item in response_json['predictions']:
    y_km.append(int(item['closest_cluster']))

Next, we compute and plot the silhouette coefficients:

import numpy as np
from matplotlib import cm
import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score, silhouette_samples

# X holds the (longitude, latitude) points sent to the endpoint, in the same order as y_km
y_km = np.array(y_km)  # convert the list of cluster labels returned by the endpoint to a NumPy array

cluster_labels = np.unique(y_km)
print(cluster_labels)
n_clusters = cluster_labels.shape[0]
silhouette_score_cluster_10 = silhouette_score(X, y_km)
print("Silhouette Score When Cluster Number Set to 10: %.3f" % silhouette_score_cluster_10)
silhouette_vals = silhouette_samples(X, y_km, metric='euclidean')

# Draw a horizontal silhouette plot, one group of bars per cluster
y_ax_lower, y_ax_upper = 0, 0
yticks = []
for i, c in enumerate(cluster_labels):
    c_silhouette_vals = silhouette_vals[y_km == c]
    c_silhouette_vals.sort()
    y_ax_upper += len(c_silhouette_vals)
    color = cm.jet(float(i) / n_clusters)
    plt.barh(range(y_ax_lower, y_ax_upper),
             c_silhouette_vals,
             height=1.0,
             edgecolor='none',
             color=color)
    yticks.append((y_ax_lower + y_ax_upper) / 2.0)
    y_ax_lower += len(c_silhouette_vals)

# Mark the average silhouette score
silhouette_avg = np.mean(silhouette_vals)
plt.axvline(silhouette_avg,
            color='red',
            linestyle='--')
plt.yticks(yticks, cluster_labels + 1)
plt.ylabel("Cluster")
plt.xlabel("Silhouette Coefficients k=10, Score=%.3f" % silhouette_score_cluster_10)
plt.savefig('./figure.png')
plt.show()

When the silhouette score is closer to 1, the clusters are well separated from each other. In the following experiment result, when k is set to 8, each cluster is well separated from the others.

Different model evaluation methods can suggest different values for the best k. In our experiment, we choose k=10 as the optimal number of clusters.

Now we can display the k-means clustering result via Amazon Location. The following code marks selected locations on the map:

new maplibregl.Marker().setLngLat([-86.755974, 36.19235]).addTo(map);
new maplibregl.Marker().setLngLat([-86.710972, 36.203389]).addTo(map);
new maplibregl.Marker().setLngLat([-86.733895, 36.150209]).addTo(map);
new maplibregl.Marker().setLngLat([-86.795974, 36.165639]).addTo(map);
new maplibregl.Marker().setLngLat([-86.786743, 36.222799]).addTo(map);
new maplibregl.Marker().setLngLat([-86.701209, 36.267679]).addTo(map);
new maplibregl.Marker().setLngLat([-86.820134, 36.209863]).addTo(map);
new maplibregl.Marker().setLngLat([-86.769743, 36.131246]).addTo(map);
new maplibregl.Marker().setLngLat([-86.803346, 36.142358]).addTo(map);
new maplibregl.Marker().setLngLat([-86.833890, 36.113466]).addTo(map);

The following map visualization shows our results, with 10 clusters.

We also need to consider the scale of each charging station. Here, we divide the number of points around each cluster center by a coefficient (for example, a coefficient of 100 means every 100 cars share one charging pile). The following visualization includes the charging station scale.
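The following is a minimal sketch of this scale calculation, assuming train_data holds the coordinate points and models[10] holds the chosen cluster centers from the evaluation step (the coefficient value is just an example):

import numpy as np
from scipy.spatial.distance import cdist

COEFFICIENT = 100  # example value: every 100 cars share one charging pile

centers = models[10]  # cluster centers for k=10
# Assign each pickup/drop-off point to its nearest cluster center
assignments = np.argmin(cdist(train_data, centers, 'euclidean'), axis=1)

# Number of charging piles per station = points in the cluster / coefficient
counts = np.bincount(assignments, minlength=len(centers))
piles = np.ceil(counts / COEFFICIENT).astype(int)
for i, (center, n) in enumerate(zip(centers, piles)):
    print('station {} at (lon={:.6f}, lat={:.6f}): {} charging piles'.format(i, center[0], center[1], n))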

Conclusion

In this post, we explained an end-to-end scenario for creating a clustering model in SageMaker based on simulated driving data. The solution includes training an MXNet model and creating an endpoint for real-time model hosting. We also explained how you can display the clustering results via the Amazon Location SDK.

You should also consider charging type and quantity. Plug-in charging is categorized by voltage and power levels, leading to different charging times. Slow charging usually takes several hours, whereas fast charging can achieve a 50% charge in 10–15 minutes. We cover these factors in a later post.

Many other industries are also affected by location planning problems, including retail stores and warehouses. If you have feedback about this post, submit comments in the Comments section below.


About the Author

Zhang Zheng is a Sr. Partner Solutions Architect with AWS, helping industry partners on their journey to well-architected machine learning solutions at scale.

Read More

AWS computer vision and Amazon Rekognition: AWS recognized as an IDC MarketScape Leader in Asia Pacific (excluding Japan), up to 38% price cut, and major new features

Computer vision, the automatic recognition and description of documents, images, and videos, has far-reaching applications, from identifying defects in high-speed assembly lines, to intelligently automating document processing workflows, and identifying products and people in social media. AWS computer vision services, including Amazon Lookout for Vision, AWS Panorama, Amazon Rekognition, and Amazon Textract, help developers automate image, video, and text analysis without requiring machine learning (ML) experience. As a result, you can implement solutions faster and decrease your time to value.

As customers continue to expand their use of computer vision, we have been investing in all of our services to make them easier to apply to use cases, easier to implement with fewer data requirements, and more cost-effective. Recently, AWS was named a Leader in the IDC MarketScape: Asia/Pacific (Excluding Japan) Vision AI Software Platform 2021 Vendor Assessment (Doc # AP47490521, October 2021). The IDC MarketScape evaluated our product functionality, service delivery, research and innovation strategy, and more for three vision AI use cases: productivity, end-user experience, and decision recommendation. They found that our offerings have a product-market fit for all three use cases. The IDC MarketScape recommends that computer vision decision-makers consider AWS for Vision AI services when you need to centrally plan vision AI capabilities in a large-scope initiative, such as digital transformation (DX), or want flexible ways to control costs.

“Vision AI is one of the emerging technology markets,” says Christopher Lee Marshall, Associate Vice President, Artificial Intelligent and Analytics Strategies at IDC Asia Pacific. “AWS is placed in the Leader’s Category in IDC MarketScape: Asia/Pacific (Excluding Japan) Vision AI Software Platform 2021 Vendor Assessment. It’s critical to watch the major vendors and more mature market solutions, as the early movers tend to consolidate their strengths with greater access to training data, more iterations of algorithm variations, deeper understanding of the operational contexts, and more systematic approaches to work with solution partners in the ecosystem.”

A key service of focus in the report was Amazon Rekognition. We’re excited to announce several enhancements to make Amazon Rekognition more cost-effective, more accurate, and easier to implement. First, we’re lowering prices for image APIs. Next, we’re enriching Amazon Rekognition with new features for content moderation, text-in-image analysis, and automated machine learning (AutoML). The new capabilities enable more accurate content moderation workflows, optical character recognition for a broader range of scenarios, and simplified training and deployment of custom computer vision models.

These latest announcements add to the Amazon Textract innovations we introduced recently, where we added TIFF file support, lowered the latency of asynchronous operations by 50%, and reduced prices by up to 32% in eight AWS Regions. The Amazon Textract innovations make it easier, faster, and less expensive to process documents at scale using computer vision on AWS.

Let’s dive deeper into the Amazon Rekognition announcements and product improvements.

Up to 38% price reduction for Amazon Rekognition Image APIs

We want to help you get a better return on investment for computer vision workflows. Therefore, we’re lowering the price for all Amazon Rekognition Image APIs by up to 38%. This price reduction applies to all 14 Regions where the Amazon Rekognition service endpoints are available.

We offer four pricing tiers based on usage volume for Amazon Rekognition Image APIs today: up to 1 million, 1 – 10M, 10 – 100M, and above 100M images processed per month. The price points for these tiers are $0.001, $0.0008, $0.0006, and $0.0004 per image. With this price reduction, we lowered the API volumes that unlock lower prices:

  • We lowered the threshold from 10 million images per month to 5 million images per month for Tier 2. As a result, you can now benefit from a lower Tier 3 price of $0.0006 per image after 5 million images.
  • We lowered the Tier 4 threshold from 100 million images per month to 35 million images per month.

We summarize the volume threshold changes in the following table.

Pricing tier | Old volume (images processed per month) | New volume (images processed per month)
Tier 1       | First 1 million images (unchanged)      | First 1 million images (unchanged)
Tier 2       | Next 9 million images                   | Next 4 million images
Tier 3       | Next 90 million images                  | Next 30 million images
Tier 4       | Over 100 million images                 | Over 35 million images

Finally, we’re lowering the price per image for the highest-volume tier from $0.0004 to $0.00025 per image for select APIs. The prices in the following table are for the US East (N. Virginia) Region. In summary, the new prices are as follows.

Pricing tier | Volume (images per month) | Price per image (Group 1 APIs) | Price per image (Group 2 APIs)
Tier 1       | First 1 million images    | $0.00100                       | $0.00100
Tier 2       | Next 4 million images     | $0.00080                       | $0.00080
Tier 3       | Next 30 million images    | $0.00060                       | $0.00060
Tier 4       | Over 35 million images    | $0.00040                       | $0.00025

Group 1 APIs: CompareFaces, IndexFaces, SearchFacesByImage, and SearchFaces. Group 2 APIs: DetectFaces, DetectModerationLabels, DetectLabels, DetectText, and RecognizeCelebrities.

Your savings will vary based on your usage. The following table provides example savings for a few scenarios in the US East (N. Virginia) Region.

API volumes                     | Old price (Group 1 & 2 Image APIs) | Group 1 Image APIs new price | % reduction | Group 2 Image APIs new price | % reduction
12M in a month                  | $9,400   | $8,400   | -10.6% | $8,400   | -10.6%
12M annual (1M in a month)      | $12,000  | $12,000  | 0.0%   | $12,000  | 0.0%
60M in a month                  | $38,200  | $32,200  | -15.7% | $28,450  | -25.5%
60M annual (5M in a month)      | $50,400  | $50,400  | 0.0%   | $50,400  | 0.0%
120M in a month                 | $70,200  | $56,200  | -19.9% | $43,450  | -38.1%
120M annual (10M in a month)    | $98,400  | $86,400  | -12.2% | $86,400  | -12.2%
420M in a month                 | $190,200 | $176,200 | -7.4%  | $118,450 | -37.7%
420M annual (35M in a month)    | $278,400 | $266,400 | -4.3%  | $266,400 | -4.3%
1.2B in a month                 | $502,200 | $488,200 | -2.8%  | $313,450 | -37.6%
1.2B annual (100M in a month)   | $746,400 | $578,400 | -22.5% | $461,400 | -38.2%
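As a rough check, the example figures in the preceding table follow from a straightforward tiered calculation. The following sketch applies the new US East (N. Virginia) per-image prices and thresholds from the tables above:

# New price tiers: (images in tier, Group 1 price, Group 2 price)
TIERS = [
    (1_000_000, 0.00100, 0.00100),    # Tier 1: first 1 million images
    (4_000_000, 0.00080, 0.00080),    # Tier 2: next 4 million images
    (30_000_000, 0.00060, 0.00060),   # Tier 3: next 30 million images
    (float('inf'), 0.00040, 0.00025)  # Tier 4: over 35 million images
]

def monthly_cost(images, group=1):
    """Compute the monthly Amazon Rekognition Image cost for a given API group."""
    cost, remaining = 0.0, images
    for size, g1_price, g2_price in TIERS:
        in_tier = min(remaining, size)
        cost += in_tier * (g1_price if group == 1 else g2_price)
        remaining -= in_tier
        if remaining <= 0:
            break
    return round(cost, 2)

print(monthly_cost(12_000_000, group=2))   # 8400.0, matching the 12M/month example
print(monthly_cost(120_000_000, group=2))  # 43450.0, matching the 120M/month example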

Learn more about the price reduction by visiting the pricing page.

Accuracy improvements for content moderation

Organizations need a scalable solution to make sure users aren’t exposed to inappropriate content from user-generated and third-party content in social media, ecommerce, and photo-sharing applications.

The Amazon Rekognition Content Moderation API helps you automatically detect inappropriate or unwanted content to streamline moderation workflows.

With the Amazon Rekognition Content Moderation API, you now get improved accuracy across all ten top-level categories (such as explicit nudity, violence, and tobacco) and all 35 subcategories.

The improvements in image model moderation reduce false positive rates across all moderation categories. Lower false positive rates lead to lower volumes of images flagged for further review by human moderators, reducing their workload and improving efficiency. When combined with a price reduction for image APIs, you get more value for your content moderation solution at lower prices. Learn more about the improved Content Moderation API by visiting Moderating content.
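As an illustration only (not part of the original announcement), the following minimal boto3 sketch calls the Content Moderation API and routes lower-confidence detections to human review; the bucket, image name, and thresholds are placeholders:

import boto3

rekognition = boto3.client('rekognition')

response = rekognition.detect_moderation_labels(
    Image={'S3Object': {'Bucket': 'my-bucket', 'Name': 'uploads/photo.jpg'}},  # placeholder location
    MinConfidence=50
)

for label in response['ModerationLabels']:
    if label['Confidence'] >= 90:
        print('Blocked: {} ({:.1f}%)'.format(label['Name'], label['Confidence']))
    else:
        print('Send to human review: {} ({:.1f}%)'.format(label['Name'], label['Confidence']))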

11 Street is an online shopping company. They’re using Amazon Rekognition to automate the review of images and videos. “As part of 11st’s interactive experience, and to empower our community to express themselves, we have a feature where users can submit a photo or video review of the product they have just purchased. For example, a user could submit a photo of themselves wearing the new makeup they just bought. To make sure that no images or videos contain content that is prohibited by our platform guidelines, we originally resorted to manual content moderation. We quickly found that this was costly, error-prone, and not scalable. We then turned to Amazon Rekognition for Content Moderation, and found that it was easy to test, deploy, and scale. We are now able to automate the review of more than 7,000 uploaded images and videos every day with Amazon Rekognition, saving us time and money. We look forward to the new model update that the Amazon Rekognition team is releasing soon.” – 11 Street Digital Transformation team

Flipboard is a content recommendation platform that enables publishers, creators, and curators to share stories with readers to help them stay up to date on their passions and interests. Says Anuj Ahooja, Senior Engineering Manager at Flipboard: “On average, Flipboard processes approximately 90 million images per day. To maintain a safe and inclusive environment and to confirm that all images comply with platform guidelines at scale, it is crucial to implement a content moderation workflow using ML. However, building models for this system internally was labor-intensive and lacked the accuracy necessary to meet the high-quality standards Flipboard users expect. This is where Amazon Rekognition became the right solution for our product. Amazon Rekognition is a highly accurate, easily deployed, and performant content moderation platform that provides a robust moderation taxonomy. Since putting Amazon Rekognition into our workflows, we’ve been catching approximately 63,000 images that violate our standards per day. Moreover, with frequent improvements like the latest content moderation model update, we can be confident that Amazon Rekognition will continue to help make Flipboard an even more inclusive and safe environment for our users over time.”

Yelp connects people with great local businesses. With unmatched local business information, photos, and review content, Yelp provides a one-stop local platform for consumers to discover, connect, and transact with local businesses of all sizes by making it easy to request a quote, join a waitlist, and make a reservation, appointment, or purchase. Says Alkis Zoupas, Head of Trust and Safety Engineering at Yelp: “Yelp’s mission is to connect people with great local businesses, and we take significant measures to give people access to reliable and useful information. As part of our multi-stage, multi-model approach to photo classification, we use Amazon Rekognition to tune our systems for various outcomes and levels of filtering. Amazon Rekognition has helped reduce development time, allowing us to be more effective with our resource utilization and better prioritize what our teams should focus on.”

Support for seven more languages and accuracy improvements for text analysis

Customers use the Amazon Rekognition text service for a variety of applications, such as ensuring compliance of images with corporate policies, analysis of marketing assets, and reading street signs. With the Amazon Rekognition DetectText API, you can detect text in images and check it against your list of inappropriate words and phrases. In addition, you can further enable content redaction by using the detected text bounding box area to blur sensitive information.
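As an illustration only (not part of the original announcement), a minimal boto3 sketch of this kind of check with the DetectText API, with a placeholder word list and image location, could look like the following:

import boto3

rekognition = boto3.client('rekognition')
BLOCKED_WORDS = {'examplebadword1', 'examplebadword2'}  # placeholder word list

response = rekognition.detect_text(
    Image={'S3Object': {'Bucket': 'my-bucket', 'Name': 'uploads/ad-image.jpg'}}  # placeholder location
)

for detection in response['TextDetections']:
    if detection['Type'] != 'WORD':
        continue
    word = detection['DetectedText'].lower()
    if word in BLOCKED_WORDS:
        # The bounding box can be used to blur or redact the flagged text
        box = detection['Geometry']['BoundingBox']
        print('Flagged "{}" at {}'.format(word, box))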

The newest version of the DetectText API now supports Arabic, French, German, Italian, Portuguese, Russian, and Spanish languages in addition to English. The DetectText API also provides improved accuracy for detecting curved and vertical text in images. With the expanded language support and higher accuracy for curved and vertical text, you can scale and improve your content moderation, text moderation, and other text detection workflows.

OLX Group is one of the world’s fastest-growing networks of trading platforms, with operations in over 30 countries and over 20 brands worldwide. Says Jaroslaw Szymczak, Data Science Manager at OLX Group: “As a leader in the classifieds marketplace sector, and to foster a safe, inclusive, and vibrant buying and selling community, it is paramount that we make sure that all products listed on our platforms comply with our rules for product display and authenticity. To do that, among other aspects of the ads, we have placed focus on analyzing the non-organic text featured on images uploaded by our users. We tested Amazon Rekognition’s text detection functionality for this purpose and found that it was highly accurate and augmented our in-house violations detection systems, helping us improve our moderation workflows. Using Amazon Rekognition for text detection, we were able to flag 350,000 policy violations last year. It has also helped us save significant amounts in development costs and has allowed us to refocus data science time on other projects. We are very excited about the upcoming text model update as it will even further expand our capabilities for text analysis.”

VidMob is a leading creative analytics platform that uses data to understand the audience, improve ads, and increase marketing performance. Says James Kupernick, Chief Technology Officer at VidMob: “At VidMob, our goal is to maximize ROI for our customers by leveraging real-time insights into creative content. We have been working with the Amazon Rekognition team for years to extract meaningful visual metadata from creative content, helping us drive data-driven outcomes for our customers. It is of the utmost importance that our customers get actionable data signals. In turn, we have used Amazon Rekognition’s text detection feature to determine when there is overlaid text in a creative and classify that text in a way that creates unique insights. We can scale this process using the Amazon Rekognition Text API, allowing our data science and engineers teams to create differentiated value. In turn, we are very excited about the new text model update and the addition of new languages so that we can better support our international clients.”

Simplicity and scalability for AutoML

Amazon Rekognition Custom Labels is an AutoML service that allows you to build custom computer vision models to detect objects and scenes in images specific to your business needs. For example, with Rekognition Custom Labels, you can develop solutions for detecting brand logos, proprietary machine parts, and items on store shelves without the need for in-depth ML expertise. Instead, your critical ML experts can continue working on higher-value projects.

With the new capabilities in Rekognition Custom Labels, you can simplify and scale your workflows for custom computer vision models.

First, you can train your computer vision model in four simple steps with a few clicks. You get a guided step-by-step console experience with directions for creating projects, creating image datasets, annotating and labeling images, and training models.

Next, we improved our underlying ML algorithms. As a result, you can now build high-quality models with less training data to detect vehicles, their make, or possible damages to vehicles.

Finally, we have introduced seven new APIs to make it even easier for you to build and train computer vision models programmatically. With the new APIs, you can do the following:

  • Create, copy, or delete datasets
  • List the contents and get details of the datasets
  • Modify datasets and auto-split them to create a test dataset

For more information, visit the Rekognition Custom Labels Guide.

Prodege, LLC is a cutting-edge marketing and consumer insights platform that leverages its global audience of reward program members to power its business solutions. Prodege uses Rekognition Custom Labels to detect anomalies in store receipts. Says Arun Gupta, Director, Business Intelligence at Prodege: “By using Rekognition Custom Labels, Prodege was able to detect anomalies with high precision across store receipt images being uploaded by our valued members as part of our rewards program offerings. The best part of Rekognition Custom Labels is that it’s easy to set up and requires only a small set of pre-classified images (a couple of hundred in our case) to train the ML model for high confidence image detection. The model’s endpoints can be easily accessed using the API. Rekognition Custom Labels has been an extremely effective solution to enable the smooth functioning of our validated receipt scanning product and helped us save a lot of time and resources performing manual detection. The new console experience of Rekognition Custom Labels has made it even easier to build and train a model, especially with the added capability of updating and deleting an existing dataset. This will significantly improve our constant iteration of training models as we grow and add more data in the pursuit of enhancing our model performance. I can’t even thank the AWS Support Team enough, who has been diligently helping us with all aspects of the product through this journey.”

Says Arnav Gupta, Global AWS Practice Lead at Quantiphi: “As an advanced consulting partner for AWS, Quantiphi has been leveraging Amazon’s computer vision services such as Amazon Rekognition and Amazon Textract to solve some of our customer’s most pressing business challenges. The simplified and guided experience offered by the updated Rekognition Custom Labels console and the new APIs has made it easier for us to build and train computer vision models, significantly reducing the time to deliver solutions from months to weeks for our customers. We have also built our document processing solution Qdox on top of Amazon Textract, which has enabled us to provide our own industry-specific document processing solutions to customers.”

Get started with Amazon Rekognition

With the new features we’re announcing today, you can increase the accuracy of your content moderation workflows, deploy text moderation solutions across a broader range of scenarios and languages, and simplify your AutoML implementation. In addition, you can use the price reduction on the image APIs to analyze more images with your existing budget. To get started today, visit the Amazon Rekognition console or see the Amazon Rekognition documentation.


About the Author

Roger Barga is the GM of Computer Vision at AWS.

Read More

AWS BugBust sets the Guinness World Record for the largest bug fixing challenge

AWS BugBust is the first global bug-busting challenge for developers to eliminate 1 million software bugs and save $100 million in technical debt for their organizations. AWS BugBust allows you to create and manage private events that transform and gamify the process of finding and fixing bugs in your software. With automated code analysis, built-in leaderboards, custom challenges, and rewards, AWS BugBust helps foster team building and introduces some friendly competition into improving code quality and application performance.

AWS BugBust utilizes the machine learning (ML)-powered developer tools in Amazon CodeGuru—CodeGuru Reviewer and CodeGuru Profiler—to automatically scan your code to weed out gnarly bugs, and gamifies fixing and eliminating them.

Since launch in June 2021, thousands of Java and Python developers have participated in AWS BugBust events hosted internally by their organizations. They have used their coding skills to collectively fix over 33,000 software bugs and helped organizations save approximately $4 million in technical debt—all while earning points and exclusive prizes.

Take a look at how the students of Miami Dade College took on the challenge to improve code quality using AWS BugBust for Games for Love, a 501(c)(3) public charity dedicated to easing suffering, saving lives, and creating sustainable futures for children.

First annual AWS BugBust re:Invent challenge

To increase the impact of the AWS BugBust events, we launched the first annual AWS BugBust re:Invent challenge—an open competition for Java and Python developers to help fix bugs in open-source code bases. Beginning November 29, 2021, at 10 AM PST, thousands of Java and Python developers including enthusiasts, students, and professionals settled into a 76-hour session to fix bugs, earn points, and win an array of prizes such as hoodies, Amazon Echo Dots, as well as the coveted title of the Ultimate AWS BugBuster, accompanied by a cash prize of $1,500. The mammoth challenge included 613 developers of all skills levels, participating on-site at AWS re:Invent and virtually in over 20 countries. The event captured the Guinness World Record for the largest bug fixing challenge and helped the open-source community by providing 30,667 software bug fixes.

As part of the challenge, participants tackled and fixed a myriad of software bugs ranging from security issues, to duplicate code, to resource leaks and more. For each bug that a participant fixed, they received points based on the complexity of the bug. Each bug fix submitted was verified by CodeGuru to determine if the issue was resolved.

Join the mission to exterminate 1 million bugs today!

Get started by going head-to-head with your teammates in your own private AWS BugBust events. With just a few clicks, you can set up private AWS BugBust virtual events quickly and easily on the AWS Management Console, with built-in leaderboards, challenges, and rewards. BugBusters (your developers) from around the world can join your BugBust events to fix as many bugs as possible, score points, and contribute to busting 1 million bugs—and compete for prizes and prestige by fixing bugs in their applications.

Watch AWS customer Nextroll’s experience hosting an AWS BugBust event for their developers to eliminate software bugs and improve application reliability for their organization.

 You can get started with AWS BugBust at no cost. When you create your first AWS BugBust event, all costs incurred by the underlying usage of CodeGuru Reviewer and CodeGuru Profiler are free of charge for 30 days per AWS account. See the CodeGuru pricing page for details.


About the Authors

Sama Bali is a Product Marketing Manager within the AWS AI Services team.

Jordan Gruber is a Product Manager-Technical within the AWS AI-DevOps team.

Read More

Announcing support for extracting data from identity documents using Amazon Textract

Creating efficiencies in your business is at the top of your list. You want your employees to be more productive, to focus on high-impact tasks, and to find ways to implement better processes that improve outcomes for your customers. There are various ways to solve this problem, and more companies are turning to artificial intelligence (AI) and machine learning (ML) to help. In the financial services sector, customers create new accounts online; in healthcare, new digital platforms let patients schedule and manage appointments. Both require users to fill out forms, a process that is error-prone, time-consuming, and ripe for improvement. Some businesses and organizations have attempted to simplify and automate this process by including identity document uploads, such as a driver’s license or passport. However, the technology available is template-based and doesn’t scale well. You need a solution that helps automate the extraction of information from identity documents so that your customers can open bank accounts with ease, or schedule and manage appointments online using accurate information.

Today, we are excited to announce a new API to Amazon Textract called Analyze ID that will help you automatically extract information from identification documents, such as driver’s licenses and passports. Amazon Textract uses AI and ML technologies to extract information from identity documents, such as U.S. passports and driver’s licenses, without the need for templates or configuration. You can automatically extract specific information, such as date of expiry and date of birth, as well as intelligently identify and extract implied information, such as name and address.

We will cover the following topics in this post:

  • How Amazon Textract processes identity documents
  • A walkthrough of the Amazon Textract console
  • Structure of the Amazon Textract AnalyzeID API response
  • How to process the response with the Amazon Textract parser library

Identity Document processing using Amazon Textract

Companies have accelerated the adoption of digital platforms, especially in light of the COVID-19 pandemic. Organizations are now offering their users the flexibility to use smartphones and other mobile devices for everyday tasks—such as signing up for new accounts, scheduling appointments, completing employment applications online, and many more. Even though your users fill out an online form with personal and demographic information, the process is manual and error-prone, and it can affect the application decision if submitted incorrectly. Some of you have simplified and automated the online application process by asking your users to upload a picture of their ID, and then use market solutions to extract data and prefill the applications automatically. This automation can help you minimize data entry errors and potentially reduce end user abandonments in application completions. However, even the current market solutions are limited in what they can achieve. They often fall short when extracting all of the required fields accurately due to the rich background image on IDs or the inability to recognize names and addresses and the fields associated with them. For example, the Washington State driver license lists home addresses with the key “8”. Another major challenge with the current market solutions is that IDs have a different template or format depending on the issuing country and state, and even those can change from time-to-time. Therefore, the traditional template-based solutions do not work at scale. Even traditional OCR solutions are expensive and slow, especially when combined with human reviews, and they don’t move the needle in digital automation. These approaches provide poor results, thereby inhibiting your organization from scaling and becoming efficient. You need a solution to help automate the extraction of information from identity documents to enable your customers to open bank accounts with ease, or schedule and manage appointments online with accurate information.

To solve this problem, you can now use Amazon Textract’s newly launched Analyze ID API, powered by ML instead of a traditional template matching solution, to process identity documents at scale. It works with U.S. driver’s licenses and passports to extract relevant data, such as name, address, date of birth, date of expiry, place of issue, etc. Analyze ID API returns two categories of data types: (A) Key-value pairs available on IDs, such as Date of Birth, Date of Issue, ID #, Class, Height, and Restrictions. (B) Implied fields on the document that may not have explicit keys, such as Name, Address, and Issued By. The key-value pairs are also normalized into a common taxonomy (for example, Document ID number = LIC# or Passport No.). This lets you easily combine information across many IDs that use different terms for the same concept.

Amazon Textract console walkthrough

Before we get started with the API and code samples, let’s review the Amazon Textract console. The following images show examples of a passport and a drivers’ license document on the Analyze Document output tab of the Amazon Textract console. Amazon Textract automatically and easily extracts key-value elements, such as the type, code, passport number, surname, given name, nationality, date of birth, place of birth, and more fields, from the sample image.

The following is another example with a sample drivers’ license. Analyze ID extracts key-value elements such as class, as well as implied fields such as first name, last name, and address. It also normalizes keys, such as “Document number” from “4d NUMBER” as “820BAC729CBAC”, and “Date of birth” from “DOB” as “03/18/1978”, so that it is standardized across IDs.

AnalyzeID API request

In this section, we explain how to pass the ID image in the request and how to invoke the Analyze ID API. The input document is either in a byte array format or present on an Amazon Simple Storage Service (Amazon S3) object. You pass image bytes to an Amazon Textract API operation by using the Bytes property. For example, you can use the Bytes property to pass a document loaded from a local file system. Image bytes passed by using the Bytes property must be base64 encoded. Your code might not need to encode document file bytes if you’re using an AWS SDK to call Amazon Textract API operations. Alternatively, you can pass images stored in an S3 bucket to an Amazon Textract API operation by using the S3Object property. Documents stored in an S3 bucket don’t need to be base64 encoded.

The following examples show how to call the Amazon Textract AnalyzeID function in Python and use the CLI command.

Sample Python code:

import boto3

textract = boto3.client('textract')

# Call textract AnalyzeId by passing photo on local disk
documentName = "us-driver-license.jpeg"
with open(documentName, 'rb') as document:
    imageBytes = bytearray(document.read())

response = textract.analyze_id(
    DocumentPages=[{"Bytes":imageBytes}]
)

# Call textract AnalyzeId by passing photo on S3
response= textract.analyze_id(
    DocumentPages=[
        {
            "S3Object":{
                "Bucket":"BUCKET_NAME",
                "Name":"PREFIX_AND_FILE_NAME"
            }
        }
    ]
)

Sample CLI command:

aws textract analyze-id --document-pages '[{"S3Object":{"Bucket":"BUCKET_NAME","Name":"PREFIX_AND_FILE_NAME1"}},{"S3Object":{"Bucket":"BUCKET_NAME","Name":"PREFIX_AND_FILE_NAME2"}}]' --region us-east-1

Analyze ID API response

In this section, we explain the Analyze ID response structure using the sample passport image. The following is the sample passport image and the corresponding AnalyzeID response JSON.

Sample abbreviated response

{
  "IdentityDocuments": [
    {
      "DocumentIndex": 1,
      "IdentityDocumentFields": [
        {
          "Type": {
            "Text": "FIRST_NAME"
          },
          "ValueDetection": {
            "Text": "LI",
            "Confidence": 98.9061508178711
          }
        },
        {
          "Type": {
            "Text": "LAST_NAME"
          },
          "ValueDetection": {
            "Text": "JUAN",
            "Confidence": 99.0864486694336
          }
        },
        {
          "Type": {
            "Text": "DATE_OF_ISSUE"
          },
          "ValueDetection": {
            "Text": "09 MAY 2019",
            "NormalizedValue": {
              "Value": "2019-05-09T00:00:00",
              "ValueType": "Date"
            },
            "Confidence": 98.68514251708984
          }
        },
        {
          "Type": {
            "Text": "ID_TYPE"
          },
          "ValueDetection": {
            "Text": "PASSPORT",
            "Confidence": 99.3958740234375
          }
        },
        {
          "Type": {
            "Text": "ADDRESS"
          },
          "ValueDetection": {
            "Text": "",
            "Confidence": 99.62577819824219
          }
        },
        {
          "Type": {
            "Text": "COUNTY"
          },
          "ValueDetection": {
            "Text": "",
            "Confidence": 99.6469955444336
          }
        },
        {
          "Type": {
            "Text": "PLACE_OF_BIRTH"
          },
          "ValueDetection": {
            "Text": "NEW YORK CITY",
            "Confidence": 98.29044342041016
          }
        }
      ]
    }
  ],
  "DocumentMetadata": {
    "Pages": 1
  },
  "AnalyzeIDModelVersion": "1.0"
}

The AnalyzeID JSON output contains AnalyzeIDModelVersion, DocumentMetadata, and IdentityDocuments, and each IdentityDocument item contains IdentityDocumentFields.

The most granular level of data in the IdentityDocumentFields response consists of Type and ValueDetection.

Let’s call this set of data an IdentityDocumentField element. As the preceding example illustrates, each element contains the Type with its Text and Confidence, and the ValueDetection, which includes the Text, the Confidence, and the optional field NormalizedValue.

In the preceding example, Amazon Textract detected 44 key-value pairs, including PLACE_OF_BIRTH: NEW YORK CITY. For the list of fields extracted from identity documents, refer to the Amazon Textract Developer Guide.

In addition to the detected content, the Analyze ID API provides information such as confidence scores for detected elements. It gives you control over how you consume extracted content and integrate it into your applications. For example, you can flag any elements that have a confidence score under a certain threshold for manual review.
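For example, the following minimal sketch flags any Analyze ID fields below an arbitrary confidence threshold for manual review, using the response returned by the analyze_id call shown earlier:

CONFIDENCE_THRESHOLD = 90  # arbitrary example threshold

# 'response' is the AnalyzeID response from the earlier analyze_id call
for document in response['IdentityDocuments']:
    for field in document['IdentityDocumentFields']:
        field_type = field['Type']['Text']
        value = field['ValueDetection']['Text']
        confidence = field['ValueDetection']['Confidence']
        if confidence < CONFIDENCE_THRESHOLD:
            print('Review field {} = "{}" ({:.1f}%)'.format(field_type, value, confidence))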

The following is the Analyze ID response structure for the sample driver’s license image:

Sample abbreviated response

{
  "IdentityDocuments": [
    {
      "DocumentIndex": 1,
      "IdentityDocumentFields": [
        {
          "Type": {
            "Text": "FIRST_NAME"
          },
          "ValueDetection": {
            "Text": "GARCIA",
            "Confidence": 99.48689270019531
          }
        },
        {
          "Type": {
            "Text": "LAST_NAME"
          },
          "ValueDetection": {
            "Text": "MARIA",
            "Confidence": 98.49578857421875
          }
        },
        {
          "Type": {
            "Text": "STATE_NAME"
          },
          "ValueDetection": {
            "Text": "MASSACHUSETTS",
            "Confidence": 98.30329132080078
          }
        },
        {
          "Type": {
            "Text": "DOCUMENT_NUMBER"
          },
          "ValueDetection": {
            "Text": "736HDV7874JSB",
            "Confidence": 95.6583251953125
          }
        },
        {
          "Type": {
            "Text": "EXPIRATION_DATE"
          },
          "ValueDetection": {
            "Text": "01/20/2028",
            "NormalizedValue": {
              "Value": "2028-01-20T00:00:00",
              "ValueType": "Date"
            },
            "Confidence": 98.64090728759766
          }
        },
        {
          "Type": {
            "Text": "DATE_OF_ISSUE"
          },
          "ValueDetection": {
            "Text": "03/18/2018",
            "NormalizedValue": {
              "Value": "2018-03-18T00:00:00",
              "ValueType": "Date"
            },
            "Confidence": 98.7216567993164
          }
        },
        {
          "Type": {
            "Text": "ID_TYPE"
          },
          "ValueDetection": {
            "Text": "DRIVER LICENSE FRONT",
            "Confidence": 98.71986389160156
          }
        },
        {
          "Type": {
            "Text": "PLACE_OF_BIRTH"
          },
          "ValueDetection": {
            "Text": "",
            "Confidence": 99.62541198730469
          }
        }
      ]
    }
  ],
  "DocumentMetadata": {
    "Pages": 1
  },
  "AnalyzeIDModelVersion": "1.0"
}

Process Analyze ID response with the Amazon Textract parser library

You can use the Amazon Textract response parser library to easily parse the JSON returned by Amazon Textract AnalyzeID. The library parses JSON and provides programming language specific constructs to work with different parts of the document.

Install the Amazon Textract Response Parser library:

python -m pip install amazon-textract-response-parser

The following example shows how to deserialize Textract AnalyzeID JSON response to an object:

import json
from trp.trp2_analyzeid import TAnalyzeIdDocumentSchema

# j holds the Textract AnalyzeID response JSON string
t_doc = TAnalyzeIdDocumentSchema().load(json.loads(j))

The following example shows how to serialize a Textract AnalyzeId object to dictionary:

from trp.trp2_analyzeid import TAnalyzeIdDocumentSchema
t_doc = TAnalyzeIdDocumentSchema().dump(t_doc)

Summary

In this post, we provided an overview of the new Amazon Textract AnalyzeID API to quickly and easily retrieve structured data from U.S. government-issued drivers’ licenses and passports. We also described how you can parse the Analyze ID response JSON. For more information, see the Amazon Textract Developer Guide, or check out the developer console and try out Analyze ID API.


About the Authors

Wrick Talukdar is a Senior Solutions Architect with AWS and is based in Calgary, Canada. Wrick works with enterprise AWS customers to transform their business through innovative use of cloud technologies. Outside of work, he enjoys reading and photography.

Lana Zhang is a Sr. Solutions Architect at AWS with expertise in Machine Learning. She is responsible for helping customers architect scalable, secure, and cost-effective workloads on AWS.

Read More

Evolution of Cresta’s machine learning architecture: Migration to AWS and PyTorch

Cresta Intelligence, a California-based AI startup, makes businesses radically more productive by using Expertise AI to help sales and service teams unlock their full potential. Cresta is bringing together world-renowned AI thought-leaders, engineers, and investors to create a real-time coaching and management solution that transforms sales and increases service productivity, weeks after application deployment. Cresta enables customers such as Intuit, Cox Communications, and Porsche to realize a 20% improvement in sales conversion rate, 25% greater average order value, and millions of dollars in additional annual revenue.

This post discusses Cresta’s journey as they moved from a multi-cloud environment to consolidating their machine learning (ML) workloads on AWS. It also gives a high-level view of their legacy and current training and inference architectures. Cresta chose to migrate to using Meta’s PyTorch ML framework due to its ease of use, efficiency, and enterprise adoption. This includes their use of TorchServe for ML inference in production.

Machine learning at Cresta

Cresta uses multiple natural language processing (NLP) models in their production applications. The Suggestions model monitors the conversation between the call center agent and the customer and generates a full form response, which the agent can use to respond to the customer. A second model called Smart Compose predicts the next few words to auto-complete the agent’s response while typing. Cresta also uses other ML models for intent classification and named entity recognition.

Cresta was born in the cloud and initially used multiple public clouds to build architectures to store, manage, and process datasets, and to train and deploy ML models. As Cresta’s development and production workloads grew in size, managing resources, moving data, and maintaining ML pipelines across multiple clouds became increasingly tedious, time-consuming to manage, and added to operational costs. As a result, Cresta took a holistic view of their siloed ML pipelines and chose AWS to host all their ML training and inference workloads.

“Using multiple cloud providers required us to effectively double our efforts on security and compliance, as each cloud provider needed similar effort to ensure strict security limitations,” says Jack Lindamood, Head of Infrastructure at Cresta. “It also split our infrastructure expertise as we needed to become experts in services provided by multiple clouds. We chose to consolidate ML workloads on AWS because of our trust in their commitment to backward-compatibility, history of service availability, and strong customer support on both the account and technical side.”

Multi-cloud environments and workload consolidation

At a high level, the following diagram captures Cresta’s previous architecture spanning two public cloud service providers. The main datasets were hosted and maintained on Amazon Aurora, and training was performed outside AWS, on another cloud service provider, using custom chips. Based on training requirements, a subset of the data would be curated from Aurora, copied to Amazon Simple Storage Service (Amazon S3), then exported out of AWS into the other cloud where Cresta trained their NLP models. The size of data moved each time ranged from 1–100 GB. The ML training pipeline was built around Argo Workflows, which is an open-source workflow engine where each step in a workflow is implemented in a container. Once trained, the models were automatically evaluated for accuracy before manual checks. The models that passed this validation were imported back to AWS, containerized, and deployed into production using Amazon Elastic Kubernetes Service (Amazon EKS). Cresta’s production inference was hosted on AWS.

This approach initially worked well for Cresta when the number of datasets and models was limited and performance requirements were low. As the complexity of their applications grew, Cresta faced multiple challenges managing environments across two cloud providers. Security audits had to be performed on both cloud environments, which prolonged release cycles. Keeping the datasets current while moving large amounts of data and trained models between environments was challenging. It also became increasingly difficult to maintain the system’s architecture: the workflow often broke at the cloud boundaries, and resource partitioning between clouds was difficult to optimize. This multi-cloud complexity prevented Cresta from scaling quickly and cost-effectively.

To overcome these challenges, Cresta decided to consolidate all their ML workloads on AWS. The key drivers for choosing AWS for all development and production ML workloads were the breadth of feature-rich services like Amazon Elastic Compute Cloud (Amazon EC2), Amazon S3, Amazon EKS, EC2 Spot Instances, and databases; the built-in cost-optimization features in these services; native support for ML frameworks like PyTorch; and superior technical support. The AWS team worked closely with Cresta to architect the ML training pipeline with Amazon EKS and Spot Instances, and to optimize model training and inference performance. In addition to developing custom ML models, Cresta uses NLP models from Hugging Face, which are supported on AWS GPU instances out of the box for training and inference. To train these models on AWS, Cresta used P3 instances (based on NVIDIA V100 GPUs) of varying sizes.

As a result of this migration, the teams at Cresta no longer had to worry about managing ML pipelines across separate clouds, thereby significantly improving productivity. The Amazon Aurora PostgreSQL database was integrated into the development pipeline, removing the need to use an intermediate storage system to save results or to export datasets externally. Dataset generation, model training, and inferencing are now all performed on the same cloud environment, which has simplified operations, improved reliability, and reduced the complexity of the build and deploy toolchain.

Model training and validation on AWS

The following figure represents the development and training pipeline after the migration to AWS. The pipeline uses Argo Workflows, an open-source container-native workflow engine for orchestrating parallel jobs in Kubernetes. Argo Workflows is deployed on Amazon EKS in a Multi-AZ model.

For the Suggestions model use case, Cresta trains on chat data, and these datasets are stored in the Aurora database. When a model is ready to be trained, data generation scripts query the database, identify the relevant datasets, and build a snapshot of the data for training. C5.4xlarge instances handle these operations. The preprocessing step converts the dataset into a low-level representation that is ready to be fed to the model. Training language models requires two preprocessing steps: serialization and tokenization. In serialization, structured data is converted to a single stream of characters, producing the final string representation of the data. In tokenization, that serialized string is converted to a vector of integers. Preprocessing data up front helps accelerate the training process and hyperparameter sweeps. To train the Suggestions models, Cresta serializes data during preprocessing; tokenization is handled during the training phase.
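
The following is a minimal sketch of these two preprocessing steps using a Hugging Face tokenizer; the chat records, field names, and model name are hypothetical placeholders rather than Cresta's actual pipeline.

from transformers import AutoTokenizer

# Hypothetical chat turns standing in for records queried from Aurora
chats = [
    {"speaker": "agent", "text": "Hi, how can I help you today?"},
    {"speaker": "customer", "text": "I'd like to upgrade my plan."},
]

# Step 1: serialization - flatten the structured records into a single string
serialized = " ".join(f"<{turn['speaker']}> {turn['text']}" for turn in chats)

# Step 2: tokenization - convert the string into a vector of integer token IDs
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model name
token_ids = tokenizer(serialized, truncation=True, max_length=512)["input_ids"]

print(serialized)
print(token_ids[:10])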

During training, a blind validation of the model is performed after each epoch against a large held-out dataset of past chats. Training continues only while the model keeps improving; otherwise, the run is stopped early, preserving compute resources.
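
As an illustration of this epoch-wise early stopping, the following sketch trains a toy PyTorch model and stops as soon as the validation loss stops improving; the model and data are simple stand-ins for Cresta's NLP models and chat datasets.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real model and the training/validation chat datasets
train_ds = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
val_ds = TensorDataset(torch.randn(64, 16), torch.randn(64, 1))
model = nn.Linear(16, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

best_val_loss, patience, bad_epochs = float("inf"), 1, 0
for epoch in range(10):
    model.train()
    for x, y in DataLoader(train_ds, batch_size=32, shuffle=True):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

    # Blind validation over the held-out set at the end of every epoch
    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(x), y).item()
                       for x, y in DataLoader(val_ds, batch_size=64))

    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs > patience:
            break  # stop early: no improvement, so preserve compute resources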

In the legacy architecture, model training was performed on a custom training chip, followed by a large model validation step to check for accuracy improvement at the end of each epoch. Because the validation dataset was large, model validation couldn’t be performed on the same custom training chip and had to be spread across multiple GPUs. This approach had single points of failure that could stall the training job, because the process required launching asynchronous threads to monitor the validation step and periodically poll for completion. Using the same hardware accelerator for both training and validation allows this process to be managed seamlessly. After the training and validation steps complete, the training results are verified manually before the model is deployed to the production environment.

To optimize compute costs for the training process, Cresta used EC2 Spot Instances, which are spare Amazon EC2 capacity available at up to a 90% discount compared to On-Demand pricing. For production inference workloads, Cresta uses G4dn instances, which are the industry’s most cost-effective and versatile GPU instances for deploying ML models such as image classification, object detection, and speech recognition. To minimize Spot interruptions, Cresta uses a launch template that specifies multiple instance sizes, including g4dn.xlarge and g4dn.2xlarge. Cresta uses checkpoints and dataset loading from Amazon S3 so that model training can restart from the point of interruption. This makes it possible to train models efficiently with EC2 Spot Instances, which can be reclaimed with a two-minute notice.
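
The following sketch shows the checkpointing pattern that makes Spot interruptions tolerable: upload a checkpoint to Amazon S3 after each epoch and resume from the most recent one at startup. The bucket name, key prefix, and checkpoint layout are hypothetical, not Cresta's actual pipeline.

import boto3
import torch

s3 = boto3.client("s3")
BUCKET = "example-training-bucket"        # hypothetical bucket
PREFIX = "suggestions-model/checkpoints"  # hypothetical key prefix

def save_checkpoint(model, optimizer, epoch):
    # Write locally, then upload so a reclaimed Spot Instance loses at most one epoch
    path = f"/tmp/checkpoint-{epoch}.pt"
    torch.save({"epoch": epoch,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict()}, path)
    s3.upload_file(path, BUCKET, f"{PREFIX}/checkpoint-{epoch}.pt")

def load_latest_checkpoint(model, optimizer):
    # Resume from the most recent checkpoint in S3, if one exists
    objects = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX).get("Contents", [])
    if not objects:
        return 0  # no checkpoint yet; start from epoch 0
    latest_key = max(objects, key=lambda obj: obj["LastModified"])["Key"]
    s3.download_file(BUCKET, latest_key, "/tmp/resume.pt")
    state = torch.load("/tmp/resume.pt", map_location="cpu")
    model.load_state_dict(state["model_state"])
    optimizer.load_state_dict(state["optimizer_state"])
    return state["epoch"] + 1  # next epoch to run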

Model inference on AWS

The trained models are stored on Amazon S3 and are served using PyTorch TorchServe on an Amazon EKS cluster with G4dn (NVIDIA T4 GPU) instances. The cluster is deployed across multiple Availability Zones, and the node groups include GPUs to enable high-throughput, low-latency inference. The model server pods are deployed on these nodes and are horizontally scaled to meet the throughput requirements of any given customer. As the models are retrained, the pods are restarted to pick up and serve the latest models. One Amazon EKS cluster serves all customers, and customers are logically separated by Kubernetes namespace.
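
To illustrate the serving side, the following sketch sends an inference request to TorchServe's standard predictions endpoint (POST /predictions/<model_name> on port 8080); the host name, model name, and payload shape are hypothetical placeholders.

import requests

# TorchServe exposes each registered model at /predictions/<model_name> (port 8080 by default)
TORCHSERVE_URL = "http://model-server.internal:8080/predictions/suggestions"  # hypothetical

payload = {"conversation": "Customer: I'd like to upgrade my plan."}

response = requests.post(TORCHSERVE_URL, json=payload, timeout=2.0)
response.raise_for_status()
print(response.json())  # model output, e.g. suggested responses for the agent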

Migration to PyTorch

To support the growing capabilities of their products, Cresta needed to use and fine-tune newer NLP models faster. PyTorch, being popular among the research community, drives much of the innovation in NLP and natural language understanding (NLU) areas. Cresta handpicks NLP models from Hugging Face to retool and fine-tune for reuse, and most models available are based on PyTorch. Lastly, Cresta’s ML teams found PyTorch to be simpler than other frameworks to learn, ramp up, and build on.

“We are moving to PyTorch because most research in the NLP world is migrating to PyTorch,” says Saurabh Misra, AI Lead at Cresta. “A large ecosystem around PyTorch, like the Hugging Face library, enables us to quickly utilize the latest advancements in NLP without rewriting code. PyTorch is also very developer friendly and allows us to develop new models quickly with its ease of use, model debuggability, and support for efficient deployments.”

For these reasons, Cresta chose to migrate all their ML workloads to PyTorch for model training and inference, in line with the broader industry trend. Specifically, Cresta runs parallel training on four GPUs using torch.nn.DataParallel through the Hugging Face Trainer. Before adopting PyTorch, Cresta had to develop custom implementations of parallel training; PyTorch removed this requirement because it provides a variety of training backends and methods essentially for free. For large-scale inference in production, Cresta uses TorchServe as the model server because of its ease of use and out-of-the-box model monitoring, which helps with auto scaling the deployment according to traffic.
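
The following minimal sketch shows this setup with the Hugging Face Trainer: when multiple GPUs are visible, the Trainer wraps the model in torch.nn.DataParallel on its own, so no custom parallel-training code is needed. The model name and toy dataset are placeholders rather than Cresta's actual training job.

import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labeled examples standing in for Cresta's chat datasets
data = Dataset.from_dict({"text": ["I want to upgrade my plan", "Cancel my subscription"] * 8,
                          "label": [1, 0] * 8})
data = data.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                        padding="max_length", max_length=32),
                batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=4)

# With 4 GPUs visible, Trainer wraps the model in torch.nn.DataParallel automatically;
# on a single GPU or CPU the same code simply trains without parallelism.
trainer = Trainer(model=model, args=args, train_dataset=data)
trainer.train()
print("GPUs visible:", torch.cuda.device_count())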

Conclusion and next steps

In this post, we discussed how Cresta moved from a multi-cloud environment to consolidating their ML workloads on AWS. By moving all development and production ML workloads to AWS, Cresta is able to streamline efforts, better optimize for cost, and take advantage of the breadth and depth of AWS services. To further improve performance and cost-effectiveness, Cresta is investigating the following topics:

  • Pack multiple models into a single chip using bin-packing for optimal use of resources (memory and compute). This also helps with A/B tests on model performance.
  • Deploy models for inference using AWS Inferentia as a way to improve inference performance while keeping costs low.
  • Investigate different ways of static compilation of model graphs to reduce the compute required during inference. This will further improve the cost-effectiveness of Cresta’s deployments.

To dive deeper into developing scalable ML architectures with Amazon EKS, refer to these two reference architectures: distributed training with TorchElastic and serving 3,000 models on EKS with AWS Inferentia.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.


About the Authors

Jaganath Achari is a Sr. Startup Solutions Architect at Amazon Web Services based out of San Francisco. He focuses on providing technical guidance to startup customers, helping them architect and build secure and scalable solutions on AWS. Outside of work, Jaganath is an amateur astronomer with an interest in deep sky astrophotography.

Sundar Ranganathan is the Head of Business Development, ML Frameworks on the Amazon EC2 team. He focuses on large-scale ML workloads across AWS services like Amazon EKS, Amazon ECS, Elastic Fabric Adapter, AWS Batch, and Amazon SageMaker. His experience includes leadership roles in product management and product development at NetApp, Micron Technology, Qualcomm, and Mentor Graphics.

Mahadevan Balasubramaniam is a Principal Solutions Architect for Autonomous Computing with nearly 20 years of experience in the area of physics-infused deep learning, building, and deploying digital twins for industrial systems at scale. Mahadevan obtained his PhD in Mechanical Engineering from the Massachusetts Institute of Technology and has over 25 patents and publications to his credit.

Saurabh Misra is a Staff Machine Learning Engineer at Cresta. He currently works on creating conversational technologies to make customer care organizations highly effective and efficient. Outside of work, he loves to play the drums and read books.

Jack Lindamood is the Head of Infrastructure at Cresta. In his spare time, he enjoys basketball and watching Esports.

Read More