
Alexa scientists discuss relevant work in the field of conversational AI

October 29, 2020

by admin Amazon AWS

Watch the replay of the Interspeech 2020 industry forum session.

Amazon Consumer Science Summit goes virtual

October 29, 2020

by admin Amazon AWS

COVID-19-induced trend toward virtual conferences may change how science is conducted.

Building a real-time conversational analytics platform for Amazon Lex bots

October 29, 2020

by Shanthan Kesharaju Amazon AWS

Conversational interfaces like chatbots have become an important channel for brands to communicate with their customers, partners, and employees. They offer faster service, 24/7 availability, and lower service costs. By analyzing your bot’s customer conversations, you can discover challenges in user experience, trending topics, and missed utterances. These additional insights can help you identify how to improve your bot and user engagement continuously. Whether you’re a product owner looking for user engagement insights or a conversation designer wanting to review missed utterances, a conversational analytics dashboard plays a vital role in serving these needs.

In this post, we build a real-time conversational analytics solution using the conversational logs from Amazon Lex. Amazon Lex is a service for building conversational interfaces into any application using voice and text. We use Amazon QuickSight to create a dashboard to visualize business KPIs, identify trends, and provide training data for bots to learn from their past failures. Some of the metrics we cover in this post include:

Daily summary statistics
User adoption
Intent and utterance metrics
Conversation review
Sentiment analysis

Solution architecture

The following diagram illustrates the architecture of our solution.

The solution streams the conversation logs from Amazon CloudWatch to Amazon Kinesis Data Streams. A stream consumer (an AWS Lambda function) transforms the data and writes it to an Amazon Aurora database, which serves as the analytics store.
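The decoding step of such a stream consumer can be sketched as follows. This is a minimal illustration, not the Lambda function deployed by the post's template. It assumes the usual event shape for Kinesis-triggered Lambda functions (base64-encoded record data) and the usual CloudWatch Logs subscription payload (gzip-compressed JSON with a logEvents list). The function name extract_log_entries is hypothetical, and each log message is assumed to be one JSON conversation log entry.

```python
import base64
import gzip
import json


def extract_log_entries(event):
    """Return the parsed conversation log entries in a Kinesis-triggered Lambda event."""
    entries = []
    for record in event.get("Records", []):
        # Lambda delivers Kinesis record data base64-encoded.
        compressed = base64.b64decode(record["kinesis"]["data"])
        # CloudWatch Logs subscription payloads are gzip-compressed JSON.
        payload = json.loads(gzip.decompress(compressed))
        # Control messages carry no log data, so skip them.
        if payload.get("messageType") != "DATA_MESSAGE":
            continue
        for log_event in payload.get("logEvents", []):
            # Assumption: each message is a single JSON conversation log entry.
            entries.append(json.loads(log_event["message"]))
    return entries
```

A real consumer would then map each entry to a database row and write it to the analytics store.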

Depending on your project’s scale and your organizational needs and preferences, you may want to look into a data warehousing solution like Amazon Redshift or use Amazon Athena and Amazon Simple Storage Service (Amazon S3). For more information, see Building a business intelligence dashboard for your Amazon Lex bots.

We use the Aurora connector in QuickSight to pull in the data, create datasets and analysis, and publish a conversation analytics dashboard. QuickSight lets you easily create and publish interactive dashboards. You can choose from an extensive library of visualizations, charts, and tables, and add interactive features such as drill-downs and filters.

Solution overview

For this post, we created an Amazon Lex bot using the sample OrderFlowers blueprint. The default sample only comes with one intent: OrderFlowers. To make the analytics more interesting, we added custom intents like BusinessHoursIntent, OffersIntent, and MyFallbackIntent. For the export of this bot, download OrderFlowers.zip. You can import this file into your Amazon Lex console or use your own Amazon Lex bot.

To implement the solution, we need to complete the following tasks:

Enable the conversation logs feature for your Amazon Lex bot.
Create a Kinesis data stream and make it a subscriber to the CloudWatch log group created by the AWS CloudFormation template.
Create an Aurora database to store the conversation log data.
Create a Lambda function and subscribe it to listen to the data stream. The Lambda function extracts the data from the stream and writes it to the Aurora database.
Set up QuickSight to consume data from the Aurora database.
Create datasets and analysis, and publish the dashboard in QuickSight.

Deploying the CloudFormation template

The CloudFormation template deploys the following resources:

An AWS Identity and Access Management (IAM) role to allow Amazon Lex to stream to CloudWatch Logs
A CloudWatch log group
A CloudWatch subscription filter
A Kinesis data stream and its associated IAM role
A Kinesis data stream consumer
A Lambda function for object construction and its associated IAM role
A serverless Aurora RDS cluster and its associated security group
A security group for QuickSight access to Amazon Relational Database Service (Amazon RDS)
An AWS Secrets Manager secret with Amazon RDS information
A fresh VPC for the Aurora cluster
Two subnets in the generated VPC
A DB Subnet Group comprised of the two subnets

Complete the following steps:

Deploy the template by choosing Launch Stack:

Give your stack a unique name.
Customize AWS CloudFormation deployment as needed.
Deploy the template. This deployment should take approximately 5 minutes to complete.
Navigate to the Outputs tab of the CloudFormation stack and take note of the following values to use later:
1. SecretARN
2. QuickSightSecurityGroupID
3. RDSEndpoint
4. RDSPort

Enabling the conversation logs option in your Amazon Lex bot

Conversation logs are generated when communicating with a Lex bot on an associated alias. Make sure that the AWS CloudFormation deployment is complete before attempting this step.

On the Amazon Lex console, open your bot page and make sure the bot has been built and published.
On the Settings tab, choose Conversation Logs.
Publish an alias if you haven’t done so already by choosing the one you want and choosing the Settings

You’re prompted to select the log type, the CloudWatch log group, and IAM role on the next page.

For Log Type, select Text logs.
For Log Group, choose [STACK-NAME]-LexAnalyticsLogGroup-[RANDOM-STRING].
For IAM Role, choose [STACK-NAME]-LexAnalyticsToCWLRole-[RANDOM-STRING].

You now create the FlowersLogs table in Amazon RDS.

On the Amazon RDS console, navigate to the cluster created by the CloudFormation stack ([STACK-NAME]-orderflowersrds-[RANDOM-STRING]).
Choose Query Editor.
Select your RDS cluster.
Choose Connect with a Secrets Manager ARN.
Enter the SecretARN from the Outputs tab of the CloudFormation stack.
Connect to the database and run the following query to create the table:

CREATE TABLE LexAnalyticsDB.FlowersLogs (
  `id` mediumint(9) NOT NULL AUTO_INCREMENT,
  `botName` varchar(50) DEFAULT NULL,
  `botAlias` varchar(50) DEFAULT NULL,
  `botVersion` int(11) DEFAULT NULL,
  `inputTranscript` varchar(255) DEFAULT NULL,
  `botResponse` varchar(255) DEFAULT NULL,
  `intent` varchar(100) DEFAULT NULL,
  `slots` varchar(255) DEFAULT NULL,
  `missedUtterance` BOOLEAN DEFAULT NULL,
  `inputDialog` varchar(50) DEFAULT NULL,
  `requestId` varchar(255) DEFAULT NULL,
  `userId` varchar(100) DEFAULT NULL,
  `sessionId` varchar(255) DEFAULT NULL,
  `tmstmp` timestamp(2) NULL DEFAULT NULL,
  `sentiment` varchar(50) DEFAULT NULL,
  `topic` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2865933 DEFAULT CHARSET=latin1;
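To illustrate how a conversation log entry could be mapped onto this table, here is a hedged sketch that builds a parameterized INSERT statement and a matching row tuple. The entry keys (for example inputDialogMode and timestamp) are assumptions about the log format, the %s placeholders assume a MySQL driver that uses that parameter style, and to_row is a hypothetical helper rather than code from the post.

```python
import json

# Columns filled from the log entry; sentiment and topic are left NULL in this sketch.
COLUMNS = (
    "botName", "botAlias", "botVersion", "inputTranscript", "botResponse",
    "intent", "slots", "missedUtterance", "inputDialog", "requestId",
    "userId", "sessionId", "tmstmp",
)

# Parameterized statement: values are passed separately, never interpolated.
INSERT_SQL = "INSERT INTO LexAnalyticsDB.FlowersLogs ({}) VALUES ({})".format(
    ", ".join("`{}`".format(name) for name in COLUMNS),
    ", ".join(["%s"] * len(COLUMNS)),
)


def to_row(entry):
    """Map one log entry (a dict) to a tuple ordered like COLUMNS."""
    version = str(entry.get("botVersion", ""))
    slots = entry.get("slots")
    return (
        entry.get("botName"),
        entry.get("botAlias"),
        # The column is an int, so non-numeric versions are stored as NULL.
        int(version) if version.isdigit() else None,
        entry.get("inputTranscript"),
        entry.get("botResponse"),
        entry.get("intent"),
        # Slots are serialized to JSON text to fit the varchar column.
        json.dumps(slots) if slots is not None else None,
        bool(entry.get("missedUtterance", False)),
        entry.get("inputDialogMode"),   # assumed key name in the log entry
        entry.get("requestId"),
        entry.get("userId"),
        entry.get("sessionId"),
        # Passed through unchanged; convert to a MySQL-compatible format if needed.
        entry.get("timestamp"),         # assumed key name in the log entry
    )
```

With a database cursor, the pair would typically be used as cursor.execute(INSERT_SQL, to_row(entry)).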

If you don’t have a client generating data on the Amazon Lex alias, you can generate test data using the aws-lex-web-ui deployment.

Navigate to the aws-lex-web-ui GitHub repo.
In the Getting Started section, choose Launch Stack for the Region you want to build in.
For BotName, enter the name of your bot.
For BotAlias, enter the alias of your bot.
Keep the other settings at their default; they should be sufficient in generating sample data.
Choose Create stack.
When the stack is built, on the Outputs page, choose the link for WebAppUrl.

You can now use this page to generate traffic for your bot.

Configuring QuickSight access

For this post, we assume that you’re starting from scratch and haven’t signed up for QuickSight.

Create a VPC Connection in Amazon QuickSight

On the QuickSight console, choose Sign up for QuickSight.

Keep the default settings, and make sure that you deploy in the same region where you deployed your CloudFormation stack.
On the Settings page, on the Manage VPC connections tab, choose Add VPC connection.

Enter a connection name.
Choose the same VPC you deployed your RDS instance into.
Choose any subnet in the VPC.
For Security Group ID, enter the QuickSightSecurityGroupID value from the Outputs tab of the CloudFormation stack.
Choose Create.

Create a Dataset in QuickSight

Go to the Resources tab of your CloudFormation stack.
Navigate to the LexAnalyticsSecret resource and choose the blue link to the resource.
Choose Retrieve secret value.
Copy the username and password.
On the QuickSight console, choose Manage Data.
Choose New Data Set.
For Data source, choose Aurora.
Enter a name for your data source.
For Connection type, select the connection you created in the previous section.
For Database connector, choose MySQL.
For the server and port, use the RDSEndpoint and RDSPort fields from the CloudFormation stack Outputs.
For Database name, enter LexAnalyticsDB.
Enter the username and password for the RDS instance that you copied earlier.
Choose Create data source.
Select the FlowersLogs table.
Import to SPICE.

Configuring QuickSight visuals

You have an assortment of pivots to base the analytical dashboard on, depending on the use case you’re targeting: summary view, trend analysis, user level, intent level, utterance level, conversation review, and sentiment analysis.

A summary view can help you compare and contrast the number of users, sessions, and utterances between the current day and the previous day, or the current hour and the previous hour.
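As a rough illustration of the numbers behind such a summary view, the following sketch counts distinct users, distinct sessions, and utterances per day and computes a day-over-day percentage change. It is plain Python over in-memory rows, not a QuickSight feature; the function names and the (user, session, timestamp) row shape are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_summary(rows):
    """rows: iterable of (user_id, session_id, timestamp), one per utterance."""
    users = defaultdict(set)
    sessions = defaultdict(set)
    utterances = defaultdict(int)
    for user_id, session_id, timestamp in rows:
        day = timestamp.date()
        users[day].add(user_id)
        sessions[day].add(session_id)
        utterances[day] += 1
    return {
        day: {
            "users": len(users[day]),
            "sessions": len(sessions[day]),
            "utterances": utterances[day],
        }
        for day in utterances
    }


def percent_change(summary, current_day, previous_day, metric):
    """Percentage change of a metric between two days, or None without a baseline."""
    previous = summary.get(previous_day, {}).get(metric, 0)
    current = summary.get(current_day, {}).get(metric, 0)
    if previous == 0:
        return None
    return (current - previous) / previous * 100
```

For example, two utterances yesterday and three today give a 50% increase in utterances.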

A trend analysis of sessions, users, and utterances can help you spot anomalies and cyclical patterns.

User-level metrics measure which users are adopting the chatbots more regularly versus users who are not. You can use this data in conjunction with persona data to segment users to create personalized experiences.

Intent-level metrics help identify the top N intents, which improves staffing decisions at the contact centers serving phone and chat channels. When deciding to prune a bot’s intent structure, you can use these metrics to remove the bottom N intents that don’t serve significant traffic.

Utterance-level metrics help you identify missed utterances and group them by phrases. You can either add the utterances with high counts to the existing intents or create new intents if those utterances don’t already fit into the existing intents.
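A simple way to think about grouping missed utterances is to normalize the text and count repeats. The sketch below does this in plain Python; the (text, missed) row shape and the function name are assumptions for illustration, and real phrase grouping may need fuzzier matching than exact normalized text.

```python
from collections import Counter


def top_missed_utterances(rows, n=3):
    """rows: iterable of (input_transcript, missed_utterance) pairs."""
    counts = Counter(
        # Lowercase and collapse whitespace so near-identical phrases group together.
        " ".join(text.lower().split())
        for text, missed in rows
        if missed and text
    )
    return counts.most_common(n)
```

Utterances that rise to the top of this list are candidates for new sample utterances or new intents.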

Conversation review helps you look at the entire conversation between the user and the bot.

Sentiment analysis helps you learn your users’ overall sentiment concerning their experience with the bot. Reviewing conversations that received negative sentiment helps you identify the root cause.

Conclusion

Whether you’re a product owner, conversation designer, developer, or data scientist, conversational analytics are pivotal to understanding user adoption and teaching your bot to learn from its past mistakes. This post covered how to use conversation logs and QuickSight to capture useful insights from user conversations and visualize them. Get started with Amazon Lex and start building a customized analytics dashboard for your conversation logs.

About the Authors

Shanthan Kesharaju is a Senior Architect who helps our customers with AI/ML strategy and architecture. Shanthan has an MBA in Marketing from Duke University and an MS in Management Information Systems from Oklahoma State University.

Blake DeLee is a Rochester, NY-based conversational AI consultant with AWS Professional Services. He has spent five years in the field of conversational AI and voice, and has experience bringing innovative solutions to dozens of Fortune 500 businesses. Blake draws on a wide-ranging career in different fields to build exceptional chatbot and voice solutions.

Amazon’s new research on automatic speech recognition

October 29, 2020

by admin Amazon AWS

Interspeech papers include novel approaches to speaker identification and the training of end-to-end speech recognition models.

Configuring your Amazon Kendra Confluence Server connector

October 28, 2020

by Ben Snively Amazon AWS

Many builders and teams on AWS use Confluence as a way of collaborating and sharing information within their teams and across their organizations. These types of workspaces are rich with data and contain sets of knowledge and information that can be a great source of truth to answer organizational questions.

Unfortunately, it isn’t always easy to tap into these data sources to extract the information you need. For example, the data source might not be connected to an enterprise search service within the organization, or the service is outdated and lacks natural language search capabilities, leading to poorer search experiences.

Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization.

Amazon Kendra lets you easily add data sources using a wide range of connector types, so you can use its intelligent search capabilities to search your content repositories. Amazon Kendra maintains document access rights and automatically syncs with your index to make sure you’re always searching the most up-to-date content.

In this post, we walk through the process of setting up your Amazon Kendra connector for Confluence Server.

Prerequisites

The post assumes that you have Confluence set up and an index created in Amazon Kendra. For instructions on setting up your index, see Creating an index.

Creating the Confluence connector

To set up your Confluence connector, complete the following steps:

On the Amazon Kendra console, navigate to your index and choose Add data sources.

From the list of available connectors, choose Confluence Server.
Choose Add connector.

Next, we need to specify the data source details.

For Data source name, enter a name.
For Description, enter an optional description.

The next step is data access and security.

For Confluence URL, enter the URL to your Confluence site.

If your site is running in a private VPC, you must configure Amazon Kendra to access your VPC resources.

In the Set authentication section, for Type of authentication, you can choose to create new authentication credentials or use an existing one. (For this post, we choose New.)
For Secret name, enter a name.
For User name, enter your Confluence account user name.
For Password, enter a password.

This information is stored in AWS Secrets Manager.

In the Set IAM role section, choose the AWS Identity and Access Management (IAM) role that Amazon Kendra uses to crawl your Confluence data and update the index.

At minimum, the role should have permission to create and update indexes in Amazon Kendra and read your Confluence credentials from Secrets Manager.

In the Configure sync settings section, you set up your index sync options.

For Set sync scope, choose to include or exclude specific Confluence workspaces.
For Set sync run schedule, choose the schedule you want for your sync jobs. Each data source can have its own update schedule.

Custom attributes allow you to add additional metadata to your documents in the index. For example, you can create a custom attribute called Department with values HR, Sales, and Manufacturing. You can apply these attributes to your documents so that you can limit the response to documents in the HR department, for example.

In the field mapping section, you can choose the mappings of Confluence fields to Amazon Kendra fields in the index. You can update required fields, recommended fields, and additional suggested field mappings.

Review your settings summary to check if everything looks okay and choose Add data source.

Starting the Confluence connector manually

After you create your data source, you can start the sync process manually by choosing Sync now.

When the sync job is complete, the status shows as Succeeded.

Testing the results

After the sync job is complete, you can search in many different ways. For this post, we walk through using the Amazon Kendra console to test the results. For more information, see Querying an index (console).

In the navigation pane, choose Search console.

Now you can search the index.
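Outside the console, the same index can also be queried programmatically. The following is a minimal sketch that assumes a boto3 Amazon Kendra client is passed in and that the index has a custom attribute named Department, as in the earlier example; the function name and the shape of the returned tuples are choices made for illustration.

```python
def search_index(kendra_client, index_id, question, department=None):
    """Query a Kendra index and return (title, excerpt) pairs."""
    kwargs = {"IndexId": index_id, "QueryText": question}
    if department:
        # Restrict results to documents tagged with the given custom attribute value.
        kwargs["AttributeFilter"] = {
            "EqualsTo": {"Key": "Department", "Value": {"StringValue": department}}
        }
    response = kendra_client.query(**kwargs)
    return [
        (
            item.get("DocumentTitle", {}).get("Text", ""),
            item.get("DocumentExcerpt", {}).get("Text", ""),
        )
        for item in response.get("ResultItems", [])
    ]
```

A caller would typically create the client with boto3.client("kendra") and pass the index ID shown on the Amazon Kendra console.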

Conclusion

In this post, we walked through the process of creating and running the Confluence Server data source connector. This connector enables you to connect to a Confluence data source, specify which areas to crawl, and define how field metadata is processed, among other key functions.

By doing this, you can use the intelligent search capabilities of Amazon Kendra, powered by ML, on your Confluence Server content. To see a full list of data sources currently supported by Amazon Kendra, see Data sources.

About the Authors

Ben Snively is an AWS Public Sector Specialist Solutions Architect. He works with government, non-profit, and education customers on big data/analytical and AI/ML projects, helping them build solutions using AWS.

Sam Palani is an AI/ML Specialist Solutions Architect at AWS. He works with public sector customers to help them architect and implement machine learning solutions at scale. When not helping customers, he enjoys long hikes, unwinding with a good book, listening to his classical vinyl collection and hacking projects with Raspberry Pi.

How Alexa scientists are advancing speech science

October 28, 2020

by admin Amazon AWS

Watch as four Amazon Alexa scientists talk about current state, new developments, and recent announcements surrounding advancements in Alexa speech technologies.

New sound detection approach improves on state of the art

October 28, 2020

by admin Amazon AWS

Knowledge distillation technique for shrinking neural networks yields relative performance increases of up to 122%.

Optimizing costs for machine learning with Amazon SageMaker

October 27, 2020

by BK Chaurasiya Amazon AWS

Applications based on machine learning (ML) can provide tremendous business value. Using ML, we can solve some of the most complex engineering problems that previously were infeasible. One of the advantages of running ML on the AWS Cloud is that you can continually optimize your workloads and reduce your costs. In this post, we discuss how to apply such optimization to ML workloads. We consider available options such as elasticity, different pricing models in cloud, automation, advantage of scale, and more.

Developing, training, maintaining, and performance tuning ML models is an iterative process that requires continuous improvement. Determining the optimum state in the model while going through the permutations and combinations of model parameters and data dependencies to adjust is just one leg of the journey. There is more to optimizing the cost of ML than just algorithm performance and model tuning. There is also some effort required to integrate developed models into applications and realize their benefits. Throughout this process, you can keep the cost down in numerous ways. Amazon SageMaker has made most of this journey smooth so developers and data scientists can spend most of their time focusing on what matters the most—delivering business value.

Amazon SageMaker notebook instances

An Amazon SageMaker notebook instance is an ML compute instance running the Jupyter Notebook app. This notebook instance comes with sample notebooks, several optimized algorithms, and complete code walkthroughs. Amazon SageMaker manages the creation of this instance and related resources. Consider using Amazon SageMaker Studio notebooks for collaborative workloads and when you don’t need to set up compute instances and file storage beforehand.

You can follow these best practices to help reduce the cost of notebook instances.

GPU or CPU?

CPUs are best at handling single, more complex calculations sequentially, whereas GPUs are better at handling multiple but simple calculations in parallel. For many use cases, a standard current-generation instance type from a family such as ml.m* provides enough computing power, memory, and network performance for Jupyter notebooks to perform well. GPUs provide a great price/performance ratio if you take advantage of them effectively. However, GPUs also cost more, and you should choose GPU-based notebooks only when you really need them.

Ask yourself: Is my neural network relatively small scale? Is my network performing tons of calculations involving hundreds of thousands of parameters? Can my model take advantage of the hardware parallelism offered by instance families such as P3 and P3dn?

Depending on the model, the GPU communication overhead might even degrade performance. So, take a step back and start with what you think is the minimum requirement in terms of ml instance specification and work your way up to identifying the best instance type and family for your model.

If you’re using your notebook instance to train multiple jobs, decide when you need a GPU-enabled instance and when you don’t. If you need accelerated computing in your notebook environment, you can stop your m* family notebook instance, switch to a GPU-enabled P* family instance, and start it again. Don’t forget to switch it back when you no longer need that extra boost in your development environment.

If you’re using massive datasets for training and don’t want to wait for days or weeks to finish your training job, you can speed up the process by distributing training on multiple machines or processes in a cluster.

It’s recommended to use a small subset of your data for development in your notebook instance. You can use the full dataset for a training job that is distributed across optimized instances such as P2 or P3 GPU instances or an instance with a powerful CPU, such as c5.

Maximize instance utilization

You can optimize your Amazon SageMaker notebook utilization in many different ways. One simple way is to stop your notebook instance when you’re not using it and start it when you need it. Consider auto-detecting idle notebook instances and managing their lifecycle using a lifecycle configuration script. For detailed implementation, see Right-sizing resources and avoiding unnecessary costs in Amazon SageMaker. Remember that the instance is only useful when you’re using the Jupyter notebook. If you’re not working on a notebook overnight or over the weekend, it’s a good idea to schedule a stop and start. Another way to save instance cost is by scheduling an AWS Lambda function. For example, you can stop all instances at 7:00 PM and start them at 7:00 AM.
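A scheduled stop could look roughly like the following sketch of a Lambda handler. It assumes the function's IAM role is allowed to list and stop notebook instances and that a schedule (for example, a rule that fires at 7:00 PM) invokes it; the helper name stop_running_notebooks is hypothetical, and the client is passed in so the logic can be exercised without AWS access.

```python
def stop_running_notebooks(sagemaker_client):
    """Stop every notebook instance that is currently InService; return their names."""
    stopped = []
    kwargs = {"StatusEquals": "InService"}
    while True:
        page = sagemaker_client.list_notebook_instances(**kwargs)
        for instance in page.get("NotebookInstances", []):
            name = instance["NotebookInstanceName"]
            sagemaker_client.stop_notebook_instance(NotebookInstanceName=name)
            stopped.append(name)
        # Follow pagination until no further token is returned.
        token = page.get("NextToken")
        if not token:
            return stopped
        kwargs["NextToken"] = token


def lambda_handler(event, context):
    # Imported here so the helper above can be tested without boto3 installed.
    import boto3

    return {"stopped": stop_running_notebooks(boto3.client("sagemaker"))}
```

A matching start function would follow the same pattern with a filter on stopped instances and a call that starts each one.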

You can also use Amazon CloudWatch Events to start and stop the instance based on an event. If you’re feeling geeky, connect it to your Amazon Rekognition based system to start a data scientist’s notebook instance when they step into the office or have Amazon Alexa do it as you grab a coffee.

Training jobs

The following are some best practices for saving costs on training jobs.

Use pre-trained models or even APIs

Pre-trained models eliminate the time spent gathering data and training models with that data. Consider using higher-level APIs, such as those provided by Amazon Rekognition or Amazon Comprehend, to avoid spending on tasks that are already done for you. As an example, Amazon Comprehend simplifies topic modeling on a large corpus of documents. You can also use the Neural Topic Model (NTM) algorithm in Amazon SageMaker to get similar results with more effort. Although you have more control over hyperparameters when training your own model, your use case may not need it. A lot of engineering work and experience goes into creating ready-to-consume and highly optimized models, so an upfront ROI analysis is highly recommended if you’re embarking on a journey to develop similar models.

Use Pipe mode (where applicable) to reduce training time

Certain algorithms in Amazon SageMaker, such as BlazingText, work on a large corpus of data. When these jobs are launched in the default File mode, significant time goes into downloading the data from Amazon Simple Storage Service (Amazon S3) to the local Amazon Elastic Block Store (Amazon EBS) volume. Your training jobs don’t start until this download finishes. These algorithms can take advantage of Pipe mode, in which training data is streamed from Amazon S3 directly to the training algorithm, so your training jobs can start without waiting for the full download. For example, training BlazingText on Common Crawl (3 TB) can take a few days, of which a significant number of hours are lost to the download. Pipe mode can reduce that training time significantly.

Managed spot training in Amazon SageMaker

Managed spot training can reduce the cost of training models by up to 90% compared with On-Demand Instances. Amazon SageMaker manages the Spot interruptions on your behalf. If your training job can be interrupted, use managed spot training. You can specify which training jobs use Spot Instances and a stopping condition that specifies how long Amazon SageMaker waits for a job to run using EC2 Spot Instances.

You may also consider using EC2 Spot Instances if you’re willing to do some extra work and if your algorithm is resilient enough to interruptions. For more information, see Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs.
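To make the managed spot settings and the resulting savings concrete, here is a hedged sketch. The first helper builds the spot-related fields of a training job request (the rest of the request is omitted), and the second computes savings from the training time and billable time reported for a finished job. The helper names and sample numbers are illustrative assumptions.

```python
def spot_training_settings(max_runtime_seconds, max_wait_seconds, checkpoint_s3_uri):
    """Build the spot-related fields of a training job request."""
    # The wait time covers both waiting for Spot capacity and running the job,
    # so it must not be shorter than the maximum runtime.
    if max_wait_seconds < max_runtime_seconds:
        raise ValueError("max_wait_seconds must be >= max_runtime_seconds")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_seconds,
            "MaxWaitTimeInSeconds": max_wait_seconds,
        },
        # Checkpoints let an interrupted job resume instead of starting over.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


def spot_savings_percent(training_seconds, billable_seconds):
    """Savings relative to paying for the full training time."""
    return round((1 - billable_seconds / training_seconds) * 100, 1)
```

For example, a job that trained for 3,600 seconds but was billed for 900 seconds saved 75%.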

Test your code locally

Resolve issues with code and data so you don’t need to pay to run training clusters for failed training jobs. This also saves you time spent initializing the training cluster. Before you submit a training job, try to run the fit function in local mode to fetch some early feedback:

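from sagemaker.mxnet import MXNet  # parameter names below follow SageMaker Python SDK v1; v2 renames them to instance_type and instance_count
mxnet_estimator = MXNet('train.py', train_instance_type='local', train_instance_count=1)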

Monitor the performance of your training jobs to identify waste

Amazon SageMaker is integrated with CloudWatch out of the box and publishes instance metrics of the training cluster in CloudWatch. You can use these metrics to see if you should make adjustments to your cluster, such as CPUs, memory, number of instances, and more. To view the CloudWatch metric for your training jobs, navigate to the Jobs page on the Amazon SageMaker console and choose View Instance metrics in the Monitor section.

Also, use Amazon SageMaker Debugger, which provides full visibility into model training by monitoring, recording, analyzing, and visualizing training process tensors. Debugger can dramatically reduce the time, resources, and cost needed to train models.

Find the right balance: Performance vs. accuracy

Compare the throughput of 16-bit floating point and 32-bit floating point calculations and determine what is right for your model. 32-bit (single precision or FP32) and even 64-bit (double precision or FP64) floating point variables are popular for many applications that require high precision. These are workloads like engineering simulations that simulate real-world behavior and need the mathematical model to be as exact as possible. In many cases, however, the reduced memory usage and increased speed gained by moving to half or mixed precision (16-bit or FP16) are worth the minor tradeoffs in accuracy. For more information, see Accelerating GPU computation through mixed-precision methods.
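The tradeoff can be illustrated with a small, framework-free sketch using only the Python standard library. It estimates the memory needed just to store model parameters at each precision and shows the rounding error when one value is stored in 16 or 32 bits. The parameter count is an arbitrary example, and the sketch says nothing about training speed, which depends on the hardware and framework.

```python
import struct


def parameter_memory_mb(num_parameters, bytes_per_value):
    """Memory needed to store the parameters alone, in megabytes (10^6 bytes)."""
    return num_parameters * bytes_per_value / 1e6


def stored_value(value, fmt):
    """Round-trip a float through a fixed-width format: 'e' = 16-bit, 'f' = 32-bit."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


if __name__ == "__main__":
    params = 100_000_000  # hypothetical model size
    print("FP32 weights (MB):", parameter_memory_mb(params, 4))
    print("FP16 weights (MB):", parameter_memory_mb(params, 2))
    print("0.1 stored as FP16:", stored_value(0.1, "e"))
    print("0.1 stored as FP32:", stored_value(0.1, "f"))
```

Halving the bytes per value halves the storage for weights, at the price of a larger rounding error per value.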

A similar trade-off also applies when deciding on the number of layers in your neural network for your classification algorithms, such as image classification.

Tuning (hyperparameter optimization) jobs

Use hyperparameter optimization (HPO) when needed, and choose the hyperparameters to tune and their ranges wisely.

Some API calls can result in a bill of hundreds or even thousands of dollars, and tuning jobs are one of those. A good tuning job can save you many working days of expensive data scientists’ time and provide a significant lift in model performance, which is highly beneficial. HPO in Amazon SageMaker finds good hyperparameters quicker if the search space is narrow (for example, a learning rate of 0.01–0.05 rather than 0.001–0.9). If you have some relevant prior knowledge about the hyperparameter range, start with that. For wide hyperparameter ranges, you may want to consider logarithmic transformations.
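A short calculation shows why a logarithmic scale helps with wide ranges. The sketch below computes what share of a uniform search over a range falls below a threshold, on a linear scale and on a log scale. It is a back-of-the-envelope illustration rather than how the SageMaker tuner samples internally; the function name is hypothetical.

```python
import math


def share_below(threshold, low, high, scale="linear"):
    """Fraction of a uniform search over [low, high] that lands below threshold."""
    if scale == "log":
        # Uniform in log space: compare distances between logarithms.
        return (math.log(threshold) - math.log(low)) / (math.log(high) - math.log(low))
    return (threshold - low) / (high - low)


if __name__ == "__main__":
    # Learning rate range from the text: 0.001 to 0.9.
    print("linear:", share_below(0.01, 0.001, 0.9))
    print("log:   ", share_below(0.01, 0.001, 0.9, scale="log"))
```

On a linear scale, only about 1% of the range 0.001–0.9 lies below 0.01, so small learning rates are rarely tried; on a log scale, about 34% does.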

Amazon SageMaker also reduces the amount of time spent tuning models using built-in HPO. This technology automatically adjusts hundreds of different combinations of parameters to quickly arrive at the best solution for your ML problem. With high-performance algorithms, distributed computing, managed infrastructure, and HPO, Amazon SageMaker drastically decreases the training time and overall cost of building production grade systems. You can see examples of HPO in some of the Amazon SageMaker built-in algorithms.

As the training time for each training job gets longer, you may also want to consider early stopping of training jobs.

Hosting endpoints

The following section discusses how to save cost when hosting endpoints using Amazon SageMaker hosting services.

Delete endpoints that aren’t in use

Amazon SageMaker is great for testing new models because you can easily deploy them into an A/B testing environment. When you’re done with your tests and not using the endpoint extensively anymore, you should delete it. You can always recreate it when you need it again because the model is stored in Amazon S3.

Use Automatic Scaling

Auto Scaling your Amazon SageMaker endpoint doesn’t just provide high availability, better throughput, and better performance; it also optimizes the cost of your endpoint. Make sure that you configure Auto Scaling for your endpoint, monitor your model endpoint, and adjust the scaling policy based on the CloudWatch metrics. For more information, see Load test and optimize an Amazon SageMaker endpoint using automatic scaling.
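As a rough sketch of what configuring scaling for an endpoint variant can involve, the helper below registers the variant as a scalable target and attaches a target tracking policy based on invocations per instance. It assumes an Application Auto Scaling client (for example, boto3.client("application-autoscaling")) is passed in; the helper name, policy name, and capacity values are illustrative assumptions, and the right target value depends on your own load tests.

```python
def configure_endpoint_scaling(autoscaling_client, endpoint_name, variant_name,
                               min_instances, max_instances, invocations_per_instance):
    """Register an endpoint variant for scaling and attach a target tracking policy."""
    resource_id = "endpoint/{}/variant/{}".format(endpoint_name, variant_name)
    dimension = "sagemaker:variant:DesiredInstanceCount"
    # Define the instance count bounds the variant may scale between.
    autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
    # Scale to keep invocations per instance near the target value.
    autoscaling_client.put_scaling_policy(
        PolicyName="{}-{}-target-tracking".format(endpoint_name, variant_name),
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )
    return resource_id
```

After the policy is in place, the CloudWatch metrics for the endpoint show whether the chosen target keeps latency and cost where you want them.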

Amazon Elastic Inference for deep learning

Selecting a GPU instance type that is big enough to satisfy the requirements of the most demanding resource for inference may not be a smart move. Even at peak load, a deep learning application may not fully utilize the capacity offered by a GPU. Consider using Amazon Elastic Inference, which allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances to reduce the cost of running deep learning inference by up to 75%.

Host multiple models with multi-model endpoints

You can create an endpoint that can host multiple models. Multi-model endpoints reduce hosting costs by improving endpoint utilization and provide a scalable and cost-effective solution to deploying a large number of models. Multi-model endpoints enable time-sharing of memory resources across models. They also reduce deployment overhead because Amazon SageMaker manages loading models in memory and scaling them based on traffic patterns to models.
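From the caller's side, a multi-model endpoint is invoked like any other endpoint, with the target model named per request. The sketch below assumes a SageMaker runtime client (for example, boto3.client("sagemaker-runtime")) is passed in; the helper name, artifact name, and content type are hypothetical.

```python
def invoke_model(runtime_client, endpoint_name, model_artifact, payload,
                 content_type="text/csv"):
    """Send one request to a specific model hosted on a multi-model endpoint."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        # Selects which model artifact behind the endpoint serves this request.
        TargetModel=model_artifact,
        ContentType=content_type,
        Body=payload,
    )
    # The response body is a stream; read it to get the raw prediction bytes.
    return response["Body"].read()
```

Requests for different models can share the same endpoint name and differ only in the target model value.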

Reducing labeling time with Amazon SageMaker Ground Truth

Data labeling is the process of identifying raw data (such as images, text files, and videos) and adding one or more meaningful and informative labels to provide context so that an ML model can learn from it. This process is essential because the accuracy of a trained model depends on the accuracy of the labeled dataset, or ground truth.

Amazon SageMaker Ground Truth uses a combination of ML and a human workforce (vetted by AWS) to label images and text. Many ML projects are delayed because of insufficient labeled data. You can use Ground Truth to accelerate the ML cycle and reduce overall costs.

Tagging your resources

Consider tagging your Amazon SageMaker notebook instances and hosting endpoints. Tags such as the project name, business unit, and environment (such as development, testing, or production) are useful for cost optimization and can provide clear visibility into where the money is spent. Cost allocation tags can help track and categorize your cost of ML. They can help answer questions such as “Can I delete this resource to save cost?”

Keeping track of cost

If you need visibility of your ML cost on AWS, use AWS Budgets. This helps you track your Amazon SageMaker cost, including development, training, and hosting. You can also set alerts and get a notification when your cost or usage exceeds (or is forecasted to exceed) your budgeted amount. After you create your budget, you can track the progress on the AWS Budgets console.

Conclusion

In this post, I highlighted a few approaches and techniques to optimize cost without compromising on the implementation flexibility so you can deliver best-in-class ML-based business applications.

For more information about optimizing costs, consider the following:

Refer to more ways of optimizing your cost on the cloud by right-sizing your infrastructure. Also take a look at best practices.
For an in-depth cost saving analysis when using an Elastic Inference accelerator, see Serving deep learning at Curalate with Apache MXNet, AWS Lambda, and Amazon Elastic Inference.
Give Amazon SageMaker a try with any of the several sample Jupyter notebooks. For more information about getting started, see Amazon SageMaker – Accelerated Machine Learning.
Learn more about managing ML projects in the whitepaper Managing Machine Learning Projects.

About the Author

BK Chaurasiya is a Principal Product Manager on the Amazon Web Services R&D and Innovation team. He provides technical guidance, design advice, and thought leadership to some of the largest and most successful AWS customers and partners. A technologist at heart, BK specializes in driving DevOps, continuous delivery, and large-scale cloud transformation initiatives to success.

zomato digitizes menus using Amazon Textract and Amazon SageMaker

October 27, 2020

by Chiranjeev Ghai Amazon AWS

This post is co-written by Chiranjeev Ghai, ML Engineer at zomato. zomato is a global food-tech company based in India.

Are you the kind of person who has very specific cravings? Maybe when the mood hits, you don’t want just any kind of Indian food—you want Chicken Chettinad with a side of paratha, and nothing else will hit the spot! To help picky eaters satisfy their cravings, we at zomato have recently added enhanced search engine capabilities to our restaurant aggregation and food delivery platform. These capabilities enable us to recommend restaurants to zomato users based on searches for specific dishes.

We power this functionality with machine learning (ML), using it to extract and structure text data from menu images. To develop this menu digitization technology, we partnered with Amazon ML Solutions Lab to explore the capabilities of the AWS ML Stack. This post summarizes how we used Amazon Textract and Amazon SageMaker to develop a customized menu digitization solution.

Extracting raw text from menus with Amazon Textract

The first component of this solution was to accurately extract all the text in the menu image. This process is known as optical character recognition (OCR). For our use case, we experimented with both in-house and commercial OCR solutions.

We first created an in-house OCR solution by stacking a pre-trained text detection model and a pre-trained text recognition model. The challenge with these models was that they were trained on a standard text dataset that didn’t match the eclectic fonts found in restaurant menus. To improve system performance, we fine-tuned these models by generating a dataset of 1.5 million synthetic text images that were more representative of text in menus.

After evaluating our in-house solution and several commercial OCR solutions, we found that Amazon Textract offers the best text recognition precision and recall. Restaurants often get creative when designing their menus, so OCR robustness was crucial for this use case. Amazon Textract particularly differentiated itself when processing menus with unique fonts, background images, and low image resolutions. Using it is as simple as making an API call:

#Python 3.6
import boto3
textract_client = boto3.client(
    'textract',
    region_name = '' #insert the AWS region you're working in
)
textract_response = textract_client.detect_document_text(
    Document={
        'S3Object': {
            'Bucket': '', #insert the name of the S3 bucket containing your image
            'Name': '' #insert the S3 key of your image
        }
    }
)

print(textract_response)

The following code is the Amazon Textract output for a sample image:

{'DocumentMetadata': {'Pages': 1},
 'Blocks': [{'BlockType': 'PAGE',
   'Geometry': {'BoundingBox': {'Width': 1.0,
     'Height': 1.0,
     'Left': 0.0,
     'Top': 0.0},
  ...
  {'BlockType': 'WORD',
   'Text': 'Dim',
   'Geometry': {'BoundingBox': {'Width': 0.10242128372192383,
     'Height': 0.048968635499477386,
     'Left': 0.24052166938781738,
     'Top': 0.02556285448372364},
...

The raw outputs are visualized by overlaying them on top of the image. The following image visualizes the preceding raw output. The black boxes are the text-detection bounding boxes provided by Amazon Textract. Extracted text is displayed on the right. Note the unconventional fonts, colors, and images on this menu.
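To overlay detections on an image, the normalized bounding boxes in the response first have to be converted to pixel coordinates. The following is a minimal sketch of that conversion, assuming you already know the image width and height in pixels. The function name word_boxes and the sample response values are illustrative; the sample is rounded and shortened from the preceding output.

```python
def word_boxes(textract_response, image_width, image_height):
    """Yield (text, left, top, right, bottom) in pixels for each WORD block.

    Textract reports bounding boxes as ratios of the page width and height,
    so each value is scaled by the corresponding image dimension.
    """
    for block in textract_response.get("Blocks", []):
        if block.get("BlockType") != "WORD":
            continue
        box = block["Geometry"]["BoundingBox"]
        left = box["Left"] * image_width
        top = box["Top"] * image_height
        right = left + box["Width"] * image_width
        bottom = top + box["Height"] * image_height
        yield block["Text"], round(left), round(top), round(right), round(bottom)


# Shortened, rounded sample in the same shape as the Textract response.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE",
         "Geometry": {"BoundingBox": {"Width": 1.0, "Height": 1.0,
                                      "Left": 0.0, "Top": 0.0}}},
        {"BlockType": "WORD", "Text": "Dim",
         "Geometry": {"BoundingBox": {"Width": 0.1, "Height": 0.05,
                                      "Left": 0.24, "Top": 0.025}}},
    ]
}

for item in word_boxes(sample_response, image_width=1000, image_height=800):
    print(item)
```

The resulting pixel rectangles can be passed to any drawing library to render the black boxes described above.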

The following image visualizes Amazon Textract outputs for a menu with a different design. Black boxes are the text-detection bounding boxes provided by Amazon Textract. Extracted text is displayed on the right. Again, this menu has unconventional fonts, colors, and images.

Using Amazon SageMaker to build a menu structure detector

The next component of this solution was to group the detections from Amazon Textract by menu section. This enabled our search engine to distinguish between entrees, desserts, beverages, and so on. We framed this as a computer vision problem—object detection, to be precise—and used Amazon SageMaker Ground Truth to collect training data. Ground Truth accelerated this process by providing a fully managed annotation tool that we customized to ask human annotators to draw bounding boxes around every menu section in the image. We used an annotation workforce from AWS Marketplace because this was a niche labeling task, and public labelers from Amazon Mechanical Turk didn’t perform well. With Ground Truth, it took just a few days and approximately $1,400 to label 4,086 images with triplicate redundancy.

With labeled data in hand, we faced a paradox of choice when selecting model-building approaches because object detection is such a thoroughly studied problem. Our choices included:

Removing low-confidence labels from the labeled dataset – Because even human annotators can make mistakes, Ground Truth calculates confidence scores for labels by having multiple annotators (for this use case, three) label the same image. Setting a higher confidence threshold for labels can decrease the noise in the training data at the expense of having less training data.
Data augmentation – Techniques for image data augmentation include horizontal flipping, cropping, shearing, and rotation. Data augmentation can make models more robust by increasing the amount of training data. However, excessive data augmentation may result in poor model convergence.
Feature engineering – From our experience in applying computer vision to processing menus, we had a variety of techniques in mind to emphasize or de-emphasize various aspects of the input images. For example, see the following images.

The following is the original image of a menu.

The following image shows the redacted image (overlay white boxes on a black background where text detections were found).

The following is a text cropped image. On a black background, the image has overlay crops from the original image where text detections were found.

The following is a single channel and text cropped image. The image is encoded as a single RGB channel (for this image, green). You can apply this with other transformations, in this case text cropping.
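The three transformations above can be sketched in a few lines. This is an illustration only, assuming images are nested Python lists (rows of pixels) and text detections are pixel rectangles in (left, top, right, bottom) form. The function names are hypothetical, and a real pipeline would more likely use an array library.

```python
def redact(width, height, boxes):
    """Black canvas with white pixels wherever a text detection was found."""
    canvas = [[0] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                canvas[y][x] = 255
    return canvas


def single_channel(rgb_image, channel):
    """Keep one channel (0=red, 1=green, 2=blue) of an image of RGB tuples."""
    return [[pixel[channel] for pixel in row] for row in rgb_image]


def text_crop(image, boxes):
    """Black canvas that keeps original pixels only inside text detections."""
    height, width = len(image), len(image[0])
    mask = redact(width, height, boxes)
    return [
        [image[y][x] if mask[y][x] else 0 for x in range(width)]
        for y in range(height)
    ]


# A tiny 4x3 image with one text detection covering columns 1-2, rows 0-1.
rgb = [[(10, 20, 30)] * 4 for _ in range(3)]
detections = [(1, 0, 3, 2)]
green = single_channel(rgb, 1)
for row in text_crop(green, detections):
    print(row)
```

Because each transformation takes and returns a plain image, they compose freely, which is what makes it practical to treat the choice of transformation as one more option to search over.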

We also had the following additional model-building methods to choose from:

Model architectures like YOLO, SSD, and RCNN, with VGG or ResNet backbones – Each architecture has different trade-offs of model accuracy, inference time, model size, and more. For this use case, model accuracy was the most important metric because menu images were batch processed.
Using a model pre-trained on a general object detection task or starting from scratch – Transfer learning can be helpful when training complex models on small datasets. However, the task of detecting menu sections is very different from a general object detection task (for example, PASCAL VOC), so the pre-training may not be relevant.
Optimizer parameters – These include learning rate, momentum, regularization coefficients, and early stopping configuration.
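Of the choices above, removing low-confidence labels is the simplest to sketch. The following example filters an augmented manifest so that only bounding boxes at or above a confidence threshold remain. It assumes the usual Ground Truth bounding box output layout, where annotations sit under the label attribute name and per-box confidence sits under the matching "-metadata" key. The label attribute name menu-sections, the S3 path, and the threshold are hypothetical.

```python
import json


def filter_low_confidence(manifest_lines, label_attribute, threshold):
    """Drop bounding boxes whose confidence is below the threshold.

    Images that end up with no boxes are removed entirely, which is the
    trade-off described above: less noise, but less training data.
    """
    metadata_key = f"{label_attribute}-metadata"
    kept = []
    for line in manifest_lines:
        record = json.loads(line)
        annotations = record[label_attribute]["annotations"]
        objects = record[metadata_key]["objects"]
        pairs = [
            (annotation, obj)
            for annotation, obj in zip(annotations, objects)
            if obj["confidence"] >= threshold
        ]
        if not pairs:
            continue
        record[label_attribute]["annotations"] = [a for a, _ in pairs]
        record[metadata_key]["objects"] = [o for _, o in pairs]
        kept.append(json.dumps(record))
    return kept


# One hypothetical manifest line with a confident and an uncertain box.
example_line = json.dumps({
    "source-ref": "s3://example-bucket/menu-0001.jpg",
    "menu-sections": {"annotations": [
        {"class_id": 0, "left": 10, "top": 20, "width": 300, "height": 200},
        {"class_id": 0, "left": 15, "top": 400, "width": 280, "height": 150},
    ]},
    "menu-sections-metadata": {"objects": [
        {"confidence": 0.9},
        {"confidence": 0.3},
    ]},
})

filtered = filter_low_confidence([example_line], "menu-sections", threshold=0.5)
print(len(json.loads(filtered[0])["menu-sections"]["annotations"]))
```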

With so many hyperparameters to consider, we turned to the automatic tuning feature of Amazon SageMaker to coordinate a massive tuning job across all these variables. The following code is an example of tuning a single model architecture and input data configuration:

import sagemaker
import boto3
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, IntegerParameter, CategoricalParameter, ContinuousParameter
import itertools
from time import sleep

#set to the region you're working in
REGION_NAME = ''
#set a S3 path for SageMaker to store the outputs of the training jobs 
S3_OUTPUT_PATH = ''
#set a S3 location for your training dataset, 
#assumed to be an augmented manifest file
#see: https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html
TRAIN_DATA_LOCATION = ''
#set a S3 location for your validation data, 
#assumed to be an augmented manifest file
VAL_DATA_LOCATION = ''
#specify which fields in the augmented manifest file are relevant for training
DATA_ATTRIBUTE_NAMES = [,]
#specify image shape
IMAGE_SHAPE = 
#specify label width
LABEL_WIDTH = 
#specify number of samples in the training dataset
NUM_TRAINING_SAMPLES = 

sgm_role = sagemaker.get_execution_role()
boto_session = boto3.session.Session(
    region_name = REGION_NAME
)
sgm_session = sagemaker.Session(
    boto_session = boto_session
)
training_image = get_image_uri(
    region_name = REGION_NAME, 
    repo_name = 'object-detection', 
    repo_version = 'latest'
)

#set training job configuration
object_detection_estimator = Estimator(
    image_name = training_image,
    role = sgm_role,
    train_instance_count = 1,
    train_instance_type = 'ml.p3.2xlarge',
    train_volume_size = 50,
    train_max_run = 360000,
    input_mode = 'Pipe',
    output_path = S3_OUTPUT_PATH,
    sagemaker_session = sgm_session
)

#set input data configuration
train_data = sagemaker.session.s3_input(
    s3_data = TRAIN_DATA_LOCATION,
    distribution = 'FullyReplicated',
    record_wrapping = 'RecordIO',
    s3_data_type = 'AugmentedManifestFile',
    attribute_names = DATA_ATTRIBUTE_NAMES
) 

val_data = sagemaker.session.s3_input(
    s3_data = VAL_DATA_LOCATION,
    distribution = 'FullyReplicated',
    record_wrapping = 'RecordIO',
    s3_data_type = 'AugmentedManifestFile',
    attribute_names = DATA_ATTRIBUTE_NAMES
)

data_channels = {
    'train': train_data, 
    'validation' : val_data
}

#set static hyperparameters
#see: https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection-api-config.html
static_hyperparameters = {
    'num_classes' : 1,
    'epochs' : 100,               
    'lr_scheduler_step' : '15,30',      
    'lr_scheduler_factor' : 0.1,
    'overlap_threshold' : 0.5,
    'nms_threshold' : 0.45,
    'image_shape' : IMAGE_SHAPE,
    'label_width' : LABEL_WIDTH,
    'num_training_samples' : NUM_TRAINING_SAMPLES,
    'early_stopping' : True,
    'early_stopping_min_epochs' : 5,
    'early_stopping_patience' : 1,
    'early_stopping_tolerance' : 0.05,
}

#set ranges for tunable hyperparameters
hyperparameter_ranges = {
    'learning_rate': ContinuousParameter(
        min_value = 1e-5, 
        max_value = 1e-2, 
        scaling_type = 'Auto'
    ),
    'mini_batch_size': IntegerParameter(
        min_value = 8, 
        max_value = 64, 
        scaling_type = 'Auto'
    )
}

#Not all hyperparameters are feasible to tune directly
#see: https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection-tuning.html
#For these we run model tuning jobs in parallel using a for loop
#We take this approach for tuning over different model architectures 
#and different feature engineering configurations
use_pretrained_options = [0, 1]
base_network_options = ['resnet-50', 'vgg-16']

for use_pretrained, base_network in itertools.product(use_pretrained_options, base_network_options):
    static_hyperparameter_configuration = {
        **static_hyperparameters, 
        'use_pretrained_model' : use_pretrained, 
        'base_network' : base_network
    }
    
    object_detection_estimator.set_hyperparameters(
        **static_hyperparameter_configuration
    )
    
    tuner = HyperparameterTuner(
        estimator = object_detection_estimator,
        objective_metric_name = 'validation:mAP',
        strategy = 'Bayesian',
        hyperparameter_ranges = hyperparameter_ranges,
        max_jobs = 24,
        max_parallel_jobs = 2,
        early_stopping_type = 'Auto',
    )
    
    tuner.fit(
        inputs = data_channels
    )
    
    print(f'Started tuning job: {tuner.latest_tuning_job.name}')
    
    #wait a bit before starting next job so auto generated names don't conflict
    sleep(60)

This code uses version 1.72.0 of the Amazon SageMaker Python SDK, which is the default version installed in Amazon SageMaker notebook instances. Version 2.X introduces breaking changes. For more information, see Use Version 2.x of the SageMaker Python SDK.

We used powerful GPU hardware (p3.2xlarge instances), and it took us just 1 week and approximately $1,500 to explore 455 unique parameter configurations. Of these configurations, Amazon SageMaker found that a fine-tuned Faster R-CNN model with text cropping performed the best, with a mean average precision score of 0.93. This aligned with results from our prior work in this space, which found that two-stage detectors generally outperform single-stage detectors in processing menus.

The following is an example of how the object detection model processed a menu. In this image, the purple boxes are the predicted bounding boxes from the menu section detection model. Black boxes are the text detection bounding boxes provided by Amazon Textract.

Using Amazon SageMaker to build rule- and ML-based text classifiers

The final component in the solution was a layer of text classification. To enable our enhanced search functionality, we had to know if each detection within a menu section was the menu section title, name of a dish, price of a dish, or something else (such as a description of a dish or the name of the restaurant). To this end, we developed a hybrid rule- and ML-based text classification system.

The first step of the classification was to use a rule to determine if a detection was a price or not. This rule simply calculated the proportion of numeric characters in the detection. If the proportion was greater than 40%, the detection was classified as a price. Although simple, this classifier worked well in practice. We used Amazon SageMaker notebook instances as a convenient interactive environment to develop this and other rules.
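A rule like this fits in a few lines of Python. The following is a minimal sketch, not the production rule: the function names are illustrative, and ignoring whitespace when computing the proportion is an assumption.

```python
def numeric_proportion(text):
    """Fraction of non-whitespace characters that are digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isdigit() for c in chars) / len(chars)


def is_price(text, threshold=0.4):
    """Classify a detection as a price if more than 40% of it is numeric."""
    return numeric_proportion(text) > threshold


for detection in ["140", "Rs. 140", "Shrimp Masala", "7 Up"]:
    print(detection, is_price(detection))
```

Short detections such as "7 Up" show why the threshold matters: a single digit in a dish name stays below 40%, while a price with a currency prefix stays above it.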

After the prices were filtered out, the remaining detections were classified as dish or not dish. From our experience in processing menus, we intuitively knew that in many cases, the location of prices was sufficient to do this classification. For these menus, dishes and prices are listed side by side, so simply classifying detections located to the left of prices as dishes worked well.
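The price location rule can be sketched as follows. This is an illustration under stated assumptions: each detection is a dictionary with text and normalized left, top, right, and bottom coordinates, and "to the left of a price" is interpreted as sharing a row with a price (overlapping vertical extents) and ending before the price begins. The function names are hypothetical.

```python
def vertical_overlap(a, b):
    """True if two boxes share any vertical extent (same row on the menu)."""
    return min(a["bottom"], b["bottom"]) - max(a["top"], b["top"]) > 0


def classify_by_price_location(detections, prices):
    """Label each non-price detection as 'dish' or 'not dish'."""
    labels = {}
    for det in detections:
        is_dish = any(
            vertical_overlap(det, price) and det["right"] <= price["left"]
            for price in prices
        )
        labels[det["text"]] = "dish" if is_dish else "not dish"
    return labels


# Illustrative coordinates, not taken from a real menu.
prices = [
    {"text": "140", "left": 0.8, "top": 0.20, "right": 0.9, "bottom": 0.25},
]
others = [
    {"text": "Shrimp Masala", "left": 0.1, "top": 0.20, "right": 0.5, "bottom": 0.25},
    {"text": "Served with rice", "left": 0.1, "top": 0.30, "right": 0.6, "bottom": 0.33},
]
print(classify_by_price_location(others, prices))
```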

The following example shows how the rules-based text classification system processed a menu. Green boxes are detections classified as dishes (by the price location rule). Red boxes are detections classified as not dishes (by the price location rule). Blue boxes are detections classified as prices. Final dish detections are on the right.

Some menus might include lengthy dish descriptions or may not list prices next to individual dishes. These menus violate the assumptions of the price location rules, so we turned to model-based text classification. We used Amazon SageMaker training jobs to experiment with many modeling approaches in parallel, including an XGBoost model trained on hashed word count vectors. In the end, we found that a fine-tuned BERT model from GluonNLP achieved the best performance with an AUROC score of 0.86.
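For readers unfamiliar with hashed word count vectors, the following sketch shows the general idea: each word is hashed into one of a fixed number of buckets, and the vector counts how many words landed in each bucket. The bucket count, the tokenization, and the use of CRC32 as the hash are assumptions chosen to keep the example deterministic. The resulting vectors are the kind of input a model such as XGBoost could be trained on.

```python
import re
import zlib


def hashed_word_counts(text, num_buckets=16):
    """Map text to a fixed-length vector of word counts via the hashing trick."""
    vector = [0] * num_buckets
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = zlib.crc32(word.encode("utf-8")) % num_buckets
        vector[bucket] += 1
    return vector


print(hashed_word_counts("Chicken Chettinad with paratha"))
```

The vector length stays fixed regardless of vocabulary size, at the cost of occasional collisions where unrelated words share a bucket.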

The following image is an example of how the model-based text classification system processed a menu. Green boxes are detections classified as dishes (by the BERT model). Red boxes are detections classified as not dishes (by the BERT model). Blue boxes are detections classified as prices. The final dish detections are on the right.

Of the remaining detections (those not classified as prices or dishes), a final round of classification identified menu section titles. We created features that captured the font size of the detection, the location of the detection on the menu, and the length of the words within the detection. We used these features as inputs to a logistic regression model that predicted if a detection is a menu section title or not.
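The following sketch shows how such features might be computed and scored. It makes several assumptions: bounding box height relative to the page median stands in for font size, word length is summarized as the mean, and the weights passed to the logistic function are arbitrary placeholders rather than trained values.

```python
import math


def title_features(detection, median_height):
    """Build a feature vector for one detection (normalized coordinates)."""
    words = detection["text"].split()
    return [
        (detection["bottom"] - detection["top"]) / median_height,  # relative font size
        detection["top"],                                            # vertical position
        (detection["left"] + detection["right"]) / 2,                # horizontal center
        sum(len(w) for w in words) / len(words) if words else 0.0,  # mean word length
    ]


def logistic_score(features, weights, bias):
    """Logistic regression prediction: sigmoid of a weighted sum of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative detection; weights and bias are placeholders, not fitted values.
detection = {"text": "Shrimp Dishes", "left": 0.3, "top": 0.10,
             "right": 0.7, "bottom": 0.16}
features = title_features(detection, median_height=0.03)
score = logistic_score(features, weights=[1.0, -0.5, 0.0, 0.1], bias=-2.0)
print(features, score)
```

In practice the weights would come from fitting the model on labeled detections, for example in an Amazon SageMaker notebook instance.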

Key features of Amazon SageMaker

In the end, we found that doing OCR was as simple as making an API call to Amazon Textract. However, our use case required additional customization. We selected Amazon SageMaker as an ML platform to develop this customization because it offered several key features:

Amazon SageMaker Notebooks made it easy to spin up Jupyter notebook environments for prototyping and testing rules and models.
Ground Truth helped us build and deploy a custom image annotation tool with no front-end experience required.
Amazon SageMaker automatic tuning enabled us to run massive hyperparameter tuning jobs on powerful hardware, and included an intuitive interface for tracking the results of hundreds of experiments. You can implement tuning jobs with early stopping conditions, which makes experimentation cost-effective.

Amazon SageMaker offers additional integration benefits from including all the preceding features in a single platform:

Amazon SageMaker Notebooks come pre-installed with all the dependencies needed to build models that can be optimized with automatic tuning.
Ground Truth offers easy access to labelers from Mechanical Turk or AWS Marketplace.
Automatic tuning can directly ingest the manifest files created by Amazon SageMaker Ground Truth.

Putting it all together

Our menu digitization system can extract text from images of menus, group it by menu section, extract the title of the section, extract the dishes within each section, and pair each dish with its price. The following is a visualization of the end-to-end solution.

The workflow contains the following steps:

1. The input is an image of a menu.
2. Amazon Textract performs OCR on the input image.
3. An ML-based computer vision model predicts bounding boxes for menu sections in the menu image.
4. A rules-based classifier classifies Amazon Textract detections as price or not price.
5. A rules-based classifier (5a) attempts to use the location of price detections to classify the not price detections as dish or not dish. If this rule doesn’t successfully classify most of the detections on the page, an ML-based classifier is used instead (5b).
6. An ML-based classifier uses hand-crafted features to classify not dish detections as menu section title or not menu section title.
7. The menu text is structured by combining the menu section detections and the text classification results.
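The final structuring step can be sketched as follows. This is a simplified illustration, not the production logic. It assumes every detection already carries a label (title, dish, or price) and normalized box coordinates, assigns a detection to a section when its center falls inside the section box, and pairs each dish with the nearest price on the same row to its right. All function names and coordinates are hypothetical.

```python
import json


def center(box):
    """Center point (x, y) of a box."""
    return ((box["left"] + box["right"]) / 2, (box["top"] + box["bottom"]) / 2)


def contains(section, point):
    """True if the point lies inside the section bounding box."""
    x, y = point
    return (section["left"] <= x <= section["right"]
            and section["top"] <= y <= section["bottom"])


def pair_price(dish, prices):
    """Nearest price to the right whose vertical center is in the dish's row."""
    candidates = [
        p for p in prices
        if p["left"] >= dish["right"] and dish["top"] <= center(p)[1] <= dish["bottom"]
    ]
    return min(candidates, key=lambda p: p["left"] - dish["right"], default=None)


def structure_menu(sections, detections):
    """Group labeled detections by section and attach prices to dishes."""
    menu = []
    for section in sections:
        inside = [d for d in detections if contains(section, center(d))]
        titles = [d for d in inside if d["label"] == "title"]
        prices = [d for d in inside if d["label"] == "price"]
        entry = {"title": {"text": titles[0]["text"] if titles else ""}, "dishes": []}
        for dish in (d for d in inside if d["label"] == "dish"):
            item = {"text": dish["text"]}
            price = pair_price(dish, prices)
            if price is not None:
                item["price"] = {"text": price["text"]}
            entry["dishes"].append(item)
        menu.append(entry)
    return menu


def det(text, label, left, top, right, bottom):
    """Small helper to build an illustrative detection."""
    return {"text": text, "label": label, "left": left, "top": top,
            "right": right, "bottom": bottom}


sections = [{"left": 0.0, "top": 0.0, "right": 1.0, "bottom": 0.5}]
detections = [
    det("Shrimp Dishes", "title", 0.3, 0.02, 0.7, 0.08),
    det("Shrimp Masala", "dish", 0.1, 0.10, 0.5, 0.14),
    det("140", "price", 0.8, 0.10, 0.9, 0.14),
    det("Shrimp Biryani", "dish", 0.1, 0.20, 0.5, 0.24),
    det("170", "price", 0.8, 0.20, 0.9, 0.24),
]
print(json.dumps(structure_menu(sections, detections), indent=3))
```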

The following image visualizes a sample output of the system. Green boxes are detections classified as dishes. Blue boxes are detections classified as prices. Yellow boxes are detections classified as menu section titles. Purple boxes are predicted menu section bounding boxes.

The following code is the structured output:

[
   {
      "title":{
         "text":"Shrimp Dishes"
      },
      "dishes":[
         {
            "text":"Shrimp Masala",
            "price":{
               "text":"140"
            }
         },
         {
            "text":"Shrimp Biryani",
            "price":{
               "text":"170"
            }
         },
         {
            "text":"Shrimp Pulav",
            "price":{
               "text":"160"
            }
         }
      ]
   },
   ...
]

Conclusion

We built a system that uses ML to digitize menus without any human input required. This system will improve user experience by powering new features such as advanced dish search and review highlight verification. Our content team will also use it to accelerate creating menus for online ordering.

To explore these capabilities of Amazon Textract and Amazon SageMaker in more depth, see Automatically extract text and structured data from documents with Amazon Textract and Amazon SageMaker Automatic Model Tuning: Using Machine Learning for Machine Learning.

The Amazon ML Solutions Lab helped us accelerate our use of ML by pairing our team with ML experts. The ML Solutions Lab brings to every customer engagement learnings from more than 20 years of Amazon’s ML innovations in areas such as fulfillment and logistics, personalization and recommendations, computer vision and translation, fraud prevention, forecasting, and supply chain optimization. To learn more about the AWS ML Solutions Lab, contact your account manager or visit Amazon Machine Learning Solutions Lab.

About the Authors

Chiranjeev Ghai is a Machine Learning Engineer. In his current role, he has been aiding automation at zomato by leveraging a wide variety of ML optimisations ranging from Image Classification and Product Recommendation to Text Detection. When not building models, he likes to spend his time playing video games at home.

Ryan Cheng is a Deep Learning Architect in the Amazon ML Solutions Lab. He has worked on a wide range of ML use cases from sports analytics to optical character recognition. In his spare time, Ryan enjoys cooking.

Andrew Ang is a Deep Learning Architect at the Amazon ML Solutions Lab, where he helps AWS customers identify and build AI/ML solutions to address their business problems.

Vinayak Arannil is a Data Scientist at the Amazon Machine Learning Solutions Lab. He has worked on various domains of data science like computer vision, natural language processing, recommendation systems, etc.

Video streaming and deep learning: Using Amazon Kinesis Video Streams with Deep Java Library

October 27, 2020

by Zach Kimberg Amazon AWS

Amazon Kinesis Video Streams allows you to easily ingest video data from connected devices for processing. One of the most effective ways to process this video data is using the power of deep learning. You can create an efficient service infrastructure to run these computations with a Java server, but Java support for deep learning has traditionally been difficult to come by.

Deep Java Library (DJL) is a new open-source deep learning framework for Java built by AWS. It sits on top of native engines, so you can train entirely in DJL while using different engines on the backend, such as PyTorch and Apache MXNet. It can also import and run models built using TensorFlow, Keras, and PyTorch. DJL can bridge the ease of Kinesis Video Streams with the power of deep learning for your own video analytics application.

In this tutorial, we walk through running an object detection model against a Kinesis video stream. In object detection, the computer finds different types of objects in an image and draws a bounding box around each one to describe its location inside the image. For example, you can use detection to recognize objects like dogs or people to avoid false alarms in a home security camera.

The full project and instructions to run it are available in the DJL demo repository.

Setting up

To begin, create a new Java project with the following dependencies, shown here in gradle format:

dependencies {
    implementation platform("ai.djl:bom:0.8.0")
    implementation "ai.djl:api"
    
    runtimeOnly "ai.djl.mxnet:mxnet-model-zoo"
    runtimeOnly "ai.djl.mxnet:mxnet-native-auto"
    
    implementation "software.amazon.awssdk:kinesisvideo:2.10.75"
    implementation "software.amazon.kinesis:amazon-kinesis-client:2.2.9"
    implementation "com.amazonaws:amazon-kinesis-video-streams-parser-library:1.0.13"
}

The DJL ImageVisitor

Because the model works on images, you can create a DJL FrameVisitor that visits and runs your model on each frame in the video. In real applications, it might help to only run your model on a fraction of the frames in the video. See the following code:

FrameVisitor frameVisitor = FrameVisitor.create(new DjlImageVisitor());

The DjlImageVisitor class extends the H264FrameDecoder to provide the capability to convert the frame into a standard Java BufferedImage. Because DJL natively supports this class, you can run it directly from the BufferedImage.

In DJL, the Predictor is used to run the trained model against live data. This is often referred to as inference or prediction. It fully encapsulates the inference experience by taking your input through preprocessing to prepare it into the model’s data structure, running the model itself, and postprocessing the data into an easy-to-use output class. In the following code block, the Predictor converts an Image to the set of outputs, DetectedObjects. An ImageFactory converts a standard Java BufferedImage into the DJL Image class:

public class DjlImageVisitor extends H264FrameDecoder {

    Predictor<Image, DetectedObjects> predictor;
    ImageFactory factory = ImageFactory.getInstance();

    ...

}

DJL also provides a model zoo where you can find many models trained on different tasks, datasets, and engines. For now, create a Predictor using the basic SSD object detection model. You can also use the default preprocessing and postprocessing defined within the model zoo to directly create a Predictor. For your own applications, you can define custom processing in a Translator and pass it in when creating a new Predictor:

Criteria<Image, DetectedObjects> criteria = Criteria.builder()
    .setTypes(Image.class, DetectedObjects.class)
    .optArtifactId("ai.djl.mxnet:ssd")
    .build();
predictor = ModelZoo.loadModel(criteria).newPredictor();

Then, you just need to define the FrameVisitor’s process method, which is called to handle each frame, as follows. You convert the Frame into a BufferedImage using the decodeH264Frame method defined within the H264FrameDecoder. You wrap that into an Image using the ImageFactory you created earlier. Then, you use your Predictor to run prediction using the SSD model. See the following code:

    @Override
    public void process(
            Frame frame,
            MkvTrackMetadata trackMetadata,
            Optional<FragmentMetadata> fragmentMetadata)
            throws FrameProcessException {

        Image image = factory.fromImage(decodeH264Frame(frame, trackMetadata));
        DetectedObjects prediction = predictor.predict(image);
    }

Using the prediction

At this point, you have the detected objects and can use them for whatever your application requires. For a simple application, you could just print out all the class names that you detected to standard out as follows:

        String classStr =
                prediction
                        .items()
                        .stream()
                        .map(Classification::getClassName)
                        .collect(Collectors.joining(", "));
        System.out.println("Found objects: " + classStr);

You could also find out if there is a high probability that a person was in the image using the following code:

        boolean hasPerson =
                prediction
                        .items()
                        .stream()
                        .anyMatch(
                                c ->
                                        "person".equals(c.getClassName())
                                                && c.getProbability() > 0.5);

Another option is to use the image visualization methods in the Image class to draw the bounding boxes on top of the original image. Then, you can get a visual representation of the detected objects. See the following code:

        image.drawBoundingBoxes(prediction);
        Path outputFile = Paths.get("out/annotatedImage.png");
        try (OutputStream os = Files.newOutputStream(outputFile)) {
            image.save(os, "png");
        }

Running the stream

You’re now ready to set up your video stream. For instructions, see Create a Kinesis Video Stream. Make sure to record the REGION and STREAM_NAME that you used so you can pass them into your application.

Then, create a new thread pool to run your application. You also need to build a GetMediaWorker with all the data for your video stream and run it on the thread pool. For your GetMediaWorker, you need to pass in the data you pulled from the Kinesis Video Streams console describing your video stream. You also need to provide the AWS credentials for accessing the stream. Use the SystemPropertiesCredentialsProvider, which finds the credentials in the JVM System Properties. You can find more details about providing these credentials in the demo repository. Lastly, you need to pass in the StartSelectorType.NOW to start using the stream immediately. See the following code:

ExecutorService executorService = Executors.newFixedThreadPool(1);

AmazonKinesisVideoClientBuilder amazonKinesisVideoBuilder =
        AmazonKinesisVideoClientBuilder.standard();
amazonKinesisVideoBuilder.setRegion(REGION.getName());
amazonKinesisVideoBuilder.setCredentials(new SystemPropertiesCredentialsProvider());
AmazonKinesisVideo amazonKinesisVideo = amazonKinesisVideoBuilder.build();



GetMediaWorker getMediaWorker =
        GetMediaWorker.create(
                REGION,
                new SystemPropertiesCredentialsProvider(),
                STREAM_NAME,
                new StartSelector().withStartSelectorType(StartSelectorType.NOW),
                amazonKinesisVideo,
                frameVisitor);
executorService.submit(getMediaWorker);

Conclusion

That’s it! You’re ready to begin sending data to your stream and detecting the objects in the video. You can find more information about the Kinesis Video Streams API in the Amazon Kinesis Video Streams Producer SDK Java GitHub repo. The full Kinesis Video Streams DJL demo is available with the rest of the DJL demo applications and integrations with many other AWS and Java tools in the demo repository.

Now that you have integrated Kinesis Video Streams and DJL, you can improve your application in many different ways. You can choose additional object detection and image-based models from the more than 70 pre-trained and ready-to-use models in our model zoo from GluonCV, TorchHub, and Keras. You can run these or custom models across any of the engines supported by DJL, including TensorFlow, PyTorch, MXNet, and ONNX Runtime. DJL even has full training support so you can build your own model to add to your video streaming application instead of relying on a pre-trained one.

Don’t forget to follow our GitHub repo, demo repository, Slack channel, and Twitter for more documentation and examples of DJL!

About the Authors

Zach Kimberg is a Software Engineer with AWS Deep Learning working mainly on Apache MXNet for Java and Scala. Outside of work he enjoys reading, especially Fantasy.

Frank Liu is a Software Engineer for AWS Deep Learning. He focuses on building innovative deep learning tools for software engineers and scientists. In his spare time, he enjoys hiking with friends and family.


Copyright © 2019-2025 Vedere AI. All Rights Reserved.