Extend your TFX pipeline with TFX-Addons

Posted by Hannes Hapke and Robert Crowe

TFX provides a configuration framework and shared libraries to integrate common components needed to define, launch, and monitor your machine learning system.

What is TFX-Addons?

TFX-Addons is a special interest group (SIG) for TFX users who are extending the standard set of components provided by Google’s TensorFlow team. The addons are implementations contributed by machine learning companies and developers who rely heavily on TFX for their production machine learning operations.

Common MLOps patterns, for example ingesting data into machine learning pipelines, are solved through TFX components. As an example, members of TFX-Addons developed and open-sourced a TFX component to ingest data from a Feast feature store, a component maintained by machine learning engineers at Twitter and Apple.

How can you use the TFX-Addons components or examples?

The TFX-Addons components and examples are accessible via a simple pip installation. To install the latest version, run the following:

pip install tfx-addons

To ensure you have a compatible version of dependencies for any given project, you can specify the project name as an extra requirement during install:

pip install tfx-addons[feast_examplegen]

To use TFX-Addons:

from tfx import v1 as tfx
import tfx_addons as tfxa

# Then you can easily load projects tfxa.{project_name}. Ex:

tfxa.feast_examplegen.FeastExampleGen(...)

The TFX-Addons components can be used in any TFX pipeline. Most components support all TFX orchestrators including Google Cloud’s Vertex Pipelines, Apache Beam, Apache Airflow, or Kubeflow Pipelines.
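For context, here is a minimal sketch of how a TFX-Addons component might slot into a locally orchestrated TFX pipeline. The FeastExampleGen arguments and the pipeline paths are placeholders (consult the TFX-Addons project documentation for the component's actual constructor parameters); the pipeline wiring itself follows the standard TFX v1 API.

from tfx import v1 as tfx
import tfx_addons as tfxa

# Placeholder paths; replace with your own locations.
PIPELINE_NAME = "feast-ingestion-demo"
PIPELINE_ROOT = "/tmp/pipelines/feast-ingestion-demo"
METADATA_PATH = "/tmp/metadata/feast-ingestion-demo/metadata.db"

def create_pipeline() -> tfx.dsl.Pipeline:
    # The constructor arguments for FeastExampleGen are elided here;
    # see the TFX-Addons project documentation for the real signature.
    example_gen = tfxa.feast_examplegen.FeastExampleGen(...)

    # Additional TFX components (StatisticsGen, Trainer, ...) would follow
    # and consume example_gen.outputs["examples"] as usual.
    return tfx.dsl.Pipeline(
        pipeline_name=PIPELINE_NAME,
        pipeline_root=PIPELINE_ROOT,
        metadata_connection_config=tfx.orchestration.metadata.sqlite_metadata_connection_config(
            METADATA_PATH
        ),
        components=[example_gen],
    )

if __name__ == "__main__":
    # LocalDagRunner is the simplest orchestrator; the same pipeline object
    # can instead be compiled for Vertex, Airflow, or Kubeflow runners.
    tfx.orchestration.LocalDagRunner().run(create_pipeline())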

Which additional components are currently available?

The list of components, libraries, and examples is constantly growing, with several new projects currently in development. As of this writing, these are the currently available components.

Feast Component

The Example Generator allows you to ingest data samples from a Feast Feature Store.

Message Exit Handler

This component provides an exit handler for TFX pipelines which notifies the user about the final state of the pipeline (failed or succeeded) via a Slack message. If the pipeline fails, the component will provide the error message. The message component supports a number of message providers (e.g. Slack, stdout, logging providers) and can easily be extended to support Twilio. It also serves as an example of how to write exit handlers for TFX pipelines.

Schema Curation Component

This component allows its users to update/change the schema produced by the SchemaGen component, and curate it based on domain knowledge. The curated schema can be used to stop pipelines if a feature drift is detected.

Feature Selection Component

This component allows users to select features from datasets. This component is useful if you want to select features based on statistical feature selection metrics.

XGBoost Evaluator Component

This component extends the standard TFX Evaluator component to support trained XGBoost models, in order to do deep analysis of model performance.

Sampling Component

This component allows users to balance their training datasets by randomly undersampling or oversampling, reducing the data to the lowest- or highest-frequency class.

Pandas Transform Component

This component can be used instead of the standard TFX Transform component, and allows you to work with Pandas dataframes for your feature engineering. Processing is distributed using Beam for scalability.

Firebase Publisher

This project helps users to publish trained models directly from a TFX pipeline to Firebase ML.

HuggingFace Model Pusher

The HuggingFace Model Pusher (HFModelPusher) pushes a blessed model to the HuggingFace Model Hub. Also, it optionally pushes an application to HuggingFace Space Hub.

How can you participate?

The TFX-Addons SIG is all about sharing reusable components and best practices. If you are interested in MLOps, join our bi-weekly conference calls. It doesn’t matter if you are new to TFX or an experienced ML engineer, everyone is welcome and the SIG accepts open source contributions from all participants.

If you want to join our next meeting, sign up to our list group sig-tfx-addons@tensorflow.org.

Already using TFX-Addons?

If you’re already using TFX-Addons we’d love to hear from you! Use this form to send us your story!

Thanks to all Contributors

Big thanks to all the following members for their open-source component contributions:
Badrul Chowdhury, Daniel Kim, Fatimah Adwan, Gerard Casas Saez, Hannes Hapke, Marcus Chang, Kshitijaa Jaglan, Pratishtha Abrol, Robert Crowe, Nirzari Gupta, Thea Lamkin, Wihan Booyse, Michael Hu, Vulko Milev, and all the other contributors! Open-source only happens when people like you contribute!

Read More

Amazon SageMaker Automatic Model Tuning now supports three new completion criteria for hyperparameter optimization

Amazon SageMaker has announced support for three new completion criteria for Amazon SageMaker automatic model tuning, providing you with an additional set of levers to control the stopping criteria of the tuning job when finding the best hyperparameter configuration for your model.

In this post, we discuss these new completion criteria, when to use them, and some of the benefits they bring.

SageMaker automatic model tuning

Automatic model tuning, also called hyperparameter tuning, finds the best version of a model as measured by the metric we choose. It spins up many training jobs on the dataset provided, using the algorithm chosen and the hyperparameter ranges specified. Each training job can be completed early when the objective metric isn’t improving significantly, which is known as early stopping.

Until now, there were limited ways to control the overall tuning job, such as specifying the maximum number of training jobs. However, the selection of this parameter value is heuristic at best. A larger value increases tuning costs, and a smaller value may not yield the best version of the model at all times.

SageMaker automatic model tuning solves these challenges by giving you multiple completion criteria for the tuning job. These criteria are applied at the tuning job level rather than at each individual training job level, which means they operate at a higher abstraction layer.

Benefits of tuning job completion criteria

With better control over when the tuning job will stop, you get the benefit of cost savings by preventing the job from running for extended periods and becoming computationally expensive. It also means you can ensure that the job doesn’t stop too early, so you get a model of sufficiently good quality that meets your objectives. You can choose to stop the tuning job when the models are no longer improving after a set of iterations or when the estimated residual improvement doesn’t justify the compute resources and time.

In addition to the existing maximum number of training jobs completion criteria (MaxNumberOfTrainingJobs), automatic model tuning introduces the option to stop tuning based on a maximum tuning time, improvement monitoring, and convergence detection.

Let’s explore each of these criteria.

Maximum tuning time

Previously, you had the option to define a maximum number of training jobs as a resource limit setting to control the tuning budget in terms of compute resources. However, this can lead to tuning runs that are unnecessarily longer or shorter than needed or desired.

With the addition of the maximum tuning time criteria, you can now allocate your training budget in terms of amount of time to run the tuning job and automatically terminate the job after a specified amount of time defined in seconds.

"ResourceLimits": {
"MaxParallelTrainingJobs": 10,
"MaxNumberOfTrainingJobs": 100
"MaxRuntimeInSeconds": 3600
}

As seen above, we use the MaxRuntimeInSeconds to define the tuning time in seconds. Setting the tuning time limit helps you limit the duration of the tuning job and also the projected cost of the experiment.

The total cost before any contractual discount can be estimated with the following formula:
EstimatedCost = MaxRuntimeInSeconds * MaxParallelTrainingJobs * InstanceCostPerSecond

The max runtime in seconds could be used to bound cost and runtime. In other words, it’s a budget control completion criteria.
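As a quick illustration of that budget calculation, here is a small sketch based on the ResourceLimits values above; the per-second instance price is a hypothetical example value, since actual SageMaker pricing varies by instance type and Region.

# Rough cost ceiling implied by the ResourceLimits shown above.
max_runtime_in_seconds = 3600        # MaxRuntimeInSeconds
max_parallel_training_jobs = 10      # MaxParallelTrainingJobs
instance_cost_per_second = 0.736 / 3600  # hypothetical hourly price of $0.736, in USD

estimated_max_cost = (
    max_runtime_in_seconds * max_parallel_training_jobs * instance_cost_per_second
)
print(f"Estimated worst-case compute cost: ${estimated_max_cost:.2f}")
# -> roughly $7.36 with the hypothetical price above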

This feature is part of a resource control criteria and doesn’t take into account the convergence of the models. As we see later in this post, this criteria can be used in combination with other stopping criteria to achieve cost control without sacrificing accuracy.

Desired target metric

Another previously introduced criteria is defining the target objective goal upfront. This criteria monitors the performance of the best model and stops tuning when the models reach the defined threshold for the specified objective metric.

With the TargetObjectiveMetricValue criteria, we can instruct SageMaker to stop tuning the model after the objective metric of the best model has reached the specified value:

{
    "TuningJobCompletionCriteria": {
        "TargetObjectiveMetricValue": 0.95
    },
    "HyperParameterTuningJobObjective": {
        "MetricName": "validation:auc",
        "Type": "Maximize"
    }
}

In this example, we instruct SageMaker to stop tuning the model when the objective metric of the best model reaches 0.95.

This method is useful when you have a specific target that you want your model to reach, such as a certain level of accuracy, precision, recall, F1-score, AUC, log-loss, and so on.

A typical use case for this criteria would be for a user who is already familiar with the model performance at given thresholds. A user in the exploration phase may first tune the model with a small subset of a larger dataset to identify a satisfactory evaluation metric threshold to target when training with the full dataset.

Improvement monitoring

This criteria monitors the models’ convergence after each iteration and stops the tuning if the models don’t improve after a defined number of training jobs. See the following configuration:

"TuningJobCompletionCriteria": {
    "BestObjectiveNotImproving":{
        "MaxNumberOfTrainingJobsNotImproving":10
        }, 
    }

In this case, we set MaxNumberOfTrainingJobsNotImproving to 10, which means that if the objective metric stops improving for 10 training jobs, the tuning is stopped and the best model and metric are reported.

Improvement monitoring should be used to tune a tradeoff between model quality and overall workflow duration in a way that is likely transferable between different optimization problems.

Convergence detection

Convergence detection is a completion criteria that lets automatic model tuning decide when to stop tuning. Generally, automatic model tuning will stop tuning when it estimates that no significant improvement can be achieved. See the following configuration:

"TuningJobCompletionCriteria": {
    "ConvergenceDetected":{
        "CompleteOnConvergence":"Enabled"
    },
}

The criteria is best suited when you initially don’t know what stopping settings to select.

It’s also useful if you don’t know what target objective metric is reasonable for a good prediction given the problem and dataset in hand, and would rather have the tuning job complete when it is no longer improving.

Experiment with a comparison of completion criteria

In this experiment, given a regression task on the direct marketing dataset, we run three tuning experiments to find the optimal model within a search space of two hyperparameters covering 200 hyperparameter configurations in total.

With everything else being equal, the first model was tuned with the BestObjectiveNotImproving completion criteria, the second model was tuned with the CompleteOnConvergence and the third model was tuned with no completion criteria defined.

When describing each job, we can observe that setting the BestObjectiveNotImproving criteria led to the best resource and time usage relative to the objective metric, with significantly fewer jobs run.

The CompleteOnConvergence criteria was also able to stop tuning halfway through the experiment resulting in fewer training jobs and shorter training time compared to not setting a criteria.

While not setting a completion criteria resulted in a costly experiment, defining the MaxRuntimeInSeconds as part of the resource limit would be one way of minimizing the cost.

The results above show that when defining a completion criteria, Amazon SageMaker is able to intelligently stop the tuning process when it detects that the model is less likely to improve beyond the current result.

Note that the completion criteria supported in SageMaker automatic model tuning are not mutually exclusive and can be used concurrently when tuning a model.

When more than one completion criteria is defined, the tuning job completes when any of the criteria is met.

For example, a combination of a resource limit criteria like maximum tuning time with a convergence criteria, such as improvement monitoring or convergence detection, may produce an optimal cost control and an optimal objective metrics.
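As a sketch of how such a combination might look, the snippet below assembles a HyperParameterTuningJobConfig that pairs a runtime budget with improvement monitoring and convergence detection. Only the fields relevant to this discussion are shown; a real create_hyper_parameter_tuning_job request also needs parameter ranges and a training job definition, which are omitted here.

import boto3

sagemaker_client = boto3.client("sagemaker")

# Partial HyperParameterTuningJobConfig combining a budget control criteria
# (MaxRuntimeInSeconds) with convergence-oriented completion criteria.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "MetricName": "validation:auc",
        "Type": "Maximize",
    },
    "ResourceLimits": {
        "MaxParallelTrainingJobs": 10,
        "MaxNumberOfTrainingJobs": 100,
        "MaxRuntimeInSeconds": 3600,
    },
    "TuningJobCompletionCriteria": {
        "BestObjectiveNotImproving": {
            "MaxNumberOfTrainingJobsNotImproving": 10,
        },
        "ConvergenceDetected": {
            "CompleteOnConvergence": "Enabled",
        },
    },
    # "ParameterRanges": {...}  # required in a real request
}

# sagemaker_client.create_hyper_parameter_tuning_job(
#     HyperParameterTuningJobName="my-tuning-job",
#     HyperParameterTuningJobConfig=tuning_job_config,
#     TrainingJobDefinition={...},  # algorithm, input data, instance config, etc.
# )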

Conclusion

In this post, we discussed how you can now intelligently stop your tuning job by selecting a set of completion criteria newly introduced in SageMaker, such as maximum tuning time, improvement monitoring, or convergence detection.

We demonstrated with an experiment that intelligent stopping based on observed improvement across iterations can lead to significantly better budget and time management compared to not defining a completion criteria.

We also showed that these criteria are not mutually exclusive and can be used concurrently when tuning a model, to take advantage of both budget control and optimal convergence.

For more details on how to configure and run automatic model tuning, refer to Specify the Hyperparameter Tuning Job Settings.


About the Authors

Doug Mbaya is a Senior Partner Solution architect with a focus in data and analytics. Doug works closely with AWS partners, helping them integrate data and analytics solutions in the cloud.

Chaitra Mathur is a Principal Solutions Architect at AWS. She guides customers and partners in building highly scalable, reliable, secure, and cost-effective solutions on AWS. She is passionate about Machine Learning and helps customers translate their ML needs into solutions using AWS AI/ML services. She holds 5 certifications including the ML Specialty certification. In her spare time, she enjoys reading, yoga, and spending time with her daughters.

Iaroslav Shcherbatyi is a Machine Learning Engineer at AWS. He works mainly on improvements to the Amazon SageMaker platform and on helping customers best use its features. In his spare time, he likes to go to the gym, do outdoor sports such as ice skating or hiking, and catch up on new AI research.

Read More

Vietnam’s VinBrain Deploys Healthcare AI Models to 100+ Hospitals

Doctors rarely make diagnoses based on a single factor — they look at a mix of data types, such as a patient’s symptoms, laboratory and radiology reports, and medical history.

VinBrain, a Vietnam-based health-tech startup, is ensuring that AI diagnostics can take a similarly holistic view across vital signs, blood tests, medical images and more.

“Multimodal data is key to delivering precision care that can improve patient outcomes,” said Steven Truong, CEO of VinBrain. “Our medical imaging models, for instance, can analyze chest X-rays and make automated observations about abnormal findings in a patient’s heart, lungs and bones.”

If a medical-imaging AI model reports that a patient’s scan shows lung consolidation, Truong explained, doctors could combine the X-ray analysis with a large language model that reads health records to learn the patient has a fever — helping clinicians more quickly determine a more specific diagnosis of pneumonia.

Funded by Vingroup — one of Vietnam’s largest public companies — VinBrain is the creator of DrAid, which is the only AI software for automated X-ray diagnostics in Southeast Asia, and among the first AI platforms to be cleared by the FDA to detect features suggestive of collapsed lungs from chest X-rays.

Trained on a dataset of more than 2.5 million images, DrAid is deployed in more than 100 hospitals in Vietnam, Myanmar, New Zealand and the U.S. The software applies AI analysis to medical images for more than 120,000 patients each month. VinBrain is also building a host of other AI applications, including a telehealth product that analyzes lab test results, medical reports and other electronic health records.

The company is part of NVIDIA Inception, a global program designed to offer cutting-edge startups expertise, technology and go-to-market support. The VinBrain team has also collaborated with Microsoft and with academic researchers at Stanford University, Harvard University, the University of Toronto and the University of California, San Diego to develop its core AI technology and submit research publications to top conferences.

Many Models, Easy Deployment

The VinBrain team has developed more than 300 AI models that process speech, text, video and images — including X-ray, CT and MRI data.

“Healthcare is complex, so the pipeline requires hundreds of models for each step, such as preprocessing, segmentation, object detection and post-processing,” Truong said. “We aim to package these models together so everything runs on GPU servers at the hospital — like a refrigerator or household appliance.”

VinBrain recently launched DrAid Appliance, an on-premises, NVIDIA GPU-powered device for automatic screening of medical imaging studies that could improve doctors’ productivity by up to 80%, the team estimates.

The company also offers a hybrid solution, where images are preprocessed at the edge with DrAid Appliance, then sent to NVIDIA GPUs in the cloud for more demanding computational workloads.

Another way to access VinBrain’s DrAid software is through Ferrum Health, an NVIDIA Inception company that has developed a secure platform to help healthcare organizations deploy AI applications across therapeutic areas.

Accelerating AI Training and Inference

VinBrain trains its AI models — which include medical imaging, intelligent video analytics, automatic speech recognition, natural language processing and text-to-speech — using NVIDIA DGX SuperPOD. Adopting DGX SuperPOD enabled VinBrain to achieve near-linear speedups for model training, with training 100x faster than CPU-only training, significantly shortening the turnaround time for model development.

The team is using software from NVIDIA AI Enterprise, an end-to-end solution for production AI, which includes the NVIDIA Clara platform, the MONAI open-source framework for medical imaging development and the NVIDIA NeMo conversational AI toolkit for its transcription model.

“To develop good AI models, you can’t just train once and be done,” said Truong. “It’s an evolving process to refine the neural networks.”

VinBrain has set up an early validation pipeline for its AI projects: The company tests its early-stage models across a couple dozen hospitals in Vietnam to collect performance data, gather feedback and fine-tune its neural networks.

In addition to using NVIDIA DGX SuperPOD for AI training, the company has adopted NVIDIA GPUs to improve run-time efficiency and deployment. It uses NVIDIA Triton Inference Server and NVIDIA TensorRT to streamline inference for hundreds of AI models on cloud-based NVIDIA Tensor Core GPUs.

“We shifted to NVIDIA GPUs for inference because of the higher throughput, faster response time and, most importantly, the cost ratio,” Truong said.

After switching from CPUs to NVIDIA Tensor Core GPUs, the team was able to accelerate inference for medical imaging AI by more than 3x, and video streaming by more than 30x.

“In the coming years, we want to become the top company solving the problem of multimodality in healthcare data,” said Truong. “Using AI and edge computing, we aim to improve the quality and accessibility of healthcare, making intelligent insights accessible to patients and doctors across countries.”

Register for NVIDIA GTC, taking place online March 20-23, to learn more about AI in healthcare.

Read More

Designing Data: Proactive Data Collection and Iteration for Machine Learning

Lack of diversity in data collection has caused significant failures in machine learning (ML) applications. While ML developers perform post-collection interventions, these are time intensive and rarely comprehensive. Thus, new methods to track and manage data collection, iteration, and model training are necessary for evaluating whether datasets reflect real world variability. We present designing data, an iterative, bias mitigating approach to data collection connecting HCI concepts with ML techniques. Our process includes (1) Pre-Collection Planning, to reflexively prompt and document…Apple Machine Learning Research

Create powerful self-service experiences with Amazon Lex on Talkdesk CX Cloud contact center

This blog post is co-written with Bruno Mateus, Jonathan Diedrich and Crispim Tribuna at Talkdesk.

Contact centers are using artificial intelligence (AI) and natural language processing (NLP) technologies to build a personalized customer experience and deliver effective self-service support through conversational bots.

This is the first of a two-part series dedicated to the integration of Amazon Lex with the Talkdesk CX Cloud contact center. In this post, we describe a solution architecture that combines the powerful resources of Amazon Lex and Talkdesk CX Cloud for the voice channel. In the second part of this series, we describe how to use the Amazon Lex chatbot UI with Talkdesk CX Cloud to allow customers to transition from a chatbot conversation to a live agent within the same chat window.

The benefits of Amazon Lex and Talkdesk CX Cloud are exemplified by WaFd Bank, a full-service commercial US bank with 200 locations and $20 billion in assets under management. The bank has invested in a digital transformation of its contact center to provide exceptional service to its clients. WaFd has pioneered an omnichannel banking experience that combines the advanced conversational AI capabilities of Amazon Lex voice and chat bots with Talkdesk Financial Services Experience Cloud for Banking.

“We wanted to combine the power of Amazon Lex’s conversational AI capabilities with the Talkdesk modern, unified contact center solution. This gives us the best of both worlds, enabling WaFd to serve its clients in the best way possible.”

-Dustin Hubbard, Chief Technology Officer at WaFd Bank.

To support WaFd’s vision, Talkdesk has extended its self-service virtual agent voice and chat capabilities with an integration with Amazon Lex and Amazon Polly. Additionally, the combination of Talkdesk Identity voice authentication with an Amazon Lex voicebot allows WaFd clients to resolve common banking transactions on their own. Tasks like account balance lookups are completed in seconds, a 90% reduction in time compared to WaFd’s legacy system. The newly designed Amazon Lex website chatbot has led to a substantial decrease in voicemail volume as its chatbot UI seamlessly integrates with Talkdesk systems.

In the following sections, we provide an overview of the components that make this integration possible. We then present the solution architecture, highlight its main components, and describe the customer journey from interacting with Amazon Lex to escalation to an agent. We end by explaining how contact centers can keep AI models up to date using Talkdesk AI Trainer.

Solution overview

The solution consists of the following key components:

  • Amazon Lex – Amazon Lex combines with Amazon Polly to automate customer service interactions by adding conversational AI capabilities to your contact center. Amazon Lex delivers fast responses to customers’ most common questions and seamlessly hands over complex cases to a human agent. Augmenting your contact center operations with Amazon Lex bots provides an enhanced customer experience and helps you build an omnichannel experience, allowing customers to engage across phone lines, websites, and messaging platforms.
  • Talkdesk CX Cloud contact center – Talkdesk, Inc. is a global cloud contact center leader for customer-obsessed companies. Talkdesk CX Cloud offers enterprise scale with consumer simplicity to deliver speed, agility, reliability, and security. As an AWS Partner, Talkdesk is using AI capabilities like Amazon Transcribe, a speech-to-text service, with the Talkdesk Agent Assist and Talkdesk Customer Experience Analytics products across a number of languages and accents. Talkdesk has extended its self-service virtual agent voice and chat capabilities with an integration with Amazon Lex and Amazon Polly. These virtual agents can automate routine tasks as well as seamlessly elevate complex interactions to a live agent.
  • Authentication and voice biometrics with Talkdesk Identity – Talkdesk Identity provides fraud protection through self-service authentication using voice biometrics. Voice biometrics solutions provide contact centers with improved levels of security while streamlining the authentication process for the customer. This secure and efficient authentication experience allows contact centers to handle a wide range of self-service functionalities. For example, customers can check their balance, schedule a funds transfer, or activate/deactivate a card using a banking bot.

The following diagram illustrates our solution architecture.

The voice authentication call flow implemented in Talkdesk interacts with Amazon Lex as follows:

  • When a phone call is initiated, a customer lookup is performed using the incoming caller’s phone number. If multiple customers are retrieved, further information, like date of birth, is requested in order to narrow down the list to a unique customer record.
  • If the caller is identified and has previously enrolled in voice biometrics, the caller will be prompted to say their voice pass code. If successful, the caller is offered an authenticated Amazon Lex experience.
  • If a caller is identified and not enrolled in voice biometrics, they can work with an agent to verify their identity and record their voice print as the password. For more information, visit the Talkdesk Voice Biometric documentation.
  • If the caller is not identified or not enrolled in voice biometrics, the caller can interact with Amazon Lex to perform tasks that don’t require authentication, or they can request a transfer to an agent.

How Talkdesk integrates with Amazon Lex

When the call reaches Talkdesk Virtual Agent, Talkdesk uses the continuous streaming capability of the Amazon Lex API to enable conversation with the Amazon Lex bot. Talkdesk Virtual Agent has an Amazon Lex adapter that initiates an HTTP/2 bidirectional event stream through the StartConversation API operation. Talkdesk Virtual Agent and the Amazon Lex bot start exchanging information in real time following the sequence of events for an audio conversation. For more information, refer to Starting a stream to a bot.

All the context data from Talkdesk Studio is sent to Amazon Lex through session attributes established on the initial ConfigurationEvent. The Amazon Lex voicebot has been equipped with a welcome intent, which is invoked by Talkdesk to initiate the conversation and play a welcome message. In Amazon Lex, a session attribute is set to ensure the welcome intent and its message are used only once in any conversation. The greeting message can be customized to include the name of the authenticated caller, if provided from the Talkdesk system in session attributes.

The following diagram shows the basic components and events used to enable communications.

Agent escalation from Amazon Lex

If a customer requests agent assistance, all necessary information to ensure the customer is routed to the correct agent is made available by Amazon Lex to Talkdesk Studio through session attributes.

Examples of session attributes include:

  • A flag to indicate the customer requests agent assistance
  • The reason for the escalation, used by Talkdesk to route the call appropriately
  • Additional data regarding the call to provide the agent with contextual information about the customer and their earlier interaction with the bot
  • The sentiment of the interaction
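For illustration, here is a minimal sketch of how a Lex V2 fulfillment Lambda could surface this kind of escalation information as session attributes for the contact center to read on handover. The attribute names, intent name, and values are hypothetical placeholders, not the ones used in this integration.

import json

def lambda_handler(event, context):
    """Hypothetical Lex V2 fulfillment handler that flags an agent escalation."""
    session_attributes = event.get("sessionState", {}).get("sessionAttributes", {}) or {}

    # Hypothetical attribute names; the contact center would read these on handover.
    session_attributes.update({
        "escalate_to_agent": "true",
        "escalation_reason": "balance_dispute",
        "bot_context": json.dumps({"last_intent": "CheckBalance", "attempts": "2"}),
        "sentiment": "NEGATIVE",
    })

    return {
        "sessionState": {
            "sessionAttributes": session_attributes,
            "dialogAction": {"type": "Close"},
            "intent": {
                "name": event["sessionState"]["intent"]["name"],
                "state": "Fulfilled",
            },
        },
    }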

Training

Talkdesk AI Trainer is a human-in-the-loop tool that is included in the operational flow of Talkdesk CX Cloud. It performs the continuous training and improvement of AI models by real agents without the need for specialized data science teams.

Talkdesk developed a connector that allows AI Trainer to automatically collect intent data from Amazon Lex intent models. Non-technical users can easily fine-tune these models to support Talkdesk AI products such as Talkdesk Virtual Agent. The connector was built by using the Amazon Lex Model Building API with the AWS SDK for Java 2.x.

It is possible to train intent data from Amazon Lex using real-world conversations between customers and (virtual) agents by:

  • Requesting feedback of intent classifications with a low confidence level
  • Adding new training phrases to intents
  • Adding synonyms or regular expressions to slot types

AI Trainer receives data from Amazon Lex, namely intents and slot types. This data is then displayed and managed in Talkdesk AI Trainer, along with all the events that are part of the conversational orchestration taking place in Talkdesk Virtual Agent. Through the AI Trainer quality system (agreement), supervisors or administrators decide which improvements will be introduced in the Amazon Lex model and reflected in Talkdesk Virtual Agent.

Adjustments to production can be easily published on AI Trainer and sent to Amazon Lex. Continuously training AI models ensures that AI products reflect the evolution of the business and the latest needs of customers. This in turn helps increase the automation rate through self-service and resolve cases faster, resulting in higher customer satisfaction.

Conclusion

In this post, we presented how the power of Amazon Lex conversational AI capabilities can be combined with the Talkdesk modern, unified contact center solution through the Amazon Lex API. We explained how Talkdesk voice biometrics offers the caller a self-service authenticated experience and how Amazon Lex provides contextual information to the agent to assist the caller more efficiently.

We are excited about the new possibilities that the integration of Amazon Lex and Talkdesk CX Cloud solutions offers to our clients. We at AWS Professional Services and Talkdesk are available to help you and your team implement your vision of an omnichannel experience.

The next post in this series will provide guidance on how to integrate an Amazon Lex chatbot to Talkdesk Studio, and how to enable customers to interact with a live agent from the chatbot.


About the authors


Grazia Russo Lassner
is a Senior Consultant with the AWS Professional Services Natural Language AI team. She specializes in designing and developing conversational AI solutions using AWS technologies for customers in various industries. Outside of work, she enjoys beach weekends, reading the latest fiction books, and family.


Cecil Patterson
is a Natural Language AI consultant with AWS Professional Services based in North Texas. He has many years of experience working with large enterprises to enable and support global infrastructure solutions. Cecil uses his experience and diverse skill set to build exceptional conversational solutions for customers of all types.


Bruno Mateus
is a Principal Engineer at Talkdesk. With over 20 years of experience in the software industry, he specializes in large-scale distributed systems. When not working, he enjoys spending time outside with his family, trekking, mountain bike riding, and motorcycle riding.


Jonathan Diedrich
is a Principal Solutions Consultant at Talkdesk. He works on enterprise and strategic projects to ensure technical execution and adoption. Outside of work, he enjoys ice hockey and games with his family.


Crispim Tribuna
is a Senior Software Engineer at Talkdesk currently focusing on the AI-based virtual agent project. He has over 17 years of experience in computer science, with a focus on telecommunications, IPTV, and fraud prevention. In his free time, he enjoys spending time with his family, running (he has completed three marathons), and riding motorcycles.

Read More

Image classification model selection using Amazon SageMaker JumpStart

Researchers continue to develop new model architectures for common machine learning (ML) tasks. One such task is image classification, where images are accepted as input and the model attempts to classify the image as a whole with object label outputs. With many models available today that perform this image classification task, an ML practitioner may ask questions like: “What model should I fine-tune and then deploy to achieve the best performance on my dataset?” And an ML researcher may ask questions like: “How can I generate my own fair comparison of multiple model architectures against a specified dataset while controlling training hyperparameters and computer specifications, such as GPUs, CPUs, and RAM?” The former question addresses model selection across model architectures, while the latter question concerns benchmarking trained models against a test dataset.

In this post, you will see how the TensorFlow image classification algorithm of Amazon SageMaker JumpStart can simplify the implementations required to address these questions. Together with the implementation details in a corresponding example Jupyter notebook, you will have tools available to perform model selection by exploring pareto frontiers, where improving one performance metric, such as accuracy, is not possible without worsening another metric, such as throughput.

Solution overview

The following figure illustrates the model selection trade-off for a large number of image classification models fine-tuned on the Caltech-256 dataset, which is a challenging set of 30,607 real-world images spanning 256 object categories. Each point represents a single model, point sizes are scaled with respect to the number of parameters comprising the model, and the points are color-coded based on their model architecture. For example, the light green points represent the EfficientNet architecture; each light green point is a different configuration of this architecture with unique fine-tuned model performance measurements. The figure shows the existence of a pareto frontier for model selection, where higher accuracy is exchanged for lower throughput. Ultimately, the selection of a model along the pareto frontier, or the set of pareto efficient solutions, depends on your model deployment performance requirements.

If you observe test accuracy and test throughput frontiers of interest, the set of pareto efficient solutions on the preceding figure are extracted in the following table. Rows are sorted such that test throughput is increasing and test accuracy is decreasing.

Model Name | Number of Parameters | Test Accuracy | Test Top 5 Accuracy | Throughput (images/s) | Duration per Epoch (s)
swin-large-patch4-window12-384 | 195.6M | 96.4% | 99.5% | 0.3 | 2278.6
swin-large-patch4-window7-224 | 195.4M | 96.1% | 99.5% | 1.1 | 698.0
efficientnet-v2-imagenet21k-ft1k-l | 118.1M | 95.1% | 99.2% | 4.5 | 1434.7
efficientnet-v2-imagenet21k-ft1k-m | 53.5M | 94.8% | 99.1% | 8.0 | 769.1
efficientnet-v2-imagenet21k-m | 53.5M | 93.1% | 98.5% | 8.0 | 765.1
efficientnet-b5 | 29.0M | 90.8% | 98.1% | 9.1 | 668.6
efficientnet-v2-imagenet21k-ft1k-b1 | 7.3M | 89.7% | 97.3% | 14.6 | 54.3
efficientnet-v2-imagenet21k-ft1k-b0 | 6.2M | 89.0% | 97.0% | 20.5 | 38.3
efficientnet-v2-imagenet21k-b0 | 6.2M | 87.0% | 95.6% | 21.5 | 38.2
mobilenet-v3-large-100-224 | 4.6M | 84.9% | 95.4% | 27.4 | 28.8
mobilenet-v3-large-075-224 | 3.1M | 83.3% | 95.2% | 30.3 | 26.6
mobilenet-v2-100-192 | 2.6M | 80.8% | 93.5% | 33.5 | 23.9
mobilenet-v2-100-160 | 2.6M | 80.2% | 93.2% | 40.0 | 19.6
mobilenet-v2-075-160 | 1.7M | 78.2% | 92.8% | 41.8 | 19.3
mobilenet-v2-075-128 | 1.7M | 76.1% | 91.1% | 44.3 | 18.3
mobilenet-v1-075-160 | 2.0M | 75.7% | 91.0% | 44.5 | 18.2
mobilenet-v1-100-128 | 3.5M | 75.1% | 90.7% | 47.4 | 17.4
mobilenet-v1-075-128 | 2.0M | 73.2% | 90.0% | 48.9 | 16.8
mobilenet-v2-075-96 | 1.7M | 71.9% | 88.5% | 49.4 | 16.6
mobilenet-v2-035-96 | 0.7M | 63.7% | 83.1% | 50.4 | 16.3
mobilenet-v1-025-128 | 0.3M | 59.0% | 80.7% | 50.8 | 16.2

This post provides details on how to implement large-scale Amazon SageMaker benchmarking and model selection tasks. First, we introduce JumpStart and the built-in TensorFlow image classification algorithms. We then discuss high-level implementation considerations, such as JumpStart hyperparameter configurations, metric extraction from Amazon CloudWatch Logs, and launching asynchronous hyperparameter tuning jobs. Finally, we cover the implementation environment and parameterization leading to the pareto efficient solutions in the preceding table and figure.

Introduction to JumpStart TensorFlow image classification

JumpStart provides one-click fine-tuning and deployment of a wide variety of pre-trained models across popular ML tasks, as well as a selection of end-to-end solutions that solve common business problems. These features remove the heavy lifting from each step of the ML process, making it easier to develop high-quality models and reducing time to deployment. The JumpStart APIs allow you to programmatically deploy and fine-tune a vast selection of pre-trained models on your own datasets.

The JumpStart model hub provides access to a large number of TensorFlow image classification models that enable transfer learning and fine-tuning on custom datasets. As of this writing, the JumpStart model hub contains 135 TensorFlow image classification models across a variety of popular model architectures from TensorFlow Hub, including residual networks (ResNet), MobileNet, EfficientNet, Inception, Neural Architecture Search Networks (NASNet), Big Transfer (BiT), shifted window (Swin) transformers, Class-Attention in Image Transformers (CaiT), and Data-Efficient Image Transformers (DeiT).

Vastly different internal structures comprise each model architecture. For instance, ResNet models utilize skip connections to allow for substantially deeper networks, whereas transformer-based models use self-attention mechanisms that eliminate the intrinsic locality of convolution operations in favor of more global receptive fields. In addition to the diverse feature sets these different structures provide, each model architecture has several configurations that adjust the model size, shape, and complexity within that architecture. This results in hundreds of unique image classification models available on the JumpStart model hub. Combined with built-in transfer learning and inference scripts that encompass many SageMaker features, the JumpStart API is a great launching point for ML practitioners to get started training and deploying models quickly.
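If you want to enumerate those models programmatically, the SageMaker Python SDK exposes utilities for browsing the JumpStart model hub. The sketch below assumes that image classification models are tagged with the "ic" task and that TensorFlow model IDs carry a "tensorflow-ic-" prefix, as in the JumpStart example notebooks; verify both against your SDK version.

from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# List image classification model IDs available in the JumpStart model hub.
filter_value = "task == ic"
model_ids = list_jumpstart_models(filter=filter_value)

# Keep only the TensorFlow image classification models.
tensorflow_ic_models = [m for m in model_ids if m.startswith("tensorflow-ic-")]
print(f"{len(tensorflow_ic_models)} TensorFlow image classification models found")
print(tensorflow_ic_models[:5])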

Refer to Transfer learning for TensorFlow image classification models in Amazon SageMaker and the following example notebook to learn about SageMaker TensorFlow image classification in more depth, including how to run inference on a pre-trained model as well as fine-tune the pre-trained model on a custom dataset.

Large-scale model selection considerations

Model selection is the process of selecting the best model from a set of candidate models. This process may be applied across models of the same type with different parameter weights and across models of different types. Examples of model selection across models of the same type include fitting the same model with different hyperparameters (for example, learning rate) and early stopping to prevent the overfitting of model weights to the train dataset. Model selection across models of different types includes selecting the best model architecture (for example, Swin vs. MobileNet) and selecting the best model configurations within a single model architecture (for example, mobilenet-v1-025-128 vs. mobilenet-v3-large-100-224).

The considerations outlined in this section enable all of these model selection processes on a validation dataset.

Select hyperparameter configurations

TensorFlow image classification in JumpStart has a large number of available hyperparameters that can adjust the transfer learning script behaviors uniformly for all model architectures. These hyperparameters relate to data augmentation and preprocessing, optimizer specification, overfitting controls, and trainable layer indicators. You are encouraged to adjust the default values of these hyperparameters as necessary for your application:

model_id: str
model_version: str = "*"

hyperparameters = sagemaker.hyperparameters.retrieve_default(
    model_id=model_id, model_version=model_version
)

For this analysis and the associated notebook, all hyperparameters are set to default values except for learning rate, number of epochs, and early stopping specification. Learning rate is adjusted as a categorical parameter by the SageMaker automatic model tuning job. Because each model has unique default hyperparameter values, the discrete list of possible learning rates includes the default learning rate as well as one-fifth the default learning rate. This launches two training jobs for a single hyperparameter tuning job, and the training job with the best reported performance on the validation dataset is selected. Because the number of epochs is set to 10, which is greater than the default hyperparameter setting, the selected best training job doesn’t always correspond to the default learning rate. Finally, an early stopping criterion is utilized with a patience, or the number of epochs to continue training with no improvement, of three epochs.

One default hyperparameter setting of particular importance is train_only_on_top_layer, where, if set to True, the model’s feature extraction layers are not fine-tuned on the provided training dataset. The optimizer will only train parameters in the top fully connected classification layer with output dimensionality equal to the number of class labels in the dataset. By default, this hyperparameter is set to True, which is a setting targeted for transfer learning on small datasets. You may have a custom dataset where the feature extraction from the pre-training on the ImageNet dataset is not sufficient. In these cases, you should set train_only_on_top_layer to False. Although this setting will increase training time, you will extract more meaningful features for your problem of interest, thereby increasing accuracy.
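A brief sketch of overriding these defaults before launching training is shown below. The model ID is an example, and hyperparameter values are stored as strings here as a simplifying assumption; inspect the dictionary returned by retrieve_default to confirm the exact keys and value types for your chosen model.

import sagemaker

# Example JumpStart model ID; substitute the model you are evaluating.
model_id, model_version = "tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4", "*"

hyperparameters = sagemaker.hyperparameters.retrieve_default(
    model_id=model_id, model_version=model_version
)

# Inspect the defaults, then override the settings relevant to this analysis.
print(hyperparameters)
hyperparameters["epochs"] = "10"
# Fine-tune the feature extraction layers too, not just the top classification layer.
hyperparameters["train_only_on_top_layer"] = "False"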

Extract metrics from CloudWatch Logs

The JumpStart TensorFlow image classification algorithm reliably logs a variety of metrics during training that are accessible to SageMaker Estimator and HyperparameterTuner objects. The constructor of a SageMaker Estimator has a metric_definitions keyword argument, which can be used to evaluate the training job by providing a list of dictionaries with two keys: Name for the name of the metric, and Regex for the regular expression used to extract the metric from the logs. The accompanying notebook shows the implementation details. The following table lists the available metrics and associated regular expressions for all JumpStart TensorFlow image classification models.

Metric Name | Regular Expression
number of parameters | "- Number of parameters: ([0-9\.]+)"
number of trainable parameters | "- Number of trainable parameters: ([0-9\.]+)"
number of non-trainable parameters | "- Number of non-trainable parameters: ([0-9\.]+)"
train dataset metric | f"- {metric}: ([0-9\.]+)"
validation dataset metric | f"- val_{metric}: ([0-9\.]+)"
test dataset metric | f"- Test {metric}: ([0-9\.]+)"
train duration | "- Total training duration: ([0-9\.]+)"
train duration per epoch | "- Average training duration per epoch: ([0-9\.]+)"
test evaluation latency | "- Test evaluation latency: ([0-9\.]+)"
test latency per sample | "- Average test latency per sample: ([0-9\.]+)"
test throughput | "- Average test throughput: ([0-9\.]+)"

The built-in transfer learning script provides a variety of train, validation, and test dataset metrics within these definitions, as represented by the f-string replacement values. The exact metrics available vary based on the type of classification being performed. All compiled models have a loss metric, which is represented by a cross-entropy loss for either a binary or categorical classification problem. The former is used when there is one class label; the latter is used if there are two or more class labels. If there is only a single class label, then the following metrics are computed, logged, and extractable via the f-string regular expressions in the preceding table: number of true positives (true_pos), number of false positives (false_pos), number of true negatives (true_neg), number of false negatives (false_neg), precision, recall, area under the receiver operating characteristic (ROC) curve (auc), and area under the precision-recall (PR) curve (prc). Similarly, if there are six or more class labels, a top-5 accuracy metric (top_5_accuracy) is also computed, logged, and extractable via the preceding regular expressions.
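As a concrete sketch, a subset of these definitions could be assembled in the format the Estimator expects. The metric names on the left are arbitrary labels chosen here, and the sketch assumes the per-dataset metric is named "accuracy" for this task; adjust the regular expressions to match the metrics your model actually logs.

# A subset of the metric definitions from the preceding table, in the format
# expected by the SageMaker Estimator's metric_definitions argument.
metric_definitions = [
    {"Name": "num_parameters", "Regex": r"- Number of parameters: ([0-9\.]+)"},
    {"Name": "val_accuracy", "Regex": r"- val_accuracy: ([0-9\.]+)"},
    {"Name": "test_accuracy", "Regex": r"- Test accuracy: ([0-9\.]+)"},
    {"Name": "test_throughput", "Regex": r"- Average test throughput: ([0-9\.]+)"},
]

# estimator = sagemaker.estimator.Estimator(
#     ...,  # image_uri, source_dir, model_uri, role, instance settings
#     metric_definitions=metric_definitions,
# )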

During training, metrics specified to a SageMaker Estimator are emitted to CloudWatch Logs. When the training is complete, you can invoke the SageMaker DescribeTrainingJob API and inspect the FinalMetricDataList key in the JSON response:

tuner: sagemaker.tuner.HyperparameterTuner
session: sagemaker.Session

training_job_name = tuner.best_training_job()
description = session.describe_training_job(training_job_name)
metrics = description["FinalMetricDataList"]

This API requires only the job name to be provided to the query, so, once completed, metrics can be obtained in future analyses so long as the training job name is appropriately logged and recoverable. For this model selection task, hyperparameter tuning job names are stored and subsequent analyses reattach a HyperparameterTuner object given the tuning job name, extract the best training job name from the attached hyperparameter tuner, and then invoke the DescribeTrainingJob API as described earlier to obtain metrics associated with the best training job.

Launch asynchronous hyperparameter tuning jobs

Refer to the corresponding notebook for implementation details on asynchronously launching hyperparameter tuning jobs, which uses the Python standard library’s concurrent.futures module, a high-level interface for asynchronously running callables. Several SageMaker-related considerations are implemented in this solution:

  • Each AWS account is affiliated with SageMaker service quotas. You should view your current limits to fully utilize your resources and potentially request resource limit increases as needed.
  • Frequent API calls to create many simultaneous hyperparameter tuning jobs may exceed API rate limits and throw throttling exceptions. A resolution is to create a SageMaker Boto3 client with a custom retry configuration.
  • What happens if your script encounters an error or the script is stopped before completion? For such a large model selection or benchmarking study, you can log tuning job names and provide convenience functions to reattach hyperparameter tuning jobs that already exist:
tuning_job_name: str
session: sagemaker.Session

tuner = sagemaker.tuner.HyperparameterTuner.attach(tuning_job_name, session)
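Stepping back to the launch itself, a simplified sketch of the asynchronous pattern with concurrent.futures is shown below. The launch_tuning_job helper is hypothetical, standing in for the notebook's Estimator and HyperparameterTuner setup, and the model IDs are illustrative rather than exact JumpStart identifiers.

import concurrent.futures

def launch_tuning_job(model_id: str) -> str:
    """Hypothetical helper: build the Estimator and HyperparameterTuner for
    model_id, call tuner.fit(..., wait=False), and return the tuning job name."""
    ...

# Illustrative model IDs; actual JumpStart IDs may differ.
model_ids = [
    "tensorflow-ic-swin-large-patch4-window7-224",
    "tensorflow-ic-efficientnet-b5",
    "tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4",
]

# Launch tuning jobs concurrently and record their names so the analysis can
# reattach to them later, even if this script stops before they finish.
tuning_job_names = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(launch_tuning_job, m_id): m_id for m_id in model_ids}
    for future in concurrent.futures.as_completed(futures):
        tuning_job_names[futures[future]] = future.result()

print(tuning_job_names)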

Analysis details and discussion

The analysis in this post performs transfer learning for model IDs in the JumpStart TensorFlow image classification algorithm on the Caltech-256 dataset. All training jobs were performed on the SageMaker training instance ml.g4dn.xlarge, which contains a single NVIDIA T4 GPU.

The test dataset is evaluated on the training instance at the end of training. Model selection is performed prior to the test dataset evaluation to set model weights to the epoch with the best validation set performance. Test throughput is not optimized: the dataset batch size is set to the default training hyperparameter batch size, which isn’t adjusted to maximize GPU memory usage; reported test throughput includes data loading time because the dataset isn’t pre-cached; and distributed inference across multiple GPUs isn’t utilized. For these reasons, this throughput is a good relative measurement, but actual throughput would depend heavily on your inference endpoint deployment configurations for the trained model.

Although the JumpStart model hub contains many image classification architecture types, this pareto frontier is dominated by select Swin, EfficientNet, and MobileNet models. Swin models are larger and relatively more accurate, whereas MobileNet models are smaller, relatively less accurate, and suitable for resource constraints of mobile devices. It’s important to note that this frontier is conditioned on a variety of factors, including the exact dataset used and the fine-tuning hyperparameters selected. You may find that your custom dataset produces a different set of pareto efficient solutions, and you may desire longer training times with different hyperparameters, such as more data augmentation or fine-tuning more than just the top classification layer of the model.

Conclusion

In this post, we showed how to run large-scale model selection or benchmarking tasks using the JumpStart model hub. This solution can help you choose the best model for your needs. We encourage you to try out and explore this solution on your own dataset.

References

More information is available at the following resources:


About the authors

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker built-in algorithms team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana Champaign. He is an active researcher in machine learning and statistical inference and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Read More

AI Joins Hunt for ET: Study Finds 8 Potential Alien Signals

Artificial intelligence is now a part of the quest to find extraterrestrial life.

Researchers have developed an AI system that outperforms traditional methods in the search for alien signals. And early results were intriguing enough to send scientists back to their radio telescopes for a second look.

The study, published last week in Nature Astronomy, highlights the crucial role that AI techniques will play in the ongoing search for extraterrestrial intelligence.

The team behind the paper trained an AI to recognize signals that natural astrophysical processes couldn’t produce. They then fed it a massive dataset of over 150 terabytes of data collected by the Green Bank Telescope, one of the world’s largest radio telescopes, located in West Virginia.

The AI flagged more than 20,000 signals of interest, with eight showing the tell-tale characteristics of what scientists call “technosignatures,” such as a radio signal that could tip scientists off to the existence of another civilization.

In the face of a growing deluge of data from radio telescopes, it’s critical to have a fast and effective means of sorting through it all.

That’s where the AI system shines.

The system was created by Peter Ma, an undergraduate student at the University of Toronto and the lead author of the paper co-authored by a constellation of experts affiliated with the University of Toronto, UC Berkeley and Breakthrough Listen, an international effort launched in 2015 to search for signs of alien civilizations.

Ma, who taught himself how to code, first became interested in computer science in high school. He started working on a project where he aimed to use open-source data and tackle big data problems with unanswered questions, particularly in the area of machine learning.

“I wanted a big science problem with open source data and big, unanswered questions,” Ma says. “And finding aliens is big.”

Despite initially facing some confusion and disbelief from his teachers, Ma continued to work on his project throughout high school and into his first year of college, where he reached out to others and found support from researchers at the University of Toronto, UC Berkeley and Breakthrough Listen to identify signals from extraterrestrial civilizations.

The paper describes a two-step AI method to classify signals as either radio interference or a potential technosignature.

The first step uses an autoencoder to identify salient features in the data. This system, built using the TensorFlow API, was accelerated by four NVIDIA TITAN X GPUs at UC Berkeley.

The second step feeds those features to a random forest classifier, which decides whether a signal is noteworthy or just interference.
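The study's actual models are far larger and operate on real radio telescope data, but a toy version of this two-step pattern, assuming synthetic spectrogram-like arrays and labels as stand-in data, looks roughly like this:

import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in data: flattened "spectrogram" snippets with binary labels
# (1 = simulated technosignature-like injection, 0 = background/interference).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(1024, 256)).astype("float32")
y_train = rng.integers(0, 2, size=1024)

# Step 1: an autoencoder learns a compact representation of the signals.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu", name="bottleneck"),
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(256),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=3, batch_size=64, verbose=0)

# Step 2: a random forest classifies signals using the learned features.
features = encoder.predict(x_train, verbose=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(features, y_train)

# Score a new batch of candidate signals.
x_new = rng.normal(size=(8, 256)).astype("float32")
print(clf.predict(encoder.predict(x_new, verbose=0)))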

The AI system is particularly adept at identifying narrowband signals with a non-zero drift rate. These signals are much more focused and specific than natural phenomena and suggest that they may be coming from a distant source.

Additionally, the signals only appear in observations of some regions of the sky, further evidence of a celestial origin.

To train the AI system, Ma inserted simulated signals into actual data, allowing the autoencoder to learn what to look for. Then the researchers fed the AI more than 150 terabytes of data from 480 observing hours at the Green Bank Telescope.

The AI identified 20,515 signals of interest, which the researchers had to inspect manually. Of those, eight had the characteristics of technosignatures and couldn’t be attributed to radio interference.

The researchers then returned to the telescope to look at systems from which all eight signals originated but couldn’t re-detect them.

“Eight signals looked very suspicious, but after we took another look at the targets with our telescopes, we didn’t see them again,” Ma says. “It’s been almost five to six years since we took the data, but we still haven’t seen the signal again. Make of that what you will.”

To be sure, because they don’t have real signals from an extraterrestrial civilization, the researchers had to rely on simulated signals to train their models. The researchers note that this could lead to the AI system learning artifacts that aren’t there.

Still, Cherry Ng, one of the paper’s co-authors, points out the team has a good idea of what to look for.

“A classic example of human-generated technology from space that we have detected is the Voyager,” said Ng, who studies fast radio bursts and pulsars, and is currently affiliated with the French National Centre for Scientific Research, known as CNRS.

“Peter’s machine learning algorithm is able to generate these signals that the aliens may or may not have sent,” she said.

And while aliens haven’t been found — yet, the study shows the potential of AI in SETI research and the importance of analyzing vast quantities of data.

“We’re hoping to extend this search capacity and algorithm to other kinds of telescope setups,” Ma said, connecting the efforts to advancements made in a broad array of fields thanks to AI.

There will be plenty of opportunities to see what AI can do.

Despite efforts dating back to the ‘60s, only a tiny fraction of stars in the Milky Way have been monitored, Ng says. However, with advances in technology, astronomers are now able to conduct more observations in parallel and maximize their scientific output.

Even the data that has been collected, such as the Green Bank data, has yet to be fully searched, Ng explains.

And with the next-generation radio telescopes, including MeerKAT, the Very Large Array (VLA), Square Kilometre Array, and the next-generation VLA (ngVLA) gathering vast amounts of data in the search for extraterrestrial intelligence, implementing AI will become increasingly important to overcome the challenges posed by the sheer volume of data.

So will we find anything?

“I’m skeptical about the idea that we are alone in the universe,” Ma said, pointing to breakthroughs over the past decade showing our planet is not as unique as we once thought it was. “Whether we will find anything is up to science and luck to verify, but I believe it is very naive to believe we are alone.”

Image Credit: NASA, JPL-Caltech, Susan Stolovy (SSC/Caltech) et al.

 

Read More