Build an MLOps sentiment analysis pipeline using Amazon SageMaker Ground Truth and Databricks MLflow

As more organizations move to machine learning (ML) to drive deeper insights, two key stumbling blocks they run into are labeling and lifecycle management. Labeling is the process of identifying data and adding labels that provide context so an ML model can learn from it. Labels might indicate a phrase in an audio file, a car in a photograph, or an organ in an MRI. Data labeling is necessary for ML models to learn from the data. Lifecycle management is the process of setting up an ML experiment and documenting the dataset, libraries, versions, and model used to get results. A team might run hundreds of experiments before settling on one approach, and recreating that approach can be difficult without records of those experiments.

Many ML examples and tutorials start with a dataset that includes a target value. However, real-world data doesn’t always have such a target value. For example, in sentiment analysis, a person can usually make a judgment on whether a review is positive, negative, or mixed. But reviews are made up of a collection of text with no judgment value attached to it. In order to create a supervised learning model to solve this problem, a high-quality labeled dataset is essential. Amazon SageMaker Ground Truth is a fully managed data labeling service that makes it easy to build highly accurate training datasets for ML.

For organizations that use Databricks as their data and analytics platform on AWS to perform extract, transform, and load (ETL) tasks, the ultimate goal is often training a supervised learning model. In this post, we show how Databricks integrates with Ground Truth and Amazon SageMaker for data labeling and model distribution.

Solution overview

Through the Ground Truth console, we can create custom or built-in data labeling workflows in minutes. These workflows support a variety of use cases, including 3D point clouds, video, images, and text. In addition, Ground Truth offers automatic data labeling, which uses an ML model to label our data.

We train our model on the publicly available Amazon Customer Reviews dataset. At a high level, the steps are as follows:

  1. Extract a raw dataset to be labeled and move it to Amazon Simple Storage Service (Amazon S3).
  2. Perform labeling by creating a labeling job in SageMaker.
  3. Build and train a simple Scikit-learn linear classifier to determine the sentiment of the review text on the Databricks platform, using a sample notebook.
  4. Use MLflow components to create and perform MLOps and save the model artifacts.
  5. Deploy the model as a SageMaker endpoint using the MLflow SageMaker library for real-time inference.

The following diagram illustrates the labeling and ML journey using Ground Truth and MLflow.

Create a labeling job in SageMaker

From the Amazon Customer Reviews dataset, we extract the text portions only, because we’re building a sentiment analysis model. Once extracted, we put the text in an S3 bucket and then create a Ground Truth labeling job via the SageMaker console.

On the Create labeling job page, fill out all required fields. As part of the steps on this page, Ground Truth allows you to generate the job manifest file. Ground Truth uses the input manifest file to identify the number of files or objects in the labeling job so that the right number of tasks is created and sent to human (or machine) labelers. The file is automatically saved in the S3 bucket. The next step is to specify the task category and task selection. In this use case, we choose Text as the task category, and Text Classification with a single label for the task selection, which means each review text is assigned a single sentiment: positive, negative, or neutral.

Next, we write simple but concise instructions for labelers on how to label the text data. The instructions are displayed in the labeling tool, and you can optionally preview the annotator's view at this point. Finally, we submit the job and monitor its progress on the console.

While the labeling job is in progress, we can also look at the labeled data on the Output tab. We can review each review text and its label, and see whether the labeling was done by a human or a machine. We can require 100% of the labeling to be done by humans, or choose machine annotation, which speeds up the job and reduces labor costs.

When the job is complete, the labeling job summary contains links to the output manifest and the labeled dataset. We can also go to Amazon S3 and download both from our S3 bucket folder.

In the next steps, we use a Databricks notebook, MLflow, and datasets labeled by Ground Truth to build a Scikit-learn model.

Download a labeled dataset from Amazon S3

We start by downloading the labeled dataset from Amazon S3. The manifest is saved in JSON format and we load it into a Spark DataFrame in Databricks. For training the sentiment analysis model, we only need the review text and sentiment that was annotated by the Ground Truth labeling job. We use select() to extract those two features. Then we convert the dataset from a PySpark DataFrame to a Pandas DataFrame, because the Scikit-learn algorithm requires Pandas DataFrame format.
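
A condensed sketch of these steps in a Databricks notebook might look like the following. The manifest path and the label column name (sentiment) are placeholders; the actual label attribute in the output manifest is named after your Ground Truth labeling job.

manifest_path = "s3://<bucket>/<labeling-job-name>/manifests/output/output.manifest"
df = spark.read.json(manifest_path)          # the output manifest is JSON Lines
labeled = df.select("source", "sentiment")   # review text and its annotated label
pdf = labeled.toPandas()                     # Scikit-learn expects Pandas/NumPy input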

Next, we use Scikit-learn CountVectorizer to transform the review text into a bigram vector by setting the ngram_range max value to 2. CountVectorizer converts text into a matrix of token counts. Then we use TfidfTransformer to transform the bigram vector into a term frequency-inverse document frequency (TF-IDF) format.
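
A minimal sketch of this transformation with Scikit-learn follows; it assumes the Pandas DataFrame pdf from the previous step, with the review text in a source column.

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

count_vect = CountVectorizer(ngram_range=(1, 2))        # unigrams and bigrams
X_counts = count_vect.fit_transform(pdf["source"])      # matrix of token counts
X_tfidf = TfidfTransformer().fit_transform(X_counts)    # reweight the counts with TF-IDF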

We compare the accuracy scores for training done with a bigram vector vs. bigram with TF-IDF. TF-IDF is a statistical measure that evaluates how relevant a word is to a document in a collection of documents. Because the review text tends to be relatively short, we can observe how TF-IDF affects the performance of the predictive model.

Set up an MLflow experiment

MLflow was developed by Databricks and is now an open-source project. MLflow manages the ML lifecycle, so you can track, recreate, and publish experiments easily.

To set up MLflow experiments, we use mlflow.sklearn.autolog() to enable auto logging of hyperparameters, metrics, and model artifacts whenever estimator.fit(), estimator.fit_predict(), and estimator.fit_transform() are called. Alternatively, you can do this manually by calling mlflow.log_param() and mlflow.log_metric().
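
A minimal sketch of enabling autologging in the notebook might look like the following; the experiment path is a placeholder.

import mlflow
import mlflow.sklearn

mlflow.set_experiment("/Shared/sentiment-analysis")  # hypothetical experiment path
mlflow.sklearn.autolog()  # logs hyperparameters, metrics, and model artifacts on fit()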

We fit the transformed dataset to a linear classifier with Stochastic Gradient Descent (SGD) learning. With SGD, the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule.

Those two datasets we prepared earlier are passed to the train_and_show_scores() function for training. After training, we need to register a model and save its artifacts. We use mlflow.sklearn.log_model() to do so.
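
The following is a condensed sketch of what such a training and registration step might look like, not the exact train_and_show_scores() implementation. The registered model name is a placeholder, and the sketch assumes the TF-IDF features and label column prepared earlier.

import mlflow
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, pdf["sentiment"], test_size=0.2, random_state=42
)

with mlflow.start_run():
    clf = SGDClassifier()      # linear classifier trained with SGD
    clf.fit(X_train, y_train)  # autolog records the hyperparameters and metrics
    print("test accuracy:", clf.score(X_test, y_test))
    mlflow.sklearn.log_model(
        clf,
        artifact_path="model",
        registered_model_name="sentiment-sgd",  # hypothetical registry name
    )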

Before deploying, we look at the experiment’s results and choose two experiments (one for bigram and the other for bigram with TF-IDF) to compare. In our use case, the second model trained with bigram TF-IDF performed slightly better, so we pick that model to deploy. After the model is registered, we deploy the model, changing the model stage to production. We can accomplish this on the MLflow UI, or in the code using transition_model_version_stage().
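
A minimal sketch of the stage transition in code, assuming the hypothetical model name and version from the previous step:

from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="sentiment-sgd",  # hypothetical registered model name
    version="2",           # the better-performing bigram TF-IDF version
    stage="Production",
)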

Deploy and test the model as a SageMaker endpoint

Before we deploy the trained model, we need to build a Docker container to host the model in SageMaker. We do this by running a simple MLflow command that builds and pushes the container to Amazon Elastic Container Registry (Amazon ECR) in our AWS account.

We can now find the image URI on the Amazon ECR console. We pass the image URI as an image_url parameter, and use DEPLOYMENT_MODE_CREATE for the mode parameter if this is a new deployment. If updating an existing endpoint with a new version, use DEPLOYMENT_MODE_REPLACE.
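
A minimal deployment sketch using the mlflow.sagemaker module (the MLflow 1.x API referenced in this post); the endpoint name, image URI, role ARN, and Region are placeholders.

import mlflow.sagemaker as mfs

mfs.deploy(
    app_name="sentiment-endpoint",                  # becomes the SageMaker endpoint name
    model_uri="models:/sentiment-sgd/Production",   # hypothetical registered model
    image_url="<account-id>.dkr.ecr.<region>.amazonaws.com/mlflow-pyfunc:<tag>",
    execution_role_arn="<sagemaker-execution-role-arn>",
    region_name="<region>",
    mode=mfs.DEPLOYMENT_MODE_CREATE,                # DEPLOYMENT_MODE_REPLACE to update
)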

To test the SageMaker endpoint, we create a function that takes the endpoint name and input data as its parameters.
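
A minimal sketch of such a test function using boto3 follows; the payload shape depends on how the model was logged (MLflow pyfunc containers accept Pandas-oriented JSON).

import json
import boto3

def query_endpoint(endpoint_name, payload):
    # Invoke the SageMaker endpoint with a JSON payload and return the parsed result.
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())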

Conclusion

In this post, we showed you how to use Ground Truth to label a raw dataset, and then use the labeled data to train a simple linear classifier using Scikit-learn. In this example, we used MLflow to track hyperparameters and metrics, register a production-grade model, and deploy the trained model to SageMaker as an endpoint. With Databricks to process the data, you can automate this whole use case, so that as new data is introduced, it can be labeled and processed into the model. By automating these pipelines and models, data science teams can focus on new use cases and uncover more insights instead of spending their time managing data updates on a day-to-day basis.

To get started, check out Use Amazon SageMaker Ground Truth to Label Data and sign up for a 14-day free trial of Databricks on AWS. To learn more about how Databricks integrates with SageMaker, as well as other AWS services like AWS Glue and Amazon Redshift, visit Databricks on AWS.



About the Authors

Rumi Olsen is a Solutions Architect in the AWS Partner Program. She specializes in serverless and machine learning solutions in her current role, and has a background in natural language processing technologies. She spends most of her spare time with her daughter exploring the nature of the Pacific Northwest.

Igor Alekseev is a Partner Solution Architect at AWS in Data and Analytics. Igor works with strategic partners, helping them build complex, AWS-optimized architectures. Prior to joining AWS, as a Data/Solution Architect, he implemented many projects in big data, including several data lakes in the Hadoop ecosystem. As a Data Engineer, he was involved in applying AI/ML to fraud detection and office automation. Igor's projects spanned a variety of industries, including communications, finance, public safety, manufacturing, and healthcare. Earlier, Igor worked as a full stack engineer/tech lead.

Naseer Ahmed is a Sr. Partner Solutions Architect at Databricks supporting its AWS business. Naseer specializes in data warehousing, business intelligence, app development, container, serverless, and machine learning architectures on AWS. He was voted 2021 SME of the year at Databricks and is an avid crypto enthusiast.

Read More

Enable Amazon Kendra search for a scanned or image-based text document

Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization.

Amazon Kendra supports a variety of document formats, such as Microsoft Word, PDF, and text. While working with a leading Edtech customer, we were asked to build an enterprise search solution that also uses images and PPT files. This post focuses on extending the document support in Amazon Kendra so you can preprocess text images and scanned documents (JPEG, PNG, or PDF format) to make them searchable. The solution combines Amazon Textract for document preprocessing and optical character recognition (OCR), and Amazon Kendra for intelligent search.

With the new Custom Document Enrichment feature in Amazon Kendra, you can now preprocess your documents during ingestion and augment your documents with new metadata. Custom Document Enrichment allows you to call external services like Amazon Comprehend, Amazon Textract, and Amazon Transcribe to extract text from images, transcribe audio, and analyze video. For more information about using Custom Document Enrichment, refer to Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.

In this post, we propose an alternate method of preprocessing the content prior to calling the ingestion process in Amazon Kendra.

Solution overview

Amazon Textract is an ML service that automatically extracts text, handwriting, and data from scanned documents and goes beyond basic OCR to identify, understand, and extract data from forms and tables. Today, many companies manually extract data from scanned documents like PDFs, images, tables, and forms through basic OCR software that requires manual configuration, which often requires reconfiguration when the form changes.

To overcome these manual and expensive processes, Amazon Textract uses machine learning to read and process a wide range of documents, accurately extracting text, handwriting, tables, and other data without any manual effort. You can quickly automate document processing and take action on the information extracted, whether it’s automating loans processing or extracting information from invoices and receipts.

Amazon Kendra is an easy-to-use enterprise search service that allows you to add search capabilities to your applications so that end-users can easily find information stored in different data sources within your company. This could include invoices, business documents, technical manuals, sales reports, corporate glossaries, internal websites, and more. You can harvest this information from storage solutions like Amazon Simple Storage Service (Amazon S3) and OneDrive; applications such as Salesforce, SharePoint, and ServiceNow; or relational databases like Amazon Relational Database Service (Amazon RDS).

The proposed solution enables you to unlock the search potential in scanned documents, extending the ability of Amazon Kendra to find accurate answers in a wider range of document types. The workflow includes the following steps:

  1. Upload a document (or documents of various types) to Amazon S3.
  2. The upload event triggers an AWS Lambda function that calls the synchronous Amazon Textract API (DetectDocumentText).
  3. Amazon Textract reads the document in Amazon S3, extracts the text from it, and returns the extracted text to the Lambda function, which saves it as a text file in Amazon S3.
  4. Reindex the Amazon Kendra data source that contains the new text file.
  5. When reindexing is complete, you can search the new dataset either via the Amazon Kendra console or the API.

The following diagram illustrates the solution architecture.

In the following sections, we demonstrate how to configure the Lambda function, create the event trigger, process a document, and then reindex the data.

Configure the Lambda function

To configure your Lambda function, add the following code to the function Python editor:

import urllib.parse
import boto3

textract = boto3.client('textract')

def handler(event, context):
    # Identify the bucket and key of the uploaded document from the S3 event.
    source_bucket = event['Records'][0]['s3']['bucket']['name']
    object_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Run synchronous text detection on the uploaded image.
    textract_result = textract.detect_document_text(
        Document={
            'S3Object': {
                'Bucket': source_bucket,
                'Name': object_key
            }
        })

    # Concatenate the detected LINE blocks into a single block of text.
    page = ""
    blocks = [x for x in textract_result['Blocks'] if x['BlockType'] == "LINE"]
    for block in blocks:
        page += " " + block['Text']

    print(page)

    # Save the extracted text as a .txt object in the bucket indexed by Amazon Kendra.
    # Replace the bucket name and key with your own values.
    s3 = boto3.resource('s3')
    object = s3.Object('demo-kendra-test', 'text/apollo11-summary.txt')
    object.put(Body=page)

We use the DetectDocumentText API to extract the text from an image (JPEG or PNG) stored in Amazon S3.

Create an event trigger at Amazon S3

In this step, we create an event trigger to start the Lambda function when a new document is uploaded to a specific bucket. The following screenshot shows our new function on the Amazon S3 console.

You can also verify the event trigger on the Lambda console.

Process a document

To test the process, we upload an image to the S3 folder that we defined for the S3 event trigger. We use the following sample image.

When the Lambda function is complete, we can go to the Amazon CloudWatch console to check the output. The following screenshot shows the extracted text, which confirms that the Lambda function ran successfully.

Reindex the data with Amazon Kendra

We can now reindex our data.

  1. On the Amazon Kendra console, under Data management in the navigation pane, choose Data sources.
  2. Select the data source demo-s3-datasource.
  3. Choose Sync now.

The sync state changes to Syncing - crawling.

When the sync is complete, the sync status changes to Succeeded and the sync state changes to Idle.
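
If you prefer to trigger the sync programmatically, for example from the same Lambda function after the text file is written, a minimal sketch using boto3 looks like the following; the index and data source IDs are placeholders.

import boto3

kendra = boto3.client("kendra")

# Equivalent to choosing Sync now on the console.
kendra.start_data_source_sync_job(
    Id="<data-source-id>",
    IndexId="<index-id>",
)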

Now we can go back to the search console and see our faceted search in action.

  1. In the navigation pane, choose Search console.

We added metadata for a few items; two of them are the ML algorithms XGBoost and BlazingText.

  2. Let’s try searching for SageMaker.

Our search was successful, and we got a list of results. Let’s see what we have for facets.

  3. Expand Filter search results.

We have the category and tags facets that were part of our item metadata.

  4. Choose BlazingText to filter results just for that algorithm.
  5. Now let’s perform the search on newly uploaded image files. The following screenshot shows the search on new preprocessed documents.
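
You can also run the same search programmatically with the Amazon Kendra Query API. The following is a minimal sketch; the index ID is a placeholder, and any facet or attribute filters depend on the metadata you configured.

import boto3

kendra = boto3.client("kendra")

response = kendra.query(IndexId="<index-id>", QueryText="SageMaker")
for item in response["ResultItems"]:
    print(item["Type"], item.get("DocumentTitle", {}).get("Text"))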

Conclusion

This solution improves the effectiveness of search results and the overall search experience. You can use Amazon Textract to extract text from scanned images, and add metadata that is later available as facets to interact with the search results. This is just an illustration of how you can use AWS native services to create a differentiated search experience for your users, and it helps unlock the full potential of your knowledge assets.

For a deeper dive into what you can achieve by combining other AWS services with Amazon Kendra, refer to Make your audio and video files searchable using Amazon Transcribe and Amazon Kendra, Build an intelligent search solution with automated content enrichment, and other posts on the Amazon Kendra blog.


About the Author

Sanjay Tiwary is a Specialist Solutions Architect AI/ML. He spends his time working with strategic customers to define business requirements, provide L300 sessions around specific use cases, and design ML applications and services that are scalable, reliable, and performant. He has helped launch and scale the AI/ML powered Amazon SageMaker service and has implemented several proofs of concept using Amazon AI services. He has also developed the advanced analytics platform as a part of the digital transformation journey.

Read More

Interpret caller input using grammar slot types in Amazon Lex

Customer service calls require customer agents to have the customer’s account information to process the caller’s request. For example, to provide a status on an insurance claim, the support agent needs policy holder information such as the policy ID and claim number. Such information is often collected in the interactive voice response (IVR) flow at the beginning of a customer support call. IVR systems have typically used grammars based on the Speech Recognition Grammar Specification (SRGS) format to define rules and parse caller information (policy ID, claim number). You can now use the same grammars in Amazon Lex to collect information in a speech conversation. You can also provide semantic interpretation rules using ECMAScript tags within the grammar files. The grammar support in Amazon Lex provides granular control for collecting and postprocessing user input so you can manage an effective dialog.

In this post, we review the grammar support in Amazon Lex and author a sample grammar for use in an Amazon Connect contact flow.

Use grammars to collect information in a conversation

You can author the grammar as a slot type in Amazon Lex. First, you provide a set of rules in the SRGS format to interpret user input. As an optional second step, you can write an ECMA script that transforms the information collected in the dialog. Lastly, you store the grammar as an XML file in an Amazon Simple Storage Service (Amazon S3) bucket and reference the link in your bot definition. SRGS grammars are specifically designed for voice and DTMF modality. We use the following sample conversations to model our bot:

Conversation 1

IVR: Hello! How can I help you today?

User: I want to check my account balance.

IVR: Sure. Which account should I pull up?

User: Checking.

IVR: What is the account number?

User: 1111 2222 3333 4444

IVR: For verification purposes, what is your date of birth?

User: Jan 1st 2000.

IVR: Thank you. The balance on your checking account is $123.

Conversation 2

IVR: Hello! How can I help you today?

User: I want to check my account balance.

IVR: Sure. Which account should I pull up?

User: Savings.

IVR: What is the account number?

User: I want to talk to an agent.

IVR: Ok. Let me transfer the call. An agent should be able to help you with your request.

In the sample conversations, the IVR requests the account type, account number, and date of birth to process the caller’s requests. In this post, we review how to use the grammars to collect the information and postprocess it with ECMA scripts. The grammars for account ID and date cover multiple ways to provide the information. We also review the grammar in case the caller can’t provide the requested details (for example, their savings account number) and instead opts to speak with an agent.

Build an Amazon Lex chatbot with grammars

We build an Amazon Lex bot with intents to perform common retail banking functions such as checking account balance, transferring funds, and ordering checks. The CheckAccountBalance intent collects details such as account type, account ID, and date of birth, and provides the balance amount. We use a grammar slot type to collect the account ID and date of birth. If the caller doesn’t know the information or asks for an agent, the call is transferred to a human agent. Let’s review the grammar for the account ID:

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" tag-format="semantics/1.0" root="captureAccount"><!-- Header definition for US language and the root rule "captureAccount" to start with-->

	<rule id="captureAccount" scope="public">
		<tag> out=""</tag>
		<one-of>
			<item><ruleref uri="#digit"/><tag>out += rules.digit.accountNumber</tag></item><!--Call the subrule to capture 16 digits--> 
			<item><ruleref uri="#agent"/><tag>out =rules.agent;</tag></item><!--Exit point to route the caller to an agent--> 
		</one-of>
	</rule>

	<rule id="digit" scope="public"> <!-- Capture digits from 1 to 9 -->
		<tag>out.accountNumber=""</tag>
		<item repeat="16"><!-- Repeat the rule exactly 16 times -->
			<one-of>
				<item>1<tag>out.accountNumber+=1;</tag></item>
				<item>2<tag>out.accountNumber+=2;</tag></item>
				<item>3<tag>out.accountNumber+=3;</tag></item>
				<item>4<tag>out.accountNumber+=4;</tag></item>
				<item>5<tag>out.accountNumber+=5;</tag></item>
				<item>6<tag>out.accountNumber+=6;</tag></item>
				<item>7<tag>out.accountNumber+=7;</tag></item>
				<item>8<tag>out.accountNumber+=8;</tag></item>
				<item>9<tag>out.accountNumber+=9;</tag></item>
				<item>0<tag>out.accountNumber+=0;</tag></item>
				<item>oh<tag>out.accountNumber+=0;</tag></item>
				<item>null<tag>out.accountNumber+=0;</tag></item>
			</one-of>
		</item>
	</rule>
	
	<rule id="agent" scope="public"><!-- Exit point to talk to an agent-->
		<item>
			<item repeat="0-1">i</item>
			<item repeat="0-1">want to</item>
			<one-of>
				<item repeat="0-1">speak</item>
				<item repeat="0-1">talk</item>
			</one-of>
			<one-of>
				<item repeat="0-1">to an</item>
				<item repeat="0-1">with an</item>
			</one-of>
			<one-of>
				<item>agent<tag>out="agent"</tag></item>
				<item>employee<tag>out="agent"</tag></item>
			</one-of>
		</item>
    </rule>
</grammar>

The grammar has two rules to parse user input. The first rule interprets the digits provided by the caller. These digits are appended to the output via an ECMAScript tag variable (out). The second rule manages the dialog if the caller wants to talk to an agent. In this case, the out tag is populated with the word agent. After the rules are parsed, the out tag carries either the account number (out.accountNumber) or the string agent. The downstream business logic can now use the out tag to handle the call.
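
For example, a fulfillment AWS Lambda function for the bot could read the interpreted value as in the following sketch; the slot name AccountNumber is a placeholder for the grammar slot defined in your bot.

def get_account_number(event):
    # Amazon Lex V2 passes interpreted slot values under sessionState.intent.slots.
    slots = event["sessionState"]["intent"]["slots"]
    slot = slots.get("AccountNumber")
    if slot and slot.get("value"):
        # Either the 16-digit account number assembled by the grammar,
        # or the string "agent" if the caller asked to speak with an agent.
        return slot["value"]["interpretedValue"]
    return None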

Deploy the sample Amazon Lex bot

To create the sample bot and add the grammars, perform the following steps. This creates an Amazon Lex bot called BankingBot, and two grammar slot types (accountNumber, dateOfBirth).

  1. Download the Amazon Lex bot.
  2. On the Amazon Lex console, choose Actions, then choose Import.
  3. Choose the file BankingBot.zip that you downloaded, and choose Import. In the IAM Permissions section, for Runtime role, choose Create a new role with basic Amazon Lex permissions.
  4. Choose the bot BankingBot on the Amazon Lex console.
  5. Download the XML files for accountNumber and dateOfBirth. (In some browsers, you might have to choose Save link as to download the XML files.)
  6. On the Amazon S3 console, upload the XML files.
  7. Navigate to the slot types on the Amazon Lex console and choose the accountNumber slot type.
  8. In the slot type grammar, select the S3 bucket with the XML file and provide the object key. Choose Save slot type.
  9. Navigate to the slot types on the Amazon Lex console and choose the dateOfBirth slot type.
  10. In the slot type grammar, select the S3 bucket with the XML file and provide the object key. Choose Save slot type.
  11. After the grammars are saved, choose Build.
  12. Download the supporting AWS Lambda function code and navigate to the AWS Lambda console.
  13. On the Create function page, select Author from scratch. For Function name, enter BankingBotEnglish; for Runtime, choose Python 3.8.
  14. Choose Create function. In the Code source section, open lambda_function.py and delete the existing code. Download the code, open it in a text editor, and copy and paste it into the empty lambda_function.py tab.
  15. Choose Deploy.
  16. Navigate to the Amazon Lex console and select BankingBot. Choose Deployment, then Aliases, and then TestBotAlias.
  17. On the Aliases page, select Languages and navigate to English (US).
  18. For Source, select BankingBotEnglish; for Lambda version or alias, select $LATEST.
  19. Navigate to the Amazon Connect console and choose Contact flows.
  20. Download the contact flow to integrate with the Amazon Lex bot.
  21. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows.
  22. Select the contact flow to load it into the application.
  23. Make sure the right bot is configured in the “Get Customer Input” block, and add a phone number to the contact flow.
  24. Choose a queue in the “Set working queue” block.
  25. Test the IVR flow by calling in to the phone number.

Test the solution

You can call in to the Amazon Connect phone number and interact with the bot. You can also test the solution directly on the Amazon Lex V2 console using voice and DTMF.
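
You can also exercise the bot from code with the Amazon Lex V2 runtime API. The following is a minimal sketch; the bot ID and alias ID are placeholders copied from the Amazon Lex console.

import uuid
import boto3

lex = boto3.client("lexv2-runtime")

response = lex.recognize_text(
    botId="<bot-id>",
    botAliasId="<test-bot-alias-id>",
    localeId="en_US",
    sessionId=str(uuid.uuid4()),
    text="1111 2222 3333 4444",
)
print(response["sessionState"])  # inspect the interpreted slot values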

Conclusion

Custom grammar slots provide the ability to collect different types of information in a conversation. You have the flexibility to capture transitions such as handover to an agent. Additionally, you can postprocess the information before running the business logic. You can enable grammar slot types via the Amazon Lex V2 console or AWS SDK. The capability is available in all AWS Regions where Amazon Lex operates in the English (Australia), English (UK), and English (US) locales.

To learn more, refer to Using a custom grammar slot type. You can also view the Amazon Lex documentation for SRGS or ECMAScript for more information.


About the Authors

Kai Loreck is a professional services Amazon Connect consultant. He works on designing and implementing scalable customer experience solutions. In his spare time, he can be found playing sports, snowboarding, or hiking in the mountains.

Harshal Pimpalkhute is a Product Manager on the Amazon Lex team. He spends his time trying to get machines to engage (nicely) with humans.

Read More

Whitepaper: Machine Learning Best Practices in Healthcare and Life Sciences

For customers looking to implement a GxP-compliant environment on AWS for artificial intelligence (AI) and machine learning (ML) systems, we have released a new whitepaper: Machine Learning Best Practices in Healthcare and Life Sciences.

This whitepaper provides an overview of security and good ML compliance practices and guidance on building GxP-regulated AI/ML systems using AWS services. We cover the points raised by the FDA discussion paper and Good Machine Learning Practices (GMLP) while also drawing from AWS resources: the whitepaper GxP Systems on AWS and the Machine Learning Lens from the AWS Well-Architected Framework. The whitepaper was developed based on our experience with and feedback from AWS pharmaceutical and medical device customers, as well as AWS partners, who are currently using AWS services to develop ML models.

Healthcare and life sciences (HCLS) customers are adopting AWS AI and ML services faster than ever before, but they also face the following regulatory challenges during implementation:

  • Building a secure infrastructure that complies with stringent regulatory processes for working on the public cloud and aligning to the FDA framework for AI and ML.
  • Supporting AI/ML-enabled solutions for GxP workloads covering the following:
    • Reproducibility
    • Traceability
    • Data integrity
  • Monitoring ML models with respect to various changes to parameters and data.
  • Handling model uncertainty and confidence calibration.

In our whitepaper, you learn about the following topics:

  • How AWS approaches ML in a regulated environment and provides guidance on Good Machine Learning Practices using AWS services.
  • Our organizational approach to security and compliance that supports GxP requirements as part of the shared responsibility model.
  • How to reproduce the workflow steps, track model and dataset lineage, and establish model governance and traceability.
  • How to monitor and maintain data integrity and quality checks to detect drifts in data and model quality.
  • Security and compliance best practices for managing AI/ML models on AWS.
  • Various AWS services for managing ML models in a regulated environment.

AWS is dedicated to helping you successfully use AWS services in regulated life science environments to accelerate your research, development, and delivery of the next generation of medical, health, and wellness solutions.

Contact us with questions about using AWS services for AI/ML in GxP systems. To learn more about compliance in the cloud, visit AWS Compliance.


About the Authors

Susant Mallick is an industry specialist and digital evangelist in AWS’ Global Healthcare and Life-Sciences practice. He has more than 20 years of experience in the life science industry, working with biopharmaceutical and medical device companies across the North America, APAC, and EMEA regions. He has built many digital health platform and patient engagement solutions using mobile apps, AI/ML, IoT, and other technologies for customers in various therapeutic areas. He holds a B.Tech degree in Electrical Engineering and an MBA in Finance. His thought leadership and industry expertise have earned him many accolades in pharma industry forums.

Sai Sharanya Nalla is a Sr. Data Scientist at AWS Professional Services. She works with customers to develop and implement AI/ ML and HPC solutions on AWS. In her spare time, she enjoys listening to podcasts and audiobooks, taking long walks, and engaging in outreach activities.

Read More

Prepare data from Databricks for machine learning using Amazon SageMaker Data Wrangler

Data science and data engineering teams spend a significant portion of their time in the data preparation phase of a machine learning (ML) lifecycle performing data selection, cleaning, and transformation steps. It’s a necessary and important step of any ML workflow in order to generate meaningful insights and predictions, because bad or low-quality data greatly reduces the relevance of the insights derived.

Data engineering teams are traditionally responsible for the ingestion, consolidation, and transformation of raw data for downstream consumption. Data scientists often need to do additional processing on data for domain-specific ML use cases such as natural language and time series. For example, certain ML algorithms may be sensitive to missing values, sparse features, or outliers and require special consideration. Even in cases where the dataset is in a good shape, data scientists may want to transform the feature distributions or create new features in order to maximize the insights obtained from the models. To achieve these objectives, data scientists have to rely on data engineering teams to accommodate requested changes, resulting in dependency and delay in the model development process. Alternatively, data science teams may choose to perform data preparation and feature engineering internally using various programming paradigms. However, it requires an investment of time and effort in installation and configuration of libraries and frameworks, which isn’t ideal because that time can be better spent optimizing model performance.

Amazon SageMaker Data Wrangler simplifies the data preparation and feature engineering process, reducing the time it takes to aggregate and prepare data for ML from weeks to minutes by providing a single visual interface for data scientists to select, clean, and explore their datasets. Data Wrangler offers over 300 built-in data transformations to help normalize, transform, and combine features without writing any code. You can import data from multiple data sources, such as Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Redshift, and Snowflake. You can now also use Databricks as a data source in Data Wrangler to easily prepare data for ML.

The Databricks Lakehouse Platform combines the best elements of data lakes and data warehouses to deliver the reliability, strong governance and performance of data warehouses with the openness, flexibility and machine learning support of data lakes. With Databricks as a data source for Data Wrangler, you can now quickly and easily connect to Databricks, interactively query data stored in Databricks using SQL, and preview data before importing. Additionally, you can join your data in Databricks with data stored in Amazon S3, and data queried through Amazon Athena, Amazon Redshift, and Snowflake to create the right dataset for your ML use case.

In this post, we transform the Lending Club Loan dataset using Amazon SageMaker Data Wrangler for use in ML model training.

Solution overview

The following diagram illustrates our solution architecture.

The Lending Club Loan dataset contains complete loan data for all loans issued from 2007 through 2011, including the current loan status and latest payment information. It has 39,717 rows, 22 feature columns, and 3 target labels.

To transform our data using Data Wrangler, we complete the following high-level steps:

  1. Download and split the dataset.
  2. Create a Data Wrangler flow.
  3. Import data from Databricks to Data Wrangler.
  4. Import data from Amazon S3 to Data Wrangler.
  5. Join the data.
  6. Apply transformations.
  7. Export the dataset.

Prerequisites

The post assumes you have a running Databricks cluster. If your cluster is running on AWS, verify that it is configured as described in the following section.

Databricks setup

Follow Secure access to S3 buckets using instance profiles for the required AWS Identity and Access Management (IAM) roles, S3 bucket policy, and Databricks cluster configuration. Ensure the Databricks cluster is configured with the proper instance profile, selected under the advanced options, to access the desired S3 bucket.

After the Databricks cluster is up and running with required access to Amazon S3, you can fetch the JDBC URL from your Databricks cluster to be used by Data Wrangler to connect to it.

Fetch the JDBC URL

To fetch the JDBC URL, complete the following steps:

  1. In Databricks, navigate to the clusters UI.
  2. Choose your cluster.
  3. On the Configuration tab, choose Advanced options.
  4. Under Advanced options, choose the JDBC/ODBC tab.
  5. Copy the JDBC URL.

Make sure to substitute your personal access token in the URL.

Data Wrangler setup

This step assumes you have access to Amazon SageMaker, an instance of Amazon SageMaker Studio, and a Studio user.

To allow access to the Databricks JDBC connection from Data Wrangler, the Studio user requires the following permission:

  • secretsmanager:PutResourcePolicy

As an IAM administrative user, follow the steps below to add this permission to the IAM execution role assigned to the Studio user (a scripted alternative is shown after the steps).

  1. On the IAM console, choose Roles in the navigation pane.
  2. Choose the role assigned to your Studio user.
  3. Choose Add permissions.
  4. Choose Create inline policy.
  5. For Service, choose Secrets Manager.
  6. On Actions, choose Access level.
  7. Choose Permissions management.
  8. Choose PutResourcePolicy.
  9. For Resources, choose Specific and select Any in this account.
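
Alternatively, an administrator can attach the same inline policy with a short script. The following is a minimal sketch using boto3; the role and policy names are placeholders, and you can scope the resource down to specific secrets if preferred.

import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "secretsmanager:PutResourcePolicy",
        "Resource": "*",  # equivalent to choosing Any in this account
    }],
}

iam.put_role_policy(
    RoleName="<studio-execution-role-name>",
    PolicyName="DataWranglerDatabricksSecrets",
    PolicyDocument=json.dumps(policy),
)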

Download and split the dataset

You can start by downloading the dataset. For demonstration purposes, we split the dataset by copying the feature columns id, emp_title, emp_length, home_owner, and annual_inc into a second file, loans_2.csv. We remove these columns, except the id column, from the original loans file and rename it loans_1.csv. Upload the loans_1.csv file to Databricks to create a table named loans_1, and upload loans_2.csv to an S3 bucket.
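
The following is a minimal pandas sketch of this split; the file paths are placeholders.

import pandas as pd

loans = pd.read_csv("loans.csv")

copied_cols = ["id", "emp_title", "emp_length", "home_owner", "annual_inc"]
loans[copied_cols].to_csv("loans_2.csv", index=False)

# Keep id as the join key and drop the other copied columns from the first file.
loans.drop(columns=[c for c in copied_cols if c != "id"]).to_csv("loans_1.csv", index=False)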

Create a Data Wrangler flow

For information on Data Wrangler prerequisites, see Get Started with Data Wrangler.

Let’s get started by creating a new data flow.

  1. On the Studio console, on the File menu, choose New.
  2. Choose Data Wrangler flow.
  3. Rename the flow as desired.

Alternatively, you can create a new data flow from the Launcher.

  • On the Studio console, choose Amazon SageMaker Studio in the navigation pane.
  • Choose New data flow.

Creating a new flow can take a few minutes to complete. After the flow has been created, you see the Import data page.

Import data from Databricks into Data Wrangler

Next, we set up Databricks (JDBC) as a data source in Data Wrangler. To import data from Databricks, we first need to add Databricks as a data source.

  1. On the Import data tab of your Data Wrangler flow, choose Add data source.
  2. On the drop-down menu, choose Databricks (JDBC).

On the Import data from Databricks page, you enter your cluster details.

  1. For Dataset name, enter a name you want to use in the flow file.
  2. For Driver, choose the driver com.simba.spark.jdbc.Driver.
  3. For JDBC URL, enter the URL of your Databricks cluster obtained earlier.

The URL should resemble the following format: jdbc:spark://<server-hostname>:443/default;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>.

  1. In the SQL query editor, specify the following SQL SELECT statement:
    select * from loans_1

If you chose a different table name while uploading data to Databricks, replace loans_1 in the above SQL query accordingly.

In the SQL query section in Data Wrangler, you can query any table connected to the JDBC Databricks database. The pre-selected Enable sampling setting retrieves the first 50,000 rows of your dataset by default. Depending on the size of the dataset, unselecting Enable sampling may result in longer import time.

  1. Choose Run.

Running the query gives a preview of your Databricks dataset directly in Data Wrangler.

  1. Choose Import.

Data Wrangler provides the flexibility to set up multiple concurrent connections to one Databricks cluster, or to multiple clusters if required, enabling analysis and preparation on combined datasets.

Import the data from Amazon S3 into Data Wrangler

Next, let’s import the loans_2.csv file from Amazon S3.

  1. On the Import tab, choose Amazon S3 as the data source.
  2. Navigate to the S3 bucket for the loans_2.csv file.

When you select the CSV file, you can preview the data.

  1. In the Details pane, choose Advanced configuration to make sure Enable sampling is selected and COMMA is chosen for Delimiter.
  2. Choose Import.

After the loans_2.csv dataset is successfully imported, the data flow interface displays both the Databricks JDBC and Amazon S3 data sources.

Join the data

Now that we have imported data from Databricks and Amazon S3, let’s join the datasets using a common unique identifier column.

  1. On the Data flow tab, for Data types, choose the plus sign for loans_1.
  2. Choose Join.
  3. Choose the loans_2.csv file as the Right dataset.
  4. Choose Configure to set up the join criteria.
  5. For Name, enter a name for the join.
  6. For Join type, choose Inner for this post.
  7. Choose the id column to join on.
  8. Choose Apply to preview the joined dataset.
  9. Choose Add to add it to the data flow.

Apply transformations

Data Wrangler comes with over 300 built-in transforms, which require no coding. Let’s use built-in transforms to prepare the dataset.

Drop column

First we drop the redundant ID column.

  1. On the joined node, choose the plus sign.
  2. Choose Add transform.
  3. Under Transforms, choose + Add step.
  4. Choose Manage columns.
  5. For Transform, choose Drop column.
  6. For Columns to drop, choose the column id_0.
  7. Choose Preview.
  8. Choose Add.

Format string

Let’s apply string formatting to remove the percentage symbol from the int_rate and revol_util columns.

  1. On the Data tab, under Transforms, choose + Add step.
  2. Choose Format string.
  3. For Transform, choose Strip characters from right.

Data Wrangler allows you to apply your chosen transformation on multiple columns simultaneously.

  1. For Input columns, choose int_rate and revol_util.
  2. For Characters to remove, enter %.
  3. Choose Preview.
  4. Choose Add.

Featurize text

Let’s now vectorize verification_status, a text feature column. We convert the text column into term frequency–inverse document frequency (TF-IDF) vectors by applying the count vectorizer and a standard tokenizer as described below. Data Wrangler also provides the option to bring your own tokenizer, if desired.

  1. Under Transforms, choose + Add step.
  2. Choose Featurize text.
  3. For Transform, choose Vectorize.
  4. For Input columns, choose verification_status.
  5. Choose Preview.
  6. Choose Add.

Export the dataset

After we apply multiple transformations on different column types, including text, categorical, and numeric, we’re ready to use the transformed dataset for ML model training. The last step is to export the transformed dataset to Amazon S3. Data Wrangler gives you multiple options for downstream consumption of the transformed data.

In this post, we take advantage of the Export data option in the Transform view to export the transformed dataset directly to Amazon S3.

  1. Choose Export data.
  2. For S3 location, choose Browse and choose your S3 bucket.
  3. Choose Export data.

Clean up

If your work with Data Wrangler is complete, shut down your Data Wrangler instance to avoid incurring additional fees.

Conclusion

In this post, we covered how you can quickly and easily set up and connect Databricks as a data source in Data Wrangler, interactively query data stored in Databricks using SQL, and preview data before importing. Additionally, we looked at how you can join your data in Databricks with data stored in Amazon S3. We then applied data transformations on the combined dataset to create a data preparation pipeline. To explore more of Data Wrangler’s analysis capabilities, including target leakage and bias report generation, refer to the blog post Accelerate data preparation using Amazon SageMaker Data Wrangler for diabetic patient readmission prediction.

To get started with Data Wrangler, see Prepare ML Data with Amazon SageMaker Data Wrangler, and see the latest information on the Data Wrangler product page.


About the Authors

Roop Bains is a Solutions Architect at AWS focusing on AI/ML. He is passionate about helping customers innovate and achieve their business objectives using Artificial Intelligence and Machine Learning. In his spare time, Roop enjoys reading and hiking.

Igor Alekseev is a Partner Solution Architect at AWS in Data and Analytics. Igor works with strategic partners, helping them build complex, AWS-optimized architectures. Prior to joining AWS, as a Data/Solution Architect, he implemented many projects in big data, including several data lakes in the Hadoop ecosystem. As a Data Engineer, he was involved in applying AI/ML to fraud detection and office automation. Igor's projects spanned a variety of industries, including communications, finance, public safety, manufacturing, and healthcare. Earlier, Igor worked as a full stack engineer/tech lead.

Huong Nguyen is a Sr. Product Manager at AWS. She is leading the user experience for SageMaker Studio. She has 13 years’ experience creating customer-obsessed and data-driven products for both enterprise and consumer spaces. In her spare time, she enjoys reading, being in nature, and spending time with her family.

Henry Wang is a software development engineer at AWS. He recently joined the Data Wrangler team after graduating from UC Davis. He has an interest in data science and machine learning and does 3D printing as a hobby.

Read More

Personalize cross-channel customer experiences with Amazon SageMaker, Amazon Personalize, and Twilio Segment

Today, customers interact with brands over an increasingly large digital and offline footprint, generating a wealth of interaction data known as behavioral data. As a result, marketers and customer experience teams must work with multiple overlapping tools to engage and target those customers across touchpoints. This increases complexity, creates multiple views of each customer, and makes it more challenging to provide an individual experience with relevant content, messaging, and product suggestions to each customer. In response, marketing teams use customer data platforms (CDPs) and cross-channel campaign management tools (CCCMs) to simplify the process of consolidating multiple views of their customers. These technologies provide non-technical users with an accelerated path to enable cross-channel targeting, engagement, and personalization, while reducing marketing teams’ dependency on technical teams and specialist skills to engage with customers.

Despite this, marketers find themselves with blind spots in customer activity when these technologies aren’t integrated with systems from other parts of the business. This is particularly true with non-digital channels, for example, in-store transactions or customer feedback from customer support. Marketing teams and their customer experience counterparts also struggle to integrate predictive capabilities developed by data scientists into their cross-channel campaigns or customer touchpoints. As a result, customers receive messaging and recommendations that aren’t relevant or are inconsistent with their expectations.

This post outlines how cross-functional teams can work together to address these challenges using an omnichannel personalization use case. We use a fictional retail scenario to illustrate how those teams interlock to provide a personalized experience at various points along the customer journey. We use Twilio Segment in our scenario, a customer data platform built on AWS. There are more than 12 CDPs in the market to choose from, many of which are also AWS partners, but we use Segment in this post because they provide a self-serve free tier that allows you to explore and experiment. We explain how to combine the output from Segment with in-store sales data, product metadata, and inventory information. Building on this, we explain how to integrate Segment with Amazon Personalize to power real-time recommendations. We also describe how we create scores for churn and repeat-purchase propensity using Amazon SageMaker. Lastly, we explore how to target new and existing customers in three ways:

  • With banners on third-party websites, also known as display advertising, using a propensity-to-buy score to attract similar customers.
  • On web and mobile channels presented with personalized recommendations powered by Amazon Personalize, which uses machine learning (ML) algorithms to create content recommendations.
  • With personalized messaging using Amazon Pinpoint, an outbound and inbound marketing communications service. These messages target disengaged customers and those showing a high propensity to churn.

Solution overview

Imagine you are a product owner leading the charge on cross-channel customer experience for a retail company. The company has a diverse set of online and offline channels, but sees digital channels as its primary opportunity for growth. They want to grow the size and value of their customer base with the following methods:

  • Attract new, highly qualified customers who are more likely to convert
  • Increase the average order value of all their customers
  • Re-attract disengaged customers to return and hopefully make repeat purchases

To ensure those customers receive a consistent experience across channels, you as a product owner need to work with teams such as digital marketing, front-end development, mobile development, campaign delivery, and creative agencies. To ensure customers receive relevant recommendations, you also need to work with data engineering and data science teams. Each of these teams are responsible for interacting with or developing features within the architecture illustrated in the following diagram.

The solution workflow contains the following high-level steps:

  1. Collect data from multiple sources to store in Amazon Simple Storage Service (Amazon S3).
  2. Use AWS Step Functions to orchestrate data onboarding and feature engineering.
  3. Build segments and predictions using SageMaker.
  4. Use propensity scores for display targeting.
  5. Send personalized messaging using Amazon Pinpoint.
  6. Integrate real-time personalized suggestions using Amazon Personalize.

In the following sections, we walk through each step, explain the activities of each team at a high level, provide references to related resources, and share hands-on labs that provide more detailed guidance.

Collect data from multiple sources

Digital marketing, front-end, and mobile development teams can configure Segment to capture and integrate web and mobile analytics, digital media performance, and online sales sources using Segment Connections. Segment Personas allows digital marketing teams to resolve the identity of users by stitching together interactions across these sources into a single user profile with one persistent identifier. These profiles, along with calculated metrics called Computed Traits and raw events, can be exported to Amazon S3. The following screenshot shows how identity rules are set up in Segment Personas.

In parallel, engineering teams can use AWS Data Migration Service (AWS DMS) to replicate in-store sales, product metadata, and inventory data sources from databases such as Microsoft SQL or Oracle and store the output in Amazon S3.

Data onboarding and feature engineering

After data is collected and stored in the landing zone on Amazon S3, data engineers can use components from the serverless data lake framework (SDLF) to accelerate data onboarding and build out the foundational structure of a data lake. With SDLF, engineers can automate the preparation of user-item data used to train Amazon Personalize or create a single view of customer behavior by joining online and offline behavioral data and sales data, using attributes such as customer ID or email address as a common identifier.

Step Functions is the key orchestrator driving these transformation jobs within SDLF. You can use Step Functions to build and orchestrate both scheduled and event-driven data workflows. The engineering team can orchestrate the tasks of other AWS services within a data pipeline. The outputs from this process are stored in a trusted zone on Amazon S3 to use for ML development. For more information on implementing the serverless data lake framework, see AWS serverless data analytics pipeline reference architecture.

Build segments and predictions

The process of building segments and predictions can be broken down into three steps: access the environment, build propensity models, and create output files.

Access the environment

After the engineering team has prepared and transformed the ML development data, the data science team can build propensity models using SageMaker. First, they build, train, and test an initial set of ML models. This allows them to see early results, decide which direction to go next, and reproduce experiments.

The data science team needs an active Amazon SageMaker Studio instance, an integrated development environment (IDE) for rapid ML experimentation. It unifies all the key features of SageMaker and offers an environment to manage the end-to-end ML pipelines. It removes complexity and reduces the time it takes to build ML models and deploy them into production. Developers can use SageMaker Studio notebooks, which are one-click Jupyter notebooks that you can quickly spin up to enable the entire ML workflow from data preparation to model deployment. For more information on SageMaker for ML, see Amazon SageMaker for Data Science.

Build the propensity models

To estimate churn and repeat-purchase propensity, the customer experience and data science teams should agree on the known driving factors for either outcome.

The data science team validates these known factors while also discovering unknown factors through the modeling process. An example of a factor driving churn can be the number of returns in the last 3 months. An example of a factor driving repurchases can be the number of items saved on the website or mobile app.

For our use case, we assume that the digital marketing team wants to create a target audience using lookalike modeling to find customers most likely to repurchase in the next month. We also assume that the campaign team wants to send an email offer to customers who will likely end their subscription in the next 3 months to encourage them to renew their subscription.

The data science team can start by analyzing the data (features) and summarizing the main characteristics of the dataset to understand its key behaviors. They can then shuffle and split the data into training and test sets and upload these datasets to the trusted zone. You can use an algorithm such as the XGBoost classifier to train the model and automatically perform feature selection, identifying the best set of candidate features for determining the propensity scores (or predicted values).

You can then tune the model by optimizing the algorithm metrics (such as hyperparameters) based on the ranges provided within the XGBoost framework. Test data is used to evaluate the model’s performance and estimate how well it generalizes to new data. For more information on evaluation metrics, see Tune an XGBoost Model.

Lastly, the propensity scores are calculated for each customer and stored in the trusted S3 zone to be accessed, reviewed, and validated by the marketing and campaign teams. This process also provides a prioritized evaluation of feature importance, which helps to explain how the scores were produced.

Create the output files

After the data science team has completed the model training and tuning, they work with the engineering team to deploy the best model to production. We can use SageMaker batch transform to run predictions as new data is collected and generate scores for each customer. The engineering team can orchestrate and automate the ML workflow using Amazon SageMaker Pipelines, a purpose-built continuous integration and continuous delivery (CI/CD) service for ML, which offers an environment to manage the end-to-end ML workflow. It saves time and reduces errors typically caused by manual orchestration.
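
A minimal sketch of such a batch scoring job with the SageMaker Python SDK follows; the model name and S3 paths are placeholders.

from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="<propensity-model-name>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://<trusted-bucket>/propensity-scores/",
)

transformer.transform(
    data="s3://<trusted-bucket>/customers/latest.csv",  # newly collected customer features
    content_type="text/csv",
    split_type="Line",
)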

The output of the ML workflow is imported by Amazon Pinpoint for sending personalized messaging and exported to Segment to use when targeting on display channels. The following illustration provides a visual overview of the ML workflow.

The following screenshot shows an example output file.

Use propensity scores for display targeting

The engineering and digital marketing teams can create the reverse data flow back to Segment to increase reach. This uses a combination of AWS Lambda and Amazon S3. Every time a new output file is generated by the ML workflow and saved in the trusted S3 bucket, a Lambda function is invoked that triggers an export to Segment. Digital marketing can then use regularly updated propensity scores as customer attributes to build and export audiences to Segment destinations (see the following screenshot). For more information on the file structure of the Segment export, see Amazon S3 from Lambda.
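
A minimal sketch of such a Lambda function follows. It assumes the output file is a CSV with customer_id and score columns (hypothetical names), and that the Segment analytics-python library and a source write key are available to the function.

import csv
import io
import boto3
import analytics  # Segment analytics-python library

analytics.write_key = "<segment-source-write-key>"  # placeholder
s3 = boto3.client("s3")

def handler(event, context):
    record = event["Records"][0]["s3"]
    obj = s3.get_object(Bucket=record["bucket"]["name"], Key=record["object"]["key"])
    rows = csv.DictReader(io.StringIO(obj["Body"].read().decode("utf-8")))
    for row in rows:
        # Send the refreshed propensity scores to Segment as user traits.
        analytics.identify(row["customer_id"], {
            "churn_propensity": float(row["churn_score"]),
            "repurchase_propensity": float(row["repurchase_score"]),
        })
    analytics.flush()  # deliver buffered events before the function exits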

When the data is available in Segment, digital marketing can see the propensity scores developed in SageMaker as attributes when they create customer segments. They can generate lookalike audiences to target them with digital advertising. To create a feedback loop, digital marketing must ensure that impressions, clicks, and campaigns are being ingested back into Segment to optimize performance.

Send personalized outbound messaging

The campaign delivery team can implement and deploy AI-driven win-back campaigns to re-engage customers at risk of churn. These campaigns use the list of customer contacts generated in SageMaker as segments while integrating with Amazon Personalize to present personalized product recommendations. See the following diagram.

The digital marketing team can experiment using Amazon Pinpoint journeys to split win-back segments into subgroups and reserve a percentage of users as a control group that isn’t exposed to the campaign. This allows them to measure the campaign’s impact and creates a feedback loop.

Integrate real-time recommendations

To personalize inbound channels, the digital marketing and engineering teams work together to integrate and configure Amazon Personalize to provide product recommendations at different points in the customer’s journey. For example, they can deploy a similar item recommender on product detail pages to suggest complementary items (see the following diagram). Additionally, they can deploy a content-based filtering recommender in the checkout journey to remind customers of products they would typically buy before completing their order.

First, the engineering team needs to create RESTful microservices that respond to web, mobile, and other channel application requests with product recommendations. These microservices call Amazon Personalize to get recommendations, resolve product IDs into more meaningful information like name and price, check inventory stock levels, and determine which Amazon Personalize campaign endpoint to query based on the user’s current page or screen.
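
A minimal sketch of the recommendation call inside such a microservice follows; the campaign ARN is a placeholder.

import boto3

personalize_runtime = boto3.client("personalize-runtime")

def get_similar_items(item_id, user_id, num_results=10):
    # Ask Amazon Personalize for items related to the product currently being viewed.
    response = personalize_runtime.get_recommendations(
        campaignArn="arn:aws:personalize:<region>:<account-id>:campaign/<similar-items-campaign>",
        itemId=item_id,
        userId=user_id,
        numResults=num_results,
    )
    return [item["itemId"] for item in response["itemList"]]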

The front-end and mobile development teams need to add tracking events for specific customer actions to their applications. They can then use Segment to send those events directly to Amazon Personalize in real time. These tracking events are the same as the user-item data we extracted earlier. They allow Amazon Personalize solutions to refine recommendations based on live customer interactions. It’s essential to capture impressions, product views, cart additions, and purchases because these events create a feedback loop for the recommenders. Lambda is an intermediary, collecting user events from Segment and sending them to Amazon Personalize. Lambda also facilitates the reverse data exchange, relaying updated recommendations for the user back to Segment. For more information on configuring real-time recommendations with Segment and Amazon Personalize, see the Segment Real-time data and Amazon Personalize Workshop.
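
A minimal sketch of the intermediary Lambda logic that forwards an event to Amazon Personalize follows; the tracking ID is a placeholder, and the event fields assume the tracking events described above.

from datetime import datetime

import boto3

personalize_events = boto3.client("personalize-events")

def record_event(user_id, session_id, item_id, event_type):
    # Forward a tracking event (for example, a product view or purchase)
    # from Segment to the Amazon Personalize event tracker.
    personalize_events.put_events(
        trackingId="<event-tracker-tracking-id>",
        userId=user_id,
        sessionId=session_id,
        eventList=[{
            "eventType": event_type,
            "itemId": item_id,
            "sentAt": datetime.now(),
        }],
    )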

Conclusion

This post described how to deliver an omnichannel customer experience using a combination of Segment customer data platform and AWS services such as Amazon SageMaker, Amazon Personalize, and Amazon Pinpoint. We explored the role cross-functional teams play at each stage in the customer journey and in the data value chain. The architecture and approach discussed are focused on a retail environment, but you can apply it to other verticals such as financial services or media and entertainment. If you’re interested in trying out some of what we discussed, check out the Retail Demo Store, where you can find hands-on workshops that include Segment and other AWS partners.

Additional references

For additional information, see the following resources:

About Segment

Segment is an AWS Advanced Technology Partner and holder of the following AWS Independent Software Vendor (ISV) competencies: Data & Analytics, Digital Customer Experience, Retail, and Machine Learning. Brands such as Atlassian and Digital Ocean use real-time analytics solutions powered by Segment.


About the Authors

Dwayne Browne is a Principal Analytics Platform Specialist at AWS based in London. He is part of the Data-Driven Everything (D2E) customer program, where he helps customers become more data-driven and customer experience focused. He has a background in digital analytics, personalization, and marketing automation. In his spare time, Dwayne enjoys indoor climbing and exploring nature.

Hara Gavriliadi is a Senior Data Analytics Strategist at AWS Professional Services based in London. She helps customers transform their business using data, analytics, and machine learning. She specializes in customer analytics and data strategy. Hara loves countryside walks and enjoys discovering local bookstores and yoga studios in her free time.

Kenny Rajan is a Senior Partner Solution Architect. Kenny helps customers get the most from AWS and its partners by demonstrating how AWS partners and AWS services work better together. He’s interested in machine learning, data, ERP implementation, and voice-based solutions on the cloud. Outside of work, Kenny enjoys reading books and helping with charity activities.

Read More