
How eMagazines utilizes Amazon Polly to voice articles for school-aged kids

June 9, 2022

by Andrew Degenholtz Amazon AWS

This is a guest post by Andrew Degenholtz, CEO and Founder of eMagazines, the parent company of ReadAlong.ai. eMagazines’ technology seamlessly transforms print products into premium digital and audio experiences. Leveraging Amazon technology, ReadAlong.ai offers a simple, turn-key way for publishers to add audio to their websites with a single line of code.

eMagazines supports publishers in bringing high-quality journalism content to readers across digital platforms. Our ReadAlong.ai brand allows our customers to deepen their connection to readers by adding audio to traditional text-first publishing formats. In March 2020, we helped TIME for Kids launch a digital version of its popular magazine for school-aged kids. This premium subscription product helped their users transition to digital when the pandemic forced schools to close and families needed high-quality educational tools to supplement classroom learning materials.

In this post, we share how we created an automated way for TIME for Kids to seamlessly add audio for early readers and pre-readers through ReadAlong.ai, which uses Amazon Polly technology.

Why did TIME for Kids decide to start creating audio narration of their articles?

The addition of audio with auto scrolling and highlighting of text supports pre-readers and those students still learning to read. Listening while reading supports vocabulary development and reading comprehension, and new words are more likely to be learned when both their oral and written forms are provided. A report from the National Center on Early Childhood Development, Teaching, and Learning states that developing brains need to hear language even before learning to talk, and that even infants’ brains are preparing to speak months before they say their first words. Not only that, but the report also revealed that listening to stories read aloud helps expand both the volume and variety of words entering young vocabularies and fields of comprehension. Experts at Scholastic report that being read to also helps early readers “focus on the sounds of words read without interruption and provides a model of fluent reading,” and also noted that resources like audio help children learn how to listen, a prerequisite to learning to read.

What was the business challenge we addressed?

TIME for Kids originally addressed pre-reader accessibility by hiring voice actors to record their stories. The earlier iteration of their audio play button used an HTML audio player without speed variation or the option to scroll the page or highlight the text. The recording process was expensive and time-consuming, and the user experience wasn’t as engaging as it could be. TIME for Kids was also unable to see even basic data around play or completion rates.

Why Amazon Polly?

We chose Amazon Polly because its APIs and web services support our goal of automating processes and making things easy for our clients.

Amazon Polly’s neural text-to-speech synthesis does the best job of voicing words within the context of a sentence, and the consistency in speech quality allows for the automation of article rendering.

Additionally, Amazon Polly offers a responsive API and powerful SSML support. This offers support for those cases where more control is needed to change inflection, and in the event that text contains challenging names (people, brands, companies) or word and phrase replacements (reading out abbreviations or acronyms in a particular way).

Amazon Polly also supports speech marks, which are crucial for highlighting the text that is currently being read out.
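As a rough illustration of how speech marks can drive highlighting, the following Python sketch parses speech marks in the newline-delimited JSON shape that Amazon Polly returns (each mark has a time offset in milliseconds, a type, start and end offsets, and a value) and looks up the sentence being spoken at a given playback position. The sample data and the function names parse_marks and mark_at are made up for this example; this is not the ReadAlong.ai implementation.

```python
import bisect
import json

# Made-up sample in the newline-delimited JSON shape used for speech marks.
SAMPLE_MARKS = "\n".join([
    '{"time": 0, "type": "sentence", "start": 0, "end": 16, "value": "Bees make honey."}',
    '{"time": 6, "type": "word", "start": 0, "end": 4, "value": "Bees"}',
    '{"time": 1400, "type": "sentence", "start": 17, "end": 36, "value": "They live in hives."}',
])


def parse_marks(raw, mark_type="sentence"):
    """Parse newline-delimited speech marks and keep one mark type, sorted by time."""
    marks = [json.loads(line) for line in raw.splitlines() if line.strip()]
    return sorted((m for m in marks if m["type"] == mark_type), key=lambda m: m["time"])


def mark_at(marks, playback_ms):
    """Return the mark being spoken at playback_ms, or None before the first mark."""
    times = [m["time"] for m in marks]
    idx = bisect.bisect_right(times, playback_ms) - 1
    return marks[idx] if idx >= 0 else None
```

A player could call mark_at on each time update of the audio element and highlight the text span given by the returned mark.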

For TIME for Kids, the Kevin voice was a clear winner. TIME for Kids loved its approachable sound; they wanted a voice that sounded like a child’s to help establish a sense of connection with young readers. Hear an example of a TIME for Kids article using the Kevin voice.

The technical challenge

TIME for Kids needed an educational audio solution for their website. It needed to be a one-time setup that was highly automated and very low friction. The solution also needed to process new articles as they were added dynamically, on a daily basis. And when a user listens to the audio, the page needed to scroll along with the text and highlight the sentence currently being read out loud.

Part of our challenge was to reliably and programmatically identify which content should be read aloud. In a typical publishing context, the audio player needs to read the article title and content, but avoid reading the header and footer text, navigation bars, and certain kinds of ads or captions. Our page analysis solution combines positive and negative query selectors. For each configuration, defined by a set of articles that share the same structure and layout, the http://readalong.ai solution supports a set of allow list selectors and a set of deny list selectors that together capture the appropriate content for synthesizing speech.
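The allow list and deny list idea can be sketched in Python using only the standard library. To keep the example short, it matches on CSS class names rather than full query selectors, which is a simplification; a production system would likely use a real selector engine. The class ReadableText and the function extract_readable are hypothetical names for this illustration.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ReadableText(HTMLParser):
    """Collect text inside allow-listed elements, skipping deny-listed descendants."""

    def __init__(self, allow, deny):
        super().__init__()
        self.allow, self.deny = set(allow), set(deny)
        self.stack = []   # one state per open element: "allow", "deny", or "skip"
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no closing tag, so keep them off the stack
        classes = set((dict(attrs).get("class") or "").split())
        parent = self.stack[-1] if self.stack else "skip"
        if parent == "deny" or classes & self.deny:
            state = "deny"    # a deny match wins and is inherited by descendants
        elif parent == "allow" or classes & self.allow:
            state = "allow"
        else:
            state = "skip"
        self.stack.append(state)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] == "allow" and data.strip():
            self.chunks.append(data.strip())


def extract_readable(html, allow, deny):
    """Return the text that should be sent to speech synthesis."""
    parser = ReadableText(allow, deny)
    parser.feed(html)
    return " ".join(parser.chunks)
```

In this sketch a deny match always overrides an allow match, so an ad nested inside the article body is skipped while the surrounding paragraphs are kept.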

Furthermore, the TIME for Kids website posed many technical challenges because some pages are available only for paying subscribers, whereas some are open to the public. TIME for Kids offers four grade-specific editions, teaching materials, curriculum guides, and weekly virtual learning plans for each issue, as well as worksheets and quizzes. Therefore, each article has multiple versions for different reading levels in both English and Spanish—some with as many as seven different reading levels in both languages.

Our solution

We created a simple drop-in script that allowed TIME for Kids to only add one line of code to the header of any page where they wanted to offer audio. The script automated everything from page content delivery to audio-synthesis to webpage integration. Since the start of the school year, we’ve added the Kevin and Lupe voices (for English and Spanish content, respectively) to thousands of articles on timeforkids.com.

Our solution allowed for automated content delivery and audio synthesizing, which meant no need to sign into a dashboard, FTP, Dropbox, or otherwise send new article content to ReadAlong.ai each time a new page was added. The user-friendly backend of the solution also allows TIME for Kids to easily make word replacements, including global rules, to give the audio synthesizer engine lexicon hints for context-based pronunciations and difficult names, brands, or acronyms.
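One way word replacements could be expressed is with the SSML sub tag, which tells the synthesizer to speak an alias in place of the written text. The Python sketch below is a minimal illustration under that assumption; the function name to_ssml and the replacement rule are hypothetical, and the post does not say how ReadAlong.ai applies its rules internally.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text, replacements):
    """Wrap text in <speak>, replacing listed terms with SSML <sub alias="..."> tags."""
    if not replacements:
        return f"<speak>{escape(text)}</speak>"
    # Longest terms first so a longer phrase wins over a shorter term it contains.
    terms = sorted(replacements, key=len, reverse=True)
    # \b word boundaries assume each term starts and ends with a letter or digit.
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        term = match.group(1)
        parts.append(f"<sub alias={quoteattr(replacements[term])}>{escape(term)}</sub>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"
```

The text outside the replaced terms is XML-escaped so that characters such as an ampersand in an article do not break the SSML document.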

In addition to positioning and styling the launcher and player to match the TIME for Kids site design, as part of the customization, we added functionality to highlight and scroll the text as the article is read aloud, which is another helpful tool to support children in learning to recognize words and connect them to sounds. We customized this feature to be visible but not distracting, so the audio and visual elements could work in tandem to aid young readers. To support this enhanced feature, we implemented the detailed word- and sentence-level metadata available in Amazon Polly to provide a fluid highlighting experience that helps readers follow along as they encounter new words and concepts. This allows the listener to identify what they’re hearing as they view the content as it’s highlighted on the browser.

We also created a default for the Amazon Polly Kevin and Lupe voices to start at a slower speed, so the default pacing is at .9x, rather than at 1x, as another way to help early readers and pre-readers better access the content. Listeners have the ability to lower the default voice speed to .75x or increase to 1.5x, in order to accommodate more reading levels.
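The speed settings described above can be sketched as follows. The default and the limits come from this post; expressing the speed through the SSML prosody rate attribute is only one possible mechanism and is an assumption here (a player could equally adjust playback speed in the browser). The function names are hypothetical.

```python
DEFAULT_SPEED = 0.9                 # default pacing stated in the post
MIN_SPEED, MAX_SPEED = 0.75, 1.5    # listener-adjustable range stated in the post


def clamp_speed(requested=None):
    """Return the speed to use: the default if none is requested, else clamped to range."""
    if requested is None:
        return DEFAULT_SPEED
    return max(MIN_SPEED, min(MAX_SPEED, requested))


def prosody_wrap(ssml_body, speed):
    """Wrap SSML content in a <prosody> tag whose rate is a percentage of normal speed."""
    return f'<speak><prosody rate="{round(speed * 100)}%">{ssml_body}</prosody></speak>'
```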

Business benefits for the customer

With our product in place on their site, TIME for Kids was able to voice their content in a scalable way. They deliver content on an article-by-article basis in two different languages (English and Spanish) and in seven different reading levels.

They’re also now able to easily collect and analyze data in real time, including both play and completion rates, and view most popular articles as well as articles with the most audio engagement.

We now know that 55% of kids that click to listen to an article complete 100% of the article, and 66% of kids that listen to an article complete more than half of the article. These significant completion rates reinforce the benefit and confirm that listeners are comfortable with the technology and the voice is relatable. The ReadAlong.ai audio also helped TIME for Kids promote its advanced accessibility features, including key articles with Spanish translation and read-aloud functionality, because the presence of the audio is featured prominently on the preview of each article along with other benefits (such as Spanish translation).

Stacy Bien, Director of Curriculum for TIME for Kids, was impressed with both the solution and the engagement data, saying,

“This is really a thing of beauty. This solution will help so many early readers develop their reading skills and easily consume more content. For us, we’ve seen a huge lift in engagement. That, coupled with the ease of use and cost-effectiveness, makes this a slam dunk.”

Conclusion

ReadAlong.ai used Amazon Polly to help TIME for Kids streamline the process of adding high-quality audio voiceover content to its premium subscription product. Our solution enabled the customer to significantly improve production time, precision, and cost. For example, a voiceover artist typically spends 1 hour or more to record an article, edit the audio, and master the final audio output. Now, once the ReadAlong.ai script has been added to the site, when new articles are created, the content is automatically processed without any time spent by a voiceover artist, audio editor, or administrator. The audio reads articles precisely and rarely requires adjustments, creating a valuable and immeasurable savings of both time and cost.

Collected KPIs tell us that not only did this become an easy way for the TIME for Kids team to manage audio functionality, but that the end-users—children early in the development of their reading abilities—take to the functionality as another tool on their reading path.

About the Author

Andrew Degenholtz is CEO and Founder of eMagazines and ReadAlong.ai, and is President of ValueMags, which he founded in 1999. Degenholtz holds a master’s in marketing from Northwestern University and a B.A. from Muhlenberg College. Previously, he was a member of the Alliance for Audited Media digital edition task force, created to develop best practices for acquisition of digital magazine subscribers.

Weekly forecasts can now start on Sunday with Amazon Forecast

June 9, 2022

by Dan Sinnreich Amazon AWS

We are excited to announce that in Amazon Forecast, you can now start your forecast horizon at custom starting points, including on Sundays for weekly forecasts. This allows you to more closely align demand planning forecasts to local business practices and operational requirements.

Forecast is a fully managed service that uses statistical and machine learning (ML) algorithms to deliver highly accurate time series forecasts. It uses state-of-the-art algorithms to predict future time series data based on historical data, and requires no ML experience. Typical Forecast applications include resource planning for inventory, workforce staffing, and web traffic. In this post, we review a new option that allows you to align forecasts with business and demand cycles, while reducing operational cost by offloading aggregation workflows.

To optimize demand planning, forecasts need to align with business operations. Previously, starting points for forecasts were fixed: daily forecasts assumed demand starting at midnight each day, weekly predictions assumed Monday as the first day of the week, and monthly predictions started on the first day of each month. These predefined starting points presented two challenges. First, if your business cycle began at a different point than the fixed value, you had to manually aggregate forecasts to your required starting point. For example, if your business week began on a Sunday and you wanted to produce weekly forecasts, you had to manually aggregate daily forecasts to a Sunday–Saturday week. This additional work added cost and compute time, and presented opportunities for errors. Second, the training data and forecast periods weren’t consistent; if your data reflects a demand cycle that begins on Sundays, the predictor and forecast should also use Sunday as the starting point.

Custom forecast horizon starting points now align business operations and forecasts, eliminating the need for manual aggregation work and saving cost and compute. If you have a business week starting on Sundays, you can automatically aggregate daily data to generate weekly forecasts that begin on Sundays. Or you can begin daily forecasts starting at 9:00 AM. Predictors can now be aligned with your ground truth data, providing consistency between inputs and forecasts. Forecast horizon starting points are easily defined when training new predictors via the Forecast console or using Forecast APIs.

Define custom forecast horizon starting periods

The forecast horizon, also called frequency, is the length of time for which a forecast is made, and is bounded by a starting and ending point. In Forecast, you can now select specific starting points for daily, weekly, monthly, and yearly forecast horizons when training new predictors. These starting points—also called boundary values—are selected at one frequency unit finer than the forecast horizon, as shown in the following table.

Forecast frequency unit	Boundary unit	Boundary values
Daily	Hour	0–23
Weekly	Day of week	Monday through Sunday
Monthly	Day of month	1 through 28
Yearly	Month	January through December

With custom starting points, you can align forecasts to start at specific points in time that match your business processes and ground truth data, for example, the month of May, the 15th of the month, Sundays, or 15:00 hours. For forecast horizons coarser than the provided time series frequency, Forecast aggregates the time series data based on the custom starting point. For example:

When generating daily forecasts from hourly data with a 9:00 AM starting period, forecasts are aggregated with hourly data from 9:00 AM each day through 8:00 AM the following day
When generating weekly forecasts from daily data with a Sunday starting period, forecasts are aggregated with daily data each week from Sunday to the following Saturday
When generating monthly forecasts from daily data with a starting day of the 15th of the month, forecasts are aggregated with daily data from the 15th of the current month to the 14th of the next month
When generating yearly forecasts from monthly data with a starting month of May, forecasts are aggregated with monthly data from May of the current year to April of next year
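To make the weekly case concrete, the following Python sketch groups daily values into weeks that start on a chosen weekday. It illustrates the bucketing logic only and is not how Forecast is implemented; summing the values within each week is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Python's date.weekday() numbers Monday as 0 through Sunday as 6.
WEEKDAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"]


def week_start(day, boundary="SUNDAY"):
    """Return the first day of the week containing `day` for the given boundary weekday."""
    offset = (day.weekday() - WEEKDAYS.index(boundary)) % 7
    return day - timedelta(days=offset)


def aggregate_weekly(daily, boundary="SUNDAY"):
    """Sum daily values into weekly buckets keyed by each week's starting date."""
    totals = defaultdict(float)
    for day, value in daily.items():
        totals[week_start(day, boundary)] += value
    return dict(sorted(totals.items()))
```

With a Sunday boundary, a value recorded on Saturday, June 4, 2022 falls into the week starting May 29, while a value recorded on Sunday, June 5 opens a new week.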

Available forecast frequencies

The following screenshots show examples of custom daily, weekly, monthly, and yearly forecast frequencies and starting points (the Time alignment boundary field on the Forecast console).

Specify custom forecast horizon starting points

You can define custom forecast horizon starting points when creating a new predictor. The following steps demonstrate how to do this using the Forecast console. We also offer a sample notebook that provides an example of how to integrate this new setting into your workflows.

On the Forecast console, choose View dataset groups, and then Create dataset group.
Create your dataset group, a target time series dataset, and load your data.
You’re redirected to the Forecast console as your data is loaded.
After your target time series dataset is loaded into your dataset group and active, choose Start under Train a predictor.
In the Train predictor section, provide values for the Name, Forecast frequency, and Forecast horizon fields.
In the optional Time alignment boundary field, specify the starting point the predictor uses for the forecast.
The values in this list depend on the Forecast frequency value you choose. In this example, we create weekly forecasts with a 1-week horizon, with Sunday as the starting day of the week and of the forecast.
Provide other optional configurations as needed and choose Create.

After you create the predictor, you can create your forecast.
In the navigation pane, under your dataset group choose Predictors.
Select your new predictor.
Choose Create forecast.
Provide the necessary details and choose Start to create your forecast.
When the forecast is complete, choose Create forecast export to export the results.
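For readers who prefer the API to the console, the same settings might be assembled in Python as shown below. This is a sketch, not a complete workflow: the helper auto_predictor_params and the example ARN are hypothetical, and the field names follow our understanding of the CreateAutoPredictor request (PredictorName, ForecastHorizon, ForecastFrequency, DataConfig, TimeAlignmentBoundary), so check them against the API reference before use.

```python
# Boundary field per forecast frequency code, following the table in this post.
BOUNDARY_FIELD = {"D": "Hour", "W": "DayOfWeek", "M": "DayOfMonth", "Y": "Month"}


def auto_predictor_params(name, dataset_group_arn, frequency, horizon, boundary_value):
    """Build keyword arguments for a CreateAutoPredictor request with a custom starting point."""
    if frequency not in BOUNDARY_FIELD:
        raise ValueError(f"no time alignment boundary for frequency {frequency!r}")
    return {
        "PredictorName": name,
        "ForecastHorizon": horizon,
        "ForecastFrequency": frequency,
        "DataConfig": {"DatasetGroupArn": dataset_group_arn},
        "TimeAlignmentBoundary": {BOUNDARY_FIELD[frequency]: boundary_value},
    }


# Weekly forecasts with a 1-week horizon, starting on Sunday (placeholder ARN).
params = auto_predictor_params(
    "weekly_sunday",
    "arn:aws:forecast:us-east-1:123456789012:dataset-group/example",
    "W",
    1,
    "SUNDAY",
)
```

With the AWS SDK for Python installed and credentials configured, these parameters could be passed as boto3.client("forecast").create_auto_predictor(**params).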

The following screenshots are samples of the original input file (left) and the exported forecast results (right). The input file is at an hourly frequency, whereas the forecast is produced at a weekly frequency, beginning with Sunday as the first day of the week. This is an example of Forecast automatically aggregating over two levels of forecast frequencies (from hours to days, and from days to weeks).

Conclusion

Custom forecast horizon starting points in Forecast allow you to produce forecasts that align with your specific operational requirements. Work weeks start on different days in different regions, requiring forecasts that begin on days other than Mondays, and that are aligned with ground truth training and ongoing data. Or you may want to generate daily forecasts that reflect a demand cycle beginning at 7:00 AM each day, for example.

Forecast also automatically aggregates fine-grained forecasts to higher-level frequencies (such as days into weeks). This allows you to produce forecasts aligned with your operations, and saves you costs by removing the need to stand up and manage aggregation workflows.

Custom starting points are optional. If you don’t provide specific starting points, forecasts start at default times. Specific forecast horizon starting points are only available with AutoPredictor. For more information, refer to New Amazon Forecast API that creates up to 40% more accurate forecasts and provides explainability and CreateAutoPredictor.

To learn more about forecast frequencies, refer to Data aggregation for different forecast frequencies. All these new capabilities are available in all Regions where Forecast is publicly available. For more information about Region availability, see AWS Regional Services.

About the Authors

Dan Sinnreich is a Sr. Product Manager for Amazon Forecast. He is focused on democratizing low-code/no-code machine learning and applying it to improve business outcomes. Outside of work, he can be found playing hockey, trying to improve his tennis serve, scuba diving, and reading science fiction.

Paras Arora is a Software Development Engineer in the Amazon Forecast Team. He is passionate about building cutting edge AI/ML solutions in the cloud. In his spare time, he enjoys hiking and traveling.

Chetan Surana is a Software Development Engineer in the Amazon Forecast team. His interests lie at the intersection of machine learning and software development, applying thoughtful design and engineering skills to solve problems. Outside of work, he enjoys photography, hiking, and cooking.

Continuously monitor predictor accuracy with Amazon Forecast

June 9, 2022

by Dan Sinnreich Amazon AWS

We’re excited to announce that you can now automatically monitor the accuracy of your Amazon Forecast predictors over time. As new data is provided, Forecast automatically computes predictor accuracy metrics, providing you with more information to decide whether to keep using, retrain, or create new predictors.

Monitoring predictor quality and identifying deterioration in accuracy over time is important to achieving business goals. However, the processes required to continuously monitor predictor accuracy metrics can be time-consuming to set up and challenging to manage: forecasts have to be evaluated, and updated accuracy metrics have to be computed. In addition, metrics have to be stored and charted to understand trends and make decisions about keeping, retraining, or recreating predictors. These processes can result in costly development and maintenance burdens, and place meaningful operational stress on data science and analyst teams. Customers unwilling to take on this time-consuming process may instead retrain predictors even when it isn’t needed, which wastes time and compute.

With today’s launch, Forecast now automatically tracks predictor accuracy over time as new data is imported. You can now quantify your predictor’s deviation from initial quality metrics and systematically evaluate model quality by visualizing trends, and make more informed decisions about keeping, retraining, or rebuilding your models as new data comes in. Predictor monitoring can be enabled for new predictors at inception, or turned on for existing models. You can enable this feature with one click on the AWS Management Console or using Forecast APIs.

Predictor accuracy over time

A predictor is a machine learning model created at a point in time, using an original set of training data. After a predictor is created, it’s used on an ongoing basis over days, weeks, or months into the future to generate time series forecasts with new ground truth data generated through actual transactions. As new data is imported, the predictor generates new forecasted data points based on the latest data provided to it.

When a predictor is first created, Forecast produces accuracy metrics such as weighted quantile loss (wQL), mean absolute percentage error (MAPE), or root mean squared error (RMSE) to quantify the accuracy of the predictor. These accuracy metrics are used to determine whether a predictor will be put into production. However, the performance of a predictor will fluctuate over time. External factors such as changes in the economic environment or in consumer behavior can change the fundamental factors underlying a predictor. Other factors include new products, items, and services that may be created, or changes in the distribution of data.
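For reference, the three metrics named above can be written in a few lines of Python. These are the common textbook definitions, given as a sketch; how Forecast handles edge cases (for example, zero actuals) or averages wQL across quantiles is not covered in this post and may differ.

```python
import math


def mape(actuals, forecasts):
    """Mean absolute percentage error, skipping points where the actual value is zero."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts) if a != 0]
    return sum(ratios) / len(ratios)


def rmse(actuals, forecasts):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals))


def wql(actuals, quantile_forecasts, tau):
    """Weighted quantile loss at quantile tau, normalized by the sum of absolute actuals."""
    loss = sum(
        tau * max(a - q, 0) + (1 - tau) * max(q - a, 0)
        for a, q in zip(actuals, quantile_forecasts)
    )
    return 2 * loss / sum(abs(a) for a in actuals)
```

For all three metrics, lower values indicate a more accurate predictor.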

For example, consider a predictor trained when a certain color of a product was popular. Months later, new colors may appear or become more popular and the distribution of values changes. Or a shift occurs in the business environment that modifies long-standing purchasing patterns (such as from high-margin to low-margin products). All things considered, the predictor may need to be retrained, or a new predictor may need to be created to ensure highly accurate predictions continue to be made.

Automated predictor monitoring

Predictor monitoring is designed to automatically analyze your predictor’s performance as new ground truth time series data becomes available and is used to create new forecasts. This monitoring provides you with continuous model performance information, and saves you time so you don’t have to set up the process yourself.

If predictor monitoring is enabled in Forecast, each time you import new data and produce a new forecast, performance statistics are updated automatically. Until now, these performance statistics were only available when the predictor was initially trained; now these statistics are produced on a continuous basis using new ground truth data, and can be actively monitored to gauge predictor performance.

This allows you to use predictor performance statistics to decide when to retrain a predictor or train a new one. For example, as the average wQL metric deviates from the initial baseline values, you can determine whether to retrain the predictor. If you decide to retrain a predictor or create a new one, you can begin generating new forecasted data points using the more accurate predictor.

The following graphs provide two examples of predictor monitoring. In the first chart, the average wQL metric is decreasing from the baseline (the initial value when the predictor was trained), dropping from 0.3 to 0.15 over the course of a few days. Because a lower wQL means a smaller error, forecast accuracy is increasing over time. In this case, there is no need to retrain the predictor because it’s producing more accurate forecasts than when it was first trained.

In the next figure, the opposite is true: the average wQL is increasing, indicating that accuracy is decreasing over time. In this case, you should consider retraining or rebuilding the predictor with new data.
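The decision described in these two examples can be reduced to a simple comparison against the baseline. The Python sketch below is one possible rule of thumb; the 10% tolerance and the function names are assumptions for illustration, not a recommendation from Forecast.

```python
def relative_change(baseline, current):
    """Fractional change of a metric from its baseline (negative means the error fell)."""
    return (current - baseline) / baseline


def should_consider_retraining(baseline, history, tolerance=0.10):
    """Flag retraining when the latest error metric exceeds baseline by more than tolerance."""
    if not history:
        return False
    return relative_change(baseline, history[-1]) > tolerance
```

Applied to the first chart, an average wQL moving from 0.3 to 0.15 is a relative change of about -0.5, so the rule does not flag the predictor; a wQL that climbs well above 0.3 would.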

In Forecast, you have the choice of retraining the current predictor or rebuilding it from scratch. Retraining is done with one click and incorporates more up-to-date data and any updates and improvements in the Forecast algorithms. Rebuilding the predictor allows you to provide new inputs (such as forecast frequency, horizon, or new dimension) to create a new predictor.

Enable predictor monitoring

You can enable predictor monitoring when creating a new predictor, or turn it on for existing predictors. The steps in this section demonstrate how to perform these steps using the Forecast console. There is also a Jupyter notebook that walks through a sequence of steps to enable predictor monitoring using APIs and generate predictor monitor results.

This example uses the time-sliced sample dataset available from the predictor monitoring notebook. In our example, we start with a 100,000-row dataset of New York City taxi pickups containing a timestamp, location ID, and target value (the number of pickups requested during the timestamp at the location ID).

Complete the following steps:

On the Forecast console, choose View dataset groups in the navigation pane.
Choose Create dataset group and provide your dataset group details.
After you create the dataset group, you’re prompted to create a target time series dataset. You use this dataset to train the predictor and create forecasts.
On the Create target time series dataset page, provide your data’s schema, frequency, and location.
Choose Start to import your target dataset.
Next, you build your predictor and train it using your initial dataset.
In the navigation pane, choose Predictors.
Choose Train new predictor.
In the Predictor settings section, enter a name for your predictor, how long in the future you want to forecast and at what frequency, and the number of quantiles you want to forecast for.
For Optimization metric, you can choose an optimization metric to optimize AutoPredictor to tune a model for a specific accuracy metric of your choice. We leave this as default for our walkthrough.
To get the predictor explainability report, select Enable predictor explainability.
To enable predictor monitoring, select Enable predictor monitoring.
Under the input data configuration, you can add local weather information and national holidays for more accurate demand forecasts.
Choose Start to start training your predictor.

Forecast now trains the predictor with this initial dataset. With predictor monitoring enabled, every time new data is provided in this dataset group, Forecast is able to compute updated predictor accuracy metrics.
After the predictor has been trained, choose it to evaluate the initial accuracy metrics.

The Metrics tab shows initial predictor quality metrics. Because you haven’t generated any forecasts from your predictor or imported any new ground truth data, there is nothing to show on the Monitoring tab.
The next step is to generate a forecast using the new predictor.
Choose Forecasts in the navigation pane.
Choose Create forecast to create a new forecast based on the time series data you just imported and the predictor settings.
Provide the forecast name, predictor name, and any additional quantile metrics you wish to compute.

After you create the forecast, you can view and export its details and results on the Forecast details page.

Predictor monitoring: Evaluating accuracy over time

Through the passage of time, new ground truth data is created by your business processes, for example, updated sales figures, staffing levels, or manufacturing output. To create new forecasts based on that new data, you can import your data to the dataset you created.

On the Amazon Forecast console, on the Dataset groups page, choose your dataset group.
Choose your dataset.
In the Dataset imports section, choose Create dataset import.
Provide additional details about your updated data, including its location.
Choose Start.

With predictor monitoring, Forecast compares this new data to the previous forecast generated, and computes accuracy metrics for the predictor. Updated predictor quality metrics are computed on an ongoing basis as new data is added to the dataset.

You can follow these steps to import additional data, representing additional transactions that have occurred through time.

Evaluate predictor monitoring results

To see predictor monitoring results, you must add new ground truth data after generating the initial forecasts. Forecast compares this new ground truth data to the previous forecast, and produces updated model accuracy values for monitoring.

On the Dataset groups page, choose the relevant dataset group and select the Target Time Series to update it with new ground truth data.
Choose Create Dataset Import and add your new ground truth data.

After you provide the additional ground truth data, you can open your predictor and view initial predictor monitoring statistics.
Choose your predictor and navigate to the Monitoring tab.

You can follow these steps to run additional forecasts using this predictor and add further iterations of ground truth data. The progression of model accuracy statistics for your predictor is available on the Monitoring tab.

This example shows model accuracy statistics for a predictor that has been evaluated with four additional data updates. The predictor had a baseline MAPE of 0.55 when it was initially trained. As additional data was loaded, the MAPE dropped to 0.42 with the first additional dataset, indicating a more accurate predictor, and fluctuated within a tight range from 0.42 to 0.48 with subsequent datasets.

You can toggle the chart to view additional metrics. In the following examples, MASE and average wQL show similar fluctuations from the baseline over time.

The Monitoring History section at the bottom of the page provides full details on all predictor accuracy metrics tracked over time.

Set up predictor monitoring on an existing predictor

You can easily enable monitoring for existing predictors. To do so, complete the following steps:

In the navigation pane, under your dataset, choose Predictors.
From here there are two ways to enable monitoring:
1. Choose Start monitoring under the Monitoring column.
2. Choose your predictor and on the Monitoring tab, under Monitor details, choose Start monitor.
In the pop-up dialog, choose Start to start monitoring for the selected predictor.

The Monitoring tab now shows that predictor monitoring has started, and results are generated as you import more data.
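The same step can be scripted. The Python sketch below wraps the CreateMonitor operation, which to our understanding takes a MonitorName and the predictor's ARN as ResourceArn and returns a MonitorArn. The helper start_monitoring is a hypothetical name, and the FakeForecastClient class exists only so the example runs without AWS credentials; in practice you would pass boto3.client("forecast").

```python
def start_monitoring(forecast_client, predictor_arn, monitor_name):
    """Create a monitor for an existing predictor and return the new monitor's ARN."""
    response = forecast_client.create_monitor(
        MonitorName=monitor_name,
        ResourceArn=predictor_arn,
    )
    return response["MonitorArn"]


class FakeForecastClient:
    """Stand-in for boto3.client("forecast") that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def create_monitor(self, **kwargs):
        self.calls.append(kwargs)
        # Placeholder ARN; a real response contains the ARN assigned by the service.
        return {"MonitorArn": f"arn:aws:forecast:::monitor/{kwargs['MonitorName']}"}
```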

Stop and restart predictor monitoring

You can also stop and restart predictor monitoring. Consider the following:

Cost – Predictor monitoring consumes additional resources. With typical small datasets, the cost is minimal, but may increase with large datasets (number of items in the input dataset, and forecast horizon).
Privacy – A copy of your forecast is stored during monitoring. If you don’t want to store this copy, you can stop monitoring.
Noise – If you’re experimenting with a predictor and don’t want to see noise in your predictor monitor results, you can temporarily stop predictor monitoring and restart it when your predictor is stable.

To stop predictor monitoring, complete the following steps:

Navigate to the Monitoring tab for a predictor where monitoring is enabled.
Choose Stop Monitor to stop the monitoring of the predictor.
Verify your choice when prompted.

A message shows on the next page to indicate that predictor monitoring is stopped.

You can restart predictor monitoring by choosing Resume monitor.
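
The same stop and resume actions can be sketched with the API. This example assumes the boto3 Forecast client's stop_resource and resume_resource operations accept the monitor's ARN as ResourceArn; the function names and the ARN are placeholders.

```python
def stop_monitoring(forecast_client, monitor_arn):
    """Stop an active monitor (assumes StopResource accepts a monitor ARN)."""
    forecast_client.stop_resource(ResourceArn=monitor_arn)


def resume_monitoring(forecast_client, monitor_arn):
    """Resume a previously stopped monitor."""
    forecast_client.resume_resource(ResourceArn=monitor_arn)


# Example usage (requires boto3 and AWS credentials; the ARN is a placeholder):
# import boto3
# client = boto3.client("forecast")
# stop_monitoring(client, "<your-monitor-arn>")
# resume_monitoring(client, "<your-monitor-arn>")
```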

Conclusion

Monitoring the quality of your predictors over time is important to achieve your demand planning and forecasting objectives, and ultimately your business goals. However, predictor monitoring can be a time-consuming exercise, and the processes required to stand up and maintain the necessary workflows can lead to higher operational costs.

Forecast can now automatically track the quality of your predictors, allowing you to reduce operational efforts, while helping you make more informed decisions about keeping, retraining, or rebuilding your predictors. To enable predictor monitoring, you can follow the steps outlined in this post, or follow our GitHub notebook.

Please note that predictor monitoring is only available with AutoPredictor. For more information, refer to New Amazon Forecast API that creates up to 40% more accurate forecasts and provides explainability and CreateAutoPredictor.

To learn more, refer to Predictor Monitoring. We also recommend reviewing the pricing for using these new features. All these new capabilities are available in all Regions where Forecast is publicly available. For more information about Region availability, see AWS Regional Services.

About the Authors

Dan Sinnreich is a Sr. Product Manager for Amazon Forecast. He is focused on democratizing low code/no code machine learning and applying it to improve business outcomes. Outside of work he can be found playing hockey, trying to improve his tennis serve, and reading science fiction.

Adarsh Singh works as a Software Development Engineer in the Amazon Forecast team. In his current role, he focuses on engineering problems and building scalable distributed systems that provide the most value to end users. In his spare time, he enjoys watching anime and playing video games.

Shannon Killingsworth is a UX Designer for Amazon Forecast. His current work is creating console experiences that are usable by anyone, and integrating new features into the console experience. In his spare time, he is a fitness and automobile enthusiast.

From petroleum engineering to machine learning

June 9, 2022

by Amazon AWS

How Chukwudi Chukwudozie’s path to Amazon was paved by a passion for problem-solving and growth.

Unified data preparation and model training with Amazon SageMaker Data Wrangler and Amazon SageMaker Autopilot

June 9, 2022

by Peter Chung Amazon AWS

Data fuels machine learning (ML); the quality of data has a direct impact on the quality of ML models. Therefore, improving data quality and employing the right feature engineering techniques are critical to creating accurate ML models. ML practitioners often tediously iterate on feature engineering, choice of algorithms, and other aspects of ML in search of optimal models that generalize well on real-world data and deliver the desired results. Because speed in doing business disproportionately matters, this extremely tedious and iterative process may lead to project delays and lost business opportunities.

Amazon SageMaker Data Wrangler reduces the time to aggregate and prepare data for ML from weeks to minutes, and Amazon SageMaker Autopilot automatically builds, trains, and tunes the best ML models based on your data. With Autopilot, you still maintain full control and visibility of your data and model. Both services are purpose-built to make ML practitioners more productive and accelerate time to value.

Data Wrangler now provides a unified experience enabling you to prepare data and seamlessly train a ML model in Autopilot. With this newly launched feature, you can now prepare your data in Data Wrangler and easily launch Autopilot experiments directly from the Data Wrangler user interface (UI). With just a few clicks, you can automatically build, train, and tune ML models, making it easier to employ state-of-the-art feature engineering techniques, train high-quality ML models, and gain insights from your data faster.

In this post, we discuss how you can use this new integrated experience in Data Wrangler to analyze datasets and easily build high-quality ML models in Autopilot.

Dataset overview

Pima Indians are an Indigenous group that live in Mexico and Arizona, US. Studies show Pima Indians as a high-risk population group for diabetes mellitus. Predicting the probability of an individual’s risk and susceptibility to a chronic illness like diabetes is an important task in improving the health and well-being of this often underrepresented minority group.

We use the Pima Indian Diabetes public dataset to predict the susceptibility of an individual to diabetes. We focus on the new integration between Data Wrangler and Autopilot to prepare data and automatically create an ML model without writing a single line of code.

The dataset contains information about Pima Indian females 21 years or older and includes several medical predictor (independent) variables and one target (dependent) variable, Outcome. The following table describes the columns in our dataset.

Column Name	Description
Pregnancies	The number of times pregnant
Glucose	Plasma glucose concentration in an oral glucose tolerance test within 2 hours
BloodPressure	Diastolic blood pressure (mm Hg)
SkinThickness	Triceps skin fold thickness (mm)
Insulin	2-hour serum insulin (mu U/ml)
BMI	Body mass index (weight in kg/(height in m)^2)
DiabetesPedigree	Diabetes pedigree function
Age	Age in years
Outcome	The target variable

The dataset contains 768 records, with 9 total columns (8 features and the Outcome target). We store this dataset in Amazon Simple Storage Service (Amazon S3) as a CSV file and then import the CSV directly into a Data Wrangler flow from Amazon S3.

Solution overview

The following diagram summarizes what we accomplish in this post.

Data scientists, doctors, and other medical domain experts provide patient data with information on glucose levels, blood pressure, body mass index, and other features used to predict the likelihood of having diabetes. With the dataset in Amazon S3, we import the dataset into Data Wrangler to perform exploratory data analysis (EDA), data profiling, feature engineering, and splitting the dataset into train and test for model building and evaluation.

We then use Autopilot’s new feature integration to quickly build a model directly from the Data Wrangler interface. We choose Autopilot’s best model, which is the one with the highest F-beta score. After Autopilot finds the best model, we run a SageMaker Batch Transform job on the test (holdout) set with the model artifacts of the best model for evaluation.

Medical experts can provide new data to the validated model to obtain a prediction to see if a patient will likely have diabetes. With these insights, medical experts can start treatment early to improve the health and well-being of vulnerable populations. Medical experts can also explain a model’s prediction by referencing the model’s detail in Autopilot because they have full visibility into the model’s explainability, performance, and artifacts. This visibility in addition to validation of the model from the test set gives medical experts greater confidence in the model’s predictive ability.

We walk you through the following high-level steps.

Import the dataset from Amazon S3.
Perform EDA and data profiling with Data Wrangler.
Perform feature engineering to handle outliers and missing values.
Split data into train and test sets.
Train and build a model with Autopilot.
Test the model on a holdout sample with a SageMaker notebook.
Analyze validation and test set performance.

Prerequisites

Complete the following prerequisite steps:

Upload the dataset to an S3 bucket of your choice.
Make sure you have the necessary permissions. For more information, refer to Get Started with Data Wrangler.
Set up a SageMaker domain configured to use Data Wrangler. For instructions, refer to Onboard to Amazon SageMaker Domain.

Import your dataset with Data Wrangler

You can integrate a Data Wrangler data flow into your ML workflows to simplify and streamline data preprocessing and feature engineering using little to no coding. Complete the following steps:

Create a new Data Wrangler flow.

If this is your first time opening Data Wrangler, you may have to wait a few minutes for it to be ready.

Choose the dataset stored in Amazon S3 and import it into Data Wrangler.

After you import the dataset, you should see the beginnings of a data flow within the Data Wrangler UI. You now have a flow diagram.

Choose the plus sign next to Data types and choose Edit to confirm that Data Wrangler automatically inferred the correct data types for your data columns.

If the data types aren’t correct, you can easily modify them through the UI. If multiple data sources are present, you can join or concatenate them.

We can now create an analysis and add transformations.

Perform exploratory data analysis with the data insights report

Exploratory data analysis is a critical part of the ML workflow. We can use the new data insights report from Data Wrangler to gain a better understanding of the profile and distribution of our data. The report includes summary statistics, data quality warnings, target column insights, a quick model, and information about anomalous and duplicate rows.

Choose the plus sign next to Data types and choose Get data insights.

For Target column, choose Outcome.
For Problem type, (optionally) select Classification.
Choose Create.

The results show a summary of the data with the dataset statistics.

We can also view the distribution of the labeled rows with a histogram, an estimate of the expected predicted quality of the model with the quick model feature, and a feature summary table.

We don’t go into the details of analyzing the data insights report; refer to Accelerate data preparation with data quality and insights in Amazon SageMaker Data Wrangler for additional details about how you can use the data insights report to accelerate your data preparation steps.

Perform feature engineering

Now that we’ve profiled and analyzed the distribution of our input columns at a high level, the first consideration for improving the quality of our data could be to handle missing values.

For example, we know that zeros (0) for the Insulin column represent missing values. We could follow the recommendation to replace the zeros with NaN. But on closer examination, we find that the minimum value is 0 for other columns such as Glucose, BloodPressure, SkinThickness, and BMI. We need a way to handle missing values, but need to be sensitive to columns with zeros as valid data. Let’s see how we can fix this.

In the Feature Details section, the report raises a Disguised missing value warning for the feature Insulin.

Because zeros in the Insulin column are in fact missing data, we use the Convert regex to missing transform to transform zero values to empty (missing values).

Choose the plus sign next to Data types and choose Add transform.
Choose Search and edit.
For Transform, choose Convert regex to missing.
For Input columns, choose the columns Insulin, Glucose, BloodPressure, SkinThickness, and BMI.
For Pattern, enter 0.
Choose Preview and Add to save this step.

The 0 entries under Insulin, Glucose, BloodPressure, SkinThickness, and BMI are now missing values.
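
For readers who want to see the effect of this transform outside the UI, the following is a rough plain-Python equivalent. It is a sketch, not Data Wrangler's implementation: the helper name and the sample rows are hypothetical, and it compares numeric values to 0 rather than applying a regex.

```python
def zeros_to_missing(rows, columns):
    """Return copies of the rows with 0 replaced by None in the given columns."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if new_row.get(col) == 0:
                new_row[col] = None  # treat a disguised zero as missing
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample rows, for illustration only.
rows = [
    {"Pregnancies": 0, "Glucose": 148, "Insulin": 0},
    {"Pregnancies": 1, "Glucose": 0, "Insulin": 94},
]
cleaned = zeros_to_missing(rows, ["Glucose", "Insulin"])
print(cleaned)
```

Because only the listed columns are converted, a column such as Pregnancies, where 0 is a valid value, is left untouched.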

Data Wrangler gives you a few other options to fix missing values.

We handle missing values by imputing the approximate median for the Glucose column.
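
The idea behind median imputation can be sketched in a few lines of Python. Note that this sketch uses the exact median from the standard library, whereas the transform described here imputes an approximate median; the sample values are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical Glucose readings with two missing entries.
glucose = [148, None, 85, 183, None, 116]
imputed = impute_median(glucose)
print(imputed)
```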

We also want to ensure that our features are on the same scale. We don’t want to accidentally give more weight to a certain feature just because it contains a larger numeric range. We normalize our features to do this.

Add a new Process numeric transform and choose Scale values.
For Scaler, choose Min-max scaler.
For Input columns, choose the columns Pregnancies, BloodPressure, Glucose, SkinThickness, Insulin, BMI, and Age.
Set Min to 0 and Max to 1.

This makes sure that our features are between the values 0 and 1.
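
As an illustration of what min-max scaling does to a column, here is a minimal sketch of the formula x' = min + (x - x_min) * (max - min) / (x_max - x_min). The function name and the sample ages are hypothetical, and mapping a constant column to the lower bound is a choice made for this sketch.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no range; map it to the lower bound.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


# Hypothetical Age values, for illustration only.
ages = [21, 33, 45, 81]
scaled = min_max_scale(ages)
print(scaled)
```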

Now that we’ve created some features, we split our dataset into training and testing before we build a model.

Split data into training and testing

In the model building phase of your ML workflow, you test the efficacy of your model by running batch predictions. You can set aside a testing or holdout dataset for evaluation to see how your model performs by comparing the predictions to the ground truth. Generally, if more of the model’s predictions match the true labels, we can determine the model is performing well.

We use Data Wrangler to split our dataset for testing. We retain 90% of our dataset for training because we have a relatively small dataset. The remaining 10% of our dataset serves as the test dataset. We use this dataset to validate the Autopilot model later in this post.

We split our data by choosing the Split data transform and choosing Randomized split as the method. We designate 0.9 as the split percentage for training and 0.1 for testing.
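
The following sketch shows what a randomized 90/10 split amounts to, using only the Python standard library. It is not how Data Wrangler implements the transform; the function name and the seed are assumptions for illustration, and integer IDs stand in for the 768 records.

```python
import random


def randomized_split(rows, train_fraction=0.9, seed=42):
    """Shuffle a copy of the rows and cut it into train and test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Integer IDs stand in for the 768 records of the dataset.
records = list(range(768))
train, test = randomized_split(records)
print(len(train), len(test))
```

Fixing the seed keeps the holdout set stable between runs, so test metrics remain comparable as you iterate on the flow.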

With the data transformation and feature engineering steps complete, we’re now ready to train a model.

Train and validate the model

We can use the new Data Wrangler integration with Autopilot to directly train a model from the Data Wrangler data flow UI.

Choose the plus sign next to Dataset and choose Train model.

For Amazon S3 location, specify the Amazon S3 location where SageMaker exports your data.

Autopilot uses this location to automatically train a model, saving you time from having to define the output location of the Data Wrangler flow, then having to define the input location of the Autopilot training data. This makes for a more seamless experience.

Choose Export and train to initiate model building with Autopilot.

Autopilot automatically selects the training data input and output locations. You only need to specify the target column and click Create Experiment to train your model.

Test the model on a holdout sample

When Autopilot completes the experiment, we can view the training results and explore the best model.

Choose View model details for your desired model, then choose the Performance tab on the model details page.

The Performance tab displays several model measurement tests, including a confusion matrix, the area under the precision/recall curve (AUCPR), and the area under the receiver operating characteristic curve (ROC). These illustrate the overall validation performance of the model, but they don’t tell us if the model will generalize well. We still need to run evaluations on unseen test data to see how accurately the model predicts if an individual will have diabetes.

To check that the model generalizes well enough, we set aside the test sample for independent evaluation. We can export it from the Data Wrangler flow UI.

Choose the plus sign next to Dataset, choose Export to, and choose Amazon S3.

Specify an Amazon S3 path.

We refer to this path when we run batch inference for validation in the next section.

Create a new SageMaker notebook to perform batch inferencing on the holdout sample and assess the test performance. Refer to the following GitHub repo for a sample notebook to run batch inference for validation.

Analyze validation and test set performance

When the batch transform is complete, we create a confusion matrix to compare the actual and predicted outcomes of the holdout dataset.

We see 23 true positives and 33 true negatives from our results. In our case, true positives refer to the model correctly predicting an individual as having diabetes. In contrast, true negatives refer to the model correctly predicting an individual as not having diabetes.

In our case, precision and recall are important metrics. Precision asks: of all individuals predicted to have diabetes, how many really have diabetes? In contrast, recall asks: of all individuals who indeed have diabetes, how many were predicted to have diabetes? For example, you may want to use a model with high recall because you want to treat as many individuals with diabetes as you can, especially if the first stage of treatment has no effect on individuals without diabetes (these are false positives: those labeled as having it when in fact they do not).
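
Both metrics follow directly from the confusion matrix counts. The sketch below uses hypothetical counts, not the results of this post (only the true positives and true negatives are reported above), to show the arithmetic.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many are real
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of real positives, how many were found
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(round(precision, 3), round(recall, 3))
```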

We also plot the ROC curve and compute the area under it (AUC) to evaluate the results. The higher the AUC, the better the model is at distinguishing between classes, which in our case means distinguishing patients with diabetes from those without.
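
One way to build intuition for AUC is its pairwise interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The following sketch computes it that way; the labels and scores are hypothetical, and the sample notebook may well use a library routine instead.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical true labels and predicted probabilities.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```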

Conclusion

In this post, we demonstrated how to integrate your data processing, feature engineering, and model building using Data Wrangler and Autopilot. We highlighted how you can easily train and tune a model with Autopilot directly from the Data Wrangler user interface. With this integration feature, we can quickly build a model after completing feature engineering, without writing any code. Then we referenced Autopilot’s best model to run batch predictions using the AutoML class with the SageMaker Python SDK.

Low-code and AutoML solutions like Data Wrangler and Autopilot remove the need to have deep coding knowledge to build robust ML models. Get started using Data Wrangler today to experience how easy it is to build ML models using SageMaker Autopilot.

About the Authors

Peter Chung is a Solutions Architect for AWS, and is passionate about helping customers uncover insights from their data. He has been building solutions to help organizations make data-driven decisions in both the public and private sectors. He holds all AWS certifications as well as two GCP certifications. He enjoys coffee, cooking, staying active, and spending time with his family.

Pradeep Reddy is a Senior Product Manager in the SageMaker Low/No Code ML team, which includes SageMaker Autopilot and SageMaker Automatic Model Tuner. Outside of work, Pradeep enjoys reading, running, and geeking out with palm-sized computers like Raspberry Pi and other home automation tech.

Arunprasath Shankar is an Artificial Intelligence and Machine Learning (AI/ML) Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.

Srujan Gopu is a Senior Frontend Engineer in SageMaker Low Code/No Code ML helping customers of Autopilot and Canvas products. When not coding, Srujan enjoys going for a run with his dog Max, listening to audio books and VR game development.

Integrate Amazon Lex and Uneeq’s digital human platform

June 8, 2022

by Barry Conway Amazon AWS

In today’s digital landscape, customers are expecting a high-quality experience that is responsive and delightful. Chatbots and virtual assistants have transformed the customer experience from a point-and-click or a drag-and-drop experience to one that is driven by voice or text. You can create a more engaging experience by further augmenting the interaction with a visual modality.

Uneeq is an AWS Partner that specializes in developing animated visualizations of these voice bots and virtual agents, called digital humans. Uneeq’s digital humans can help provide a next-generation customer experience that is visual, animated, and emotional. Having worked with brands across numerous verticals such as UBS (financial services), Vodafone (telecommunications), and Mentemia (healthcare), Uneeq helps customers enable innovative customer experiences powered by Amazon Lex.

Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex provides natural language understanding (NLU) and automatic speech recognition (ASR), enabling customer experiences that are highly engaging through conversational interactions.

In this post, we guide you through the steps required to configure an Amazon Lex V2 chatbot, connect it to Uneeq’s digital human, and manage a conversation.

Overview of solution

This solution uses the following services:

The following diagram illustrates the architecture of our solution.

The architecture utilizes AWS serverless resources for ease of deployment and to minimize the run costs associated with deploying the solution.

The Uneeq digital human interfaces with a simple REST API, configured with Lambda proxy integration that in turn interacts with a deployed Amazon Lex bot.

After you deploy the bot, you need to configure it with a basic Welcome intent. In the first interaction with Uneeq’s digital human, the Welcome intent determines the initial phrase the Uneeq digital human gives. For example, “Hi, my name is Crissy and I am your digital assistant today. How can I help you?”

You deploy the solution with three high-level steps:

Deploy an Amazon Lex bot.
Deploy the integration, which is a simple API Gateway REST API and Lambda function using AWS Serverless Application Model (AWS SAM).
Create a Uneeq 14-day free trial account and connect Uneeq’s digital human to the Amazon Lex bot.

Prerequisites

To implement this solution, you need the following prerequisites:

An AWS account
The AWS SAM CLI installed
An Amazon Simple Storage Service (Amazon S3) bucket
Access to the following AWS services:
- Amazon API Gateway
- AWS CloudFormation
- AWS Identity and Access Management (IAM)
- Lambda
- Amazon Lex
- AWS SAM
- Amazon S3

These instructions assume a general working knowledge of the listed Amazon services, particularly AWS SAM and AWS CloudFormation.

Deploy an Amazon Lex Bot

For this solution, we use the BookTrip sample bot that is provided in Amazon Lex.

On the Amazon Lex v2 console, choose Bots in the navigation pane.
Choose Create bot.
Select Start with an example.
For Example bot, choose BookTrip.
In the Bot configuration section, enter a bot name and optional description.
Under IAM permissions, select Create a role with basic Amazon Lex permissions.
Because this is a bot for demo purposes, it’s not subject to COPPA, so in the Children’s Online Privacy Protection Act (COPPA) section, select No.
Leave the remainder of the settings as default and choose Next.
Choose your preferred language and voice, which is provided by Amazon Polly.
Choose Done to create your bot.

Edit the BookTrip bot welcome intent

When first initiated, Uneeq’s digital human utters dialog to introduce itself based on a welcome intent defined in the Amazon Lex bot.

To add the welcome intent, browse to the intents for the BookTrip bot just created and create a new intent called Welcome by choosing Add intent.
To configure the welcome intent, in the Closing Response section, enter the initial phrase that you want Uneeq’s digital human to utter. For this post, we use “Hi, my name is Crissy and I am your digital assistant today. How can I help you?”

This is the only configuration required for this intent.

Choose Save intent.
Choose Build to build the bot with the Welcome intent.
Record the bot ID, alias ID, locale ID, and Welcome intent name to use in the next step to deploy the integration.
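
The identifiers recorded in the last step are the same values a client needs to talk to the bot at runtime. As a quick way to verify the bot before deploying the integration, the following Python sketch sends one utterance through the Amazon Lex V2 runtime API. It assumes a boto3 lexv2-runtime client and its recognize_text operation; the function name and session ID are hypothetical, and this is not the Lambda code from the repository used in the next section.

```python
def send_text_to_bot(lex_client, bot_id, bot_alias_id, locale_id, session_id, text):
    """Send one utterance to a Lex V2 bot and return the text of its replies."""
    response = lex_client.recognize_text(
        botId=bot_id,
        botAliasId=bot_alias_id,
        localeId=locale_id,
        sessionId=session_id,  # any caller-chosen ID that groups one conversation
        text=text,
    )
    # The bot may return zero or more messages; keep only those with text content.
    return [m["content"] for m in response.get("messages", []) if "content" in m]


# Example usage (requires boto3 and AWS credentials; IDs are placeholders):
# import boto3
# client = boto3.client("lexv2-runtime")
# print(send_text_to_bot(client, "<bot-id>", "<alias-id>", "en_AU", "test-session", "Book a hotel"))
```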

Deploy the integration using AWS SAM

Browse to the GitHub repo and clone the lexV2 branch. The template.yaml file is the AWS SAM configuration for the application; the swagger.yaml is the OpenAPI configuration for the API.

Deploy this application by following the instructions in the README file.
Make sure your AWS Command Line Interface (AWS CLI) configuration can access an AWS account.
Browse to the root of the cloned repository and install the required dependencies by running the following command:
```
cd function && npm install && cd ..
```
Prior to running the deploy command, upload the swagger.yaml file to an S3 bucket.

Deploy the serverless application by running the following command from the root of the repository, and assign values to the listed parameters:

1. pLexBotID
2. pLexBotAliasID
3. pWelcomeIntentName
4. pLocaleID
5. pS3BucketName

```
sam deploy --template-file template.yml --s3-bucket %S3BUCKETNAME% --stack-name %STACKNAME% --parameter-overrides pLexBotID=%LexV2BotID% pLexBotAliasID=%AliasID% pWelcomeIntentName=Welcome pLocaleID=en_AU pS3BucketName=%S3BucketName% --capabilities CAPABILITY_NAMED_IAM
```

Confirm the deployment has been successful by reviewing the output of the AWS SAM deployment.
Take note of the API endpoint URL; you use this for configuring Uneeq’s digital human.

Create a Uneeq trial account and configure Uneeq’s digital human

Let’s start by creating a 14-day free trial account on the Uneeq website.

On the Uneeq website, choose Free Trial.
Enter the required details and verify your email address via a unique code that is sent to the provided email address.
Choose a Uneeq digital human from the three provided to you as part of the free trial.

Uneeq has multiple personas available, but some require a paid subscription.

Choose a background for Uneeq’s digital human.
Enter a name for Uneeq’s digital human.
Choose your preferred language and voice for Uneeq’s digital human.

You can choose Test Voice to hear an example of the voice.

After you create Uneeq’s digital human, browse to the Uneeq dashboard and choose Personas.
Choose the edit icon for Uneeq’s digital human you just created.
In the Conversation settings section, choose Bring Your Own Conversation Platform.
For API URL, enter the URL of our deployed API.
Return to the Personas page and choose Try to start Uneeq’s digital human.

Uneeq’s digital human begins the interaction by uttering the dialog configured in your welcome intent.

For a demonstration of Uneeq’s digital human and Amazon Lex integration, watch Integrating Digital Humans with AWS Lambda – Devs in the Shed Episode 16.

Conclusion

In this post, I implemented a solution that integrates Amazon Lex with Uneeq’s digital human, adding a visual modality to the user experience. You can use this solution for multiple use cases by simply pointing it at a different Amazon Lex bot.

It’s easy to get started. Sign up for a free trial account with Uneeq’s digital human, and clone the GitHub repo to begin enhancing your customers’ interactions with your business. For more information about Amazon Lex, see Getting started with Amazon Lex and the V2 Developer Guide.

About the Author

Barry Conway is an Enterprise Solutions Architect with years of experience in the technology industry bridging the gap between business and technology. Barry has helped banking, manufacturing, logistics, and retail organizations realize their business goals.

Easily create and store features in Amazon SageMaker without code

June 8, 2022

by Peter Chung Amazon AWS

Data scientists and machine learning (ML) engineers often prepare their data before building ML models. Data preparation typically includes data preprocessing and feature engineering. You preprocess data by transforming data into the right shape and quality for training, and you engineer features by selecting, transforming, and creating variables when building a predictive model.

Amazon SageMaker helps you perform these tasks by simplifying feature preparation with Amazon SageMaker Data Wrangler and storage and feature serving with Amazon SageMaker Feature Store. You can prepare your data and engineer features using over 300 built-in transformations with Data Wrangler. Then you can persist those features to a purpose-built feature store for ML with Feature Store. These services help you build automatic and repeatable processes to streamline your data preparation tasks, all without writing code.

We’re excited to announce a new capability that seamlessly integrates Data Wrangler with Feature Store. You can now easily create features with Data Wrangler and store those features in Feature Store with just a few clicks in Amazon SageMaker Studio.

In this post, we demonstrate creating features with Data Wrangler and persisting them in Feature Store using the hotel booking demand dataset. We focus on the data preparation and feature engineering tasks to show how easily you can create and store features in SageMaker without code using Data Wrangler. After the features are stored, they can be used for training and inference by multiple models and teams.

Solution overview

To demonstrate feature engineering and feature storage, we use a hotel booking demand dataset. You can download the dataset and view the full description of each variable. The dataset contains information such as when a hotel booking was made, the booking location, the length of stay, the number of parking spaces, and other features.

Our goal is to engineer features to predict if a user will cancel a booking.

We host the dataset in an Amazon Simple Storage Service (Amazon S3) bucket. We also open a Studio domain to utilize the native Data Wrangler and Feature Store capabilities. We import the dataset into a Data Wrangler flow and define the data transformation steps we want to apply using the Data Wrangler user interface (UI). We then have SageMaker run our feature engineering steps and store the features in Feature Store.

The following diagram illustrates the solution workflow.

To demonstrate Data Wrangler’s feature engineering steps, we assume we’ve already conducted exploratory data analysis (EDA). EDA helps you understand your data by identifying patterns in it. For example, we might find that customers who book resort hotels tend to stay longer than those who book city hotels, or that customers who stay over the weekend purchase more meals. Because these patterns aren’t evident with data in tables, data scientists use visualization tools to help identify patterns. EDA is often a necessary step to determine which features to create, delete, and transform.

If you already have features ready to export to Feature Store, you can navigate to the Save features to Feature Store section to learn how you can easily save your prepared features to Feature Store.

Prerequisites

If you want to follow along with this post, you should have the following prerequisites:

An AWS account
A Studio domain with the AmazonSageMakerFeatureStoreAccess managed policy attached to the AWS Identity and Access Management (IAM) execution role
An S3 bucket
The dataset uploaded to the S3 bucket

Create features with Data Wrangler

To create features with Data Wrangler, complete the following steps:

Enter your Studio domain.
Choose Data Wrangler as your resource to view.
Choose New flow.
Choose Import and import your data.

You can see a preview of the data in the Data Wrangler UI when selecting your dataset. You can also choose a sampling method. Because our dataset is relatively small, we choose not to sample our data. The flow editor now shows two steps in the UI, representing the step you took to import the data and a data validation step Data Wrangler automatically completes for you.

Choose the plus sign next to Data types and choose Add transform.

Assuming we’ve spent time in EDA, we can remove redundant columns that contribute to target leakage. Target leakage occurs when some data in a training dataset is strongly correlated with the target label, but isn’t available in real-world data. After we conduct a target leakage analysis, we determine we should drop redundant columns. Data Wrangler helped identify 10 columns to drop.

Add a step and choose the Drop column transform step.

Additionally, we determine we can remove columns like agent and adults after a multicollinearity analysis. Multicollinearity is the presence of high correlations between two or more independent variables. We usually want to avoid variables that are highly correlated with each other because they can lead to misleading and inaccurate models.
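
A simple first check for pairwise correlation between two numeric columns is the Pearson coefficient. The following is a minimal sketch with hypothetical column values; it is not the multicollinearity analysis that Data Wrangler runs, which offers more than a single pairwise measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(vx * vy)


# Hypothetical columns: the second is an exact linear function of the first.
col_a = [1, 2, 3, 4]
col_b = [2, 4, 6, 8]
r = pearson(col_a, col_b)
print(round(r, 3))
```

A coefficient close to 1 or -1 suggests that one of the two columns adds little independent information and is a candidate to drop.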

We also want to drop duplicate rows. In our case, nearly 28% of all rows in our dataset are duplicates. Because duplicates may have undesirable effects on our model, we use a transform to remove them.

Add a new transform and choose Manage rows from the list of available transforms.
Choose Drop duplicates on the Transform drop-down menu.
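
To make the effect of the Drop duplicates transform concrete, the following is a minimal sketch in plain Python. It is not what Data Wrangler runs internally; the drop_duplicates helper and the three sample rows are hypothetical and only illustrate keeping the first occurrence of each identical row.

```python
def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        # A row is a duplicate when every column value matches an earlier row.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample rows; the column names are borrowed from the hotel dataset.
bookings = [
    {"hotel": "City Hotel", "lead_time": 12, "country": "PRT"},
    {"hotel": "City Hotel", "lead_time": 12, "country": "PRT"},
    {"hotel": "Resort Hotel", "lead_time": 40, "country": "GBR"},
]

deduplicated = drop_duplicates(bookings)
duplicate_share = 1 - len(deduplicated) / len(bookings)
print(f"Removed {duplicate_share:.0%} of rows as duplicates")
```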

Next, we want to handle missing values. We find that many hotel guests didn’t travel with children, and have a blank value for the children column. We can replace this blank value with 0.

Choose Handle missing as the transform step and Fill missing as the transform type.
Add a transform to fill blank values with the 0 value by choosing children as the input column.

From our EDA, we see that there are many missing values for the country column. However, the data reveals most of the hotel guests are from Europe. We determine that missing country column values can be replaced with the most commonly occurring country—Portugal (PRT).

Choose the Handle missing transform step and choose Fill missing as the transform type.
Choose country as the input column, and enter PRT as the Fill value.
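
The two Fill missing steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Data Wrangler's implementation: the fill_missing helper is hypothetical, and it treats both None and an empty string as a blank value.

```python
def fill_missing(rows, column, fill_value):
    """Return copies of rows where blank values in `column` are replaced."""
    filled = []
    for row in rows:
        new_row = dict(row)
        # Assumption: a blank is represented as None or as an empty string.
        if new_row.get(column) in (None, ""):
            new_row[column] = fill_value
        filled.append(new_row)
    return filled


# Hypothetical sample rows with blanks in the children and country columns.
bookings = [
    {"children": None, "country": "GBR"},
    {"children": 2, "country": ""},
    {"children": 0, "country": None},
]

bookings = fill_missing(bookings, "children", 0)
bookings = fill_missing(bookings, "country", "PRT")
print(bookings)
```

Note that a legitimate value of 0 in the children column is left untouched, because only None and empty strings are treated as missing.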

ML algorithms like linear regression, logistic regression, neural networks, and others that use gradient descent as an optimization technique require data to be scaled. Normalization (also known as min-max scaling) is a scaling technique that transforms values to be in the range of 0–1. Standardization is another scaling technique, in which values are centered on the mean and expressed in units of standard deviation. In our case, we normalize the numeric feature columns to the range [0, 1].

Choose the Process numeric transform step and Scale values as the transform type.
Choose Min-max scaler as the scaler and lead_time, booking_changes, adr, and others as the input columns.
Leave 0 as Min and 1 as Max default values.
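
As a rough illustration of the two scaling techniques described above, the following sketch applies min-max scaling and standardization to a small list of numbers. The helper names and the sample lead_time values are hypothetical, and the standardization uses the population standard deviation as an assumption.

```python
from statistics import mean, pstdev


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no range information; map it to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def standardize(values):
    """Center values on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical lead_time values, in days.
lead_time = [0, 10, 20, 40]

scaled = min_max_scale(lead_time)
standardized = standardize(lead_time)
print(scaled)
print(standardized)
```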

We also want to handle categorical data by representing it as numeric values. For example, if your categories are Dog and Cat, you may encode this information into two vectors, [1,0] to represent Dog, and [0,1] to represent Cat. For our dataset, we use one-hot encoding, which represents each category within a column as a binary vector with a single 1 in the position assigned to that category.

Choose the One-hot encode transform type from the Encode categorical transform.
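
The Dog and Cat example can be reproduced with a few lines of plain Python. This is only a sketch of the general technique; the one_hot_encode helper is hypothetical, and it assigns vector positions in order of first appearance, which is an arbitrary choice.

```python
def one_hot_encode(values):
    """Return the category order and one binary vector per input value."""
    # Assumption: categories are ordered by first appearance in the data.
    categories = list(dict.fromkeys(values))
    position = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        vector = [0] * len(categories)
        vector[position[value]] = 1
        encoded.append(vector)
    return categories, encoded


categories, encoded = one_hot_encode(["Dog", "Cat", "Dog"])
print(categories)
print(encoded)
```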

ML models are sensitive to the distribution and range of your feature values. Outliers can negatively impact model accuracy and lead to longer training times. For our dataset, we apply the standard deviation numeric outliers transform with a set of configuration values as shown in the following screenshot. We apply this transform on the numeric columns.

Choose the Standard Deviation Numeric Outliers transform type from the Handle outliers transform.
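
The idea behind a standard deviation outlier transform can be sketched as follows. The configuration values from the screenshot are not reproduced here: the threshold of 3 standard deviations, the choice to clip rather than remove values, and the sample adr values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def clip_outliers(values, n_std=3.0):
    """Clip values that lie more than n_std standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    lower, upper = mu - n_std * sigma, mu + n_std * sigma
    return [min(max(v, lower), upper) for v in values]


# Hypothetical average daily rate (adr) values with one extreme entry.
adr = [100.0] * 20 + [5000.0]

clipped = clip_outliers(adr)
print(max(adr), "->", round(max(clipped), 1))
```

Clipping keeps the row in the dataset while limiting its influence; removing the row instead is the other common option.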

Lastly, we want to balance the target variable for class imbalance. In Data Wrangler, we can handle class imbalance using three different techniques:

Random undersample
Random oversample
SMOTE

In the Data Wrangler transform pane, choose Balance data as the group and choose Random oversample for the Transform field.

The ratio of positive to negative cases is around 0.38 before balancing.

After oversampling and balancing the dataset, the ratio equates to 1.
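
Random oversampling can be sketched in plain Python as shown below. This is not Data Wrangler's implementation; the random_oversample helper is hypothetical, and the sample data is constructed so that the ratio of positive to negative cases starts at 0.38 and ends at 1.

```python
import random


def random_oversample(rows, label_key, seed=0):
    """Duplicate randomly chosen minority rows until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to make up the difference to the largest class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def positive_ratio(rows, label_key):
    positives = sum(1 for row in rows if row[label_key] == 1)
    return positives / (len(rows) - positives)


# Hypothetical data: 38 positive and 100 negative cases.
bookings = [{"is_canceled": 1}] * 38 + [{"is_canceled": 0}] * 100

ratio_before = positive_ratio(bookings, "is_canceled")
balanced = random_oversample(bookings, "is_canceled")
ratio_after = positive_ratio(balanced, "is_canceled")
print(ratio_before, ratio_after)
```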

Now that we’ve completed our feature engineering tasks, we’re ready to export our features to Feature Store with one click.

Save features to Feature Store

You can easily export your generated features to SageMaker Feature Store by selecting it as the destination.

You can save the features into an existing feature group or create a new one. For this post, we create a new feature group. Studio directs you to a new tab where you can create a new feature group.

Choose the plus sign, choose Export to, and choose SageMaker Feature Store.

Choose Create Feature Group.

Optionally, select Create “EventTime” column.
Choose Next.

Copy the JSON schema, then choose Create.

Provide a feature group name and an optional description for your feature group.
Select a feature group storage configuration that is either online or offline, or both.

Online stores serve features with low millisecond latency for real-time inference, whereas offline stores are ideal for retrieving your features for training models or for batch scoring. Additionally, you can run queries on your offline feature stores by registering your features in an AWS Glue Data Catalog. For more information, see Query Feature Store with Athena and AWS Glue.

Choose Continue.

Next, you specify the feature definitions. You specify the data type (string, integral, fractional) for each feature definition.

Enter the JSON schema from the previous step to define your feature definitions.
Choose Continue.
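
To illustrate how the three feature types relate to column values, the following sketch infers a type for each column of a sample record. The infer_feature_type helper and the sample record are hypothetical, and the output layout (a list of FeatureName and FeatureType pairs) is an assumption that may differ from the JSON schema that Data Wrangler generates for you.

```python
def infer_feature_type(value):
    """Map a Python value to one of the three feature types named above."""
    if isinstance(value, int):
        return "Integral"  # note: Python booleans also fall into this branch
    if isinstance(value, float):
        return "Fractional"
    return "String"


def build_feature_definitions(record):
    """Build one definition per column of a sample record."""
    return [
        {"FeatureName": name, "FeatureType": infer_feature_type(value)}
        for name, value in record.items()
    ]


# Hypothetical sample record; EventTime is given as Unix seconds here.
sample_record = {
    "distribution-channel": "TA/TO",
    "lead_time": 0.12,
    "children": 0,
    "EventTime": 1654560000.0,
}

feature_definitions = build_feature_definitions(sample_record)
for definition in feature_definitions:
    print(definition)
```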

Next, you specify a record identifier name and a timestamp to uniquely identify a record within a feature group.

The record identifier name must refer to one of the names of a feature defined in the feature group’s feature definition. In our case, we use the existing identifier, distribution-channel, which was in our source dataset, and EventTime.

Choose Continue.

Lastly, apply any relevant tags and review your feature group details.
Choose Create feature group to finalize the process.

After we create our feature group, we can return to the Data Wrangler flow UI.
Choose the plus sign, choose Add destination, and choose SageMaker Feature Store.

We choose the desired destination feature group to ensure that the features we’re storing match the feature group schema.

If the newly created feature group doesn’t show up in the UI, refresh the list to reload the groups.

Choose the message under the Validation column to have Data Wrangler validate the schema of the dataset against the schema of the feature group.

If you missed specifying the event time column, Data Wrangler will notify you of an error and request that you add one to your dataset.

Once validated, Data Wrangler informs you that the data frame matches the feature group schema.

If you enabled both the online and offline stores for the feature group, you can optionally select Write to offline store only to ingest data into the offline store alone.

This is helpful for historical data backfilling scenarios.

Choose Add to add another step to our Data Wrangler flow.
With all our steps defined, choose Create job to run our ML workflow from feature engineering to ingesting features into our feature group.

Give the job a name, then provide the job specifications like the type and number of instances.
Choose Run.

Congratulations! You’ve successfully engineered features using Data Wrangler and stored them in a persistent feature store without writing any code. You can easily explore features, see details of your feature group, and update the feature group schema when necessary.

Conclusion

In this post, we created features with Data Wrangler, and easily stored those features in Feature Store. We showed an example workflow for feature engineering in the Data Wrangler UI. Then we saved those features into Feature Store directly from Data Wrangler by creating a new feature group. Finally, we ran a processing job to ingest those features into Feature Store. These services helped us build automatic and repeatable processes to streamline our data preparation tasks, all without writing code.

With this new integration, you can accelerate your ML tasks with a more streamlined experience between feature engineering and feature ingestion. For more information, refer to Get Started with Data Wrangler and Get started with Amazon SageMaker Feature Store.

About the Authors

Patrick Lin is a Software Development Engineer with Amazon SageMaker Data Wrangler. He is committed to making Amazon SageMaker Data Wrangler the number one data preparation tool for productionized ML workflows. Outside of work, you can find him reading, listening to music, having conversations with friends, and serving at his church.

Ziyao Huang is a Software Development Engineer with Amazon SageMaker Data Wrangler. He is passionate about building great products that make ML easy for customers. Outside of work, Ziyao likes to read and hang out with his friends.

Simplifying BERT-based models to increase efficiency, capacity

June 8, 2022

by Amazon AWS

New method would enable BERT-based natural-language-processing models to handle longer text strings, run in resource-constrained settings, or sometimes both.

Create train, test, and validation splits on your data for machine learning with Amazon SageMaker Data Wrangler

June 7, 2022

by Gopi Mudiyala Amazon AWS

In this post, we talk about how to split a machine learning (ML) dataset into train, test, and validation datasets with Amazon SageMaker Data Wrangler so you can easily split your datasets with minimal to no code.

Data used for ML is typically split into the following datasets:

Training – Used to train an algorithm or ML model. The model iteratively uses the data and learns to provide the desired result.
Validation – Introduces new data to the trained model. You can use a validation set to periodically measure model performance as training is happening, and also tune any hyperparameters of the model. However, validation datasets are optional.
Test – Used on the final trained model to assess its performance on unseen data. This helps determine how well the model generalizes.

Data Wrangler is a capability of Amazon SageMaker that helps data scientists and data engineers quickly and easily prepare data for ML applications using a visual interface. It contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code.

Today, we’re excited to announce a new data transformation to split datasets for ML use cases within Data Wrangler. This transformation splits your dataset into training, test, and optionally validation datasets without having to write any code.

Overview of the split data transformation

The split data transformation includes four commonly used techniques to split the data for training the model, validating the model, and testing the model:

Random split – Splits data randomly into train, test, and, optionally, validation datasets using the percentage specified for each dataset. It ensures that the distribution of the data is similar in all datasets. Choose this option when you don’t need to preserve the order of your input data. For example, consider a movie dataset where the dataset is sorted by genre and you’re predicting the genre of the movie. A random split on this dataset ensures that the distribution of the data includes all genres in all three datasets.
Ordered split – Splits data in order, using the percentage specified for each dataset. An ordered split ensures that the data in each split is non-overlapping while preserving the order of the data. When training, we want to avoid past or future information leaking across datasets. The ordered split option prevents data leakage. For example, consider a scenario where you have customer engagement data for the first few months and you want to use this historical data to predict customer engagement in the next month. You can perform this split by providing an optional input column (numeric column). This operation uses the values of a numeric column to ensure that the data in each split doesn’t overlap while preserving the order. This helps avoid data leakage across splits. If no input column is provided, the order of the rows is used, so the data in each split still comes before the data in the next split. This is useful where the rows of the dataset are already ordered (for example, by date) and the model may need to be fit to earlier data and tested on later data.
Stratified split – Splits the dataset so that each split is similar with respect to a column specifying different categories for your data, for example, size or country. This split ensures that the train, test, and validation datasets have the same proportions for each category as the input dataset. This is useful with classification problems where we’re trying to ensure that the train and test sets have approximately the same percentage of samples of each target class. Choose this option if you have imbalanced data across different categories and you need to have it balanced across split datasets.
Split by key – Takes one or more columns as input (the key) and ensures that no combination of values across the input columns occurs in more than one of the splits (split by key). This is useful to avoid data leakage for unordered data. Choose this option if your data for key columns needs to be in the same split. For example, consider customer transactions split by customer ID; the split ensures that customer IDs don’t overlap across split datasets.

Solution overview

For this post, we demonstrate how to split data into train, test, and validation datasets using the four new split options in Data Wrangler. We use a hotel booking dataset available publicly on Kaggle, which has the year, month, and date that bookings were made, along with reservation statuses, cancellations, repeat customers, and other features.

Prerequisites

Before getting started, upload the dataset to an Amazon Simple Storage Service (S3) bucket, then import it into Data Wrangler. For instructions, refer to Import data from Amazon S3.

Random split

After we import the data into Data Wrangler, we start the transformation. We first demonstrate a random split.

On the Data Wrangler console, choose the plus sign and choose Add transform.
To add the split data transformation, choose Add step.

You’re redirected to the page where all transformations are displayed.
Scroll down the list and choose Split data.

The split data transformation has a drop-down menu that lists the available transformations to split your data, which include random, ordered, stratified, and split by key. By default, Randomized split is displayed.
Choose the default value Randomized split.
In the Splits section, enter the name Train with a 0.8 split percentage, and Test with a 0.2 percentage.
Choose the plus sign to add an additional split.
Add the Validation split with 0.2, and adjust Train to 0.7 and Test to 0.1.
The split percentages can be any values you want, provided all three splits sum to 1 (100%). We can also specify optional fields like Error threshold and Random seed. We can achieve an exact split by setting the error threshold to 0. A smaller error threshold can lead to more processing time for splitting the data. This allows you to control the trade-off between time and accuracy on the operation. The Random seed option is for reproducibility. If not specified, Data Wrangler uses a default random seed value. We leave it blank for the purpose of this post.
To preview your data split, choose Preview.

The preview page displays the data split. You can choose Train, Test, or Validation on the drop-down menu to review the details of each split.
When you’re satisfied with your data split, choose Add to add the transformation to your Data Wrangler flow.

To analyze the train dataset, choose Add analysis.

You can perform a similar analysis on the validation and test datasets.
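
For readers who want to see the logic of a random split outside the UI, here is a minimal sketch in plain Python. It is not Data Wrangler's algorithm and has no equivalent of the Error threshold option; the random_split helper is hypothetical and uses the same 0.7, 0.1, and 0.2 percentages as the steps above.

```python
import random


def random_split(rows, fractions, seed=None):
    """Shuffle rows and cut them into named splits sized by the given fractions."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    names = list(fractions)
    splits, start, cumulative = {}, 0, 0.0
    for i, name in enumerate(names):
        cumulative += fractions[name]
        # The last split takes whatever remains so that no row is lost to rounding.
        end = len(shuffled) if i == len(names) - 1 else round(cumulative * len(shuffled))
        splits[name] = shuffled[start:end]
        start = end
    return splits


rows = list(range(100))  # stand-in for 100 dataset rows
splits = random_split(rows, {"Train": 0.7, "Test": 0.1, "Validation": 0.2}, seed=42)
print({name: len(part) for name, part in splits.items()})
```

Passing a fixed seed plays the same role as the Random seed field: the same input and seed produce the same split.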

Ordered split

We now use the hotel bookings dataset to demonstrate an ordered split transformation. The hotel dataset contains rows ordered by date.

Repeat the steps to add a split, and choose Ordered split on the drop-down menu.
Specify your three splits and desired percentages.
Preview your data and choose Add to add the transformation to the Data Wrangler flow.
Use the Add analysis option to verify the splits.
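
An ordered split can be sketched as a sort followed by slicing, as in the following plain-Python illustration. The ordered_split helper, the week column, and the 0.6, 0.2, and 0.2 percentages are hypothetical; the optional order_column argument mirrors the optional numeric input column described earlier.

```python
def ordered_split(rows, fractions, order_column=None):
    """Cut rows into consecutive splits, optionally ordering by a numeric column first."""
    ordered = list(rows)
    if order_column is not None:
        ordered.sort(key=lambda row: row[order_column])
    names = list(fractions)
    splits, start, cumulative = {}, 0, 0.0
    for i, name in enumerate(names):
        cumulative += fractions[name]
        end = len(ordered) if i == len(names) - 1 else round(cumulative * len(ordered))
        splits[name] = ordered[start:end]
        start = end
    return splits


# Hypothetical rows that arrive out of order.
rows = [{"week": week} for week in [5, 1, 9, 3, 7, 2, 10, 4, 8, 6]]
splits = ordered_split(rows, {"Train": 0.6, "Test": 0.2, "Validation": 0.2}, "week")
print({name: [row["week"] for row in part] for name, part in splits.items()})
```

Every row in an earlier split has an order value no greater than any row in a later split, so the model is never trained on data that comes after the data it is evaluated on.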

Stratified split

In the hotel booking dataset, we have an is_canceled column, which indicates whether the booking was canceled. We want to use this column to split the data. A stratified split ensures that the train, test, and validation datasets have the same percentage of samples for each value of is_canceled.

Repeat the steps to add a transformation, and choose Stratified split.
Specify your three splits and desired percentages.
For Input column, choose is_canceled.
Preview your data and choose Add to add the transformation to the Data Wrangler flow.
Use the Add analysis option to verify the splits.
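
The stratified split can be illustrated by splitting each category separately and then combining the pieces. The following plain-Python sketch is an assumption about the general technique, not Data Wrangler's implementation; the stratified_split helper and the synthetic bookings (25% canceled) are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, column, fractions, seed=0):
    """Split each category of `column` separately so proportions are preserved."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    names = list(fractions)
    splits = {name: [] for name in names}
    for group in groups.values():
        rng.shuffle(group)
        start, cumulative = 0, 0.0
        for i, name in enumerate(names):
            cumulative += fractions[name]
            end = len(group) if i == len(names) - 1 else round(cumulative * len(group))
            splits[name].extend(group[start:end])
            start = end
    return splits


def cancel_rate(rows):
    return sum(row["is_canceled"] for row in rows) / len(rows)


# Hypothetical data: every fourth booking is canceled.
bookings = [{"booking_id": i, "is_canceled": int(i % 4 == 0)} for i in range(200)]
splits = stratified_split(
    bookings, "is_canceled", {"Train": 0.7, "Test": 0.1, "Validation": 0.2}
)
for name, part in splits.items():
    print(name, len(part), cancel_rate(part))
```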

Split by key

The split by key transformation splits the data by the key or multiple keys we specify. This split is useful to avoid having the same data in the split datasets created during transformation and to avoid data leakage.

Repeat the steps to add a transformation, and choose Split by key.
Specify your three splits and desired percentages.
For Key column, we can specify the columns to form the key. For this post, choose the following columns:
1. is_canceled
2. arrival_date_year
3. arrival_date_month
4. arrival_date_week_number
5. reservation_status
Preview your data and choose Add to add the transformation to the Data Wrangler flow.
Use the Add analysis option to verify the splits.
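
One common way to implement a split by key is to hash the key and assign each row to a split based on the hash, so that rows sharing a key always land together. The sketch below takes that approach as an assumption; it is not necessarily how Data Wrangler does it, and the split_by_key helper and the customer_id data are hypothetical. With this approach the split sizes only approximate the requested percentages.

```python
import hashlib


def split_by_key(rows, key_columns, fractions):
    """Assign rows to splits by hashing the key, so equal keys share a split."""
    names = list(fractions)
    splits = {name: [] for name in names}
    for row in rows:
        key = repr(tuple(row[column] for column in key_columns)).encode("utf-8")
        # Map the key to a stable position in [0, 1).
        position = int(hashlib.sha256(key).hexdigest(), 16) / 16**64
        chosen, cumulative = names[-1], 0.0
        for name in names:
            cumulative += fractions[name]
            if position < cumulative:
                chosen = name
                break
        splits[chosen].append(row)
    return splits


# Hypothetical transactions: 50 customers with 4 transactions each.
transactions = [
    {"customer_id": customer, "amount": 10 * n}
    for customer in range(50)
    for n in range(4)
]
splits = split_by_key(
    transactions, ["customer_id"], {"Train": 0.7, "Test": 0.1, "Validation": 0.2}
)
print({name: len(part) for name, part in splits.items()})
```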

Considerations

The node labeled Data types cannot be deleted. Deleting a split node deletes all of its datasets, along with all downstream datasets and nodes.

Conclusion

In this post, we demonstrated how to split an input dataset into train, test, and validation datasets with Data Wrangler using the split techniques random, ordered, stratified, and split by key.

To learn more about using data flows with Data Wrangler, refer to Create and Use a Data Wrangler Flow. To get started with Data Wrangler, see Prepare ML Data with Amazon SageMaker Data Wrangler.

About the Authors

Gopi Mudiyala is a Senior Technical Account Manager at AWS. He helps customers in the Financial Services Industry with their operations in AWS. As a machine learning specialist, Gopi works to help customers succeed in their ML journey.

Xiyi Li is a Front End Engineer at Amazon SageMaker Data Wrangler. She helps support Amazon SageMaker Data Wrangler and is passionate about building products that provide a great user experience. Outside of work, she enjoys hiking and listening to classical music.

Vishaal Kapoor is a Senior Applied Scientist with AWS AI. He is passionate about helping customers understand their data in Data Wrangler. In his spare time, he mountain bikes, snowboards, and spends time with his family.

How InfoJobs (Adevinta) improves NLP model prediction performance with AWS Inferentia and Amazon SageMaker

June 7, 2022

by Juan Francisco Fernandez Amazon AWS

This is a guest post co-written by Juan Francisco Fernandez, ML Engineer in Adevinta Spain, and AWS AI/ML Specialist Solutions Architects Antonio Rodriguez and João Moura.

InfoJobs, a subsidiary company of the Adevinta group, provides the perfect match between candidates looking for their next job position and employers looking for the best hire for the openings they need to fill. For this goal, we use natural language processing (NLP) models such as BERT through PyTorch to automatically extract relevant information from users’ CVs at the moment they upload these to our portal.

Performing inference with NLP models can take several seconds when hosted on typical CPU-based instances given the complexity and variety of the fields. This affects the user experience in the job listing web portal. Alternatively, hosting these models on GPU-based instances can prove costly, which makes the solution not feasible for our business. For this solution, we were looking for a way to optimize the latency of predictions, while keeping the costs at a minimum.

To solve this challenge, we initially considered some possible solutions along two axes:

Vertical scaling by using bigger general-purpose instances as well as GPU-powered instances.
Optimizing our models using openly available techniques such as quantization or open tools such as ONNX.

Neither option, whether individually or combined, was able to provide the needed performance at an affordable cost. After benchmarking our full range of options with the help of AWS AI/ML Specialists, we found that compiling our PyTorch models with AWS Neuron and using AWS Inferentia to host them on Amazon SageMaker endpoints offered a reduction of up to 92% in prediction latency, at 75% lower cost when compared to our best initial alternatives. It was, in other words, like having the best of GPU power at CPU cost.

Amazon Comprehend is a plug-and-play managed NLP service that uses machine learning to automatically uncover valuable insights and connections in text. However, in this particular case we wanted to use fine-tuned models for the task.

In this post, we share a summary of the benchmarks performed and an example of how to use AWS Inferentia with SageMaker to compile and host NLP models. We also describe how InfoJobs is using this solution to optimize the inference performance of NLP models, extracting key information from users’ CVs in a cost-efficient way.

Overview of solution

First, we had to evaluate the different options available on AWS to find the best balance between performance and cost to host our NLP models. The following diagram summarizes the most common alternatives for real-time inference, most of which were explored during our collaboration with AWS.

Hosting options benchmark on SageMaker

We started our tests with a publicly available pre-trained model from the Hugging Face model hub, bert-base-multilingual-uncased. This is the same base model used by InfoJobs’s CV key-value extraction model. For this purpose, we deployed this model to a SageMaker endpoint using different combinations of instance types: CPU-based, GPU-based, or AWS Inferentia-based. We also explored optimization with Amazon SageMaker Neo and compilation with AWS Neuron where appropriate.

In this scenario, deploying our model to a SageMaker endpoint with an AWS Inferentia instance yielded 96% faster inference times compared to CPU instances and 44% faster inference times compared to GPU instances in the same range of cost and specs. This allows us to respond to 15 times more inferences than using CPU instances, or 4 times more inferences than using GPU instances at the same cost.

Based on the encouraging first results, our next step was to validate our tests on the actual model used by InfoJobs. This is a more complex model that requires PyTorch quantization for performance improvement, so we expected worse results compared to the previous standard case with bert-base-multilingual-uncased. The results of our tests for this model are summarized in the following table (based on public pricing in Region us-east-1 as of February 20, 2022).

Category	Mode	Instance type example	p50 Inference latency (ms)	TPS	Cost per hour (USD)	Inferences per hour	Cost per million inferences (USD)
CPU	Normal	m5.xlarge	1400	2	0.23	5606	41.03
CPU	Optimized	m5.xlarge	1105	2	0.23	7105	32.37
GPU	Normal	g4dn.xlarge	800	18	0.736	64800	11.36
GPU	Optimized	g4dn.xlarge	700	21	0.736	75600	9.74
AWS Inferentia	Compiled	inf1.xlarge	57	33	0.297	120000	2.48

The following graph shows real-time inference response times for the InfoJobs model (lower is better). In this case, inference latency is 75-92% lower than with either the CPU or GPU options.

This also means a cost that is 4 to 13 times lower for running inferences than with either the CPU or GPU options, as shown in the following graph of cost per million inferences.
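
The cost per million inferences column in the preceding table can be reproduced from the hourly cost and the inferences per hour. The following sketch shows the arithmetic using the figures from the table; the variable and function names are our own.

```python
# (label, cost per hour in USD, inferences per hour), copied from the table above.
benchmarks = [
    ("CPU normal (m5.xlarge)", 0.23, 5606),
    ("CPU optimized (m5.xlarge)", 0.23, 7105),
    ("GPU normal (g4dn.xlarge)", 0.736, 64800),
    ("GPU optimized (g4dn.xlarge)", 0.736, 75600),
    ("AWS Inferentia compiled (inf1.xlarge)", 0.297, 120000),
]


def cost_per_million(cost_per_hour, inferences_per_hour):
    """Cost in USD of serving one million inferences at the given hourly rate."""
    return cost_per_hour / inferences_per_hour * 1_000_000


costs = {
    label: cost_per_million(cost, inferences)
    for label, cost, inferences in benchmarks
}
for label, cost in costs.items():
    print(f"{label}: {cost:.2f} USD per million inferences")
```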

We must highlight that no further optimizations were made to the inference code during these non-extensive tests. However, the performance and cost benefits we saw from using AWS Inferentia exceeded our initial expectations, and enabled us to proceed to production. In the future, we will continue to optimize with other features of Neuron, such as NeuronCore Pipeline or the PyTorch-specific DataParallel API. We encourage you to explore and compare the results for your specific use case and model.

Compiling for AWS Inferentia with SageMaker Neo

You don’t need to use the Neuron SDK directly to compile your model and be able to host it on AWS Inferentia instances.

SageMaker Neo automatically optimizes machine learning (ML) models for inference on cloud instances and edge devices to run faster with no loss in accuracy. In particular, Neo is capable of compiling a wide variety of transformer-based models, making use of the Neuron SDK in the background. This allows you to get the benefit of AWS Inferentia by using APIs that are integrated with the familiar SageMaker SDK, with no required context switch.

In this section, we go through an example in which we show you how to compile a BERT model with Neo for AWS Inferentia. We then deploy that model to a SageMaker endpoint. You can find a sample notebook describing the whole process in detail on GitHub.

First, we need to create a sample input to trace our model with PyTorch and create a tar.gz file, with the model being its only content. This is a required step to have Neo compile our model artifact (for more information, see Prepare Model for Compilation). For demonstration purposes, the model is initialized as a mock model for sequence classification that hasn’t been fine-tuned on the task at all. In reality, you would replace the model identifier with your selected model from the Hugging Face model hub or a locally saved model artifact. See the following code:

import transformers
import torch
import tarfile

# Load the tokenizer and a sequence classification model that hasn't been fine-tuned.
tokenizer = transformers.AutoTokenizer.from_pretrained("distilbert-base-multilingual-uncased")
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-multilingual-uncased", return_dict=False
)

seq_0 = "This is just sample text for model tracing, the length of the sequence does not matter because we will pad to the max length that Bert accepts."
seq_1 = seq_0
max_length = 512

tokenized_sequence_pair = tokenizer.encode_plus(
    seq_0, seq_1, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt"
)

example = tokenized_sequence_pair["input_ids"], tokenized_sequence_pair["attention_mask"]

traced_model = torch.jit.trace(model.eval(), example)
traced_model.save("model.pth")

# Package the traced model; the with block closes the archive automatically.
with tarfile.open("model.tar.gz", "w:gz") as f:
    f.add("model.pth")

It’s important to set the return_dict parameter to False when loading a pre-trained model, because Neuron compilation does not support dictionary-based model outputs. We upload our model.tar.gz file to Amazon Simple Storage Service (Amazon S3), saving its location in a variable named traced_model_url.

We then use the PyTorchModel SageMaker API to instantiate and compile our model:

from sagemaker.pytorch.model import PyTorchModel
from sagemaker.predictor import Predictor
import json

traced_sm_model = PyTorchModel(
    model_data=traced_model_url,
    predictor_cls=Predictor,
    framework_version="1.5.1",
    role=role,
    sagemaker_session=sagemaker_session,
    entry_point="inference_inf1.py",
    source_dir="code",
    py_version="py3",
    name="inf1-bert-base-multilingual-uncased",
)

compiled_inf1_model = traced_sm_model.compile(
    target_instance_family="ml_inf1",
    input_shape={"input_ids": [1, 512], "attention_mask": [1, 512]},
    job_name="testing_inf1_neo",
    role=role,
    framework="pytorch",
    framework_version="1.5.1",
    output_path=f"s3://{sm_bucket}/{your_model_destination}",
    compiler_options=json.dumps("--dtype int64"),
)

Compilation may take a few minutes. As you can see, our entry_point to model inference is our inference_inf1.py script. It determines how our model is loaded, how input and output are preprocessed, and how the model is used for prediction. Check out the full script on GitHub.

Finally, we can deploy our model to a SageMaker endpoint on an AWS Inferentia instance, and get predictions from it in real time:

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

compiled_inf1_predictor = compiled_inf1_model.deploy(
    instance_type="ml.inf1.xlarge",
    initial_instance_count=1,
    endpoint_name="test-neo-inf1-bert",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

payload = seq_0, seq_1
print(compiled_inf1_predictor.predict(payload))

As you can see, we were able to get all the benefits of using AWS Inferentia instances on SageMaker by using simple APIs that complement the standard flow of the SageMaker SDK.

Final solution

The following architecture illustrates the solution deployed in AWS.

All the testing and evaluation described in this post was done with the help of AWS AI/ML Specialist Solutions Architects in under 3 weeks, thanks to the ease of use of SageMaker and AWS Inferentia.

Conclusion

In this post, we shared how InfoJobs (Adevinta) uses AWS Inferentia with SageMaker endpoints to optimize the performance of NLP model inference in a cost-effective way, reducing inference times up to 92% with a 75% lower cost than the initial best alternative. You can follow the process and code shared for compiling and deploying your own models easily using SageMaker, the Neuron SDK for PyTorch, and AWS Inferentia.

The results of the benchmarking tests performed between AWS AI/ML Specialist Solutions Architects and InfoJobs engineers were also validated in InfoJobs’s environment. This solution is now being deployed in production, handling the processing of all the CVs uploaded by users to the InfoJobs portal in real time.

As a next step, we will be exploring ways to optimize model training and our ML pipeline with SageMaker by relying on the Hugging Face integration with SageMaker and SageMaker Training Compiler, among other features.

We encourage you to try out AWS Inferentia with SageMaker, and connect with AWS to discuss your specific ML needs. For more examples on SageMaker and AWS Inferentia, you can also check out SageMaker examples on GitHub and AWS Neuron tutorials.

About the Authors

Juan Francisco Fernandez is an ML Engineer with Adevinta Spain. He joined InfoJobs to tackle the challenge of automating model development, thereby providing more time for data scientists to think about new experiments and models and freeing them of the burden of engineering tasks. In his spare time, he enjoys spending time with his son, playing basketball and video games, and learning languages.

Antonio Rodriguez is an AI & ML Specialist Solutions Architect at Amazon Web Services. He helps companies solve their challenges through innovation with the AWS Cloud and AI/ML services. Apart from work, he loves to spend time with his family and play sports with his friends.

João Moura is an AI & ML Specialist Solutions Architect at Amazon Web Services. He focuses mostly on NLP use cases and helping customers optimize deep learning model deployments.

Copyright © 2019-2025 Vedere AI. All Rights Reserved.