UN Satellite Centre Works With NVIDIA to Boost Sustainable Development Goals

To foster climate action for a healthy global environment, NVIDIA is working with the United Nations Satellite Centre (UNOSAT) to apply the power of deep learning and AI.

The effort supports the UN’s 2030 Agenda for Sustainable Development, which has at its core 17 interrelated Sustainable Development Goals. These SDGs — which include “climate action” and “sustainable cities and communities” — serve as calls to action for all UN member states to bolster global well-being.

The collaboration between UNOSAT, part of the United Nations Institute for Training and Research, and NVIDIA is initially focused on boosting climate-related disaster management by using AI for Earth Observation. AI4EO, as it’s known, is a term that encompasses initiatives using AI to help monitor and assess the planet’s changes.

To fast-track research and development for its AI4EO efforts, UNOSAT will integrate its satellite imagery technology infrastructure with NVIDIA’s accelerated computing platform. The AI-powered satellite imagery system will collect and analyze geospatial information to provide near-real-time insights about floods, wildfires and other climate-related disasters.

In addition, UNOSAT has launched an educational module that builds upon an NVIDIA Deep Learning Institute (DLI) course on applying deep learning methods to generate accurate flood detection models.

“Working with NVIDIA will enable us to close the loop from AI research to implementation of climate solutions in the shortest time possible, ensuring that vulnerable populations can benefit from the technology,” said Einar Bjørgo, director of UNOSAT.

AI-Powered Satellite Imagery Analysis

For tasks like evaluating the impact of a tropical cyclone in the Philippines or a volcanic eruption in Tonga, UNOSAT’s emergency mapping service uses computer vision and satellite imagery analysis to gain accurate information about complex disasters.

Near-real-time analysis is key to managing climate-disaster events. Humanitarian teams can use the data-driven insights provided by AI to take rapid, effective action in combating disasters. The data is also used to inform sustainable development policies, develop users’ capacities and strengthen climate resilience overall.

UNOSAT will supercharge its satellite imagery technology infrastructure with NVIDIA DGX systems, which enable AI development at scale — as well as the NVIDIA EGX platform, which delivers the power of accelerated computing from the data center to the edge.

NVIDIA technology speeds up AI-based flood detection by 7x, covering larger areas with greater accuracy, according to UNOSAT.

NVIDIA DLI Course on Disaster Risk Monitoring

In addition to powerful technology, a skilled workforce is essential to using AI and data science to analyze climate events and prevent them from becoming humanitarian disasters.

“NVIDIA and UNOSAT have a unique opportunity to combat the impact of climate change and advance the UN’s SDGs, with a launching point of training data scientists to develop and deploy GPU-accelerated models that improve flood prediction,” said Keith Strier, vice president of global AI initiatives at NVIDIA.

UNOSAT has developed a module for the Deep Learning Institute’s free online course that covers how to build a deep learning model to automate the detection of flood events.

Called Disaster Risk Monitoring Using Satellite Imagery, it’s the first NVIDIA DLI course focused on climate action for the global public sector community — with many additional climate-action-related courses being planned.

UNOSAT’s module — based on a real UN case study — highlights an example of a flood in Nepal.
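
At its core, flood detection from satellite imagery is a pixel-wise segmentation task. The following is a minimal sketch of that kind of model in PyTorch, not the course’s actual code; the tiny encoder-decoder architecture, input shape and threshold are illustrative assumptions.

    # Minimal sketch of pixel-wise flood segmentation, the kind of model the
    # course builds; architecture, shapes and threshold are illustrative
    # assumptions, not the course's actual code.
    import torch
    import torch.nn as nn

    class TinyFloodNet(nn.Module):
        """Toy encoder-decoder mapping a satellite tile to a flood mask."""

        def __init__(self, in_channels=3):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),  # downsample 2x
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 1),  # one flood/no-flood logit per pixel
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = TinyFloodNet()
    tile = torch.randn(1, 3, 256, 256)        # one synthetic satellite tile
    logits = model(tile)                      # shape (1, 1, 256, 256)
    flood_mask = torch.sigmoid(logits) > 0.5  # boolean per-pixel flood map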

In collaboration with NVIDIA, UNOSAT is offering the module for free with the goal of upskilling data scientists worldwide to harness accelerated computing to predict and respond to climate-related disasters.

“We aim to democratize access to accelerated computing to help nations train more accurate deep learning models that better predict and respond to a full spectrum of humanitarian and natural disasters,” Strier said.

Get started with the course, which is now available.

Learn more about how NVIDIA technology is used to improve the planet and its people.


Taking the guesswork out of dental care with artificial intelligence

When you picture a hospital radiologist, you might think of a specialist who sits in a dark room and spends hours poring over X-rays to make diagnoses. Contrast that with your dentist, who in addition to interpreting X-rays must also perform surgery, manage staff, communicate with patients, and run their business. When dentists analyze X-rays, they do so in bright rooms and on computers that aren’t specialized for radiology, often with the patient sitting right next to them.

Is it any wonder, then, that dentists given the same X-ray might propose different treatments?

“Dentists are doing a great job given all the things they have to deal with,” says Wardah Inam SM ’13, PhD ’16.

Inam is the co-founder of Overjet, a company using artificial intelligence to analyze and annotate X-rays for dentists and insurance providers. Overjet seeks to take the subjectivity out of X-ray interpretations to improve patient care.

“It’s about moving toward more precision medicine, where we have the right treatments at the right time,” says Inam, who co-founded the company with Alexander Jelicich ’13. “That’s where technology can help. Once we quantify the disease, we can make it very easy to recommend the right treatment.”

Overjet has been cleared by the Food and Drug Administration to detect and outline cavities and to quantify bone levels to aid in the diagnosis of periodontal disease, a common but preventable gum infection that causes the jawbone and other tissues supporting the teeth to deteriorate.

In addition to helping dentists detect and treat diseases, Overjet’s software is also designed to help dentists show patients the problems they’re seeing and explain why they’re recommending certain treatments.

The company has already analyzed tens of millions of X-rays, is used by dental practices nationwide, and is currently working with insurance companies that represent more than 75 million patients in the U.S. Inam is hoping the data Overjet is analyzing can be used to further streamline operations while improving care for patients.

“Our mission at Overjet is to improve oral health by creating a future that is clinically precise, efficient, and patient-centric,” says Inam.

It’s been a whirlwind journey for Inam, who knew nothing about the dental industry until a bad experience piqued her interest in 2018.

Getting to the root of the problem

Inam came to MIT in 2010, first for her master’s and then her PhD in electrical engineering and computer science, and says she caught the bug for entrepreneurship early on.

“For me, MIT was a sandbox where you could learn different things and find out what you like and what you don’t like,” Inam says. “Plus, if you are curious about a problem, you can really dive into it.”

While taking entrepreneurship classes at the Sloan School of Management, Inam eventually started a number of new ventures with classmates.

“I didn’t know I wanted to start a company when I came to MIT,” Inam says. “I knew I wanted to solve important problems. I went through this journey of deciding between academia and industry, but I like to see things happen faster and I like to make an impact in my lifetime, and that’s what drew me to entrepreneurship.”

During her postdoc in the Computer Science and Artificial Intelligence Laboratory (CSAIL), Inam and a group of researchers applied machine learning to wireless signals to create biomedical sensors that could track a person’s movements, detect falls, and monitor respiratory rate.

She didn’t get interested in dentistry until after leaving MIT, when she changed dentists and received an entirely new treatment plan. Confused by the change, she obtained her X-rays and asked other dentists to have a look, only to receive yet another variation in diagnosis and treatment recommendations.

At that point, Inam decided to dive into dentistry for herself, reading books on the subject, watching YouTube videos, and eventually interviewing dentists. Before she knew it, she was spending more time learning about dentistry than she was at her job.

The same week Inam quit her job, she learned about MIT’s Hacking Medicine competition and decided to participate. That’s where she started building her team and getting connections. Overjet’s first funding came from the Media Lab-affiliated investment group the E14 Fund.

“The E14 Fund wrote the first check, and I don’t think we would’ve existed if it wasn’t for them taking a chance on us,” she says.

Inam learned that a big reason for variation in treatment recommendations among dentists is the sheer number of potential treatment options for each disease. A cavity, for instance, can be treated with a filling, a crown, a root canal, a bridge, and more.

When it comes to periodontal disease, dentists must make millimeter-level assessments to determine disease severity and progression. The extent and progression of the disease determines the best treatment.

“I felt technology could play a big role in not only enhancing the diagnosis but also to communicate with the patients more effectively so they understand and don’t have to go through the confusing process I did of wondering who’s right,” Inam says.

Overjet began as a tool to help insurance companies streamline dental claims before the company began integrating its tool directly into dentists’ offices. Every day, some of the largest dental organizations nationwide are using Overjet, including Guardian Insurance, Delta Dental, Dental Care Alliance, and Jefferson Dental and Orthodontics.

Today, as a dental X-ray is imported into a computer, Overjet’s software analyzes and annotates the images automatically. By the time the image appears on the computer screen, it has information on the type of X-ray taken, how a tooth may be impacted, the exact level of bone loss with color overlays, the location and severity of cavities, and more.

The analysis gives dentists more information to talk to patients about treatment options.

“Now the dentist or hygienist just has to synthesize that information, and they use the software to communicate with you,” Inam says. “So, they’ll show you the X-rays with Overjet’s annotations and say, ‘You have 4 millimeters of bone loss, it’s in red, that’s higher than the 3 millimeters you had last time you came, so I’m recommending this treatment.’”

Overjet also incorporates historical information about each patient, tracking bone loss on every tooth and helping dentists detect cases where disease is progressing more quickly.

“We’ve seen cases where a cancer patient with dry mouth goes from nothing to something extremely bad in six months between visits, so those patients should probably come to the dentist more often,” Inam says. “It’s all about using data to change how we practice care, think about plans, and offer services to different types of patients.”
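
To picture how that longitudinal tracking might work, here is a hypothetical sketch, not Overjet’s software: per-tooth bone-level measurements are compared across visits, and teeth whose loss exceeds a threshold are flagged. The data layout and the 1 mm threshold are assumptions for illustration.

    # Hypothetical sketch of flagging fast bone-loss progression between
    # visits; the data layout and 1 mm threshold are illustrative assumptions,
    # not Overjet's software.

    # Bone-loss measurements in millimeters, per tooth, keyed by visit date.
    visits = {
        "2023-01-10": {"tooth_14": 3.0, "tooth_15": 2.5},
        "2023-07-12": {"tooth_14": 4.1, "tooth_15": 2.6},
    }

    def flag_progression(visits, threshold_mm=1.0):
        """Return teeth whose loss grew by >= threshold since the prior visit."""
        (_, prev), (_, curr) = sorted(visits.items())[-2:]
        return {
            tooth: round(curr[tooth] - prev[tooth], 2)
            for tooth in curr
            if tooth in prev and curr[tooth] - prev[tooth] >= threshold_mm
        }

    print(flag_progression(visits))  # {'tooth_14': 1.1} -> progressing quickly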

The operating system of dentistry

Overjet’s FDA clearances account for two highly prevalent diseases. They also put the company in a position to conduct industry-level analysis and help dental practices compare themselves to peers.

“We use the same tech to help practices understand clinical performance and improve operations,” Inam says. “We can look at every patient at every practice and identify how practices can use the software to improve the care they’re providing.”

Moving forward, Inam sees Overjet playing an integral role in virtually every aspect of dental operations.

“These radiographs have been digitized for a while, but they’ve never been utilized because the computers couldn’t read them,” Inam says. “Overjet is turning unstructured data into data that we can analyze. Right now, we’re building the basic infrastructure. Eventually we want to grow the platform to improve any service the practice can provide, basically becoming the operating system of the practice to help providers do their job more effectively.”


Family Style: Li Auto L9 Brings Top-Line Luxury and Intelligence to Full-Size SUV With NVIDIA DRIVE Orin

Finally, there’s a family car any kid would want to be seen in.

Beijing-based startup Li Auto this week rolled out its second electric vehicle, the L9. It’s a full-size SUV decked out with the latest intelligent driving technology.

With AI features and an extended battery range of more than 800 miles, the L9 promises to elevate the playing field for luxury family vehicles.

Li Auto is deploying its newest automated driving features with the expansion of its vehicle lineup, using a software-defined compute platform built on two NVIDIA DRIVE Orin systems-on-a-chip (SoCs).

With more than 500 trillion operations per second (TOPS), the L9’s compute platform can run various deep neural networks simultaneously and in real time, all while ensuring the redundancy and diversity necessary for safety.

First-Class Safety and Security

As a top-line luxury model, the L9 sports only the best when it comes to AI-assisted driving technology.

All Li Auto vehicles come standard with the electric automaker’s advanced driver assistance system, Li AD Max. To achieve surround perception, the system uses one forward-facing lidar, 11 cameras, one radar and 12 ultrasonic sensors, as well as DRIVE Orin SoCs.

In addition to handling the large number of applications and deep neural networks necessary for autonomous driving, DRIVE Orin is architected to meet systematic safety standards such as ISO 26262 ASIL D. Its dual processors provide fallback redundancy for each other, further ensuring safe operation.

The L9’s high-performance sensors also enable round-the-clock security features, monitoring both the car’s interior and exterior.

Innovative Infotainment

Inside the vehicle, five 3D-capable screens transform the in-cabin experience.

In the cockpit, a combined head-up display and confidence view enhance safety for the person driving. The head-up display projects key driving information onto the front windshield, and the interactive visualization feature of the vehicle’s perception system is located above the steering wheel, keeping the driver’s attention on the road.

The L9’s screens for central control, passenger entertainment and rear cabin entertainment are 15.7-inch, 3K-resolution, automotive-grade OLED displays that deliver first-class visual experiences for every occupant.

Passengers can also interact with the intelligent in-cabin system via interior sensors and natural language processing.

Designed to optimize all driver and passenger experiences, the L9 represents the top of the line for luxury family vehicles.


Import data from cross-account Amazon Redshift in Amazon SageMaker Data Wrangler for exploratory data analysis and data preparation

Organizations moving towards a data-driven culture embrace the use of data and machine learning (ML) in decision-making. To make ML-based decisions from data, you need your data available, accessible, clean, and in the right format to train ML models. Organizations with a multi-account architecture want to avoid situations where they must extract data from one account and load it into another for data preparation activities. Manually building and maintaining the different extract, transform, and load (ETL) jobs in different accounts adds complexity and cost, and makes it more difficult to maintain the governance, compliance, and security best practices to keep your data safe.

Amazon Redshift is a fast, fully managed cloud data warehouse. The Amazon Redshift cross-account data sharing feature provides a simple and secure way to share fresh, complete, and consistent data in your Amazon Redshift data warehouse with any number of stakeholders in different AWS accounts. Amazon SageMaker Data Wrangler is a capability of Amazon SageMaker that makes it faster for data scientists and engineers to prepare data for ML applications by using a visual interface. Data Wrangler allows you to explore and transform data for ML by connecting to Amazon Redshift datashares.

In this post, we walk through setting up a cross-account integration using an Amazon Redshift datashare and preparing data using Data Wrangler.

Solution overview

We start with two AWS accounts: a producer account with the Amazon Redshift data warehouse, and a consumer account for SageMaker ML use cases. For this post, we use the banking dataset. To follow along, download the dataset to your local machine. The following is a high-level overview of the workflow:

  1. Instantiate an Amazon Redshift RA3 cluster in the producer account and load the dataset.
  2. Create an Amazon Redshift datashare in the producer account and allow the consumer account to access the data.
  3. Access the Amazon Redshift datashare in the consumer account.
  4. Analyze and process data with Data Wrangler in the consumer account and build your data preparation workflows.

Be aware of the considerations for working with Amazon Redshift data sharing:

  • Multiple AWS accounts – You need at least two AWS accounts: a producer account and a consumer account.
  • Cluster type – Data sharing is supported in the RA3 cluster type. When instantiating an Amazon Redshift cluster, make sure to choose the RA3 cluster type.
  • Encryption – For data sharing to work, both the producer and consumer clusters must be encrypted and should be in the same AWS Region.
  • Regions – Cross-account data sharing is available for all Amazon Redshift RA3 node types in US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), Europe (Stockholm), and South America (São Paulo).
  • Pricing – Cross-account data sharing is available across clusters that are in the same Region. There is no cost to share data. You just pay for the Amazon Redshift clusters that participate in sharing.

Cross-account data sharing is a two-step process. First, a producer cluster administrator creates a datashare, adds objects, and gives access to the consumer account. Then the producer account administrator authorizes sharing data for the specified consumer. You can do this from the Amazon Redshift console.
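
If you prefer to script these steps, the following is a minimal sketch using the Amazon Redshift Data API through boto3. The datashare name, cluster identifier, database, user, and consumer account ID are all placeholders; the walkthrough below uses the console instead.

    # Sketch only: scripts the producer-side datashare setup with the Amazon
    # Redshift Data API. All identifiers below are placeholders.
    import boto3

    client = boto3.client("redshift-data")

    statements = [
        "CREATE DATASHARE bank_share;",
        "ALTER DATASHARE bank_share ADD SCHEMA public;",
        "ALTER DATASHARE bank_share ADD ALL TABLES IN SCHEMA public;",
        # Give the consumer AWS account access; the producer admin still
        # authorizes the consumer afterward (console or API).
        "GRANT USAGE ON DATASHARE bank_share TO ACCOUNT '111122223333';",
    ]
    for sql in statements:
        client.execute_statement(
            ClusterIdentifier="producer-cluster",  # placeholder cluster name
            Database="dev",                        # placeholder database
            DbUser="awsuser",                      # placeholder admin user
            Sql=sql,
        )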

Create an Amazon Redshift datashare in the producer account

To create your datashare, complete the following steps:

  1. On the Amazon Redshift console, create an Amazon Redshift cluster.
  2. Specify Production and choose the RA3 node type.
  3. Under Additional configurations, deselect Use defaults.
  4. Under Database configurations, set up encryption for your cluster.
  5. After you create the cluster, import the direct marketing bank dataset. You can download it from the following URL: https://sagemaker-sample-data-us-west-2.s3-us-west-2.amazonaws.com/autopilot/direct_marketing/bank-additional.zip.
  6. Upload bank-additional-full.csv to an Amazon Simple Storage Service (Amazon S3) bucket your cluster has access to.
  6. Use the Amazon Redshift query editor and run the following SQL statements to create the table and copy the data into Amazon Redshift:
    create table bank_additional_full (
      age char(40),
      job char(40),
      marital char(40),
      education char(40),
      default_history varchar(40),
      housing char(40),
      loan char(40),
      contact char(40),
      month char(40),
      day_of_week char(40),
      duration char(40),
      campaign char(40),
      pdays char(40),
      previous char(40),
      poutcome char(40),
      emp_var_rate char(40),
      cons_price_idx char(40),
      cons_conf_idx char(40),
      euribor3m char(40),
      nr_employed char(40),
      y char(40));
    copy bank_additional_full
    from '<S3 LOCATION OF THE CSV FILE>'
    iam_role '<CLUSTER ROLE ARN>'
    region 'us-east-1'
    format csv
    ignoreblanklines
    ignoreheader 1;

  8. Navigate to the cluster details page and on the Datashares tab, choose Create datashare.
  9. For Datashare name, enter a name.
  10. For Database name, choose a database.
  11. In the Add datashare objects section, choose the objects from the database you want to include in the datashare.
    You have granular control of what you choose to share with others. For simplicity, we share all the tables. In practice, you might choose one or more tables, views, or user-defined functions.
  12. Choose Add.
  13. To add data consumers, select Add AWS accounts to the datashare and add your secondary AWS account ID.
  14. Choose Create datashare.
  15. To authorize the data consumer you just created, go to the Datashares page on the Amazon Redshift console and choose the new datashare.
  16. Select the data consumer and choose Authorize.

The consumer status changes from Pending authorization to Authorized.

Access the Amazon Redshift cross-account datashare in the consumer AWS account

Now that the datashare is set up, switch to your consumer AWS account to consume the datashare. Make sure you have at least one Amazon Redshift cluster created in your consumer account. The cluster has to be encrypted and in the same Region as the source.

  1. On the Amazon Redshift console, choose Datashares in the navigation pane.
  2. On the From other accounts tab, select the datashare you created and choose Associate.
  3. You can associate the datashare with one or more clusters in this account or associate the datashare to the entire account so that the current and future clusters in the consumer account get access to this share.
  4. Specify your connection details and choose Connect.
  5. Choose Create database from datashare and enter a name for your new database.
  6. To test the datashare, go to query editor and run queries against the new database to make sure all the objects are available as part of the datashare.
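
For reference, these console steps map to SQL that the consumer cluster can run. A hedged sketch, again via the Data API, with placeholder identifiers; the producer namespace GUID appears on the datashare details page.

    # Sketch only: consumer-side equivalents of the console steps. All
    # identifiers, the account ID, and the namespace GUID are placeholders.
    import boto3

    client = boto3.client("redshift-data")

    def run(sql):
        return client.execute_statement(
            ClusterIdentifier="consumer-cluster",
            Database="dev",
            DbUser="awsuser",
            Sql=sql,
        )

    # Create a local database that points at the shared data.
    run(
        "CREATE DATABASE bank_db FROM DATASHARE bank_share "
        "OF ACCOUNT '999988887777' NAMESPACE '<producer-namespace-guid>';"
    )

    # Datashare objects are queried with three-part <database>.<schema>.<table>
    # names; the same notation works later in Data Wrangler's SQL editor.
    run("SELECT COUNT(*) FROM bank_db.public.bank_additional_full;")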

Analyze and process data with Data Wrangler

You can now use Data Wrangler to access the cross-account data created as a datashare in Amazon Redshift.

  1. Open Amazon SageMaker Studio.
  2. On the File menu, choose New and Data Wrangler Flow.
  3. On the Import tab, choose Add data source and Amazon Redshift.
  4. Enter the connection details of the Amazon Redshift cluster you just created in the consumer account for the datashare.
  5. Choose Connect.
  6. Use the AWS Identity and Access Management (IAM) role you used for your Amazon Redshift cluster.

Note that even though the datashare is a new database in the Amazon Redshift cluster, you can’t connect to it directly from Data Wrangler.

The correct way is to connect to the default cluster database first, and then use SQL to query the datashare database. Provide the required information for connecting to the default cluster database. Note that an AWS Key Management Service (AWS KMS) key ID is not required in order to connect.

Data Wrangler is now connected to the Amazon Redshift instance.

  1. Query the data in the Amazon Redshift datashare database using a SQL editor.
  2. Choose Import to import the dataset to Data Wrangler.
  3. Enter a name for the dataset and choose Add.

You can now see the flow on the Data Flow tab of Data Wrangler.

After you have loaded the data into Data Wrangler, you can do exploratory data analysis and prepare data for ML.

  1. Choose the plus sign and choose Add analysis.

Data Wrangler provides built-in analyses. These include but aren’t limited to a data quality and insights report, data correlation, a pre-training bias report, a summary of your dataset, and visualizations (such as histograms and scatter plots). You can also create your own custom visualization.

You can use the Data Quality and Insights Report to automatically generate visualizations and analyses that identify data quality issues and recommend the right transformations for your dataset.

  1. Choose Data Quality and Insights Report, and choose the Target column as y.
  2. Because this is a classification problem statement, for Problem type, select Classification.
  3. Choose Create.

Data Wrangler creates a detailed report on your dataset. You can also download the report to your local machine.

  1. For data preparation, choose the plus sign and choose Add transform.
  2. Choose Add step to start building your transformations.

At the time of this writing, Data Wrangler provides over 300 built-in transformations. You can also write your own transformations using Pandas or PySpark.
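
For example, here is a custom transform written with Pandas. In the Python (Pandas) custom transform editor, Data Wrangler exposes the current dataset as a DataFrame named df; the column names below assume the banking dataset loaded earlier, and the specific casts are illustrative.

    # Sketch of a Data Wrangler custom transform (Python/Pandas). Data Wrangler
    # exposes the current dataset as a DataFrame named df; column names assume
    # the banking dataset loaded earlier, and the casts are illustrative.
    import pandas as pd

    # The COPY above loaded every column as fixed-width text, so values carry
    # trailing spaces; strip them and cast the numeric columns back.
    for col in ["age", "duration", "campaign", "pdays", "previous"]:
        df[col] = pd.to_numeric(df[col].str.strip(), errors="coerce")

    # Derive a binary target from the label column.
    df["y"] = df["y"].str.strip()
    df["subscribed"] = (df["y"] == "yes").astype(int)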

You can now start building your transforms and analysis based on your business requirement.

Conclusion

In this post, we explored sharing data across accounts using Amazon Redshift datashares without having to manually download and upload data. We walked through how to access the shared data using Data Wrangler and prepare the data for your ML use cases. This no-code/low-code capability of Amazon Redshift datashares and Data Wrangler accelerates training data preparation and increases the agility of data engineers and data scientists with faster iterative data preparation.

To learn more about Amazon Redshift and SageMaker, refer to the Amazon Redshift Database Developer Guide and Amazon SageMaker Documentation.


About the Authors

 Meenakshisundaram Thandavarayan is a Senior AI/ML specialist with AWS. He helps hi-tech strategic accounts on their AI and ML journey. He is very passionate about data-driven AI.

James Wu is a Senior AI/ML Specialist Solutions Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.


Predict types of machine failures with no-code machine learning using Amazon SageMaker Canvas

Predicting common machine failure types is critical in manufacturing industries. Given a set of product characteristics tied to a given type of failure, you can train a machine learning (ML) model to predict the failure type from those attributes. ML can provide these insights, but until now you needed ML experts to build such models, and their scarcity could delay the corrective actions businesses need for efficiency or improvement.

In this post, we show you how business analysts can build a machine failure type prediction ML model with Amazon SageMaker Canvas. Canvas provides you with a visual point-and-click interface that allows you to build models and generate accurate ML predictions on your own—without requiring any ML experience or having to write a single line of code.

Solution overview

Let’s assume you’re a business analyst assigned to a maintenance team of a large manufacturing organization. Your maintenance team has asked you to assist in predicting common failures. They have provided you with a historical dataset that contains characteristics tied to a given type of failure and would like you to predict which failure will occur in the future. The failure types include No Failure, Overstrain Failure, and Power Failure. The data schema is as follows:

  • UID (INT): Unique identifier ranging from 1–10,000
  • productID (STRING): A letter (L, M, or H for low, medium, or high product quality variants) followed by a variant-specific serial number
  • type (STRING): The initial letter of productID, consisting of L, M, or H only
  • air temperature [K] (DECIMAL): Air temperature, in kelvin
  • process temperature [K] (DECIMAL): Precisely controlled temperature to ensure the quality of a given type of product, in kelvin
  • rotational speed [rpm] (DECIMAL): The number of turns of the rotating object divided by time, in revolutions per minute
  • torque [Nm] (DECIMAL): The machine’s turning force through a radius, in newton-meters
  • tool wear [min] (INT): Tool wear, in minutes
  • failure type (target) (STRING): No Failure, Power Failure, or Overstrain Failure

After the failure type is identified, businesses can take corrective action. To do this, you use the data you have in a CSV file, which contains the product characteristics outlined above. You use Canvas to perform the following steps:

  1. Import the maintenance dataset.
  2. Train and build the predictive machine maintenance model.
  3. Analyze the model results.
  4. Test predictions against the model.

Prerequisites

A cloud admin with an AWS account with appropriate permissions is required to complete the following prerequisites:

  1. Deploy an Amazon SageMaker domain. For instructions, see Onboard to Amazon SageMaker Domain.
  2. Launch Canvas. For instructions, see Setting up and managing Amazon SageMaker Canvas (for IT administrators).
  3. Configure cross-origin resource sharing (CORS) policies for Canvas. For instructions, see Give your users the ability to upload local files.

Import the dataset

First, download the maintenance dataset and review the file to make sure all the data is there.

Canvas provides several sample datasets in your application to help you get started. To learn more about the SageMaker-provided sample datasets you can experiment with, see Use sample datasets. If you use the sample dataset (canvas-sample-maintenance.csv) available within Canvas, you don’t have to import the maintenance dataset.


You can import data from different data sources into Canvas. If you plan to use your own dataset, follow the steps in Importing data in Amazon SageMaker Canvas.

For this post, we use the full maintenance dataset that we downloaded.

  1. Sign in to the AWS Management Console, using an account with the appropriate permissions to access Canvas.
  2. Log in to the Canvas console.
  3. Choose Import.
  4. Choose Upload and select the maintenance_dataset.csv file.
  5. Choose Import data to upload it to Canvas.


The import process takes approximately 10 seconds (this can vary depending on dataset size). When it’s complete, you can see the dataset is in Ready status.

After you confirm that the imported dataset is ready, you can create your model.

Build and train the model

To create and train your model, complete the following steps:

  1. Choose New model, and provide a name for your model.
  2. Choose Create.
  3. Select the maintenance_dataset.csv dataset you uploaded previously and choose Select dataset.
    In the model view, you can see four tabs, which correspond to the four steps to create a model and use it to generate predictions: Select, Build, Analyze, and Predict. The dataset includes 9 columns and 10,000 rows. After you select it, Canvas automatically moves to the Build phase.
  4. On the Build tab, choose the target column, in our case Failure Type. The maintenance team has informed you that this column indicates the type of failure typically seen, based on historical data from their existing machines. This is what you want to train your model to predict. Canvas automatically detects that this is a 3 Category problem (also known as multi-class classification). If the wrong model type is detected, you can change it manually with the Change type option.
    Note that this dataset is highly unbalanced toward the No Failure class, which you can see by viewing the Failure Type column. Although Canvas and the underlying AutoML capabilities can partly handle dataset imbalance, this may result in somewhat skewed performance. As an additional next step, refer to Balance your data for machine learning with Amazon SageMaker Data Wrangler: you can launch an Amazon SageMaker Studio app from the SageMaker console, import this dataset into Amazon SageMaker Data Wrangler, apply the Balance data transformation, and then bring the balanced dataset back to Canvas and continue with the following steps. We proceed with the imbalanced dataset in this post to show that Canvas can handle imbalanced datasets as well.
    In the bottom half of the page, you can look at some statistics of the dataset, including missing and mismatched values, unique values, and mean and median values. You can also drop columns you don’t want to use for the prediction by simply deselecting them.
    After you’ve explored this section, it’s time to train the model. Before building a complete model, it’s good practice to get a general idea of model performance by training a Quick Model. A quick model trains fewer combinations of models and hyperparameters in order to prioritize speed over accuracy, especially when you want to prove the value of training an ML model for your use case. Note that the quick build option isn’t available for datasets bigger than 50,000 rows.
  5. Choose Quick build.


Now you wait anywhere from 2–15 minutes. Once done, Canvas automatically moves to the Analyze tab to show you the results of quick training. The analysis performed using quick build estimates that your model is able to predict the right failure type (outcome) 99.2% of the time. You may experience slightly different values. This is expected.

Let’s focus on the first tab, Overview. This is the tab that shows you the Column impact, or the estimated importance of each column in predicting the target column. In this example, the Torque [Nm] and Rotational speed [rpm] columns have the most significant impact in predicting what type of failure will occur.


Evaluate model performance

When you move to the Scoring portion of your analysis, you can see a plot representing the distribution of the predicted values with respect to the actual values. Notice that most predictions fall in the No Failure category. To learn more about how Canvas uses SHAP baselines to bring explainability to ML, refer to Evaluating Your Model’s Performance in Amazon SageMaker Canvas, as well as SHAP Baselines for Explainability.

Canvas splits the original dataset into train and validation sets before training. The scoring is the result of Canvas running the validation set against the model. This is an interactive interface where you can select the failure type. If you choose Overstrain Failure in the graphic, you can see that the model identifies these failures 84% of the time. This is good enough to take action on; perhaps have an operator or engineer check further. You can choose Power Failure in the graphic to see the respective scoring for further interpretation and actions.

You may be interested in failure types and how well the model predicts failure types based on a series of inputs. To take a closer look at the results, choose Advanced metrics. This displays a matrix that allows you to more closely examine the results. In ML, this is referred to as a confusion matrix.


This matrix defaults to the dominant class, No Failure. On the Class menu, you can choose to view advanced metrics for the other two failure types, Overstrain Failure and Power Failure.

In ML, the accuracy of a model is defined as the number of correct predictions divided by the total number of predictions. The blue boxes represent correct predictions that the model made against a subset of test data with a known outcome. Here we are interested in what percentage of the time the model predicted a particular machine failure type (let’s say No Failure) when it’s actually that failure type (No Failure). In ML, the ratio used to measure this is TP / (TP + FN), referred to as recall. For the default class, No Failure, there were 1,923 correct predictions out of 1,926 overall records, which results in 99% recall. In the Overstrain Failure class, there were 32 out of 38, which results in 84% recall. Lastly, in the Power Failure class, there were 16 out of 19, which results in 84% recall.
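
A quick way to sanity-check those recall figures from the counts above (the post rounds 99.8% and 84.2% down to 99% and 84%):

    # Recall = TP / (TP + FN): correct predictions for a class divided by all
    # actual instances of that class, using the counts reported above.
    for cls, tp, actual in [
        ("No Failure", 1923, 1926),
        ("Overstrain Failure", 32, 38),
        ("Power Failure", 16, 19),
    ]:
        print(f"{cls}: recall = {tp / actual:.1%}")
    # No Failure: recall = 99.8%
    # Overstrain Failure: recall = 84.2%
    # Power Failure: recall = 84.2%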

Now, you have two options:

  1. You can use this model to run some predictions by choosing Predict.
  2. You can create a new version of this model to train with the Standard build option. This will take much longer—about 1–2 hours—but provides a more robust model because it goes through a full AutoML review of data, algorithms, and tuning iterations.

Because you’re trying to predict failures, and the model predicts failures correctly 84% of the time, you can confidently use the model to identify possible failures. So, you can proceed with option 1. If you weren’t confident, you could have a data scientist review the modeling Canvas did and offer potential improvements via option 2.

Generate predictions

Now that the model is trained, you can start generating predictions.

  1. Choose Predict at the bottom of the Analyze page, or choose the Predict tab.
  2. Choose Select dataset, and choose the maintenance_dataset.csv file.
  3. Choose Generate predictions.

Canvas uses this dataset to generate our predictions. Although it’s generally not a good idea to use the same dataset for both training and testing, we use the same dataset here for the sake of simplicity. Alternatively, you can remove some records from your original training dataset, save them to a separate CSV file, and feed that file to the batch prediction so you don’t test on data the model was trained on (see the sketch below).
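
A lightweight way to carve out that held-out file, assuming Pandas locally and the dataset file name used in this post:

    # Sketch: carve a held-out test file from the original CSV so that batch
    # predictions run on rows the model never saw. File names assume the
    # dataset used in this post.
    import pandas as pd

    df = pd.read_csv("maintenance_dataset.csv")
    holdout = df.sample(frac=0.1, random_state=42)          # reserve 10% for testing
    train = df.drop(holdout.index)

    train.to_csv("maintenance_train.csv", index=False)      # import this into Canvas
    holdout.to_csv("maintenance_holdout.csv", index=False)  # use for batch prediction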

After a few seconds, the prediction is complete. Canvas returns a prediction for each row of data and the probability of the prediction being correct. You can choose Preview to view the predictions, or choose Download to download a CSV file containing the full output.

You can also predict values one at a time by choosing Single prediction instead of Batch prediction. Canvas shows you a view where you can provide the values for each feature manually and generate a prediction. This is ideal for what-if scenarios, for example: How does tool wear impact the failure type? What if the process temperature increases or decreases? What if the rotational speed changes?


Standard build

The Standard build option chooses accuracy over speed. If you want to share the artifacts of the model with your data scientist and ML engineers, you can create a standard build next.

  1. Choose Add version.
  2. Choose a new version and choose Standard build.
  3. After you create a standard build, you can share the model with data scientists and ML engineers for further evaluation and iteration.


Clean up

To avoid incurring future session charges, log out of Canvas.

Conclusion

In this post, we showed how a business analyst can create a machine failure type prediction model with Canvas using maintenance data. Canvas allows business analysts such as reliability engineers to create accurate ML models and generate predictions using a no-code, visual, point-and-click interface. Analysts can take this to the next level by sharing their models with data scientist colleagues. Data scientists can view the Canvas model in Studio, where they can explore the choices Canvas made, validate model results, and even take the model to production with a few clicks. This can accelerate ML-based value creation and help scale improved outcomes faster.

To learn more about using Canvas, see Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas. For more information about creating ML models with a no-code solution, see Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts.


About the Authors

Rajakumar Sampathkumar is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.

Twann Atkins is a Senior Solutions Architect for Amazon Web Services. He is responsible for working with Agriculture, Retail, and Manufacturing customers to identify business problems and working backwards to identify viable and scalable technical solutions. Twann has been helping customers plan and migrate critical workloads for more than 10 years with a recent focus on democratizing analytics, artificial intelligence and machine learning for customers and builders of tomorrow.

Omkar Mukadam is an Edge Specialist Solutions Architect at Amazon Web Services. He currently focuses on solutions that enable commercial customers to effectively design, build, and scale with AWS edge service offerings, including but not limited to the AWS Snow Family.
