Amazon AWS – Page 198

Using graph neural networks to recommend related products

October 11, 2022

by Amazon AWS

Dual embeddings of each node, as both source and target, and a novel loss function enable 30% to 160% improvements over predecessors.Read More

Detect fraud in mobile-oriented businesses using GrabDefence device intelligence and Amazon Fraud Detector

October 10, 2022

by Marcel Pividal Amazon AWS

In this post, we present a solution that combines rich mobile device intelligence with customized machine learning (ML) modeling to help you catch fraudsters who exploit mobile apps.

GrabDefence (GD), Grab’s proprietary fraud detection and prevention technology, and AWS have launched GDxAFD, a fraud detection solution tailored for mobile apps that integrates GD’s device intelligence capabilities with Amazon Fraud Detector, AWS’s fully managed ML fraud detection solution. With GDxAFD, you can take advantage of more than 20 years of fraud detection expertise from Amazon as well as extensive mobile fraud experience from Southeast Asia’s leading superapp to safeguard your mobile application from fraudsters.

This solution rides on a larger global wave of anti-fraud efforts, which experts forecast to grow to USD $62.70 billion by 2028. With the rise of the digital economy, fraud syndicates increasingly target online businesses, causing financial loss and destroying the trust between end-users and the platform. The true cost to battle fraud is also increasing rapidly as more fraud checks leads to poorer customer experience, false positives, as well as operational burden, which as a whole is estimated to be three times larger than the actual fraud losses from the True Cost of Fraud^TM APAC Study by LexisNexis^® Risk Solutions.

From the combined industry experience, the solution team believes that many of the modus operandi in a mobile environment is driven by fraudsters having tools and methods to create fake accounts at scale and bypass a platform’s security checks on the device, thereby enabling them to exploit the platform for large returns. Therefore, preventing mobile fraud starts from clearly understanding the risk profile of the devices used to access the mobile app and then using the device risk intelligence gathered, together with additional data about the user, event, or account, to detect potential fraudulent behavior in real time and at scale. By combining rich device intelligence and ML, companies are better positioned to stay ahead of mobile-focused fraud syndicates, and reduce fraud on their platforms.

GD device intelligence

GD is a product from Grab’s fraud prevention team, which has years of experience building solutions for Grab. Grab is a NASDAQ listed company and a leading superapp in South East Asia, with over 30 million monthly transacting users (as per Grab’s Q1 2022 Results). Due to the scale of its operations as a leading superapp in SEA and the nature of a mobile-first business, Grab has been investing heavily in building fraud prevention solutions enabled by rich data, technology focus, and insights gathered from its operational experience and exposure. GD’s device intelligence service collects rich device-level data, excluding any personally identifiable information (PII), from mobile application users and securely analyzes it to understand the risk profile of the device. Learning from a large device network built via Grab’s superapp, GD’s device intelligence service can accurately generate device fingerprints and detect risky attributes such as device or app modification or tampering, emulator usage, and GPS spoofing. As mentioned earlier, many fraud modus operandi on mobile platforms involve mass creation of fake accounts, device reengineering, and location spoofing, which GD device intelligence is capable of detecting. As a result, by integrating GD device intelligence and Amazon Fraud Detector, platforms that face similar fraud attacks can expect up to a 23% increase in fraud detection based on statistical studies done by GrabDefence on Grab’s fraud prevention systems.

Custom fraud detection ML models in Amazon Fraud Detector

Amazon Fraud Detector customizes each model it creates to your own dataset, making the accuracy of models higher than one-size-fits-all ML solutions. During the fully automated model training process, a series of models that have learned patterns of fraud from AWS and Amazon’s own fraud expertise are used to boost your model performance even further.

With the GDxAFD solution, you now have step-by-step guidance and a reference architecture for how to use flexible event schemas in Amazon Fraud Detector to add GD device intelligence findings into your custom fraud detector models. The end result is an ML model that, once trained, has the benefit of learning from multiple data sources, including your own historical data, GD’s device intelligence data, fraud patterns seen across Amazon, and additional third-party data (added automatically by Amazon Fraud Detector). Based on our pilot between GD and Amazon Fraud Detector, our model using GD device intelligence has shown a 23% increase in detection performance for detecting fake account registrations. You can deploy these models to detect mobile fraud to prevent not only fake account registration but also fraudulent payments, promotion abuse, or loyalty program abuse, among others.

To get started, you first integrate GD’s mobile SDK into your mobile application to collect device-level data. Next, you use Amazon Fraud Detector to define the event you want to evaluate for fraud by specifying the event and account data points you have available for the event or account, including the device risk intelligence data points from GD. After this, you train your ML model in Amazon Fraud Detector in just a few steps. After you train the model, you can add it to a detector.

To begin performing real-time predictions, you integrate Amazon Fraud Detector’s low-latency prediction API into your application and begin sending new mobile events to generate fraud predictions. Each fraud prediction considers the GD device intelligence data for the device associated with the event as well as additional data and intelligence automatically added by Amazon Fraud Detector, including signals from fraud patterns experienced across Amazon.

Solution overview

Device intelligence is a critical type of input for risk decisions. One of the common challenges faced in fraud detection in the mobile space is the lack of enriched data availability to make risk decisions. On the other hand, mobile devices are typically the most expensive asset the fraudsters and fraud syndicates possess and, therefore, a significant level of effort is put into masking the true identity and profile of the device being used. Understanding the risk profile of the mobile device (which sometimes isn’t even a real device) and being able to drive insights from the relationship between different mobile devices can significantly improve risk decisions for any mobile business, and becomes central to any mobile-based fraud management strategy.

For generating real-time fraud predictions, the GDxAFD solution uses Amazon Fraud Detector and GrabDefence’s device intelligence SDK, along with Amazon API Gateway and AWS Lambda. You can provision the AWS portions of the solution using AWS CloudFormation.

The following diagram illustrates our solution architecture.

The workflow consists of the following steps:

When an end-user interacts with your mobile app, GD’s mobile SDK passively gathers device data and streams this data to GD’s device intelligence service, where a risk profile for the device is generated.
Then, when that user transacts using the mobile app and you want to assess fraud risk in real time, the mobile app sends the transaction data gathered by the app via API Gateway to a Lambda function.
The Lambda function gathers the GrabDefence risk profile for the device used during the transaction, combines that profile data with the other transaction data, and sends it to the fraud detector.
The fraud detector performs a fraud prediction using your custom fraud detection ML model and ruleset, and returns a risk score and outcome to the Lambda function. This result is sent back to your mobile app via API Gateway.
If desired, the mobile app can then choose to adjust the end-user experience accordingly based on this risk assessment.

Use cases for device intelligence with Amazon Fraud Detector

The ideal end-state solution is an Amazon Fraud Detector model that is trained on a dataset of your historical events and their associated historical GD device intelligence data. To achieve this, you need to integrate the GD Guardian SDK for mobile devices and then gather device intelligence data for your events until you have enough to train a model (for example,10,000 events with at least 400 examples of fraud events). Depending on your use case and availability of fraud labels, you have a couple of ways to get started sooner as you gather data for this solution:

Use case A: Use GD device intelligence data directly in the fraud detector rules – With this use case, you create a detector in Amazon Fraud Detector with a ruleset designed to flag high-risk events provided by the device intelligence. This works effectively when you have clear risk mitigation policies that you want to deploy for your platform. (for example, act on the user if the device is jailbroken, or don’t allow redemption of a promo if the device has more than five accounts) In such cases, you can set up your detector rules to flag events based on a combination of GD device risk score and GD device verdicts. This option requires no historical event data or labels to get started, so it can be ready to use sooner than the ML-based detection options.
Use case B: Use GD device intelligence and an Amazon Fraud Detector ML model with the fraud detector rules – If you have a historical event dataset and are able to train an Amazon Fraud Detector ML model immediately, you can build on use case A by adding an Amazon Fraud Detector model to your rules-based detector. This way, your detector logic is evaluating device intelligence with rules and all other event data with a customized ML model. This allows you to solve for more complex fraud tactics where statistical methods are required to separate fraud from non-fraud.

Best results are often achieved when both of these scenarios work in tandem, because they can serve different use cases over time even after you have more historical data. With these methods, Amazon Fraud Detector makes it easy to transition to the ideal solution in a few steps.

In the following sections, we walk through the steps to get started using Amazon Fraud Detector with GD device intelligence data.

Integrate the GD mobile SDK and start collecting device intelligence data

Prior to using GrabDefence device intelligence within your application, you must first register as a GrabDefence client. You receive the following credentials from the GrabDefence team:

tenant_id – A unique client identifier that represents your organization
app_id – A unique application identifier that represents the application you’re integrating

Refer to the GrabDefence documentation for further guidance on how to integrate this SDK.

Create your event type in Amazon Fraud Detector

An event type defines the schema for the event you want to assess for fraud. When creating an event type in Amazon Fraud Detector, you define all the data elements you will have available at the time of the fraud evaluation, including the GD device intelligence risk profile data elements such as the unique device ID and various device verdicts, to Amazon Fraud Detector variables. You need to include event variables (such as IP, email, or billing address) that are unique to the type of event you’re evaluating for fraud, as well as GD device intelligence data. The following table shows examples of event variables, the GD device intelligence data, and the recommended Amazon Fraud Detector variable type to map each element to.

Event Variable Type	Event Variable (Not Exhaustive)	Amazon Fraud Detector Event Variable	Example
Event Metadata	EVENT_TIMESTAMP	EVENT_TIMESTAMP	2019-11-30T13:01:01Z
	EVENT_ID	EVENT_ID	test0299df10-e2db-11eb-96e2-f7dgje3d3k03
	ENTITY_ID	ENTITY_ID	123
	EVENT_LABEL	EVENT_LABEL	FRAUD or LEGIT
	LABEL_TIMESTAMP	LABEL_TIMESTAMP	2019-11-30T13:01:01Z
Event Variables	Email	EMAIL_ADDRESS	test@example.com
	IP	IP_ADDRESS	192.0.2.1
	Phone	PHONE_NUMBER	555-0123
GD Device Intelligence Verdicts	Verdict: IOS Jailbroken Device	CUSTOM: CATEGORICAL	GV_IOS_JAIL_BROKEN
	Verdict: Debugger Detected	CUSTOM: CATEGORICAL	GV_DEBUGGER_DETECTED
	Verdict: Event Token Signature Mismatch	CUSTOM: CATEGORICAL	GV_EVENT_TOKEN_SIGNATURE_MISMATCH
	Verdict: Server Challenge Mismatch	CUSTOM: CATEGORICAL	GV_SERVER_CHALLENGE_MISMATCH
GD Risk Scores	User account risk score	CUSTOM: NUMERICAL	0.9 etc

Build your detection logic in Amazon Fraud Detector

At this point, you need to decide whether you want to start with use case A or use case B. For use case A, you start building a rules-based detector. For use case B, you build an Amazon Fraud Detector model first and, once finished, add the model to your detector.

For instructions on building an Amazon Fraud Detector model and detector, refer to the Amazon Fraud Detector user guide.

The following screenshot shows sample detector rules on the Amazon Fraud Detector console.

Test your detector using Amazon Fraud Detector batch predictions

You can use a batch predictions job to test your detector against a set of events using either the Amazon Fraud Detector console or the CreateBatchPredictionJob API. You need to specify the detector version (created in the previous step) and provide the events via a CSV file (up to 50 MB large) stored in an Amazon Simple Storage Service (Amazon S3) bucket. The output file containing the original input data along with appended results of the detector’s predictions will be available in the same S3 bucket (unless you specify a different location).

For more information on running an Amazon Fraud Detector batch prediction, refer to Amazon Fraud Detector batch predictions documentation page.

Set up the supporting infrastructure

To perform real-time predictions using the detector you built, you must set up a Lambda function that performs the following actions:

Receives transaction data (via API Gateway) gathered from your mobile app. This includes data such as IP address, email address, shipping and billing info, and so on, that is unique to the transaction and use case.
Collects the risk profile from the GD API. This includes device intelligence data and risk signals from GD. You need to convert the GD verdicts to the appropriate Amazon Fraud Detector variable CUSTOM: CATEGORICAL types. For example, if the GD verdict list contains GV_IOS_JAIL_BROKEN, you need to set the Verdict: IOS Jailbroken Device variable to TRUE when sending to Amazon Fraud Detector (as detailed in the next section).
Sends the data to the detector using the GetEventPrediction API (see the next section).

Perform real-time predictions using the Amazon Fraud Detector GetEventPrediction API

Your Lambda function can call the Amazon Fraud Detector GetEventPrediction API to perform real-time predictions and obtain results synchronously. The GetEventPrediction API returns matched outcomes based on the rules you set up earlier. If you attached a model to your detector in Amazon Fraud Detector, the model score is also returned as part of the GetEventPrediction API response. You can find examples of GetEventPrediction requests on the aws-fraud-detector-samples GitHub repository.

You can configure your Lambda function accordingly to parse the response from this API, and return the appropriate action to the mobile application (via API Gateway).

Build and train your model

After you integrate the GD SDK and are generating predictions with Amazon Fraud Detector, your events are stored in Amazon Fraud Detector and you can use the UpdateEventLabel API to add fraud labels for confirmed fraud events. When your stored dataset has 10,000 events with device data and at least 400 labelled as fraud, you can start building a custom Amazon Fraud Detector model that learns from GD’s device intelligence data.

At this point, you’re ready to train the model. This takes a few steps on the Amazon Fraud Detector console, and model training typically takes around an hour but can be longer depending on the size of your training dataset.

On the Amazon Fraud Detector console, choose Create model.
Choose Transaction Fraud Insights as the model type.
Choose the event type you created earlier.
Choose the date range for your training dataset that encompasses the period where you’ve collected GD device intelligence data.
Add all the event type variables, including the GD device-specific elements, to your model’s input configuration.
Strat training the model.

After your model is trained, you can review performance metrics and then deploy it by changing its status to Active. To learn more about model scores and performance metrics, see Model scores and Training performance metrics. At this point, you can now add your model to your detector, add threshold rules to interpret the risk scores that the model outputs, and continue making predictions using the GetEventPrediction API.

Automate the solution

You can use AWS CloudFormation to automate the creation of your Amazon Fraud Detector event type and related resources. For more details, refer to managing resources using AWS CloudFormation.

Conclusion

Congrats! You have successfully built an Amazon Fraud Detector model that integrates GD device intelligence into your detector. The Amazon Fraud Detector ML model you trained has learned from multiple data sources, including your own historical data, GD’s device intelligence data, fraud patterns seen across Amazon, and additional third-party data (added automatically by Amazon Fraud Detector). You can deploy this solution on your mobile apps to detect and capture various types of mobile fraud.

Special thanks to everyone who contributed to this blog including, Abhishek Ravi, Tanay Bhargava, Eric Burris, Puneet Gambhir (GrabDefence), Brian Kim (GrabDefence), and Sing Kwan Ng (GrabDefence).

About the author

Marcel Pividal is a Sr. AI Services Solutions Architect in the World-Wide Specialist Organization. Marcel has more than 20 years of experience solving business problems through technology for Fintechs, Payment Providers, Pharma, and government agencies. His current areas of focus are Risk Management, Fraud Prevention, and Identity Verification.

Adriaan de Jonge is Partner Solutions Architect at AWS in Singapore. He is part of the AWS GSI team in the ASEAN geography. Adriaan is particularly interested in serverless, cloud-native development, and DevOps. In his spare time, he likes to bake cakes that are suitable for people with allergies.

Jianbo Liu is a Research Scientist with Amazon Fraud Detector.

Amazon Research Awards recipients announced

October 10, 2022

by Amazon AWS

Awardees represent 36 universities in eight countries. Recipients have access to more than 300 Amazon public datasets, along with AWS AI/ML services and tools.Read More

TRIPP explores the potential of VR–powered meditation

October 7, 2022

by Amazon AWS

Alexa Fund portfolio company’s science-led program could change how we approach mental wellness — and how we use VR.Read More

How Synamedia uses Amazon Rekognition Video to build advanced video search capabilities for long-form video

October 6, 2022

by Daniel Burke Amazon AWS

Synamedia is a leading video technology provider addressing the needs for premium video service providers and direct-to-consumer (D2C) with a comprehensive solution portfolio. Synamedia solutions spread across several pillars such as video networks, TV platforms, advertisement and monetization, and content protection and piracy disruption.

Synamedia partnered with AWS to use artificial intelligence (AI) to develop enhanced video search capabilities for long-form video. This is to assist their customers in searching for videos based on a description of scenes that aren’t described in the metadata of the assets. For example, searching for a video (even within a series) that contained a scene on a boat that isn’t significant enough to be mentioned in the metadata. This enables content discovery driven from real-world objects.

With Amazon Rekognition Video, Synamedia built an AI solution that was able to perform label detection in videos and in images using standard and custom models. This enabled scene-level detection of specific objects in long-form video, based on what is actually in the scene at the time. This new capability allows users to find specific occurrences within the long-form video, based only on a general description of what they’re looking for. This enables Synamedia to perform extremely fast when onboarding new content, which now takes a few hours to spin up and get results. The solution is simple to use and extensive by providing the ability to add further custom models for domain-specific images.

“Amazon Rekognition Video is a powerful service that is simple to use. It gave us ready-made access to best-in-class computer vision capabilities, which we could use to build and test innovative video search features in a matter of weeks.”

– Avi Fruchter, Software Engineering Fellow at Synamedia.

Using AI to index visual content

As both supply of video content and demand for greater video insights continue to grow, effective video search capabilities are becoming more important. Traditional video search, however, is typically limited to basic information such as the video title, or in some instances, to metadata attached as tags that describe the key themes or content of the video.

Most descriptive information needs to be added manually, but this becomes prohibitive as the quantity of video grows. As a result, traditional video search performance is often limited. This limitation is even more pronounced for long-form video content, for which scene-level metadata usually doesn’t exist, given how expensive and time-consuming it is to produce.

To address this limitation, Synamedia set out to develop an AI-powered video search solution using computer vision to automatically identify scene-level details in any given video, and make that information discoverable to users based on general descriptions of those scenes.

Using Amazon Rekognition to build a custom computer vision solution in just 2 weeks

To accomplish this goal, Synamedia’s Software Engineering Fellow, Avi Fruchter, turned to Amazon Rekognition, a fully managed video analysis service that helps accelerate the process of using computer vision models to detect relevant scene-level occurrences such as objects, activities, and even text and scenes.

Amazon Rekognition Video accelerates the development of computer vision solutions for video by automatically processing and tagging video content using computer vision models. These models are fully managed and maintained by Amazon Rekognition. It removes the undifferentiated heavy lifting of managing the necessary infrastructure, and also reduces the technical expertise required to build and deploy these models.

To get started, you simply choose which of Amazon Rekognition’s wide range of capabilities is relevant to your task, and call the relevant API. The results are then returned as an easy-to-manage JSON response for each job.

For example, Synamedia used the StartLabelDetection API to automatically generate a list of labels for objects detected in each video frame of their video library. From this simple API call, Amazon Rekognition returned the list of labels, the confidence score of each, and the relevant timestamps for each frame. This enabled Synamedia to immediately create an entirely new set of search metadata for each video in their test library. Users are then able to search for specific video content just by describing specific objects or scenery they’re interested in, and get results that not only match their query, but that also point them to the specific scene in the video that featured that content.

Other relevant Amazon Rekognition APIs for video analysis are StartFaceDetection, StartPersonTracking, and StartSegmentDetection—a feature that can identify the moment that scenes in a video change.

Amazon Rekognition works on both pre-recorded and live video. Pre-recorded video is read from Amazon Simple Storage Service (Amazon S3), and live video can be processed from Amazon Kinesis Video Streams.

Synamedia chose Amazon Rekogntion for its ability to rapidly expand their capabilities. Synamedia’s innovation team is dedicated solely to building new technical innovations in video and has strong technical expertise. However, even for them it’s not always possible to have deep domain expertise in all areas of video technology. Enter Amazon Rekogntion, which extended their capabilities in computer vision, enabling them to conceptualize a use case and quickly test its viability.

“It was extremely fast to onboard, and the results were extremely quick,” Avi Fruchter says. “We are not always domain experts in all areas of ML, and Amazon Rekognition gives us the ability to leverage our existing expertise into new types of enhanced use cases for our customers.”

Synamedia anticipates their solution will have broad benefits for a wide range of customers, including companies with large video libraries as well as the growing number of companies who need to monitor specific events in live video feeds, such as health and safety risks.

Summary

With Amazon Rekognition Video, Synamedia was able to build and test an advanced video search capability in a matter of weeks, without needing to hire or develop additional specialized computer vision expertise.

This new capability has enabled Synamedia to expand the impact of its innovation team and continue with its mission to drive new video innovation for its customers.

Learn more about how you can quickly build advanced computer vision solutions for video by visiting Amazon Rekognition Video or referring to Amazon Rekognition resources.

About the authors

Daniel Burke is the European lead for AI and ML in the Private Equity group at AWS. Daniel works directly with Private Equity funds and their portfolio companies, helping them accelerate their AI and ML adoption to improve innovation and increase enterprise value.

John Shaw is the North American lead for AI and ML in the Private Equity group at AWS. John works directly with Private Equity funds and their portfolio companies, helping them accelerate their AI and ML adoption to improve innovation and increase enterprise value.

Increase ML model performance and reduce training time using Amazon SageMaker built-in algorithms with pre-trained models

October 6, 2022

by Vedant Jain Amazon AWS

Model training forms the core of any machine learning (ML) project, and having a trained ML model is essential to adding intelligence to a modern application. A performant model is the output of a rigorous and diligent data science methodology. Not implementing a proper model training process can lead to high infrastructure and personnel costs because it underlines the experimental phase of the ML process and by nature tends to be highly iterative.

Generally speaking, training a model from scratch is time-consuming and compute intensive. When the training data is small, we can’t expect to train a very performant model. A better alternative is to fine-tune a pretrained model on the target dataset. For certain use cases, Amazon SageMaker provides high-quality pretrained models that were trained on very large datasets. Fine-tuning these models takes a fraction of the training time compared to training a model from scratch.

To validate this assertion, we ran a study using built-in algorithms with pretrained models. We also compared two types of pretrained models within Amazon SageMaker Studio, Type 1 (legacy) and Type 2 (latest), against a model trained from scratch using Defect Detection Network (DDN) with regards to training time and infrastructure cost. To demonstrate the training process, we used the default detection dataset from the post Visual inspection automation using Amazon SageMaker JumpStart. This post showcases the results of the study. We also provide a Studio notebook, which you can modify to run the experiments using your own dataset and an algorithm or model of your choosing.

Model training in Studio

SageMaker is a fully managed ML service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment.

There are many ways with which you can train ML models using SageMaker, such as using Amazon SageMaker Debugger, Spark MLLib, or using custom Python code with TensorFlow, PyTorch, or Apache MXNet. You can also bring your own custom algorithm or choose an algorithm from AWS Marketplace.

Furthermore, SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and ML practitioners get started on training and deploying ML models quickly.

You can use built-in algorithms for either classification or regression problems, or for a variety of unsupervised learning tasks. Other built-in algorithms include text analysis and image processing. You can train a model from scratch using a built-in algorithm for a specific use case. For a full list of available built-in algorithms, see Common Information About Built-in Algorithms.

Some built-in algorithms also include pre-trained models for popular problem types that use the SageMaker SDK as well as Studio. These pre-trained models can greatly reduce the training time as well as infrastructure cost for common use cases such as semantic segmentation, object detection, text summarization, and question answering. For a complete list of pre-trained models, see Models.

For choosing the best model, SageMaker automatic model tuning, also known as hyperparameter tuning or hyperparameter optimization (HPO), can be very useful because it finds the best version of a model by running a slew of training jobs on your dataset using the algorithm and hyperparameters that you specify. Depending on the number of hyperparameters and the size of the search space, finding the best model can require thousands or even tens of thousands of training runs. Automatic model tuning provides a built-in HPO algorithm that removes the undifferentiated heavy lifting required to build your own HPO algorithm. Automatic model tuning provides the option of parallelizing model runs in order to reduce the time and cost of finding the best fit.

After the automatic model tuning has completed multiple runs for a set of hyperparameters, it chooses the hyperparameter values that result in the model with the best performance, as measured by the loss function specific to the model.

Training and validation loss is just one of the metrics needed to pick the best model for the use case. With so many options, it’s not always easy to make the right choice, and picking the best model boils down to the training time, cost of infrastructure, complexity, and quality of the resulting model, among other factors. There are other extraneous costs such as platform and personnel costs that we don’t take into account for this study.

In the subsequent sections, we discuss the study design and the results.

Dataset

We use the NEU-CLS dataset and a detector on the NEU-DET dataset. This dataset contains 1,800 images and 4,189 bounding boxes in total. The type of defects in our dataset are as follows:

Crazing (class: Cr, label: 0)
Inclusion (class: In, label: 1)
Pitted surface (class: PS, label: 2)
Patches (class: Pa, label: 3)
Rolled-in scale (class: RS, label: 4)
Scratches (class: Sc, label: 5)

For more details about the dataset, refer to Visual inspection automation using Amazon SageMaker JumpStart.

Models

We introduced the Defect Detection Network in the post Visual inspection automation using Amazon SageMaker JumpStart. We trained this model from scratch with the default hyperparameters, so we could have a benchmark to evaluate the rest of the models.

For object detection use cases, SageMaker provides the following set of built-in object models:

Type 1 (legacy) – Uses a built-in legacy object detection algorithm and uses the Single Shot multibox Detector (SSD) model with either a VGG or ResNet backbone, and was pre-trained on the ImageNet dataset.
Type 2 (latest) – Provides nine pre-trained object detection models, including eight SSD models and one FasterRCNN model. These models use VGG, ResNet, or MobileNet as the backbone, and were pre-trained on COCO, VOC, or FPN datasets.

Aside from training a model from scratch, we used these models to evaluate four approaches that typically reflect an ML model training process. The output of each approach is a trained ML model. In cases 1 and 3, a set of fixed hyperparameters are provided to train a single model, whereas in cases 2 and 4, SageMaker produces the best model and the set of hyperparameters that led to the best fit.

Type 1 (legacy) model – We use the model with a ResNet backbone, which is pre-trained on ImageNet with default hyperparameters and no optimizer.
Fine-tune Type 1 (legacy) with HPO – Now we run HPO to find better hyperparameters that lead to a better model. For a list of all parameters you can fine-tune, refer to Tune an Object Detection Model. In this notebook, we only fine-tune learning rate, momentum, and weight decay. We use automatic model tuning to run HPO. We need to provide hyperparameter ranges for learning rate, momentum, and weight decay. Automatic model tuning will monitor the log and parse the objective metrics. For object detection, we use Mean Average Precision (mAP) on the validation dataset as our metric.
Fine-tune Type 2 (latest) model – For the Type 2 (latest) object detection model, we follow the instructions in Fine-tune a Model and Deploy to a SageMaker Endpoint and use standard SageMaker APIs. You can find all fine-tunable Type 2 (latest) object detection models in the Built-in Algorithms with pre-trained Model table and set FineTunable?=True. Currently, there are nine fine-tunable object detection models. We use the one with the VGG backend and pretrained on VOC dataset. We fine-tune using a set of static hyperparameters.
Fine-tune Type 2 (latest) model with HPO – We provide a range for the ADAM learning rate; the rest of the hyperparameters stay default. Also, note that the Type 2 (latest) model training reports Val_CrossEntropy loss and Val_SmoothL1 loss instead of mAP on the validation dataset. Because we can only specify one evaluation metric for automatic model tuning, we choose to minimize Val_CrossEntropy.

For details on the hyperparameters, you can go through the Studio notebook.

Metrics

Next, we compare the results from the approaches based on important metrics and the infrastructure cost:

Loss function difference across models – All the different algorithms define the same loss function for object detection task: cross-entropy and smooth L1 loss. However, we use them differently:
- The Type 1 (legacy) object detection algorithm has defined mAP on the validation data, and we use it as the metric to find a training job that maximizes mAP.
- The Type 2 (latest) object detection algorithm, however, doesn’t define mAP. Instead, it defines Val_SmoothL1 loss and Val_CrossEntropy loss on the validation data. During model training with HPO, we need to specify one metric for automatic model tuning to monitor and parse. Therefore, we use Val_CrossEntropy loss as the metric and find the training job that minimizes it.
Validation metric (mAP) – We use the mAP on the validation dataset as our metric, where average precision is the average of precision and recall. mAP is the standard evaluation metric used in the COCO challenge for object detection tasks. For more information about the applicability of mAP for object detection, refer to mAP (mean Average Precision) for Object Detection. Because there is a difference in loss function between Type 1 and Type 2 models, we manually calculate the mAP for each type of model on the test dataset. We accomplish this by deploying the models behind a SageMaker endpoint and calling the model endpoint to score on the subset of the dataset. The results are then compared against the ground truth to calculate the mAP for each model type.
Training Instances Runtime cost – For simplicity, we only report the infrastructure cost incurred for each of the four approaches highlighted in the previous section. The cost is reported in dollars and calculated based on the runtime of the underlying Amazon Elastic Compute Cloud (Amazon EC2) instances.

Notebook

The Studio notebook is available on GitHub.

Results

The steel surface dataset has a total of 1,800 images in six categories. As discussed in the previous section, because there is a difference in the loss function that Type 1 (legacy) and Type 2 (latest) models maximize to find the best model, we first perform a train/test split on the dataset. In the final phase of the study, we run inference on the test dataset, so that we can compare across the four approaches using the same metric (mAP).

The test set contains 20% of the original dataset, which we randomly allocate from the full dataset. The remaining 80% is used for the model training phase, which requires us to define the training as well as the validation dataset. Therefore, for the training phase, we do a further 80/20 split on the data, where 80% of the training data is used for training and 20% for validation. See the following table.

Data	Number of Samples	Percentage of Original Dataset
Full	1,800	100
Train	1,152	64
Validation	288	16
Test	360	20

The output of each of the four approaches was a trained ML model. We plot the results from each of the four approaches alongside the bounding boxes from ground truth as well as the DDN model. The following plot also shows the confidence score for the class prediction.

A confidence score is provided as an evaluation standard. This confidence score shows the probability of the object of interest being detected correctly by the algorithm and is given as a percentage. The scores are taken on the mAP at different IoU (Intersection over Union) thresholds.

For the purpose of generating the mAP score against the test dataset, we deployed each model behind its own SageMaker real-time endpoint. Each inferencing test produced a mAP score.

A larger mAP score implies a higher accuracy of the model test results. Clearly, the Type 2 (latest) models outperforms the Type 1 (legacy) models in regards to accuracy, with or without using HPO. Type 2 with HPO has a slighter edge (mAP 0.375) over one without HPO (mAP 0.371).

We also measured the cost of training for each of the four approaches. We used the P3 instance types, specifically the ml.p3.2xlarge instances for each of the approaches. Each ml.p3.2xlarge instance costs $3.06/hour. Both the inference test mAP score and the cost of training are summarized in the following chart for comparison.

For simplicity, we did a cost comparison on the runtime of the training instances only.

For a more granular estimate of the total cost incurred, including the cost of Studio notebooks as well as the real-time endpoints used for inferencing, refer to the AWS Pricing Calculator for SageMaker.

The results indicate considerable gains in accuracy when moving from the Type 1 (legacy) to Type 2 (latest) model. The mAP score went up from 0.067 to 0.371 without using HPO and 0.226 to 0.375 with HPO respectively. The Type 2 model also took longer to train with the same instance type, implying that the accuracy gains also meant higher infrastructure cost. However, all mentioned approaches outperformed the DDN model (introduced in Visual inspection automation using Amazon SageMaker JumpStart) on all metrics. Training the Type 1 (legacy) model took 34 minutes, the Type 2 (latest) model took 1 hour, and the DDN model took over 8 hours. This indicates that fine-tuning a pre-trained model is much more efficient than training a model from scratch.

We also found that HPO (SageMaker automatic model tuning) is extremely effective, especially for models with large hyperparameter search spaces with 4x improvement in mAP score for Type 1 (legacy) model. We noted that we yielded much better model accuracy results when fine-tuning on three hyperparameters (learning rate, momentum, and weight decay) for the Type 1 (legacy) models as opposed to only one hyperparameter (ADAM learning rate) for the Type 2 (latest) model. This is because there is a relatively larger search space and therefore more room for improvement for the Type 1 (legacy) model. However, we need to trade off model performance with infrastructure cost and training time when running HPO.

Conclusion

In this post, we walked through the many ML model training options available with SageMaker and focused specifically on SageMaker built-in algorithms and pre-trained models. We introduced Type 1 (legacy) and Type 2 (latest) models. The built-in Sagemaker object detection models discussed in this post were pre-trained on large-scale datasets—the ImageNet dataset includes 14,197,122 images for 21,841 categories, and the PASCAL VOC dataset includes 11,530 images for 20 categories. The pre-trained models have learned rich and diverse low-level features, and can efficiently transfer knowledge to fine-tuned models and focus on learning high-level semantic features for the target dataset. You can find all built-in algorithms and fine-tunable pre-trained models at Built-in Algorithms with pre-trained Model Table and choose one for your use case. The use cases span from text summarization and question answering to computer vision and regression or classification.

In the beginning, we made an assertion that fine-tuning a SageMaker pre-trained model will take a fraction of training time that training a model from scratch. We trained a DNN model from scratch and introduced two types of SageMaker Built-in algorithms with pretrained models: Type (legacy) and Type 2 (latest). We further showcased four approaches, two of which used SageMaker automated model tuning, and finally arrived at the most performant model. When considering both training time as well as runtime cost, all SageMaker built-in algorithms outperformed the DDN model, thereby validating our assertion.

Although both Type 1 (legacy) and Type 2 (latest) outperformed training the DDN model from scratch, visual and numerical comparison confirmed that the Type 2 (latest) model and Type 2 (latest) model with HPO outperforms Type 1 (legacy) models. HPO had a big impact on accuracy for Type 1 models; however, it saw modest gains using HPO for Type 2 models, due to a constricted hyperparameter space.

In summary, for certain use cases, fine-tuning a pretrained model is both more efficient and more performant. We suggest taking advantage of the pre-trained Sagemaker built-in pretrained models and fine-tune on your target datasets. To get started, you need a Studio environment. For more information, refer to the Studio Development Guide and make sure to enable SageMaker projects and JumpStart. When your Studio setup is complete, navigate to the Studio Launcher to find the full list of JumpStart solutions and models. To recreate or modify the experiment in this post, choose the “Product Defect Detection” solution, which comes prepackaged with the notebook used to experiment, as shown in the following video. After you launch the solution, you can access the mentioned work in the notebook titled visual_object_detection.ipynb.

About the authors

Vedant Jain is a Sr. AI/ML Specialist Solutions Architect, helping customers derive value out of the Machine Learning ecosystem at AWS. Prior to joining AWS, Vedant has held ML/Data Science Specialty positions at various companies such as Databricks, Hortonworks (now Cloudera) & JP Morgan Chase. Outside of his work, Vedant is passionate about making music, using Science to lead a meaningful life & exploring delicious vegetarian cuisine from around the world.

Tao Sun is an Applied Scientist in Amazon Search. He obtained his Ph.D. in Computer Science from University of Massachusetts, Amherst. His research interests lie in deep reinforcement learning and probabilistic modeling. In the past, Tao worked for AWS Sagemaker Reinforcement Learning team and contributed to RL research and applications. Tao is now working on Page Template Optimization at Amazon Search.

InformedIQ automates verifications for Origence’s auto lending using machine learning

October 6, 2022

by Robert Berger Amazon AWS

This post was co-written with Robert Berger and Adine Deford from InformedIQ.

InformedIQ is the leader in AI-based software used by the nation’s largest financial institutions to automate loan processing verifications and consumer credit applications in real time per the lenders’ policies. They improve regulatory compliance, reduce cost, and increase accuracy by decreasing human error rates that are caused by the repetitive nature of tasks. Informed partnered with Origence (the nation’s leading lending technology solutions and services provider for 1,130 credit unions serving over 64 million members) to power Origence’s document process automation functionality for indirect lending to automatically identify documents and validate financing policies, creating a better credit union and dealer experience for their network of over 15,000 dealers. To date, $110 billion in auto loans have originated with Informed’s automation, which is 8% of all US auto loans. Six of the top 10 consumer lenders trust Informed’s technology.

In this post, we learn about the challenges faced and how machine learning (ML) solved the problems.

Problem statement

Manual loan verification document processing is time-consuming. The verification includes consumer stipulations like proof of residence, identity, insurance, and income. It can be prone to human error due to the repetitive nature of tasks.

With ML and automation, Informed can provide a software solution that is available 24/7, over holidays and weekends. The solution works accurately without conscious or unconscious bias to calculate and clear stipulations in under 30 seconds, vs. an average of 7 days for loan verifications, with 99% accuracy.

Solution overview

Informed uses a wide range of AWS offerings and capabilities, including Amazon SageMaker and Amazon Textract in their ML stack to power Origence’s document process automation functionality. The solution automatically extracts data and classifies documents (for example, driver’s license, paystub, W2 form, or bank statement), providing the required fields for the consumer verifications used to determine if the lender will grant the loan. Through accurate income calculations and validation of applicant data, loan documents, and documented classification, loans are processed faster and more accurately, with reduced human errors and fraud risk, and added operational efficiency. This helps in creating a better consumer, credit union, and dealer experience.

To classify and extract information needed to validate information in accordance with a set of configurable funding rules, Informed uses a series of proprietary rules and heuristics, text-based neural networks, and image-based deep neural networks, including Amazon Textract OCR via the DetectDocumentText API and other statistical models. The Informed API model can be broken down into five functional steps, as shown in the following diagram: image processing, classification, image feature computations, extractions, and stipulation verification rules, before determining the decision.

Given a sequence of pages for various document types (bank statement, driver’s license, paystub, SSI award letter, and so on), the image processing step performs the necessary image enhancements for each page and invokes multiple APIs, including Amazon Textract OCR for image to text conversion. The rest of the processing steps use the OCR text obtained from image processing and the image for each page.

Main advantages

Informed provides solutions to the auto lending industry that reduce manual processes, support compliance and quality, mitigate risk, and deliver significant cost savings to their customers. Let’s dive into two main advantages of the solution.

Automation at scale with efficiency

The adoption of AWS Cloud technologies and capabilities has helped Informed address a wider range of document types and onboard new partners. Informed has developed integrated, AI/ML-enabled solutions, and continuously strives for innovation to better serve clients.

Almost the entirety of the Informed SaaS service is hosted and enabled by AWS services. Informed is able to offload the undifferentiated heavy lifting for scalable infrastructure and focus on their business objectives. Their architecture includes load balancers, Amazon API Gateway, Amazon Elastic Container Service (Amazon ECS) containers, serverless AWS Lambda, Amazon DynamoDB, and Amazon Relational Database Service (Amazon RDS), in addition to ML technologies like Amazon Textract and SageMaker.

Reducing cost in document extraction

Informed uses new features from Amazon Textract to improve the accuracy of data extraction from documents such as bank statements and paystubs. Amazon Textract is an AI/ML service that automatically extracts text, handwriting, and other forms of metadata from scanned documents, forms, and tables in ways that make further ML processing more efficient and accurate. Informed uses AWS Textract OCR and Analyze Document APIs for both tables and forms as part of the verification process. Informed’s artificial intelligence modeling engine performs complex calculations, ensuring accuracy, identifying omissions, and combating fraud. With AWS, they continue to advance the accuracy and speed of the solution, helping lenders become more efficient by lowering loan processing costs and reducing time to process and fund. With a 99% accuracy rate for field prediction, dealers and credit unions can now focus less on collecting and validating data and more on developing strong customer relationships.

“Partnering with Informed.IQ to integrate their leading AI-based technology allows us to advance our lending systems’ capabilities and performance, further streamlining the overall loan process for our credit unions and their members”

– Brian Hendricks, Chief Product Officer at Origence.

Conclusion

Informed is constantly improving the accuracy, efficiency, and breadth of their automated loan document verifications. This solution can benefit any lending document verification process like personal and student loans, HELOCs, and powersports. The adoption of AWS Cloud technologies and capabilities has helped Informed address the growing complexity of the lending process and improve the dealer and customer experience. With AWS, the company continues to add enhancements that help lenders become more efficient, lower loan processing costs, and provide serverless computing.

Now that you have learned about how ML and automation can solve the loan document verification process, you can get started using Amazon Textract. You can also try out intelligent document processing workshops. Visit Automated data processing from documents to learn more about reference architectures, code samples, industry use cases, blog posts, and more.

About the authors

Robert Berger is the Chief Architect at InformedIQ. He is leading the transformation of the InformedIQ SaaS into a full Serverless Microservice architecture leveraging AWS Cloud, DevOps and Data Oriented Programming. Principal or founder in several other start-ups including InterNex, MetroFi, UltraDevices, Runa, Mist Systems and Omnyway.

Adine Deford is the VP of Marketing at Informed.IQ. She has more than 25 years of technology marketing experience serving industry leaders, world class marketing agencies and technology start-ups.

Jessica Oliveira is an Account Manager at AWS who provides guidance and support to SMB customers in Northern California. She is passionate about building strategic collaborations to help ensure her customer’s success. Outside of work, she enjoys traveling, learning about different languages and cultures, and spending time with her family.

Malini Chatterjee is a Senior Solutions Architect at AWS. She provides guidance to AWS customers on their workloads across a variety of AWS technologies. She brings a breadth of expertise in Data Analytics and Machine Learning. Prior to joining AWS she was architecting data solutions in financial industries. She is very interested in Amazon Future Engineer program enabling middle-school, high-school kids see the art of the possible in STEM. She is very passionate about semi-classical dancing and performs in community events. She loves traveling and spending time with her family.

The science behind the new “Alexa, what should I watch?” Fire TV experience

October 6, 2022

by Amazon AWS

The phrase launches a feature built to help customers navigate an increasingly complex and diverse world of content.Read More

Prevent account takeover at login with the new Account Takeover Insights model in Amazon Fraud Detector

October 5, 2022

by Marcel Pividal Amazon AWS

Digital is the new normal, and there’s no going back. Every year, consumers visit, on average, 191 websites or services requiring a user name and password, and the digital footprint is expected to grow exponentially. So much exposure naturally brings added risks like account takeover (ATO).

Each year, bad actors compromise billions of accounts through stolen credentials, phishing, social engineering, and multiple forms of ATO. To put it into perspective: account takeover fraud increased by 90% to an estimated $11.4 billion in 2021 compared with 2020. Beyond the financial impact, ATOs damage the customer experience, threaten brand loyalty and reputation, and strain fraud teams as they manage chargebacks and customer claims.

Many companies, even those with sophisticated fraud teams, use rules-based solutions to detect compromised accounts because they’re simple to create. To bolster their defenses and reduce friction for legitimate users, businesses are increasingly investing in AI and machine learning (ML) to detect account takeovers.

AWS can help you improve your fraud mitigation with solutions like Amazon Fraud Detector. This fully managed AI service allows you to identify potentially fraudulent online activities by enabling you to train custom ML fraud detection models without ML expertise.

This post discusses how to create a real-time detector endpoint using the new Account Takeover Insights (ATI) model in Amazon Fraud Detector.

Overview of solution

Amazon Fraud Detector relies on specific models with tailored algorithms, enrichments, and feature transformations to detect fraudulent events across multiple use cases. The newly launched ATI model is a low-latency fraud detection ML model designed to detect potentially compromised accounts and ATO fraud. The ATI model detects up to four times more ATO fraud than traditional rules-based account takeover solutions while minimizing the level of friction for legitimate users.

The ATI model is trained using a dataset containing your business’s historical login events. Event labels are optional for model training because the ATI model uses an innovative approach to unsupervised learning. The model differentiates events generated by the actual account owner (legit events) from those generated by bad actors (anomalous events).

Amazon Fraud Detector derives the user’s past behavior by continuously aggregating the data provided. Examples of user behavior include the number of times the user signed in from a specific IP address. With these additional enrichments and aggregates, Amazon Fraud Detector can generate strong model performance from a small set of inputs from your login events.

For a real-time prediction, you call the GetEventPrediction API after a user presents valid login credentials to quantify the risk of ATO. In response, you receive a model score between 0–1000, where 0 shows low fraud risk and 1000 shows high fraud risk, and an outcome based on a set of business rules you define. You can then take the appropriate action on your end: approve the login, deny the login, or challenge the user by enforcing an additional identity verification.

You can also use the ATI model to asynchronously evaluate account logins and take action based on the outcome, such as adding the account to an investigation queue so a human reviewer can determine if further action should be taken.

The following steps outline the process of training an ATI model and publishing a detector endpoint to generate fraud predictions:

Prepare and validate the data.
Define the entity, event and event variables, and event label (optional).
Upload event data.
Initiate model training.
Evaluate the model.
Create a detector endpoint and define business rules.
Get real-time predictions.

Prerequisites

Before getting started, complete the following prerequisite steps:

Have an AWS account.
Create an Amazon Simple Storage Service (Amazon S3) bucket and upload the training dataset to the bucket. You can find the synthetic training dataset in our GitHub repo.
Copy the dataset location under S3 URI. The S3 URI is the Amazon S3 location of your dataset. You use it later to import the model variables.

Prepare and validate the data

Amazon Fraud Detector requires that you provide your user account login data in a CSV file encoded in the UTF-8 format. For the ATI, you must provide certain event metadata and event variables in the header line of your CSV file.

The required event metadata is as follows:

EVENT_ID – A unique identifier for the login event.
ENTITY_TYPE – The entity that performs the login event, such as a merchant or a customer.
ENTITY_ID – An identifier for the entity performing the login event.
EVENT_TIMESTAMP – The timestamp when the login event occurred. The timestamp format must be in ISO 8601 standard in UTC.
EVENT_LABEL (optional) – A label that classifies the event as fraudulent or legitimate. You can use any labels, such as fraud, legit, 1, or 0.

Event metadata must be in uppercase letters. Labels aren’t required for login events. However, we recommend including EVENT_LABEL metadata and providing labels for your login events if available. If you provide labels, Amazon Fraud Detector uses them to automatically calculate an Account Takeover Discovery Rate and display it in the model performance metrics.

The ATI model has both required and optional variables. Event variable names must be in lowercase letters.

The following table summarizes the mandatory variables.

Category	Variable type	Description
IP address	`IP_ADDRESS`	The IP address used in the login event
Browser and device	`USERAGENT`	The browser, device, and OS used in the login event
Valid credentials	`VALIDCRED`	Indicates if the credentials that were used for login are valid

The following table summarizes the optional variables.

Category	Type	Description
Browser and device	`FINGERPRINT`	The unique identifier for a browser or device fingerprint
Session ID	`SESSION_ID`	The identifier for an authentication session
Label	`EVENT_LABEL`	A label that classifies the event as fraudulent or legitimate (such as `fraud`, `legit`, `1`, or `0`)
Timestamp	`LABEL_TIMESTAMP`	The timestamp when the label was last updated; this is required if `EVENT_LABEL` is provided

You can provide additional variables. However, Amazon Fraud Detector won’t include these variables for training an ATI model.

Dataset preparation

As you start to prepare your login data, you must meet the following requirements:

Provide at least 1,500 entities (individual user accounts), each with at least two associated login events
Your dataset must cover at least 30 days of login events

The following configurations are optional:

Your dataset can include examples of unsuccessful login events
You can optionally label these unsuccessful logins as fraudulent or legitimate
You can prepare historical data with login events spanning more than 6 months and include 100,000 entities

We provide a sample dataset for testing purposes that you can use to get started.

Data validation

Before creating your ATI model, Amazon Fraud Detector checks if the metadata and variables you included in your dataset for training the model meet the size and format requirements. For more information, see Dataset validation. If the dataset doesn’t pass validation, a model isn’t created. For details on common dataset errors, see Common event dataset errors.

Define the entity, event type, and event variables

In this section, we walk through the steps to create an entity, event type, and event variables. Optionally, you can also define event labels.

Define the entity

The entity defines who is performing the event. To create an entity, complete the following steps:

On the Amazon Fraud Detector console, in the navigation pane, choose Entities.
Choose Create.
Enter an entity name and optional description.
Choose Create entity.

Define the event and event variables

An event is a business activity evaluated for fraud risk; this event is performed by the entity we just created. The event type defines the structure for an event sent to Amazon Fraud Detector, including variables of the event, the entity performing the event, and, if available, the labels that classify the event.

To create an event, complete the following steps:

On the Amazon Fraud Detector console, in the navigation pane, choose Events.
Choose Create.
For Name, enter a name for your event type.
For Entity, choose the entity created in the previous step.

Define the event variables

For event variables, complete the following steps:

To define this event’s variables, choose Select variables from a training dataset.
For the AWS Identity and Access Management role, choose Create IAM role.

In the Create IAM role section, enter the specific bucket name where you uploaded your training data.
The name of the S3 bucket must be the name where you uploaded your dataset. Otherwise, you get an access denied exception error.
Choose Create role.

For Data location, enter the path to your training data, the path is the S3 URI you copied during the prerequisite steps, and choose Upload.

Amazon Fraud Detector extracts the headers from your training dataset and creates a variable for each header. Make sure to assign the variable to the correct variable type. As part of the model training process, Amazon Fraud Detector uses the variable type associated with the variable to perform variable enrichment and feature engineering. For more details about variable types, see Variable types.

Define event labels (optional)

Labels are used to categorize individual events as either fraud or legitimate. Event labels are optional for model training because the ATI model uses an innovative approach to unsupervised learning. The model differentiates events generated by the actual account owner (legit events) from those generated by abusive actors (anomalous events). We recommend you include EVENT_LABEL metadata and provide labels for your login events if available. If you provide labels, Amazon Fraud Detector uses them to automatically calculate an Account Takeover Discovery Rate and display it in the model performance metrics.

To create an event, complete the following steps:

Define two labels (for this post, 1 and 0).
Choose Create event type.

Upload event data

In this session, we walk through the steps to upload events data to the service for model training.

ATI models are trained on a dataset stored internally in Amazon Fraud Detector. By storing event data in Amazon Fraud Detector, you can train models that use auto-computed variables to improve performance, simplify model retraining, and update fraud labels to close the machine learning feedback loop. See Stored events for more information on storing your event dataset with Amazon Fraud Detector.

After you define your event, navigate to the Stored events tab. On the Stored events tab, you can see information about your dataset, such as the number of events stored and the total size of the dataset in MB. Because you just created this event type, there are no stored events yet. On this page, you can turn event ingestion on or off. When event ingestion is on, you can upload historical event data to Amazon Fraud Detector and automatically store event data from predictions in real time.

The easiest way to store historical data is by uploading a CSV file and importing the events. Alternatively, you can stream the data into Amazon Fraud Detector using the SendEvent API (see our GitHub repository for sample notebooks). To import the event from a CSV file, complete the following steps:

Under Import events data, choose New import.
You likely need to create a new IAM role. The import events feature requires both read and write access to Amazon S3.

Create a new IAM role and provide the S3 buckets for input and output files.
The IAM role you create grants Amazon Fraud Detector access to these buckets to read input files and store output files. If you don’t plan to store output files in a separate bucket, enter the same bucket name for both.
Choose Create role.

Enter the location of the CSV file that contains your event data. This should be the S3 URI you copied earlier.
Chose Start to start importing the events.

The import time varies based on the number of events you’re importing. For a dataset with 20,000 events, the process takes around 12 minutes, and after you refresh the page, the status changes to Completed. If the status changes to Error, choose the job name to show why the import failed.

Initiate model training

After successfully importing the events, you have all the pieces to initiate model training. To train a model, complete the following steps:

On the Amazon Fraud Detector console, in the navigation pane, choose Models.
Choose Add model and select Create model.
For Model name, enter the desired name for your model
For Model type, select Takeover Account Insights.
For Event type, choose the event type you created earlier.

Under Historical event data, you can specify the date range of events to train the model if needed.
Choose Next.

For this post, you configure training by identifying the variables used as inputs to the model.
After evaluating the variables, choose Next.

It’s a best practice to include all the available variables, even if you’re unsure about their value to the model. After the model is trained, Amazon Fraud Detector provides a ranked list of each variable’s impact on the model performance, so you can know whether to include that variable in future model training. If labels are provided, Amazon Fraud Detector uses them to evaluate and display model performance in terms of the model’s discovery rate.

If labels aren’t provided, Amazon Fraud Detector uses negative sampling to provide examples or analogous login attempts that help the model distinguish between legitimate and fraudulent activities. This produces precise risk scores that improve the model’s ability to capture incorrectly flagged legitimate activities.

After reviewing the model configured in the first two steps, choose Create and train the model.

You can see the model in training status in the console page. Creating and training the model takes approximately 45 minutes to complete. When the model has stopped training, you can check model performance by choosing the model version.

Evaluate model performance and deploy the model

In this session, we walk through the steps to review and evaluate the model performance.

Amazon Fraud Detector validates model performance using 15% of your data that wasn’t used to train the model and provides performance metrics. You need to consider these metrics and your business objectives to define a threshold that aligns with your business model. For further details on the metrics and how to determine thresholds, see Model performance metrics.

ATI is an anomaly detection model rather than a classification model; therefore, the evaluation metrics differ from classification models. When your ATI model has finished training, you can see the Anomaly Separation Index (ASI), a holistic measure of the model’s ability to identify high-risk anomalous logins. An ASI of 75% or more is considered good, 90% or more is considered high, and below 75% is considered poor.

To assist in choosing the right balance, Amazon Fraud Detector provides the following metrics to evaluate ATI model performance:

Anomaly Separation Index (ASI) – Summarizes the overall ability of the model to separate anomalous activities from the expected behavior of users. A model with no separability power will have the lowest possible ASI score of 0.5. In contrast, the model with a high separability power will have the highest possible ASI score of 1.0.
Challenge Rate (CR) – The score threshold indicates the percentage of login events the model would recommend challenging in the form of a one-time password, multi-factor authentication, identify verification, investigation, and so on.
Anomaly Discovery Rate (ADR) – Quantifies the percentage of anomalies the model can detect at the selected score threshold. A lower score threshold increases the percentage of anomalies captured by the model. Still, it would also require challenging a more significant percentage of login events, leading to higher customer friction.
ATO Discovery Rate (ATODR) – Quantifies the percentage of account compromise events that the model can detect at the selected score threshold. This metric is only available if 50 or more entities with at least one labeled ATO event are present in the ingested dataset.

In the following example, we have an ASI of 0.96 (high), which indicates a high ability to separate anomalous activities from the normal behavior of users. By writing a rule using a model score threshold of 500, you challenge or create friction on 6% of all login activities catching 96% of anomalous activities.

Another important metric is the model variable importance. Variable importance gives you an understanding of how the different variables relate to the model performance. You can have two types of variables: raw and aggregate variables. Raw variables are the ones that were defined based on the dataset, whereas aggregate variables are a combination of multiple variables that are enriched and have an aggregated importance value.

For more information about variable importance, see Model variable importance.

A variable (raw or aggregate) with a much higher number relative to the rest could indicate that the model might be overfitting. In contrast, variables with relatively lowest numbers could just be noise.

After reviewing the model performance and deciding what model score thresholds align with your business model, you can deploy the model version. For that, on the Actions menu, choose Deploy model version. With the model deployed, we create a detector endpoint and perform real-time prediction.

Create a detector endpoint and define business rules

Amazon Fraud Detector uses detector endpoints to generate fraud prediction. A detector contains detection logic, such as trained models and business rules, for a specific event you want to evaluate for fraud. Detection logic uses rules to tell Amazon Fraud Detector how to interpret the data associated with the model.

To create a detector, complete the following steps:

On the Amazon Fraud Detector console, in the navigation pane, choose Detectors.
Choose Create detector.
For Detector name, enter a name.
Optionally, describe your detector.
For Event type, choose the same event type as the model created earlier.
Choose Next.

On the Add model (optional) page, choose Add model.

To add a model, choose the model you trained and published during the model training steps and choose the active version.
Choose Add model.

As part of the next step, you create the business rules that define an outcome. A rule is a condition that tells Amazon Fraud Detector how to interpret variable values during a fraud prediction. A rule consists of one or more variables, a logic expression, and one or more outcomes. An outcome is the result of a fraud prediction and is returned if the rule matches during an evaluation.

Define decline_rule as $<your_model_name_insightscore >= 950 with outcome deny_login.
Define friction_rule as $ your_model_name _insightscore >= 855 and $ your_model_name_insightscore >= 950 with outcome challenge_login.
Define approve_rule as $account_takeover_model_insightscore < 855 with outcome approve_login.

Outcomes are strings returned in the GetEventPrediction API response. You can use outcomes to trigger events by calling applications and downstream systems or to simply identify who is likely to be fraud or legitimate.

On the Add Rules page, choose Next after you finish adding all your rules.

In the Configure rule execution section, choose the mode for your rules engine.
The Amazon Fraud Detector rules engine has two modes: first matched or all matched. First matched mode is for sequential rule runs, returning the outcome for the first condition met. The other mode is all matched, which evaluates all rules and returns outcomes from all the matching rules. In this example, we use the first matched mode for our detector.

After this process, you’re ready to create your detector and run some tests.

To run a test, go to your newly created detector and choose the detector version you want to use.
Provide the variable values as requested and choose Run test.

As a result of the test, you receive the risk score and the outcome based on your business rules.

You can also search past predictions by going to the left panel and choosing Search past predictions. The prediction is based on each variable’s contribution to the overall likelihood of a fraudulent event. The following screenshot is an example of a past prediction showing the input variables and how they influenced the fraud prediction score.

Get real-time predictions

To get real-time predictions and integrate Amazon Fraud Detector into your workflow, we need to publish the detector endpoint. Complete the following steps:

Go to the newly created detector and choose the detector version, which will be version 1.
On the Actions menu, choose Publish.

You can perform real-time predictions with the published detector by calling the GetEventPrediction API. The following is a sample Python code for calling the GetEventPrediction API:

import boto3
fraudDetector = boto3.client('frauddetector')

fraudDetector.get_event_prediction(
detectorId = 'sample_detector',
eventId = '802454d3-f7d8-482d-97e8-c4b6db9a0428',
eventTypeName = 'sample_transaction',
eventTimestamp = '2021-01-13T23:18:21Z',
entities = [{'entityType':'customer', 'entityId':'12345'}],
eventVariables = {
    'email_address' : 'johndoe@exampledomain.com',
    'ip_address' : '1.2.3.4'
}
)

Conclusion

Amazon Fraud Detector relies on specific models with tailored algorithms, enrichments, and feature transformations to detect fraudulent events across multiple use cases. In this post, you learned how to ingest data, train and deploy a model, write business rules, and publish a detector to generate real-time fraud prediction on potentially compromised accounts.

Visit Amazon Fraud Detector to learn more about Amazon Fraud Detector or our GitHub repo for code samples, notebook, and synthetic datasets.

About the authors

Mike Ames is a data scientist turned identity verification solution specialist, he has extensive experience developing machine learning and AI solutions to protect organizations from fraud, waste and abuse. In his spare time, you can find him hiking, mountain biking or playing freebee with his dog Max.

Metrics for evaluating content moderation in Amazon Rekognition and other content moderation services

October 5, 2022

by Amit Gupta Amazon AWS

Content moderation is the process of screening and monitoring user-generated content online. To provide a safe environment for both users and brands, platforms must moderate content to ensure that it falls within preestablished guidelines of acceptable behavior that are specific to the platform and its audience.

When a platform moderates content, acceptable user-generated content (UGC) can be created and shared with other users. Inappropriate, toxic, or banned behaviors can be prevented, blocked in real time, or removed after the fact, depending on the content moderation tools and procedures the platform has in place.

You can use Amazon Rekognition Content Moderation to detect content that is inappropriate, unwanted, or offensive, to create a safer user experience, provide brand safety assurances to advertisers, and comply with local and global regulations.

In this post, we discuss the key elements needed to evaluate the performance aspect of a content moderation service in terms of various accuracy metrics, and a provide an example using Amazon Rekognition Content Moderation API’s.

What to evaluate

When evaluating a content moderation service, we recommend the following steps.

Before you can evaluate the performance of the API on your use cases, you need to prepare a representative test dataset. The following are some high-level guidelines:

Collection – Take a large enough random sample (images or videos) of the data you eventually want to run through Amazon Rekognition. For example, if you plan to moderate user-uploaded images, you can take a week’s worth of user images for the test. We recommend choosing a set that has enough images without getting too large to process (such as 1,000–10,000 images), although larger sets are better.
Definition – Use your application’s content guidelines to decide which types of unsafe content you’re interested in detecting from the Amazon Rekognition moderation concepts taxonomy. For example, you may be interested in detecting all types of explicit nudity and graphic violence or gore.
Annotation – Now you need a human-generated ground truth for your test set using the chosen labels, so that you can compare machine predictions against them. This means that each image is annotated for the presence or absence of your chosen concepts. To annotate your image data, you can use Amazon SageMaker Ground Truth (GT)to manage image annotation. You can refer to GT for image labeling, consolidating annotations and processing annotation output.

Get predictions on your test dataset with Amazon Rekognition

Next, you want to get predictions on your test dataset.

The first step is to decide on a minimum confidence score (a threshold value, such as 50%) at which you want to measure results. Our default threshold is set to 50, which offers a good balance between retrieving large amounts of unsafe content without incurring too many false predictions on safe content. However, your platform may have different business needs, so you should customize this confidence threshold as needed. You can use the MinConfidence parameter in your API requests to balance detection of content (recall) vs the accuracy of detection (precision). If you reduce MinConfidence, you are likely to detect most of the inappropriate content, but are also likely to pick up content that is not actually inappropriate. If you increase MinConfidence you are likely to ensure that all your detected content is truly inappropriate but some content may not be tagged. We suggest experimenting with a few MinConfidence values on your dataset and quantitatively select the best value for your data domain.

Next, run each sample (image or video) of your test set through the Amazon Rekognition moderation API (DetectModerationLabels).

Measure model accuracy on images

You can assess the accuracy of a model by comparing human-generated ground truth annotations with the model predictions. You repeat this comparison for every image independently and then aggregate over the whole test set:

Per-image results – A model prediction is defined as the pair {label_name, confidence_score} (where the confidence score >= the threshold you selected earlier). For each image, a prediction is considered correct when it matches the ground truth (GT). A prediction is one of the following options:
- True Positive (TP): both prediction and GT are “unsafe”
- True Negative (TN): both prediction and GT are “safe”
- False Positive (FP): the prediction says “unsafe”, but the GT is “safe”
- False Negative (FN): the prediction is “safe”, but the GT is “unsafe”
Aggregated results over all images – Next, you can aggregate these predictions into dataset-level results:
- False positive rate (FPR) – This is the percentage of images in the test set that are wrongly flagged by the model as containing unsafe content: (FP): FP / (TN+FP).
- False negative rate (FNR) – This is the percentage of unsafe images in the test set that are missed by the model: (FN): FN / (FN+TP).
- True positive rate (TPR) – Also called recall, this computes the percentage of unsafe content (ground truth) that is correctly discovered or predicted by the model: TP / (TP + FN) = 1 – FNR.
- Precision – This computes the percentage of correct predictions (unsafe content) with regards to the total number of predictions made: TP / (TP+FP).

Let’s explore an example. Let’s assume that your test set contains 10,000 images: 9,950 safe and 50 unsafe. The model correctly predicts 9,800 out of 9,950 images as safe and 45 out of 50 as unsafe:

TP = 45
TN = 9800
FP = 9950 – 9800 = 150
FN = 50 – 45 = 5
FPR = 150 / (9950 + 150) = 0.015 = 1.5%
FNR = 5 / (5 + 45) = 0.1 = 10%
TPR/Recall = 45 / (45 + 5) = 0.9 = 90%
Precision = 45 / (45 + 150) = 0.23 = 23%

Measure model accuracy on videos

If you want to evaluate the performance on videos, a few additional steps are necessary:

Sample a subset of frames from each video. We suggest sampling uniformly with a rate of 0.3–1 frames per second (fps). For example, if a video is encoded at 24 fps and you want to sample one frame every 3 seconds (0.3 fps), you need to select one every 72 frames.
Run these sampled frames through Amazon Rekognition content moderation. You can either use our video API, which already samples frames for you (at a rate of 3 fps), or use the image API, in which case you want to sample more sparsely. We recommend the latter option, given the redundancy of information in videos (consecutive frames are very similar).
Compute the per-frame results as explained in the previous section (per-image results).
Aggregate results over the whole test set. Here you have two options, depending on the type of outcome that matters for your business:
1. Frame-level results – This considers all the sampled frames as independent images and aggregates the results exactly as explained earlier for images (FPR, FNR, recall, precision). If some videos are considerably longer than others, they will contribute more frames to the total count, making the comparison unbalanced. In that case, we suggest changing the initial sampling strategy to a fixed number of frames per video. For example, you could uniformly sample 50–100 frames per video (assuming videos are at least 2–3 minutes long).
2. Video-level results – For some use cases, it doesn’t matter whether the model is capable of correctly predicting 50% or 99% of the frames in a video. Even a single wrong unsafe prediction on a single frame could trigger a downstream human evaluation and only videos with 100% correct predictions are truly considered correctly. If this is your use case, we suggest you compute FPR/FNR/TPR over the frames of each video and consider the video as follows:

Video ID	Accuracy	Per-Video Categorization
Results Aggregated Over All the Frames of Video ID	Total FP = 0 Total FN = 0	Perfect predictions
.	Total FP > 0	False Positive (FP)
.	Total FN > 0	False Negative (FN)

After you have computed these for each video independently, you can then compute all the metrics we introduced earlier:

The percentage of videos that are wrongly flagged (FP) or missed (FN)
Precision and recall

Measure performance against goals

Finally, you need to interpret these results in the context of your goals and capabilities.

First, consider your business needs in regards to the following:

Data – Learn about your data (daily volume, type of data, and so on) and the distribution of your unsafe vs. safe content. For example, is it balanced (50/50), skewed (10/90) or very skewed (1/99, meaning that only 1% is unsafe)? Understanding such distribution can help you define your actual metric goals. For example, the number of safe content is often an order of magnitude larger than unsafe content (very skewed), making this almost an anomaly detection problem. Within this scenario, the number of false positives may outnumber the number of true positives, and you can use your data information (distribution skewness, volume of data, and so on) to decide the FPR you can work with.
Metric goals – What are the most critical aspects of your business? Lowering the FPR often comes at the cost of a higher FNR (and vice versa) and it’s important to find the right balance that works for you. If you can’t miss any unsafe content, you likely want close to 0% FNR (100% recall). However, this will incur the largest number of false positives, and you need to decide the target (maximum) FPR you can work with, based on your post-prediction pipeline. You may want to allow some level of false negatives to be able to find a better balance and lower your FPR: for example, accepting a 5% FNR instead of 0% could reduce the FPR from 2% to 0.5%, considerably reducing the number of flagged contents.

Next, ask yourself what mechanisms you will use to parse the flagged images. Even though the API’s may not provide 0% FPR and FNR, it can still bring huge savings and scale (for example, by only flagging 3% of your images, you have already filtered out 97% of your content). When you pair the API with some downstream mechanisms, like a human workforce that reviews the flagged content, you can easily reach your goals (for example, 0.5% flagged content). Note how this pairing is considerably cheaper than having to do a human review on 100% of your content.

When you have decided on your downstream mechanisms, we suggest you evaluate the throughput that you can support. For example, if you have a workforce that can only verify 2% of your daily content, then your target goal from our content moderation API is a flag rate (FPR+TPR) of 2%.

Finally, if obtaining ground truth annotations is too hard or too expensive (for example, your volume of data is too large), we suggest annotating the small number of images flagged by the API. Although this doesn’t allow for FNR evaluations (because your data doesn’t contain any false negatives), you can still measure TPR and FPR.

In the following section, we provide a solution for image moderation evaluation. You can take a similar approach for video moderation evaluation.

Solution overview

The following diagram illustrates the various AWS services you can use to evaluate the performance of Amazon Rekognition content moderation on your test dataset.

The content moderation evaluation has the following steps:

Upload your evaluation dataset into Amazon Simple Storage Service (Amazon S3).
Use Ground Truth to assign ground truth moderation labels.
Generate the predicted moderation labels using the Amazon Rekognition pre-trained moderation API using a few threshold values. (For example, 70%, 75% and 80%).
Assess the performance for each threshold by computing true positives, true negatives, false positives, and false negatives. Determine the optimum threshold value for your use case.
Optionally, you can tailor the size of the workforce based on true and false positives, and use Amazon Augmented AI (Amazon A2I) to automatically send all flagged content to your designated workforce for a manual review.

The following sections provide the code snippets for steps 1, 2, and 3. For complete end-to-end source code, refer to the provided Jupyter notebook.

Prerequisites

Before you get started, complete the following steps to set up the Jupyter notebook:

Create a notebook instance in Amazon SageMaker.
When the notebook is active, choose Open Jupyter.
On the Jupyter dashboard, choose New, and choose Terminal.

In the terminal, enter the following code:

cd SageMaker
git clone https://github.com/aws-samples/amazon-rekognition-code-samples.git

Open the notebook for this post: content-moderation-evaluation/Evaluating-Amazon-Rekognition-Content-Moderation-Service.ipynb.
Upload your evaluation dataset to Amazon Simple Storage Service (Amazon S3).

We will now go through steps 2 through 4 in the Jupyter notebook.

Use Ground Truth to assign moderation labels

To assign labels in Ground Truth, complete the following steps:

Create a manifest input file for your Ground Truth job and upload it to Amazon S3.

Create the labeling configuration, which contains all moderation labels that are needed for the Ground Truth labeling job.To check the limit for the number of label categories you can use, refer to Label Category Quotas. In the following code snippet, we use five labels (refer to the hierarchical taxonomy used in Amazon Rekognition for more details) plus one label (Safe_Content) that marks content as safe:

# customize CLASS_LIST to include all labels that can be used to classify sameple data, it's up to 10 labels
# In order to easily match image label with content moderation service supported taxonomy, 

CLASS_LIST = ["<label_1>", "<label_2>", "<label_3>", "<label_4>", "<label_5>", "Safe_Content"]
print("Label space is {}".format(CLASS_LIST))

json_body = {"labels": [{"label": label} for label in CLASS_LIST]}
with open("class_labels.json", "w") as f:
    json.dump(json_body, f)

s3.upload_file("class_labels.json", BUCKET, EXP_NAME + "/class_labels.json")

Create a custom worker task template to provide the Ground Truth workforce with labeling instructions and upload it to Amazon S3.
The Ground Truth label job is defined as an image classification (multi-label) task. Refer to the source code for instructions to customize the instruction template.
Decide which workforce you want to use to complete the Ground Truth job. You have two options (refer to the source code for details):
1. Use a private workforce in your own organization to label the evaluation dataset.
2. Use a public workforce to label the evaluation dataset.

Create and submit a Ground Truth labeling job. You can also adjust the following code to configure the labeling job parameters to meet your specific business requirements. Refer to the source code for complete instructions on creating and configuring the Ground Truth job.

human_task_config = {
    "AnnotationConsolidationConfig": {
        "AnnotationConsolidationLambdaArn": acs_arn,
    },
    "PreHumanTaskLambdaArn": prehuman_arn,
    "MaxConcurrentTaskCount": 200,  # 200 images will be sent at a time to the workteam.
    "NumberOfHumanWorkersPerDataObject": 3,  # 3 separate workers will be required to label each image.
    "TaskAvailabilityLifetimeInSeconds": 21600,  # Your workteam has 6 hours to complete all pending tasks.
    "TaskDescription": task_description,
    "TaskKeywords": task_keywords,
    "TaskTimeLimitInSeconds": 180,  # Each image must be labeled within 3 minutes.
    "TaskTitle": task_title,
    "UiConfig": {
        "UiTemplateS3Uri": "s3://{}/{}/instructions.template".format(BUCKET, EXP_NAME),
    },
}

After the job is submitted, you should see output similar to the following:

Labeling job name is: ground-truth-cm-1662738403

Wait for labeling job on the evaluation dataset to complete successfully, then continue to the next step.

Use the Amazon Rekognition moderation API to generate predicted moderation labels.

The following code snippet shows how to use the Amazon Rekognition moderation API to generate moderation labels:

client=boto3.client('rekognition')
def moderate_image(photo, bucket):
    response = client.detect_moderation_labels(Image={'S3Object':{'Bucket':bucket,'Name':photo}})
    return len(response['ModerationLabels'])

Assess the performance

You first retrieved ground truth moderation labels from the Ground Truth labeling job results for the evaluation dataset, then you ran the Amazon Rekognition moderation API to get predicted moderation labels for the same dataset. Because this is a binary classification problem (safe vs. unsafe content), we calculate the following metrics (assuming unsafe content is positive):

We also calculate the corresponding evaluation metrics:

The following code snippet shows how to calculate those metrics:

FPR = FP / (FP + TN)
FNR = FN / (FN + TP)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)

Conclusion

This post discusses the key elements needed to evaluate the performance aspect of your content moderation service in terms of various accuracy metrics. However, accuracy is only one of the many dimensions that you need to evaluate when choosing a particular content moderation service. It’s critical that you include other parameters, such as the service’s total feature set, ease of use, existing integrations, privacy and security, customization options, scalability implications, customer service, and pricing. To learn more about content moderation in Amazon Rekognition, visit Amazon Rekognition Content Moderation.

About the authors

Amit Gupta is a Senior AI Services Solutions Architect at AWS. He is passionate about enabling customers with well-architected machine learning solutions at scale.

Davide Modolo is an Applied Science Manager at AWS AI Labs. He has a PhD in computer vision from the University of Edinburgh (UK) and is passionate about developing new scientific solutions for real-world customer problems. Outside of work, he enjoys traveling and playing any kind of sport, especially soccer.

Jian Wu is a Senior Enterprise Solutions Architect at AWS. He’s been with AWS for 6 years working with customers of all sizes. He is passionate about helping customers to innovate faster via the adoption of the Cloud and AI/ML. Prior to joining AWS, Jian spent 10+ years focusing on software development, system implementation and infrastructure management. Aside from work, he enjoys staying active and spending time with his family.

GD device intelligence

Custom fraud detection ML models in Amazon Fraud Detector

Solution overview

Use cases for device intelligence with Amazon Fraud Detector

Integrate the GD mobile SDK and start collecting device intelligence data

Create your event type in Amazon Fraud Detector

Build your detection logic in Amazon Fraud Detector

Test your detector using Amazon Fraud Detector batch predictions

Set up the supporting infrastructure

Perform real-time predictions using the Amazon Fraud Detector GetEventPrediction API

Build and train your model

Automate the solution

Conclusion

About the author

Using AI to index visual content

Using Amazon Rekognition to build a custom computer vision solution in just 2 weeks

Summary

About the authors

Model training in Studio

Dataset

Models

Metrics

Notebook

Results

Conclusion

About the authors

Problem statement

Solution overview

Main advantages

Automation at scale with efficiency

Reducing cost in document extraction

Conclusion

About the authors

Overview of solution

Prerequisites

Prepare and validate the data

Dataset preparation

Data validation

Define the entity, event type, and event variables

Define the entity

Define the event and event variables

Define the event variables

Define event labels (optional)

Upload event data

Initiate model training

Evaluate model performance and deploy the model

Create a detector endpoint and define business rules

Get real-time predictions

Conclusion

About the authors

What to evaluate

Get predictions on your test dataset with Amazon Rekognition

Measure model accuracy on images

Measure model accuracy on videos

Measure performance against goals

Solution overview

Prerequisites

Use Ground Truth to assign moderation labels

Use the Amazon Rekognition moderation API to generate predicted moderation labels.

Assess the performance

Conclusion

About the authors

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.