Study finds the risks of sharing health care data are low

In recent years, scientists have made great strides in their ability to develop artificial intelligence algorithms that can analyze patient data and come up with new ways to diagnose disease or predict which treatments work best for different patients.

The success of those algorithms depends on access to patient health data, which has been stripped of personal information that could be used to identify individuals from the dataset. However, the possibility that individuals could be identified through other means has raised concerns among privacy advocates.

In a new study, a team of researchers led by MIT Principal Research Scientist Leo Anthony Celi has quantified the potential risk of this kind of patient re-identification and found that it is currently extremely low relative to the risk of data breach. In fact, between 2016 and 2021, the period examined in the study, there were no reports of patient re-identification through publicly available health data.

The findings suggest that the potential risk to patient privacy is greatly outweighed by the gains for patients, who benefit from better diagnosis and treatment, says Celi. He hopes that in the near future, these datasets will become more widely available and include a more diverse group of patients.

“We agree that there is some risk to patient privacy, but there is also a risk of not sharing data,” he says. “There is harm when data is not shared, and that needs to be factored into the equation.”

Celi, who is also an instructor at the Harvard T.H. Chan School of Public Health and an attending physician with the Division of Pulmonary, Critical Care and Sleep Medicine at the Beth Israel Deaconess Medical Center, is the senior author of the new study. Kenneth Seastedt, a thoracic surgery fellow at Beth Israel Deaconess Medical Center, is the lead author of the paper, which appears today in PLOS Digital Health.

Risk-benefit analysis

Large health record databases created by hospitals and other institutions contain a wealth of information on diseases such as heart disease, cancer, macular degeneration, and Covid-19, which researchers use to try to discover new ways to diagnose and treat disease.

Celi and others at MIT’s Laboratory for Computational Physiology have created several publicly available databases, including the Medical Information Mart for Intensive Care (MIMIC), which they recently used to develop algorithms that can help doctors make better medical decisions. Many other research groups have also used the data, and others have created similar databases in countries around the world.

Typically, when patient data is entered into this kind of database, certain types of identifying information are removed, including patients’ names, addresses, and phone numbers. This is intended to prevent patients from being re-identified and having information about their medical conditions made public.

However, concerns about privacy have slowed the development of more publicly available databases with this kind of information, Celi says. In the new study, he and his colleagues set out to ask what the actual risk of patient re-identification is. First, they searched PubMed, a database of scientific papers, for any reports of patient re-identification from publicly available health data, but found none.

To expand the search, the researchers then examined media reports from September 2016 to September 2021, using Media Cloud, an open-source global news database and analysis tool. In a search of more than 10,000 U.S. media publications during that time, they did not find a single instance of patient re-identification from publicly available health data.

In contrast, they found that during the same time period, health records of nearly 100 million people were stolen through data breaches of information that was supposed to be securely stored.

“Of course, it’s good to be concerned about patient privacy and the risk of re-identification, but that risk, although it’s not zero, is minuscule compared to the issue of cyber security,” Celi says.

Better representation

More widespread sharing of de-identified health data is necessary, Celi says, to help expand the representation of minority groups in the United States, who have traditionally been underrepresented in medical studies. He is also working to encourage the development of more such databases in low- and middle-income countries.

“We cannot move forward with AI unless we address the biases that lurk in our datasets,” he says. “When we have this debate over privacy, no one hears the voice of the people who are not represented. People are deciding for them that their data need to be protected and should not be shared. But they are the ones whose health is at stake; they’re the ones who would most likely benefit from data-sharing.”

Instead of asking for patient consent to share data, which he says may exacerbate the exclusion of many people who are now underrepresented in publicly available health data, Celi recommends enhancing the existing safeguards that protect such datasets. One new strategy that he and his colleagues have begun using is to share the data in a way that prevents it from being downloaded, so that all queries run on it can be monitored by the database's administrators. This allows them to flag any user inquiry that seems like it might not be for legitimate research purposes, Celi says.

“What we are advocating for is performing data analysis in a very secure environment so that we weed out any nefarious players trying to use the data for some other reasons apart from improving population health,” he says. “We’re not saying that we should disregard patient privacy. What we’re saying is that we have to also balance that with the value of data sharing.”

The research was funded by the National Institutes of Health through the National Institute of Biomedical Imaging and Bioengineering.

How Synamedia uses Amazon Rekognition Video to build advanced video search capabilities for long-form video

Synamedia is a leading video technology provider addressing the needs of premium video service providers and direct-to-consumer (D2C) businesses with a comprehensive solution portfolio. Synamedia's solutions span several pillars, such as video networks, TV platforms, advertising and monetization, and content protection and piracy disruption.

Synamedia partnered with AWS to use artificial intelligence (AI) to develop enhanced video search capabilities for long-form video. The goal is to help their customers search for videos based on descriptions of scenes that aren't captured in the assets' metadata, for example, finding a video (even within a series) that contains a scene on a boat that isn't significant enough to be mentioned in the metadata. This enables content discovery driven by real-world objects.

With Amazon Rekognition Video, Synamedia built an AI solution that performs label detection in videos and images using standard and custom models. This enables scene-level detection of specific objects in long-form video, based on what is actually in the scene at the time. The new capability allows users to find specific occurrences within long-form video based only on a general description of what they're looking for. It also lets Synamedia onboard new content extremely quickly, now taking just a few hours to spin up and get results. The solution is simple to use and extensible, providing the ability to add further custom models for domain-specific images.

“Amazon Rekognition Video is a powerful service that is simple to use. It gave us ready-made access to best-in-class computer vision capabilities, which we could use to build and test innovative video search features in a matter of weeks.”

– Avi Fruchter, Software Engineering Fellow at Synamedia.

Using AI to index visual content

As both supply of video content and demand for greater video insights continue to grow, effective video search capabilities are becoming more important. Traditional video search, however, is typically limited to basic information such as the video title, or in some instances, to metadata attached as tags that describe the key themes or content of the video.

Most descriptive information needs to be added manually, but this becomes prohibitive as the quantity of video grows. As a result, traditional video search performance is often limited. This limitation is even more pronounced for long-form video content, for which scene-level metadata usually doesn’t exist, given how expensive and time-consuming it is to produce.

To address this limitation, Synamedia set out to develop an AI-powered video search solution using computer vision to automatically identify scene-level details in any given video, and make that information discoverable to users based on general descriptions of those scenes.

Using Amazon Rekognition to build a custom computer vision solution in just 2 weeks

To accomplish this goal, Synamedia’s Software Engineering Fellow, Avi Fruchter, turned to Amazon Rekognition, a fully managed video analysis service that helps accelerate the process of using computer vision models to detect relevant scene-level occurrences such as objects, activities, and even text and scenes.

Amazon Rekognition Video accelerates the development of computer vision solutions for video by automatically processing and tagging video content using computer vision models. These models are fully managed and maintained by Amazon Rekognition. It removes the undifferentiated heavy lifting of managing the necessary infrastructure, and also reduces the technical expertise required to build and deploy these models.

To get started, you simply choose which of Amazon Rekognition’s wide range of capabilities is relevant to your task, and call the relevant API. The results are then returned as an easy-to-manage JSON response for each job.

For example, Synamedia used the StartLabelDetection API to automatically generate a list of labels for objects detected in each video frame of their video library. From this simple API call, Amazon Rekognition returned the list of labels, the confidence score of each, and the relevant timestamps for each frame. This enabled Synamedia to immediately create an entirely new set of search metadata for each video in their test library. Users are then able to search for specific video content just by describing specific objects or scenery they’re interested in, and get results that not only match their query, but that also point them to the specific scene in the video that featured that content.

Other relevant Amazon Rekognition APIs for video analysis are StartFaceDetection, StartPersonTracking, and StartSegmentDetection—a feature that can identify the moment that scenes in a video change.

Amazon Rekognition works on both pre-recorded and live video. Pre-recorded video is read from Amazon Simple Storage Service (Amazon S3), and live video can be processed from Amazon Kinesis Video Streams.
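
As an illustration, the following is a minimal boto3 sketch of the flow described above: starting an asynchronous label detection job on a video stored in Amazon S3, then reading back the labels, confidence scores, and timestamps. The bucket, file name, and confidence threshold are placeholders, not Synamedia's actual pipeline.

# Minimal sketch (boto3), assuming a video has already been uploaded to an S3 bucket you own.
import time
import boto3

rekognition = boto3.client("rekognition")

# Start an asynchronous label detection job on a video stored in Amazon S3
response = rekognition.start_label_detection(
    Video={"S3Object": {"Bucket": "my-video-bucket", "Name": "episodes/episode-01.mp4"}},
    MinConfidence=70,
)
job_id = response["JobId"]

# Poll until the job finishes (a production system would typically subscribe to the
# SNS notification channel that StartLabelDetection can publish to, instead of polling)
while True:
    result = rekognition.get_label_detection(JobId=job_id, SortBy="TIMESTAMP")
    if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(10)

# Each entry carries a label name, a confidence score, and a timestamp in milliseconds,
# which is what makes scene-level search possible
for item in result.get("Labels", []):
    label = item["Label"]
    print(item["Timestamp"], label["Name"], round(label["Confidence"], 1))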

Synamedia chose Amazon Rekognition for its ability to rapidly expand their capabilities. Synamedia's innovation team is dedicated solely to building new technical innovations in video and has strong technical expertise. However, even for them it's not always possible to have deep domain expertise in all areas of video technology. Enter Amazon Rekognition, which extended their capabilities in computer vision, enabling them to conceptualize a use case and quickly test its viability.

“It was extremely fast to onboard, and the results were extremely quick,” Avi Fruchter says. “We are not always domain experts in all areas of ML, and Amazon Rekognition gives us the ability to leverage our existing expertise into new types of enhanced use cases for our customers.”

Synamedia anticipates their solution will have broad benefits for a wide range of customers, including companies with large video libraries as well as the growing number of companies who need to monitor specific events in live video feeds, such as health and safety risks.

Summary

With Amazon Rekognition Video, Synamedia was able to build and test an advanced video search capability in a matter of weeks, without needing to hire or develop additional specialized computer vision expertise.

This new capability has enabled Synamedia to expand the impact of its innovation team and continue with its mission to drive new video innovation for its customers.

Learn more about how you can quickly build advanced computer vision solutions for video by visiting Amazon Rekognition Video or referring to Amazon Rekognition resources.


About the authors

Daniel Burke is the European lead for AI and ML in the Private Equity group at AWS. Daniel works directly with Private Equity funds and their portfolio companies, helping them accelerate their AI and ML adoption to improve innovation and increase enterprise value.

John Shaw is the North American lead for AI and ML in the Private Equity group at AWS. John works directly with Private Equity funds and their portfolio companies, helping them accelerate their AI and ML adoption to improve innovation and increase enterprise value.

Increase ML model performance and reduce training time using Amazon SageMaker built-in algorithms with pre-trained models

Model training forms the core of any machine learning (ML) project, and having a trained ML model is essential to adding intelligence to a modern application. A performant model is the output of a rigorous and diligent data science methodology. Not implementing a proper model training process can lead to high infrastructure and personnel costs, because model training underpins the experimental phase of the ML process, which by nature tends to be highly iterative.

Generally speaking, training a model from scratch is time-consuming and compute intensive. When the training data is small, we can’t expect to train a very performant model. A better alternative is to fine-tune a pretrained model on the target dataset. For certain use cases, Amazon SageMaker provides high-quality pretrained models that were trained on very large datasets. Fine-tuning these models takes a fraction of the training time compared to training a model from scratch.

To validate this assertion, we ran a study using built-in algorithms with pretrained models. We also compared two types of pretrained models within Amazon SageMaker Studio, Type 1 (legacy) and Type 2 (latest), against a model trained from scratch using the Defect Detection Network (DDN), with regard to training time and infrastructure cost. To demonstrate the training process, we used the defect detection dataset from the post Visual inspection automation using Amazon SageMaker JumpStart. This post showcases the results of the study. We also provide a Studio notebook, which you can modify to run the experiments using your own dataset and an algorithm or model of your choosing.

Model training in Studio

SageMaker is a fully managed ML service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment.

There are many ways to train ML models using SageMaker, such as using Amazon SageMaker Debugger, Spark MLlib, or custom Python code with TensorFlow, PyTorch, or Apache MXNet. You can also bring your own custom algorithm or choose an algorithm from AWS Marketplace.

Furthermore, SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and ML practitioners get started on training and deploying ML models quickly.

You can use built-in algorithms for either classification or regression problems, or for a variety of unsupervised learning tasks. Other built-in algorithms include text analysis and image processing. You can train a model from scratch using a built-in algorithm for a specific use case. For a full list of available built-in algorithms, see Common Information About Built-in Algorithms.

Some built-in algorithms also include pre-trained models for popular problem types, which you can use through the SageMaker SDK as well as Studio. These pre-trained models can greatly reduce training time as well as infrastructure cost for common use cases such as semantic segmentation, object detection, text summarization, and question answering. For a complete list of pre-trained models, see Models.

For choosing the best model, SageMaker automatic model tuning, also known as hyperparameter tuning or hyperparameter optimization (HPO), can be very useful because it finds the best version of a model by running a slew of training jobs on your dataset using the algorithm and hyperparameters that you specify. Depending on the number of hyperparameters and the size of the search space, finding the best model can require thousands or even tens of thousands of training runs. Automatic model tuning provides a built-in HPO algorithm that removes the undifferentiated heavy lifting required to build your own HPO algorithm. Automatic model tuning provides the option of parallelizing model runs in order to reduce the time and cost of finding the best fit.

After the automatic model tuning has completed multiple runs for a set of hyperparameters, it chooses the hyperparameter values that result in the model with the best performance, as measured by the loss function specific to the model.

Training and validation loss is just one of the metrics needed to pick the best model for the use case. With so many options, it’s not always easy to make the right choice, and picking the best model boils down to the training time, cost of infrastructure, complexity, and quality of the resulting model, among other factors. There are other extraneous costs such as platform and personnel costs that we don’t take into account for this study.

In the subsequent sections, we discuss the study design and the results.

Dataset

We use the NEU surface defect database: the NEU-CLS dataset for classification and the NEU-DET dataset for the detector. The NEU-DET dataset contains 1,800 images and 4,189 bounding boxes in total. The types of defects in our dataset are as follows:

  • Crazing (class: Cr, label: 0)
  • Inclusion (class: In, label: 1)
  • Pitted surface (class: PS, label: 2)
  • Patches (class: Pa, label: 3)
  • Rolled-in scale (class: RS, label: 4)
  • Scratches (class: Sc, label: 5)

For more details about the dataset, refer to Visual inspection automation using Amazon SageMaker JumpStart.

Models

We introduced the Defect Detection Network in the post Visual inspection automation using Amazon SageMaker JumpStart. We trained this model from scratch with the default hyperparameters, so we could have a benchmark to evaluate the rest of the models.

For object detection use cases, SageMaker provides a set of built-in object detection models, including the Type 1 (legacy) and Type 2 (latest) models compared in this study.

Aside from training a model from scratch, we used these models to evaluate four approaches that typically reflect an ML model training process. The output of each approach is a trained ML model. In cases 1 and 3, a fixed set of hyperparameters is provided to train a single model, whereas in cases 2 and 4, SageMaker produces the best model and the set of hyperparameters that led to the best fit.

  1. Type 1 (legacy) model – We use the model with a ResNet backbone, which is pre-trained on ImageNet, with default hyperparameters and no hyperparameter optimization (a setup sketch follows this list).
  2. Fine-tune Type 1 (legacy) with HPO – Now we run HPO to find better hyperparameters that lead to a better model. For a list of all parameters you can fine-tune, refer to Tune an Object Detection Model. In this notebook, we only fine-tune learning rate, momentum, and weight decay. We use automatic model tuning to run HPO. We need to provide hyperparameter ranges for learning rate, momentum, and weight decay. Automatic model tuning will monitor the log and parse the objective metrics. For object detection, we use Mean Average Precision (mAP) on the validation dataset as our metric.
  3. Fine-tune Type 2 (latest) model – For the Type 2 (latest) object detection model, we follow the instructions in Fine-tune a Model and Deploy to a SageMaker Endpoint and use standard SageMaker APIs. You can find all fine-tunable Type 2 (latest) object detection models in the Built-in Algorithms with pre-trained Model table and set FineTunable?=True. Currently, there are nine fine-tunable object detection models. We use the one with the VGG backend and pretrained on VOC dataset. We fine-tune using a set of static hyperparameters.
  4. Fine-tune Type 2 (latest) model with HPO – We provide a range for the ADAM learning rate; the rest of the hyperparameters stay default. Also, note that the Type 2 (latest) model training reports Val_CrossEntropy loss and Val_SmoothL1 loss instead of mAP on the validation dataset. Because we can only specify one evaluation metric for automatic model tuning, we choose to minimize Val_CrossEntropy.

For details on the hyperparameters, you can go through the Studio notebook.
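
To make approaches 1 and 2 above concrete, here is a minimal sketch using the SageMaker Python SDK. It configures the Type 1 (legacy) built-in object detection algorithm with a pretrained ResNet backbone, then wraps it in automatic model tuning over learning rate, momentum, and weight decay with validation mAP as the objective. The role ARN, S3 paths, and hyperparameter values are illustrative placeholders rather than the exact settings used in this study.

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"   # placeholder role ARN

# Approach 1: the Type 1 (legacy) built-in object detection algorithm with a
# pretrained ResNet backbone and a fixed set of hyperparameters
container = image_uris.retrieve("object-detection", session.boto_region_name)
estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://my-bucket/od-output/",     # placeholder bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(
    base_network="resnet-50",       # ResNet backbone
    use_pretrained_model=1,         # start from ImageNet-pretrained weights
    num_classes=6,                  # six defect classes
    num_training_samples=1152,      # training split size from the results section
    epochs=100,
    learning_rate=0.001,
    mini_batch_size=16,
)
# RecordIO channels; image format would also need train_annotation/validation_annotation channels
channels = {"train": "s3://my-bucket/od/train/", "validation": "s3://my-bucket/od/validation/"}
# estimator.fit(channels)           # uncomment to run approach 1 as a single training job

# Approach 2: wrap the same estimator in automatic model tuning (HPO) over
# learning rate, momentum, and weight decay, maximizing validation mAP
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:mAP",
    objective_type="Maximize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-4, 1e-1),
        "momentum": ContinuousParameter(0.5, 0.99),
        "weight_decay": ContinuousParameter(1e-5, 1e-2),
    },
    max_jobs=20,                    # total training jobs to run
    max_parallel_jobs=4,            # jobs run in parallel to reduce wall-clock time
)
tuner.fit(channels)
print(tuner.best_training_job())    # name of the job with the best objective value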

Metrics

Next, we compare the results from the approaches based on important metrics and the infrastructure cost:

  • Loss function difference across models – All the different algorithms define the same loss functions for the object detection task: cross-entropy and smooth L1 loss. However, we use them differently:

    • The Type 1 (legacy) object detection algorithm has defined mAP on the validation data, and we use it as the metric to find a training job that maximizes mAP.
    • The Type 2 (latest) object detection algorithm, however, doesn’t define mAP. Instead, it defines Val_SmoothL1 loss and Val_CrossEntropy loss on the validation data. During model training with HPO, we need to specify one metric for automatic model tuning to monitor and parse. Therefore, we use Val_CrossEntropy loss as the metric and find the training job that minimizes it.
  • Validation metric (mAP) – We use the mAP on the validation dataset as our metric. Average precision is the area under a class's precision-recall curve, and mAP is its mean across classes; it is the standard evaluation metric used in the COCO challenge for object detection tasks. For more information about the applicability of mAP for object detection, refer to mAP (mean Average Precision) for Object Detection. Because there is a difference in loss function between Type 1 and Type 2 models, we manually calculate the mAP for each type of model on the test dataset. We accomplish this by deploying the models behind a SageMaker endpoint and calling the model endpoint to score a subset of the dataset (see the sketch after this list). The results are then compared against the ground truth to calculate the mAP for each model type.
  • Training Instances Runtime cost – For simplicity, we only report the infrastructure cost incurred for each of the four approaches highlighted in the previous section. The cost is reported in dollars and calculated based on the runtime of the underlying Amazon Elastic Compute Cloud (Amazon EC2) instances.
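
As referenced in the validation metric item, the following is a minimal sketch of scoring a test image against a deployed endpoint. The endpoint name and file path are placeholders, and the response parsing assumes the output format of the legacy built-in object detection algorithm; adjust it for the model type you deployed.

# Minimal sketch of scoring one test image via a deployed SageMaker endpoint (boto3).
# The parsing assumes a response of the form {"prediction": [[class_id, confidence,
# xmin, ymin, xmax, ymax], ...]} with normalized coordinates.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

with open("test/defect_0001.jpg", "rb") as f:
    payload = f.read()

response = runtime.invoke_endpoint(
    EndpointName="od-type1-endpoint",      # placeholder endpoint name
    ContentType="application/x-image",
    Body=payload,
)
detections = json.loads(response["Body"].read())["prediction"]

# Keep confident detections only; matching them against ground truth boxes per class
# at a chosen IoU threshold yields per-class average precision, and mAP is its mean.
confident = [d for d in detections if d[1] > 0.3]
print(confident[:5])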

Notebook

The Studio notebook is available on GitHub.

Results

The steel surface dataset has a total of 1,800 images in six categories. As discussed in the previous section, because the Type 1 (legacy) and Type 2 (latest) models optimize different objectives to find the best model, we first perform a train/test split on the dataset. In the final phase of the study, we run inference on the test dataset so that we can compare all four approaches using the same metric (mAP).

The test set contains 20% of the original dataset, which we randomly allocate from the full dataset. The remaining 80% is used for the model training phase, which requires us to define the training as well as the validation dataset. Therefore, for the training phase, we do a further 80/20 split on the data, where 80% of the training data is used for training and 20% for validation. See the following table.

Data Number of Samples Percentage of Original Dataset
Full 1,800 100
Train 1,152 64
Validation 288 16
Test 360 20
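
The two-stage split can be reproduced with a few lines of code. The following sketch uses scikit-learn with stand-in data; plug in however your images and annotations are actually loaded.

# Minimal sketch of the split described above: 20% held out for test, then an 80/20
# train/validation split on the remainder. The image and label lists are stand-ins.
from sklearn.model_selection import train_test_split

images = list(range(1800))           # stand-in for the 1,800 annotated images
labels = [i % 6 for i in images]     # stand-in for the six defect classes

train_val, test = train_test_split(images, test_size=0.20, random_state=42)
train, validation = train_test_split(train_val, test_size=0.20, random_state=42)

print(len(train), len(validation), len(test))   # 1152, 288, 360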

The output of each of the four approaches was a trained ML model. We plotted the results from each of the four approaches alongside the bounding boxes from the ground truth as well as from the DDN model, together with the confidence score for each class prediction.

A confidence score is provided as an evaluation standard. It indicates, as a percentage, the probability that the object of interest was detected correctly by the algorithm. The mAP is evaluated at different IoU (Intersection over Union) thresholds.

For the purpose of generating the mAP score against the test dataset, we deployed each model behind its own SageMaker real-time endpoint. Each inferencing test produced a mAP score.

A larger mAP score implies higher accuracy of the model test results. Clearly, the Type 2 (latest) models outperform the Type 1 (legacy) models in terms of accuracy, with or without HPO. Type 2 with HPO has a slight edge (mAP 0.375) over Type 2 without HPO (mAP 0.371).

We also measured the cost of training for each of the four approaches. We used the P3 instance types, specifically the ml.p3.2xlarge instances for each of the approaches. Each ml.p3.2xlarge instance costs $3.06/hour. Both the inference test mAP score and the cost of training are summarized in the following chart for comparison.

For simplicity, we did a cost comparison on the runtime of the training instances only.

For a more granular estimate of the total cost incurred, including the cost of Studio notebooks as well as the real-time endpoints used for inferencing, refer to the AWS Pricing Calculator for SageMaker.

The results indicate considerable gains in accuracy when moving from the Type 1 (legacy) to the Type 2 (latest) model. The mAP score went up from 0.067 to 0.371 without HPO and from 0.226 to 0.375 with HPO. The Type 2 model also took longer to train on the same instance type, implying that the accuracy gains also meant higher infrastructure cost. However, all of these approaches outperformed the DDN model (introduced in Visual inspection automation using Amazon SageMaker JumpStart) on all metrics. Training the Type 1 (legacy) model took 34 minutes, the Type 2 (latest) model took 1 hour, and the DDN model took over 8 hours. This indicates that fine-tuning a pre-trained model is much more efficient than training a model from scratch.
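
Using the quoted ml.p3.2xlarge rate of $3.06 per hour, the approximate training-instance cost of each run works out as follows (a back-of-the-envelope calculation; actual bills also include storage, notebooks, and endpoints):

# Back-of-the-envelope training cost from the figures quoted above
rate_per_hour = 3.06                  # ml.p3.2xlarge hourly rate in dollars
training_hours = {
    "Type 1 (legacy)": 34 / 60,       # ~34 minutes
    "Type 2 (latest)": 1.0,           # ~1 hour
    "DDN from scratch": 8.0,          # over 8 hours
}
for name, hours in training_hours.items():
    print(f"{name}: ~${rate_per_hour * hours:.2f}")
# Roughly $1.73, $3.06, and $24.48+ respectively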

We also found that HPO (SageMaker automatic model tuning) is extremely effective, especially for models with large hyperparameter search spaces: it yielded a 4x improvement in mAP score for the Type 1 (legacy) model. We obtained much better model accuracy when tuning three hyperparameters (learning rate, momentum, and weight decay) for the Type 1 (legacy) model than when tuning only one (the ADAM learning rate) for the Type 2 (latest) model, because the relatively larger search space leaves more room for improvement. However, we need to trade off model performance against infrastructure cost and training time when running HPO.

Conclusion

In this post, we walked through the many ML model training options available with SageMaker and focused specifically on SageMaker built-in algorithms and pre-trained models. We introduced the Type 1 (legacy) and Type 2 (latest) models. The built-in SageMaker object detection models discussed in this post were pre-trained on large-scale datasets: the ImageNet dataset includes 14,197,122 images across 21,841 categories, and the PASCAL VOC dataset includes 11,530 images across 20 categories. The pre-trained models have learned rich and diverse low-level features, can efficiently transfer that knowledge to fine-tuned models, and can focus on learning high-level semantic features for the target dataset. You can find all built-in algorithms and fine-tunable pre-trained models in the Built-in Algorithms with pre-trained Model table and choose one for your use case. The use cases span from text summarization and question answering to computer vision and regression or classification.

In the beginning, we asserted that fine-tuning a SageMaker pre-trained model takes a fraction of the training time required to train a model from scratch. We trained a DDN model from scratch and introduced two types of SageMaker built-in algorithms with pretrained models: Type 1 (legacy) and Type 2 (latest). We further showcased four approaches, two of which used SageMaker automatic model tuning, and arrived at the most performant model. When considering both training time and runtime cost, all SageMaker built-in algorithms outperformed the DDN model, validating our assertion.

Although both the Type 1 (legacy) and Type 2 (latest) models outperformed the DDN model trained from scratch, visual and numerical comparison confirmed that the Type 2 (latest) model, with or without HPO, outperforms the Type 1 (legacy) models. HPO had a big impact on accuracy for Type 1 models, whereas Type 2 models saw only modest gains from HPO due to a constricted hyperparameter search space.

In summary, for certain use cases, fine-tuning a pretrained model is both more efficient and more performant. We suggest taking advantage of the SageMaker built-in pretrained models and fine-tuning them on your target datasets. To get started, you need a Studio environment. For more information, refer to the Studio Development Guide and make sure to enable SageMaker projects and JumpStart. When your Studio setup is complete, navigate to the Studio Launcher to find the full list of JumpStart solutions and models. To recreate or modify the experiment in this post, choose the Product Defect Detection solution, which comes prepackaged with the notebook used for the experiments. After you launch the solution, you can access the work described here in the notebook titled visual_object_detection.ipynb.


About the authors

Vedant Jain is a Sr. AI/ML Specialist Solutions Architect, helping customers derive value out of the Machine Learning ecosystem at AWS. Prior to joining AWS, Vedant has held ML/Data Science Specialty positions at various companies such as Databricks, Hortonworks (now Cloudera) & JP Morgan Chase. Outside of his work, Vedant is passionate about making music, using Science to lead a meaningful life & exploring delicious vegetarian cuisine from around the world.

Tao Sun is an Applied Scientist in Amazon Search. He obtained his Ph.D. in Computer Science from University of Massachusetts, Amherst. His research interests lie in deep reinforcement learning and probabilistic modeling. In the past, Tao worked for AWS Sagemaker Reinforcement Learning team and contributed to RL research and applications. Tao is now working on Page Template Optimization at Amazon Search.

InformedIQ automates verifications for Origence’s auto lending using machine learning

This post was co-written with Robert Berger and Adine Deford from InformedIQ.

InformedIQ is the leader in AI-based software used by the nation’s largest financial institutions to automate loan processing verifications and consumer credit applications in real time per the lenders’ policies. They improve regulatory compliance, reduce cost, and increase accuracy by decreasing human error rates that are caused by the repetitive nature of tasks. Informed partnered with Origence (the nation’s leading lending technology solutions and services provider for 1,130 credit unions serving over 64 million members) to power Origence’s document process automation functionality for indirect lending to automatically identify documents and validate financing policies, creating a better credit union and dealer experience for their network of over 15,000 dealers. To date, $110 billion in auto loans have originated with Informed’s automation, which is 8% of all US auto loans. Six of the top 10 consumer lenders trust Informed’s technology.

In this post, we learn about the challenges faced and how machine learning (ML) solved the problems.

Problem statement

Manual loan verification document processing is time-consuming. The verification includes consumer stipulations like proof of residence, identity, insurance, and income. It can be prone to human error due to the repetitive nature of tasks.

With ML and automation, Informed can provide a software solution that is available 24/7, over holidays and weekends. The solution works without conscious or unconscious bias, calculating and clearing stipulations in under 30 seconds with 99% accuracy, vs. an average of 7 days for manual loan verifications.

Solution overview

Informed uses a wide range of AWS offerings and capabilities, including Amazon SageMaker and Amazon Textract in their ML stack to power Origence’s document process automation functionality. The solution automatically extracts data and classifies documents (for example, driver’s license, paystub, W2 form, or bank statement), providing the required fields for the consumer verifications used to determine if the lender will grant the loan. Through accurate income calculations and validation of applicant data, loan documents, and documented classification, loans are processed faster and more accurately, with reduced human errors and fraud risk, and added operational efficiency. This helps in creating a better consumer, credit union, and dealer experience.

To classify and extract information needed to validate information in accordance with a set of configurable funding rules, Informed uses a series of proprietary rules and heuristics, text-based neural networks, and image-based deep neural networks, including Amazon Textract OCR via the DetectDocumentText API and other statistical models. The Informed API model can be broken down into five functional steps, as shown in the following diagram: image processing, classification, image feature computations, extractions, and stipulation verification rules, before determining the decision.

Given a sequence of pages for various document types (bank statement, driver’s license, paystub, SSI award letter, and so on), the image processing step performs the necessary image enhancements for each page and invokes multiple APIs, including Amazon Textract OCR for image to text conversion. The rest of the processing steps use the OCR text obtained from image processing and the image for each page.
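
For illustration, the following is a minimal boto3 sketch of the Amazon Textract DetectDocumentText call referenced above; the bucket and document names are placeholders, not Informed's actual pipeline.

# Minimal sketch of the DetectDocumentText API (boto3) for a single scanned page in S3
import boto3

textract = boto3.client("textract")

response = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "my-loan-docs", "Name": "applications/paystub-001.png"}}
)

# LINE blocks carry the OCR text that downstream classification and extraction steps consume
lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
print("\n".join(lines[:10]))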

Main advantages

Informed provides solutions to the auto lending industry that reduce manual processes, support compliance and quality, mitigate risk, and deliver significant cost savings to their customers. Let’s dive into two main advantages of the solution.

Automation at scale with efficiency

The adoption of AWS Cloud technologies and capabilities has helped Informed address a wider range of document types and onboard new partners. Informed has developed integrated, AI/ML-enabled solutions, and continuously strives for innovation to better serve clients.

Almost the entirety of the Informed SaaS service is hosted and enabled by AWS services. Informed is able to offload the undifferentiated heavy lifting for scalable infrastructure and focus on their business objectives. Their architecture includes load balancers, Amazon API Gateway, Amazon Elastic Container Service (Amazon ECS) containers, serverless AWS Lambda, Amazon DynamoDB, and Amazon Relational Database Service (Amazon RDS), in addition to ML technologies like Amazon Textract and SageMaker.

Reducing cost in document extraction

Informed uses new features from Amazon Textract to improve the accuracy of data extraction from documents such as bank statements and paystubs. Amazon Textract is an AI/ML service that automatically extracts text, handwriting, and other forms of metadata from scanned documents, forms, and tables in ways that make further ML processing more efficient and accurate. Informed uses the Amazon Textract OCR and Analyze Document APIs for both tables and forms as part of the verification process. Informed's artificial intelligence modeling engine performs complex calculations, ensuring accuracy, identifying omissions, and combating fraud. With AWS, they continue to advance the accuracy and speed of the solution, helping lenders become more efficient by lowering loan processing costs and reducing time to process and fund. With a 99% accuracy rate for field prediction, dealers and credit unions can now focus less on collecting and validating data and more on developing strong customer relationships.
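
A minimal sketch of the Analyze Document call with the TABLES and FORMS feature types looks like the following; the file name is a placeholder, and a real pipeline would typically read from S3 and post-process the returned blocks into key-value pairs and table cells.

# Minimal sketch of the AnalyzeDocument API (boto3) with tables and forms enabled
import boto3

textract = boto3.client("textract")

with open("bank-statement-page1.png", "rb") as f:
    document_bytes = f.read()

response = textract.analyze_document(
    Document={"Bytes": document_bytes},
    FeatureTypes=["TABLES", "FORMS"],
)

# Count the structural blocks returned; KEY_VALUE_SET blocks back the form fields,
# while TABLE and CELL blocks back the extracted tables
kinds = {}
for block in response["Blocks"]:
    kinds[block["BlockType"]] = kinds.get(block["BlockType"], 0) + 1
print(kinds)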

“Partnering with Informed.IQ to integrate their leading AI-based technology allows us to advance our lending systems’ capabilities and performance, further streamlining the overall loan process for our credit unions and their members”

– Brian Hendricks, Chief Product Officer at Origence.

Conclusion

Informed is constantly improving the accuracy, efficiency, and breadth of their automated loan document verifications. This solution can benefit any lending document verification process like personal and student loans, HELOCs, and powersports. The adoption of AWS Cloud technologies and capabilities has helped Informed address the growing complexity of the lending process and improve the dealer and customer experience. With AWS, the company continues to add enhancements that help lenders become more efficient, lower loan processing costs, and provide serverless computing.

Now that you have learned about how ML and automation can solve the loan document verification process, you can get started using Amazon Textract. You can also try out intelligent document processing workshops. Visit Automated data processing from documents to learn more about reference architectures, code samples, industry use cases, blog posts, and more.


About the authors

Robert Berger is the Chief Architect at InformedIQ. He is leading the transformation of the InformedIQ SaaS into a full Serverless Microservice architecture leveraging AWS Cloud, DevOps and Data Oriented Programming. Principal or founder in several other start-ups including InterNex, MetroFi, UltraDevices, Runa, Mist Systems and Omnyway.

Adine Deford is the VP of Marketing at Informed.IQ. She has more than 25 years of technology marketing experience serving industry leaders, world class marketing agencies and technology start-ups.

Jessica Oliveira is an Account Manager at AWS who provides guidance and support to SMB customers in Northern California. She is passionate about building strategic collaborations to help ensure her customer’s success. Outside of work, she enjoys traveling, learning about different languages and cultures, and spending time with her family.

Malini Chatterjee is a Senior Solutions Architect at AWS. She provides guidance to AWS customers on their workloads across a variety of AWS technologies. She brings a breadth of expertise in Data Analytics and Machine Learning. Prior to joining AWS she was architecting data solutions in financial industries. She is very interested in Amazon Future Engineer program enabling middle-school, high-school kids see the art of the possible in STEM. She is very passionate about semi-classical dancing and performs in community events. She loves traveling and spending time with her family.

Fall Into October With 25 New Games Streaming on GeForce NOW

Cooler weather, the changing colors of the leaves, the needless addition of pumpkin spice to just about everything, and discount Halloween candy are just some things to look forward to in the fall.

GeForce NOW members can add one more thing to the list — 25 games joining the cloud gaming library in October, including day-and-date releases like A Plague Tale: Requiem, Victoria 3 and others.

Let’s start off the cooler months with the six games streaming on GeForce NOW today.

Arriving in October

There’s a heap of gaming goodness in store for GeForce NOW members this month.

A tale continues when A Plague Tale: Requiem releases Tuesday, Oct. 18, enhanced with ray-traced effects for RTX 3080 and Priority members.

After escaping their devastated homeland in the critically acclaimed A Plague Tale: Innocence, siblings Amicia and Hugo venture south of 14th-century France to new regions and vibrant cities. But when Hugo’s powers reawaken, death and destruction return in a flood of devouring rats. Forced to flee once more, the siblings place their hopes in a prophesied island that may hold the key to saving Hugo.

The new adventure begins soon — streaming to even Macs and mobile devices with the power of the cloud — so make sure to add the game to your wishlist to start playing when it’s released.

On top of that, check out the rest of the games coming this month:

  • Asterigos: Curse of the Stars (New release on Steam, Oct. 11)
  • Kamiwaza: Way of the Thief (New release on Steam, Oct. 11)
  • Ozymandias: Bronze Age Empire Sim (New release on Steam, Oct. 11)
  • LEGO Bricktales (New release on Steam, Oct. 12)
  • PC Building Simulator 2 (New release on Epic Games Store, Oct. 12)
  • The Last Oricru (New release on Steam, Oct. 13)
  • Scorn (New release on Steam and Epic Games Store, Oct. 14)
  • A Plague Tale: Requiem (New release on Steam and Epic Games Store, Oct. 18)
  • Warhammer 40,000: Shootas, Blood & Teef (New release on Steam, Oct. 20)
  • FAITH: The Unholy Trinity (New release on Steam, Oct. 21)
  • Victoria 3 (New release on Steam, Oct. 25)
  • The Unliving (New release on Steam, Oct. 31)
  • Commandos 3 – HD Remaster (Steam and Epic Games Store)
  • Draw Slasher (Steam)
  • Guild Wars: Game of the Year (Steam)
  • Guild Wars: Trilogy (Steam)
  • Labyrinthine (Steam)
  • Volcanoids (Steam)
  • Monster Outbreak (Steam and Epic Games Store)

Gotta Go Fast

The great thing about GFN Thursday is that there are new games every week, so there’s no need to wait until Halloween to treat yourself to great gaming. Six games arrive today, including the new release of Dakar Desert Rally with support for NVIDIA DLSS technology.

Dakar Desert Rally on GeForce NOW
Honestly, don’t even bother going to the car wash. You’ll just get it dirty again.

Dakar Desert Rally captures the speed and excitement of Amaury Sport Organisation’s largest rally race, with a wide variety of licensed vehicles from the world’s top makers. An in-game dynamic weather system means racers will need to overcome the elements as well as the competition to win. Unique challenges and fierce, online multiplayer races are available for all members, whether an off-road simulation diehard or a casual racing fan.

This week also brings the latest season of Ubisoft’s Roller Champions. “Dragon’s Way” includes new maps, effects, cosmetics, emotes, gear and other seasonal goodies to bring out gamers’ inner beasts.

Here’s the full list of new games coming to the cloud this week:

  • Marauders (New release on Steam)
  • Dakar Desert Rally (New release on Steam)
  • Lord of Rigel (New release on Steam)
  • Priest Simulator (New release on Steam)
  • Barotrauma (Steam)
  • Black Desert Online – North America and Europe (Pearl Abyss Launcher)

Pssst – Wake Up, September Ended

Don’t sleep on these extra 13 titles that came to the cloud on top of the 22 games announced in September.

For some frightful fun as we enter Spooky Season, let us know what game still haunts your dreams on Twitter or in the comments below.

Low-Rank Optimal Transport: Approximation, Statistics and Debiasing

The matching principles behind optimal transport (OT) play an increasingly important role in machine learning, a trend which can be observed when OT is used to disambiguate datasets in applications (e.g. single-cell genomics) or used to improve more complex methods (e.g. balanced attention in transformers or self-supervised learning). To scale to more challenging problems, there is a growing consensus that OT requires solvers that can operate on millions, not thousands, of points. The low-rank optimal transport (LOT) approach advocated in (Scetbon et al., 2021) holds several promises in that… (Apple Machine Learning Research)

Prevent account takeover at login with the new Account Takeover Insights model in Amazon Fraud Detector

Digital is the new normal, and there’s no going back. Every year, consumers visit, on average, 191 websites or services requiring a user name and password, and the digital footprint is expected to grow exponentially. So much exposure naturally brings added risks like account takeover (ATO).

Each year, bad actors compromise billions of accounts through stolen credentials, phishing, social engineering, and multiple forms of ATO. To put it into perspective: account takeover fraud increased by 90% to an estimated $11.4 billion in 2021 compared with 2020. Beyond the financial impact, ATOs damage the customer experience, threaten brand loyalty and reputation, and strain fraud teams as they manage chargebacks and customer claims.

Many companies, even those with sophisticated fraud teams, use rules-based solutions to detect compromised accounts because they’re simple to create. To bolster their defenses and reduce friction for legitimate users, businesses are increasingly investing in AI and machine learning (ML) to detect account takeovers.

AWS can help you improve your fraud mitigation with solutions like Amazon Fraud Detector. This fully managed AI service allows you to identify potentially fraudulent online activities by enabling you to train custom ML fraud detection models without ML expertise.

This post discusses how to create a real-time detector endpoint using the new Account Takeover Insights (ATI) model in Amazon Fraud Detector.

Overview of solution

Amazon Fraud Detector relies on specific models with tailored algorithms, enrichments, and feature transformations to detect fraudulent events across multiple use cases. The newly launched ATI model is a low-latency fraud detection ML model designed to detect potentially compromised accounts and ATO fraud. The ATI model detects up to four times more ATO fraud than traditional rules-based account takeover solutions while minimizing the level of friction for legitimate users.

The ATI model is trained using a dataset containing your business’s historical login events. Event labels are optional for model training because the ATI model uses an innovative approach to unsupervised learning. The model differentiates events generated by the actual account owner (legit events) from those generated by bad actors (anomalous events).

Amazon Fraud Detector derives the user’s past behavior by continuously aggregating the data provided. Examples of user behavior include the number of times the user signed in from a specific IP address. With these additional enrichments and aggregates, Amazon Fraud Detector can generate strong model performance from a small set of inputs from your login events.

For a real-time prediction, you call the GetEventPrediction API after a user presents valid login credentials to quantify the risk of ATO. In response, you receive a model score between 0 and 1000, where 0 indicates low fraud risk and 1000 indicates high fraud risk, along with an outcome based on a set of business rules you define. You can then take the appropriate action on your end: approve the login, deny the login, or challenge the user by enforcing an additional identity verification.
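
As a minimal sketch, a real-time check might look like the following boto3 call; the detector, event type, entity, and variable names are placeholders that must match what you define in Amazon Fraud Detector in the steps below.

# Minimal sketch of a real-time ATO risk check with the GetEventPrediction API (boto3)
import datetime
import boto3

frauddetector = boto3.client("frauddetector")

response = frauddetector.get_event_prediction(
    detectorId="ato_login_detector",            # placeholder detector
    detectorVersionId="1",
    eventId="login-000123",
    eventTypeName="account_login",              # placeholder event type
    entities=[{"entityType": "customer", "entityId": "customer-456"}],
    eventTimestamp=datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"),
    eventVariables={
        "ip_address": "198.51.100.24",
        "useragent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "validcred": "true",
    },
)

score = response["modelScores"][0]["scores"]       # e.g. a 0-1000 insight score per model
outcomes = response["ruleResults"][0]["outcomes"]  # e.g. ["approve"], ["challenge"], or ["deny"]
print(score, outcomes)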

You can also use the ATI model to asynchronously evaluate account logins and take action based on the outcome, such as adding the account to an investigation queue so a human reviewer can determine if further action should be taken.

The following steps outline the process of training an ATI model and publishing a detector endpoint to generate fraud predictions:

  • Prepare and validate the data.
  • Define the entity, event and event variables, and event label (optional).
  • Upload event data.
  • Initiate model training.
  • Evaluate the model.
  • Create a detector endpoint and define business rules.
  • Get real-time predictions.

Prerequisites

Before getting started, complete the following prerequisite steps:

Prepare and validate the data

Amazon Fraud Detector requires that you provide your user account login data in a CSV file encoded in the UTF-8 format. For the ATI, you must provide certain event metadata and event variables in the header line of your CSV file.

The required event metadata is as follows:

  • EVENT_ID – A unique identifier for the login event.
  • ENTITY_TYPE – The entity that performs the login event, such as a merchant or a customer.
  • ENTITY_ID – An identifier for the entity performing the login event.
  • EVENT_TIMESTAMP – The timestamp when the login event occurred. The timestamp format must be in ISO 8601 standard in UTC.
  • EVENT_LABEL (optional) – A label that classifies the event as fraudulent or legitimate. You can use any labels, such as fraud, legit, 1, or 0.

Event metadata must be in uppercase letters. Labels aren’t required for login events. However, we recommend including EVENT_LABEL metadata and providing labels for your login events if available. If you provide labels, Amazon Fraud Detector uses them to automatically calculate an Account Takeover Discovery Rate and display it in the model performance metrics.

The ATI model has both required and optional variables. Event variable names must be in lowercase letters.

The following table summarizes the mandatory variables.

Category Variable type Description
IP address IP_ADDRESS The IP address used in the login event
Browser and device USERAGENT The browser, device, and OS used in the login event
Valid credentials VALIDCRED Indicates if the credentials that were used for login are valid

The following table summarizes the optional variables.

Category Type Description
Browser and device FINGERPRINT The unique identifier for a browser or device fingerprint
Session ID SESSION_ID The identifier for an authentication session
Label EVENT_LABEL A label that classifies the event as fraudulent or legitimate (such as fraud, legit, 1, or 0)
Timestamp LABEL_TIMESTAMP The timestamp when the label was last updated; this is required if EVENT_LABEL is provided

You can provide additional variables. However, Amazon Fraud Detector won’t include these variables for training an ATI model.
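
To make the format concrete, a hypothetical CSV might begin like the following (the header combines the uppercase metadata fields with the lowercase variable names; all values are illustrative):

EVENT_ID,EVENT_TIMESTAMP,ENTITY_TYPE,ENTITY_ID,EVENT_LABEL,LABEL_TIMESTAMP,ip_address,useragent,validcred
login-000123,2021-06-01T08:15:00Z,customer,customer-456,0,2021-06-02T00:00:00Z,198.51.100.24,"Mozilla/5.0 (Windows NT 10.0; Win64; x64)",true
login-000124,2021-06-01T09:02:11Z,customer,customer-789,1,2021-06-02T00:00:00Z,203.0.113.7,"Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X)",true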

Dataset preparation

As you start to prepare your login data, you must meet the following requirements:

  • Provide at least 1,500 entities (individual user accounts), each with at least two associated login events
  • Your dataset must cover at least 30 days of login events

The following configurations are optional:

  • Your dataset can include examples of unsuccessful login events
  • You can optionally label these unsuccessful logins as fraudulent or legitimate
  • You can prepare historical data with login events spanning more than 6 months and include 100,000 entities

We provide a sample dataset for testing purposes that you can use to get started.

Data validation

Before creating your ATI model, Amazon Fraud Detector checks if the metadata and variables you included in your dataset for training the model meet the size and format requirements. For more information, see Dataset validation. If the dataset doesn’t pass validation, a model isn’t created. For details on common dataset errors, see Common event dataset errors.

Define the entity, event type, and event variables

In this section, we walk through the steps to create an entity, event type, and event variables. Optionally, you can also define event labels.

Define the entity

The entity defines who is performing the event. To create an entity, complete the following steps:

  • On the Amazon Fraud Detector console, in the navigation pane, choose Entities.
  • Choose Create.
  • Enter an entity name and optional description.
  • Choose Create entity.

Define the event and event variables

An event is a business activity evaluated for fraud risk; this event is performed by the entity we just created. The event type defines the structure for an event sent to Amazon Fraud Detector, including variables of the event, the entity performing the event, and, if available, the labels that classify the event.

To create an event, complete the following steps:

  • On the Amazon Fraud Detector console, in the navigation pane, choose Events.
  • Choose Create.
  • For Name, enter a name for your event type.
  • For Entity, choose the entity created in the previous step.

Define the event variables

For event variables, complete the following steps:

  • In the Create IAM role section, enter the specific bucket name where you uploaded your training data.
    The name of the S3 bucket must be the name where you uploaded your dataset. Otherwise, you get an access denied exception error.
  • Choose Create role.

  • For Data location, enter the path to your training data (the S3 URI you copied during the prerequisite steps), and choose Upload.

Amazon Fraud Detector extracts the headers from your training dataset and creates a variable for each header. Make sure to assign the variable to the correct variable type. As part of the model training process, Amazon Fraud Detector uses the variable type associated with the variable to perform variable enrichment and feature engineering. For more details about variable types, see Variable types.

Define event labels (optional)

Labels are used to categorize individual events as either fraud or legitimate. Event labels are optional for model training because the ATI model uses an innovative approach to unsupervised learning. The model differentiates events generated by the actual account owner (legit events) from those generated by abusive actors (anomalous events). We recommend you include EVENT_LABEL metadata and provide labels for your login events if available. If you provide labels, Amazon Fraud Detector uses them to automatically calculate an Account Takeover Discovery Rate and display it in the model performance metrics.

To create the event labels, complete the following steps:

  • Define two labels (for this post, 1 and 0).
  • Choose Create event type.

Upload event data

In this section, we walk through the steps to upload event data to the service for model training.

ATI models are trained on a dataset stored internally in Amazon Fraud Detector. By storing event data in Amazon Fraud Detector, you can train models that use auto-computed variables to improve performance, simplify model retraining, and update fraud labels to close the machine learning feedback loop. See Stored events for more information on storing your event dataset with Amazon Fraud Detector.

After you define your event, navigate to the Stored events tab. On the Stored events tab, you can see information about your dataset, such as the number of events stored and the total size of the dataset in MB. Because you just created this event type, there are no stored events yet. On this page, you can turn event ingestion on or off. When event ingestion is on, you can upload historical event data to Amazon Fraud Detector and automatically store event data from predictions in real time.

The easiest way to store historical data is by uploading a CSV file and importing the events. Alternatively, you can stream the data into Amazon Fraud Detector using the SendEvent API (see our GitHub repository for sample notebooks). To import the event from a CSV file, complete the following steps:

  • Under Import events data, choose New import.
    You likely need to create a new IAM role. The import events feature requires both read and write access to Amazon S3.

  • Create a new IAM role and provide the S3 buckets for input and output files.
    The IAM role you create grants Amazon Fraud Detector access to these buckets to read input files and store output files. If you don’t plan to store output files in a separate bucket, enter the same bucket name for both.
  • Choose Create role.

  • Enter the location of the CSV file that contains your event data. This should be the S3 URI you copied earlier.
  • Choose Start to start importing the events.

The import time varies based on the number of events you’re importing. For a dataset with 20,000 events, the process takes around 12 minutes, and after you refresh the page, the status changes to Completed. If the status changes to Error, choose the job name to show why the import failed.
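
For reference, streaming a single event with the SendEvent API mentioned above looks like the following minimal boto3 sketch; the values mirror the hypothetical CSV example shown earlier.

# Minimal sketch of streaming one historical login event with the SendEvent API (boto3)
import boto3

frauddetector = boto3.client("frauddetector")

frauddetector.send_event(
    eventId="login-000125",
    eventTypeName="account_login",               # placeholder event type
    eventTimestamp="2021-06-01T10:30:00Z",
    eventVariables={
        "ip_address": "192.0.2.44",
        "useragent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
        "validcred": "true",
    },
    assignedLabel="0",                           # optional: label the event if you have one
    labelTimestamp="2021-06-02T00:00:00Z",       # required when assignedLabel is provided
    entities=[{"entityType": "customer", "entityId": "customer-456"}],
)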

Initiate model training

After successfully importing the events, you have all the pieces to initiate model training. To train a model, complete the following steps:

  • On the Amazon Fraud Detector console, in the navigation pane, choose Models.
  • Choose Add model and select Create model.
  • For Model name, enter the desired name for your model.
  • For Model type, select Account Takeover Insights.
  • For Event type, choose the event type you created earlier.

  • Under Historical event data, you can specify the date range of events to train the model if needed.
  • Choose Next.

  • Configure training by identifying the variables used as inputs to the model (for this post, we include all available variables).
  • After evaluating the variables, choose Next.

It’s a best practice to include all the available variables, even if you’re unsure about their value to the model. After the model is trained, Amazon Fraud Detector provides a ranked list of each variable’s impact on the model performance, so you know whether to include that variable in future model training. If labels are provided, Amazon Fraud Detector uses them to evaluate and display model performance in terms of the model’s discovery rate.

If labels aren’t provided, Amazon Fraud Detector uses negative sampling to provide examples of anomalous login attempts, which helps the model distinguish between legitimate and fraudulent activities. This produces precise risk scores and reduces the number of legitimate activities that are incorrectly flagged.

After reviewing the model configured in the first two steps, choose Create and train the model.

You can see the model in Training status on the console page. Creating and training the model takes approximately 45 minutes. When the model has finished training, you can check model performance by choosing the model version.
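
Training can also be started programmatically. The following is a sketch only; the model ID (sample_ato_model), the event type, and the variable list are the placeholders used in this post, and the label mapping assumes the 1/0 labels defined earlier:

import boto3

fraudDetector = boto3.client('frauddetector')

# Register the model and tie it to the event type created earlier
fraudDetector.create_model(
    modelId='sample_ato_model',
    modelType='ACCOUNT_TAKEOVER_INSIGHTS',
    eventTypeName='sample_login'
)

# Train version 1.0 on the events stored in Amazon Fraud Detector
fraudDetector.create_model_version(
    modelId='sample_ato_model',
    modelType='ACCOUNT_TAKEOVER_INSIGHTS',
    trainingDataSource='INGESTED_EVENTS',
    trainingDataSchema={
        'modelVariables': ['ip_address', 'email_address'],
        'labelSchema': {
            'labelMapper': {'FRAUD': ['1'], 'LEGIT': ['0']}
        }
    }
)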

Evaluate model performance and deploy the model

In this section, we walk through the steps to review and evaluate the model performance.

Amazon Fraud Detector validates model performance using 15% of your data that wasn’t used to train the model and provides performance metrics. You need to consider these metrics and your business objectives to define a threshold that aligns with your business model. For further details on the metrics and how to determine thresholds, see Model performance metrics.

ATI is an anomaly detection model rather than a classification model; therefore, the evaluation metrics differ from classification models. When your ATI model has finished training, you can see the Anomaly Separation Index (ASI), a holistic measure of the model’s ability to identify high-risk anomalous logins. An ASI of 75% or more is considered good, 90% or more is considered high, and below 75% is considered poor.

To assist in choosing the right balance, Amazon Fraud Detector provides the following metrics to evaluate ATI model performance:

  • Anomaly Separation Index (ASI) – Summarizes the overall ability of the model to separate anomalous activities from the expected behavior of users. A model with no separability power will have the lowest possible ASI score of 0.5. In contrast, the model with a high separability power will have the highest possible ASI score of 1.0.
  • Challenge Rate (CR) – At the selected score threshold, the percentage of login events that the model recommends challenging, in the form of a one-time password, multi-factor authentication, identity verification, an investigation, and so on.
  • Anomaly Discovery Rate (ADR) – Quantifies the percentage of anomalies the model can detect at the selected score threshold. A lower score threshold increases the percentage of anomalies captured by the model, but it also requires challenging a larger percentage of login events, leading to higher customer friction.
  • ATO Discovery Rate (ATODR) – Quantifies the percentage of account compromise events that the model can detect at the selected score threshold. This metric is only available if 50 or more entities with at least one labeled ATO event are present in the ingested dataset.

In the following example, we have an ASI of 0.96 (high), which indicates a strong ability to separate anomalous activities from the normal behavior of users. By writing a rule using a model score threshold of 500, you challenge or create friction on 6% of all login activities while catching 96% of anomalous activities.

Another important metric is the model variable importance. Variable importance gives you an understanding of how the different variables relate to the model performance. You can have two types of variables: raw and aggregate variables. Raw variables are the ones that were defined based on the dataset, whereas aggregate variables are a combination of multiple variables that are enriched and have an aggregated importance value.

For more information about variable importance, see Model variable importance.

A variable (raw or aggregate) with a much higher importance value than the rest could indicate that the model is overfitting, whereas variables with relatively low values could just be noise.

After reviewing the model performance and deciding what model score thresholds align with your business model, you can deploy the model version. For that, on the Actions menu, choose Deploy model version. With the model deployed, we create a detector endpoint and perform real-time prediction.
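
If you manage deployments with the API instead of the console, the equivalent call is roughly the following (the model ID and version number are assumptions for this post):

import boto3

fraudDetector = boto3.client('frauddetector')

# Activate (deploy) the trained model version
fraudDetector.update_model_version_status(
    modelId='sample_ato_model',
    modelType='ACCOUNT_TAKEOVER_INSIGHTS',
    modelVersionNumber='1.0',
    status='ACTIVE'
)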

Create a detector endpoint and define business rules

Amazon Fraud Detector uses detector endpoints to generate fraud predictions. A detector contains the detection logic, such as trained models and business rules, for a specific event you want to evaluate for fraud. The detection logic uses rules to tell Amazon Fraud Detector how to interpret the data associated with the model.

To create a detector, complete the following steps:

  • On the Amazon Fraud Detector console, in the navigation pane, choose Detectors.
  • Choose Create detector.
  • For Detector name, enter a name.
  • Optionally, describe your detector.
  • For Event type, choose the same event type as the model created earlier.
  • Choose Next.

  • On the Add model (optional) page, choose Add model.

  • To add a model, choose the model you trained and published during the model training steps and choose the active version.
  • Choose Add model.

As part of the next step, you create the business rules that define an outcome. A rule is a condition that tells Amazon Fraud Detector how to interpret variable values during a fraud prediction. A rule consists of one or more variables, a logic expression, and one or more outcomes. An outcome is the result of a fraud prediction and is returned if the rule matches during an evaluation.

  • Define decline_rule as $your_model_name_insightscore >= 950 with outcome deny_login.
  • Define friction_rule as $your_model_name_insightscore >= 855 and $your_model_name_insightscore < 950 with outcome challenge_login.
  • Define approve_rule as $your_model_name_insightscore < 855 with outcome approve_login.

Outcomes are strings returned in the GetEventPrediction API response. You can use outcomes to trigger events in calling applications and downstream systems, or simply to identify which events are likely to be fraudulent or legitimate.

  • On the Add Rules page, choose Next after you finish adding all your rules.

  • In the Configure rule execution section, choose the mode for your rules engine.
    The Amazon Fraud Detector rules engine has two modes: first matched or all matched. First matched mode is for sequential rule runs, returning the outcome for the first condition met. The other mode is all matched, which evaluates all rules and returns outcomes from all the matching rules. In this example, we use the first matched mode for our detector.

After this process, you’re ready to create your detector and run some tests.

  • To run a test, go to your newly created detector and choose the detector version you want to use.
  • Provide the variable values as requested and choose Run test.

As a result of the test, you receive the risk score and the outcome based on your business rules.

You can also search past predictions by going to the left panel and choosing Search past predictions. The prediction is based on each variable’s contribution to the overall likelihood of a fraudulent event. The following screenshot is an example of a past prediction showing the input variables and how they influenced the fraud prediction score.

Get real-time predictions

To get real-time predictions and integrate Amazon Fraud Detector into your workflow, we need to publish the detector endpoint. Complete the following steps:

  • Go to the newly created detector and choose the detector version, which will be version 1.
  • On the Actions menu, choose Publish.

You can perform real-time predictions with the published detector by calling the GetEventPrediction API. The following is a sample Python code for calling the GetEventPrediction API:

import boto3

fraudDetector = boto3.client('frauddetector')

# Evaluate a single event against the published detector
response = fraudDetector.get_event_prediction(
    detectorId='sample_detector',
    eventId='802454d3-f7d8-482d-97e8-c4b6db9a0428',
    eventTypeName='sample_transaction',
    eventTimestamp='2021-01-13T23:18:21Z',
    entities=[{'entityType': 'customer', 'entityId': '12345'}],
    eventVariables={
        'email_address': 'johndoe@exampledomain.com',
        'ip_address': '1.2.3.4'
    }
)
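
The response contains the model scores and the outcomes of the rules that matched. Assuming response holds the return value of the call above, a minimal way to read it is:

# Model insight scores for each model attached to the detector
for model_score in response['modelScores']:
    print(model_score['modelVersion']['modelId'], model_score['scores'])

# Outcomes of the rules that matched (for example, approve_login or challenge_login)
for rule_result in response['ruleResults']:
    print(rule_result['ruleId'], rule_result['outcomes'])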

Conclusion

Amazon Fraud Detector relies on specific models with tailored algorithms, enrichments, and feature transformations to detect fraudulent events across multiple use cases. In this post, you learned how to ingest data, train and deploy a model, write business rules, and publish a detector to generate real-time fraud prediction on potentially compromised accounts.

Visit Amazon Fraud Detector to learn more, or check out our GitHub repo for code samples, notebooks, and synthetic datasets.


About the authors

Marcel Pividal is a Sr. AI Services Solutions Architect in the World-Wide Specialist Organization. Marcel has more than 20 years of experience solving business problems through technology for Fintechs, Payment Providers, Pharma, and government agencies. His current areas of focus are Risk Management, Fraud Prevention, and Identity Verification.

Mike Ames is a data scientist turned identity verification solution specialist with extensive experience developing machine learning and AI solutions to protect organizations from fraud, waste, and abuse. In his spare time, you can find him hiking, mountain biking, or playing frisbee with his dog Max.


Metrics for evaluating content moderation in Amazon Rekognition and other content moderation services

Content moderation is the process of screening and monitoring user-generated content online. To provide a safe environment for both users and brands, platforms must moderate content to ensure that it falls within preestablished guidelines of acceptable behavior that are specific to the platform and its audience.

When a platform moderates content, acceptable user-generated content (UGC) can be created and shared with other users. Inappropriate, toxic, or banned behaviors can be prevented, blocked in real time, or removed after the fact, depending on the content moderation tools and procedures the platform has in place.

You can use Amazon Rekognition Content Moderation to detect content that is inappropriate, unwanted, or offensive, to create a safer user experience, provide brand safety assurances to advertisers, and comply with local and global regulations.

In this post, we discuss the key elements needed to evaluate the performance of a content moderation service in terms of various accuracy metrics, and provide an example using the Amazon Rekognition Content Moderation APIs.

What to evaluate

When evaluating a content moderation service, we recommend the following steps.

Before you can evaluate the performance of the API on your use cases, you need to prepare a representative test dataset. The following are some high-level guidelines:

  • Collection – Take a large enough random sample (images or videos) of the data you eventually want to run through Amazon Rekognition. For example, if you plan to moderate user-uploaded images, you can take a week’s worth of user images for the test. We recommend choosing a set that has enough images without getting too large to process (such as 1,000–10,000 images), although larger sets are better.
  • Definition – Use your application’s content guidelines to decide which types of unsafe content you’re interested in detecting from the Amazon Rekognition moderation concepts taxonomy. For example, you may be interested in detecting all types of explicit nudity and graphic violence or gore.
  • Annotation – Now you need human-generated ground truth for your test set using the chosen labels, so that you can compare machine predictions against it. This means that each image is annotated for the presence or absence of your chosen concepts. To annotate your image data, you can use Amazon SageMaker Ground Truth (GT) to manage image annotation. You can refer to the GT documentation for image labeling, consolidating annotations, and processing annotation output.

Get predictions on your test dataset with Amazon Rekognition

Next, you want to get predictions on your test dataset.

The first step is to decide on a minimum confidence score (a threshold value, such as 50%) at which you want to measure results. Our default threshold is set to 50, which offers a good balance between retrieving large amounts of unsafe content and not incurring too many false predictions on safe content. However, your platform may have different business needs, so you should customize this confidence threshold as needed. You can use the MinConfidence parameter in your API requests to balance detection of content (recall) against the accuracy of detection (precision). If you reduce MinConfidence, you are likely to detect most of the inappropriate content, but also likely to pick up content that isn’t actually inappropriate. If you increase MinConfidence, you are likely to ensure that all your detected content is truly inappropriate, but some content may not be tagged. We suggest experimenting with a few MinConfidence values on your dataset and quantitatively selecting the best value for your data domain.

Next, run each sample (image or video) of your test set through the Amazon Rekognition moderation API (DetectModerationLabels).
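
As a sketch of this step, the following loop flags each test image at your chosen MinConfidence. The bucket name and key list are placeholders for this post:

import boto3

rekognition = boto3.client('rekognition')

BUCKET = 'your-test-bucket'            # placeholder S3 bucket with your test images
image_keys = ['images/0001.jpg']       # placeholder list of S3 keys for the test set
MIN_CONFIDENCE = 50                    # the threshold discussed above

predictions = {}
for key in image_keys:
    response = rekognition.detect_moderation_labels(
        Image={'S3Object': {'Bucket': BUCKET, 'Name': key}},
        MinConfidence=MIN_CONFIDENCE
    )
    # An image is predicted "unsafe" if any moderation label meets the threshold
    predictions[key] = len(response['ModerationLabels']) > 0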

Measure model accuracy on images

You can assess the accuracy of a model by comparing human-generated ground truth annotations with the model predictions. You repeat this comparison for every image independently and then aggregate over the whole test set:

  • Per-image results – A model prediction is defined as the pair {label_name, confidence_score} (where the confidence score >= the threshold you selected earlier). For each image, a prediction is considered correct when it matches the ground truth (GT). A prediction is one of the following options:

    • True Positive (TP): both prediction and GT are “unsafe”
    • True Negative (TN): both prediction and GT are “safe”
    • False Positive (FP): the prediction says “unsafe”, but the GT is “safe”
    • False Negative (FN): the prediction is “safe”, but the GT is “unsafe”
  • Aggregated results over all images – Next, you can aggregate these predictions into dataset-level results:

    • False positive rate (FPR) – This is the percentage of images in the test set that are wrongly flagged by the model as containing unsafe content: (FP): FP / (TN+FP).
    • False negative rate (FNR) – This is the percentage of unsafe images in the test set that are missed by the model: (FN): FN / (FN+TP).
    • True positive rate (TPR) – Also called recall, this computes the percentage of unsafe content (ground truth) that is correctly discovered or predicted by the model: TP / (TP + FN) = 1 – FNR.
    • Precision – This computes the percentage of correct predictions (unsafe content) with regards to the total number of predictions made: TP / (TP+FP).

Let’s explore an example. Let’s assume that your test set contains 10,000 images: 9,950 safe and 50 unsafe. The model correctly predicts 9,800 out of 9,950 images as safe and 45 out of 50 as unsafe:

  • TP = 45
  • TN = 9800
  • FP = 9950 – 9800 = 150
  • FN = 50 – 45 = 5
  • FPR = 150 / (9800 + 150) = 0.015 = 1.5%
  • FNR = 5 / (5 + 45) = 0.1 = 10%
  • TPR/Recall = 45 / (45 + 5) = 0.9 = 90%
  • Precision = 45 / (45 + 150) = 0.23 = 23%
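
The same arithmetic in a few lines of Python, in case you want to check your own counts:

TP, TN, FP, FN = 45, 9800, 150, 5

fpr = FP / (TN + FP)        # 150 / 9950 ≈ 0.015 (1.5%)
fnr = FN / (FN + TP)        # 5 / 50 = 0.10 (10%)
recall = TP / (TP + FN)     # 45 / 50 = 0.90 (90%)
precision = TP / (TP + FP)  # 45 / 195 ≈ 0.23 (23%)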

Measure model accuracy on videos

If you want to evaluate the performance on videos, a few additional steps are necessary:

  1. Sample a subset of frames from each video. We suggest sampling uniformly with a rate of 0.3–1 frames per second (fps). For example, if a video is encoded at 24 fps and you want to sample one frame every 3 seconds (0.3 fps), you need to select one every 72 frames.
  2. Run these sampled frames through Amazon Rekognition content moderation. You can either use our video API, which already samples frames for you (at a rate of 3 fps), or use the image API, in which case you want to sample more sparsely. We recommend the latter option, given the redundancy of information in videos (consecutive frames are very similar).
  3. Compute the per-frame results as explained in the previous section (per-image results).
  4. Aggregate results over the whole test set. Here you have two options, depending on the type of outcome that matters for your business:
    1. Frame-level results – This considers all the sampled frames as independent images and aggregates the results exactly as explained earlier for images (FPR, FNR, recall, precision). If some videos are considerably longer than others, they will contribute more frames to the total count, making the comparison unbalanced. In that case, we suggest changing the initial sampling strategy to a fixed number of frames per video. For example, you could uniformly sample 50–100 frames per video (assuming videos are at least 2–3 minutes long).
    2. Video-level results – For some use cases, it doesn’t matter whether the model correctly predicts 50% or 99% of the frames in a video. Even a single wrong unsafe prediction on a single frame could trigger a downstream human evaluation, and only videos with 100% correct predictions are considered truly correct. If this is your use case, we suggest you compute FPR/FNR/TPR over the frames of each video and categorize each video as follows:
      • Total FP = 0 and Total FN = 0 (aggregated over all the frames of the video) – Perfect predictions
      • Total FP > 0 – False Positive (FP)
      • Total FN > 0 – False Negative (FN)

After you have computed these for each video independently, you can then compute all the metrics we introduced earlier:

  • The percentage of videos that are wrongly flagged (FP) or missed (FN)
  • Precision and recall
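
As a sketch of this aggregation (the per-video frame counts below are placeholder data), the video-level categorization and the share of wrongly flagged or missed videos could be computed as follows:

# Frame-level counts aggregated per video; replace with your own results
per_video = {
    'video_01': {'TP': 3, 'TN': 47, 'FP': 0, 'FN': 1},
    'video_02': {'TP': 0, 'TN': 50, 'FP': 2, 'FN': 0},
    'video_03': {'TP': 5, 'TN': 45, 'FP': 0, 'FN': 0},
}

total = len(per_video)
fp_videos = sum(1 for c in per_video.values() if c['FP'] > 0)                   # wrongly flagged
fn_videos = sum(1 for c in per_video.values() if c['FN'] > 0)                   # missed unsafe content
perfect = sum(1 for c in per_video.values() if c['FP'] == 0 and c['FN'] == 0)   # perfect predictions

print(f'Wrongly flagged videos (FP): {fp_videos / total:.1%}')
print(f'Missed videos (FN):          {fn_videos / total:.1%}')
print(f'Perfect predictions:         {perfect / total:.1%}')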

Measure performance against goals

Finally, you need to interpret these results in the context of your goals and capabilities.

First, consider your business needs in regards to the following:

  • Data – Learn about your data (daily volume, type of data, and so on) and the distribution of your unsafe vs. safe content. For example, is it balanced (50/50), skewed (10/90), or very skewed (1/99, meaning that only 1% is unsafe)? Understanding this distribution can help you define your actual metric goals. For example, the amount of safe content is often an order of magnitude larger than the amount of unsafe content (very skewed), making this almost an anomaly detection problem. In this scenario, the number of false positives may outnumber the number of true positives, and you can use your data information (distribution skewness, volume of data, and so on) to decide the FPR you can work with.
  • Metric goals – What are the most critical aspects of your business? Lowering the FPR often comes at the cost of a higher FNR (and vice versa) and it’s important to find the right balance that works for you. If you can’t miss any unsafe content, you likely want close to 0% FNR (100% recall). However, this will incur the largest number of false positives, and you need to decide the target (maximum) FPR you can work with, based on your post-prediction pipeline. You may want to allow some level of false negatives to be able to find a better balance and lower your FPR: for example, accepting a 5% FNR instead of 0% could reduce the FPR from 2% to 0.5%, considerably reducing the number of flagged contents.

Next, ask yourself what mechanisms you will use to parse the flagged images. Even though the APIs may not provide 0% FPR and FNR, they can still bring huge savings and scale (for example, by only flagging 3% of your images, you have already filtered out 97% of your content). When you pair the API with a downstream mechanism, like a human workforce that reviews the flagged content, you can easily reach your goals (for example, 0.5% flagged content). Note that this pairing is considerably cheaper than doing a human review on 100% of your content.

When you have decided on your downstream mechanisms, we suggest you evaluate the throughput that you can support. For example, if you have a workforce that can only verify 2% of your daily content, then your target goal from our content moderation API is a flag rate (FPR+TPR) of 2%.

Finally, if obtaining ground truth annotations is too hard or too expensive (for example, your volume of data is too large), we suggest annotating the small number of images flagged by the API. Although this doesn’t allow for FNR evaluations (because your data doesn’t contain any false negatives), you can still measure TPR and FPR.

In the following section, we provide a solution for image moderation evaluation. You can take a similar approach for video moderation evaluation.

Solution overview

The following diagram illustrates the various AWS services you can use to evaluate the performance of Amazon Rekognition content moderation on your test dataset.

The content moderation evaluation has the following steps:

  1. Upload your evaluation dataset into Amazon Simple Storage Service (Amazon S3).
  2. Use Ground Truth to assign ground truth moderation labels.
  3. Generate the predicted moderation labels using the Amazon Rekognition pre-trained moderation API with a few threshold values (for example, 70%, 75%, and 80%).
  4. Assess the performance for each threshold by computing true positives, true negatives, false positives, and false negatives. Determine the optimum threshold value for your use case.
  5. Optionally, you can tailor the size of the workforce based on true and false positives, and use Amazon Augmented AI (Amazon A2I) to automatically send all flagged content to your designated workforce for a manual review.

The following sections provide the code snippets for steps 1, 2, and 3. For complete end-to-end source code, refer to the provided Jupyter notebook.

Prerequisites

Before you get started, complete the following steps to set up the Jupyter notebook:

  1. Create a notebook instance in Amazon SageMaker.
  2. When the notebook is active, choose Open Jupyter.
  3. On the Jupyter dashboard, choose New, and choose Terminal.
  4. In the terminal, enter the following code:
    cd SageMaker
    git clone https://github.com/aws-samples/amazon-rekognition-code-samples.git

  5. Open the notebook for this post: content-moderation-evaluation/Evaluating-Amazon-Rekognition-Content-Moderation-Service.ipynb.
  6. Upload your evaluation dataset to Amazon Simple Storage Service (Amazon S3).

We will now go through steps 2 through 4 in the Jupyter notebook.

Use Ground Truth to assign moderation labels

To assign labels in Ground Truth, complete the following steps:

  1. Create a manifest input file for your Ground Truth job and upload it to Amazon S3.
  2. Create the labeling configuration, which contains all the moderation labels needed for the Ground Truth labeling job. To check the limit for the number of label categories you can use, refer to Label Category Quotas. In the following code snippet, we use five labels (refer to the hierarchical taxonomy used in Amazon Rekognition for more details) plus one label (Safe_Content) that marks content as safe:
    # Customize CLASS_LIST to include all labels used to classify the sample data (up to 10 labels).
    # Choose labels that map easily to the taxonomy supported by the content moderation service.
    
    CLASS_LIST = ["<label_1>", "<label_2>", "<label_3>", "<label_4>", "<label_5>", "Safe_Content"]
    print("Label space is {}".format(CLASS_LIST))
    
    json_body = {"labels": [{"label": label} for label in CLASS_LIST]}
    with open("class_labels.json", "w") as f:
        json.dump(json_body, f)
    
    s3.upload_file("class_labels.json", BUCKET, EXP_NAME + "/class_labels.json")

  3. Create a custom worker task template to provide the Ground Truth workforce with labeling instructions and upload it to Amazon S3.
    The Ground Truth label job is defined as an image classification (multi-label) task. Refer to the source code for instructions to customize the instruction template.
  4. Decide which workforce you want to use to complete the Ground Truth job. You have two options (refer to the source code for details):
    1. Use a private workforce in your own organization to label the evaluation dataset.
    2. Use a public workforce to label the evaluation dataset.
  5. Create and submit a Ground Truth labeling job. You can also adjust the following code to configure the labeling job parameters to meet your specific business requirements. Refer to the source code for complete instructions on creating and configuring the Ground Truth job.
    human_task_config = {
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn": acs_arn,
        },
        "PreHumanTaskLambdaArn": prehuman_arn,
        "MaxConcurrentTaskCount": 200,  # 200 images will be sent at a time to the workteam.
        "NumberOfHumanWorkersPerDataObject": 3,  # 3 separate workers will be required to label each image.
        "TaskAvailabilityLifetimeInSeconds": 21600,  # Your workteam has 6 hours to complete all pending tasks.
        "TaskDescription": task_description,
        "TaskKeywords": task_keywords,
        "TaskTimeLimitInSeconds": 180,  # Each image must be labeled within 3 minutes.
        "TaskTitle": task_title,
        "UiConfig": {
            "UiTemplateS3Uri": "s3://{}/{}/instructions.template".format(BUCKET, EXP_NAME),
        },
    }

After the job is submitted, you should see output similar to the following:

Labeling job name is: ground-truth-cm-1662738403

Wait for labeling job on the evaluation dataset to complete successfully, then continue to the next step.

Use the Amazon Rekognition moderation API to generate predicted moderation labels

The following code snippet shows how to use the Amazon Rekognition moderation API to generate moderation labels:

import boto3

client = boto3.client('rekognition')

def moderate_image(photo, bucket):
    # Return the number of moderation labels detected; a count > 0 means the image is flagged as unsafe
    response = client.detect_moderation_labels(Image={'S3Object': {'Bucket': bucket, 'Name': photo}})
    return len(response['ModerationLabels'])

Assess the performance

You first retrieved ground truth moderation labels from the Ground Truth labeling job results for the evaluation dataset, and then you ran the Amazon Rekognition moderation API to get predicted moderation labels for the same dataset. Because this is a binary classification problem (safe vs. unsafe content), we count the true positives, true negatives, false positives, and false negatives, treating unsafe content as the positive class.

We then calculate the corresponding evaluation metrics:

The following code snippet shows how to calculate those metrics:

FPR = FP / (FP + TN)
FNR = FN / (FN + TP)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
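
As a sketch of that computation, assuming you have reduced each image to a binary ground truth label and a binary prediction (True = unsafe, the positive class), the counts and metrics follow directly:

# Parallel lists of ground truth and predicted labels; placeholder data
gt = [True, False, False, True, False]
pred = [True, False, True, False, False]

TP = sum(1 for g, p in zip(gt, pred) if g and p)
TN = sum(1 for g, p in zip(gt, pred) if not g and not p)
FP = sum(1 for g, p in zip(gt, pred) if not g and p)
FN = sum(1 for g, p in zip(gt, pred) if g and not p)

FPR = FP / (FP + TN)
FNR = FN / (FN + TP)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)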

Conclusion

This post discusses the key elements needed to evaluate the performance aspect of your content moderation service in terms of various accuracy metrics. However, accuracy is only one of the many dimensions that you need to evaluate when choosing a particular content moderation service. It’s critical that you include other parameters, such as the service’s total feature set, ease of use, existing integrations, privacy and security, customization options, scalability implications, customer service, and pricing. To learn more about content moderation in Amazon Rekognition, visit Amazon Rekognition Content Moderation.


About the authors

Amit Gupta is a Senior AI Services Solutions Architect at AWS. He is passionate about enabling customers with well-architected machine learning solutions at scale.

Davide Modolo is an Applied Science Manager at AWS AI Labs. He has a PhD in computer vision from the University of Edinburgh (UK) and is passionate about developing new scientific solutions for real-world customer problems. Outside of work, he enjoys traveling and playing any kind of sport, especially soccer.

Jian Wu is a Senior Enterprise Solutions Architect at AWS. He’s been with AWS for 6 years working with customers of all sizes. He is passionate about helping customers to innovate faster via the adoption of the Cloud and AI/ML. Prior to joining AWS, Jian spent 10+ years focusing on software development, system implementation and infrastructure management. Aside from work, he enjoys staying active and spending time with his family.
