Facebook awards $200,000 to 2020 Internet Defense Prize winners at USENIX Security

This week, Facebook and USENIX awarded a total of $200,000 to the top three winners of the Internet Defense Prize at the 29th USENIX Security Symposium. Created in 2014, the award is funded by Facebook and offered in partnership with USENIX to celebrate contributions to the protection and defense of the internet.

This year we awarded a $100,000 first-place prize to Sathvik Prasad, Elijah Bouma-Sims, Athishay Kiran Mylappan, and Bradley Reaves at North Carolina State University for their work titled “Who’s Calling? Characterizing Robocalls through Audio and Metadata Analysis.” The paper discusses an 11-month “honeypot” study the team conducted to understand robocalls. The team’s findings surface potential technological and policy solutions to combat robocalling.

The second-place prize of $60,000 was awarded to Adam Oest (Arizona State University), Penghui Zhang (Arizona State University), Brad Wardman (PayPal, Inc.), Eric Nunes (PayPal, Inc.), Jakub Burgis (PayPal, Inc.), Ali Zand (Google), Kurt Thomas (Google), Adam Doupé (Arizona State University), and Gail-Joon Ahn (Samsung Research) for their paper, “Sunrise to Sunset: Analyzing the End-to-End Life Cycle and Effectiveness of Phishing Attacks at Scale.” This paper explores the impact of real-world phishing attacks on customers of a financial institution.

Our third-place prize of $40,000 went to Emily Tseng (Cornell University), Rosanna Bellini (Open Lab, Newcastle University), Nora McDonald (University of Maryland, Baltimore County), Matan Danos (Weizmann Institute of Science), Rachel Greenstadt (New York University), Damon McCoy (New York University), Nicola Dell (Cornell Tech), and Thomas Ristenpart (Cornell Tech) for their research titled, “The Tools and Tactics Used in Intimate Partner Surveillance: An Analysis of Online Infidelity Forums.” This paper takes a deeper look into how victims of intimate partner violence are surveilled.

We’d like to congratulate the 2020 winners of the Internet Defense Prize and thank them for their contributions to help make the internet more secure. To learn more about the Internet Defense Prize and past winners, please visit the Internet Defense Prize website.


Amazon EC2 Inf1 instances featuring AWS Inferentia chips now available in five new Regions and with improved performance

Following strong customer demand, AWS has expanded the availability of Amazon EC2 Inf1 instances to five new Regions: US East (Ohio), Asia Pacific (Sydney, Tokyo), and Europe (Frankfurt, Ireland). Inf1 instances are powered by AWS Inferentia chips, which Amazon custom-designed to provide you with the lowest cost per inference in the cloud and lower barriers for everyday developers to use machine learning (ML) at scale.

As you scale your use of deep learning across new applications, you may be bound by the high cost of running trained ML models in production. In many cases, up to 90% of the infrastructure spend for developing and running an ML application goes to inference, making high-performance, cost-effective ML inference infrastructure critical. Inf1 instances are built from the ground up to support ML inference applications and deliver up to 30% higher throughput and up to 45% lower cost per inference than comparable GPU-based instances. This gives you the performance and cost structure you need to confidently deploy your deep learning models across a broad set of applications.

Customers and Amazon services adopting Inf1 instances

Since the launch of Inf1 instances, a broad spectrum of customers, from large enterprises to startups, as well as Amazon services, have begun using them to run production workloads. Amazon’s Alexa team is in the process of migrating its Text-To-Speech workload from GPUs to Inf1 instances. INGA Technologies, a startup focused on advanced text summarization, got started with Inf1 instances quickly and saw immediate gains.

“We quickly ramped up on AWS Inferentia-based Amazon EC2 Inf1 instances and integrated them in our development pipeline,” says Yaroslav Shakula, Chief Business Development Officer at INGA Technologies. “The impact was immediate and significant. The Inf1 instances provide high performance, which enables us to improve the efficiency and effectiveness of our inference model pipelines. Out of the box, we have experienced four times higher throughput, and 30% lower overall pipeline costs compared to our previous GPU-based pipeline.”

SkyWatch provides you with the tools you need to cost-effectively add Earth observation data into your applications. They use deep learning to process hundreds of trillions of pixels of Earth observation data captured from space every day.

“Adopting the new AWS Inferentia-based Inf1 instances using Amazon SageMaker for real-time cloud detection and image quality scoring was quick and easy,” says Adler Santos, Engineering Manager at SkyWatch. “It was all a matter of switching the instance type in our deployment configuration. By switching instance types to AWS Inferentia-based Inf1, we improved performance by 40% and decreased overall costs by 23%. This is a big win. It has enabled us to lower our overall operational costs while continuing to deliver high-quality satellite imagery to our customers, with minimal engineering overhead.”

AWS Neuron SDK performance and support for new ML models

You can deploy your ML models to Inf1 instances using the AWS Neuron SDK, which is integrated with popular ML frameworks such as TensorFlow, PyTorch, and MXNet. Because Neuron is integrated with ML frameworks, you can deploy your existing models to Amazon EC2 Inf1 instances with minimal code changes. This gives you the freedom to maintain hardware portability and take advantage of the latest technologies without being tied to vendor-specific software libraries.
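
As a rough illustration, here is a minimal sketch of what that compilation step can look like with the TensorFlow flavor of Neuron; the model paths are placeholders, and the exact API depends on the Neuron release you use.

```python
# Hedged sketch: compiling an existing TensorFlow SavedModel for Inferentia
# with the Neuron SDK (tensorflow-neuron 1.x style API). Paths are placeholders.
import tensorflow.neuron as tfn

# One-time compilation step; the output directory holds a SavedModel whose
# supported operators have been swapped for Neuron-compiled subgraphs.
tfn.saved_model.compile(
    "resnet50_saved_model/",  # original SavedModel (hypothetical path)
    "resnet50_neuron/",       # Neuron-compiled output served on Inf1
)
```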

Since its launch, the Neuron SDK has seen dramatic improvement in performance, delivering throughput up to two times higher for image classification models and up to 60% improvement for natural language processing models. The most recent launch of Neuron added support for OpenPose, a model for multi-person keypoint detection, providing 72% lower cost per inference than GPU instances.

Getting started

The easiest and quickest way to get started with Inf1 instances is via Amazon SageMaker, a fully managed service for building, training, and deploying ML models. If you prefer to manage your own ML application development platforms, you can get started by either launching Inf1 instances with AWS Deep Learning AMIs, which include the Neuron SDK, or use Inf1 instances via Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS) for containerized ML applications.
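
As a sketch of the SageMaker route, deploying an existing model artifact to an Inf1-backed endpoint can look like the following; the S3 path, execution role, and framework version are placeholders, not a prescribed configuration.

```python
# Hedged sketch: hosting a model on an Inf1 instance via the SageMaker Python SDK.
# The S3 path and framework version are placeholders for your own artifacts.
import sagemaker
from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(
    model_data="s3://my-bucket/resnet50_neuron/model.tar.gz",  # hypothetical
    role=sagemaker.get_execution_role(),
    framework_version="1.15",
)

# ml.inf1.xlarge selects an AWS Inferentia-backed hosting instance.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.inf1.xlarge")
```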

For more information, see Amazon EC2 Inf1 Instances.


About the Author

Michal Skiba is a Senior Product Manager at AWS and passionate about enabling developers to leverage innovative hardware. Over the past ten years he has managed various cloud computing infrastructure products at Silicon Valley companies, large and small.

On-device, Real-time Body Pose Tracking with MediaPipe BlazePose

Posted by Valentin Bazarevsky and Ivan Grishchenko, Research Engineers, Google Research

Pose estimation from video plays a critical role in enabling the overlay of digital content and information on top of the physical world in augmented reality, sign language recognition, full-body gesture control, and even quantifying physical exercises, where it can form the basis for yoga, dance, and fitness applications. Pose estimation for fitness applications is particularly challenging due to the wide variety of possible poses (e.g., hundreds of yoga asanas), numerous degrees of freedom, occlusions (e.g., the body or other objects occlude limbs as seen from the camera), and a variety of appearances or outfits.

BlazePose results on fitness and dance use-cases.

Today we are announcing the release of a new approach to human body pose perception, BlazePose, which we presented at the CV4ARVR workshop at CVPR 2020. Our approach provides human pose tracking by employing machine learning (ML) to infer 33 2D landmarks of a body from a single frame. In contrast to current pose models based on the standard COCO topology, BlazePose accurately localizes more keypoints, making it uniquely suited for fitness applications. In addition, current state-of-the-art approaches rely primarily on powerful desktop environments for inference, whereas our method achieves real-time performance on mobile phones with CPU inference. If one leverages GPU inference, BlazePose achieves super-real-time performance, enabling it to run subsequent ML models, like face or hand tracking.

Upper-body BlazePose model in MediaPipe

Topology
The current standard for human body pose is the COCO topology, which consists of 17 landmarks across the torso, arms, legs, and face. However, the COCO keypoints only localize to the ankle and wrist points, lacking scale and orientation information for hands and feet, which is vital for practical applications like fitness and dance. The inclusion of more keypoints is crucial for the subsequent application of domain-specific pose estimation models, like those for hands, face, or feet.

With BlazePose, we present a new topology of 33 human body keypoints, which is a superset of the COCO, BlazeFace, and BlazePalm topologies. This allows us to determine body semantics from pose prediction alone, consistent with face and hand models.

BlazePose 33-keypoint topology as a superset of COCO (colored in green)

Overview: An ML Pipeline for Pose Tracking
For pose estimation, we utilize our proven two-step detector-tracker ML pipeline. Using a detector, this pipeline first locates the pose region-of-interest (ROI) within the frame. The tracker subsequently predicts all 33 pose keypoints from this ROI. Note that for video use cases, the detector is run only on the first frame. For subsequent frames we derive the ROI from the previous frame’s pose keypoints as discussed below.

Human pose estimation pipeline overview.
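
In pseudocode, the detector-tracker hand-off can be sketched as follows; the three callables are hypothetical stand-ins for the BlazePose detector, the 33-keypoint tracker, and the ROI derivation step, not the actual MediaPipe code.

```python
# Schematic sketch of the detector-tracker loop (not the released implementation).
# detect_pose_roi, predict_keypoints, and roi_from_keypoints are hypothetical
# stand-ins for the detector model, tracker model, and ROI derivation step.
def track_video(frames, detect_pose_roi, predict_keypoints, roi_from_keypoints):
    roi = None
    for frame in frames:
        if roi is None:
            roi = detect_pose_roi(frame)           # detector runs on first frame only
        keypoints = predict_keypoints(frame, roi)  # tracker predicts all 33 landmarks
        if keypoints is None:                      # person lost: fall back to detector
            roi = None
            continue
        roi = roi_from_keypoints(keypoints)        # next frame reuses this ROI
        yield keypoints
```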

Pose Detection by extending BlazeFace
For real-time performance of the full ML pipeline consisting of pose detection and tracking models, each component must be very fast, using only a few milliseconds per frame. To accomplish this, we observe that the strongest signal to the neural network about the position of the torso is the person’s face (due to its high-contrast features and comparably small variations in appearance). Therefore, we achieve a fast and lightweight pose detector by making the strong (yet for many mobile and web applications valid) assumption that the head should be visible for our single-person use case.

Consequently, we trained a face detector, inspired by our sub-millisecond BlazeFace model, as a proxy for a pose detector. Note that this model only detects the location of a person within the frame and cannot be used to identify individuals. In contrast to the Face Mesh and MediaPipe Hand tracking pipelines, where we derive the ROI from predicted keypoints, for human pose tracking we explicitly predict two additional virtual keypoints that firmly describe the human body center, rotation, and scale as a circle. Inspired by Leonardo’s Vitruvian Man, we predict the midpoint of a person’s hips, the radius of a circle circumscribing the whole person, and the incline angle of the line connecting the shoulder and hip midpoints. This results in consistent tracking even for very complicated cases, like specific yoga asanas. The figure below illustrates the approach.

Vitruvian man aligned via two virtual keypoints predicted by our BlazePose detector in addition to the face bounding box
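
One simple reading of how those two virtual keypoints could define the crop is sketched below; this is an illustrative reconstruction from the description above, not the released implementation.

```python
import math

# Illustrative sketch (not the released code): the hip midpoint gives the crop
# center, the circumscribing circle gives the scale, and the hip-to-shoulder
# line gives the rotation used to align the person upright.
def roi_from_virtual_keypoints(hip_mid, circle_edge, shoulder_mid):
    cx, cy = hip_mid
    radius = math.hypot(circle_edge[0] - cx, circle_edge[1] - cy)
    angle = math.atan2(shoulder_mid[1] - cy, shoulder_mid[0] - cx)
    size = 2.0 * radius  # side length of a square crop circumscribing the person
    return cx, cy, size, angle
```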

Tracking Model
The pose estimation component of the pipeline predicts the location of all 33 person keypoints with three degrees of freedom each (x, y location and visibility) plus the two virtual alignment keypoints described above. Unlike current approaches that employ compute-intensive heatmap prediction, our model uses a regression approach that is supervised by a combined heat map/offset prediction of all keypoints, as shown below.

Tracking network architecture: regression with heatmap supervision

Specifically, during training we first employ a heatmap and offset loss to train the center and left tower of the network. We then remove the heatmap output and train the regression encoder (right tower), thus, effectively using the heatmap to supervise a lightweight embedding.

The table below shows an ablation study of the model quality resulting from different training strategies. As an evaluation metric, we use the Percent of Correct Points with 20% tolerance (PCK@0.2) (where we assume the point to be detected correctly if the 2D Euclidean error is smaller than 20% of the corresponding person’s torso size). To obtain a human baseline, we asked annotators to annotate several samples redundantly and obtained an average PCK@0.2 of 97.2. The training and validation have been done on a geo-diverse dataset of various poses, sampled uniformly.
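
The metric itself is straightforward to compute; a minimal sketch, assuming keypoints as (x, y) arrays and a per-person torso size, follows.

```python
import numpy as np

# Minimal sketch of PCK@0.2 as described above: a predicted point is correct
# when its 2D Euclidean error is under 20% of that person's torso size.
def pck_at_02(pred, gt, torso_size):
    """pred, gt: (N, 33, 2) keypoint arrays; torso_size: (N,) per-person sizes."""
    errors = np.linalg.norm(pred - gt, axis=-1)   # (N, 33) distances
    correct = errors < 0.2 * torso_size[:, None]  # tolerance scaled per person
    return 100.0 * correct.mean()                 # percent of correct points
```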

To cover a wide range of customer hardware, we present two pose tracking models: lite and full, which are differentiated in the balance of speed versus quality. For performance evaluation on CPU, we use XNNPACK; for mobile GPUs, we use the TFLite GPU backend.

Applications
Based on human pose, we can build a variety of applications, like fitness or yoga trackers. As an example, we present squat and push-up counters, which can automatically count user statistics or verify the quality of performed exercises. Such use cases can be implemented either using an additional classifier network or even with a simple joint pairwise distance lookup algorithm, which matches the closest pose in normalized pose space, as sketched below.

Exercise repetition counters based on detected body pose. Left: squats; right: push-ups.
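
A toy version of that pairwise-distance lookup might look like the following; the reference pose labels are hypothetical, and the keypoints are assumed to be normalized already.

```python
import numpy as np

# Toy sketch of the joint pairwise-distance lookup: each pose is described by
# the distances between all pairs of its (already normalized) keypoints, and
# the nearest reference pose supplies the label. Labels here are hypothetical.
def pose_descriptor(keypoints):                    # keypoints: (33, 2)
    diffs = keypoints[:, None, :] - keypoints[None, :, :]
    return np.linalg.norm(diffs, axis=-1).ravel()  # flattened pairwise distances

def classify_pose(keypoints, reference_poses):
    """reference_poses: dict label -> (33, 2) keypoints, e.g. 'squat_down'."""
    query = pose_descriptor(keypoints)
    return min(reference_poses, key=lambda label: np.linalg.norm(
        query - pose_descriptor(reference_poses[label])))
```

A repetition counter then only needs to count transitions between the "down" and "up" reference poses.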

Conclusion
BlazePose will be available to the broader mobile developer community via the Pose detection API in the upcoming release of ML Kit, and we are also releasing a version targeting upper-body use cases in MediaPipe, running on Android, iOS, and Python. Apart from the mobile domain, we preview our web-based in-browser version as well. We hope that providing this human pose perception functionality to the broader research and development community will result in the emergence of creative use cases, stimulating new applications and new research avenues.

We plan to extend this technology with more robust and stable tracking to an even larger variety of human poses and activities. In the accompanying Model Card, we detail the intended uses, limitations and model fairness to ensure that use of these models aligns with Google’s AI Principles. We believe that publishing this technology can provide an impulse to new creative ideas and applications by the members of the research and developer community at large. We are excited to see what you can build with it!

BlazePose results on yoga use-cases

Acknowledgments
Special thanks to all our team members who worked on the tech with us: Fan Zhang, Artsiom Ablavatski, Yury Kartynnik, Tyler Zhu, Karthik Raveendran, Andrei Vakunov, Andrei Tkachenka, Marat Dukhan, Tyler Mullen, Gregory Karpiak, Suril Shah, Buck Bourdon, Jiuqiang Tang, Ming Guang Yong, Chuo-Ling Chang, Esha Uboweja, Siarhei Kazakou, Andrei Kulik, Matsvei Zhdanovich, and Matthias Grundmann.

Recommendations for researchers to more accurately measure time spent on Facebook

People often ask whether spending time on social media is good or bad for us. To answer this question, researchers need accurate ways to measure how much time people spend on platforms like Facebook, among other things. The most common approach, found in the vast majority of published studies, is through survey questions asking participants how much time they spent on these platforms. However, participants’ reports of their own use have well-documented limitations. Participants may not report accurately, because they either can’t recall or don’t know. Keeping track of time is hard, and people may report in biased or skewed ways. Some people may be more prone to recall errors. Further, validating these self-report measures is challenging in the absence of data from internal server logs.

Our aim is to provide researchers with validated self-report time measures that more closely capture people’s actual time spent on Facebook. In our latest paper, “How Well Do People Report Time Spent on Facebook? An Evaluation of Established Survey Questions with Recommendations” (CHI 2020), we evaluate common survey questions from the literature, provide recommendations to researchers, and provide translations for 14 languages.

We compared data from 10 self-reported Facebook-use survey measures deployed in 15 countries (N = 49,934) against data from Facebook’s server logs. We found that:

  • Participants significantly overestimated how much time they spent on Facebook and underestimated the number of times they visited. For example, on one survey question, people overestimated how much time they spent on Facebook by an average of 3.2 hours per day (see Figure 1).
  • Self-reported time spent was only moderately correlated with actual Facebook use (r = 0.23–0.42 across the 10 questions).
  • Some questions led to underestimation, while others led to overestimation. Only 27 percent of participants responded accurately even on the best-performing question.
  • The more time people spent on Facebook, the more likely it was that they misreported their time.
  • Teens and young adults showed greater error in reporting their time on Facebook, which is notable given the heavy reliance on college-aged samples in many fields.

Figure 1

Participants asked “How many hours a day, if any, do you typically spend using Facebook?” overestimated their time on Facebook by an average of 3.2 hours per day.

Informed by these results, we recommend the following to researchers aiming to measure time spent on Facebook:

1. To reduce measurement error, we recommend that researchers ask participants to report data from time management tools like Your Time on Facebook rather than try to estimate it themselves.

2. When time spent must be collected via self-report, we recommend the following wording from Ellison et al. (2007), which had the lowest error in our study.

  • In the past week, on average, approximately how much time PER DAY have you spent actively using Facebook?
    • Less than 10 minutes per day
    • 10–30 minutes per day
    • 31–60 minutes per day
    • 1–2 hours per day
    • 2–3 hours per day
    • More than 3 hours per day

3. Because self-reports of time spent are imprecise, we suggest that researchers not use these values directly but rather interpret people’s self-reported time spent as a noisy estimate of where they fall on a distribution relative to other respondents.
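
As a sketch of recommendation 3, the ordinal answers can be converted to within-sample percentile ranks rather than treated as literal hours; the response values below are made-up example data, not results from the study.

```python
import pandas as pd

# Hedged sketch of recommendation 3: treat ordinal self-reports as relative
# ranks within the sample rather than as literal hours. Example data is made up.
order = ["<10 min", "10-30 min", "31-60 min", "1-2 hr", "2-3 hr", ">3 hr"]
responses = pd.Series(["10-30 min", "1-2 hr", "<10 min", "2-3 hr", "1-2 hr", ">3 hr"])

codes = responses.map({category: i for i, category in enumerate(order)})
percentile = codes.rank(pct=True)  # noisy estimate of relative position
print(percentile)
```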

While our focus here is “time spent” questions, because these are very common in the literature, a growing body of studies shows that merely examining the amount of time an individual uses social media is inadequate for many questions of interest (such as how social media use might be associated with loneliness, social comparison, or academic performance). Instead, we recommend focusing on how people use social media, as discussed in the following studies:

  1. The relationship between Facebook use and well-being depends on communication type and tie strength by Moira Burke and Robert E. Kraut
  2. Social capital and resource requests on Facebook by Nicole B. Ellison, Rebecca Gray, Cliff Lampe, and Andrew T. Fiore
  3. Do social network sites enhance or undermine subjective well-being? A critical review by Philippe Verduyn, Oscar Ybarra, Maxime Résibois, John Jonides, and Ethan Kross

Beyond these implications to researchers, we hope tools such as Your Time on Facebook provide people with more insight into the time they spend on our platform, and foster conversations around their perceptions of use and online habits. Connecting our own perceptions of Facebook use (“How is the time I spend on Facebook good/bad for me?”) and what scientific research tells us about social media’s impact on our lives is also crucial. In that regard, we hope insights from our study provide readers with tools to critically engage with science communication on social media use and well-being.

As the platform evolves with new features and as usage patterns shift, even the strongest survey measures are likely to evolve. That said, employing a stable set of established measures is an important methodological practice for researchers to support comparative work within the scientific community. We hope to make a positive contribution by providing such validated measurements that support international, academic, and comparative work on the impact of Facebook on people’s lives.

Expanding scientific portfolios and adapting to a changing world with Amazon Personalize

This is a guest blog post by David A. Smith at Thermo Fisher. In their own words, “Thermo Fisher Scientific is the world leader in serving science. Our Mission is to enable our customers to make the world healthier, cleaner, and safer. Whether our customers are accelerating life sciences research, solving complex analytical challenges, improving patient diagnostics and therapies, or increasing productivity in their laboratories, we are here to support them.”

Researchers in Life Sciences perform increasingly complex work in an industry that’s changing at an accelerated pace. With the recent focus on the COVID-19 pandemic, scientists around the world are under the microscope as they work to deliver a cure. At Thermo Fisher, our driving principle is to provide these researchers, and others like them, the tools and materials they need to study the world’s most pressing problems.

The specialized products we sell have always necessitated personalized customer experiences. We sell nearly every type of product related to scientific work, from everyday essentials like labware and chemical reagents to specialized instrumentation for genetic sequencing. Our goal is to let our customers know that they can get everything they need at Thermo Fisher. Traditionally, we approached this problem via dedicated commercial sales teams trained to handle specific products. In today’s world, customer data comes from many different touchpoints, which makes it increasingly difficult for our sales teams to understand which products their customers need to do their research.

Over the last three years, my team has maintained a custom portal for these sales teams where they can see data for every part of their customers’ journey. This fast-moving environment presents a unique opportunity for us to use data science to deliver personalized product recommendations that target the right products for the right customers at the right time.

In this post, we discuss how and why we decided to use Amazon Personalize and how that decision has empowered our team to deliver highly personalized, multi-channel content in an ever-developing ecosystem.

First-generation recommendations

Our team initially developed a rules-based recommendation system based on content curated by in-house scientists and run using SQL queries within our Amazon Redshift cluster.

We had this system in place for a year, and it worked well, but as our data volume grew, our team was spending more and more time maintaining the system. We felt that our current infrastructure wasn’t keeping up, and we wanted to migrate to a completely serverless infrastructure for improved scalability and fault tolerance. The following diagram illustrates our existing recommendations infrastructure.

Another risk we identified was that these recommendations relied on an internal content creation process to understand where products fit in the customer journey. Although this was a powerful tool, we struggled to provide high-quality recommendations for new or recently introduced products. This is a classic “cold-start” problem for recommender systems, and one of our requirements for any new system was that it could surface new items without additional maintenance.

Custom recommendations

Our team initially looked at third-party vendors to help improve our recommendations. However, we found that purchasing a solution would be costly to implement and would force us to sacrifice some of the flexibility required to operate in a commercial organization. We quickly decided against buying an off-the-shelf solution.

The consensus was that we would build a custom machine learning (ML)-based system from scratch. We explored a few different options, including hierarchical recurrent neural network (HRNN) models. Eventually, we settled on a factorization machine model as the best combination of performance, ease of implementation, and scalability.
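
For readers unfamiliar with the model family, a factorization machine scores a feature vector with a global bias, linear terms, and factorized pairwise interactions; a minimal sketch of the scoring function (following Rendle, 2010, not Thermo Fisher's production model) follows.

```python
import numpy as np

# Minimal sketch of a factorization machine score (Rendle, 2010), the model
# family named above; not Thermo Fisher's production implementation.
def fm_score(x, w0, w, V):
    """x: (n,) features; w0: global bias; w: (n,) weights; V: (n, k) factors."""
    vx = V.T @ x  # (k,) mixture of factor vectors weighted by features
    # O(n*k) identity for the pairwise interaction term:
    # 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
    pairwise = 0.5 * np.sum(vx ** 2 - (V ** 2).T @ (x ** 2))
    return w0 + w @ x + pairwise
```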

Personalized recommendations

About 8 weeks later, we were wrapping up the initial phases of model development and validation. The new system was performing well. We had significantly improved our predictions, and we were getting good feedback from some sample recommendations we had sent out.

We were gearing up to productionize our new solution when our team learned about Amazon Personalize. It was immediately apparent to us that Amazon Personalize had the ideal balance of flexibility, scalability, and measurability we were looking for when we had evaluated off-the-shelf solutions 2 months prior.

We decided to run some initial tests with Amazon Personalize to see how it performed on real data and get a feel for how much effort would be required to implement it. It took 2 days to prepare the data, train a model, and begin generating high-quality recommendations.

Bringing the test together

For a team that had recently planned to spend 4–6 weeks deploying our custom model into production, this was very attractive. For me, the data scientist responsible for successfully designing, building, and evaluating a completely homegrown solution, it was less attractive. I was excited about finally deploying our custom solution, and I was proud of its performance. We eventually decided to put the two models head-to-head, with the winner determined by the best combination of model performance, scalability, and flexibility.

Like any proud parent, I immediately set out to prove the custom model was better. I designed 32 tests for each model, and, over the next week, I ran each test on over 100 different slices of data to see which performed better on a holdout dataset. The deeper and more expressive neural network models provided by Amazon Personalize did a better job of predicting user behavior over roughly 80% of the testing criteria.

If you’re a data scientist, this story might make you cringe, but it has a happy ending. Designing this testing process forced me to examine our data even more deeply and creatively than I had while building our custom recommendation system. I was able to rapidly test all the different hypotheses and use the results to develop a deep understanding of each model’s relative strengths and weaknesses related to the business problem we initially set out to solve.

Our team couldn’t have performed such a thorough analysis if we were also managing the infrastructure required for deep learning models. As a team, we had the choice to either spend 6–8 weeks deploying our custom model or 2 weeks implementing a recommender system using Amazon Personalize.

Serverless infrastructure

Scalability and fault tolerance were our main priorities when designing the infrastructure for our scientific product recommendations. We also wanted a system that would allow us to visually monitor progress and track errors.

We opted to use AWS Step Functions to build the backbone of our recommendations inference pipeline with customized AWS Lambda functions to pull data from our Amazon Redshift cluster, prepare the datasets for ingestion by Amazon Personalize, and trigger and monitor Amazon Personalize jobs. The following graph illustrates this inference pipeline.
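
As one concrete illustration of such a step, a Lambda function that kicks off an Amazon Personalize batch inference job could look roughly like the following; the ARNs and S3 paths are placeholders, and this is a sketch rather than Thermo Fisher's actual code.

```python
import boto3

personalize = boto3.client("personalize")

# Sketch of a pipeline step (not Thermo Fisher's actual code) that triggers an
# Amazon Personalize batch inference job. All ARNs and paths are placeholders.
def handler(event, context):
    response = personalize.create_batch_inference_job(
        jobName=event["job_name"],
        solutionVersionArn=event["solution_version_arn"],
        roleArn=event["personalize_role_arn"],
        jobInput={"s3DataSource": {"path": event["input_s3_path"]}},
        jobOutput={"s3DataDestination": {"path": event["output_s3_path"]}},
    )
    # A later Step Functions state can poll describe_batch_inference_job
    # on this ARN to monitor the job.
    return {"batch_inference_job_arn": response["batchInferenceJobArn"]}
```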

Flexibility in a changing world

Like those of many companies, our customers changed their habits significantly when the COVID-19 pandemic struck and businesses around the world shifted to work-from-home policies. There was new demand to increase multi-channel targeting using email advertising campaigns.

Our team received a request to use the recommendation system we built with Amazon Personalize for targeted product email recommendations. Although we had never planned for this, it only took us a week to take our existing serverless inference pipeline and modify it to build, test, and validate an entirely new inference pipeline tuned specifically to email recommendations. Pivoting quickly is always challenging, but our commitment to building scalable and flexible infrastructure allowed us to overcome many of the challenges traditionally faced by teams when managing ML deployments and infrastructure. The following diagram illustrates the architecture of the email inference pipeline.

Despite the short turnaround time, the emails we’ve sent out following these recommendations have performed significantly better than previous baselines.

Looking back, it’s clear to me that we would have had significantly more difficulty meeting this request if we had opted to deploy our custom factorization machine model instead of using Amazon Personalize.

Conclusion

Thermo Fisher is constantly striving to help scientists around the world solve some of our greatest challenges. With Amazon Personalize, we’ve dramatically improved our ability to understand the work our customers do and serve them personalized experiences via multiple channels. Using Amazon Personalize has allowed us to focus on solving difficult problems instead of managing ML infrastructure.


About the Author

David A. Smith is a data scientist for Thermo Fisher Scientific based out of Carlsbad, California. He works with cross-organizational teams to design, build, and deploy automated models to drive customer intelligence and create business value. His interests include NLP, serverless ML, and blockchain technology. Outside of work, you can find David rock climbing, playing tennis, or swimming with his dog.

How Abyss Solutions Helps Keep Offshore Rig Operators Afloat

As its evocative name suggests, Abyss Solutions is a company taking AI to places where humans can’t — or shouldn’t — go.

The brainchild of four University of Sydney scientists and engineers, six years ago the startup set out to improve the maintenance and observation of industrial equipment.

It began by developing advanced technology to inspect the most difficult-to-reach assets of urban water infrastructure systems, such as dams, reservoirs, canals, and bridges, as well as ship hulls. Later, it zeroed in on an industry that often operates literally in the dark: offshore oil and gas platforms.

Abyss Solutions Lantern Eye output
Abyss Solutions Lantern Eye output.

A few years ago, Abyss CEO Nasir Ahsan and CTO Suchet Bargoti were demonstrating to a Houston-based platform operator the insights they could generate from the image data collected by its underwater Lantern Eye 3D camera. The camera’s sub-millimeter accuracy provides a “way to inspect objects as if you’re taking them out of water,” said Bargoti.

An employee of the operator interrupted the meeting to describe an ongoing problem the company was having with topside equipment that was decaying and couldn’t be repaired adequately. Once it was clear that Abyss could provide detailed insight into the problem and how to solve it, no more selling was needed.

“Every one of these companies is dreading the next Deepwater Horizon,” said Bargoti, referencing the 2010 incident in which BP spilled nearly 5 million barrels of oil into the Gulf of Mexico, killing 11 people and countless wildlife and costing the company $65 billion in cleanup and fines. “What they wanted to know is, ‘Will your data analytics help us understand what to fix and when to fix it?’”

Today, Abyss’s combination of NVIDIA GPU-powered deep learning algorithms, unmanned vehicles, and innovative underwater cameras is enabling platform operators to spot faults and anomalies, such as corrosion, on equipment above and below the water and address them before the equipment fails, potentially saving millions of dollars and even a few human lives.

During the COVID-19 pandemic, the stakes have risen. Offshore rigs have emerged as hotbeds for the spread of the virus, forcing them to adopt strict quarantine procedures that limit the number of people onsite in order to reduce the disease’s spread and minimize interruptions.

Essentially, this has sped up the industry’s digital transformation push and fueled the urgency of Abyss’ work, said Bargoti. “They can’t afford to have these things happening,” he said.

Abyss Solutions corrosion detections
Abyss Solutions corrosion detections.

Better Than Human Performance

Historically, inspection and maintenance of offshore platforms and equipment has been a costly, time-consuming, and labor-intensive task for oil and gas companies. It often yields subjective findings that can result in missed repairs and unplanned shutdowns.

An independent audit found that Abyss’ semantic segmentation models are able to detect general corrosion with greater than 90 percent accuracy, while severe corrosion is identified with greater than 97 percent accuracy. Both are significant improvements over human performance, and both outperformed competing AI systems in the audit.

What’s more, Abyss says that its oil and gas platform clients report reductions in operating costs by as much as 25 percent thanks to its technology.

Training of Abyss’s models, which rely on many terabytes of data (each platform generates about 1TB a day), occurs on AWS instances running NVIDIA T4 Tensor Core GPUs. The company also uses the latest versions of CUDA and cuDNN in conjunction with TensorFlow to power deep learning applications such as image and video segmentation and classification, and object detection.

Bargoti said the company also is working with the NVIDIA Jetson TX2 module and TensorRT software to condense its models so they can run on their unmanned vehicles in real time.

Most of the data can be processed in the cloud because of the slowness of the corrosion process, but there are times when real-time AI is needed onsite, such as when a robotic vehicle needs to make decisions on where to go next.

Taking Full Advantage of Inception

As a member of NVIDIA Inception, a program to help startups working in AI and data science get to market faster, Abyss has benefited from a try-before-you-buy approach to NVIDIA tech. That’s allowed it to experiment with technologies before making big investments.

It’s also getting valuable advice on what’s coming down the pipe and how to time its work with the release of new GPUs. Bargoti said NVIDIA’s regularly advancing technology is helping Abyss squeeze more data into each compute cycle, pushing it closer to its long-term vision.

“We want to be the intel in these unmanned systems that makes smart decisions and pushes the frontier of exploration,” said Bargoti. “It’s all leading to this better development of perception systems, better development of decision-making systems and better development of robotics systems.”

Abyss is taking a deep look at a number of additional markets it believes its technology can help. The team is taking on growth capital and rapidly expanding globally.

“Continuous investment in R&D and innovation plays a critical role in ensuring Abyss can provide game-changing solutions to the industry,” he said.

Amazon Textract now available in Asia Pacific (Mumbai) and EU (Frankfurt) Regions 

You can now use Amazon Textract, a machine learning (ML) service that quickly and easily extracts text and data from forms and tables in scanned documents, for workloads in the AWS Asia Pacific (Mumbai) and EU (Frankfurt) Regions.

Amazon Textract goes beyond simple optical character recognition (OCR) to identify the contents of fields in forms, information stored in tables, and the context in which the information is presented. The Amazon Textract API supports multiple image formats like scans, PDFs, and photos, and you can use it with other AWS ML services like Amazon Comprehend, Amazon Comprehend Medical, Amazon Augmented AI, and Amazon Translate to derive deeper meaning from the extracted text and data. You can also use this text and data to build smart searches on large archives of documents, or load it into a database for use by applications, such as accounting, auditing, and compliance software.
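
Using the service from the new Regions only requires pointing the client at them; here is a minimal sketch with placeholder bucket and document names.

```python
import boto3

# Minimal sketch: analyzing a scanned document for forms and tables from the
# Asia Pacific (Mumbai) Region. Bucket and document names are placeholders.
textract = boto3.client("textract", region_name="ap-south-1")

response = textract.analyze_document(
    Document={"S3Object": {"Bucket": "my-bucket", "Name": "scanned-form.png"}},
    FeatureTypes=["FORMS", "TABLES"],
)

# Each block carries detected text plus its structural role (line, key-value
# set, table cell, and so on).
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])
```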

An in-country infrastructure is critical for customers with data residency requirements and regulations, such as those operating in government, insurance, healthcare, and financial services. With this launch, customers in the AWS Asia Pacific (Mumbai) and EU (Frankfurt) Regions can benefit from Amazon Textract while complying with data residency requirements and integrating with other services and applications available in these Regions.

Perfios is a leading product technology company enabling businesses to aggregate, curate, and analyze structured and unstructured data to help in decision-making. “We have been testing Amazon Textract since its early days and are very excited to see it launch in India to help us address data sovereignty requirements for the region, which now unblocks us to use it at scale,” says Ramgopal Cillanki, Vice President, Head of Engineering at Perfios Software. “We believe that the service will help to transform the banking, financial services, and insurance (BFSI) industry from operations-heavy, human-in-the-loop processes to machine learning-powered API automation with minimal manual operations. Textract will not only help us reduce lenders’ decision-making turnaround time but also create business impact for our end-users in the long run.”

For more information about Amazon Textract and its Region availability, see Amazon Textract FAQs. To get started with Amazon Textract, see the Amazon Textract Developer Guide.


About the Author

Raj Copparapu is a Product Manager focused on putting machine learning in the hands of every developer.

Startup Lunit Uses AI to Help Doctors Prioritize Patients with COVID-19 Symptoms

Testing for COVID-19 has become more widespread, but addressing the pandemic will require quickly screening for and triaging patients who are experiencing symptoms.

Lunit, a South Korean medical imaging startup — its name is a portmanteau of “learning unit” — has created an AI-based system to detect pneumonia, often present in COVID-19 infected patients, within seconds.

The Lunit INSIGHT CXR system, which is CE marked, uses AI to quickly detect 10 different radiological findings on chest X-rays, including pneumonia and potentially cancerous lung nodules.

It overlays the results onto the X-ray image along with a probability score for the finding. The system also monitors progression of a patient’s condition, automatically tracking changes within a series of chest X-ray images taken over time.

Lunit has recently partnered with GE Healthcare, which launched its Thoracic Care Suite using Lunit INSIGHT CXR’s AI algorithms to flag abnormalities on chest X-rays for radiologists’ review. It’s one of the first collaborations to bring AI from a medical startup to an existing X-ray equipment manufacturer, making AI-based solutions commercially available.

For integration of its algorithms with GE Healthcare and other partners’ products, Lunit’s hardware is powered by NVIDIA Quadro P1000 GPUs, and its AI model is optimized on the NVIDIA Jetson TX2i module. For cloud-based deployment, the company uses NVIDIA drivers and GPUs.

Lunit is a premier member of NVIDIA Inception, a program that helps startups with go-to-market support, expertise and technology. Brandon Suh, CEO of Lunit, said being an Inception partner “has helped position the company as a leader in state-of-the-art technology for social impact.”

AI Opens New Doors in Medicine

The beauty of AI, according to Suh, is its ability to process vast amounts of data and discover patterns — augmenting human ability, in terms of time and energy.

The founders of Lunit, he said, started with nothing but a “crazy obsession with technology” and a vision to use AI to “open a new door for medical practice with increased survival rates and more affordable costs.”

Image courtesy of Lunit.

Initially, Lunit’s products were focused on detecting potentially cancerous nodules in a patient’s lungs or breasts, as well as analyzing pathology tissue slides. However, the COVID-19 outbreak provided an opportunity for the company to upgrade the algorithms being used to help alleviate the burdens of healthcare professionals on the frontlines of the pandemic.

“The definitive diagnosis for COVID-19 involves a polymerase chain reaction test to detect antigens, but the results take 1-2 days to be delivered,” said Suh. “In the meantime, the doctors are left without any clinical evidence that can help them make a decision on triaging the patients.”

With its newly refined algorithm, Lunit INSIGHT CXR can now single out pneumonia and identify it in a patient within seconds, helping doctors make immediate, actionable decisions for those in more urgent need of care.

The Lunit INSIGHT product line, which provides AI analysis for chest X-rays and mammograms, has been commercially deployed and tested in more than 130 sites in countries such as Brazil, France, Indonesia, Italy, Mexico, South Korea and Thailand.

“We feel fortunate to be able to play a part in the battle against COVID-19 with what we do best: developing medical AI solutions,” said Suh. “Though AI’s considered cutting-edge technology today, it could be a norm tomorrow, and we’d like everyone to benefit from a more accurate and efficient way of medical diagnosis and treatment.”

The team at Lunit is at work developing algorithms to use with 3D imaging, in addition to their current 2D ones. They’re also looking to create software that analyzes a tumor’s microenvironment to predict whether a patient would respond to immunotherapy.

Learn more about Lunit at NVIDIA’s healthcare AI startups solutions webinar on August 13. Register here.

REALM: Integrating Retrieval into Language Representation Models

Posted by Ming-Wei Chang and Kelvin Guu, Research Scientists, Google Research

Recent advances in natural language processing have largely built upon the power of unsupervised pre-training, which trains general purpose language representation models using a large amount of text, without human annotations or labels. These pre-trained models, such as BERT and RoBERTa, have been shown to memorize a surprising amount of world knowledge, such as “the birthplace of Francesco Bartolomeo Conti”, “the developer of JDK” and “the owner of Border TV”. While the ability to encode knowledge is especially important for certain natural language processing tasks such as question answering, information retrieval and text generation, these models memorize knowledge implicitly — i.e., world knowledge is captured in an abstract way in the model weights — making it difficult to determine what knowledge has been stored and where it is kept in the model. Furthermore, the storage space, and hence the accuracy of the model, is limited by the size of the network. To capture more world knowledge, the standard practice is to train ever-larger networks, which can be prohibitively slow or expensive.

Instead, what if there was a method for pre-training that could access knowledge explicitly, e.g., by referencing an additional large external text corpus, in order to achieve accurate results without increasing the model size or complexity?  For example, a sentence found in an external document collection, “Francesco Bartolomeo Conti was born in Florence,” could be referenced by the model to determine the birthplace of the musician, rather than relying on the model’s opaque ability to access the knowledge stored in its own parameters. The ability to retrieve text containing explicit knowledge such as this would improve the efficiency of pre-training while enabling the model to perform well on knowledge-intensive tasks without using billions of parameters.

In “REALM: Retrieval-Augmented Language Model Pre-Training”, accepted at the 2020 International Conference on Machine Learning, we share a novel paradigm for language model pre-training, which augments a language representation model with a knowledge retriever, allowing REALM models to retrieve textual world knowledge explicitly from raw text documents, instead of memorizing all the knowledge in the model parameters. We have also open sourced the REALM codebase to demonstrate how one can train the retriever and the language representation jointly.

Background: Pre-training Language Representation Models
To understand how standard language representation models memorize world knowledge, one should first review how these models are pre-trained. Since the invention of BERT, the fill-in-the-blank task, called masked language modeling, has been widely used for pre-training language representation models. Given any text with certain words masked out, the task is to fill back the missing words. An example of this task looks like:

I am so thirsty. I need to __ water.

During pre-training, a model will go over a large number of examples and adjust the parameters in order to predict the missing words (answer: drink, in the above example). Interestingly, the fill-in-the-blank task makes the model memorize certain facts about the world. For example, the knowledge of Einstein’s birthplace is required to fill the missing word in the following example:

Einstein was a __-born scientist. (answer: German)

However, because the world knowledge captured by the model is stored in the model weights, it is abstract, making it difficult to understand what information is stored.
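
This fill-in-the-blank behavior is easy to reproduce with any public masked language model; the sketch below uses an off-the-shelf BERT checkpoint rather than the models trained in this work.

```python
from transformers import pipeline

# Sketch using a public BERT checkpoint (not the paper's models) to show the
# fill-in-the-blank task; [MASK] is BERT's mask token.
fill = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill("Einstein was a [MASK]-born scientist."):
    print(prediction["token_str"], round(prediction["score"], 3))
```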

Our Proposal: Retrieval-Augmented Language Representation Model Pre-training
In contrast to standard language representation models, REALM augments the language representation model with a knowledge retriever that first retrieves another piece of text from an external document collection as the supporting knowledge — in our experiments, we use the Wikipedia text corpus — and then feeds this supporting text as well as the original text into a language representation model.

The key intuition of REALM is that a retrieval system should improve the model’s ability to fill in missing words. Therefore, a retrieval that provides more context for filling the missing words should be rewarded. If the retrieved information does not help the model make its predictions, it should be discouraged, making room for better retrievals.

How does one train a knowledge retriever, given that only unlabeled text is available during pre-training? It turns out that one can use the fill-in-the-blank task to train the knowledge retriever indirectly, without any human annotations. Assume the input query is:

We paid twenty __ at the Buckingham Palace gift shop.

Filling the missing word (answer: pounds) in this sentence without retrieval can be tricky, as the model would need to have implicitly stored knowledge of the country in which Buckingham Palace is located and the associated currency, as well as make the connection between the two. It would be easier for the model to fill in the missing word if it was presented with a passage that explicitly connects some of the necessary knowledge, retrieved from an external corpus.

In this example, the retriever would be rewarded for retrieving the following sentence.

Buckingham Palace is the London residence of the British monarchy.

Since the retrieval step needs to add more context, there may be multiple retrieval targets that could be helpful in filling the missing word, for example, “The official currency of the United Kingdom is the Pound.” The whole process is demonstrated in the next figure:

Computational Challenges for REALM
Scaling REALM pre-training such that models can retrieve knowledge from millions of documents is challenging. In REALM, the selection of the best document is formulated as maximum inner product search (MIPS). To perform retrieval, MIPS models need to first encode all of the documents in the collection, such that each document has a corresponding document vector. When an input arrives, it is encoded as a query vector. In MIPS, given a query, the document in the collection that has the maximum inner product value between its document vector and the query vector is retrieved, as shown in the following figure:

In REALM, we use the ScaNN package to conduct MIPS efficiently, which makes finding the maximum inner product value relatively cheap, given that the document vectors are pre-computed. However, if the model parameters were updated during training, it is typically necessary to re-encode the document vectors for the entire collection of documents. To address the computational challenges, we structure the retriever so that the computation performed for each document can be cached and asynchronously updated. We also found that updating document vectors every 500 training steps, instead of every step, is able to achieve good performance and make training tractable.
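
For reference, the retrieval step that ScaNN accelerates is, in its brute-force form, just an argmax over inner products; the sketch below is a naive baseline, not the ScaNN API.

```python
import numpy as np

# Brute-force reference for the MIPS step described above; REALM uses ScaNN
# to approximate this search efficiently over millions of documents.
def retrieve(query_vec, doc_vecs, k=5):
    """query_vec: (d,); doc_vecs: (num_docs, d) pre-computed document vectors."""
    scores = doc_vecs @ query_vec      # inner product with every document
    top = np.argsort(-scores)[:k]      # indices of the k largest inner products
    return top, scores[top]
```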

Applying REALM to Open-domain Question Answering
We evaluate the effectiveness of REALM by applying it to open-domain question answering (Open-QA), one of the most knowledge-intensive tasks in natural language processing. The goal of the task is to answer questions, such as “What is the angle of the equilateral triangle?”

In standard question answering tasks (e.g., SQuAD or Natural Questions), the supporting document is provided as part of the input, so a model only needs to look up the answer in the given document. In Open-QA, there are no given documents, so Open-QA models need to look up the knowledge by themselves — this makes Open-QA an excellent task for examining the effectiveness of REALM.

The following figure shows the results on the Open-QA version of Natural Questions. We mainly compared our results with T5, another approach that trains models without annotated supporting documents. From the figure, one can clearly see that REALM pre-training produces very powerful Open-QA models, even outperforming the much larger T5 (11B) model by almost 4 points while using only a fraction of the parameters (300M).

Conclusion
The release of REALM has helped drive interest in developing end-to-end retrieval-augmented models, including a recent retrieval-augmented generative model. We look forward to the possibility of extending this line of work in several ways, including 1) applying REALM-like methods to new applications that require knowledge-intensive reasoning and interpretable provenance (beyond Open-QA), and 2) exploring the benefits of retrieving other forms of knowledge, such as images, knowledge graph structures, or even text in other languages. We are also excited to see what the research community does with the open source REALM codebase!

Acknowledgements
This work has been a collaborative effort involving Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
