YoucanBook.me optimizes your apps thanks to Amazon CodeGuru

This is a guest post co-written by Sergio Delgado from YoucanBook.me. In their own words, “YouCanBook.me is a small, independent and fully remote team, who love solving scheduling problems all over the world.”

At YoucanBook.me, we like to say that we’re “a small company that does great things.” Many aspects of our day-to-day culture are derived from such a simple motto, but especially a great emphasis on the efficiency of our operations.

Although we’re far from the early years, when our CTO programmed the entire first version of our SaaS tool, when I joined the company we were only five developers, of whom only three were in charge of backend services, and none dedicated to it full time. The daily tasks of a programmer in a startup like ours are incredibly varied, from answering customer support requests to refining the task backlog, defining infrastructure, or helping with requirements. The job is as demanding as it is rewarding: the challenges never end, but that forces us to seek efficiency in everything we do. A poorly defined project, where the way forward isn’t clear and that could take months of research, is a challenge for a team like ours, and we’ll probably postpone it again and again to prioritize more urgent developments that bring value to our customers as soon as possible. For us, it’s very important to extract the maximum benefit from every development we make in the shortest possible time.

The result of this philosophy is our involvement with Amazon Web Services and its different platforms and tools. Although the early versions of our backend services didn’t run on AWS, the migration to the cloud allowed us to stop worrying about managing physical servers in a hosting company and focus our efforts on our business logic and code.

Currently, our backend servers are based on Java technology and run on AWS Elastic Beanstalk, while for the frontend we have a mix of JSP pages and React client applications. On our JVMs, we deploy WAR files, compiled with AWS CodeBuild scripts from source code stored in AWS CodeCommit repositories. In addition, all our monitoring is based on Amazon CloudWatch logs and metrics.

An interesting feature is that the different environments (such as development, pre-production, production, and analytics) are completely separate, but we manage them through AWS Organizations. AWS Identity and Access Management (IAM) users are created in a root account and then assume roles to operate in the other accounts.

With all this, three people manage half a dozen services across four different environments, running on dozens of instances, and, quite simply, everything works.

Our problem

Although our services are all developed with Java technology, the truth is that not all of them share the same technology stack. Over the years, we have been migrating them to more modern frameworks and standardizing their design, but not all of them have been updated yet. We were aware that some services had clear performance issues and caused bottlenecks during high load spikes, especially the older code based on legacy technologies.

Our short-term solution was to oversize those specific services, with the consequent extra cost, and in the long term to redesign them following the architecture of our most modern applications. But we were sure that we could achieve very fast improvements if we invested in some performance analysis tool, or APM (Application Performance Monitoring). We knew there were many on the market; some of us had experience working with a few of them, and good references for others. So we created a performance improvement project on our roadmap, did a little research into which products looked best and … not much more. We never found the time to spend weeks contacting suppliers, installing the tools, analyzing our services during the trial period, and comparing the results. That’s why performance tasks were constantly being postponed, always waiting for some time of year when we didn’t have much else to do. Which was never going to happen.

Amazon CodeGuru Profiler arrives

One of our good habits is staying very attentive to all AWS announcements, and we’re usually very quick to test new features, especially when they don’t involve changes to our applications’ code.

In addition, relying on AWS products gives us an extra advantage. As a company policy, we love being able to define our security model around IAM users, roles, and permissions, rather than having to create separate accounts and users on other platforms. This rigorous approach to managing access and permissions for users of our infrastructure allows us to regularly undergo security audits and pass them without investing too much effort for a company of our size. In fact, our security certifications are one of our differentiators from our competitors.

That’s why we immediately recognized the opportunity Amazon CodeGuru Profiler offered us when it was announced at the re:Invent conference in late 2019. On paper, other APM tools we wanted to evaluate seemed to offer more information or a larger set of metrics, but the big question was whether they would be useful to us. What good were reporting screens if we weren’t sure what they meant or if they didn’t offer recommendations that we could implement immediately? Amazon CodeGuru seemed simple, but instead of seeing it as a disadvantage, we had the intuition that it could be a big benefit to us. By testing it, we could analyze the results in hours, not weeks, and find out if it really gave us value when it came to discovering the parts of the code that we needed to optimize.

The best thing about CodeGuru Profiler is that it would take us longer to discuss whether or not to use it than to simply install it and try it out. A single developer, the infrastructure manager, was able to install the CodeGuru agent on all our JVMs in one afternoon. We ran CodeGuru Profiler directly in the production environment, allowing us to analyze latencies and identify bottlenecks using actual production data, especially after a load peak. We realized that this is much easier and more realistic for us than simulating a synthetic workload, with no possibility of defining it incorrectly or under untrue assumptions. All we find in CodeGuru is the authentic behavior of our systems.

The following screenshot shows our systems pre-optimization.

The following screenshot shows our systems post-optimization.

Analysis

The flame graphs of CodeGuru Profiler were very easy for us to understand. We simply selected the time interval in which we detected a scaling problem or peak workload and, for each server application, saw the Java classes and methods that contributed most to the latencies of our users’ requests. Because our business is based on integrating with different external calendar systems (such as Google, Outlook, or CalDAV), much of that latency is inevitable, but we quickly found two clear strategies for optimizing our code:

  • Identify methods that don’t make requests to third-party systems but nevertheless add significant time to latencies. In these cases, CodeGuru Profiler also offered recommendations to optimize the code and improve its performance.
  • See exactly what percentage of response times were due to which type of requests to the underlying calendars. Some requests (such as creating an event) don’t have much room for improvement, but we did find queries that were done much more frequently than we had estimated, and that could be largely avoided by a more appropriate search policy.

We got down to work and in a couple of weeks generated about 15 tickets in our backlog, most of which were deployed to production during the first month. Typically, each ticket required hours of development rather than days, and we haven’t undone any of them or identified any false positives in CodeGuru’s recommendations.

Aftermath

We optimized our oldest and worst-performing service to reduce its latency by 15% at the 95th percentile on a typical working day. In addition, our response time graphs are much flatter than before, because we eliminated latency spikes that occurred semi-regularly (see the following screenshot).

The improvement is such that, in one of the last peak loads we had on the platform, this service was no longer the bottleneck of the system. It handled all requests without problems and without blocking the rest of our APIs.

This has saved us not only the cost of extra instances we no longer need (which we had running just to handle these scenarios), but also dozens of work-hours in deeper refactoring of legacy code, which was just what we were trying to avoid.

Another of our backend services, which typically carries a very high workload during business hours, improved even more, reducing latency by up to 40%. In fact, on one occasion, we introduced an error in our autoscaling configuration and reduced the number of execution instances to a single machine. It took us a couple of hours to notice the mistake, because that single instance could handle all our users’ requests without any problem!

The future

Our use of CodeGuru Profiler is very simple, but it has been tremendously valuable to us. In the medium term, we’re thinking of sampling part of our servers or user requests instead of analyzing all production traffic, for efficiency. However, it’s not too urgent for us because our services are working perfectly well with performance analytics enabled, and the impact on response times for our users is imperceptible.

How long do we plan to have CodeGuru Profiler activated? The answer is clear: indefinitely. Improving problematic parts of our services that we more or less already knew about is a very good result, but the visibility it can offer us during future peak loads is extraordinarily valuable. Because, let’s not fool ourselves, we removed several bottlenecks, but hidden ones remain, and new developments will introduce more. With CloudWatch metrics and alarms, we can detect when this happens and know what happened, but CodeGuru helps us know why.

If you have a problem similar to ours, or want to prevent it, we invite you to become more familiar with CodeGuru.

About YoucanBook.me

YoucanBook.me allows you to schedule meetings online for a business or team of any size. It eliminates the need to search for free slots by sending and answering emails, allowing your clients to create appointments directly in your calendar.

Since its inception in 2011, our company has remained small, efficient, self-funded, 100% remote, and dedicated to solving scheduling problems for users around the world. With just 15 employees in the UK, Spain, and the United States, we serve tens of thousands of customers, managing more than one million meetings each month.


About the authors

Sergio Delgado defines himself as a programmer of vocation. In 25 years of developing software, he has gone through doing C++ video games, slot machines in Java, e-learning platforms in PHP, fighting with Dreamweaver, automating calls in a call center, and running an R&D department. One day he started working in the cloud, and he no longer wants to go back to earth. He’s currently the leader of the Engineering team and backend architect at YoucanBook.me. He collaborates with the community, giving talks in meetups or interviews in various podcasts, and can be found on LinkedIn.


Rodney Bozo is an AWS Solutions Architect who has over 20 years of experience supporting customers with on-premises resource management as well as offering cloud-based solutions.


Read More

Infoblox Inc. built a patent-pending homograph attack detection model for DNS with Amazon SageMaker

This post is co-written by Femi Olumofin, an analytics architect at Infoblox.

In the same way that you can conveniently recognize someone by name instead of government-issued ID or telephone number, the Domain Name System (DNS) provides a convenient means for naming and reaching internet services or resources behind IP addresses. The pervasiveness of DNS, its mission-critical role for network connectivity, and the fact that most network security policies often fail to monitor network traffic using UDP port 53 make DNS attractive to malicious actors. Some of the most well-known DNS-based security threats implement malware command and control (C&C) communications, data exfiltration, fast flux, and domain generation algorithms (DGAs), knowing that traditional security solutions can’t detect them.

For more than two decades, Infoblox has operated as a leading provider of technologies and services to manage and secure the networking core, namely DNS, DHCP, and IP address management (collectively known as DDI). Over 8,000 customers, including more than a third of the Fortune 500, depend on Infoblox to reliably automate, manage, and secure their on-premises, cloud, and hybrid networks.

Over the past 5 years, Infoblox has used AWS to build its SaaS services and help customers extend their DDI services from physical on-premises appliances to the cloud. The focus of this post is how Infoblox used Amazon SageMaker and other AWS services to build a DNS security analytics service to detect abuse, defection, and impersonation of customer brands.

The detection of customer brands or domain names targeted by socially engineered attacks has emerged as a crucial requirement for the security analytic services offered to customers. In the DNS context, a homograph is a domain name that’s visually similar to another domain name, called a target. Malicious actors create homographs to impersonate highly-valued domain name targets and use them to drop malware, phish user information, attack the reputation of a brand, and so on. Unsuspecting users can’t readily distinguish homographs from legitimate domains. In some cases, homographs and target domains are indistinguishable from a mere visual comparison.

Infoblox’s challenge

A traditional domain name is composed of digits, letters, and the hyphen character from the ASCII character encoding scheme, which comprises 128 code points (or possible characters), or from Extended ASCII, which comprises 256 code points. Internationalized domain names (IDNs) are domain names that also enable the use of Unicode characters, so they can be written in languages that either use Latin letters with ligatures or diacritics (such as é or ü) or don’t use the Latin alphabet at all. IDNs offer extensive alphabets for most writing systems and languages, and allow you to access the internet in your own language. Similarly, because internet usage is rising around the world, IDNs offer a great way for anyone to connect with their target market no matter what language they speak. To support that many languages, every IDN is represented in Punycode, which consists of a set of ASCII characters; for example, amāzon.com becomes xn--amzon-gwa.com. In this way, every IDN domain is translated into ASCII for compatibility with DNS, which determines how domain names are transformed into IP addresses.
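
You can reproduce this translation with Python’s built-in idna codec; a minimal sketch, using the example domain from the paragraph above:

    # Minimal sketch: the built-in "idna" codec performs the
    # IDN-to-ASCII (Punycode/ACE) translation described above.
    label = "amāzon.com"

    ascii_form = label.encode("idna")        # ASCII form used by DNS
    print(ascii_form)                        # b'xn--amzon-gwa.com'

    unicode_form = ascii_form.decode("idna")  # back to the Unicode form
    print(unicode_form)                       # amāzon.com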

IDNs, in short, make the internet more accessible to everyone. However, they attract fraudsters who try to substitute some of those characters with identical-looking imitations and redirect us to fake domains. This process is known as a homograph attack, which uses Unicode characters to create fake domains that are indistinguishable from their targets, such as pɑypal.com for paypal.com (with the Latin Small Letter Alpha ‘ɑ’ [U+0251]). The two look identical at first glance; only upon closer inspection can you see the difference: pɑypal.com vs. paypal.com.

The most common homograph domain construction methods are as follows (a short sketch of the substitution-based approaches appears after the list):

  • IDN homographs using Unicode chars (such as replacing “a” with “ɑ”)
  • Multi-letter homoglyph (such as replacing “m” with “rn”)
  • Character substitution (such as replacing “I” with “l”)
  • Punycode spoofing (for example, 㿝㿞㿙㿗[.]com encodes as xn--kindle[.]com, and 䕮䕵䕶䕱[.]com as xn--google[.]com)
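
A hedged sketch of the substitution-based constructions above: generate candidate homographs of a target label from a small map of confusable characters, then view their Punycode form. The confusables map here is illustrative, not the Unicode confusables table:

    from itertools import product

    # Illustrative confusables; the real Unicode table is much larger.
    CONFUSABLES = {
        "a": ["a", "ɑ", "а"],   # Latin a, Latin alpha, Cyrillic a
        "l": ["l", "1"],        # character substitution
        "m": ["m", "rn"],       # multi-letter homoglyph
    }

    def homograph_candidates(label):
        choices = [CONFUSABLES.get(ch, [ch]) for ch in label]
        for combo in product(*choices):
            candidate = "".join(combo)
            if candidate != label:
                yield candidate

    for candidate in homograph_candidates("paypal"):
        print(candidate, candidate.encode("idna"))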

Interestingly, homograph attacks go beyond DNS attacks, and are currently used to obfuscate process names on operating systems, or bypass plagiarism detection and phishing systems. Given that many of Infoblox’s customers were concerned about homograph attacks, the team embarked on creating a machine learning (ML)-based solution with Amazon SageMaker.

From a business perspective, dealing with homograph attacks can divert precious resources from an organization. A common method to deal with domain name impersonation and homograph attacks is to beat malicious actors by pre-registering hundreds of domains that are potential homographs of a brand. Unfortunately, such mitigation is only effective against limited attackers, because a much larger number of plausible-looking homographs remains available for an attack. With the Infoblox IDN homograph detector, we have observed IDN homographs for 43 of Alexa’s top 50 domain names, as well as for financial services and cryptocurrency domain names. The following table shows a few examples.

Solution

Traditional approaches to the homograph attack problem are based on string distance computation, and while some deep learning approaches have started to appear, they predominantly aim to classify whole domain names. Infoblox solved this challenge by working at the level of individual characters in the domain name. Each character is processed using image recognition techniques, which allowed Infoblox to exploit the glyphs (or visual shapes) of the Unicode characters instead of relying on their code points, which are mere numerical values that make up the code space in character encoding terminology.

Following this approach, Infoblox reached a 96.9% accuracy rate for the classifier detecting Unicode characters that look like ASCII characters. The detection process requires a single offline prediction, unlike existing deep learning approaches that require repeated online prediction. It has fewer false positives when compared with the methods that rely on distance computation between strings.

Infoblox used Amazon SageMaker to build two components:

  • An offline identification of Unicode character homographs based on a CNN classifier. This model takes the images and labels of the ASCII characters of interest (such as the subset used for domain names) and outputs an ASCII-to-Unicode map, which is rebuilt after each new release of the Unicode standard.
  • An online detection of domain name homographs taking a target domain list and an input DNS stream and generating homographs detections.

The following diagram illustrates how the overall detection process uses these two components.

In this diagram, each character is rendered as a 28 x 28 pixel image. In addition, each character from the train and test set is associated with the closest-looking ASCII character (which is its label).

The remainder of this post dives deeper into the solution to discuss the following:

  • Building the training data for the classifier
  • The classifier’s CNN architecture
  • Model evaluation
  • The online detection model

Building the training data for the classifier

To build the classifier, Infoblox wrote some code to assemble training data in an MNIST-like format (a rendering sketch appears after the following list). The Modified National Institute of Standards and Technology (MNIST) issued a large database of handwritten digit images, which has been used as the Hello World for any deep learning computer vision practitioner. Each image has a dimension of 28 x 28 pixels. Infoblox’s code used the following assets to create variations of each character:

  • The Unicode standard list of visually confusable characters (the latest version is 13.0.0), along with their security considerations, which allow developers to act appropriately and steer away from visual spoofing attacks.
  • The Unicode standard block that contains the most common combining characters in a diacritical marks block. For instance, in the chart from the Wikipedia entry Combining Diacritical Marks, you can find U+0300 where the U+030x row crosses the 0 column; U+0300 is the combining grave accent, which you can also see in the French character “è”. Some combining diacritics were left aside when building the training set because they were less conspicuous from a homograph attack perspective (for example, U+0363). For more information, see Combining Diacritical Marks on the Unicode website.
  • Multiple font typefaces, which attackers can use for malicious rendering and to radically transform the shapes of characters. Infoblox used multiple fonts from a local system, but could also add third-party fonts (such as Google Fonts), with the caveat that script fonts should be excluded. Using different fonts to generate many variations of each character acts as a powerful image augmentation technique for this use case: at this stage, Infoblox settled on 65 fonts to generate the training set. This number of fonts is sufficient to build a consistent training set that yields decent accuracy; using fewer fonts didn’t create enough representation for each character, and using more than these 65 fonts didn’t significantly improve the model accuracy.
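
The rendering step might look like the following sketch, which draws one character per font into a 28 x 28 grayscale image with Pillow. The font path is an assumption; substitute any TrueType font available on your system:

    from PIL import Image, ImageDraw, ImageFont

    FONT_PATH = "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"  # assumption

    def render_char(char, font_path=FONT_PATH, size=28):
        # White glyph on a black 28 x 28 background, MNIST-style
        font = ImageFont.truetype(font_path, size=22)
        img = Image.new("L", (size, size), color=0)
        draw = ImageDraw.Draw(img)
        # Roughly center the glyph on the canvas
        left, top, right, bottom = draw.textbbox((0, 0), char, font=font)
        x = (size - (right - left)) / 2 - left
        y = (size - (bottom - top)) / 2 - top
        draw.text((x, y), char, fill=255, font=font)
        return img

    # e.g., the Latin alpha 'ɑ' is labeled with the ASCII character "a"
    render_char("ɑ").save("u0251_as_a.png")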

In the future, Infoblox intends to use further data augmentation techniques (translate, scale, and shear operations, for instance) to improve the robustness of their ML models. Indeed, each deep learning framework SDK offers rich data augmentation features that can be included in the data preparation pipeline.

CNN architecture of the classifier

With the training set ready, and with little to no learning curve for training a model on Amazon SageMaker, Infoblox started building a classifier based on the following CNN architecture.

This CNN is built around two successive CONV-POOL cells followed by the classifier. The convolution section automatically extracts features from the input images, and the classification section uses these features to map (classify) the input images to the ASCII character map. The last layer converts the output of the classification network into a vector of probabilities for each class (that is, each ASCII character).
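
In tf.keras, the architecture just described could be sketched as follows; the filter counts, kernel sizes, and dropout rate are illustrative assumptions, not Infoblox’s exact values:

    import tensorflow as tf

    def build_model(num_classes):
        return tf.keras.Sequential([
            # First CONV-POOL cell
            tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                                   input_shape=(28, 28, 1)),
            tf.keras.layers.MaxPooling2D((2, 2)),
            # Second CONV-POOL cell
            tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
            tf.keras.layers.MaxPooling2D((2, 2)),
            # Classifier
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dropout(0.4),
            # One probability per ASCII character class
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])

    model = build_model(num_classes=36)  # e.g., a-z plus 0-9
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])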

Infoblox had already started to build a TensorFlow model and was able to bring it into Amazon SageMaker. From there, they used multiple Amazon SageMaker features to accelerate or facilitate model development:

  • Support for distributed training with CPU and GPU instances – Infoblox mainly used ml.c4.xlarge (compute optimized) and ml.p2.xlarge (GPU) instances. Although each training run didn’t last long (approximately 20 minutes), each hyperparameter tuning job could span more than 7 hours because of the number of parameters and the granularity of their search space. Distributing the workload across many instances in the background, without having to manage any infrastructure, was key.
  • The ability to train, deploy, and test predictions right from the notebook environment – From the same environment used to explore and prepare the data, Infoblox used Amazon SageMaker to transparently launch and manage training clusters and inference endpoints. These infrastructures are independent of the Amazon SageMaker notebook instance and are fully managed by the service.

Getting started was easy thanks to the existing documentation and many example notebooks made available by AWS on their public GitHub repo or directly from within the Amazon SageMaker notebook environment.

They started by testing a TensorFlow training script locally in Amazon SageMaker with a few lines of code (see the sketch after the following list). Training in local mode had the following benefits:

  • Infoblox could easily monitor metrics (like GPU consumption), and ensure that the code written was actually taking advantage of the hardware that they would use during training jobs
  • While debugging, changes to the training and inference scripts were taken into account instantaneously, making iterating on the code much easier
  • There was no need to wait for Amazon SageMaker to provision a training cluster, and the script could run instantaneously
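
A hedged sketch of local mode with the SageMaker Python SDK: setting instance_type="local" runs the same training script on the notebook instance itself, and switching to a managed instance type later is a one-line change. The script name, role ARN, and hyperparameters are placeholders:

    from sagemaker.tensorflow import TensorFlow

    estimator = TensorFlow(
        entry_point="train_cnn.py",             # hypothetical training script
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
        instance_count=1,
        instance_type="local",                  # e.g., "ml.p2.xlarge" in the cloud
        framework_version="1.15.2",
        py_version="py3",
        hyperparameters={"learning_rate": 0.001, "dropout": 0.4},
    )

    # In local mode, channels can point at local files instead of Amazon S3
    estimator.fit({"training": "file:///home/ec2-user/SageMaker/data/train"})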

Having the flexibility to work in local mode in Amazon SageMaker was key to easily porting the existing work to the cloud. You can also prototype your inference code locally by deploying the Amazon SageMaker TensorFlow serving container on the local instance. When you’re happy with the model and training behavior, you can switch to distributed training and inference by changing just a few lines of code to create a new estimator, optimize the model, or even deploy the trained artifacts to a persistent endpoint.

After completing the data preparation and training process using local mode, Infoblox started tuning the model in the cloud. This phase started with a coarse set of parameters that were gradually refined through several tuning jobs. During this phase, Infoblox used Amazon SageMaker hyperparameter tuning to help them select the best hyperparameter values (a sketch follows the list below). The following hyperparameters appeared to have the highest impact on model performance:

  • Learning rate
  • Dropout rate (regularization)
  • Kernel dimensions of the convolution layers
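
A sketch of what automatic model tuning over these three hyperparameters could look like; the ranges, metric name and regex, and job counts are assumptions:

    from sagemaker.tuner import (CategoricalParameter, ContinuousParameter,
                                 HyperparameterTuner)

    tuner = HyperparameterTuner(
        estimator=estimator,                    # estimator from the earlier sketch
        objective_metric_name="val_accuracy",
        metric_definitions=[{"Name": "val_accuracy",
                             "Regex": "val_accuracy: ([0-9\\.]+)"}],
        hyperparameter_ranges={
            "learning_rate": ContinuousParameter(1e-4, 1e-2),
            "dropout": ContinuousParameter(0.2, 0.6),
            "kernel_size": CategoricalParameter([3, 5]),
        },
        max_jobs=20,
        max_parallel_jobs=4,
    )
    tuner.fit({"training": "s3://my-bucket/glyphs/train"})  # hypothetical bucket

    # Deploy the best model found to a managed endpoint
    predictor = tuner.deploy(initial_instance_count=1,
                             instance_type="ml.c5.xlarge")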

When the model was optimized and reached the required accuracy and F1-score performance, the Infoblox team deployed the artifacts to an Amazon SageMaker endpoint. For added security, Amazon SageMaker endpoints are deployed in isolated dedicated instances, and as such, they need to be provisioned and are ready to serve new predictions after a few minutes.

Having correct, cleansed train, validation, and test sets mattered most when trying to reach decent accuracy. For instance, to select the 65 fonts of the training set, the Infoblox team printed out the fonts available on their workstation and reviewed them manually to pick the most relevant ones.

Model evaluation

Infoblox used accuracy and the F1-score as the main metrics to evaluate the performance of the CNN classifier.

Accuracy is the fraction of predictions the model got right. It’s defined as the number of correct predictions over the total number of predictions the model generated. Infoblox achieved an accuracy greater than 96.9% (to put it another way, out of 1,000 predictions made by the model, 969 were correctly classified as either homographs or not).

Two other important metrics for a classification problem are the precision and the recall.

Precision is defined as the ratio between the number of true positives and the total of true positives and false positives:

    precision = TP / (TP + FP)

Recall is defined as the ratio between the number of true positives and the total of true positives and false negatives:

    recall = TP / (TP + FN)

Infoblox made use of a combined metric, the F1-score, which is the harmonic mean of precision and recall:

    F1 = 2 * (precision * recall) / (precision + recall)

This helps the model strike a good balance between these two metrics.

From a business impact perspective, the preference is to minimize false negatives over false positives. The impact of a false negative is missed detections, which you can mitigate with an ensemble of classifiers. False positives have a direct negative effect on end-users, especially when you configure a block response policy action for DNS resolution of homographs in detector results.

Online detection model

The following diagram illustrates the architecture of the online detection model.

The online model uses the following AWS components:

  • Amazon Simple Storage Service (Amazon S3) stores train and test sets (1), Unicode glyphs (1), passive datasets, historical data, and model artifacts (3).
  • Amazon SageMaker trains the CNN model (2) and delivers offline inference with the homograph classifier (4). The output is the ASCII to Unicode map (5).
  • AWS Data Pipeline runs the batch detection pipeline (6) and manages the Amazon EMR clusters (creating them and submitting the different steps of the processing until shutdown).
  • Amazon EMR runs ETL jobs for both batch and streaming pipelines.
    • The batch pipeline reads input data from Amazon S3 (loading a list of targets and reading passive DNS data (7)), applies some ETL (8), and makes them available to the online detection system (10).
    • The online detection system is a streaming pipeline applying the same kind of transformation (10), but gets additional data by subscribing to an Apache Kafka broker (11).
  • Amazon DynamoDB (a NoSQL database) stores very detailed detection data (12) coming from the detection algorithm (the online system). Heavy writing is the main access pattern used here (large datasets and infrequent read requirement).
  • Amazon RDS for PostgreSQL stores a subset of the detection results at a higher level with a brief description of the results (13). Infoblox found Amazon RDS to be very suitable for storing a subset of the detection results that require high frequency read access for their use case while keeping cost under control.
  • AWS Lambda functions orchestrate and connect the different components of the architecture.

The overall architecture also follows AWS best practices with Amazon Virtual Private Cloud (Amazon VPC), Elastic Load Balancing, and Amazon Elastic Block Store (Amazon EBS).

Conclusion

The Infoblox team used Amazon SageMaker to train a deep CNN model that identifies Unicode characters that are visually similar to ASCII characters in DNS domains. The model was subsequently used to identify homograph characters from the Unicode standard with 0.969 validation accuracy and 0.969 test F1 score. They then wrote a detector that uses the model’s predictions to detect IDN homographs over passive DNS traffic without online image digitization or prediction. As of this writing, the detector has identified over 60 million resolutions of homograph domains, some of which are related to online campaigns to abuse popular online brands. There are more than 500,000 unique homographs across 60,000 different brands. It has also identified attacks across 100 industries, with the largest share (approximately 49%) aimed at financial services domains.

IDNs inadvertently give attackers more creative ways to form homograph domains, beyond what brand owners can anticipate. Organizations should consider monitoring DNS activity for homographs rather than relying solely on pre-registering a shortlist of homograph domains for brand protection.

The following screenshots show examples of homograph domain webpage content compared to the domain they attempt to impersonate. We show the content of a homograph domain on the left and the real domain on the right.

Amazon: xn--amzon-hra.de => amäzon.de vs. amazon.de. Notice the empty area on the homograph domain page.


Google: xn--goog-8va3s.com => googļę.com vs. google.com. There is a top menu bar on the homograph domain page.

Facebook: xn--faebook-35a.com => faċebook.com vs. facebook.com. The difference between the login pages is not readily apparent unless we view them side-by-side.


About the authors

Femi Olumofin is an analytics architect at Infoblox, where he leads a company-wide effort to bring AI/ML models from research to production at scale. His expertise is in security analytics and big data pipelines architecture and implementation, machine learning models exploration and delivery, and privacy enhancing technologies. He received his Ph.D. in Computer Science from the University of Waterloo in Canada. In his spare time, Femi enjoys cycling, hiking, and reading.


Michaël Hoarau is an AI/ML specialist solution architect at AWS who alternates between data scientist and machine learning architect, depending on the moment. He has worked on a wide range of ML use cases, ranging from anomaly detection to predictive product quality or manufacturing optimization. When not helping customers develop the next best machine learning experiences, he enjoys observing the stars, traveling, or playing the piano.


Kosti Vasilakakis is a Sr. Business Development Manager for Amazon SageMaker, AWS’s fully managed service for end-to-end machine learning, and he focuses on helping financial services and technology companies achieve more with ML. He spearheads curated workshops, hands-on guidance sessions, and pre-packaged open-source solutions to ensure that customers build better ML models quicker, and safer. Outside of work, he enjoys traveling the world, philosophizing, and playing tennis.


Read More

Query drug adverse effects and recalls based on natural language using Amazon Comprehend Medical

In this post, we demonstrate how to use Amazon Comprehend Medical to extract medication names and medical conditions to monitor drug safety and adverse events. Amazon Comprehend Medical is a natural language processing (NLP) service that uses machine learning (ML) to easily extract relevant medical information from unstructured text. We query the OpenFDA API (an open-source API published by the FDA) and Clinicaltrials.gov API (another open-source API published by the National Library of Medicine (NLM) at the National Institutes of Health (NIH)) to get information on past adverse events, recalls, and clinical trials for the drug or medical condition in question. You can then use this data in population scale studies to further analyze the drug’s safety and efficacy.

Launching a new drug is an extensive process. By some estimates, it takes about 12 years to go from invention to launch. It involves various stages like preclinical testing, phase 1–3 clinical trials, and approvals by the Food and Drug Administration (FDA). In addition, new drugs require huge financial investments by pharmaceutical organizations. According to a new study published in JAMA Network, the median cost of bringing a drug to market is $918 million, with estimates ranging from $314 million to $2.8 billion.

Even after launch, pharmaceutical companies continuously monitor for safety risks. Consumers can also directly report adverse drug reactions to the FDA. This could result in a drug recall, thereby jeopardizing millions of development dollars. Moreover, consumers who are taking these drugs and clinicians who are prescribing them need to be aware of such adverse reactions and decide whether corrective actions are necessary.

While no investment is guaranteed, drug manufacturers are starting to rely more on ML to achieve better outcomes and improve the chances of market success for new drugs they develop.

How can machine learning help?

To ensure drug safety, the FDA uses real-world data (RWD) and real-world evidence (RWE) to monitor post-market drug safety and adverse events. For more information, see real-world data (RWD) and real-world evidence (RWE) are playing an increasing role in health care decisions. This is also useful for healthcare professionals who develop guidelines and decision support tools based on RWD. Drug manufacturers can benefit from RWD analysis and use it to develop improved clinical trial designs and come up with new and innovative treatment approaches.

One of the major challenges with analyzing RWD effectively is that a lot of this data is unstructured: it doesn’t get stored in rows and columns that make it friendly to analytical queries. RWD can exist in multiple formats and span a variety of sources. It’s impractical to use conventional analytical techniques to process unstructured data at the scale of a population. For more information, see Building a Real World Evidence Platform on AWS.

Advances in natural language processing (NLP) can help fill this gap. For example, you can use models trained on RWD to derive key entities (like medications and medical conditions) from adverse reactions reported by patients in natural language. After you extract these entities, you can store them in a database and integrate them into a variety of reporting applications. You can use them in population scale studies to determine cohorts susceptible to certain drugs or to analyze the drug’s safety and efficacy.
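
As a minimal sketch of this extraction step, the following uses the Amazon Comprehend Medical DetectEntitiesV2 API through boto3; the sample text and Region are placeholders:

    import boto3

    cm = boto3.client("comprehendmedical", region_name="us-east-1")

    text = "Patient reports severe headaches after starting lisinopril."
    response = cm.detect_entities_v2(Text=text)

    # Keep only the entity categories this post cares about
    for entity in response["Entities"]:
        if entity["Category"] in ("MEDICATION", "MEDICAL_CONDITION"):
            print(entity["Category"], entity["Text"], round(entity["Score"], 2))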

Solution architecture

The following diagram represents the overall architecture of the solution. In addition to Amazon Comprehend Medical, you use the following services:

The architecture includes the following steps:

  1. The demo solution is a simple HTML page served by a Lambda function on the first invocation of the API Gateway URL. The URL appears in the Outputs section of the CloudFormation stack, or you can get it from API Gateway.
  2. The submit buttons on the page asynchronously invoke two other Lambda functions via API Gateway.
  3. The two Lambda functions use a common layer function to process the free text entered by the user with Amazon Comprehend Medical and return medications and medical conditions.
  4. The Lambda functions use the entities from Amazon Comprehend Medical to query the open-source APIs clinicaltrials.gov and open.fda.gov, and the HTML page renders the output from these functions into the respective tables (a sketch of this query step follows the list).
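
A hedged sketch of such a query step: a Lambda handler that takes a drug name (already extracted by Comprehend Medical) and queries the public openFDA adverse-event endpoint. The event shape and record limit are assumptions, and error handling is kept minimal:

    import json
    import urllib.request
    from urllib.parse import quote

    def lambda_handler(event, context):
        drug = event["drug"]  # hypothetical event shape, e.g. {"drug": "lisinopril"}
        url = ("https://api.fda.gov/drug/event.json"
               f"?search=patient.drug.medicinalproduct:%22{quote(drug)}%22&limit=5")
        with urllib.request.urlopen(url) as resp:
            data = json.loads(resp.read())
        # Collect the reported reaction terms from each adverse-event record
        reactions = [
            reaction["reactionmeddrapt"]
            for result in data.get("results", [])
            for reaction in result["patient"]["reaction"]
        ]
        return {"statusCode": 200, "body": json.dumps({"reactions": reactions})}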

Prerequisites

To complete this walkthrough, you must have the following prerequisites:

Configuring the CloudFormation stack

To configure your CloudFormation stack, complete the following steps:

  1. Sign in to the AWS Management Console.
  2. Choose us-east-1 as your Region.
  3. Launch the CloudFormation stack:
  4. Choose Next.
  5. For Stack name, enter a name; for example, drugsearch.
  6. In the Parameters section, update the API Gateway names as necessary.
  7. Provide the name of an S3 bucket in us-east-1 to store the CSV files.
  8. Choose Next.
  9. Select I acknowledge that AWS CloudFormation might create IAM resources.
  10. Choose Create stack.

The stack takes a few minutes to complete.

  1. On the Outputs tab, record the URL for the API Gateway.

Searching for information related to drugs and medical conditions

When you open the URL from the previous step, you can enter text related to drugs and medical conditions and choose Submit.

The output shows three tables with the following information:

  • Adverse effects of the related drugs and symptoms – This information is queried from clinicaltrials.gov, and records are limited to a maximum of 10.
  • Drug recall-related information – This information is queried from open.fda.gov, and records are limited to a maximum of 5 for every drug and symptom.
  • Clinical trials for the related symptoms and drugs – This information is queried from clinicaltrials.gov.

In addition to the tables, the page displays two hyperlinks to download the clinical trial information and the OpenFDA data as CSV files. These files contain a maximum of 100 records for clinical trials and 100 records for every drug and medical condition in OpenFDA.

Conclusion

This post demonstrated a simple application that allows drug manufacturers, healthcare professionals, and consumers to look up useful information from trusted sources like the FDA and NIH. Using this architecture and the available code base, you can integrate this solution into other downstream applications related to the analysis and reporting of adverse events. We hope this lowers the barrier of entry and increases adoption of ML to improve patient outcomes and improve quality of care.


About the authors

Varad Ram is a Senior Solutions Architect in the Partner Team at Amazon Web Services. He likes to help customers adopt cloud technologies and is particularly interested in artificial intelligence. He believes deep learning will power future technology growth. In his spare time, his daughter and son keep him busy biking and hiking.


Ujjwal Ratan is Principal Machine Learning Specialist Solution Architect in the Global Healthcare and Lifesciences team at Amazon Web Services. He works on the application of machine learning and deep learning to real world industry problems like medical imaging, unstructured clinical text, genomics, precision medicine, clinical trials and quality of care improvement. He has expertise in scaling machine learning/deep learning algorithms on the AWS cloud for accelerated training and inference. In his free time, he enjoys listening to (and playing) music and taking unplanned road trips with his family.


Babu Srinivasan is a Senior Cloud Architect at Deloitte. He works closely with customers in building scalable and resilient cloud-based architectures and accelerating the adoption of the AWS Cloud to solve business problems. Babu is also an APN (AWS Partner Network) Ambassador, passionate about sharing his AWS technical expertise with the technical community. In his spare time, Babu loves to spend time performing close-up card magic for friends and colleagues, woodturning in his garage woodshop, or working on his AWS DeepRacer car.


Read More

Building a scalable outbound call engine using Amazon Connect and Amazon Lex

This is a guest post by AWS Machine Learning Hero Cyrus Wong.

Staying connected with family, friends, and colleagues is easy for most people who live with or close to others. For educators who need to communicate lessons and schedules with their students, or businesses who communicate with new and existing customers, staying connected can be hard, especially in times of crisis and isolation.

Specifically, I wanted to make remote communication between educators and students easier. Communicating time-sensitive information and confirming that students received messages can be hard; scaling communication from tens to thousands of students can make the problem more complex, impacting educator and student productivity, time, and overall experience.

To meet this challenge, I developed Callouts, a simple, consistent, and scalable solution for educators to communicate with students using Amazon Connect and Amazon Lex. In times of crisis, such as a quarantine, this solution helps educators use an automated bot that calls students to communicate important messages, such as schedule changes, general announcements, and attendance confirmation.

Even if the resulting calls are similar, building scalable contact flows and chatbots can take time. By generalizing a survey-like call job to contact multiple recipients in parallel, Callouts makes it easy for developers to create sophisticated conversational experiences. Non-technical users who may find this intimidating can simply upload an Excel file into an Amazon Simple Storage Service (Amazon S3) bucket to trigger an automatic process that ultimately results in the AI agent calling multiple recipients at the same time.

Architecture and design

Callouts uses AWS Serverless Application Model (AWS SAM), an open-source framework for building serverless applications. It offers a syntax designed specifically for expressing serverless resources.

The following diagram illustrates the architecture.

The goal of the architecture is to allow non-technical users to define a “call job” and request execution without having to write code. The user creates an Excel file that contains their call tasks (for example, “You now have until Friday to submit your homework.”) and uploads it to Amazon S3. This triggers CreateExcelCallJobFunction, which converts the Excel file into a JSON message and sends it to an Amazon Simple Queue Service (Amazon SQS) FIFO queue (CallSqsQueue). An AWS Lambda function connected to the SQS queue processes the incoming messages, creates individual call task data, uploads the data to an S3 bucket, and starts the CalloutStateMachine AWS Step Functions task. The individual job data is saved and loaded from Amazon S3 to prevent sending an oversized payload to the start execution API. The ReservedConcurrentExecutions value of StartCallOutFlowFunction is set to 1 to make sure only one job goes to the state machine at a time.

This design allows other systems to create a call job by sending the defined data message to the SQS queue directly.
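
A hedged sketch of that integration point: sending a call-job message to the FIFO queue with boto3. The queue URL and message schema are illustrative, and the snippet assumes content-based deduplication is enabled on the queue:

    import json
    import boto3

    sqs = boto3.client("sqs")

    call_job = {  # illustrative schema mirroring the Excel sheets
        "greeting": "Hi {{ username }}, this is a reminder call.",
        "ending": "Good bye {{ username }} and have a nice day!",
        "questions": [{"question_template": "Did you submit your homework?",
                       "question_type": "Yes/No"}],
        "receivers": [{"id": "1", "phone_number": "12345678",
                       "username": "Cyrus"}],
    }

    sqs.send_message(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/CallSqsQueue.fifo",
        MessageBody=json.dumps(call_job),
        MessageGroupId="callouts",  # required for FIFO queues
    )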

CalloutStateMachine

The following diagram shows the CalloutStateMachine workflow.

  1. One callout job includes a set of callout tasks, each of which calls one receiver.
  2. The callout task proceeds with dynamic parallelism. For more information, see New – Step Functions Support for Dynamic Parallelism.
  3. The step “Get Callout task” is a Lambda function to get the call task JSON from ExcelCallTaskBucket.
  4. The step “Callout with Amazon Connect” sends the message to AsynCalloutQueue.
  5. This step waits for the callback with task token or “Call Timeout.” For more information, see Call Amazon SQS with Step Functions.
  6. “Get Call Result” combines call results and generates an Excel call report.
  7. A completion message goes to the SNS topic.

This pattern lets the Amazon Connect contact flow use the SendTaskSuccess API to provide a call result for each outbound call, identified by the task token. If the “Callout with Amazon Connect” step doesn’t receive a callback within 5 minutes (the default), the job goes through the “Call Timeout” state. For longer communications that may need more time to complete, you can change the TimeoutSeconds parameter in the AWS CloudFormation template.
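
A minimal sketch of the callback half of this pattern: a Lambda function reports the call result to Step Functions using the task token it received with the call task (the status values follow the ones described in this post):

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    def report_call_result(task_token, status):
        # status is, e.g., "CallCompleted" or "NotCallable"
        sfn.send_task_success(
            taskToken=task_token,
            output=json.dumps({"status": status}),
        )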

“Save call result” saves the call result to CallResultDynamoDBTable. “Get Call Result” retrieves all call results from CallResultDynamoDBTable, generates the call report, and uploads the report to the CallReportBucket.

Finally, the job publishes a message to the CallJobCompletion SNS topic. The message contains the task ID, bucket name, Excel report key, JSON report key, and a pre-assigned URL of the Excel report.

You can create an email subscription to the SNS topic to get a notification message upon completion of the call job. See the following screenshot for an example.

Callouts with Amazon Connect

The following diagram shows the architecture of Callouts with Amazon Connect.

  1. The “Callout with Amazon Connect” step of the CalloutStateMachine workflow sends an individual call task JSON to the SQS FIFO queue AsynCalloutQueue.
  2. A SQS new message event triggers CalloutFunction.
  3. CalloutFunction sends the call task to a “Calling out” contact flow.
  4. If a phone number isn’t reachable, the function calls back to CalloutStateMachine through the SendTaskSuccess API with the status NotCallable. (If the function instead called the SendTaskFailure API whenever a phone call failed, the whole workflow would fail; the workflow has to continue even if some calls fail.)
  5. The “Calling out” contact flow interacts with three Lambda functions. SendTaskSuccessFunction calls to CalloutStateMachine with the status CallCompleted when the “Calling out” contact flow is successfully complete.

“Calling out” contact flow

The following diagram illustrates the “Calling out” contact flow.

Although the contact flow may seem complicated, the logic is straightforward.

“Calling out” architecture

The following diagram illustrates the “Calling out” architecture.

The following are a few highlights:

  • Enabling logging is very useful for call debugging, and you can use CloudWatch Logs Insights to trace a call from the log stream. For more information, see Analyzing Log Data with CloudWatch Logs Insights. For example, see the following code:
    fields @timestamp, @message
    | filter @message like "Specific Contact Flow Id"
    | sort @timestamp asc

  • You can invoke Lambda functions to do anything, such as saving data into Amazon DynamoDB and checking information from databases. For more information, see Invoke AWS Lambda Functions.
  • The “Get Customer Input” block redirects the call to Amazon Lex and returns the intent name and intent slot value for number, date, and time question types. For more information, see Create a Contact Flow and Add Your Amazon Lex Bot.
  • The contact flow doesn’t attempt error recovery: all errors in Lambda or Amazon Lex end the call, and all error handlers of contact flow blocks connect to the Disconnect/Hang Up block.
  • SendTaskSuccessFunction calls the SendTaskSuccess API with status CallCompleted to finish the “Callout with AWS Connect” step.

Amazon Lex CalloutBot

This chatbot contains a set of intents that captures simple answers such as yes or no, letters, numbers, dates, and times; they don’t need a slot. The ExcelLexBot engine creates slots for each answer. For more sample flows and sample utterances, see Build an Amazon Lex Chatbot with Microsoft Excel and Building Better Bots Using Amazon Lex (Part 1).

This project contains four chatbots built on Amazon Lex to handle different scenarios: CalloutBot, CalloutBotDate, CalloutBotNumber, and CalloutBotTime.

The following conversation shows the question model:

Contact Flow: Play a question based on question_template and receiver data.
Chatbot Agent: Wait for answer in question_type.
User: Answer in question_type.

CalloutBot contains OkIntent, YesIntent, NoIntent, AIntent, BIntent, CIntent, DIntent, and EIntent. All the intents have a set of sample utterances, and Amazon Connect uses the intent name to capture the user’s answer. The following screenshots show the details in Excel for OkIntent and AIntent.

BIntent, CIntent, DIntent, and EIntent are all similar to AIntent. The bot uses those eight intents to handle OK, Yes/No, and multiple choice question types.

CalloutBotDate contains a DateIntent to solicit date information from the caller, receiver, or user; for example, for an appointment. See the following screenshot.

CalloutBotNumber contains a NumberIntent to solicit number information from the caller, receiver, or user; for example, for the number of attempts. See the following screenshot.

CalloutBotTime contains a TimeIntent to solicit time information from the caller, receiver, or user; for example, appointment information. See the following screenshot.

All chatbots contain AMAZONFallbackIntent and AMAZONRepeatIntent for the built-in intents FallbackIntent and RepeatIntent. The system uses these built-in intents to capture repeat requests and, through the fallback intent, messages the chatbot can’t understand. The contact flow repeats the question whenever a repeat or fallback intent is captured. For more information, see Managing conversation flow with a fallback intent on Amazon Lex.

ExcelLexBot creates a DynamoDB table per intent to save each user answer and all intent history. For example, the following screenshot shows that after the receiver answered OK, you can find the record in the CalloutBotOkIntent table.

ContactId can help you to trace each call.

Call job Excel format

Non-technical users can also use Excel to create a call job. There are three Excel sheets: Configures, Questions, and Receivers. They are straightforward and don’t require any programming knowledge, and users need to fill in all three sheets for a call job. Developers can integrate Callouts into other systems by sending a JSON object into CallSqsQueue programmatically.

Receivers sheet

The list of receivers contains the following information:

  • id – A unique identifier for each user.
  • phone_number – The user’s phone number. If you specify the phone_prefix in the Configures sheet, you don’t need to add the country code here.
  • Additional columns – Optional columns for your message. For example, the following table includes the additional column username.
id | phone_number | username
1 | 12345678 | Cyrus
2 | 89654201 | Cyrus Wong

Configures sheet

The Configures sheet contains the following information for the common settings of this call job:

  • greeting – Greeting message for the call
  • ending – Closing message for the call
  • phone_prefix – International subscriber dialing (ISD) code for each phone call (the + is optional)

The following table shows example values in a Configures sheet.

Key | Value
greeting | Hi {{ username }}, This is a simple survey.
ending | Good Bye {{ username }} and have a nice day!
phone_prefix | +852

Questions sheet

Each row in the Questions sheet represents one question. There are two columns:

  • question_template – A Jinja2 template that generates the question text for each receiver row
  • question_type – This contains the following question types:
    • OK – Captures the answer “OK,” which is for the reminder use case
    • Yes/No – Captures the answer “Yes” or “No”
    • Multiple Choice – Captures the answer “A,” “B,” “C,” “D,” or “E”
    • Date – Captures DATE
    • Time – Captures TIME
    • Number – Captures NUMBER

The following table shows example values of question_template and question_type.

question_template | question_type
Are you using Amazon Connect? | Yes/No
How did you first hear about Amazon Connect? A. Newsletter, B. Social Media, C. AWS Event, D. AWS Website, or E. From Friend. | Multiple Choice
How many applications do you use with Amazon Connect? | Number
When should we call you back? | Date
Preferred call back time? | Time
In this demo, I want you to say OK. | OK

If you just want to make a call, remove all rows and keep the header.

The question_template, greeting, and ending fields are Jinja templates and generate output from each receiver row. All messages use Amazon Connect SSML and are automatically embedded in <speak>message</speak>, so you must not add the speak tag yourself. For more information, see SSML in Amazon Connect Contact Flows.
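
For illustration, here is a minimal sketch of how a template plus a receiver row could be rendered and wrapped; the wrapping shown mimics what the system does for you:

    from jinja2 import Template

    receiver = {"username": "Cyrus"}
    greeting = Template("Hi {{ username }}, This is a simple survey.")

    message = greeting.render(**receiver)
    ssml = f"<speak>{message}</speak>"   # the system adds the speak tag
    print(ssml)  # <speak>Hi Cyrus, This is a simple survey.</speak>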

Call report in Excel

The following screenshot is an example of an Excel call report.

The report contains the following columns:

  • task_id – The Excel file name. If you upload the same file again, you overwrite the result.
  • receiver_id – The receiver ID from the Receivers sheet.
  • call_at – The call start time.
  • status – The status of the call. There are two values:
    • CallCompleted – The receiver picked up the call and answered all the questions.
    • DropCall – The receiver either didn’t pick up the call or didn’t complete all the questions.
  • error – null means no error; otherwise, this column contains the exception message.
  • phone_number – The receiver’s phone number.
  • username – The additional field from the call job from the Receivers sheet. All additional columns are copied to the result report.
  • Question_x – One column per question, where x is the question number; it shows the receiver’s answer.

Deploying Callouts

This section provides a walkthrough of deploying Callouts.

Creating an Amazon Connect instance and setting up contact flow

To create an Amazon Connect instance and set up the contact flow, complete the following steps:

  1. Create a virtual contact center instance in us-east-1. For instructions, see Create an Amazon Connect Instance.
    1. In Step 3, select I want to make outbound calls with Amazon Connect.
  2. Download the following contact flow from the GitHub repo.
  3. Import the “Calling out” contact flow. For instructions, see Export and Import a Contact Flow.
  4. Choose Show additional flow information.
  5. Locate the contact flow ARN (see the following screenshot).
  6. Record the InstanceId and ContactFlowId.

The Amazon Connect ARN format is arn:${Partition}:connect:${Region}:${Account}:instance/${InstanceId}/contact-flow/${ContactFlowId}. For more information, see Resource Types Defined by Amazon Connect.

  1. Set a phone number for the “Calling out” contact flow. For instructions, see Claim a Phone Number and Associate a Phone Number with a Contact Flow.
  2. Record the phone number.

Deploying the Amazon Lex Chatbot and Callouts serverless application

To deploy the chatbot, complete the following steps:

  1. Sign in to your AWS account.
  2. Choose the US East (N. Virginia) Region.
  3. Open AWS Serverless Application Repository for ExcelLexBot.
  4. Select I acknowledge that this app creates custom IAM roles and resource policies.
  5. Choose Deploy.
  6. Wait for the completion message.
  7. Choose View CloudFormation Stack.
  8. Choose Outputs.
  9. Download the zipped chatbot Excel files from the GitHub repo and unzip them.
  10. Upload the four Excel files into the S3 bucket LexExcelBucket; for example, serverlessrepo-excellexbot-bucket-1bxqjwlbfqjy9.
  11. On the AWS CloudFormation console, wait for the status of all four stacks to show as CREATE_COMPLETE.
  12. Open the AWS Serverless Application Repository for Callouts.
  13. Select I acknowledge that this app creates custom IAM roles and resource policies.
  14. Choose Deploy, using the parameters you recorded when creating the Amazon Connect instance and setting up the contact flow (the application name is the S3 bucket name prefix; I suggest you use the default value).
  15. Wait for the completion message.

Giving permission to the Amazon Connect instance

To give permission to your Amazon Connect instance, add the Amazon Lex bots to the instance: CalloutBot_ExcelLexBot, CalloutBotDate_ExcelLexBot, CalloutBotNumber_ExcelLexBot, and CalloutBotTime_ExcelLexBot.

You don’t need to set up permissions for the Lambda functions IteratorFunction, ResponseHanlderFunction, and SendTaskSuccessFunction, because AWS CloudFormation already grants the Amazon Connect instance permission to invoke them.

When the deployment is complete, you can upload your call job to the S3 bucket.

Un-deploying Callouts

To un-deploy Callouts, complete the following steps:

  1. Go to LexExcelBucket and delete the four chatbot Excel files.

This action triggers the deletion of the four chatbot stacks, which takes a few minutes.

  2. Delete all files in excelcalljobbucket and callreportbucket.
  3. After the chatbot stacks are deleted, delete serverlessrepo-ExcelLexBot and serverlessrepo-awscallouts.

Creating the outbound call job

To create the outbound call job, complete the following steps:

  1. Download the Excel example from the GitHub repo.
  2. Change the content in the file—at least the phone number and phone_prefix.
  3. Upload the file to the S3 bucket that contains excelcalljobbucket in the name.
  4. Wait for up to 5 minutes and download the call report from the S3 bucket that contains callreportbucket in the name.
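You can also upload the call job programmatically. The following is a hypothetical sketch using boto3; the bucket and file names are illustrative, so substitute the excelcalljobbucket name from your own deployment:

import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="call_job.xlsx",  # the local Excel call job
    Bucket="serverlessrepo-awscallouts-excelcalljobbucket-example",
    Key="call_job.xlsx",
)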

Demos

For examples of using Callouts, see the demo videos on YouTube.

Conclusion

This post demonstrates how to build Callouts, a solution for educators to contact students in a simple, consistent, and scalable way using Amazon Connect and Amazon Lex. Based on the user experience at the Hong Kong Institute of Vocational Education, we believe that this solution not only benefits educators and students, but can also aid caregivers, businesses, and individuals.

Project collaborators include Mike Ng, Technical Program Intern at AWS, Brian Cheung, Sam Lam, and Pearly Law from the IT114115 Higher Diploma in Cloud and Data Centre Administration. Special thanks to the AWS team, including Dickson Yue, Jerry Yuen, Niranjan Hira, Randall Hunt, and Cameron Peron, for educating and supporting our team.


About the Author

Cyrus Wong is a Data Scientist at the Hong Kong Institute of Vocational Education (Lee Wai Lee) Cloud Innovation Centre. He has achieved all 13 AWS Certifications and enjoys sharing his AWS knowledge with others through open-source projects, blog posts, and events.

Fine-tuning a PyTorch BERT model and deploying it with Amazon Elastic Inference on Amazon SageMaker

Text classification is a technique for putting text into different categories, and has a wide range of applications: email providers use text classification to detect spam emails, marketing agencies use it for sentiment analysis of customer reviews, and discussion forum moderators use it to detect inappropriate comments.

In the past, data scientists used methods such as tf-idf, word2vec, or bag-of-words (BOW) to generate features for training classification models. Although these techniques have been very successful in many natural language processing (NLP) tasks, they don’t always capture the meanings of words accurately when they appear in different contexts. Recently, we have seen increasing interest in using Bidirectional Encoder Representations from Transformers (BERT) to achieve better results in text classification tasks, due to its ability to encode the meaning of words in different contexts more accurately.

Amazon SageMaker is a fully managed service that provides developers and data scientists the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models. The Amazon SageMaker Python SDK provides open-source APIs and containers that make it easy to train and deploy models in Amazon SageMaker with several different ML and deep learning frameworks.

Our customers often ask for quick fine-tuning and easy deployment of their NLP models. Furthermore, customers prefer low inference latency and low model inference cost. Amazon Elastic Inference enables attaching GPU-powered inference acceleration to endpoints, which reduces the cost of deep learning inference without sacrificing performance.

This post demonstrates how to use Amazon SageMaker to fine-tune a PyTorch BERT model and deploy it with Elastic Inference. The code from this post is available in the GitHub repo. For more information about BERT fine-tuning, see BERT Fine-Tuning Tutorial with PyTorch.

What is BERT?

First published in November 2018, BERT is a revolutionary model. During pretraining, one or more words in each sentence are intentionally masked; BERT takes in these masked sentences as input and trains itself to predict the masked words. In addition, BERT uses a next sentence prediction task that pretrains text-pair representations.
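As a quick illustration of the masked-word objective, the fill-mask pipeline in recent versions of the transformers library (newer than the version pinned later in this post) predicts the hidden word; the sentence here is illustrative:

from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The cat sat on the [MASK]."):
    print(prediction["token_str"], prediction["score"])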

BERT is a substantial breakthrough and has helped researchers and data engineers across the industry achieve state-of-art results in many NLP tasks. BERT offers representation of each word conditioned on its context (rest of the sentence). For more information about BERT, see BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

BERT fine-tuning

One of the biggest challenges data scientists face in NLP projects is a lack of training data; you often have only a few thousand pieces of human-labeled text for model training. However, modern deep learning NLP tasks require a large amount of labeled data. One way to solve this problem is to use transfer learning.

Transfer learning is an ML method where a pretrained model, such as a pretrained ResNet model for image classification, is reused as the starting point for a different but related problem. By reusing parameters from pretrained models, you can save significant amounts of training time and cost.

BERT was trained on BookCorpus and English Wikipedia data, which contain 800 million and 2,500 million words, respectively [1]. Training BERT from scratch would be prohibitively expensive. By taking advantage of transfer learning, you can quickly fine-tune BERT for another use case with a relatively small amount of training data to achieve state-of-the-art results for common NLP tasks, such as text classification and question answering.

Solution overview

In this post, we walk through our dataset, the training process, and finally model deployment.

We use an Amazon SageMaker notebook instance for running the code. For more information about using Jupyter notebooks on Amazon SageMaker, see Using Amazon SageMaker Notebook Instances or Getting Started with Amazon SageMaker Studio.

The notebook and code from this post are available on GitHub. To run it yourself, clone the GitHub repository and open the Jupyter notebook file.

Problem and dataset

For this post, we use the Corpus of Linguistic Acceptability (CoLA), a dataset of 10,657 English sentences from published linguistics literature, each labeled as grammatical or ungrammatical. In our notebook, we download and unzip the data using the following code:

if not os.path.exists("./cola_public_1.1.zip"):
    !curl -o ./cola_public_1.1.zip https://nyu-mll.github.io/CoLA/cola_public_1.1.zip
if not os.path.exists("./cola_public/"):
    !unzip cola_public_1.1.zip

In the training data, the only two columns we need are the sentence itself and its label:

df = pd.read_csv(
    "./cola_public/raw/in_domain_train.tsv",
    sep="\t",
    header=None,
    usecols=[1, 3],
    names=["label", "sentence"],
)
sentences = df.sentence.values
labels = df.label.values

If we print out a few sentences, we can see how sentences are labeled based on their grammatical completeness. See the following code:

print(sentences[20:25])
print(labels[20:25])

["The professor talked us." "We yelled ourselves hoarse."
 "We yelled ourselves." "We yelled Harry hoarse."
 "Harry coughed himself into a fit."]
[0 1 0 0 1]

We then split the dataset for training and testing before uploading both to Amazon S3 for use later. The SageMaker Python SDK provides a helpful function for uploading to Amazon S3:

from sagemaker.session import Session
from sklearn.model_selection import train_test_split

train, test = train_test_split(df)
train.to_csv("./cola_public/train.csv", index=False)
test.to_csv("./cola_public/test.csv", index=False)

session = Session()
inputs_train = session.upload_data("./cola_public/train.csv", key_prefix="sagemaker-bert/training/data")
inputs_test = session.upload_data("./cola_public/test.csv", key_prefix="sagemaker-bert/testing/data")

Training script

For this post, we use the PyTorch-Transformers library, which contains PyTorch implementations and pretrained model weights for many NLP models, including BERT. See the following code:

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # Use the 12-layer BERT model, with an uncased vocab.
    num_labels=2,  # The number of output labels--2 for binary classification.
    output_attentions=False,  # Whether the model returns attention weights.
    output_hidden_states=False,  # Whether the model returns all hidden-states.
)

Our training script should save model artifacts learned during training to a file path called model_dir, as stipulated by the Amazon SageMaker PyTorch image. Upon completion of training, Amazon SageMaker uploads model artifacts saved in model_dir to Amazon S3 so they are available for deployment. The following code is used in the script to save trained model artifacts:

model_2_save = model.module if hasattr(model, "module") else model
model_2_save.save_pretrained(save_directory=args.model_dir)

We save this script in a file named train_deploy.py, and put the file in a directory named code/, where the full training script is viewable.

Because PyTorch-Transformers isn’t included natively in Amazon SageMaker PyTorch images, we have to provide a requirements.txt file so that Amazon SageMaker installs the library for training and inference. A requirements.txt file is a text file that lists the items to be installed with pip install; you can also pin the version of each item. To install PyTorch-Transformers, we add the following line to the requirements.txt file:

transformers==2.3.0

You can view the entire file in the GitHub repo, and it also goes into the code/ directory. For more information about the format of a requirements.txt file, see Requirements Files.

Training on Amazon SageMaker

We use Amazon SageMaker to train and deploy a model using our custom PyTorch code. The Amazon SageMaker Python SDK makes it easier to run a PyTorch script in Amazon SageMaker using its PyTorch estimator. After that, we can use the SageMaker Python SDK to deploy the trained model and run predictions. For more information about using this SDK with PyTorch, see Using PyTorch with the SageMaker Python SDK.

To start, we use the PyTorch estimator class to train our model. When creating the estimator, we make sure to specify the following:

  • entry_point – The name of the PyTorch script
  • source_dir – The location of the training script and requirements.txt file
  • framework_version – The PyTorch version we want to use

The PyTorch estimator supports multi-machine, distributed PyTorch training. To use this, we just set train_instance_count to be greater than 1. Our training script supports distributed training only for GPU instances.

After creating the estimator, we call fit(), which launches a training job. We use the Amazon S3 URIs we uploaded the training data to earlier. See the following code:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train_deploy.py",
    source_dir="code",
    role=role,
    framework_version="1.3.1",
    py_version="py3",
    train_instance_count=2,
    train_instance_type="ml.p3.2xlarge",
    hyperparameters={
        "epochs": 1,
        "num_labels": 2,
        "backend": "gloo",
    }
)
estimator.fit({"training": inputs_train, "testing": inputs_test})

After training starts, Amazon SageMaker displays training progress (as shown in the following code). Epochs, training loss, and accuracy on test data are reported:

2020-06-10 01:00:41 Starting - Starting the training job...
2020-06-10 01:00:44 Starting - Launching requested ML instances......
2020-06-10 01:02:04 Starting - Preparing the instances for training............
2020-06-10 01:03:48 Downloading - Downloading input data...
2020-06-10 01:04:15 Training - Downloading the training image..
2020-06-10 01:05:03 Training - Training image download completed. Training in progress.
...
Train Epoch: 1 [0/3207 (0%)] Loss: 0.626472
Train Epoch: 1 [350/3207 (98%)] Loss: 0.241283
Average training loss: 0.5248292144022736
Test set: Accuracy: 0.782608695652174
...

We can monitor the training progress and make sure it succeeds before proceeding with the rest of the notebook.

Deployment script

After training our model, we host it on an Amazon SageMaker endpoint by calling deploy on the PyTorch estimator. The endpoint runs an Amazon SageMaker PyTorch model server. We need to configure two components of the server: model loading and model serving. We implement these two components in our inference script train_deploy.py. The complete file is available in the GitHub repo.

model_fn() is the function defined to load the saved model and return a model object that can be used for model serving. The SageMaker PyTorch model server loads our model by invoking model_fn:

def model_fn(model_dir):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = BertForSequenceClassification.from_pretrained(model_dir)
    return model.to(device)

input_fn() deserializes and prepares the prediction input. In this use case, the request body is serialized to JSON and sent to the model serving endpoint. Therefore, in input_fn(), we first deserialize the JSON-formatted request body and return the input as a torch.tensor, as required for BERT:

def input_fn(request_body, request_content_type):
    if request_content_type == "application/json":
        sentence = json.loads(request_body)

        # Tokenize the sentence and add the special [CLS] and [SEP] tokens.
        input_ids = []
        encoded_sent = tokenizer.encode(sentence, add_special_tokens=True)
        input_ids.append(encoded_sent)

        # Pad shorter sentences up to MAX_LEN with the 0 (padding) token ID.
        input_ids_padded = []
        for i in input_ids:
            while len(i) < MAX_LEN:
                i.append(0)
            input_ids_padded.append(i)
        input_ids = input_ids_padded

        # Attention mask: 0 for padding tokens, 1 otherwise.
        attention_masks = [
            [int(token_id > 0) for token_id in sent] for sent in input_ids
        ]

        # Convert to PyTorch data types.
        train_inputs = torch.tensor(input_ids)
        train_masks = torch.tensor(attention_masks)

        return train_inputs, train_masks

predict_fn() performs the prediction and returns the result. See the following code:

def predict_fn(input_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()
    input_id, input_mask = input_data
    # .to(device) returns a new tensor, so reassign rather than discard it.
    input_id = input_id.to(device)
    input_mask = input_mask.to(device)
    with torch.no_grad():
        return model(input_id, token_type_ids=None, attention_mask=input_mask)[0]

We take advantage of the prebuilt Amazon SageMaker PyTorch image’s default support for serializing the prediction result.

Deploying the endpoint

To deploy our endpoint, we call deploy() on our PyTorch estimator object, passing in our desired number of instances and instance type:

predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

We then configure the predictor to use "application/json" for the content type when sending requests to our endpoint:

from sagemaker.predictor import json_deserializer, json_serializer

predictor.content_type = "application/json"
predictor.accept = "application/json"
predictor.serializer = json_serializer
predictor.deserializer = json_deserializer

Finally, we use the returned predictor object to call the endpoint:

result = predictor.predict("Somebody just left - guess who.")
print(np.argmax(result, axis=1))

[1]

The predicted class is 1, which is expected because the test sentence is a grammatically correct sentence.

Deploying the endpoint with Elastic Inference

Selecting the right instance type for inference requires deciding between different amounts of GPU, CPU, and memory resources. Optimizing for one of these resources on a standalone GPU instance usually leads to underutilization of other resources. Elastic Inference solves this problem by enabling you to attach the right amount of GPU-powered inference acceleration to your endpoint. In March 2020, Elastic Inference support for PyTorch became available for both Amazon SageMaker and Amazon EC2.

To use Elastic Inference, we must first convert our trained model to TorchScript. For more information, see Reduce ML inference costs on Amazon SageMaker for PyTorch models using Amazon Elastic Inference.

We first download the trained model artifacts from Amazon S3. The location of the model artifacts is estimator.model_data. We then convert the model to TorchScript using the following code:

import subprocess

import torch
from transformers import BertForSequenceClassification

model_torchScript = BertForSequenceClassification.from_pretrained("model/", torchscript=True)
device = "cpu"
# Dummy inputs of length 64 used only to trace the model.
for_jit_trace_input_ids = [0] * 64
for_jit_trace_attention_masks = [0] * 64
for_jit_trace_input = torch.tensor([for_jit_trace_input_ids])
for_jit_trace_masks = torch.tensor([for_jit_trace_attention_masks])

traced_model = torch.jit.trace(
    model_torchScript, [for_jit_trace_input.to(device), for_jit_trace_masks.to(device)]
)
torch.jit.save(traced_model, "traced_bert.pt")

subprocess.call(["tar", "-czvf", "traced_bert.tar.gz", "traced_bert.pt"])

Loading the TorchScript model and using it for prediction requires small changes in our model loading and prediction functions. We create a new script, deploy_ei.py, that is slightly different from the train_deploy.py script.

For model loading, we use torch.jit.load instead of the BertForSequenceClassification.from_pretrained call from before:

loaded_model = torch.jit.load(os.path.join(model_dir, "traced_bert.pt"))

For prediction, we take advantage of torch.jit.optimized_execution for the final return statement:

with torch.no_grad():
    with torch.jit.optimized_execution(True, {"target_device": "eia:0"}):
        return model(input_id,attention_mask=input_mask)[0]

The entire deploy_ei.py script is available in the GitHub repo. With this script, we can now deploy our model using Elastic Inference:

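# Note: pytorch is assumed to be a PyTorchModel created from the traced model
# artifacts with entry_point deploy_ei.py (see the full script in the GitHub repo).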
predictor = pytorch.deploy(
    initial_instance_count=1, 
    instance_type="ml.m5.large",
    accelerator_type="ml.eia2.xlarge"
)

We attach the Elastic Inference accelerator to our endpoint by using the accelerator_type="ml.eia2.xlarge" parameter.

Cleaning up resources

Remember to delete the Amazon SageMaker endpoint and the notebook instance that you created, to avoid incurring charges. See the following code:

predictor.delete_endpoint()
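If you created a notebook instance for this walkthrough, you can stop and delete it with boto3 as well (a hypothetical sketch; the instance name is illustrative):

import boto3

sm = boto3.client("sagemaker")
sm.stop_notebook_instance(NotebookInstanceName="bert-notebook")
# After the instance reports Stopped:
# sm.delete_notebook_instance(NotebookInstanceName="bert-notebook")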

Conclusion

In this post, we used Amazon SageMaker to take BERT as a starting point and train a model for labeling sentences on their grammatical completeness. We then deployed the model to an Amazon SageMaker endpoint, both with and without Elastic Inference acceleration. You can use this solution to tune BERT in other ways, or use other pretrained models provided by PyTorch-Transformers. For more about using PyTorch with Amazon SageMaker, see Using PyTorch with the SageMaker Python SDK.

Reference

[1] Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE international conference on computer vision, pages 19–27.


About the Authors

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.


David Ping is a Principal Solutions Architect with the AWS Solutions Architecture organization. He works with our customers to build cloud and machine learning solutions using AWS. He lives in the NY metro area and enjoys learning the latest machine learning technologies.


Lauren Yu is a Software Development Engineer at Amazon SageMaker. She works primarily on the SageMaker Python SDK, as well as toolkits for integrating PyTorch, TensorFlow, and MXNet with Amazon SageMaker. In her spare time, she enjoys playing viola in the Amazon Symphony Orchestra and Doppler Quartet.


Facebook uses Amazon EC2 to evaluate the Deepfake Detection Challenge

In October 2019, AWS announced that it was working with Facebook, Microsoft, and the Partnership on AI on the first Deepfake Detection Challenge. Deepfake algorithms rely on the same underlying technology that has given us realistic animation effects in movies and video games. Unfortunately, bad actors have used those same algorithms to blur the distinction between reality and fiction. Deepfake videos result from using artificial intelligence to manipulate audio and video so that it appears someone did or said something they didn’t. For more information about deepfake content, see The Partnership on AI Steering Committee on AI and Media Integrity.

In machine learning (ML) terms, the Generative Adversarial Network (GAN) has been the most popular algorithm for creating deepfakes. GANs use a pair of neural networks: a generative network that produces candidate samples from noise, and a discriminative network that evaluates candidates and tries to determine whether they are real or synthesized. The two networks are pitted against each other in an adversarial manner to generate new, synthetic instances of data that can pass for real data, which is what makes a deepfake so hard to distinguish from genuine footage.
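A minimal PyTorch sketch of that adversarial loop looks like the following; the architectures, sizes, and data here are illustrative, not taken from any deepfake model:

import torch
import torch.nn as nn

# Generator maps noise to candidate samples; discriminator scores realness.
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(16, 784)    # stand-in for a batch of real data
fake = G(torch.randn(16, 64))  # candidates generated from random noise

# Discriminator step: learn to separate real from generated samples.
loss_d = bce(D(real), torch.ones(16, 1)) + bce(D(fake.detach()), torch.zeros(16, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: learn to fool the discriminator.
loss_g = bce(D(fake), torch.ones(16, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()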

The goal of this challenge was to incentivize researchers around the world to build innovative methods that can help detect deepfakes and manipulated media. The competition, which ended on March 31, 2020, was popular amongst the Kaggle data science community. The deepfake project emphasized the benefits of scaling and optimizing the cost of deep learning batch inference. Once the competition was complete, the team at Facebook hosted the deepfake competition data on AWS and made it available to the world, encouraging researchers to keep fighting this problem.

There were over 4,200 total submissions from over 2,300 teams worldwide. Submissions were scored with the following binary log loss function, where a smaller score is better (for more information about scoring, see the contest rules):

LogLoss = -(1/n) Σᵢ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ]

where n is the number of videos being predicted, ŷᵢ is the predicted probability that video i is fake, and yᵢ is 1 if video i is fake and 0 otherwise.
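For reference, the same metric in NumPy (a standard implementation sketch, not the competition’s exact scoring code):

import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0) on extreme predictions
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(log_loss(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.6])))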

Four groups of datasets were associated with the competition:

  • Training – The participating teams used this set for training their model. It consisted of 470 GB of video files, with real and fake labels for each video.
  • Public validation – Consisted of a sample of 400 videos from the test dataset.
  • Public test – Used by the Kaggle platform to compute the public leaderboard.
  • Private test – Held by the Facebook team (the competition host) outside of the Kaggle platform and used for final scoring. The results from the private test set were displayed on the competition’s private leaderboard. This set contains videos of a similar format and nature as the training, public validation, and public test sets, but includes real, organic videos as well as deepfakes.

After the competition deadline, Kaggle transferred the code for the two final submissions from each team to the competition host. The hosting team re-ran the submission code against this private dataset and returned prediction submissions to Kaggle to compute the final private leaderboard scores. The submissions were based on two types of compute virtual machines (VMs): GPU-based and CPU-based. Most of the submissions were GPU-based.

The competition hosting team at Facebook recognized several challenges in conducting an evaluation for the unexpectedly large number of participants. With over 4,200 total submissions, each requiring roughly 9 GPU hours of runtime on a p3.2xlarge Amazon Elastic Compute Cloud (Amazon EC2) P3 instance, they would need an estimated 42,000 GPU compute hours (almost 5 years’ worth of compute) to complete the competition. To make the project even more challenging, they needed to do 5 years of GPU compute in 3 weeks.

Given the tight deadline, the host team had to address several constraints to complete the evaluation within the time and budget allotted.

Operational efficiency

To meet the competition’s tight timeframe and keep the workload manageable for a small team, the solution had to be low-code. To address the low-code requirement, they chose AWS Batch for scheduling and scaling out the compute workload. The following diagram illustrates the solution architecture.

AWS Batch was originally designed for developers, scientists, and engineers to easily and efficiently manage large numbers of batch computing jobs on AWS with little coding or cloud infrastructure deployment experience. There’s no need to install and manage batch computing software or server clusters, which allows you to focus on analyzing and solving problems. AWS Batch provides scheduling and scales out batch computing workloads across the full range of AWS compute services, such as Amazon EC2 and Spot Instances. Furthermore, AWS Batch has no additional charges for managing cluster resources. In this use case, the host simply submitted 4,200 compute jobs, one for each registered Kaggle submission container, and each job ran for about 9 hours. Using a cluster of instances, all jobs completed in less than three weeks.
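In practice, submitting one job per container takes only a few lines of boto3. The following is a hypothetical sketch; the job queue, job definition, and command are illustrative names, not from the project:

import boto3

batch = boto3.client("batch")
batch.submit_job(
    jobName="deepfake-eval-submission-0001",
    jobQueue="deepfake-eval-queue",
    jobDefinition="deepfake-eval-jobdef",
    containerOverrides={"command": ["python", "evaluate.py"]},
)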

Elasticity

The tight timeframe for the competition, as well as needing the instances for only a short period, speaks to the need for elasticity in compute. For example, the team estimated they would need a minimum of 85 Amazon EC2 P3 GPU instances running in parallel around the clock to complete the evaluation, with up to an additional 50% in capacity to account for restarts and other issues causing lost time. Facebook was able to quickly scale up the number of GPUs and CPUs needed for the evaluation and scale them down when finished, paying only for what they used. This was much more efficient in terms of budget and operational effort than acquiring, installing, and configuring the compute on-premises.
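With AWS Batch, this elasticity comes from a managed compute environment that scales between zero and a maximum vCPU count as jobs arrive. The following is a hypothetical configuration sketch; every name and value here is illustrative:

import boto3

batch = boto3.client("batch")
batch.create_compute_environment(
    computeEnvironmentName="deepfake-eval-gpu",
    type="MANAGED",
    computeResources={
        "type": "EC2",
        "minvCpus": 0,  # scale to zero when the queue is empty
        "maxvCpus": 4096,
        "instanceTypes": ["g4dn.8xlarge", "p3.2xlarge"],
        "subnets": ["subnet-0123456789abcdef0"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
        "instanceRole": "ecsInstanceRole",
    },
    serviceRole="AWSBatchServiceRole",
)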

Security

Security was another significant concern. Submissions from such a wide array of participants could contain viruses, malware, bots, or rootkits. Running these containers in a sandboxed cloud environment avoided that risk: if the evaluation environment were compromised, it could be terminated and easily rebuilt without exposing any production systems to downtime or data loss.

Privacy and confidentiality

Privacy and confidentiality are closely related to the security concerns. To address those concerns, all the submissions and data were held in a single, closely held AWS account with private virtual private clouds (VPCs) and restrictive permissions using AWS Identity and Access Management (IAM). To ensure privacy and confidentiality of the submitted models, and fairness in grading, a single, dedicated engineer was responsible for conducting the evaluation without looking into any of the Docker images submitted by the various teams.

Cost

Cost was another important constraint the team had to consider. A rough estimate of 42,000 hours of Amazon EC2 P3 instance runtime would cost about $125,000.

To lower the cost of GPU compute, the host team determined that the Amazon EC2 G4 instance type (NVIDIA Tesla T4 GPUs) was more cost-effective for this workload than the P3 instance type (NVIDIA Tesla V100 GPUs). Amongst the GPU instances in the cloud, G4 instances are cost-effective and versatile instances for deploying ML models.

These instances are optimized for ML application deployments (inference), such as image classification, object detection, recommendation engines, automated speech recognition, and language translation, which push the boundary on AI innovation and latency.

The host team completed a few test runs with the G4 instance type. The test runtime for each submission was a little over twice that of the P3 instances, resulting in the need for approximately 90,000 compute hours. The G4 instances cost up to 83% less per hour than the P3 instances. Even with longer runtimes per job, the total estimated compute cost decreased from $125,000 to just under $50,000. The following table illustrates the cost-effectiveness of the G4 instance type per inference.

                     p3.2xl      g4dn.8xl
Runtime (hours)      42,000      90,000
Cost (USD)           $125,000    $50,000
Cost per inference   $30         $12

(Cost per inference is the total cost divided by the roughly 4,200 submissions.)

The host team shared that many of the submission runs completed with less compute time than originally projected. The initial projection was based upon early model submissions, which were larger than the average size for all models submitted. About 80% of the runs took advantage of the G4 instance type, while some had to be run on the P3 instances due to slight differences in available GPU memory between the two instance types. The final numbers were 25,000 G4 (GPU) compute hours, 5,000 C4 (CPU) compute hours, and 800 P3 (GPU) compute hours, totaling $20,000 in compute cost. After approximately two weeks of around-the-clock evaluation, the host team completed the challenging task of evaluating all the submissions early and consumed less than half of the $50,000 estimate.

Conclusion

The host team was able to complete a full evaluation of the over 4,200 submissions in less time than was available, while meeting the grading fairness criteria and coming in under budget. The host team successfully replicated the evaluation environment with a success rate of 94%, which is high for a two-stage competition.

Software projects are often risk-prone due to technological uncertainties, and perhaps even more so due to inherent complexity and constraints. The breadth and depth of AWS services running on Amazon EC2 let you solve your unique challenges while reducing technology uncertainty. In this case, the Facebook team completed the deepfake evaluation challenge on time and under budget with only one software engineer. The engineer started by selecting a low-code solution, AWS Batch, a proven service for even larger-scale HPC workloads, and reduced the evaluation cost by two-thirds through the choice of the AI inference-optimized G4 EC2 instance type.

AWS believes there’s no one solution to a problem. Solutions often consist of multiple and flexible building blocks from which you can craft solutions that meet your needs and priorities.


About the Authors

Wenming Ye is an AI and ML specialist architect at Amazon Web Services, helping researchers and enterprise customers use cloud-based machine learning services to rapidly scale their innovations. Previously, Wenming had diverse R&D experience at Microsoft Research, on the SQL engineering team, and at successful startups.


Tim O’Brien is a Senior Solutions Architect at AWS focused on Machine Learning and Artificial Intelligence. He has over 30 years of experience in information technology, security, and accounting. In his spare time, he likes hiking, climbing, and skiing with his wife and two dogs.
