Shops on Facebook and Instagram: Understanding relationships between products to improve buyer and seller experience

This Research in Brief summarizes various projects carried out by co-authors Yaniv Sheena and Oren Sar Shalom, along with their colleagues on the Relevance Foundations team at Meta.

What the research is:

In 2020, we launched Shops on Facebook and Instagram to make it easy for businesses to set up a digital storefront and sell online. Currently, Shops holds a massive inventory of products from different verticals and diverse sellers, where the data provided tend to be unstructured, multilingual, and in some cases missing crucial information.

Understanding these products’ core characteristics and encoding their relationships can help to unlock a variety of e-commerce experiences, whether that’s recommending similar or complementary products on the product page or diversifying shopping feeds to avoid showing the same product multiple times. To unlock these opportunities, we have established a team of researchers and engineers in Tel-Aviv with the goal of creating a product graph that accommodates different product relations. The team has already launched capabilities that are integrated in various products across Meta.

Our research is focused on capturing and embedding different notions of relationships between products. These methods are based on signals from the products’ content (text, image, etc.) as well as past user interactions (e.g., collaborative filtering).

First, we tackle the problem of product deduplication, where we cluster together duplicates or variants of the same product. Finding duplicates or near-duplicate products among billions of items is like finding a needle in a haystack. For instance, if a local store in Israel and a big brand in Australia sell the exact same shirt or variants of the same shirt (e.g., different colors), we cluster these products together. This is challenging at a scale of billions of products with different images (some of low quality), descriptions, and languages.

Next, we introduce Frequently Bought Together (FBT), an approach for product recommendation based on products people tend to jointly buy or interact with.

How it works:

Product clustering

We developed a clustering platform that clusters similar items in real time. For every new item listed in the Shops catalog, our algorithm either assigns it to an existing cluster or creates a new cluster.

This process takes the following steps:

  • Product retrieval: We use an image index based on GrokNet visual embeddings, as well as text retrieval based on an internal search back end powered by Unicorn. We retrieve up to 100 similar products from an index of representative items, which can be thought of as cluster centroids.
  • Pairwise similarity: We compare the new item with each representative item using a pairwise model that, given two products, predicts a similarity score.
  • Item to cluster assignment: We choose the most similar product and apply a static threshold. If the threshold is met, we assign the item. Otherwise, we create a new singleton cluster (see the sketch below).
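
The following is a minimal sketch of that assignment step. Here, retrieve_candidates, pairwise_score, and the threshold value are illustrative stand-ins for the retrieval indexes and the trained pairwise model described below, not the production implementation.

def assign_to_cluster(new_item, retrieve_candidates, pairwise_score, clusters, threshold=0.9):
    """Assign a newly listed item to the most similar existing cluster, or open a new one.
    retrieve_candidates and pairwise_score are illustrative stand-ins; the threshold is an assumption."""
    candidates = retrieve_candidates(new_item, k=100)        # representative items (cluster centroids)
    if candidates:
        best = max(candidates, key=lambda rep: pairwise_score(new_item, rep))
        if pairwise_score(new_item, best) >= threshold:      # static, precision-oriented threshold
            clusters[best.cluster_id].append(new_item)
            return best.cluster_id
    new_id = len(clusters)
    clusters[new_id] = [new_item]                            # no confident match: create a singleton cluster
    return new_id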

We specify two types of clustering spaces, based on business objectives:

  • Exact duplicates: Grouping instances of the exact same product
  • Product variants: Grouping variants of the same product (such as shirts in different colors or iPhones with differing amounts of storage)

For each clustering type, we train a model tailored to the specific task. The model is based on gradient boosted decision trees (GBDT) with a binary loss and uses both dense and sparse features. Among the features, we use GrokNet embedding cosine distance (image distance), LASER embedding distance (cross-language textual representation), textual features like the Jaccard index, and a tree-based distance between products’ taxonomies. This allows us to capture both visual and textual similarities, while also leveraging signals like brand and category. Furthermore, we also experimented with a SparseNN model, a deep model originally developed at Meta for personalization. It is designed to combine dense and sparse features to jointly train a network end to end by learning semantic representations for the sparse features. However, this model did not outperform the GBDT model, which is much lighter in terms of training time and resources.
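
As a rough illustration of how such a pairwise model can be set up, the following sketch uses scikit-learn's gradient boosting on a handful of pair features. The internal model, embeddings (GrokNet, LASER), and feature set are not public, so every name, parameter, and value here is an assumption.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pair_features(item_a, item_b):
    """Dense features for a pair of catalog items (illustrative stand-ins)."""
    image_dist = cosine_distance(item_a["img_emb"], item_b["img_emb"])    # e.g., image-embedding distance
    text_dist = cosine_distance(item_a["text_emb"], item_b["text_emb"])   # e.g., cross-lingual text distance
    tokens_a = set(item_a["title"].lower().split())
    tokens_b = set(item_b["title"].lower().split())
    jaccard = len(tokens_a & tokens_b) / max(len(tokens_a | tokens_b), 1)
    same_brand = float(item_a.get("brand") == item_b.get("brand"))
    return [image_dist, text_dist, jaccard, same_brand]

# X: one row of pair_features per human-labeled pair; y: 1 = duplicate/variant, 0 = different.
model = GradientBoostingClassifier(n_estimators=200, max_depth=4)
# model.fit(X_train, y_train)
# similarity = model.predict_proba([pair_features(item_a, item_b)])[0, 1]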

Our models require training data sets for both clustering tasks: We send pairs of products to human raters to compose sets for training, validation, and evaluation. In addition, to obtain more relevant pairs with hard negatives, we utilize an active learning approach based on our existing retrieval mechanisms, followed by sampling by uncertainty and density (SUD).

To evaluate our approach, we formed a set consisting of ~100K pairs of products from the verticals Clothing & Accessories, Health & Beauty, and Home. Each pair was annotated by humans who marked whether the two products were different, exact duplications, or variants. We then measure precision and recall by inferring whether the products would reside in the same cluster, based on the above steps. Final results are pivoted by verticals, which tend to have different traits.
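
For reference, a minimal sketch of that pair-level evaluation could look as follows, assuming a helper same_cluster(a, b) that runs the retrieval, pairwise-scoring, and thresholding steps above.

def precision_recall(labeled_pairs, same_cluster):
    """labeled_pairs: iterable of (item_a, item_b, is_match) tuples with human labels."""
    tp = fp = fn = 0
    for a, b, is_match in labeled_pairs:
        predicted = same_cluster(a, b)     # would the two items end up in the same cluster?
        if predicted and is_match:
            tp += 1
        elif predicted and not is_match:
            fp += 1
        elif is_match:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall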

Pairwise similarity models performance: GBDT vs SparseNN

Clustering system-level performance by vertical

Since grouping together different products may cause an unsatisfactory user experience, we tuned our models to be precision-oriented. The results suggest that we can solve a large portion of the problem, but we still need to focus on improving recall. Further, we found that Health & Beauty products were more challenging and required better text understanding.

Frequently Bought Together (FBT)

Analysis of past purchases shows that customers often look for multiple items in a short period of time, such that together they have a synergistic utility. A notable example is a pair of jeans, together with a belt and possibly a matching shirt. When a customer is currently viewing a certain product (dubbed the seed product), our task is to help them find complementary products.

Arguably, the most standard method to find products that go together is to simply count co-purchases. That is, we observe the (normalized) number of customers who purchased the seed item and, shortly afterward, a candidate product. If this amount exceeds some threshold, we say that the candidate product makes a good FBT recommendation for the seed product. However, with the ever-increasing variety of products available on Shops on Facebook and Instagram, there is always an abundance of new products that haven’t been purchased in large numbers. Reducing the recommendation threshold results in an overwhelming amount of noise, in particular substitute items tangled with complementary ones.

To remedy this, we apply a two-step solution. First, we work at the category level (rather than the product level) to identify pairs of categories that go together. This aggregation solves the problem of purchase sparsity, and its output was further verified by expert taxonomists. It then allows us to resort to a simple count-based approach, setting a low threshold but considering only pairs that belong to categories that go together.
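
A minimal sketch of this count-based, category-gated step could look like the following. The inputs, time window, and threshold are assumptions for illustration; the production signal is also normalized, as noted above.

from collections import Counter, defaultdict
from itertools import combinations

def fbt_candidates(purchases, category_of, category_pairs, window_secs=86400, min_count=3):
    """purchases: iterable of (user_id, product_id, timestamp); category_of: product -> category;
    category_pairs: set of (category, category) pairs verified to go together."""
    by_user = defaultdict(list)
    for user, product, ts in purchases:
        by_user[user].append((ts, product))
    pair_counts = Counter()
    for events in by_user.values():
        events.sort()
        for (t1, p1), (t2, p2) in combinations(events, 2):
            if p1 != p2 and t2 - t1 <= window_secs:          # bought shortly after one another
                pair_counts[(p1, p2)] += 1
    recs = defaultdict(list)
    for (seed, cand), count in pair_counts.items():
        # Low absolute threshold, but only for category pairs known to go together.
        if count >= min_count and (category_of[seed], category_of[cand]) in category_pairs:
            recs[seed].append(cand)
    return recs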

Yet, even with a low threshold, there are many products that aren’t covered by this method. To increase coverage, we apply the following steps:

  • First, we utilize the variants’ model and copy recommendations of a product to its variants as well.
  • Second, we employ a model that predicts to what extent a pair of items are complementary based on their visual appearance.

As a training set for this model, we need a list of products that go together. To this end, we go over fashion images and extract the products that appear in them, assuming that products appearing in the same image make a good FBT recommendation.

To assess the performance of our approach, we conducted an A/B test in which we suggested a set of complementary items to buyers who were considering a product (on the product page). We compared our approach with a baseline (control) consisting of suggestions hand-picked by sellers. FBT recommendations led to a 12 percent relative improvement in click-through rate, demonstrating the viability and effectiveness of the approach.

Why it matters:

Our methods for incorporating product similarities have improved various consumer-facing applications in Shops. First, we launched clustering-based post-ranking logic, which diversifies product search results. We also showed that similarities based on intentful user actions led to better recommendations than suggestions chosen by sellers. Finally, we constantly collaborate with different teams across Shops to leverage our signals and improve relevance. Through intensive A/B testing, we learned that capturing relationships between products is a significant step toward unlocking better user experiences.

What’s next:

We’re currently developing a holistic model that simultaneously considers behavioral data, such as co-views and co-purchases (distinct users viewing or buying the same product) and the preferences of the users who interacted with each item, together with product information like image, textual description, price, and brand. These two modalities, buyer engagement and product information, are learned in a mutually reinforcing manner, where one modality acts as the label for the other. Concretely, given a seed product, the behavioral modality allows us to find two products such that one of them makes a better recommendation than the other, thereby allowing the side information to be learned using a triplet loss. Likewise, the side-information modality generates triplets that allow us to improve the behavioral features.
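
For illustration only, the triplet objective at the core of this setup can be sketched as follows; the embedding vectors and the margin value are assumptions.

import numpy as np

def triplet_loss(seed, better, worse, margin=0.2):
    """Hinge loss that pulls the seed's embedding toward the product that the behavioral
    signal says is the better recommendation, and pushes it away from the worse one."""
    d_pos = np.linalg.norm(seed - better)
    d_neg = np.linalg.norm(seed - worse)
    return max(0.0, d_pos - d_neg + margin)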


Read More

AWS Deep Learning AMIs: New framework-specific DLAMIs for production complement the original multi-framework DLAMIs

Since its launch in November 2017, the AWS Deep Learning Amazon Machine Image (DLAMI) has been the preferred method for running deep learning frameworks on Amazon Elastic Compute Cloud (Amazon EC2). For deep learning practitioners and learners who want to accelerate deep learning in the cloud, the DLAMI comes pre-installed with AWS-optimized deep learning (DL) frameworks and their dependencies so you can get started right away with conducting research, developing machine learning (ML) applications, or educating yourself about deep learning. DLAMIs also make it easy to get going on instance types based on AWS-built processors such as Inferentia, Trainium, and Graviton, with all the necessary dependencies pre-installed.

The original DLAMI contained several popular frameworks such as PyTorch, TensorFlow, and MXNet, all in one bundle that AWS tested and supported on AWS instances. Although the multiple-framework DLAMI enables developers to explore various frameworks in a single image, some use cases require a smaller DLAMI that contains only a single framework. To support these use cases, we recently released DLAMIs that each contain a single framework. These framework-specific DLAMIs have less complexity and smaller size, making them more optimized for production environments.

In this post, we describe the components of the framework-specific DLAMIs and compare the use cases of the framework-specific and multi-framework DLAMIs.

All the DLAMIs contain similar libraries. The PyTorch DLAMI and the TensorFlow DLAMI each contain all the drivers necessary to run the framework on AWS instances including p3, p4, Trainium, or Graviton. The following table compares DLAMIs and components. More information can be found in the release notes.

Component       Framework-specific PyTorch 1.9.0    Framework-specific TensorFlow 2.5.0    Multi-framework (AL2 – v50)
PyTorch         1.9.0                               N/A                                    1.4.0 & 1.8.1
TensorFlow      N/A                                 2.5.0                                  2.4.2, 2.3.3 & 1.15.5
NVIDIA CUDA     11.1.1                              11.2.2                                 10.x, 11.x
NVIDIA cuDNN    8.0.5                               8.1.1                                  N/A

Eliminating other frameworks and their associated components makes each framework-specific DLAMI approximately 60% smaller (approximately 45 GB vs. 110 GB). As described in the following section, this reduction in complexity and size has advantages for certain use cases.

DLAMI use cases

The multi-framework DLAMI has, until now, been the default for AWS developers doing deep learning on EC2. This is because DLAMIs simplify the experience for developers looking to explore and compare different frameworks within a single AMI. The multi-framework DLAMI remains a great solution for use cases focusing on research, development, and education. This is because the multi-framework DLAMI comes preinstalled with the deep learning infrastructure for TensorFlow, PyTorch, and MXNet. Developers don’t have to spend any time installing deep learning libraries and components specific to any of these frameworks, and can experiment with the latest versions of each of the most popular frameworks. This one-stop shop means that you can focus on your deep learning-related tasks instead of MLOps and driver configurations. Having multiple frameworks in the DLAMI provides flexibility and options for practitioners looking to explore multiple deep learning frameworks.

Some examples of use cases for the multi-framework DLAMI include:

  • Medical research – Research scientists want to develop models that detect malignant tumors and want to compare performance between deep learning frameworks to achieve the highest performance metrics possible
  • Deep learning college course – College students learning to train deep learning models can choose from the multiple frameworks installed on the DLAMI in a Jupyter environment
  • Developing a model for a mobile app – Developers use the multi-framework DLAMI to develop multiple models for their voice assistant mobile app using a combination of deep learning frameworks

When deploying in a production environment, however, developers may only require a single framework and its related dependencies. The lightweight, framework-specific DLAMIs provide a more streamlined image that minimizes dependencies. In addition to a smaller footprint, the framework-specific DLAMIs minimize the surface area for security attacks and provide more consistent compatibility across versions due to the limited number of included libraries. The framework-specific DLAMIs also have less complexity, which makes them more reliable as developers increment versions in production environments.

Some examples of use cases for framework-specific DLAMIs include:

  • Deploying an ML-based credit underwriting model – A finance startup wants to deploy an inference endpoint with high reliability and availability with faster auto scaling during demand spikes
  • Batch processing of video – A film company creates a command line application that increases the resolution of low-resolution digital video files using deep learning by interpolating pixels
  • Training a framework-specific model – A mobile app startup needs to train a model using TensorFlow because their app development stack requires a TensorFlow Lite compiled model

Conclusion

DLAMIs have become the go-to image for deep learning on EC2. Now, framework-specific DLAMIs build on that success by providing images that are optimized for production use cases. Like multi-framework DLAMIs, the single-framework images remove the heavy lifting necessary for developers to build and maintain deep learning applications. With the launch of the new, lightweight framework-specific DLAMIs, developers now have more choices for accelerated Deep Learning on EC2.

Get started with single-framework DLAMIs today by using this tutorial and selecting a framework-specific Deep Learning AMI in the Launch Wizard.
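
If you prefer to locate the AMI programmatically, the following sketch uses boto3 to find the newest matching image. The name filter is an assumption for illustration only; check the release notes for the exact AMI name in your Region.

import boto3

ec2 = boto3.client("ec2")
response = ec2.describe_images(
    Owners=["amazon"],
    Filters=[{"Name": "name", "Values": ["*Deep Learning AMI*PyTorch 1.9*"]}],  # illustrative filter
)
latest = max(response["Images"], key=lambda img: img["CreationDate"])
print(latest["ImageId"], latest["Name"])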


About the Authors

Francisco Calderon is a Data Scientist in the Amazon ML Solutions Lab. As a member of the ML Solutions Lab, he helps solve critical business problems for AWS customers using deep learning. In his spare time, Francisco likes to play music and guitar, play soccer with his daughters, and enjoy time with his family.

Corey Barrett is a Data Scientist in the Amazon ML Solutions Lab. As a member of the ML Solutions Lab, he uses machine learning and deep learning to solve critical business problems for AWS customers. Outside of work, you can find him enjoying the outdoors, sipping on scotch, and spending time with his family.

Read More

Tiny machine learning design alleviates a bottleneck in memory usage on internet-of-things devices

Machine learning provides powerful tools to researchers to identify and predict patterns and behaviors, as well as learn, optimize, and perform tasks. This ranges from applications like vision systems on autonomous vehicles or social robots to smart thermostats to wearable and mobile devices like smartwatches and apps that can monitor health changes. While these algorithms and their architectures are becoming more powerful and efficient, they typically require tremendous amounts of memory, computation, and data to train and make inferences.

At the same time, researchers are working to reduce the size and complexity of the devices that these algorithms can run on, all the way down to a microcontroller unit (MCU) that’s found in billions of internet-of-things (IoT) devices. An MCU is a memory-limited minicomputer housed in a compact integrated circuit that lacks an operating system and runs simple commands. These relatively cheap edge devices require low power, computing, and bandwidth, and offer many opportunities to inject AI technology to expand their utility, increase privacy, and democratize their use — a field called TinyML.

Now, an MIT team working in TinyML in the MIT-IBM Watson AI Lab and the research group of Song Han, assistant professor in the Department of Electrical Engineering and Computer Science (EECS), has designed a technique to shrink the amount of memory needed even smaller, while improving its performance on image recognition in live videos.

“Our new technique can do a lot more and paves the way for tiny machine learning on edge devices,” says Han, who designs TinyML software and hardware.

To increase TinyML efficiency, Han and his colleagues from EECS and the MIT-IBM Watson AI Lab analyzed how memory is used on microcontrollers running various convolutional neural networks (CNNs). CNNs are biologically inspired models, patterned after neurons in the brain, that are often applied to evaluate and identify visual features within imagery, like a person walking through a video frame. In their study, they discovered an imbalance in memory utilization, causing front-loading on the computer chip and creating a bottleneck. By developing a new inference technique and neural architecture, the team alleviated the problem and reduced peak memory usage by four-to-eight times. Further, the team deployed it on their own tinyML vision system, equipped with a camera and capable of human and object detection, creating its next generation, dubbed MCUNetV2. When compared to other machine learning methods running on microcontrollers, MCUNetV2 outperformed them with high accuracy on detection, opening the doors to additional vision applications not before possible.

The results will be presented in a paper at the conference on Neural Information Processing Systems (NeurIPS) this week. The team includes Han, lead author and graduate student Ji Lin, postdoc Wei-Ming Chen, graduate student Han Cai, and MIT-IBM Watson AI Lab Research Scientist Chuang Gan.

A design for memory efficiency and redistribution

TinyML offers numerous advantages over deep machine learning that happens on larger devices, like remote servers and smartphones. These, Han notes, include privacy, since the data are not transmitted to the cloud for computing but processed on the local device; robustness, as the computing is quick and the latency is low; and low cost, because IoT devices cost roughly $1 to $2. Further, some larger, more traditional AI models can emit as much carbon as five cars in their lifetimes, require many GPUs, and cost billions of dollars to train. “So, we believe such TinyML techniques can enable us to go off-grid to save the carbon emissions and make the AI greener, smarter, faster, and also more accessible to everyone — to democratize AI,” says Han.

However, small MCU memory and digital storage limit AI applications, so efficiency is a central challenge. MCUs contain only 256 kilobytes of memory and 1 megabyte of storage. In comparison, mobile AI on smartphones and cloud computing, correspondingly, may have 256 gigabytes and terabytes of storage, as well as 16,000 and 100,000 times more memory. As a precious resource, the team wanted to optimize its use, so they profiled the MCU memory usage of CNN designs — a task that had been overlooked until now, Lin and Chen say.

Their findings revealed that the memory usage peaked within the first five convolutional blocks out of about 17. Each block contains many connected convolutional layers, which help to filter for the presence of specific features within an input image or video, creating a feature map as the output. During the initial memory-intensive stage, most of the blocks operated beyond the 256KB memory constraint, offering plenty of room for improvement. To reduce the peak memory, the researchers developed a patch-based inference schedule, which operates on only a small fraction, roughly 25 percent, of the layer’s feature map at one time, before moving onto the next quarter, until the whole layer is done. This method saved four-to-eight times the memory of the previous layer-by-layer computational method, without adding latency.

“As an illustration, say we have a pizza. We can divide it into four chunks and only eat one chunk at a time, so you save about three-quarters. This is the patch-based inference method,” says Han. “However, this was not a free lunch.” Like photoreceptors in the human eye, they can only take in and examine part of an image at a time; this receptive field is a patch of the total image or field of view. As the size of these receptive fields (or pizza slices in this analogy) grows, the overlap between them increases, which amounts to redundant computation that the researchers found to be about 10 percent. The researchers proposed to also redistribute the neural network across the blocks, in parallel with the patch-based inference method, without losing any of the accuracy in the vision system. However, the question remained about which blocks needed the patch-based inference method and which could use the original layer-by-layer one, together with the redistribution decisions; hand-tuning for all of these knobs was labor-intensive, and better left to AI.
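
As a toy illustration of the scheduling idea (not the team's implementation), the following NumPy sketch computes a convolution one spatial patch at a time, extending each patch with a halo so the output matches the full-map result while only one patch needs to be resident at once.

import numpy as np

def conv2d_valid(x, w):
    """Naive 'valid' 2D convolution (single channel, stride 1)."""
    k = w.shape[0]
    H, W = x.shape
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def conv2d_patch_schedule(x, w, splits=2):
    """Same result as conv2d_valid, but computed one spatial patch (plus halo) at a time,
    so peak activation memory is roughly 1/splits**2 of the full map, at the cost of a
    small amount of redundant halo computation (about 10 percent in the study above)."""
    k = w.shape[0]
    oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((oh, ow))
    for hr in np.array_split(np.arange(oh), splits):
        for wr in np.array_split(np.arange(ow), splits):
            h0, h1 = hr[0], hr[-1] + 1
            w0, w1 = wr[0], wr[-1] + 1
            patch = x[h0:h1 + k - 1, w0:w1 + k - 1]   # input patch plus halo
            out[h0:h1, w0:w1] = conv2d_valid(patch, w)
    return out

x = np.random.rand(12, 12)
w = np.random.rand(3, 3)
assert np.allclose(conv2d_valid(x, w), conv2d_patch_schedule(x, w))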

“We want to automate this process by doing a joint automated search for optimization, including both the neural network architecture, like the number of layers, number of channels, the kernel size, and also the inference schedule including number of patches, number of layers for patch-based inference, and other optimization knobs,” says Lin, “so that non-machine learning experts can have a push-button solution to improve the computation efficiency but also improve the engineering productivity, to be able to deploy this neural network on microcontrollers.”

A new horizon for tiny vision systems

The co-design of the network architecture with the neural network search optimization and inference scheduling provided significant gains and was adopted into MCUNetV2; it outperformed other vision systems in peak memory usage, and image and object detection and classification. The MCUNetV2 device includes a small screen, a camera, and is about the size of an earbud case. Compared to the first version, the new version needed four times less memory for the same amount of accuracy, says Chen. When placed head-to-head against other tinyML solutions, MCUNetV2 was able to detect the presence of objects in image frames, like human faces, with an improvement of nearly 17 percent. Further, it set a record for accuracy, at nearly 72 percent, for a thousand-class image classification on the ImageNet dataset, using 465KB of memory. The researchers tested for what’s known as visual wake words, how well their MCU vision model could identify the presence of a person within an image, and even with the limited memory of only 30KB, it achieved greater than 90 percent accuracy, beating the previous state-of-the-art method. This means the method is accurate enough and could be deployed to help in, say, smart-home applications.

With the high accuracy and low energy utilization and cost, MCUNetV2’s performance unlocks new IoT applications. Due to their limited memory, Han says, vision systems on IoT devices were previously thought to be only good for basic image classification tasks, but their work has helped to expand the opportunities for TinyML use. Further, the research team envisions it in numerous fields, from monitoring sleep and joint movement in the health-care industry to sports coaching and movements like a golf swing to plant identification in agriculture, as well as in smarter manufacturing, from identifying nuts and bolts to detecting malfunctioning machines.

“We really push forward for these larger-scale, real-world applications,” says Han. “Without GPUs or any specialized hardware, our technique is so tiny it can run on these small cheap IoT devices and perform real-world applications like these visual wake words, face mask detection, and person detection. This opens the door for a brand-new way of doing tiny AI and mobile vision.”

This research was sponsored by the MIT-IBM Watson AI Lab, Samsung, and Woodside Energy, and the National Science Foundation.

Read More

Clinical text mining using the Amazon Comprehend Medical new SNOMED CT API

Mining medical concepts from written clinical text, such as patient encounters, plays an important role in clinical analytics and decision-making applications, such as population analytics for providers, pre-authorization for payers, and adverse-event detection for pharma companies. Medical concepts contain medical conditions, medications, procedures, and other clinical events. Extracting medical concepts is a complicated process due to the specialist knowledge required and the broad use of synonyms in the medical field. Furthermore, to make detected concepts useful for large-scale analytics and decision-making applications, they have to be codified. This is a process where a specialist looks up matching codes from a medical ontology, often containing tens to hundreds of thousands of concepts.

To solve these problems, Amazon Comprehend Medical provides a fast and accurate way to automatically extract medical concepts from the written text found in clinical documents. You can now also use a new feature to automatically standardize and link detected concepts to the SNOMED CT (Systematized Nomenclature of Medicine—Clinical Terms) ontology. SNOMED CT provides a comprehensive clinical healthcare terminology and accompanying clinical hierarchy, and is used to encode medical conditions, procedures, and other medical concepts to enable big data applications.

This post details how to use the new SNOMED CT API to link SNOMED CT codes to medical concepts (or entities) in natural written text that can then be used to accelerate research and clinical application building. After reading this post, you will be able to detect and extract medical terms from unstructured clinical text, map them to the SNOMED CT ontology (US edition), retrieve and manipulate information from a clinical database, including electronic health record (EHR) systems, and map SNOMED CT concepts to other ontologies using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) if your EHR system uses an ontology other than SNOMED CT.

Solution overview

Amazon Comprehend Medical is a HIPAA-eligible natural language processing (NLP) service that uses machine learning (ML) to extract clinical data from unstructured medical text—no ML experience required—and automatically map them to SNOMED CT, ICD10, or RxNorm ontologies with a simple API call. You can then add the ontology codes to your EHR database to augment patient data or link to other ontologies as desired through OMOP CDM. For this post, we demonstrate the solution workflow as shown in the following diagram with code based on the example sentence “Patient X was diagnosed with insomnia.”

To use clinical concept codes based on a text input, we detect and extract clinical terms, connect to the clinical database, transform the SNOMED CT codes to OMOP CDM concept IDs, and use them within our records.

For this post, we use the OMOP CDM as a database schema as an example. Historically, healthcare institutions in different regions and countries use their own terminologies and classifications for their own purposes, which prevents the interoperability of the systems. While SNOMED CT standardizes medical concepts with a clinical hierarchy, the OMOP CDM provides a standardization mechanism to move from one ontology to another, with an accompanying data model. The OMOP CDM standardizes the format and content of observational data so that standardized applications, tools and methods can be applied across different datasets. In addition, the OMOP CDM makes it easier to convert codes from one vocabulary to another by having maps between medical concepts in different hierarchical ontologies and vocabularies. The ontologies hierarchy is set such that descendants are more specific than ascendants. For example, non-small cell lung cancer is a descendent of malignant neoplastic disease. This allows querying and retrieving concepts and all their hierarchical descendants, and also enables interoperability between ontologies.

We demonstrate implementing this solution with the following steps:

  1. Extract concepts with Amazon Comprehend Medical SNOMED CT and link them to the SNOMED CT (US edition) ontology.
  2. Connect to the OMOP CDM.
  3. Map the SNOMED CT code to OMOP CDM concept IDs.
  4. Use the structured information to perform the following actions:
    1. Retrieve the number of patients with the disease.
    2. Traverse the ontology.
    3. Map to other ontologies.

Prerequisites

Before you get started, make sure you have the following:

  • Access to an AWS account.
  • Permissions to create AWS CloudFormation stacks.
  • Permissions to call Amazon Comprehend Medical from Amazon SageMaker.
  • Permissions to query Amazon Redshift from SageMaker.
  • The SNOMED CT license. SNOMED International is a strong member-owned and driven organization with free use of SNOMED CT within the member’s territory. Members manage the release, distribution, and sub-licensing of SNOMED CT and other products of the association within their territory.

This post assumes that you have an OMOP CDM database set up in Amazon Redshift. See Create data science environments on AWS for health analysis using OHDSI to set up a sample OMOP CDM in your AWS account using CloudFormation templates.

Extract concepts with Amazon Comprehend Medical SNOMED CT

You can extract SNOMED CT codes using Amazon Comprehend Medical with two lines of code. Assume you have a document, paragraph, or sentence:

clinical_note = "Patient X was diagnosed with insomnia."

First, we instantiate the Amazon Comprehend Medical client in boto3. Then, we simply call Amazon Comprehend Medical’s SNOMED CT API:

import boto3
cm_client = boto3.client("comprehendmedical")
response = cm_client.infer_snomedct(Text=clinical_note)

Done! In our example, the response is as follows:

{'Characters': {'OriginalTextCharacters': 38},
 'Entities': [{'Attributes': [],
               'BeginOffset': 29,
               'Category': 'MEDICAL_CONDITION',
               'EndOffset': 37,
               'Id': 0,
               'SNOMEDCTConcepts': [{'Code': '193462001',
                                     'Description': 'Insomnia (disorder)',
                                     'Score': 0.7997841238975525},
                                    {'Code': '191997003',
                                     'Description': 'Persistent insomnia '
                                                    '(disorder)',
                                     'Score': 0.6464713215827942},
                                    {'Code': '762348004',
                                     'Description': 'Acute insomnia (disorder)',
                                     'Score': 0.6253700256347656},
                                    {'Code': '59050008',
                                     'Description': 'Initial insomnia '
                                                    '(disorder)',
                                     'Score': 0.6112624406814575},
                                    {'Code': '24121004',
                                     'Description': 'Insomnia disorder related '
                                                    'to another mental '
                                                    'disorder (disorder)',
                                     'Score': 0.6014388203620911}],
               'Score': 0.9989109039306641,
               'Text': 'insomnia',
               'Traits': [{'Name': 'DIAGNOSIS', 'Score': 0.7624053359031677}],
               'Type': 'DX_NAME'}],
 'ModelVersion': '0.0.1',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '873',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Mon, 20 Sep 2021 18:32:04 GMT',
                                      'x-amzn-requestid': 'e9188a79-3884-4d3e-b73e-4f63ed831b0b'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'e9188a79-3884-4d3e-b73e-4f63ed831b0b',
                      'RetryAttempts': 0},
 'SNOMEDCTDetails': {'Edition': 'US',
                     'Language': 'en',
                     'VersionDate': '20200901'}}

The response contains the following:

  • Characters – Total number of characters. In this case, we have 38 characters.
  • Entities – List of detected medical concepts, or entities, from Amazon Comprehend Medical. The main elements in each entity are:

    • Text – Original text from the input data.
    • BeginOffset and EndOffset – The beginning and ending location of the text in the input note, respectively.
    • Category – Category of the detected entity. For example, MEDICAL_CONDITION for medical condition.
    • SNOMEDCTConcepts – Top five predicted SNOMED CT concept codes with the model’s confidence scores (in descending order). Each linked concept code has the following:

      • Code – SNOMED CT concept code.
      • Description – SNOMED CT concept description.
      • Score – Confidence score of the linked SNOMED CT concept.
  • ModelVersion – Version of the model used for the inference.
  • ResponseMetadata – API call metadata.
  • SNOMEDCTDetails – Edition, language, and date of the SNOMED CT version used.

For more information, refer to the Amazon Comprehend Medical Developer Guide. By default, the API links detected entities to the SNOMED CT US edition. To request support for your edition, for example the UK edition, contact us via AWS Support or the Amazon Comprehend Medical forum.

In our example, Amazon Comprehend Medical identifies “insomnia” as a clinical term and provides the five most likely SNOMED CT concepts and codes that the sentence might be referring to. In this example, Amazon Comprehend Medical correctly ranks the intended concept as the most likely option. Therefore, the next step is to extract the top prediction from the response. See the following code:

#Get top predicted SNOMED CT Concept
pred_snomed = response['Entities'][0]['SNOMEDCTConcepts'][0]

The content of pred_snomed is as follows, with its predicted SNOMED concept code, concept description, and prediction score (probability):

{
 'Description': 'Insomnia (disorder)',
 'Code': '193462001',
 'Score': 0.803254246711731
}

We have identified clinical terms in our text and linked them to SNOMED CT concepts. We can now use SNOMED CT’s hierarchical structure and relations to other ontologies to accelerate clinical analytics and decision-making application development.

Before we access the database, let’s define some utility functions that are helpful in our operations. First, we must import the necessary Python packages:

import pandas as pd
import psycopg2

The following code is a function to connect to the Amazon Redshift database:

def connect_to_db(redshift_parameters, user, password):
    """Connect to database and returns connection
    Args:
        redshift_parameters (dict): Redshift connection parameters.
        user (str): Redshift user required to connect. 
        password (str): Password associated to the user
    Returns:
        psycopg2 connection: Open connection to the Amazon Redshift database
    """

    try:
        conn = psycopg2.connect(
            host=redshift_parameters["url"],
            port=redshift_parameters["port"],
            user=user,
            password=password,
            database=redshift_parameters["database"],
        )

        return conn

    except psycopg2.Error:
        raise ValueError("Failed to open database connection.")

The following code is a function to run a given query on the Amazon Redshift database:

def execute_query(cursor, query, limit=None):
    """Execute query
    Args:
        cursor (psycopg2 cursor): Cursor from an established connection to Redshift.
        query (str): SQL query.
        limit (int): Limit of rows returned by the data frame. Default to 'None' for no limit
    Returns:
        pd.DataFrame: Data Frame with the query results.
    """
    try:
        cursor.execute(query)
    except psycopg2.Error:
        return None

    columns = [c.name for c in cursor.description]
    results = cursor.fetchall()
    if limit:
        results = results[:limit]

    out = pd.DataFrame(results, columns=columns)

    return out

In the next sections, we connect to the database and run our queries.

Connect to the OMOP CDM

EHRs are often stored in databases using a specific ontology. In our case, we use the OMOP CDM, which contains a large number of ontologies (SNOMED, ICD10, RxNorm, and more), but you can extend the solution to other data models by modifying the queries. The first step is to connect to Amazon Redshift where the EHR data is stored.

Let’s define the variables used to connect to the database. You must substitute the placeholder values in the following code with your actual values based on your Amazon Redshift database:

#Connect to Amazon Redshift Database
REDSHIFT_PARAMS = {
                    "url": "<database-url>", 
                    "port": "<database-port>",
                    "database": "<database-name>",
                  }
REDSHIFT_USER = "<user-name>"
REDSHIFT_PASSWORD = "<user-password>"

conn = connect_to_db(REDSHIFT_PARAMS, REDSHIFT_USER, REDSHIFT_PASSWORD)
cursor = conn.cursor()

Map the SNOMED CT code to OMOP CDM concept IDs

The OMOP CDM uses its own concept IDs as data model identifiers across ontologies. Those differ from specific ontology codes such as SNOMED CT’s codes, but you can retrieve them from SNOMED CT codes using pre-built OMOP CDM maps. To retrieve the concept_id of SNOMED CT code 193462001, we use the following query:

query1 = f"""
SELECT DISTINCT concept_id 
FROM cmsdesynpuf23m.concept 
WHERE vocabulary_id='SNOMED' AND concept_code='{pred_snomed['Code']}';
"""

out_df = execute_query(cursor, query1)
concept_id = out_df['concept_id'][0]
print(concept_id)

The output OMOP CDM concept_id is 436962. The concept ID uniquely identifies a given medical concept in the OMOP CDM database and is used as a primary key in the concept table. This enables linking of each code with patient information in other tables.

Use the structured information mapped from the SNOMED CT code to the OMOP CDM concept ID

Now that we have OMOP’s concept_id, we can run many queries against the database. When we find the particular concept, we can use it for different use cases. For example, we can use it to query population statistics for a given condition, traverse ontologies to bridge interoperability gaps, and use the hierarchical structure of concepts to formulate the right queries. In this section, we walk you through a few examples.

Retrieve the number of patients with a disease

The first example is retrieving the total number of patients with the insomnia condition that we linked to its appropriate ontology concept using Amazon Comprehend Medical. The following code formulates and runs the corresponding SQL query:

query2 = f"""
SELECT COUNT(DISTINCT person_id) 
FROM cmsdesynpuf23m.condition_occurrence 
WHERE condition_concept_id='{concept_id}';
"""
out_df = execute_query(cursor, query2)
print(out_df)

In our sample records described in the prerequisites section, the total number of patients in the database who have been diagnosed with insomnia is 26,528.

Traverse the ontology

One of the advantages of using SNOMED CT is that we can exploit its hierarchical taxonomy. Let’s illustrate how via some examples.

Ancestors: Going up the hierarchy

First, let’s find the immediate ancestors and descendants of the concept insomnia. We use concept_ancestor and concept tables to get the parent (ancestor) and children (descendants) of the given concept code. The following code is the SQL statement to output the parent information:

query3 = f"""
SELECT DISTINCT concept_code, concept_name 
FROM cmsdesynpuf23m.concept 
WHERE concept_id IN (SELECT ancestor_concept_id 
FROM cmsdesynpuf23m.concept_ancestor 
WHERE descendant_concept_id='{concept_id}' AND max_levels_of_separation=1);
"""
out_df = execute_query(cursor, query3)
print(out_df)

In the preceding example, we used max_levels_of_separation=1 to limit concept codes that are immediate ancestors. You can increase the number to get more in the hierarchy. The following table summarizes our results.

concept_code concept_name
44186003 Dyssomnia
194437008 Disorders of initiating and maintaining sleep

SNOMED CT offers a polyhierarchical classification, which means a concept can have more than one parent. This hierarchy is also called a directed acyclic graph (DAG).

Descendants: Going down the hierarchy

We can use a similar logic to retrieve the children of the code insomnia:

query4 = f"""
SELECT DISTINCT concept_code, concept_name 
FROM cmsdesynpuf23m.concept 
WHERE concept_id IN (SELECT descendant_concept_id 
FROM cmsdesynpuf23m.concept_ancestor 
WHERE ancestor_concept_id='{concept_id}' AND max_levels_of_separation=1);
"""
out_df = execute_query(cursor, query4)
print(out_df)

As a result, we get 26 descendant codes; the following table shows the first 10 rows.

concept_code concept_name
24121004 Insomnia disorder related to another mental disorder
191997003 Persistent insomnia
198437004 Menopausal sleeplessness
88982005 Rebound insomnia
90361000119105 Behavioral insomnia of childhood
41975002 Insomnia with sleep apnea
268652009 Transient insomnia
81608000 Insomnia disorder related to known organic factor
162204000 Late insomnia
248256006 Not getting enough sleep

We can then use these codes to query a broader set of patients (parent concept) or a more specific one (child concept).

Finding the concept at the appropriate hierarchy level is important because, if it is not accounted for appropriately, you might get wrong statistical answers from your queries. For example, in the preceding use case, say you want to find the number of patients with insomnia related only to not getting enough sleep. Using the parent concept for general insomnia gives you a different answer than specifying only the descendant concept related to not getting enough sleep.

Map to other ontologies

We can also map the SNOMED concept code to other ontologies such as ICD10CM for conditions and RxNorm for medications. Because insomnia is a condition, let’s find the corresponding ICD10 concept codes for the given insomnia SNOMED concept code. The following code is the SQL statement and function to find the ICD10 concept codes:

query5 = f"""
SELECT DISTINCT concept_code, concept_name, vocabulary_id 
FROM cmsdesynpuf23m.concept 
WHERE vocabulary_id='ICD10CM' AND 
concept_id IN (SELECT concept_id_2 
FROM cmsdesynpuf23m.concept_relationship 
WHERE concept_id_1='{concept_id}' AND relationship_id='Mapped from');
"""
out_df = execute_query(cursor, query5)
print(out_df)

The following table lists the corresponding ICD10 concept codes with their descriptions.

concept_code concept_name vocabulary_id
G47.0 Insomnia ICD10CM
G47.00 Insomnia, unspecified ICD10CM
G47.09 Other insomnia ICD10CM

When we’re done running SQL queries, let’s close the connection to the database:

conn.close()

Conclusion

Now that you have reviewed this example, you’re ready to apply Amazon Comprehend Medical on your clinical text to extract and link SNOMED CT concepts. We also provided concrete examples of how to use this information with your medical records using an OMOP CDM database to run SQL queries and get patient information related with the medical concepts. Finally, we also showed how to extract the different hierarchies of medical concepts and convert SNOMED CT concepts to other standardized vocabularies such as ICD10CM.

The Amazon ML Solutions Lab pairs your team with ML experts to help you identify and implement your organization’s highest value ML opportunities. If you’d like help accelerating your use of ML in your products and processes, please contact the Amazon ML Solutions Lab.


About the Author

Tesfagabir Meharizghi is a Data Scientist at the Amazon ML Solutions Lab where he helps customers across different industries accelerate their use of machine learning and AWS Cloud services to solve their business challenges.

Miguel Romero Calvo is an Applied Scientist at the Amazon ML Solutions Lab where he partners with AWS internal teams and strategic customers to accelerate their business through ML and cloud adoption.

Lin Lee Cheong is a Senior Scientist and Manager with the Amazon ML Solutions Lab team at Amazon Web Services. She works with strategic AWS customers to explore and apply artificial intelligence and machine learning to discover new insights and solve complex problems.

Read More

Plan the locations of green car charging stations with an Amazon SageMaker built-in algorithm

While the fuel economy of new gasoline or diesel-powered vehicles improves every year, green vehicles are considered even more environmentally friendly because they’re powered by alternative fuel or electricity. Hybrid electric vehicles (HEVs), battery only electric vehicles (BEVs), fuel cell electric vehicles (FCEVs), hydrogen cars, and solar cars are all considered types of green vehicles.

Charging stations for green vehicles are similar to the gas pump in a gas station. They can be fixed on the ground or wall and installed in public buildings (shopping malls, public parking lots, and so on), residential district parking lots, or charging stations. They can be based on different voltage levels and charge various types of electric vehicles.

As a charging station vendor, you should consider many factors when building a charging station. The location of charging stations is a complicated problem. Customer convenience, urban setting, and other infrastructure needs are all important considerations.

In this post, we use machine learning (ML) with Amazon SageMaker and Amazon Location Service to provide guidance for charging station vendors looking to choose optimal charging station locations.

Solution overview

In this solution, we use SageMaker training jobs to train the clustering model and a SageMaker endpoint to deploy the model. We use Amazon Location Service to display the map and the clustering results.

We also use Amazon Simple Storage Service (Amazon S3) to store the training data and model artifacts.

The following figure illustrates the architecture of the solution.

Data preparation

GPS data is highly sensitive information because it can be used to track historical movement of an individual. In this post, we use the tool trip-simulator to generate GPS data that simulates a taxi driver’s driving behavior.

We choose Nashville, Tennessee, as our location. The following script simulates 1,000 agents and generates 14 hours of driving data starting September 15, 2020, 8:00 AM:

trip-simulator \
  --config scooter \
  --pbf nash.osm.pbf \
  --graph nash.osrm \
  --agents 1000 \
  --start 1600128000000 \
  --seconds 50400 \
  --traces ./traces.json \
  --probes ./probes.json \
  --changes ./changes.json \
  --trips ./trips.json

The preceding script generates three output files. We use changes.json. It includes car driving GPS data as well as pickup and drop off information. The file format looks like the following:

{
  "vehicle_id": "PLC-4375",
  "event_time": 1600128001000,
  "event_type": "available",
  "event_type_reason": "service_start",
  "event_location": {
    "type": "Feature",
    "properties": {},
    "geometry": {
      "type": "Point",
      "coordinates": [
        -86.7967066040155,
        36.17115028383999
      ]
    }
  }
}

The field event_type_reason has four main values:

  • service_start – The driver receives a ride request, and drives to the designated location
  • user_pick_up – The driver picks up a passenger
  • user_drop_off – The driver reaches the destination and drops off the passenger
  • maintenance – The driver is not in service mode and doesn’t receive the request

In this post, we only collect the location data with the status user_pick_up and user_drop_off as the algorithm’s input. In real-life situations, you should also consider features such as the passenger’s information and business district information.

Pandas is a Python library for data analysis. The following script converts the data from JSON format to CSV format via Pandas:

df=pd.read_json('./data/changes.json', lines=True)
df_event=df.event_location.apply(pd.Series)
df_geo=df_event.geometry.apply(pd.Series)
df_coord=df_geo.coordinates.apply(pd.Series)
result = pd.concat([df, df_coord], axis=1)
result = result.drop("event_location",axis = 1)
result.columns=["vehicle_id","event_time","event_type","event_reason","longitude","latitude"]
result.to_csv('./data/result.csv',index=False,sep=',')

The following table shows our results.

There is noise in the original GPS data; for example, some pickup and drop-off coordinate points are marked in a lake. The generated GPS data follows a uniform distribution without considering business districts, no-stop areas, and depopulated zones. In practice, there is no standard process for data preprocessing. You can simplify the process of data preprocessing and feature engineering with Amazon SageMaker Data Wrangler.
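
As an illustrative sketch (not a standard recipe), a simple preprocessing pass could keep only pickup and drop-off events and clip coordinates to a rough bounding box around Nashville to drop obvious outliers; the bounds below are assumptions.

import pandas as pd

result = pd.read_csv('./data/result.csv')

# Keep only pickup and drop-off events as the algorithm's input
events = result[result['event_reason'].isin(['user_pick_up', 'user_drop_off'])]

# Drop points far outside an approximate bounding box around Nashville (illustrative bounds)
events = events[events['longitude'].between(-87.1, -86.4) & events['latitude'].between(35.9, 36.4)]
events.to_csv('./data/result_clean.csv', index=False)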

Data exploration

To better observe and analyze the simulated track data, we use Amazon Location for data visualization. Amazon Location provides frontend SDKs for Android, iOS, and the web. For more information about Amazon Location, see the Developer Guide.

We start by creating a map on the Amazon Location console.

We use the MapLibre GL JS SDK for our map display. The following script displays a map of Nashville, Tennessee, and renders a specific car’s driving route (or trace) line:

async function initializeMap() {
    // load credentials and set them up to refresh
    await credentials.getPromise();

    // Initialize the map
    map = new maplibregl.Map({
        container: "map",
        center: [-86.792845, 36.16378], // initial map centerpoint
        zoom: 10, // initial map zoom
        style: mapName,
        transformRequest,
    });
}

map.addSource('route', {
    'type': 'geojson',
    'data': {
        'type': 'Feature',
        'properties': {},
        'geometry': {
            'type': 'LineString',
            'coordinates': [
                [-86.85009051679292, 36.144774042081494],
                [-86.85001827659116, 36.14473133061205],
                [-86.85004741661184, 36.1446756197635],
                [-86.85007975396945, 36.14465452846737],
                [-86.85005249508677, 36.14469518290888]
                ......
            ]
        }
    }
});

The following graph displays a taxi’s 14-hour driving route.

The following script displays the car’s route distribution:

map.addSource('car-location', {
    'type': 'geojson',
    'data': {
        'type': 'FeatureCollection',
        'features': [
            {'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-86.79417828985571, 36.1742558685242]}},
            {'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-86.76932509874324, 36.18006513143749]}},
            ......
            {'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-86.84082991448976, 36.14558741886923]}}
        ]
    }
});

The following map visualization shows our results.

Algorithm selection

K-means is an unsupervised learning algorithm. It attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups.

SageMaker uses a modified version of the web-scale k-means clustering algorithm. Compared to the original version of the algorithm, the version SageMaker uses is more accurate. Like the original algorithm, it scales to massive datasets and delivers improvements in training time. To do this, it streams mini-batches (small, random subsets) of the training data.

The k-means algorithm expects tabular data. In this solution, the GPS coordinate data (longitude, latitude) is the input training data. See the following code:

import io
import os

import boto3
import pandas as pd
import sagemaker.amazon.common as smac

df = pd.read_csv('./data/result.csv', sep=',', header=0, usecols=['longitude', 'latitude'])

# Routine that converts the training data into the protobuf format required by SageMaker k-means.
def write_to_s3(bucket, prefix, channel, file_prefix, X):
    buf = io.BytesIO()
    smac.write_numpy_to_dense_tensor(buf, X.astype('float32'))
    buf.seek(0)
    boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, channel, file_prefix + '.data')).upload_fileobj(buf)

# Prepare the training data and save it to S3.
def prepare_train_data(bucket, prefix, file_prefix, save_to_s3=True):
    train_data = df.to_numpy()
    if save_to_s3:
        write_to_s3(bucket, prefix, 'train', file_prefix, train_data)
    return train_data

# Convert and upload the dataset
train_data = prepare_train_data(bucket, prefix, 'train', save_to_s3=True)

# SageMaker k-means ECR image URIs
images = {'us-west-2': '174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:latest',
          'us-east-1': '382416733822.dkr.ecr.us-east-1.amazonaws.com/kmeans:latest',
          'us-east-2': '404615174143.dkr.ecr.us-east-2.amazonaws.com/kmeans:latest',
          'eu-west-1': '438346466558.dkr.ecr.eu-west-1.amazonaws.com/kmeans:latest'}

image = images[boto3.Session().region_name]

Train the model

Before you train your model, consider the following:

  • Data format – Both protobuf recordIO and CSV formats are supported for training. In this solution, we use protobuf format and File mode as the training data input.
  • EC2 instance selection – AWS suggests using an Amazon Elastic Compute Cloud (Amazon EC2) CPU instance when selecting the k-means algorithm. We use two ml.c5.2xlarge instances for training.
  • Hyperparameters – Hyperparameters are closely related to the dataset; you can adjust them according to the actual situation to get the best results:

    • k – The number of required clusters (k). Because we don’t know the number of clusters in advance, we train many models with different values of k.
    • init_method – The method by which the algorithm chooses the initial cluster centers. Valid values are random and kmeans++.
    • epochs – The number of passes done over the training data. We set this to 10.
    • mini_batch_size – The number of observations per mini-batch for the data iterator. We tried 50, 100, 200, 500, 800, and 1,000 in our dataset.

We train our models with the following code. To get results faster, we launch the SageMaker training jobs concurrently; each training job uses two instances. The value of k ranges from 3 to 15, and each training job generates a model whose artifacts are saved in an S3 bucket.

K = range(3,16,1) #Select different k, k increased by 1 until 15
INSTANCE_COUNT = 2 #use two CPU instances
run_parallel_jobs = True # make this False to run jobs one at a time, especially if you do not want to
# create too many EC2 instances at once and hit service limits.
job_names = []

# launching jobs for all k
for k in K:
    print('starting train job:' + str(k))
    output_location = 's3://{}/kmeans_example/output/'.format(bucket) + output_folder
    print('training artifacts will be uploaded to: {}'.format(output_location))
    job_name = output_folder + str(k)

    create_training_params = {
        "AlgorithmSpecification": {
            "TrainingImage": image,
            "TrainingInputMode": "File"
        },
        "RoleArn": role,
        "OutputDataConfig": {
            "S3OutputPath": output_location
        },
        "ResourceConfig": {
            "InstanceCount": INSTANCE_COUNT,
            "InstanceType": "ml.c4.xlarge",
            "VolumeSizeInGB": 20
        },
        "TrainingJobName": job_name,
        "HyperParameters": {
            "k": str(k),
            "feature_dim": "2",
          	"epochs": "100",
            "init_method": "kmeans++",
            "mini_batch_size": "800"
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": 60 * 60
        },
            "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": "s3://{}/{}/train/".format(bucket, prefix),
                        "S3DataDistributionType": "FullyReplicated"
                    }
                },

                "CompressionType": "None",
                "RecordWrapperType": "None"
            }
        ]
    }

    sagemaker = boto3.client('sagemaker')

    sagemaker.create_training_job(**create_training_params)
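The jobs run asynchronously, so before evaluating the models it helps to block until they all finish. The following is a minimal sketch using the boto3 waiter, assuming job_names collected the names of the jobs launched above:

# Wait for every training job to reach a terminal state before evaluating the models.
sm_client = boto3.client('sagemaker')
waiter = sm_client.get_waiter('training_job_completed_or_stopped')
for job_name in job_names:
    waiter.wait(TrainingJobName=job_name)
    status = sm_client.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
    print('{}: {}'.format(job_name, status))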

Evaluate the model

The number of clusters (k) is the most important hyperparameter in k-means clustering. Because we don’t know the value of k, we can use various methods to find the optimal value of k. In this section, we discuss two methods.

Elbow method

The elbow method is an empirical way to find a good number of clusters for a dataset. We select a range of candidate values of k and run k-means clustering for each one. For each model, we compute the average distance of each point to its assigned centroid (the distortion) and plot it against k. We select the value of k at the “elbow” of the plot, where the distortion stops decreasing sharply. See the following code:

import boto3
import mxnet as mx
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist

models = {}
distortions = []
s3_client = boto3.client('s3')
for k in K:
    # Download and unpack the model artifacts produced by the training job for this k.
    key = 'kmeans_example/output/' + output_folder + '/' + output_folder + str(k) + '/output/model.tar.gz'
    s3_client.download_file(bucket, key, 'model.tar.gz')
    print("Model for k={} ({})".format(k, key))
    !tar -xvf model.tar.gz
    kmeans_model = mx.ndarray.load('model_algo-1')
    kmeans_numpy = kmeans_model[0].asnumpy()  # cluster centroids as a NumPy array
    print(kmeans_numpy)
    # Distortion: average distance from each point to its closest centroid.
    distortions.append(sum(np.min(cdist(train_data, kmeans_numpy, 'euclidean'), axis=1)) / train_data.shape[0])
    models[k] = kmeans_numpy

# Plot the elbow
plt.plot(K, distortions, 'bx-')
plt.xlabel('k')
plt.ylabel('distortion')
plt.title('Elbow graph')
plt.show()

We select a k range of 3–15 and train the models with the built-in k-means clustering algorithm. When the model is fit with 10 clusters, we see an elbow shape in the graph, which suggests 10 is a good number of clusters.

Silhouette method

The silhouette method is another way to find the optimal number of clusters and to interpret and validate the consistency of the clustering. It computes a silhouette coefficient for each point that measures how similar the point is to its own cluster compared to other clusters, and it provides a succinct graphical representation of how well each object has been classified.

The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette value ranges between [-1, 1], where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, the clustering configuration is appropriate. If many points have a low or negative value, the clustering configuration may have too many or too few clusters.

First, we must deploy the chosen model to a SageMaker endpoint and use it to predict the cluster assignments (the y values) that serve as input to the silhouette computation.
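The deployment step itself isn’t shown in the original code, so here is a minimal, illustrative sketch of hosting the k=10 model artifacts with the low-level boto3 API. The model, endpoint configuration, and endpoint names, as well as the hosting instance type, are assumptions made for illustration:

import boto3

sm = boto3.client('sagemaker')

k = 10  # the chosen cluster count
model_name = 'kmeans-model-k10'                    # hypothetical names, for illustration only
endpoint_config_name = 'kmeans-endpoint-config-k10'
endpoint_name = 'kmeans-endpoint-k10'
model_data = ('s3://{}/kmeans_example/output/{}/{}{}/output/model.tar.gz'
              .format(bucket, output_folder, output_folder, k))

# Register the trained artifacts as a SageMaker model, reusing the built-in k-means container image.
sm.create_model(ModelName=model_name,
                ExecutionRoleArn=role,
                PrimaryContainer={'Image': image, 'ModelDataUrl': model_data})

# Create an endpoint configuration and a real-time endpoint.
sm.create_endpoint_config(EndpointConfigName=endpoint_config_name,
                          ProductionVariants=[{'VariantName': 'AllTraffic',
                                               'ModelName': model_name,
                                               'InstanceType': 'ml.m5.large',
                                               'InitialInstanceCount': 1}])
sm.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name)

# Block until the endpoint is in service before invoking it.
sm.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)

Once the endpoint is in service, we invoke it to obtain a cluster assignment for each point: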

import json

runtime = boto3.Session().client('runtime.sagemaker')
endpointName = "kmeans-30-2021-08-06-00-48-38-963"
# Example request with two points; for the silhouette analysis, send every training point
# (in batches if necessary) so that y_km contains one cluster label per row of the training data.
response = runtime.invoke_endpoint(EndpointName=endpointName,
                                   ContentType='text/csv',
                                   Body=b"-86.77971153,36.16336978\n-86.77971153,36.16336978")
r = response['Body'].read()
response_json = json.loads(r)
y_km = []
for item in response_json['predictions']:
    y_km.append(int(item['closest_cluster']))

Next, we compute the silhouette coefficients and plot them:

import numpy as np
from matplotlib import cm
import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score, silhouette_samples

X = train_data            # the GPS training data prepared earlier
y_km = np.array(y_km)     # one cluster label per point, as returned by the endpoint

cluster_labels = np.unique(y_km)
print(cluster_labels)
n_clusters = cluster_labels.shape[0]
silhouette_score_cluster_10 = silhouette_score(X, y_km)
print("Silhouette Score When Cluster Number Set to 10: %.3f" % silhouette_score_cluster_10)
silhouette_vals = silhouette_samples(X, y_km, metric='euclidean')
y_ax_lower, y_ax_upper = 0, 0
yticks = []
for i, c in enumerate(cluster_labels):
    # Sort the silhouette values of the points in cluster c and draw them as one horizontal bar block.
    c_silhouette_vals = silhouette_vals[y_km == c]
    c_silhouette_vals.sort()
    y_ax_upper += len(c_silhouette_vals)
    color = cm.jet(float(i) / n_clusters)
    plt.barh(range(y_ax_lower, y_ax_upper),
             c_silhouette_vals,
             height=1.0,
             edgecolor='none',
             color=color)
    yticks.append((y_ax_lower + y_ax_upper) / 2.0)
    y_ax_lower += len(c_silhouette_vals)

silhouette_avg = np.mean(silhouette_vals)
plt.axvline(silhouette_avg,
            color='red',
            linestyle='--')
plt.yticks(yticks, cluster_labels + 1)
plt.ylabel("Cluster")
plt.xlabel("Silhouette Coefficients k=10, Score=%.3f" % silhouette_score_cluster_10)
plt.savefig('./figure.png')
plt.show()

The closer the silhouette score is to 1, the better separated the clusters are. In the following experiment result, when k is set to 8, the clusters are well separated from each other.

Different evaluation methods can point to different values for the best k. In our experiment, we choose k=10 as the optimal number of clusters.
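To turn the chosen model into map coordinates, we can read the centroids back from the models dictionary built during the elbow analysis. A minimal sketch, assuming models was populated as shown above:

# Each centroid is a (longitude, latitude) pair; these are the candidate charging-station locations.
centroids = models[10]
for lng, lat in centroids:
    print('new maplibregl.Marker().setLngLat([{:.6f}, {:.6f}]).addTo(map);'.format(lng, lat))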

Now we can display the k-means clustering result via Amazon Location. The following code marks selected locations on the map:

new maplibregl.Marker().setLngLat([-86.755974, 36.19235]).addTo(map);
new maplibregl.Marker().setLngLat([-86.710972, 36.203389]).addTo(map);
new maplibregl.Marker().setLngLat([-86.733895, 36.150209]).addTo(map);
new maplibregl.Marker().setLngLat([-86.795974, 36.165639]).addTo(map);
new maplibregl.Marker().setLngLat([-86.786743, 36.222799]).addTo(map);
new maplibregl.Marker().setLngLat([-86.701209, 36.267679]).addTo(map);
new maplibregl.Marker().setLngLat([-86.820134, 36.209863]).addTo(map);
new maplibregl.Marker().setLngLat([-86.769743, 36.131246]).addTo(map);
new maplibregl.Marker().setLngLat([-86.803346, 36.142358]).addTo(map);
new maplibregl.Marker().setLngLat([-86.833890, 36.113466]).addTo(map);

The following map visualization shows our results, with 10 clusters.

We also need to consider the scale of each charging station. Here, we divide the number of points assigned to each cluster by a coefficient (for example, a coefficient of 100 means that every 100 cars share one charging pile). The following visualization includes the charging station scale.
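As a rough illustration (not code from the original solution), the per-station scale can be derived by counting the points in each cluster and dividing by the sharing coefficient:

import numpy as np

CARS_PER_PILE = 100  # illustrative coefficient: every 100 cars share one charging pile

# y_km holds the cluster label of every training point (car location); k=10 clusters in our experiment.
counts = np.bincount(np.asarray(y_km), minlength=10)
piles_per_station = np.ceil(counts / CARS_PER_PILE).astype(int)
for cluster_id, (n_cars, n_piles) in enumerate(zip(counts, piles_per_station)):
    print('cluster {}: {} cars -> {} charging piles'.format(cluster_id, n_cars, n_piles))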

Conclusion

In this post, we explained an end-to-end scenario for creating a clustering model in SageMaker based on simulated driving data. The solution includes training an MXNet model and creating an endpoint for real-time model hosting. We also explained how you can display the clustering results via the Amazon Location SDK.

You should also consider charging type and quantity. Plug-in charging is categorized by voltage and power levels, leading to different charging times. Slow charging usually takes several hours to charge, whereas fast charging can achieve a 50% charge in 10–15 minutes. We cover these factors in a later post.

Many other industries are also affected by location planning problems, including retail stores and warehouses. If you have feedback about this post, submit comments in the Comments section below.


About the Author

Zhang Zheng is a Sr. Partner Solutions Architect with AWS, helping industry partners on their journey to well-architected machine learning solutions at scale.

Read More

General and Scalable Parallelization for Neural Networks

Scaling neural networks, whether it be the amount of training data used, the model size or the computation being utilized, has been critical for improving model quality in many real-world machine learning applications, such as computer vision, language understanding and neural machine translation. This, in turn, has motivated recent studies to scrutinize the factors that play a critical role in the success of scaling a neural model. Although increasing model capacity can be a sound approach to improve model quality, doing so presents a number of systems and software engineering challenges that must be overcome. For instance, in order to train large models that exceed the memory capacity of an accelerator, it becomes necessary to partition the weights and the computation of the model across multiple accelerators. This process of parallelization increases the network communication overhead and can result in device under-utilization. Moreover, a given algorithm for parallelization, which typically requires a significant amount of engineering effort, may not work with different model architectures.

To address these scaling challenges, we present “GSPMD: General and Scalable Parallelization for ML Computation Graphs”, in which we describe an open-source automatic parallelization system based on the XLA compiler. GSPMD is capable of scaling most deep learning network architectures and has already been applied to many deep learning models, such as GShard-M4, LaMDA, BigSSL, ViT, and MetNet-2, leading to state-of-the-art-results across several domains. GSPMD has also been integrated into multiple ML frameworks, including TensorFlow and JAX, which use XLA as a shared compiler.

Overview
GSPMD separates the task of programming an ML model from the challenge of parallelization. It allows model developers to write programs as if they were run on a single device with very high memory and computation capacity — the user simply needs to add a few lines of annotation code to a subset of critical tensors in the model code to indicate how to partition the tensors. For example, to train a large model-parallel Transformer, one may only need to annotate fewer than 10 tensors (less than 1% of all tensors in the entire computation graph), one line of additional code per tensor. Then GSPMD runs a compiler pass that determines the entire graph’s parallelization plan, and transforms it into a mathematically equivalent, parallelized computation that can be executed on each device. This allows users to focus on model building instead of parallelization implementation, and enables easy porting of existing single-device programs to run at a much larger scale.
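GSPMD itself lives inside the XLA compiler, but the annotation-driven workflow it describes is exposed through frameworks such as JAX. The following is a rough, illustrative sketch (not code from the paper) of annotating a few tensors with a partitioning over a 2D device mesh using JAX’s sharding API; it assumes eight accelerators are available, and exact module and argument names vary across JAX versions (older releases expose the same idea through jax.experimental.pjit):

import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

# Arrange the available accelerators as a logical 2D mesh: a "data" axis and a "model" axis.
devices = mesh_utils.create_device_mesh((2, 4))
mesh = Mesh(devices, axis_names=("data", "model"))

def feedforward(x, w_in, w_out):
    # Written as if it ran on one huge device; the compiler handles the partitioning.
    h = jax.nn.relu(x @ w_in)
    return h @ w_out

# Annotate only the inputs: the batch is split over "data", the weights over "model".
feedforward_sharded = jax.jit(
    feedforward,
    in_shardings=(NamedSharding(mesh, P("data", None)),
                  NamedSharding(mesh, P(None, "model")),
                  NamedSharding(mesh, P("model", None))),
    out_shardings=NamedSharding(mesh, P("data", None)),
)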

The separation of model programming and parallelism also allows developers to minimize code duplication. With GSPMD, developers may employ different parallelism algorithms for different use cases without the need to reimplement the model. For example, the model code that powered the GShard-M4 and LaMDA models can apply a variety of parallelization strategies appropriate for different models and cluster sizes with the same model implementation. Similarly, by applying GSPMD, the BigSSL large speech models can share the same implementation with previous smaller models.

Generality and Flexibility
Because different model architectures may be better suited to different parallelization strategies, GSPMD is designed to support a large variety of parallelism algorithms appropriate for different use cases. For example, with smaller models that fit within the memory of a single accelerator, data parallelism is preferred, in which devices train the same model using different input data. In contrast, models that are larger than a single accelerator’s memory capacity are better suited for a pipelining algorithm (like that employed by GPipe) that partitions the model into multiple, sequential stages, or operator-level parallelism (e.g., Mesh-TensorFlow), in which individual computation operators in the model are split into smaller, parallel operators.

GSPMD supports all the above parallelization algorithms with a uniform abstraction and implementation. Moreover, GSPMD supports nested patterns of parallelism. For example, it can be used to partition models into individual pipeline stages, each of which can be further partitioned using operator-level parallelism.

GSPMD also facilitates innovation on parallelism algorithms by allowing performance experts to focus on algorithms that best utilize the hardware, instead of the implementation that involves lots of cross-device communications. For example, for large Transformer models, we found a novel operator-level parallelism algorithm that partitions multiple dimensions of tensors on a 2D mesh of devices. It reduces peak accelerator memory usage linearly with the number of training devices, while maintaining a high utilization of accelerator compute due to its balanced data distribution over multiple dimensions.

To illustrate this, consider a simplified feedforward layer in a Transformer model that has been annotated in the above way. To execute the first matrix multiply on fully partitioned input data, GSPMD applies an MPI-style AllGather communication operator to partially merge with partitioned data from another device. It then executes the matrix multiply locally and produces a partitioned result. Before the second matrix multiply, GSPMD adds another AllGather on the right-hand side input, and executes the matrix multiply locally, yielding intermediate results that will then need to be combined and partitioned. For this, GSPMD adds an MPI-style ReduceScatter communication operator that accumulates and partitions these intermediate results. While the tensors generated with the AllGather operator at each stage are larger than the original partition size, they are short-lived and the corresponding memory buffers will be freed after use, which does not affect peak memory usage in training.

Left: A simplified feedforward layer of a Transformer model. Blue rectangles represent tensors with dashed red & blue lines overlaid representing the desired partitioning across a 2×2 mesh of devices. Right: A single partition, after GSPMD has been applied.
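To make the ReduceScatter step concrete, here is a toy single-process NumPy simulation (purely illustrative, not GSPMD itself) of a matrix multiply whose contracted dimension is partitioned across two simulated devices: each device computes a partial product locally, the partials are summed (reduce), and each device keeps only its slice of the result (scatter):

import numpy as np

rng = np.random.default_rng(0)
batch, hidden, out = 4, 6, 8
A = rng.normal(size=(batch, hidden))
B = rng.normal(size=(hidden, out))
reference = A @ B

n_devices = 2
# Shard the contracted (hidden) dimension: each simulated device holds a slice of A's columns and B's rows.
A_shards = np.split(A, n_devices, axis=1)
B_shards = np.split(B, n_devices, axis=0)

# Local matmuls: each device produces a full-sized *partial* result.
partials = [A_shards[d] @ B_shards[d] for d in range(n_devices)]

# ReduceScatter: sum the partials across devices (reduce), then give each device one row block (scatter).
reduced = partials[0] + partials[1]
result_shards = np.split(reduced, n_devices, axis=0)

assert np.allclose(np.concatenate(result_shards, axis=0), reference)
print([s.shape for s in result_shards])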

A Transformer Example with Nested Parallelism
As a shared, robust mechanism for different parallelism modes, GSPMD allows users to conveniently switch between modes in different parts of a model. This is particularly valuable for models that may have different components with distinct performance characteristics, for example, multimodal models that handle both images and audio. Consider a model with the Transformer encoder-decoder architecture, which has an embedding layer, an encoder stack with Mixture-of-Expert layers, a decoder stack with dense feedforward layers, and a final softmax layer. In GSPMD, a complex combination of several parallelism modes that treats each layer separately can be achieved with simple configurations.

In the figure below, we show a partitioning strategy over 16 devices organized as a logical 4×4 mesh. Blue represents partitioning along the first mesh dimension X, and yellow represents partitioning along the second mesh dimension Y. X and Y are repurposed for different model components to achieve different parallelism modes. For example, the X dimension is used for data parallelism in the embedding and softmax layers, but used for pipeline parallelism in the encoder and decoder. The Y dimension is also used in different ways to partition the vocabulary, batch or model expert dimensions.

Computation Efficiency
GSPMD provides industry-leading performance in large model training. Parallel models require extra communication to coordinate multiple devices to do the computation. So parallel model efficiency can be estimated by examining the fraction of time spent on communication overhead — the higher percentage utilization and the less time spent on communication, the better. In the recent MLPerf set of performance benchmarks, a BERT-like encoder-only model with ~500 billion parameters to which we applied GSPMD for parallelization over 2048 TPU-V4 chips yielded highly competitive results (see table below), utilizing up to 63% of the peak FLOPS that the TPU-V4s offer. We also provide efficiency benchmarks for some representative large models in the table below. These example model configs are open sourced in the Lingvo framework along with instructions to run them on Google Cloud. More benchmark results can be found in the experiment section of our paper.

Model Family | Parameter Count | % of Model Activated* | No. of Experts** | No. of Layers | No. of TPUs | FLOPS Utilization
Dense Decoder (LaMDA) | 137B | 100% | 1 | 64 | 1024 TPUv3 | 56.5%
Dense Encoder (MLPerf-Bert) | 480B | 100% | 1 | 64 | 2048 TPUv4 | 63%
Sparsely Activated Encoder-Decoder (GShard-M4) | 577B | 0.25% | 2048 | 32 | 1024 TPUv3 | 46.8%
Sparsely Activated Decoder | 1.2T | 8% | 64 | 64 | 1024 TPUv3 | 53.8%
*The fraction of the model activated during inference, which is a measure of model sparsity.
**Number of experts included in the Mixture of Experts layer. A value of 1 corresponds to a standard Transformer, without a Mixture of Experts layer.

Conclusion
The ongoing development and success of many useful machine learning applications, such as NLP, speech recognition, machine translation, and autonomous driving, depend on achieving the highest accuracy possible. As this often requires building larger and even more complex models, we are pleased to share the GSPMD paper and the corresponding open-source library to the broader research community, and we hope it is useful for efficient training of large-scale deep neural networks.

Acknowledgements
We wish to thank Claire Cui, Zhifeng Chen, Yonghui Wu, Naveen Kumar, Macduff Hughes, Zoubin Ghahramani and Jeff Dean for their support and invaluable input. Special thanks to our collaborators Dmitry Lepikhin, HyoukJoong Lee, Dehao Chen, Orhan Firat, Maxim Krikun, Blake Hechtman, Rahul Joshi, Andy Li, Tao Wang, Marcello Maggioni, David Majnemer, Noam Shazeer, Ankur Bapna, Sneha Kudugunta, Quoc Le, Mia Chen, Shibo Wang, Jinliang Wei, Ruoming Pang, Zongwei Zhou, David So, Yanqi Zhou, Ben Lee, Jonathan Shen, James Qin, Yu Zhang, Wei Han, Anmol Gulati, Laurent El Shafey, Andrew Dai, Kun Zhang, Nan Du, James Bradbury, Matthew Johnson, Anselm Levskaya, Skye Wanderman-Milne‎, and Qiao Zhang for helpful discussions and inspirations.

Read More

Accelerating Financial Services With AI

AI is enabling brighter financial futures for consumers and businesses. From traditional banks to new fintechs, the financial services industry is powering use cases with AI such as preventing payments fraud, automating insurance claims, and accelerating trading strategies.

The latest episode in the I AM AI video series brings these technology stories to life by featuring global financial enterprises and startups transforming banking, insurance and payments.

Automating Insurance Claims and Document Processing

Ping An, China’s largest property and casualty insurer, uses NVIDIA GPU-powered image analysis and AI to rapidly calculate damages caused by vehicle collisions, automate claims handling for simple and clean cases, estimate costs and identify fraudulent claims. This automated experience leads to better customer service, fewer cases of insurance fraud and more efficient delivery of services.

CAPE Analytics, a computer vision startup, is transforming the property insurance industry by analyzing geospatial data to inform more accurate underwriting decisions and mitigate wildfire disasters. The NVIDIA Inception member uses AI to produce detailed data on the vegetation density, roof material and proximity to surrounding structures — more accurately calculating risk and helping homeowners take actions to reduce potential property damage.

Applica, a fintech, deploys progressive AI to streamline text-based workflows with better-than-human performance. Its robotic text automation platform uses NVIDIA GPUs for training machine learning models and for inference in production. This eliminates up to 90 percent of manual errors, cuts document turnaround time to less than one second, and reduces manual workforce effort by up to 75 percent.

Banks Adopt AI to Accelerate Model Training and Cut Costs

Bank of Montreal runs complex derivative models to find fair prices for financial contracts used in valuation and risk management. These AI-informed models — trained by Riskfuel, a Toronto-based startup and member of NVIDIA Inception, on 650 million data points and deployed for inference on NVIDIA A100 or T4 Tensor Core GPUs — can drive higher trade flows, generate new risk insights and lead to better product design and selection for Riskfuel’s clients.

Capital One uses Dask and RAPIDS, a suite of GPU-optimized libraries for accelerating data science and analytics pipelines, to achieve 100x improvement in model training times and reduce costs by nearly 98 percent. Its team of data scientists and machine learning engineers use accelerated and distributed data processing for financial and credit analysis.

AI Virtual Assistants Improve the Customer Experience

Square, a global leader in payments, powers its virtual assistant, Square Assistant, using conversational AI to schedule appointments with new and returning customers. These AI models are trained using large hyperparameter jobs running on NVIDIA GPUs in AWS. Once the models are trained and ready for deployment, Square found that inference jobs on large models such as RoBERTa run 10x faster on the AWS GPU service than on CPUs.

Intuit uses conversational AI and intelligent AI assistants to empower financial futures for individuals, self-employed workers and small business owners. The company uses AI technologies, such as knowledge engineering, machine learning and natural language processing and understanding, to provide targeted and personalized assistance with virtual experts, automate financial documents processing, and even forecast cash flow for small businesses.

Funding the Future of Financial Services with AI

NVIDIA’s full-stack accelerated computing platform enables banks, traders, payments providers, insurers and fintechs to deliver enhanced offerings that boost lifetime value for customers and reduce operational costs across their and their customers’ businesses.

Explore NVIDIA solutions for financial services and learn from more industry leaders, such as American Express and PayPal.

The post Accelerating Financial Services With AI appeared first on The Official NVIDIA Blog.

Read More

Artisan Baking: How Creators Worldwide Cooked Up GTC Keynote’s Virtual Kitchen

With their marbled counters, neoclassical oven alcove and iconic bouquets of spatulas, the “kitchen keynotes” delivered by NVIDIA founder and CEO Jensen Huang during pandemic-era GTCs have been a memorable setting for the highly anticipated events.

The keynotes were initially delivered from his real kitchen, in response to workplace closures. But last spring, the kitchen faded away to reveal a realistic digital replica — one that not only surprised viewers, but also showcased the powerful capabilities of the NVIDIA Omniverse virtual world simulation and collaboration platform.

Now, audiences can get a closer look at all the scenes from the latest virtual kitchen in the Virtual Kitchen Tone Poem.

The project, which launched at GTC in November, is a cinematic homage to the elaborate, painstaking work that went into recreating every detail of the kitchen, from its glistening chrome water fixtures to its earthenware salt cellar.

To accomplish the feat, a team of highly skilled artists collaborated across multiple continents and time zones using Omniverse.

It’s All in the Details

The virtual kitchen got its start during the video shoot for GTC in the fall of 2020, when an onsite crew captured high-resolution images of Huang’s kitchen. The lead environment artist used this footage as the main reference to build a virtual set.

The creative team’s project lead researched detailed references of everything in the kitchen, including the appliance models, oil tins, salt box brands, and even the screws within the cabinets.

A team of eight NVIDIA artists and 10 freelance creators built the cinematic with an Omniverse workflow. In Omniverse, each artist worked within their preferred software, then used Omniverse Connectors to bring all the models and data together, leading to a much smoother animation pipeline and publishing workflow.

The 3D modeling of 57 unique assets and 6,240 total scene objects was done in Autodesk 3ds Max, Autodesk Maya and Pixologic ZBrush. The artists used Adobe Substance Painter and Photoshop for texturing, and the rigging and animation were done in Maya. The team used Nuke for scene composition, while the editing was done with DaVinci Resolve.

Omniverse was where everything converged for lighting and rendering. Omniverse Nucleus acted as the universal exchange and collaboration hub for all the USD-based assets, which helped it all come together. Nucleus facilitated remote access, smart local caching and built-in versioning.

Producing the Virtual Kitchen Tone Poem was also an opportunity to further develop Omniverse Farm — a newly released systems layer that connects multiple computer systems to jointly execute batch operations — and Shot Manager extensions across multiple teams.

With Omniverse Farm, a team of artists can iterate on rendering in an organized, repeatable fashion, bringing flexibility and structure to the rendering process — similar to what animation and visual effects studios would expect.

Omniverse Farm enabled the team to easily batch render 40,000 frames for GTC totaling four terabytes of content, rendered across disparate workstations, on-premises data centers, and cloud servers with a peak of 1,200 GPUs running simultaneously. With the ability to easily contribute a workstation or new server to Farm, the teams could scale to meet their needs.

Visualizing the Future of VFX and Animation

The Virtual Kitchen Tone Poem showcases how it’s possible to have a workflow that’s iterative, scalable and streamlined under short deadlines. These are some of the biggest requirements for artists working in animation and VFX production studios.

Omniverse provided all the tools that enabled the creative team to efficiently render high-quality content for the latest GTC keynote, which shifted across virtual environments, including Huang’s kitchen, a data center, and NVIDIA’s campus in Silicon Valley. NVIDIA technology provided a new level of collaboration for people across the globe that wasn’t available before.

NVIDIA technology also allowed for a non-destructive workflow, which was crucial to a project of this nature and scale, as it helped the team streamline remote and cross-platform collaboration.

The Tone Poem showcases the potential of Omniverse, and how animation and VFX studios can use the platform to enhance workflows, including for production-style projects.

Learn more about NVIDIA Omniverse for professional media & entertainment teams and individual creators.

The post Artisan Baking: How Creators Worldwide Cooked Up GTC Keynote’s Virtual Kitchen appeared first on The Official NVIDIA Blog.

Read More

Unlocking human rights information with machine learning

Human rights defenders need information from many sources to do their work effectively. But as issues evolve and new precedents are set, finding the right information to defend a particular case can be like looking for a needle in a haystack.

For example, a human rights advocate campaigning for LGBTQ rights may want to know which countries have made the most progress and what resolutions they’ve passed. To do so, they have to manually sift through thousands of pages of dense documentation covering global laws and victims’ testimonies to find what they’re looking for.

The curation and cataloging of documents makes this process much easier, but still relies on the manual work of skilled experts. To help, the non-profit organization HURIDOCS looked to machine learning. With support from Google.org Fellows and grant funding, they’ve built new tools that can automatically tag human rights documents so they are searchable — making the curation process 13 times faster.

How machine learning can make information more accessible

Typically, non-governmental organizations collect and curate large bodies of human rights information, with the goal of making these collections useful for advocates. Manually processing these documents can take several days, particularly when they’re published in unfamiliar languages or in PDF format which is difficult to search through. As a result, many NGOs face a large backlog of documents that remain to be processed, and by the time they’re added to collections new documentation often supersedes them.

Based in Geneva, HURIDOCS has been developing tools to manage and analyze collections of human rights evidence, law and research for nearly four decades. In 2016, they had an idea: What if machine learning could skim through documents, make terms extractable, and classify the content to catalog documents more quickly?

HURIDOCS took their idea to the Google AI Impact Challenge and was selected for a $1 million grant from Google.org and six months of technical support from a team of seven full-time pro bono Google.org Fellows. As one of the Fellows, I helped train AI models and make sure that the tool was useful to human rights experts, not just machine learning experts.

The curation process of human rights documents gets a boost

Since then, HURIDOCS has launched ML-powered features to improve platforms they’ve built with other NGO partners, and, earlier this year, they began integrating the technology into more of their tools, including their flagship application Uwazi. As a result, updating documents now takes one week instead of two to three months, and curators have been able to catch up on multi-year document backlogs.

In June, HURIDOCS won a CogX Award for its machine learning work, and now the organization is continuing to explore what else its machine learning models can do — from creating automatic tables of contents for documents to identifying references within text. With the power of artificial intelligence, HURIDOCS hopes to solve the trickiest challenges facing human rights defenders.

Read More