Dive Into AI, Avatars and the Metaverse With NVIDIA at SIGGRAPH

Innovative technologies in AI, virtual worlds and digital humans are shaping the future of design and content creation across every industry. Experience the latest advances from NVIDIA in all these areas at SIGGRAPH, the world’s largest gathering of computer graphics experts, running Aug. 8-11.

At the conference, creators, developers, engineers, researchers and students will see all the new tech and research that enables them to elevate immersive storytelling, build realistic avatars and create stunning 3D virtual worlds.

NVIDIA’s special address on Tuesday, Aug. 9, at 9 a.m. PT will feature founder and CEO Jensen Huang, along with other senior leaders. Join to get an exclusive look at some of our most exciting work, from award-winning research to new AI-powered tools and solutions.

Discover the emergence of the metaverse, and see how users can build 3D content and connect photorealistic virtual worlds with NVIDIA Omniverse, a computing platform for 3D design collaboration and true-to-reality world simulation. See the advanced solutions that are powering these 3D worlds, and how they expand the realm of artistic expression and creativity.

NVIDIA is also presenting over 20 in-person sessions at SIGGRAPH, including hands-on labs and research presentations. Explore the session topics below to build your calendar for the event:

Building 3D Virtual Worlds

See how users can create assets and build virtual worlds for the metaverse using the power and versatility of Universal Scene Description (USD) with this presentation:

Powering the Metaverse

Find out how to accelerate complex 3D workflows and content creation for the metaverse. Discover groundbreaking ways to visualize, simulate and code with advanced solutions like NVIDIA Omniverse in sessions including:

  • Real-Time Collaboration in Ray-Traced VR. Discover the recent leaps in hardware architecture and graphics software that have made ray tracing at virtual-reality frame rates possible at this session on Monday, Aug. 8, at 5 p.m. PT.
  • Material Workflows in Omniverse. Learn how to improve graphics workflows with arbitrary material shading systems supported in Omniverse at this talk on Thursday, Aug. 11, at 9 a.m. PT.

Exploring Neural Graphics Research

Learn more about neural graphics — the unification of AI and graphics — which will make metaverse content creation available to everyone. From 3D assets to animation, see how AI integration can enhance results, automate design choices and unlock new opportunities for creativity in the metaverse. Check out the session below:

Accelerating Workflows Across Industries

Get insights on the latest technologies transforming industries, from cloud production to extended reality. Discover how leading film studios, cutting-edge startups and other graphics companies are building and supporting their technologies with NVIDIA solutions. Some must-see sessions include:

SIGGRAPH registration is required to attend the in-person events. Sessions will also be available the following day to watch on demand from our site.

Many NVIDIA partners will attend SIGGRAPH, showcasing demos and presenting on topics such as AI and virtual worlds. Download this event map to learn more.

And tune in to the global premiere of The Art of Collaboration: NVIDIA, Omniverse and GTC on Wednesday, Aug. 10, at 10 a.m. PT. The documentary shares the story of the engineers, artists and researchers who pushed the limits of NVIDIA GPUs, AI and Omniverse to deliver the stunning GTC keynote last spring.

Join NVIDIA at SIGGRAPH to learn more, and watch NVIDIA’s special address to hear the latest on graphics, AI and virtual worlds.

The post Dive Into AI, Avatars and the Metaverse With NVIDIA at SIGGRAPH appeared first on NVIDIA Blog.


Introducing the Google Universal Image Embedding Challenge

Computer vision models see daily application for a wide variety of tasks, ranging from object recognition to image-based 3D object reconstruction. One challenging type of computer vision problem is instance-level recognition (ILR): given an image of an object, the task is to determine not only the generic category of the object (e.g., an arch), but also the specific instance of the object (“Arc de Triomphe de l’Étoile, Paris, France”).

Previously, ILR was tackled using deep learning approaches. First, a large set of images was collected. Then a deep model was trained to embed each image into a high-dimensional space where similar images have similar representations. Finally, the representation was used to solve the ILR tasks related to classification (e.g., with a shallow classifier trained on top of the embedding) or retrieval (e.g., with a nearest neighbor search in the embedding space).
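The retrieval step described above can be sketched as a nearest neighbor search over precomputed embeddings. This is a minimal illustration, not the approach of any particular system: the toy vectors, the labels, the `nearest_neighbors` helper and the choice of cosine similarity are all assumptions made for the example.

```python
import numpy as np


def nearest_neighbors(query, index, k=1):
    """Return the positions of the k index embeddings most similar to the query.

    Similarity is cosine similarity, an assumed (but common) choice.
    """
    query = query / np.linalg.norm(query)
    index = index / np.linalg.norm(index, axis=1, keepdims=True)
    similarities = index @ query
    # argsort is ascending, so negate to put the most similar first.
    return np.argsort(-similarities)[:k]


# Toy 3-dimensional embeddings; real models use far more dimensions.
index_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.0, 1.0],
])
index_labels = ["Arc de Triomphe", "Eiffel Tower", "Empire State Building"]

query_embedding = np.array([0.8, 0.2, 0.1])
best = nearest_neighbors(query_embedding, index_embeddings, k=1)[0]
print(index_labels[best])
```

The query vector points in nearly the same direction as the first index entry, so that entry is returned as the match.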

Since there are many different object domains in the world, e.g., landmarks, products, or artworks, capturing all of them in a single dataset and training a model that can distinguish between them is quite a challenging task. To decrease the complexity of the problem to a manageable level, the focus of research so far has been to solve ILR for a single domain at a time. To advance the research in this area, we hosted multiple Kaggle competitions focused on the recognition and retrieval of landmark images. In 2020, Amazon joined the effort and we moved beyond the landmark domain and expanded to the domains of artwork and product instance recognition. The next step is to generalize the ILR task to multiple domains.

To this end, we’re excited to announce the Google Universal Image Embedding Challenge, hosted by Kaggle in collaboration with Google Research and Google Lens. In this challenge, we ask participants to build a single universal image embedding model capable of representing objects from multiple domains at the instance level. We believe that this is the key for real-world visual search applications, such as augmenting cultural exhibits in a museum, organizing photo collections, visual commerce and more.

Images¹ of object instances from some domains represented in the dataset: apparel and accessories, furniture and home goods, toys, cars, landmarks, dishes, artwork and illustrations.

Degrees of Variation in Different Domains
To represent objects from a large number of domains, we require one model to learn many domain-specific subtasks (e.g., filtering different kinds of noise or focusing on a specific detail), which can only be learned from a semantically and visually diverse collection of images. Addressing each degree of variation poses a new challenge for both image collection and model training.

The first sort of variation comes from the fact that while some domains contain unique objects in the world (landmarks, artwork, etc.), others contain objects that may have many copies (clothing, furniture, packaged goods, food, etc.). Because a landmark is always placed at the same location, the surrounding context may be useful for recognition. In contrast, a product, say a phone, even of a specific model and color, may have millions of physical instances and thus appear in many surrounding contexts.

Another challenge comes from the fact that a single object may appear different depending on the point of view, lighting conditions, occlusion or deformations (e.g., a dress worn on a person may look very different than on a hanger). In order for a model to learn invariance to all of these visual modes, all of them should be captured by the training data.

Additionally, similarities between objects differ across domains. For example, in order for a representation to be useful in the product domain, it must be able to distinguish very fine-grained details between similar-looking products belonging to two different brands. In the domain of food, however, the same dish (e.g., spaghetti bolognese) cooked by two chefs may look quite different, but the ability of the model to distinguish spaghetti bolognese from other dishes may be sufficient for the model to be useful. Beyond that, a high-quality vision model should assign more similar representations to more visually similar renditions of a dish.

| Domain | Landmark | Apparel |
| --- | --- | --- |
| Instance name | Empire State Building² | Cycling jerseys with Android logo³ |
| Which physical objects belong to the instance class? | Single instance in the world | Many physical instances; may differ in size or pattern (e.g., a patterned cloth cut differently) |
| What are the possible views of the object? | Appearance variation only based on capture conditions (e.g., illumination or viewpoint); limited number of common external views; possibility of many internal views | Deformable appearance (e.g., worn or not); limited number of common views: front, back, side |
| What are the surroundings and are they useful for recognition? | Surrounding context does not vary much other than daily and yearly cycles; may be useful for verifying the object of interest | Surrounding context can change dramatically due to difference in environment, additional pieces of clothing, or accessories partially occluding clothing of interest (e.g., a jacket or a scarf) |
| What may be tricky cases that do not belong to the instance class? | Replicas of landmarks (e.g., Eiffel Tower in Las Vegas), souvenirs | Same piece of apparel of different material or different color; visually very similar pieces with a small distinguishing detail (e.g., a small brand logo); different pieces of apparel worn by the same model |

Variation among domains for landmark and apparel examples.

Learning Multi-domain Representations
After a collection of images covering a variety of domains is created, the next challenge is to train a single, universal model. Some features and tasks, such as representing color, are useful across many domains, and thus adding training data from any domain will likely help the model improve at distinguishing colors. Other features may be more specific to selected domains, so adding more training data from other domains may deteriorate the model’s performance. For example, while for 2D artwork it may be very useful for the model to learn to find near duplicates, this may deteriorate the performance on clothing, where deformed and occluded instances need to be recognized.

The large variety of possible input objects and tasks that need to be learned requires novel approaches for selecting, augmenting, cleaning and weighting the training data. New approaches for model training and tuning, and even novel architectures, may be required.

Universal Image Embedding Challenge
To help motivate the research community to address these challenges, we are hosting the Google Universal Image Embedding Challenge. The challenge was launched on Kaggle in July and will be open until October, with cash prizes totaling $50k. The winning teams will be invited to present their methods at the Instance-Level Recognition workshop at ECCV 2022.

Participants will be evaluated on a retrieval task on a dataset of ~5,000 test query images and ~200,000 index images, from which similar images are retrieved. In contrast to ImageNet, which includes categorical labels, the images in this dataset are labeled at the instance level.

The evaluation data for the challenge is composed of images from the following domains: apparel and accessories, packaged goods, furniture and home goods, toys, cars, landmarks, storefronts, dishes, artwork, memes and illustrations.

Distribution of domains of query images.

We invite researchers and machine learning enthusiasts to participate in the Google Universal Image Embedding Challenge and join the Instance-Level Recognition workshop at ECCV 2022. We hope the challenge and the workshop will advance state-of-the-art techniques on multi-domain representations.

Acknowledgement
The core contributors to this project are Andre Araujo, Boris Bluntschli, Bingyi Cao, Kaifeng Chen, Mário Lipovský, Grzegorz Makosa, Mojtaba Seyedhosseini and Pelin Dogan Schönberger. We would like to thank Sohier Dane, Will Cukierski and Maggie Demkin for their help organizing the Kaggle challenge, as well as our ECCV workshop co-organizers Tobias Weyand, Bohyung Han, Shih-Fu Chang, Ondrej Chum, Torsten Sattler, Giorgos Tolias, Xu Zhang, Noa Garcia, Guangxing Han, Pradeep Natarajan and Sanqiang Zhao. Furthermore we are thankful to Igor Bonaci, Tom Duerig, Vittorio Ferrari, Victor Gomes, Futang Peng and Howard Zhou who gave us feedback, ideas and support at various points of this project.


¹ Image credits: Chris Schrier, CC-BY; Petri Krohn, GNU Free Documentation License; Drazen Nesic, CC0; Marco Verch Professional Photographer, CC-BY; Grendelkhan, CC-BY; Bobby Mikul, CC0; Vincent Van Gogh, CC0; pxhere.com, CC0; Smart Home Perfected, CC-BY.
² Image credit: Bobby Mikul, CC0.
³ Image credit: Chris Schrier, CC-BY.


What Is Direct and Indirect Lighting?

Imagine hiking to a lake on a summer day — sitting under a shady tree and watching the water gleam under the sun.

In this scene, the differences between light and shadow are examples of direct and indirect lighting.

The sun shines onto the lake and the trees, making the water look like it’s shimmering and the leaves appear bright green. That’s direct lighting. And though the trees cast shadows, sunlight still bounces off the ground and other trees, casting light on the shady area around you. That’s indirect lighting.

For computer graphics to immerse viewers in photorealistic environments, it’s important to accurately simulate the behavior of light to achieve the proper balance of direct and indirect lighting.

What Is Direct and Indirect Lighting?

Light shining onto an object is called direct lighting.

It determines the color and quantity of light that reaches a surface from a light source, but ignores all light that may arrive at the surface from any other sources, such as after reflection or refraction. Direct lighting also determines the amount of the light that’s absorbed and reflected by the surface itself.

Direct lighting from the sun and sky.
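As a rough illustration of the direct-lighting definition above, the sketch below computes the light reflected by a matte surface from a single point light. The Lambertian reflection model, the inverse-square falloff, the single color channel and the `direct_diffuse` helper are assumptions chosen for simplicity; shadows and all indirect light are deliberately ignored.

```python
import math


def direct_diffuse(albedo, normal, surface_pos, light_pos, light_intensity):
    """Light reflected from one point light by a matte (Lambertian) surface."""
    to_light = [l - s for l, s in zip(light_pos, surface_pos)]
    distance = math.sqrt(sum(c * c for c in to_light))
    direction = [c / distance for c in to_light]
    # Cosine of the angle between the unit normal and the light direction,
    # clamped at zero because light from behind the surface adds nothing.
    cos_theta = max(0.0, sum(n * d for n, d in zip(normal, direction)))
    # Inverse-square falloff gives the light arriving at the surface.
    arriving = light_intensity * cos_theta / (distance * distance)
    # The albedo is the fraction that is reflected; the rest is absorbed.
    return albedo / math.pi * arriving


# A light 2 units above the surface, then the same light 2 units below it.
lit = direct_diffuse(0.5, (0, 1, 0), (0, 0, 0), (0, 2, 0), 100.0)
unlit = direct_diffuse(0.5, (0, 1, 0), (0, 0, 0), (0, -2, 0), 100.0)
print(lit, unlit)
```

The second call returns zero because the light sits behind the surface, which is exactly the case where only indirect light could still illuminate the point.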

Light bouncing off a surface and illuminating other objects is called indirect lighting. It arrives at surfaces from everything except light sources. In other words, indirect lighting determines the color and quantity of all other light that arrives at a surface. Most commonly, indirect light is reflected from one surface onto other surfaces.

Indirect lighting generally tends to be more difficult and expensive to compute than direct lighting. This is because there is a substantially larger number of paths that light can take between the light emitter and the observer.

Direct and indirect lighting in the same setting.

What Is Global Illumination?

Global illumination is the process of computing the color and quantity of all light — both direct and indirect — that is on visible surfaces in a scene.

Accurately simulating all types of indirect light is extremely difficult, especially if the scene includes complex materials such as glass, water and shiny metals — or if the scene has scattering in clouds, smoke, fog or other elements known as volumetric media.

As a result, real-time graphics solutions for global illumination are typically limited to computing a subset of the indirect light — commonly for surfaces with diffuse (aka matte) materials.

How Are Direct and Indirect Lighting Computed? 

Many algorithms can be used for computing direct lighting, all of which have strengths and weaknesses. For example, if the scene has a single light and no shadows, direct illumination is trivial to compute, but it won’t look very realistic. On the other hand, when a scene has multiple light sources, processing them all for each surface can become expensive.

To tackle these issues, optimized algorithms and shading techniques were developed, such as deferred or clustered shading. These algorithms reduce the number of surface and light interactions to be computed.

Shadows can be added through a number of techniques, including shadow maps, stencil shadow volumes and ray tracing.

Shadow mapping has two steps. First, the scene is rendered from the light’s point of view into a special texture called the shadow map. Then, the shadow map is used to test whether surfaces visible on the screen are also visible from the light’s point of view. Shadow maps come with many limitations and artifacts, and quickly become expensive as the number of lights in the scene increases.
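The two steps can be sketched on a toy scene with a one-dimensional shadow map. This is a simplified illustration rather than a GPU implementation: the scene layout, the function names and the `bias` value are assumptions made for the example.

```python
def render_shadow_map(depths_seen_by_light):
    """Pass 1: for each texel, store the depth of the surface closest to the light."""
    return [min(depths) for depths in depths_seen_by_light]


def is_lit(shadow_map, texel, depth_from_light, bias=0.01):
    """Pass 2: a point is lit unless the map holds a closer surface for its texel.

    The small bias guards against a surface shadowing itself through rounding.
    """
    return depth_from_light <= shadow_map[texel] + bias


# Toy scene, light pointing straight down: the ground is 5 units from the
# light everywhere, and a box 2 units from the light covers the middle texel.
depths_seen_by_light = [[5.0], [2.0, 5.0], [5.0]]
shadow_map = render_shadow_map(depths_seen_by_light)

ground_left_lit = is_lit(shadow_map, texel=0, depth_from_light=5.0)
ground_under_box_lit = is_lit(shadow_map, texel=1, depth_from_light=5.0)
box_top_lit = is_lit(shadow_map, texel=1, depth_from_light=2.0)
print(ground_left_lit, ground_under_box_lit, box_top_lit)
```

The ground under the box fails the depth test and is reported as shadowed, while the open ground and the top of the box are lit. Every additional light needs its own map and its own test, which is where the cost mentioned above comes from.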

Stencil shadows in ‘Doom 3’ (2004). Image source: Wikipedia.

Stencil shadow volumes are based on extruding scene geometry away from the light, and rendering that extruded geometry into the stencil buffer. The contents of the stencil buffer are then used to determine if a given surface on the screen is in shadow or not. Stencil shadows are always sharp, unnaturally so, but they don’t suffer from common shadow map problems.

Until the introduction of NVIDIA RTX technology, ray tracing was too costly to use when computing shadows. Ray tracing is a method of rendering in graphics that simulates the physical behavior of light. Tracing the rays from a surface on the screen to a light allows for the computation of shadows, but this can be challenging if the light comes from one point. And ray-traced shadows can quickly get expensive if there are many lights in the scene.

More efficient sampling methods were developed to reduce the number of rays required to compute soft shadows from multiple lights. One example is an algorithm called ReSTIR, which calculates direct lighting from millions of lights and shadows with ray tracing at interactive frame rates.
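ReSTIR is built on reservoir resampling: instead of evaluating every light, a pixel streams over candidate lights and keeps one, chosen with probability proportional to its estimated contribution. The sketch below shows only that building block, weighted reservoir sampling, and is not NVIDIA's implementation; the `Reservoir` class, `pick_light` helper and the contribution values are hypothetical. The full algorithm also reuses reservoirs across neighboring pixels and frames and traces shadow rays for the chosen samples.

```python
import random


class Reservoir:
    """Keeps one candidate from a stream, chosen proportionally to its weight."""

    def __init__(self):
        self.sample = None
        self.weight_sum = 0.0
        self.count = 0

    def update(self, candidate, weight, rng):
        self.weight_sum += weight
        self.count += 1
        # Replace the kept sample with probability weight / weight_sum.
        if weight > 0 and rng.random() < weight / self.weight_sum:
            self.sample = candidate


def pick_light(light_contributions, rng):
    """Stream over all lights once and return the filled reservoir."""
    reservoir = Reservoir()
    for light_id, contribution in enumerate(light_contributions):
        reservoir.update(light_id, contribution, rng)
    return reservoir


rng = random.Random(0)
# Light 1 contributes three times as much as light 0 at this pixel.
picks = [pick_light([1.0, 3.0], rng).sample for _ in range(10_000)]
print("share of picks going to the brighter light:", picks.count(1) / len(picks))
```

Because only one candidate and two running totals are stored, the memory cost per pixel stays constant no matter how many lights are streamed through.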

Direct illumination and ray-traced shadows created with ReSTIR, compared to a previous algorithm.

What Is Path Tracing?

For indirect lighting and global illumination, even more methods exist. The most straightforward is called path tracing, where random light paths are simulated for each visible surface. Some of these paths reach lights and contribute to the finished scene, while others do not.
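The core idea, averaging many random paths of which only some reach a light, can be illustrated with a one-bounce Monte Carlo estimate. This is a toy under stated assumptions, not a path tracer: the surface point faces straight up, the only light is a lamp covering the part of the sky within a fixed cone around the surface normal, and the function name and `lamp_min_cos` parameter are made up for the example.

```python
import math
import random


def estimate_light_from_lamp(num_samples, rng, lamp_min_cos=0.8):
    """Average random directions over the hemisphere above a surface point."""
    total = 0.0
    for _ in range(num_samples):
        # For directions drawn uniformly over the hemisphere, the cosine of
        # the angle to the surface normal is uniformly distributed in [0, 1).
        cos_theta = rng.random()
        if cos_theta > lamp_min_cos:
            # This path reached the lamp (radiance 1.0). Its contribution is
            # radiance * cos_theta / pdf, with pdf = 1 / (2 * pi).
            total += 2.0 * math.pi * cos_theta
        # Paths that miss the lamp contribute nothing.
    return total / num_samples


rng = random.Random(1)
exact = math.pi * (1.0 - 0.8 ** 2)  # closed-form value for this toy setup
for n in (100, 10_000, 100_000):
    print(n, round(estimate_light_from_lamp(n, rng), 3), "exact:", round(exact, 3))
```

With few samples the estimate scatters around the exact value, which shows up as noise in a rendered image; more samples reduce the noise at a proportional increase in cost.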

Path tracing is the most accurate method capable of producing results that fully represent lighting in a scene, matching the accuracy of mathematical models for materials and lights. Path tracing can be very expensive to compute, but it’s considered the “holy grail” of real-time graphics.

Comparison of path tracing with a less complete ray-tracing algorithm and rasterization.

How Do Direct and Indirect Lighting Affect Graphics?

Light map applied to a scene. Image courtesy of Reddit.

Direct lighting provides the basic appearance of realism, and indirect lighting makes scenes look richer and more natural.

One way indirect lighting has been used in many video games is through omnipresent ambient light. This type of light can be constant, or vary spatially over light probes arranged in a grid pattern. It can also be rendered into a texture that is wrapped around static objects in a scene — this method is known as a “light map.”

In most cases, ambient light is shadowed by a function of geometry around the surface called ambient occlusion, which helps increase the image realism.

Direct lighting only vs. global illumination in a forest scene.

Examples of Direct Lighting, Indirect Lighting and Global Illumination

Direct and indirect lighting have been present, in some form, in almost every 3D game since the 1990s. Below are some milestones of how lighting has been implemented in popular titles:

  • 1993: Doom showcased one of the first examples of dynamic lighting. The game could vary the light intensity per sector, which made textures lighter or darker, and was used to simulate dim and bright areas or flickering lights.
Map sectors with varying light intensities in Doom.
  • 1995: Quake introduced light maps, which were pre-computed for each level in the game. The light maps could modulate the ambient light intensity.
  • 1997: Quake II added color to the light maps, as well as dynamic lighting from projectiles and explosions.
  • 2001: Silent Hill 2 showcased per-pixel lighting and shadow mapping. Shrek used deferred lighting and stencil shadows.
  • 2007: Crysis showed dynamic screen-space ambient occlusion, which uses pixel depth to give a sense of changes in lighting.
Crysis (2007). Image courtesy of MobyGames.com.
  • 2008: Quake Wars: Ray Traced became the first game tech demo to use ray-traced reflections.
  • 2011: Crysis 2 became the first game to include screen-space reflections, which is a popular technique for reusing screen-space data to calculate reflections.
  • 2016: Rise of the Tomb Raider became the first game to use voxel-based ambient occlusion.
  • 2018: Battlefield V became the first commercial game to use ray-traced reflections.
  • 2019: Q2VKPT became the first game to implement path tracing, which was later refined in Quake II RTX.
  • 2020: Minecraft with RTX used path tracing with RTX.
Minecraft with RTX.

What’s Next for Lighting in Real-Time Graphics?

Real-time graphics are moving toward a more complete simulation of light in scenes with increasing complexity.

ReSTIR dramatically expands the possibilities for artists to use multiple lights in games. Its newer variant, ReSTIR GI, applies the same ideas toward global illumination, enabling path tracing with more bounces and fewer approximations. It can also render less noisy images faster. And more algorithms are being developed to make path tracing faster and more accessible.

Using a complete simulation of lighting effects with ray tracing also means that the rendered images can contain some noise. Clearing that noise, or “denoising,” is another area of active research.

More technologies are being developed to help games effectively denoise lighting in complex, highly detailed scenes with lots of motion at real-time frame rates. This challenge is being approached from two ends: advanced sampling algorithms that generate less noise and advanced denoisers that can handle increasingly difficult situations.

Denoising with NRD in Cyberpunk 2077.

Check out NVIDIA’s solutions for direct lighting and indirect lighting, and access NVIDIA resources for game development.

Learn more about graphics with NVIDIA at SIGGRAPH ’22 and watch NVIDIA’s special address, presented by NVIDIA’s CEO and senior leaders, to hear the latest graphics announcements.

The post What Is Direct and Indirect Lighting? appeared first on NVIDIA Blog.


Optimal pricing for maximum profit using Amazon SageMaker

This is a guest post by Viktor Enrico Jeney, Senior Machine Learning Engineer at Adspert.

Adspert is a Berlin-based ISV that developed a bid management tool designed to automatically optimize performance marketing and advertising campaigns. The company’s core principle is to automate maximization of profit of ecommerce advertising with the help of artificial intelligence. The continuous development of advertising platforms paves the way for new opportunities, which Adspert expertly utilizes for their customers’ success.

Adspert’s primary goal is to simplify the process for users while optimizing ad campaigns across different platforms. This includes the use of information gathered across the various platforms balanced against the optimum budget set on a level above each platform. Adspert’s focus is to optimize a customer’s goal attainment, regardless of what platform is used. Adspert continues to add platforms as necessary to give our customers significant advantages.

In this post, we share how Adspert created the pricing tool from scratch using different AWS services like Amazon SageMaker and how Adspert collaborated with the AWS Data Lab to accelerate this project from design to build in record time.

The pricing tool reprices a seller-selected product on an ecommerce marketplace based on the visibility and profit margin to maximize profits on the product level.

As a seller, it’s essential that your products are always visible because this will increase sales. The most important factor in ecommerce sales is simply whether your offer, rather than a competitor’s, is visible to customers.

Although it certainly depends on the specific ecommerce platform, we’ve found that product price is one of the most important key figures that can affect visibility. However, prices change often and fast; for this reason the pricing tool needs to act in near-real time to increase the visibility.

Overview of solution

The following diagram illustrates the solution architecture.

The solution contains the following components:

  1. Amazon Relational Database Service (Amazon RDS) for PostgreSQL is the main source of data; the product information is stored in an RDS for PostgreSQL database.
  2. Product listing change information arrives in real time in an Amazon Simple Queue Service (Amazon SQS) queue.
  3. Product information stored in Amazon RDS is ingested in near-real time into the raw layer using the change data capture (CDC) pattern available in AWS Database Migration Service (AWS DMS).
  4. Product listing notifications coming from Amazon SQS are ingested in near-real time into the raw layer using an AWS Lambda function.
  5. The original source data is stored in the Amazon Simple Storage Service (Amazon S3) raw layer bucket using Parquet data format. This layer is the single source of truth for the data lake. The partitioning used on this storage supports the incremental processing of data.
  6. AWS Glue extract, transform, and load (ETL) jobs clean the product data, removing duplicates, and applying data consolidation and generic transformations not tied to a specific business case.
  7. The Amazon S3 stage layer receives prepared data that is stored in Apache Parquet format for further processing. The partitioning used on the stage store supports the incremental processing of data.
  8. The AWS Glue jobs created in this layer use the data available in the Amazon S3 stage layer. This includes application of use case-specific business rules and calculations required. The results data from these jobs are stored in the Amazon S3 analytics layer.
  9. The Amazon S3 analytics layer is used to store the data that is used by the ML models for training purposes. The partitioning used on the curated store is based on the data usage expected. This may be different to the partitioning used on the stage layer.
  10. The repricing ML model is a Scikit-Learn Random Forest implementation in SageMaker Script Mode, which is trained using data available in the S3 bucket (the analytics layer).
  11. An AWS Glue data processing job prepares data for the real-time inference. The job processes data ingested in the S3 bucket (stage layer) and invokes the SageMaker inference endpoint. The data is prepared to be used by the SageMaker repricing model. AWS Glue was preferred to Lambda because the inference requires complex data processing operations, like joins and window functions, on a high volume of data (billions of daily transactions). The results from the repricing model invocations are stored in the S3 bucket (inference layer).
  12. The model produced by the SageMaker training job is deployed to a SageMaker endpoint. This endpoint is invoked by the AWS Glue inference processor, generating near-real-time price recommendations to increase product visibility.
  13. The predictions generated by the SageMaker inference endpoint are stored in the Amazon S3 inference layer.
  14. The Lambda predictions optimizer function processes the recommendations generated by the SageMaker inference endpoint and generates a new price recommendation that focuses on maximizing the seller profit, applying a trade-off between sales volume and sales margin.
  15. The price recommendations generated by the Lambda predictions optimizer are submitted to the repricing API, which updates the product price on the marketplace.
  16. The updated price recommendations generated by the Lambda predictions optimizer are stored in the Amazon S3 optimization layer.
  17. The AWS Glue prediction loader job reloads the predictions generated by the ML model into the source RDS for PostgreSQL database for auditing and reporting purposes. AWS Glue Studio was used to implement this component; it’s a graphical interface that makes it easy to create, run, and monitor ETL jobs in AWS Glue.
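The trade-off between sales volume and sales margin applied by the predictions optimizer (step 14) can be illustrated with a small sketch. Everything here is hypothetical: the candidate prices, the unit cost, the visibility probabilities and the expected-profit formula are assumptions for illustration and do not describe Adspert’s actual optimizer.

```python
def best_price(candidates, unit_cost):
    """Pick the candidate price with the highest expected profit.

    candidates: list of (price, probability_visible) pairs, where the
    probability would come from a visibility model.
    """
    def expected_profit(candidate):
        price, probability_visible = candidate
        # A higher price raises the margin but lowers the chance of being seen.
        return probability_visible * (price - unit_cost)

    return max(candidates, key=expected_profit)[0]


candidates = [(9.99, 0.90), (11.99, 0.70), (13.99, 0.30)]
print(best_price(candidates, unit_cost=8.00))
```

In this toy data the middle price wins: the cheapest offer is visible most often but earns little per sale, and the most expensive one is rarely visible.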

Data preparation

The dataset for Adspert’s visibility model is created from an SQS queue and ingested into the raw layer of our data lake in real time with Lambda. Afterwards, the raw data is sanitized by performing simple transformations, like removing duplicates. This process is implemented in AWS Glue. The result is stored in the staging layer of our data lake. The notifications provide the competitors for a given product, with their prices, fulfilment channels, shipping times, and many more variables. They also provide a platform-dependent visibility measure, which can be expressed as a Boolean variable (visible or not visible). We receive a notification any time an offer change happens, which adds up to several million events per month over all our customers’ products.

From this dataset, we extract the training data as follows: for every notification, we pair the visible offers with every non-visible offer, and vice versa. Every data point represents a competition between two sellers, in which there is a clear winner and loser. This processing job is implemented in an AWS Glue job with Spark. The prepared training dataset is pushed to the analytics S3 bucket to be used by SageMaker.
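The pairing logic can be sketched in plain Python for a single notification. The production job runs in AWS Glue with Spark; this simplified version only illustrates the idea, and the `make_training_pairs` helper and the `seller` and `price` field names are assumptions for the example.

```python
from itertools import product


def make_training_pairs(offers):
    """Pair every visible offer with every non-visible one, in both directions."""
    visible = [o for o in offers if o["is_visible"]]
    hidden = [o for o in offers if not o["is_visible"]]
    pairs = []
    for winner, loser in product(visible, hidden):
        # The label says whether the first offer of the pair is the visible one.
        pairs.append({"offer": winner, "competitor": loser, "is_visible": True})
        pairs.append({"offer": loser, "competitor": winner, "is_visible": False})
    return pairs


offers = [
    {"seller": "A", "price": 19.99, "is_visible": True},
    {"seller": "B", "price": 21.50, "is_visible": False},
    {"seller": "C", "price": 24.00, "is_visible": False},
]
pairs = make_training_pairs(offers)
for pair in pairs:
    print(pair["offer"]["seller"], "vs", pair["competitor"]["seller"],
          "->", pair["is_visible"])
```

Emitting each pair in both directions yields the same number of positive and negative examples, so the resulting dataset is balanced by construction.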

Train the model

For each pair of offers, our model classifies whether a given offer will be visible. This model enables us to calculate the best price for our customers, increase visibility based on competition, and maximize their profit. On top of that, this classification model can give us deeper insights into the reasons for our listings being visible or not visible. We use the following features:

  • Ratio of our price to competitors’ prices
  • Difference in fulfilment channels
  • Amount of feedback for each seller
  • Feedback rating of each seller
  • Difference in minimum shipping times
  • Difference in maximum shipping times
  • Availability of each seller’s product

Adspert uses SageMaker to train and host the model. We use the Scikit-Learn Random Forest implementation in SageMaker Script Mode. We also include some feature preprocessing directly in the Scikit-Learn pipeline in the training script. See the following code:

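import numpy as np


def transform_price(X):
    """Log of our price relative to the cheapest reference price.

    Expects the columns in the order: price, competitor price, external price.
    """
    X = X.to_numpy()
    # nanmin ignores a missing competitor or external price.
    return np.log(
        X[:, 0] / np.nanmin([X[:, 1], X[:, 2]], axis=0),
    ).reshape(-1, 1)


def difference(X):
    """Column 0 minus column 1, returned as a single feature column."""
    X = X.to_numpy()
    return (X[:, 0] - X[:, 1]).reshape(-1, 1)


def fulfillment_difference(X):
    """Difference of the two fulfillment columns after casting them to int."""
    X = X.astype(int)
    return difference(X)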

One of the most important preprocessing functions is transform_price, which divides the price by the minimum of the competitor price and an external price column. We’ve found that this feature has a relevant impact on the model accuracy. We also apply the logarithm to let the model decide based on relative price differences, not absolute price differences.
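To make the effect of transform_price concrete, the following sketch applies it to three made-up offers. The function body is the one shown above; the prices and the column names of the toy dataframe are invented for illustration.

```python
import numpy as np
import pandas as pd


def transform_price(X):
    # Same function as in the post: log of the price over the cheapest
    # of the competitor price and the external price.
    X = X.to_numpy()
    return np.log(
        X[:, 0] / np.nanmin([X[:, 1], X[:, 2]], axis=0),
    ).reshape(-1, 1)


offers = pd.DataFrame({
    "price": [10.0, 10.0, 6.0],
    "competitor_price": [8.0, np.nan, 8.0],   # NaN: no competitor price known
    "external_price": [12.0, 5.0, 12.0],
})
feature = transform_price(offers)
print(feature.ravel())
```

The feature is positive when our price is above the cheapest reference price and negative when it is below. In the second row the missing competitor price is ignored and the external price is used instead. Because of the logarithm, being 25% more expensive produces the same value whether the product costs 10 or 1,000.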

In the training_script.py script, we first define how to build the Scikit-Learn ColumnTransformer to apply the specified transformers to the columns of a dataframe:

import argparse
import os
from io import StringIO

import joblib
import numpy as np
import pandas as pd
from custom_transformers import difference
from custom_transformers import fulfillment_difference
from custom_transformers import transform_price
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import OneHotEncoder

def make_preprocessor():
    return ColumnTransformer([
        ('price_by_smallest_cp', FunctionTransformer(transform_price),
         ['price', 'competitor_price', 'external_price']),
        ('fulfillment_difference', FunctionTransformer(fulfillment_difference),
         ['fulfillment', 'competitor_fulfillment']),
        ('feedback_count', 'passthrough',
         ['feedback_count', 'competitor_feedback_count']),
        ('feedback_rating', 'passthrough',
         ['feedback_rating', 'competitor_feedback_rating']),
        (
            'availability_type',
            OneHotEncoder(categories=[['NOW'], ['NOW']],
                          handle_unknown='ignore'),
            ['availability_type', 'competitor_availability_type'],
        ),
        ('min_shipping', FunctionTransformer(difference),
         ['minimum_shipping_hours', 'competitor_min_shipping_hours']),
        ('max_shipping', FunctionTransformer(difference),
         ['maximum_shipping_hours', 'competitor_max_shipping_hours']),
    ], remainder='drop')

In the training script, we load the data from Parquet into a Pandas dataframe, define the pipeline of the ColumnTransformer and the RandomForestClassifier, and train the model. Afterwards, the model is serialized using joblib:

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--output-data-dir', type=str,
                        default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str,
                        default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str,
                        default=os.environ['SM_CHANNEL_TRAIN'])

    args = parser.parse_args()

    # load training data
    input_files = [os.path.join(args.train, file)
                   for file in os.listdir(args.train)]
    if len(input_files) == 0:
        raise ValueError(f'No input files found in {args.train}')
    raw_data = [pd.read_parquet(file) for file in input_files]
    train_data = pd.concat(raw_data)

    # split data set into x and y values
    train_y = train_data.loc[:, 'is_visible']

    if train_y.dtype != 'bool':
        raise ValueError("Label 'is_visible' has to be dtype bool but is"
                         f' {train_y.dtype}')

    train_X = train_data.drop('is_visible', axis=1)

    # fit the classifier pipeline and store the fitted model
    clf = Pipeline([
        ('preprocessor', make_preprocessor()),
        ('classifier', RandomForestClassifier(random_state=1)),
    ])
    clf.fit(train_X, train_y)
    joblib.dump(clf, os.path.join(args.model_dir, 'model.joblib'))

In the training script, we also have to implement functions for inference:

  • input_fn – Is responsible for parsing the data from the request body of the payload
  • model_fn – Loads and returns the model that has been dumped in the training section of the script
  • predict_fn – Contains our implementation to request a prediction from the model using the data from the payload
  • predict_proba – In order to draw predicted visibility curves, we return the class probability using the predict_proba function, instead of the binary prediction of the classifier

See the following code:

def input_fn(request_body, request_content_type):
    """Parse input data payload"""
    if request_content_type == 'text/csv':
        df = pd.read_csv(StringIO(request_body))
        return df
    else:
        raise ValueError(f'{request_content_type} not supported by script!')


def predict_fn(input_data, model):
    """Predict the visibilities"""
    classes = model.classes_

    if len(classes) != 2:
        raise ValueError('Model has more than 2 classes!')

    # get the index of the winning class
    class_index = np.where(model.classes_ == 1)[0][0]

    output = model.predict_proba(input_data)
    return output[:, class_index]


def model_fn(model_dir):
    """Deserialize and return the fitted model

    Note that this should have the same name as the serialized model in the
    main method
    """
    clf = joblib.load(os.path.join(model_dir, 'model.joblib'))
    return clf

The following figure shows the impurity-based feature importances returned by the Random Forest Classifier.

With SageMaker, we were able to train the model on a large amount of data (up to 14 billion daily transactions) without putting load on our existing instances or having to set up a separate machine with enough resources. Moreover, because the instances are immediately shut down after the training job, training with SageMaker was extremely cost-efficient. The model deployment with SageMaker worked without any additional workload. A single function call in the Python SDK is sufficient to host our model as an inference endpoint, and it can be easily requested from other services using the SageMaker Python SDK as well. See the following code:

from sagemaker.sklearn.estimator import SKLearn

FRAMEWORK_VERSION = "0.23-1"
script_path = 'training_script.py'
output_location = f's3://{bucket}/{folder}/output'
source_dir = 'source_dir'

sklearn = SKLearn(
    entry_point=script_path,
    source_dir=source_dir,
    framework_version=FRAMEWORK_VERSION,
    instance_type='ml.m5.large',
    role=role,
    sagemaker_session=sagemaker_session,
    output_path=output_location)

sklearn.fit({'train': training_path})

The model artefact is stored in Amazon S3 by the fit function. As seen in the following code, the model can be loaded as a SKLearnModel object using the model artefact, script path, and some other parameters. Afterwards, it can be deployed to the desired instance type and number of instances.

from sagemaker.sklearn.model import SKLearnModel

model = SKLearnModel(
    model_data=f'{output_location}/sagemaker-scikit-learn-2021-02-23-11-13-30-036/output/model.tar.gz',
    source_dir=source_dir,
    entry_point=script_path,
    framework_version=FRAMEWORK_VERSION,
    sagemaker_session=sagemaker_session,
    role=role
)
ENDPOINT_NAME = 'visibility-model-v1'
model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name=ENDPOINT_NAME
)

Evaluate the model in real time

Whenever a new notification is sent for one of our products, we want to calculate and submit the optimal price. To calculate optimal prices, we create a prediction dataset in which we compare our own offer with each competitor’s offer for a range of possible prices. These data points are passed to the SageMaker endpoint, which returns the predicted probability of being visible against each competitor for each given price. We call the probability of being visible the predicted visibility. The result can be visualized as a curve for each competitor, portraying the relationship between our price and the visibility, as shown in the following figure.
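The prediction dataset described above can be pictured as a cross join between a grid of candidate prices and the current competitor offers. The following is a minimal sketch under stated assumptions: the column names follow the training features where possible, while the competitor values, the price range, and the step size are hypothetical. It relies on merge with how='cross', which requires pandas 1.2 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical data: our own offer attributes and two competitor offers.
own_offer = {'feedback_count': 1200, 'feedback_rating': 4.8}
competitors = pd.DataFrame({
    'competitor_id': ['c1', 'c2'],
    'competitor_price': [19.99, 24.50],
    'competitor_feedback_count': [300, 12],
    'competitor_feedback_rating': [4.6, 2.1],
})

# Candidate prices to evaluate (assumed range and step size).
candidate_prices = pd.DataFrame({'price': np.linspace(15.0, 30.0, num=31)})

# Cross join: one row per (candidate price, competitor) pair.
grid = candidate_prices.merge(competitors, how='cross')

# Our own offer attributes are constant across all rows.
for column, value in own_offer.items():
    grid[column] = value

print(grid.shape)
```

Each row of grid is one data point to send to the endpoint; grouping the returned probabilities by competitor_id yields one visibility curve per competitor.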

In this example, the visibility against Competitor 1 is almost a piecewise constant function, suggesting that we mainly have to decrease the price below a certain threshold, roughly the price of the competitor, to become visible. However, the visibility against Competitor 2 doesn’t decrease as steeply. On top of that, we still have a 50% chance of being visible even with a very high price. Analyzing the input data revealed that the competitor has a low number of ratings, which happen to be very poor. Our model learned that this specific ecommerce platform gives a disadvantage to sellers with poor feedback ratings. We discovered similar effects for the other features, like fulfilment channel and shipping times.

The necessary data transformations and inferences against the SageMaker endpoint are implemented in AWS Glue. The AWS Glue job works in micro-batches on the real-time data ingested from Lambda.

Calculate optimal prices using predicted visibilities

Finally, we want to calculate the aggregated visibility curve, which is the predicted visibility for each possible price. Our offer is visible if it’s better than all other sellers’ offers. Assuming independence between the probabilities of being visible against each seller given our price, the probability of being visible against all sellers is the product of the respective probabilities. That means the aggregated visibility curve can be calculated by multiplying all curves.
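Under the independence assumption stated above, the aggregation is an element-wise product over the per-competitor curves. The sketch below uses hypothetical prices and visibilities purely to show the arithmetic.

```python
import numpy as np

# Hypothetical candidate prices and predicted visibilities per competitor.
prices = np.array([18.0, 20.0, 22.0, 24.0])
visibility_per_competitor = np.array([
    [0.95, 0.60, 0.10, 0.05],  # visibility against competitor 1
    [0.90, 0.85, 0.70, 0.55],  # visibility against competitor 2
])

# Probability of being visible against all competitors at each price,
# assuming the per-competitor probabilities are independent.
aggregated_visibility = visibility_per_competitor.prod(axis=0)

for price, visibility in zip(prices, aggregated_visibility):
    print(f'{price:.2f}: {visibility:.4f}')
```

Because every factor is at most 1, the aggregated curve can never exceed the lowest individual curve at any given price.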

The following figures show the predicted visibilities returned from the SageMaker endpoint.

The following figure shows the aggregated visibility curve.

To calculate the optimal price, the visibility curve is first smoothed and then multiplied by the margin. To calculate the margin, we use the costs of goods and the fees. The cost of goods sold and fees are the static product information synced via AWS DMS. Based on the profit function, Adspert calculates the optimal price and submits it to the ecommerce platform through the platform’s API.

This is implemented in the AWS Lambda prediction optimizer.
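The optimization step can be sketched as follows. This is not Adspert's implementation: the moving-average smoother, the percentage-based fee model, and all numbers are assumptions chosen for illustration. The post only states that the curve is smoothed, multiplied by the margin, and maximized.

```python
import numpy as np


def optimal_price(prices, visibility, cost_of_goods, fee_rate, window=3):
    """Return the profit-maximizing price and the expected profit curve.

    Assumptions: `window` is odd, smoothing is a simple moving average,
    and fees are a fixed fraction of the selling price.
    """
    prices = np.asarray(prices, dtype=float)
    visibility = np.asarray(visibility, dtype=float)

    # Moving average; edge padding keeps the curve at its original length.
    padded = np.pad(visibility, window // 2, mode='edge')
    smoothed = np.convolve(padded, np.ones(window) / window, mode='valid')

    # Margin per sale after fees and cost of goods.
    margin = prices * (1.0 - fee_rate) - cost_of_goods

    expected_profit = smoothed * margin
    best = int(np.argmax(expected_profit))
    return prices[best], expected_profit


# Hypothetical aggregated visibility curve and cost structure.
best_price, profit_curve = optimal_price(
    prices=[18.0, 20.0, 22.0, 24.0, 26.0],
    visibility=[0.90, 0.85, 0.60, 0.20, 0.05],
    cost_of_goods=12.0,
    fee_rate=0.15,
)
print(best_price)
```

In this toy example, the best price is not the one with the highest visibility: a slightly higher price gives up some visibility in exchange for a larger margin.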

The following figure shows the relation between predicted visibility and price.

The following figure shows the relation between price and profit.

Conclusion

Adspert’s existing approach to profit maximization is focused on bid management to increase the returns from advertising. To achieve superior performance on ecommerce marketplaces, however, sellers have to consider both advertising and competitive pricing of their products. With this new ML model to predict visibility, we can extend our functionality to also adjust customers’ prices.

The new pricing tool has to be capable of automated training of the ML model on a large amount of data, as well as real-time data transformations, predictions, and price optimizations. In this post, we walked through the main steps of our price optimization engine, and the AWS architecture we implemented in collaboration with the AWS Data Lab to achieve those goals.

Taking ML models from concept to production is typically complex and time-consuming. You have to manage large amounts of data to train the model, choose the best algorithm for training it, manage the compute capacity while training it, and then deploy the model into a production environment. SageMaker reduced this complexity by making it much more straightforward to build and deploy the ML model. After we chose the right algorithms and frameworks from the wide range of choices available, SageMaker managed all of the underlying infrastructure to train our model and deploy it to production.

If you’d like to start familiarizing yourself with SageMaker, the Immersion Day workshop can help you gain an end-to-end understanding of how to build ML use cases from feature engineering, the various in-built algorithms, and how to train, tune, and deploy the ML model in a production-like scenario. It guides you to bring your own model and perform an on-premise ML workload lift-and-shift to the SageMaker platform. It further demonstrates advanced concepts like model debugging, model monitoring, and AutoML, and helps you evaluate your ML workload through the AWS ML Well-Architected lens.

If you’d like help accelerating the implementation of use cases that involve data, analytics, AI and ML, serverless, and container modernization, please contact the AWS Data Lab.


About the authors

Viktor Enrico Jeney is a Senior Machine Learning Engineer at Adspert based in Berlin, Germany. He creates solutions for prediction and optimization problems in order to increase customers’ profits. Viktor has a background in applied mathematics and loves working with data. In his free time, he enjoys learning Hungarian, practicing martial arts, and playing the guitar.

Ennio Pastore is a data architect on the AWS Data Lab team. He is an enthusiast of everything related to new technologies that have a positive impact on businesses and general livelihood. Ennio has over 9 years of experience in data analytics. He helps companies define and implement data platforms across industries, such as telecommunications, banking, gaming, retail, and insurance.


Rush Into August This GFN Thursday With 38 New Games on GeForce NOW

It’s the first GFN Thursday of the month and you know the drill — GeForce NOW is bringing a big batch of games to the cloud.

Get ready for 38 exciting titles like Saints Row and Rumbleverse arriving on the GeForce NOW library in August. Members can kick off the month streaming 13 new games today, including Retreat to Enen with RTX ON.

Arriving in August

This month is packed full of new games streaming across GeForce NOW-supported devices. Gamers have 38 new titles to look forward to, including exciting new releases like Saints Row and Rumbleverse that can be played on Macs only via the power of the GeForce cloud.

Saints Row on GeForce NOW
It feels so good to be bad. Play like a boss streaming ‘Saints Row’ this month on GeForce NOW.

Members will be able to visit the Weird Wild West of Santo Ileso, a vibrant city rife with crime in Deep Silver’s explosive franchise reboot of Saints Row. Embark on criminal ventures as the future Boss, form the Saints with allies Neenah, Kevin and Eli, take down competing gangs, and build your criminal empire to become truly Self Made.

Gamers will also be able to throw down in Rumbleverse, a new, free-to-play, 40-person Brawler Royale where anyone can be a champion. Customize your fighter by mixing and matching unique items and launch your way into the battlefield, streaming at full PC quality to mobile devices.

RTX 3080 members will also be able to play these and the other 1,300+ titles in the GeForce NOW library streaming in 4K resolution at 60 frames per second, or 1440p at 120 FPS on PC and Mac native apps.

Catch the full list of games coming to the cloud later this August:

Play New Games Today

Great gaming in August starts with 13 new games now ready to stream.

Retreat to Enen
Undertake a rite of passage to find your place in a world that narrowly avoided the extinction of humanity.

RTX 3080 and Priority members can play titles like Retreat to Enen with RTX ON support for beautiful, cinematic graphics. RTX 3080 members also get perks of ultra-low latency and maximized eight-hour gaming sessions to enjoy all of the new gaming goodness.

Catch all of the games ready to play today: 

Say Bye to July

In addition to the 13 games announced in July, an extra 13 joined over the month: 

And a few games announced last month didn’t make it, due to shifting of their release dates:

  • Grimstar: Welcome to the Savage Planet (Steam)
  • Panzer Arena: Prologue (Steam)
  • Turbo Sloths (Steam)

With all of these new games on the way, it’s a good time to take a look back and enjoy the games that have been bringing the heat over the summer. Let us know your response on Twitter or in the comments below.

The post Rush Into August This GFN Thursday With 38 New Games on GeForce NOW appeared first on NVIDIA Blog.


Reachability Embeddings: Self-Supervised Representation Learning from Spatiotemporal Motion Trajectories for Multimodal Geospatial Computer Vision

Self-supervised representation learning techniques utilize large datasets without semantic annotations to learn meaningful, universal features that can be conveniently transferred to solve a wide variety of downstream supervised tasks. In this paper, we propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories to solve downstream geospatial computer vision tasks. Tiles resulting from a raster representation of the earth’s surface are modeled as nodes on a graph or pixels of an image. GPS trajectories are modeled as allowed Markovian…

Apple Machine Learning Research

Amazon Comprehend announces lower annotation limits for custom entity recognition

Amazon Comprehend is a natural-language processing (NLP) service you can use to automatically extract entities, key phrases, language, sentiments, and other insights from documents. For example, you can immediately start detecting entities such as people, places, commercial items, dates, and quantities via the Amazon Comprehend console, AWS Command Line Interface, or Amazon Comprehend APIs. In addition, if you need to extract entities that aren’t part of the Amazon Comprehend built-in entity types, you can create a custom entity recognition model (also known as custom entity recognizer) to extract terms that are more relevant for your specific use case, like names of items from a catalog of products, domain-specific identifiers, and so on. Creating an accurate entity recognizer on your own using machine learning libraries and frameworks can be a complex and time-consuming process. Amazon Comprehend simplifies your model training work significantly. All you need to do is load your dataset of documents and annotations, and use the Amazon Comprehend console, AWS CLI, or APIs to create the model.

To train a custom entity recognizer, you can provide training data to Amazon Comprehend as annotations or entity lists. In the first case, you provide a collection of documents and a file with annotations that specify the location where entities occur within the set of documents. Alternatively, with entity lists, you provide a list of entities with their corresponding entity type label, and a set of unannotated documents in which you expect your entities to be present. Both approaches can be used to train a successful custom entity recognition model; however, there are situations in which one method may be a better choice. For example, when the meaning of specific entities could be ambiguous and context-dependent, providing annotations is recommended because this might help you create an Amazon Comprehend model that is capable of better using context when extracting entities.

Annotating documents can require quite a lot of effort and time, especially if you consider that both the quality and quantity of annotations have an impact on the resulting entity recognition model. Imprecise or too few annotations can lead to poor results. To help you set up a process for acquiring annotations, we provide tools such as Amazon SageMaker Ground Truth, which you can use to annotate your documents more quickly and generate an augmented manifest annotations file. However, even if you use Ground Truth, you still need to make sure that your training dataset is large enough to successfully build your entity recognizer.

Until today, to start training an Amazon Comprehend custom entity recognizer, you had to provide a collection of at least 250 documents and a minimum of 100 annotations per entity type. Today, we’re announcing that, thanks to recent improvements in the models underlying Amazon Comprehend, we’ve reduced the minimum requirements for training a recognizer with plain text CSV annotation files. You can now build a custom entity recognition model with as few as three documents and 25 annotations per entity type. You can find further details about new service limits in Guidelines and quotas.

To showcase how this reduction can help you get started with the creation of a custom entity recognizer, we ran some tests on a few open-source datasets and collected performance metrics. In this post, we walk you through the benchmarking process and the results we obtained while working on subsampled datasets.

Dataset preparation

In this post, we explain how we trained an Amazon Comprehend custom entity recognizer using annotated documents. In general, annotations can be provided as a CSV file, an augmented manifest file generated by Ground Truth, or a PDF file. Our focus is on CSV plain text annotations, because this is the type of annotation impacted by the new minimum requirements. CSV files should have the following structure:

File, Line, Begin Offset, End Offset, Type
documents.txt, 0, 0, 13, ENTITY_TYPE_1
documents.txt, 1, 0, 7, ENTITY_TYPE_2

The relevant fields are as follows:

  • File – The name of the file containing the documents
  • Line – The number of the line containing the entity, starting with line 0
  • Begin Offset – The character offset in the input text (relative to the beginning of the line) that shows where the entity begins, considering that the first character is at position 0
  • End Offset – The character offset in the input text that shows where the entity ends
  • Type – The name of the entity type you want to define

Additionally, when using this approach, you have to provide a collection of training documents as .txt files with one document per line, or one document per file.

For our tests, we used the SNIPS Natural Language Understanding benchmark, a dataset of crowdsourced utterances distributed among seven user intents (AddToPlaylist, BookRestaurant, GetWeather, PlayMusic, RateBook, SearchCreativeWork, SearchScreeningEvent). The dataset was published in 2018 in the context of the paper Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces by Coucke, et al.

The SNIPS dataset is made of a collection of JSON files condensing both annotations and raw text files. The following is a snippet from the dataset:

{
   "annotations":{
      "named_entity":[
         {
            "start":16,
            "end":36,
            "extent":"within the same area",
            "tag":"spatial_relation"
         },
         {
            "start":40,
            "end":51,
            "extent":"Lawrence St",
            "tag":"poi"
         },
         {
            "start":67,
            "end":70,
            "extent":"one",
            "tag":"party_size_number"
         }
      ],
      "intent":"BookRestaurant"
   },
   "raw_text":"I'd like to eat within the same area of Lawrence St for a party of one"
}

Before creating our entity recognizer, we transformed the SNIPS annotations and raw text files into a CSV annotations file and a .txt documents file.

The following is an excerpt from our annotations.csv file:

File, Line, Begin Offset, End Offset, Type
documents.txt, 0, 16, 36, spatial_relation
documents.txt, 0, 40, 51, poi
documents.txt, 0, 67, 70, party_size_number

The following is an excerpt from our documents.txt file:

I'd like to eat within the same area of Lawrence St for a party of one
Please book me a table for three at an american gastropub 
I would like to book a restaurant in Niagara Falls for 8 on June nineteenth
Can you book a table for a party of 6 close to DeKalb Av
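The transformation from SNIPS records to the two files above can be sketched as follows. This is a hedged illustration, not the script used for the benchmark: the function name snips_to_comprehend is hypothetical, and the sketch assumes one document per line, no newlines inside raw_text, and SNIPS offsets that can be passed through unchanged, as the excerpts above suggest.

```python
import csv
import io


def snips_to_comprehend(records, documents_name='documents.txt'):
    """Return (documents_text, annotations_csv) for SNIPS-style records."""
    lines = []
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['File', 'Line', 'Begin Offset', 'End Offset', 'Type'])

    for line_number, record in enumerate(records):
        text = record['raw_text']
        lines.append(text)
        for entity in record['annotations']['named_entity']:
            # Sanity check: the offsets must select the annotated extent.
            if text[entity['start']:entity['end']] != entity['extent']:
                raise ValueError(f'Offset mismatch on line {line_number}')
            writer.writerow([documents_name, line_number,
                             entity['start'], entity['end'], entity['tag']])

    return '\n'.join(lines) + '\n', buffer.getvalue()


# The record shown in the SNIPS snippet above.
record = {
    'annotations': {
        'named_entity': [
            {'start': 16, 'end': 36, 'extent': 'within the same area',
             'tag': 'spatial_relation'},
            {'start': 40, 'end': 51, 'extent': 'Lawrence St', 'tag': 'poi'},
            {'start': 67, 'end': 70, 'extent': 'one',
             'tag': 'party_size_number'},
        ],
        'intent': 'BookRestaurant',
    },
    'raw_text': "I'd like to eat within the same area of Lawrence St "
                "for a party of one",
}

documents_text, annotations_csv = snips_to_comprehend([record])
print(annotations_csv)
```

The offset check is cheap insurance: an annotation whose offsets don't match its text is easier to catch here than after a training job has been submitted.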

Sampling configuration and benchmarking process

For our experiments, we focused on a subset of entity types from the SNIPS dataset:

  • BookRestaurant – Entity types: spatial_relation, poi, party_size_number, restaurant_name, city, timeRange, restaurant_type, served_dish, party_size_description, country, facility, state, sort, cuisine
  • GetWeather – Entity types: condition_temperature, current_location, geographic_poi, timeRange, state, spatial_relation, condition_description, city, country
  • PlayMusic – Entity types: track, artist, music_item, service, genre, sort, playlist, album, year

Moreover, we subsampled each dataset to obtain different configurations in terms of number of documents sampled for training and number of annotations per entity (also known as shots). This was done by using a custom script designed to create subsampled datasets in which each entity type appears at least k times, within a minimum of n documents.
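The custom script itself isn't shown in this post. A minimal sketch of one way to satisfy the same constraint (each entity type appears at least k times, within at least n documents) is a greedy pass over shuffled documents. The function name, input format, and stopping rule below are assumptions made for illustration.

```python
import random
from collections import Counter


def subsample(documents, k, n, seed=0):
    """Pick document indices so that every entity type occurs >= k times
    and at least n documents are selected.

    `documents` is a list of (text, [entity_type, ...]) pairs (assumed format).
    """
    rng = random.Random(seed)
    order = list(range(len(documents)))
    rng.shuffle(order)

    all_types = {t for _, types in documents for t in types}
    counts = Counter()
    selected = []

    for index in order:
        _, types = documents[index]
        # Keep a document if we still need documents or any of its types.
        if len(selected) < n or any(counts[t] < k for t in types):
            selected.append(index)
            counts.update(types)
        if len(selected) >= n and all(counts[t] >= k for t in all_types):
            break

    if len(selected) < n or any(counts[t] < k for t in all_types):
        raise ValueError('Dataset too small for the requested k and n')
    return selected


# Hypothetical toy dataset: 'city' occurs in every document, 'poi' in half.
toy_documents = [
    (f'doc {i}', ['city'] if i % 2 else ['poi', 'city']) for i in range(40)
]
chosen = subsample(toy_documents, k=5, n=10)
print(len(chosen))
```

Repeating the call with different seeds gives different subsamples of the same configuration.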

Each model was trained using a specific subsample of the training datasets; the nine model configurations are illustrated in the following table.

Subsampled dataset name | Number of documents sampled for training | Number of documents sampled for testing | Average number of annotations per entity type (shots)
snips-BookRestaurant-subsample-A | 132 | 17 | 33
snips-BookRestaurant-subsample-B | 257 | 33 | 64
snips-BookRestaurant-subsample-C | 508 | 64 | 128
snips-GetWeather-subsample-A | 91 | 12 | 25
snips-GetWeather-subsample-B | 185 | 24 | 49
snips-GetWeather-subsample-C | 361 | 46 | 95
snips-PlayMusic-subsample-A | 130 | 17 | 30
snips-PlayMusic-subsample-B | 254 | 32 | 60
snips-PlayMusic-subsample-C | 505 | 64 | 119

To measure the accuracy of our models, we collected evaluation metrics that Amazon Comprehend automatically computes when training an entity recognizer:

  • Precision – This indicates the fraction of entities detected by the recognizer that are correctly identified and labeled. From a different perspective, precision can be defined as tp / (tp + fp), where tp is the number of true positives (correct identifications) and fp is the number of false positives (incorrect identifications).
  • Recall – This indicates the fraction of entities present in the documents that are correctly identified and labeled. It’s calculated as tp / (tp + fn), where tp is the number of true positives and fn is the number of false negatives (missed identifications).
  • F1 score – This is a combination of the precision and recall metrics, which measures the overall accuracy of the model. The F1 score is the harmonic mean of the precision and recall metrics, and is calculated as 2 * Precision * Recall / (Precision + Recall).
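The three formulas above translate directly into code. The counts in this sketch are hypothetical and only illustrate the calculation; Amazon Comprehend computes these metrics for you when it trains a recognizer.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts.

    Returns 0.0 for a metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts: 80 correct detections, 20 spurious ones, 10 missed.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f'precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}')
```

Because F1 is a harmonic mean, it always lies between precision and recall and sits closer to the lower of the two.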

For comparing performance of our entity recognizers, we focus on F1 scores.

Because a given dataset and subsample size (in terms of number of documents and shots) can yield many different subsamples, we generated 10 subsamples for each of the nine configurations, trained the entity recognition models, collected performance metrics, and averaged them using micro-averaging. This allowed us to get more stable results, especially for few-shot subsamples.

Results

The following table shows the micro-averaged F1 scores computed on performance metrics returned by Amazon Comprehend after training each entity recognizer.

Subsampled dataset name | Entity recognizer micro-averaged F1 score (%)
snips-BookRestaurant-subsample-A | 86.89
snips-BookRestaurant-subsample-B | 90.18
snips-BookRestaurant-subsample-C | 92.84
snips-GetWeather-subsample-A | 84.73
snips-GetWeather-subsample-B | 93.27
snips-GetWeather-subsample-C | 93.43
snips-PlayMusic-subsample-A | 80.61
snips-PlayMusic-subsample-B | 81.80
snips-PlayMusic-subsample-C | 85.04

The following column chart shows the distribution of F1 scores for the nine configurations we trained as described in the previous section.

Column chart showing the distribution of micro-averaged F1 scores for the nine configurations trained.

We can observe that we were able to successfully train custom entity recognition models even with as few as 25 annotations per entity type. If we focus on the three smallest subsampled datasets (snips-BookRestaurant-subsample-A, snips-GetWeather-subsample-A, and snips-PlayMusic-subsample-A), we see that, on average, we were able to achieve an F1 score of 84%, which is a pretty good result considering the limited number of documents and annotations we used. If we want to improve the performance of our model, we can collect additional documents and annotations and train a new model with more data. For example, with medium-sized subsamples (snips-BookRestaurant-subsample-B, snips-GetWeather-subsample-B, and snips-PlayMusic-subsample-B), which contain twice as many documents and annotations, we obtained on average an F1 score of 88% (5% improvement with respect to subsample-A datasets). Finally, larger subsampled datasets (snips-BookRestaurant-subsample-C, snips-GetWeather-subsample-C, and snips-PlayMusic-subsample-C), which contain even more annotated data (approximately four times the number of documents and annotations used for subsample-A datasets), provided a further 2% improvement, raising the average F1 score to 90%.
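The averages quoted above can be reproduced from the F1 table with a few lines of code. The sketch assumes a plain arithmetic mean across the three datasets for each subsample size, which matches the rounded figures in the text.

```python
from statistics import mean

# Micro-averaged F1 scores (%) from the results table, grouped by size.
f1_scores = {
    'A': {'BookRestaurant': 86.89, 'GetWeather': 84.73, 'PlayMusic': 80.61},
    'B': {'BookRestaurant': 90.18, 'GetWeather': 93.27, 'PlayMusic': 81.80},
    'C': {'BookRestaurant': 92.84, 'GetWeather': 93.43, 'PlayMusic': 85.04},
}

# Plain mean across the three datasets for each subsample size.
averages = {size: mean(scores.values()) for size, scores in f1_scores.items()}

for size, value in averages.items():
    print(f'subsample-{size}: {value:.2f}')
```

The means come out at roughly 84.1, 88.4, and 90.4, so the steps from A to B and from B to C correspond to relative gains of about 5% and 2%.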

Conclusion

In this post, we announced a reduction of the minimum requirements for training a custom entity recognizer with Amazon Comprehend, and ran some benchmarks on open-source datasets to show how this reduction can help you get started. Starting today, you can create an entity recognition model with as few as 25 annotations per entity type (instead of 100), and at least three documents (instead of 250). With this announcement, we’re lowering the barrier to entry for users interested in using Amazon Comprehend custom entity recognition technology. You can now start running your experiments with a very small collection of annotated documents, analyze preliminary results, and iterate by including additional annotations and documents if you need a more accurate entity recognition model for your use case.

To learn more and get started with a custom entity recognizer, refer to Custom entity recognition.

Special thanks to my colleagues Jyoti Bansal and Jie Ma for their precious help with data preparation and benchmarking.


About the author

Luca Guida is a Solutions Architect at AWS; he is based in Milan and supports Italian ISVs in their cloud journey. With an academic background in computer science and engineering, he started developing his AI/ML passion at university. As a member of the natural language processing (NLP) community within AWS, Luca helps customers be successful while adopting AI/ML services.


Building Efficient Multiple Visual Domain Models with Multi-path Neural Architecture Search

Deep learning models for visual tasks (e.g., image classification) are usually trained end-to-end with data from a single visual domain (e.g., natural images or computer generated images). Typically, an application that completes visual tasks for multiple domains would need to build multiple models for each individual domain, train them independently (meaning no data is shared between domains), and then at inference time each model would process domain-specific input data. However, early layers between these models generate similar features, even for different domains, so it can be more efficient (decreasing latency and power consumption, and lowering the memory overhead of storing each model’s parameters) to jointly train multiple domains, an approach referred to as multi-domain learning (MDL). Moreover, an MDL model can also outperform single domain models due to positive knowledge transfer, which is when additional training on one domain actually improves performance for another. The opposite, negative knowledge transfer, can also occur, depending on the approach and specific combination of domains involved. While previous work on MDL has proven the effectiveness of jointly learning tasks across multiple domains, it involved a hand-crafted model architecture that is inefficient to apply to other work.

In “Multi-path Neural Networks for On-device Multi-domain Visual Classification”, we propose a general MDL model that can: 1) achieve high accuracy efficiently (keeping the number of parameters and FLOPS low), 2) learn to enhance positive knowledge transfer while mitigating negative transfer, and 3) effectively optimize the joint model while handling various domain-specific difficulties. As such, we propose a multi-path neural architecture search (MPNAS) approach to build a unified model with heterogeneous network architecture for multiple domains. MPNAS extends the efficient neural architecture search (NAS) approach from single path search to multi-path search by finding an optimal path for each domain jointly. Also, we introduce a new loss function, called adaptive balanced domain prioritization (ABDP) that adapts to domain-specific difficulties to help train the model efficiently. The resulting MPNAS approach is efficient and scalable; the resulting model maintains performance while reducing the model size and FLOPS by 78% and 32%, respectively, compared to a single-domain approach.

Multi-Path Neural Architecture Search
To encourage positive knowledge transfer and avoid negative transfer, traditional solutions build an MDL model so that domains share most of the layers that learn the shared features across domains (called feature extraction), then have a few domain-specific layers on top. However, such a homogenous approach to feature extraction cannot handle domains with significantly different features (e.g., objects in natural images and art paintings). On the other hand, handcrafting a unified heterogeneous architecture for each MDL model is time-consuming and requires domain-specific knowledge.

NAS is a powerful paradigm for automatically designing deep learning architectures. It defines a search space, made up of various potential building blocks that could be part of the final model. The search algorithm finds the best candidate architecture from the search space that optimizes the model objectives, e.g., classification accuracy. Recent NAS approaches (e.g., TuNAS) have meaningfully improved search efficiency by using end-to-end path sampling, which enables us to scale NAS from single domains to MDL.

Inspired by TuNAS, MPNAS builds the MDL model architecture in two stages: search and training. In the search stage, to find an optimal path for each domain jointly, MPNAS creates an individual reinforcement learning (RL) controller for each domain, which samples an end-to-end path (from input layer to output layer) from the supernetwork (i.e., the superset of all the possible subnetworks between the candidate nodes defined by the search space). Over multiple iterations, all the RL controllers update the path to optimize the RL rewards across all domains. At the end of the search stage, we obtain a subnetwork for each domain. Finally, all the subnetworks are combined to build a heterogeneous architecture for the MDL model, shown below.

Since the subnetwork for each domain is searched independently, the building block in each layer can be shared by multiple domains (i.e., dark gray nodes), used by a single domain (i.e., light gray nodes), or not used by any subnetwork (i.e., dotted nodes). The path for each domain can also skip any layer during search. Given that each subnetwork can freely select which blocks to use along its path in a way that optimizes performance (rather than, e.g., arbitrarily designating which layers are homogeneous and which are domain-specific), the output network is both heterogeneous and efficient.
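
To make the three node categories concrete, here is a minimal sketch that labels each candidate block as shared, single-domain, or unused, given one path per domain. The block names, domain names, and the `classify_blocks` helper are hypothetical and chosen for illustration; this is not the MPNAS implementation.

```python
from collections import Counter

# Hypothetical candidate blocks and search result (illustrative only).
CANDIDATE_BLOCKS = ["conv3x3", "conv5x5", "dwb", "zero"]

# For each domain, the block chosen at each layer. "zero" means the layer is skipped.
paths = {
    "domain_a": ["conv3x3", "dwb", "zero"],
    "domain_b": ["conv3x3", "conv5x5", "dwb"],
    "domain_c": ["conv3x3", "dwb", "dwb"],
}

def classify_blocks(paths, candidates):
    """Label every (layer, block) node as shared, single, or unused."""
    num_layers = len(next(iter(paths.values())))
    usage = Counter()
    for path in paths.values():
        for layer, block in enumerate(path):
            if block != "zero":  # a skipped layer uses no block
                usage[(layer, block)] += 1
    labels = {}
    for layer in range(num_layers):
        for block in candidates:
            if block == "zero":
                continue
            count = usage[(layer, block)]
            if count > 1:
                labels[(layer, block)] = "shared"
            elif count == 1:
                labels[(layer, block)] = "single"
            else:
                labels[(layer, block)] = "unused"
    return labels

labels = classify_blocks(paths, CANDIDATE_BLOCKS)
for node, label in sorted(labels.items()):
    print(node, label)
```

Only the shared and single-domain nodes would need to be kept in the combined model, which is where the size savings over one model per domain come from.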

Example architecture searched by MPNAS. Dashed paths represent all the possible subnetworks. Solid paths represent the selected subnetworks for each domain (highlighted in different colors). Nodes in each layer represent the candidate building blocks defined by the search space.

The figure below shows the searched architecture for two of the ten visual domains in the Visual Domain Decathlon challenge. One can see that the subnetworks of these two highly related domains (one red, the other green) share a majority of building blocks from their overlapping paths, but there are still some differences.

Architecture blocks of two domains (ImageNet and Describable Textures) among the ten domains of the Visual Domain Decathlon challenge. The red and green paths represent the subnetworks of ImageNet and Describable Textures, respectively. Dark pink nodes represent the blocks shared by multiple domains. Light pink nodes represent the blocks used by each path. The model is built on a MobileNet V3-like search space. The “dwb” block in the figure represents the dwbottleneck block. The “zero” block in the figure indicates the subnetwork skips that block.

Below we show the path similarity between domains among the ten domains of the Visual Domain Decathlon challenge. The similarity is measured by the Jaccard similarity score between the subnetworks of each domain, where higher means the paths are more similar. As one might expect, domains that are more similar share more nodes in the paths generated by MPNAS, which is also a signal of strong positive knowledge transfer. For example, the paths for similar domains (like ImageNet, CIFAR-100, and VGG Flower, which all include objects in natural images) have high scores, while the paths for dissimilar domains (like Daimler Pedestrian Classification and UCF101 Dynamic Images, which include pedestrians in grayscale images and human activity in natural color images, respectively) have low scores.

Confusion matrix for the Jaccard similarity score between the paths for the ten domains. Score value ranges from 0 to 1. A greater value indicates two paths share more nodes.
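
As a rough illustration of the score used above, the Jaccard similarity of two paths is the number of nodes they share divided by the number of distinct nodes they use in total. The sketch below assumes each path is represented as a set of (layer, block) pairs. That representation and the example paths are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of nodes."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty paths
    return len(a & b) / len(a | b)

# Hypothetical paths, each a set of (layer, block) nodes.
path_x = {(0, "conv3x3"), (1, "dwb"), (2, "dwb")}
path_y = {(0, "conv3x3"), (1, "dwb"), (2, "conv5x5")}
path_z = {(0, "conv5x5"), (1, "conv3x3"), (2, "conv5x5")}

print(jaccard(path_x, path_y))  # 2 shared nodes out of 4 distinct -> 0.5
print(jaccard(path_x, path_z))  # no shared nodes -> 0.0
```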

Training a Heterogeneous Multi-domain Model
In the second stage, the model resulting from MPNAS is trained from scratch for all domains. For this to work, it is necessary to define a unified objective function for all the domains. To successfully handle a large variety of domains, we designed an algorithm that adapts throughout the learning process such that losses are balanced across domains, called adaptive balanced domain prioritization (ABDP).

Below we show the accuracy, model size, and FLOPS of the model trained in different settings. We compare MPNAS to three other approaches:

  • Domain independent NAS: Searching and training a model for each domain separately.
  • Single path multi-head: Using a pre-trained model as a shared backbone for all domains with separated classification heads for each domain.
  • Multi-head NAS: Searching a unified backbone architecture for all domains with separated classification heads for each domain.

From the results, we can observe that domain independent NAS requires building a bundle of models for each domain, resulting in a large model size. Although single path multi-head and multi-head NAS can reduce the model size and FLOPS significantly, forcing the domains to share the same backbone introduces negative knowledge transfer, decreasing overall accuracy.

Model                    Number of parameters ratio   GFLOPS   Average Top-1 accuracy (%)
Domain independent NAS   5.7x                         1.08     69.9
Single path multi-head   1.0x                         0.09     35.2
Multi-head NAS           0.7x                         0.04     45.2
MPNAS                    1.3x                         0.73     71.8
Number of parameters, gigaFLOPS, and Top-1 accuracy (%) of MDL models on the Visual Decathlon dataset. All methods are built based on the MobileNetV3-like search space.
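
The headline savings can be recomputed from the table. The sketch below uses only the rounded table values for domain independent NAS and MPNAS, so it gives roughly 77% and 32%. The small gap to the 78% quoted in the introduction is presumably due to rounding in the table. The `relative_reduction` helper is a name introduced here for illustration.

```python
# Values copied from the table above.
baseline = {"params_ratio": 5.7, "gflops": 1.08, "top1": 69.9}  # domain independent NAS
mpnas = {"params_ratio": 1.3, "gflops": 0.73, "top1": 71.8}

def relative_reduction(old, new):
    """Fraction by which `new` is smaller than `old`."""
    return 1.0 - new / old

size_cut = relative_reduction(baseline["params_ratio"], mpnas["params_ratio"])
flops_cut = relative_reduction(baseline["gflops"], mpnas["gflops"])
accuracy_gain = mpnas["top1"] - baseline["top1"]

print(f"model size reduction: {size_cut:.1%}")       # 77.2%
print(f"FLOPS reduction: {flops_cut:.1%}")           # 32.4%
print(f"accuracy gain: {accuracy_gain:.1f} points")  # 1.9 points
```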

MPNAS can build a small and efficient model while still maintaining high overall accuracy. The average accuracy of MPNAS is even 1.9% higher than the domain independent NAS approach since the model enables positive knowledge transfer. The figure below compares per domain top-1 accuracy of these approaches.

Top-1 accuracy of each Visual Decathlon domain.

Our evaluation shows that top-1 accuracy is improved from 69.96% to 71.78% (delta: +1.81%) by using ABDP as part of the search and training stages.

Top-1 accuracy for each Visual Decathlon domain trained by MPNAS with and without ABDP.

Future Work
We find MPNAS is an efficient solution to build a heterogeneous network to address the data imbalance, domain diversity, negative transfer, domain scalability, and large search space of possible parameter sharing strategies in MDL. By using a MobileNet-like search space, the resulting model is also mobile friendly. We are continuing to extend MPNAS for multi-task learning for tasks that are not compatible with existing search algorithms and hope others might use MPNAS to build a unified multi-domain model.

Acknowledgements
This work is made possible through a collaboration spanning several teams across Google. We’d like to acknowledge contributions from Junjie Ke, Joshua Greaves, Grace Chu, Ramin Mehran, Gabriel Bender, Xuhui Jia, Brendan Jou, Yukun Zhu, Luciano Sbaiz, Alec Go, Andrew Howard, Jeff Gilbert, Peyman Milanfar, and Ming-Tsuan Yang.


Training tree-based models with TensorFlow in just a few lines of code

A guest post by Dinko Franceschi, Broad Institute of MIT and Harvard

Kaggle has become the go-to place to practice data science skills and participate in machine learning model-building competitions. This tutorial will provide an easy-to-follow walkthrough of how to get started with a Kaggle notebook using TensorFlow Decision Forests. It’s a library that allows you to train tree-based models (like random forests and gradient-boosted trees) in TensorFlow.

Why should you be interested in decision forests? There are roughly two types of Kaggle competitions – and the winning solution (neural networks or decision forests) depends on the kind of data you’re working with.

If you’re working with a tabular data problem (training a model to classify data in a spreadsheet, which is an extremely common scenario), the winning solution is often a decision forest. However, if you’re working with a perception problem that involves teaching a computer to see or hear (for example, image classification), the winning model is usually a neural network.

Here’s where the good news starts. You can implement a decision forest in TensorFlow with just a few lines of code. This relatively simple model outperforms a neural network on many Kaggle problems.

We will explore the decision forests library with a simple dataset from Kaggle, and we will build our model with Kaggle Kernels, which allow you to build and train your models entirely online using free cloud compute power, similar to Colab. The dataset contains vehicle information such as cost, number of doors, occupancy, and maintenance costs, which we will use to assign an evaluation to the car.

Kaggle Kernels can be accessed through your Kaggle account. If you do not have an account, please begin by signing up. On the home page, select the “Code” option on the left menu and select “New Notebook,” which will open a new Kaggle Kernel.

Once we have opened a new notebook from Kaggle Kernels, we download the car evaluation dataset to our environment. Click “Add data” near the top right corner of your notebook, search for “car evaluation,” and add the dataset.

Now we are ready to start writing code. Install the TensorFlow Decision Forests library and add the necessary imports, as shown below. The code in this blog post comes from the Build, train and evaluate models with the TensorFlow Decision Forests tutorial, which contains additional examples to look at.

!pip install tensorflow_decision_forests

import numpy as np
import pandas
import tensorflow_decision_forests as tfdf

We will now import the dataset. We should note that the dataset we downloaded did not contain headers, so we will add those first based on the information provided on the Kaggle page for the dataset. It is good practice to inspect your dataset before you start working with it by opening it up in your favorite text or spreadsheet editor.

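# The CSV has no header row, so pass header=None and supply the column names;
# otherwise pandas would treat the first data row as the header and drop it.
col_names = ['buying price', 'maintenance price', 'doors', 'persons', 'lug_boot', 'safety', 'class']
df = pandas.read_csv("../input/car-evaluation-data-set/car_evaluation.csv", header=None, names=col_names)
df.head()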

We must then split the dataset into train and test:

def split_dataset(dataset, test_ratio=0.30):
    # Draw one random number per row; rows that fall below test_ratio go to the test set.
    test_indices = np.random.rand(len(dataset)) < test_ratio
    return dataset[~test_indices], dataset[test_indices]

train_ds_pd, test_ds_pd = split_dataset(df)
print("{} examples in training, {} examples for testing.".format(
    len(train_ds_pd), len(test_ds_pd)))
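
Because the split above assigns each row independently at random, the test set holds only approximately 30% of the rows, and the exact split changes from run to run. The sketch below shows the same idea using only the Python standard library, with a fixed seed so the result is repeatable. The `split_rows` helper and the integer stand-in rows are assumptions made for illustration, not part of the tutorial.

```python
import random

def split_rows(rows, test_ratio=0.30, seed=0):
    """Assign each row to the test set independently with probability test_ratio."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for row in rows:
        if rng.random() < test_ratio:
            test.append(row)
        else:
            train.append(row)
    return train, test

rows = list(range(1728))  # stand-in for the rows of the car evaluation table
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))  # close to, but not exactly, a 70/30 split
```

With NumPy, a similar effect can be obtained by calling `np.random.seed` with a fixed value before `split_dataset`.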

And finally we will convert the dataset into tf.data format. This is a high-performance format that is used by TensorFlow to train models more efficiently, and with TensorFlow Decision Forests, you can convert your dataset to this format with one line of code:


train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_ds_pd, label="class")
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_ds_pd, label="class")

Now you can go ahead and train your model right away by executing the following:

model = tfdf.keras.RandomForestModel()
model.fit(train_ds)

The library has good defaults, which are a fine place to start for most problems. Random forests are highly configurable, so advanced users will find many more options in the API doc.

Once you have trained the model, you can see how it will perform on the test data.

model.compile(metrics=["accuracy"])
print(model.evaluate(test_ds))

In just a few lines of code, you reached an accuracy of >95% on this small dataset! This is a simple dataset, and one might argue that neural networks could also yield impressive results. And they absolutely can (and do), especially when you have very large datasets (think: hundreds of thousands of examples, or more). However, neural networks require more code and are resource intensive as they require significantly more compute power.

Easy preprocessing

Decision forests have another important advantage: there are fewer steps to preprocess the data. Notice in the code above that you were able to pass a dataset with both categorical and numeric values directly to the decision forests. You did not have to do any preprocessing like normalizing numeric values, converting strings to integers, and one-hot encoding them. This has major benefits. It makes decision forests simpler to work with (so you can train a model quickly), and there is less code that can go wrong.

Below, you will see some important differences between the two techniques.

Easy to interpret

A significant advantage of decision forests is that they are easy to interpret. Although the pipeline for training decision forests differs significantly from that for neural networks, this interpretability is a strong reason to choose them for a given task. In particular, feature importance is straightforward to determine with decision forests (ensembles of decision trees). Notably, the TensorFlow Decision Forests library makes it possible to visualize a tree from the model with its model plotter function. Let’s see below how this works!

tfdf.model_plotter.plot_model_in_colab(model, tree_idx=0)

We see in the root of the tree on the left the number of examples (1728) and the corresponding distribution indicated by the different colors. Here our model is looking at the number of persons that the car can fit. The largest section indicated by green stands for 2 persons and the red for 4 persons. Furthermore, as we go down the tree we continue to see how the tree splits and the corresponding number of examples. Based on the condition, examples are branched to one of two paths. Interestingly, from here we can also determine the importance of a feature by examining all of the splits of a given feature and then computing how much this feature lowered the variance.
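
To illustrate the last point, here is a minimal sketch of how much a single split lowers the variance of the labels at a node. The toy data and the `variance_reduction` helper are made up for illustration. This is not how TensorFlow Decision Forests computes its importance measures internally, which may differ.

```python
from statistics import pvariance

def variance_reduction(labels, goes_left):
    """Weighted drop in label variance when a node is split in two."""
    left = [y for y, flag in zip(labels, goes_left) if flag]
    right = [y for y, flag in zip(labels, goes_left) if not flag]
    if not left or not right:
        return 0.0  # the condition does not separate anything
    n = len(labels)
    children = (len(left) / n) * pvariance(left) + (len(right) / n) * pvariance(right)
    return pvariance(labels) - children

# Toy data (made up): number of persons the car fits, and whether
# the car was rated acceptable (1.0) or not (0.0).
persons = [2, 2, 2, 4, 4, 4, 5, 5]
acceptable = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]

condition = [p >= 4 for p in persons]
reduction = variance_reduction(acceptable, condition)
print(round(reduction, 3))  # parent variance 0.25, weighted child variance 0.1 -> 0.15
```

Summing this quantity over every split that uses a given feature, across all trees, yields one common notion of that feature's importance.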

Decision Trees vs. Neural Networks

Neural networks undoubtedly have incredible representation learning capabilities. While they are very powerful in this regard, it is important to consider whether they are the right tool for the problem at hand. When working with neural networks, you must think carefully about how to construct the layers. In contrast, decision forests are ready to go out of the box (of course, advanced users can tune a variety of parameters).

Prior to even building a neural network layer by layer, in most cases one must perform feature pre-processing. For example, this could include normalizing the features to have a mean around 0 and a standard deviation of 1, and converting strings to numbers. This initial step can be skipped entirely with tree-based models, which natively handle mixed data.
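
For a sense of what is being skipped, the sketch below shows the two pre-processing steps mentioned above using only the Python standard library. The column values and the helper names `standardize` and `encode_strings` are made up for illustration. A real neural network pipeline would typically use library preprocessing layers instead.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale numbers to mean 0 and standard deviation 1 (assumes they are not all equal)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def encode_strings(values):
    """Map each distinct string to an integer index (sorted for determinism)."""
    vocab = {s: i for i, s in enumerate(sorted(set(values)))}
    return [vocab[s] for s in values], vocab

# Made-up columns in the spirit of the car dataset.
doors = [2, 3, 4, 5]
safety = ["low", "med", "high", "med"]

scaled_doors = standardize(doors)
safety_ids, safety_vocab = encode_strings(safety)

print(scaled_doors)
print(safety_ids, safety_vocab)  # [1, 2, 0, 2] {'high': 0, 'low': 1, 'med': 2}
```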

As seen in the code above, we were able to obtain results in just a few steps. Once we have our desired metrics, we have to interpret them within the context of our problem. Perhaps one of the most significant strengths of decision trees is their interpretability. The plotting code above produces a diagram of a tree. Starting at the root, we can follow the branches and quickly get a good idea of how the model made its decisions. In contrast, neural networks are a “black box” that can be difficult to interpret and to explain to a non-technical audience.

Learning more

If you’d like to learn more about TensorFlow Decision Forests, the best place to start is with the project homepage. You can also check out this previous article for more background. And if you have any questions or feedback, the best place to ask them is on https://discuss.tensorflow.org/ using the tag “tfdf”. Thanks for reading!
