The Greatest Podcast Ever Recorded

Is this the best podcast ever recorded? Let’s just say you don’t need a GPU to know that’s a stretch. But it’s pretty great if you’re a fan of tall tales.

And better still if you’re not a fan of stretching the truth at all.

That’s because detecting hyperbole may one day get more manageable, thanks to researchers at the University of Copenhagen working in the growing field of exaggeration detection.

Dustin Wright and Isabelle Augenstein have used NVIDIA GPUs to train an “exaggeration detection system” to identify overenthusiastic claims in health science reporting.

Their work comes as the pandemic has fueled demand for understandable, accurate information. And social media has made health misinformation more widespread.

Their paper leverages “few-shot learning,” a technique that lets developers wring more intelligence out of less data, and a new version of a technique called pattern exploiting training.

Research like Wright and Augenstein’s could one day speed more precise health sciences news to more people.

AI Podcast host Noah Kravitz — whose fishing stories we will never trust again after this episode — spoke with Wright about the work.

Key Points From This Episode

  • Approximately 33% of press releases about scientific papers exaggerate the papers’ findings, which in turn leads news articles to exaggerate those findings as well.
  • Wright’s exaggeration detection project aims to provide people such as journalists with accurate information so they can report accurately on science.
  • The project, accelerated using an NVIDIA Titan X GPU, uses a novel, multitask-capable version of a technique called Pattern Exploiting Training, which the researchers dubbed MT-PET.

Tweetables:

“Can we leverage language patterns that the language model has picked up on from masked language model pre-training, and be able to do classification with any text?” – Dustin Wright [7:28]

“About 33% of the time, press releases will exaggerate the scientific papers, and as a result, about 33% of news articles exaggerate the findings in scientists’ papers.” – Dustin Wright [9:50]

“This is progress towards a system that could assist, for example, journalists in ensuring that they’re doing accurate reporting on science.” – Dustin Wright [16:20]

You Might Also Like

NVIDIA’s Liila Torabi Talks the New Era of Robotics Through Isaac Sim

Robots aren’t limited to the assembly line. Liila Torabi, senior product manager for Isaac Sim, a robotics and AI simulation platform powered by NVIDIA Omniverse, talks about where the field’s headed.

GANTheftAuto: Harrison Kinsley on AI-Generated Gaming Environments

Humans playing games against machines is nothing new, but now computers can develop their own games for people to play. Programming enthusiast and social media influencer Harrison Kinsley created GANTheftAuto, an AI-based neural network that generates a playable chunk of the classic video game Grand Theft Auto V.

The Driving Force: How Ford Uses AI to Create Diverse Driving Data

The neural networks powering autonomous vehicles require petabytes of driving data to learn how to operate. Nikita Jaipuria and Rohan Bhasin from Ford Motor Company explain how they use generative adversarial networks (GANs) to fill in the gaps of real-world data used in AV training.

Subscribe to the AI Podcast: Now Available on Amazon Music

You can now listen to the AI Podcast through Amazon Music.

You can also get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast Better: Have a few minutes to spare? Fill out our listener survey

 

 

Featured image: postcard, copyright expired

The post The Greatest Podcast Ever Recorded appeared first on The Official NVIDIA Blog.

Read More

Atos Previews Energy-Efficient, AI-Augmented Hybrid Supercomputer

Stepping deeper into the era of exascale AI, Atos gave the first look at its next-generation high-performance computer.

The BullSequana XH3000 combines Atos’ patented fourth-generation liquid-cooled HPC design with NVIDIA technologies to deliver both more performance and energy efficiency.

Giving users a choice of Arm or x86 computing architectures, it will come in versions using NVIDIA Grace, Intel, AMD or SiPearl processors. For accelerated computing, it supports nodes with four NVIDIA Tensor Core GPUs.

The XH3000 also flexibly employs network options including NVIDIA Quantum-2 InfiniBand and NVIDIA ConnectX-7 InfiniBand and Ethernet adapters to scale these powerful computing nodes to HPC systems capable of 10 mixed precision AI exaflops.

Hybrid HPC+AI+Quantum System

The result is a flexible, hybrid computing platform capable of running the most demanding HPC simulations, AI jobs and even emerging workloads in quantum computing.

The BullSequana XH3000 “will no doubt enable, through the gateway of exascale, some of the key scientific and industrial innovation breakthroughs of the future,” said Rodolphe Belmer, CEO of Atos, in a virtual event revealing the system.

With customers in more than 70 countries, Atos is #1 in supercomputing in Europe, India and South America and especially renowned in France, where it maintains its headquarters as well as a manufacturing and R&D base.

A Broad Collaboration

Optimizations for the XH3000 were among the first projects for EXAIL, the joint Excellence AI Lab that Atos and NVIDIA announced in November.

John Josephakis, global vice president of sales and business development for HPC/supercomputing at NVIDIA, congratulated the team behind the system in a video message.

“By combining the well-known expertise Atos has with NVIDIA AI and HPC technologies and work at our joint lab, this platform will allow researchers to get significant insights into grand challenges much faster, both in supercomputing and industrial HPC,” he said.

EXAIL’s work spans climate research, healthcare and genomics, quantum computing, edge AI/computer vision and cybersecurity. Its researchers can access application frameworks such as NVIDIA Clara for healthcare and NVIDIA Morpheus for security as well as the NVIDIA cuQuantum SDK for quantum computing and the NVIDIA HPC SDK that runs hundreds of scientific and technical applications.

A Long, Productive Relationship

Atos built one of Europe’s first supercomputers to employ the NVIDIA Ampere architecture, the JUWELS Booster at the Jülich Supercomputing Center. It uses 3,744 NVIDIA A100 Tensor Core GPUs to deliver 2.5 exaflops of mixed-precision AI performance.

To provide a deeper understanding of climate change, Atos and NVIDIA researchers will run AI models on the system, currently ranked No. 8 on the TOP500 list of the world’s fastest supercomputers. Jülich researchers used the system in April to conduct a state-of-the-art quantum circuit simulation.

Last year, Atos led deployment of BerzeLiUs, a system built on the NVIDIA DGX SuperPOD and Sweden’s largest supercomputer. The company also has delivered supercomputing infrastructure in Europe, India and South America based on NVIDIA DGX systems.

Next up, Atos is building Leonardo, a supercomputer at the Italian inter-university consortium CINECA. It will pack 14,000 NVIDIA A100 GPUs on an NVIDIA Quantum InfiniBand network and is expected to become the world’s fastest AI supercomputer, capable of 10 exaflops of mixed-precision AI performance.

With the first glimpse of the BullSequana XH3000, it’s clear there’s much more to come from the collaboration of Atos and NVIDIA.

The post Atos Previews Energy-Efficient, AI-Augmented Hybrid Supercomputer appeared first on The Official NVIDIA Blog.

Read More

Good News About the Carbon Footprint of Machine Learning Training

Machine learning (ML) has become prominent in information technology, which has led some to raise concerns about the associated rise in the costs of computation, primarily the carbon footprint, i.e., total greenhouse gas emissions. While these assertions rightfully elevated the discussion around carbon emissions in ML, they also highlight the need for accurate data to assess the true carbon footprint, which can help identify strategies to mitigate carbon emissions in ML.

In “The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink”, accepted for publication in IEEE Computer, we focus on operational carbon emissions — i.e., the energy cost of operating ML hardware, including data center overheads — from training of natural language processing (NLP) models and investigate best practices that could reduce the carbon footprint. We demonstrate four key practices that reduce the carbon (and energy) footprint of ML workloads by large margins, which we have employed to help keep ML under 15% of Google’s total energy use.

The 4Ms: Best Practices to Reduce Energy and Carbon Footprints
We identified four best practices that reduce energy and carbon emissions significantly — we call these the “4Ms” — all of which are being used at Google today and are available to anyone using Google Cloud services.

  • Model. Selecting efficient ML model architectures, such as sparse models, can advance ML quality while reducing computation by 3x–10x.
  • Machine. Using processors and systems optimized for ML training, versus general-purpose processors, can improve performance and energy efficiency by 2x–5x.
  • Mechanization. Computing in the cloud rather than on premises reduces energy usage and therefore emissions by 1.4x–2x. Cloud-based data centers are new, custom-designed warehouses with energy-efficiency features built in for 50,000 servers, resulting in very good power usage effectiveness (PUE). On-premises data centers are often older and smaller and thus cannot amortize the cost of new energy-efficient cooling and power distribution systems.
  • Map Optimization. Moreover, the cloud lets customers pick the location with the cleanest energy, further reducing the gross carbon footprint by 5x–10x. While one might worry that map optimization could lead to the greenest locations quickly reaching maximum capacity, user demand for efficient data centers will result in continued advancement in green data center design and deployment.

These four practices together can reduce energy by 100x and emissions by 1000x.

Note that Google matches 100% of its operational energy use with renewable energy sources. Conventional carbon offsets are usually retrospective up to a year after the carbon emissions and can be purchased anywhere on the same continent. Google has committed to decarbonizing all energy consumption so that by 2030, it will operate on 100% carbon-free energy, 24 hours a day on the same grid where the energy is consumed. Some Google data centers already operate on 90% carbon-free energy; the overall average was 61% carbon-free energy in 2019 and 67% in 2020.

Below, we illustrate the impact of improving the 4Ms in practice. Other studies examined training the Transformer model on an NVIDIA P100 GPU in an average data center with an energy mix consistent with the worldwide average. The recently introduced Primer model reduces the computation needed to achieve the same accuracy by 4x. Using newer-generation ML hardware, such as TPUv4, provides an additional 14x improvement over the P100, or 57x overall. Efficient cloud data centers gain 1.4x over the average data center, resulting in a total energy reduction of 83x. In addition, using a data center with a low-carbon energy source can reduce the carbon footprint another 9x, resulting in a 747x total reduction in carbon footprint over four years.

Reduction in gross carbon dioxide equivalent emissions (CO2e) from applying the 4M best practices to the Transformer model trained on P100 GPUs in an average data center in 2017, as done in other studies. Displayed values are the cumulative improvement successively addressing each of the 4Ms: updating the model to Primer; upgrading the ML accelerator to TPUv4; using a Google data center with better PUE than average; and training in a Google Oklahoma data center that uses very clean energy.
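
To make the arithmetic explicit, here is a small back-of-the-envelope sketch using the rounded factors quoted above (illustrative only; the paper’s unrounded values yield the reported 57x, 83x and 747x):

# Rounded improvement factors quoted in the text above (illustrative only;
# the paper's exact figures differ slightly due to rounding).
model = 4             # Transformer -> Primer: less computation for the same accuracy
machine = 14          # NVIDIA P100 GPU -> TPUv4
mechanization = 1.4   # average data center -> efficient cloud data center (better PUE)
map_optimization = 9  # average energy mix -> low-carbon data center

energy_reduction = model * machine * mechanization                     # ~78x (reported: 83x)
carbon_reduction = model * machine * mechanization * map_optimization  # ~706x (reported: 747x)

print(f"Energy reduction: ~{energy_reduction:.0f}x")
print(f"Carbon reduction: ~{carbon_reduction:.0f}x")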

Overall Energy Consumption for ML
Google’s total energy usage increases annually, which is not surprising considering increased use of its services. ML workloads have grown rapidly, as has the computation per training run, but paying attention to the 4Ms — optimized models, ML-specific hardware, efficient data centers — has largely compensated for this increased load. Our data shows that ML training and inference are only 10%–15% of Google’s total energy use for each of the last three years, each year split ⅗ for inference and ⅖ for training.

Prior Emission Estimates
Google uses neural architecture search (NAS) to find better ML models. NAS is typically performed once per problem domain/search space combination, and the resulting model can then be reused for thousands of applications — e.g., the Evolved Transformer model found by NAS is open sourced for all to use. As the optimized model found by NAS is often more efficient, the one-time cost of NAS is typically more than offset by emission reductions from subsequent use.

A study from the University of Massachusetts (UMass) estimated carbon emissions for the Evolved Transformer NAS.

  • Without ready access to Google hardware or data centers, the study extrapolated from the available P100 GPUs instead of TPUv2s, and assumed US average data center efficiency instead of highly efficient hyperscale data centers. These assumptions increased the estimate by 5x over the energy used by the actual NAS computation that was performed in Google’s data center.
  • In order to accurately estimate the emissions for NAS, it’s important to understand the subtleties of how they work. NAS systems use a much smaller proxy task to search for the most efficient models to save time, and then scale up the found models to full size. The UMass study assumed that the search repeated full size model training thousands of times, resulting in emission estimates that are another 18.7x too high.

The overshoot for the NAS was 88x: 5x for energy-efficient hardware in Google data centers and 18.7x for computation using proxies. The actual CO2e for the one-time search was 3,223 kg versus 284,019 kg, 88x less than the published estimate.

Unfortunately, some subsequent papers misinterpreted the NAS estimate as the training cost for the model it discovered, yet emissions for this particular NAS are ~1300x larger than for training the model. These papers estimated that training the Evolved Transformer model takes two million GPU hours, costs millions of dollars, and that its carbon emissions are equivalent to five times the lifetime emissions of a car. In reality, training the Evolved Transformer model on the task examined by the UMass researchers and following the 4M best practices takes 120 TPUv2 hours, costs $40, and emits only 2.4 kg (0.00004 car lifetimes), 120,000x less. This gap is nearly as large as if one overestimated the CO2e to manufacture a car by 100x and then used that number as the CO2e for driving a car.

Outlook
Climate change is important, so we must get the numbers right to ensure that we focus on solving the biggest challenges. Within information technology, we believe these are much more likely the lifecycle costs — i.e., emission estimates that include the embedded carbon emitted from manufacturing all components involved, from chips to data center buildings — of manufacturing computing equipment of all types and sizes¹ rather than the operational cost of ML training.

Expect more good news if everyone improves the 4Ms. While these numbers may currently vary across companies, these simple measures can be followed across the industry.

If the 4Ms become widely recognized, we predict a virtuous circle that will bend the curve so that the global carbon footprint of ML training is actually shrinking, not increasing.

Acknowledgements
Let me thank my co-authors who stayed with this long and winding investigation into a topic that was new to most of us: Jeff Dean, Joseph Gonzalez, Urs Hölzle, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, and Maud Texier. We also had a great deal of help from others along the way for an earlier study that eventually led to this version of the paper. Emma Strubell made several suggestions for the prior paper, including the recommendation to examine the recent giant NLP models. Christopher Berner, Ilya Sutskever, OpenAI, and Microsoft shared information about GPT-3. Dmitry Lepikhin and Zongwei Zhou did a great deal of work to measure the performance and power of GPUs and TPUs in Google data centers. Hallie Cramer, Anna Escuer, Elke Michlmayr, Kelli Wright, and Nick Zakrasek helped with the data and policies for energy and CO2e emissions at Google.



¹Worldwide IT manufacturing for 2021 included 1700M cell phones, 340M PCs, and 12M data center servers.

Read More

Peak Performance: Production Studio Sets the Stage for Virtual Opening Ceremony at European Football Championship

At the latest UEFA Champions League Finals, one of the world’s most anticipated annual soccer events, pop stars Marshmello, Khalid and Selena Gomez shared the stage for a dazzling opening ceremony at Portugal’s third-largest football stadium — without ever stepping foot in it.

The stunning video performance took place in a digital twin of the Estádio do Dragão, or Dragon Stadium, rendered by Madrid-based MR Factory, a company that specializes in virtual production.

The studio, which has been at the forefront of using virtual productions for film and television since the 1990s, now brings its virtual sets to life with the help of NVIDIA Studio, RTX GPUs and Omniverse, a real-time collaboration and simulation platform.

MR Factory’s previous projects include Netflix’s popular series Money Heist and Sky Rojo, and feature films like Jeepers Creepers: Reborn.

With NVIDIA RTX technology and real-time rendering, MR Factory can create stunning visuals and 3D models faster than before. And with NVIDIA Omniverse Enterprise, MR Factory enables remote designers and artists to collaborate in one virtual space.

These advanced solutions help the company accelerate design workflows and take virtual productions to the next level.

Images courtesy of MR Factory.

“NVIDIA is powering this new workflow that allows us to improve creative opportunities while reducing production times and costs,” said Óscar Olarte, co-founder and CTO of MR Factory. “Instead of traveling to places like Australia or New York, we can create these scenarios virtually — you go from creating content to creating worlds.”

Setting the Virtual Stage for UEFA Champions 

MR Factory received the UEFA Champions League Finals opening ceremony project when there were heavy restrictions on travel due to the pandemic. The event was initially set to take place at Istanbul’s Ataturk Stadium, the largest sporting arena in Turkey.

MR Factory captured images of the stadium and used them to create a 3D model for the music video. But with the pandemic’s shifting conditions, UEFA changed the location to a stadium in Porto, on Portugal’s coast — with just two weeks until the project’s deadline.

MR Factory had to quickly create another 3D model of the new stadium. The team used NVIDIA technology to achieve this, with real-time rendering tools accelerating their creative workflows. To create the stunning graphics and set up the scenes for virtual production, MR Factory uses leading applications such as Autodesk Arnold, DaVinci Resolve, OctaneRender and Unreal Engine.

“One of the most exciting technologies for us right now is NVIDIA RTX because it allows us to render faster, and in real time,” said Olarte. “We can mix real elements with virtual elements instantly.”

MR Factory also uses camera-tracking technology, which allows it to capture all camera and lens movements on stage. The team then uses that footage to combine live elements with the virtual production environment in real time.

Over 80 people across Spain worked on the virtual opening ceremony and, with the help of NVIDIA RTX, the team was able to complete the integrations from scratch, render all the visuals and finish the project in time for the event.

Making Vast Virtual Worlds 

One of MR Factory’s core philosophies is enabling remote work, as this provides the company with more opportunities to hire talent from anywhere. The studio then empowers that talent with the best creative tools.

Additionally, MR Factory has been developing the metaverse as a way to produce films and television scenes. The pandemic accentuated the need for real-time collaboration and interaction between remote teams, and NVIDIA Omniverse Enterprise helps MR Factory achieve this.

With Omniverse Enterprise, MR Factory can drastically reduce production times, since multiple people can work simultaneously on the same project. Instead of completing a scene in a week, five artists can work in Omniverse and have the scene ready in a day, Olarte said.

“For us, virtual production is a way of creating worlds — and from these worlds come video games and movies,” he added. “So we’re building a library of content while we’re producing it, and the library is compatible with NVIDIA Omniverse Enterprise.”

MR Factory uses a render farm with 200 NVIDIA RTX A6000 GPUs, which provide artists with the GPU memory they need to quickly produce stunning work, deliver high-quality virtual productions and render in real time.

MR Factory plans to use Omniverse Enterprise and the render farm on future projects, so they can streamline creative workflows and bring virtual worlds together.

The same tools that MR Factory uses to create the virtual worlds of tomorrow are also available at no cost to millions of individual NVIDIA Studio creators with GeForce RTX and NVIDIA RTX GPUs.

Learn more about NVIDIA RTX, Omniverse and other powerful technologies behind the latest virtual productions by registering for free for GTC, taking place March 21-24.

The post Peak Performance: Production Studio Sets the Stage for Virtual Opening Ceremony at European Football Championship appeared first on The Official NVIDIA Blog.

Read More

Research advances technology of AI assistance for anesthesiologists

A new study by researchers at MIT and Massachusetts General Hospital (MGH) suggests the day may be approaching when advanced artificial intelligence systems could assist anesthesiologists in the operating room.

In a special edition of Artificial Intelligence in Medicine, the team of neuroscientists, engineers, and physicians demonstrated a machine learning algorithm for continuously automating dosing of the anesthetic drug propofol. Using an application of deep reinforcement learning, in which the software’s neural networks simultaneously learned how its dosing choices maintain unconsciousness and how to critique the efficacy of its own actions, the algorithm outperformed more traditional software in sophisticated, physiology-based simulations of patients. It also closely matched the performance of real anesthesiologists when showing what it would do to maintain unconsciousness given recorded data from nine real surgeries.

The algorithm’s advances increase the feasibility for computers to maintain patient unconsciousness with no more drug than is needed, thereby freeing up anesthesiologists for all the other responsibilities they have in the operating room, including making sure patients remain immobile, experience no pain, remain physiologically stable, and receive adequate oxygen, say co-lead authors Gabe Schamberg and Marcus Badgeley.

“One can think of our goal as being analogous to an airplane’s autopilot, where the captain is always in the cockpit paying attention,” says Schamberg, a former MIT postdoc who is also the study’s corresponding author. “Anesthesiologists have to simultaneously monitor numerous aspects of a patient’s physiological state, and so it makes sense to automate those aspects of patient care that we understand well.”

Senior author Emery N. Brown, a neuroscientist at The Picower Institute for Learning and Memory and Institute for Medical Engineering and Science at MIT and an anesthesiologist at MGH, says the algorithm’s potential to help optimize drug dosing could improve patient care.

“Algorithms such as this one allow anesthesiologists to maintain more careful, near-continuous vigilance over the patient during general anesthesia,” says Brown, the Edward Hood Taplin Professor of Computational Neuroscience and Health Sciences and Technology at MIT.

Both actor and critic

The research team designed a machine learning approach that would not only learn how to dose propofol to maintain patient unconsciousness, but also how to do so in a way that would optimize the amount of drug administered. They accomplished this by endowing the software with two related neural networks: an “actor” with the responsibility to decide how much drug to dose at every given moment, and a “critic” whose job was to help the actor behave in a manner that maximizes “rewards” specified by the programmer. For instance, the researchers experimented with training the algorithm using three different rewards: one that penalized only overdosing, one that questioned providing any dose, and one that imposed no penalties.
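
As a rough illustration of how these three reward variants might be expressed in code (a hedged sketch only; the error signal, weights and function names here are made up for illustration, not the study’s actual reward design):

def dose_penalty_reward(consciousness_error, dose, dose_weight=0.1):
    """Penalize deviation from the target unconsciousness level and every unit
    of drug given, nudging the actor toward the minimum effective dose."""
    return -abs(consciousness_error) - dose_weight * dose

def overdose_penalty_reward(consciousness_error, dose, dose_ceiling=1.0):
    """Penalize only dosing above a ceiling; doses below it carry no cost."""
    overdose = max(0.0, dose - dose_ceiling)
    return -abs(consciousness_error) - overdose

def no_penalty_reward(consciousness_error, dose):
    """Ignore the dose entirely and reward only tracking the target level."""
    return -abs(consciousness_error)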

In every case, they trained the algorithm with simulations of patients that employed advanced models of both pharmacokinetics, or how quickly propofol doses reach the relevant regions of the brain after they are administered, and pharmacodynamics, or how the drug actually alters consciousness once it reaches its destination. Patient unconsciousness levels, meanwhile, were reflected in measures of brain waves, as they can be in real operating rooms. By running hundreds of rounds of simulation with a range of values for these conditions, both the actor and the critic could learn how to perform their roles for a variety of kinds of patients.

The most effective reward system turned out to be the “dose penalty” one, in which the critic questioned every dose the actor gave, constantly chiding the actor to keep dosing to the necessary minimum to maintain unconsciousness. Without any dosing penalty the system sometimes dosed too much, and with only an overdose penalty it sometimes gave too little. The “dose penalty” model learned more quickly and produced less error than the other value models and the traditional standard software, a “proportional-integral-derivative” (PID) controller.
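
For comparison, a generic PID controller for this kind of set-point tracking might look like the minimal sketch below (not the study’s implementation; the gains and units are placeholders):

class PIDDosingController:
    """Generic proportional-integral-derivative controller sketch: tracks a
    target unconsciousness level and returns a non-negative infusion dose."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def dose(self, target_level, measured_level):
        error = target_level - measured_level
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, output)  # a drug dose cannot be negative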

An able advisor

After training and testing the algorithm with simulations, Schamberg and Badgeley put the “dose penalty” version to a more real-world test by feeding it patient consciousness data recorded from real cases in the operating room.  The testing demonstrated both the strengths and limits of the algorithm.

During most tests, the algorithm’s dosing choices closely matched those of the attending anesthesiologists after unconsciousness had been induced and before it was no longer necessary. The algorithm, however, adjusted dosing as frequently as every five seconds, while the anesthesiologists (who all had plenty of other things to do) typically did so only every 20-30 minutes, Badgeley notes.

As the tests showed, the algorithm is not optimized for inducing unconsciousness in the first place, the researchers acknowledge. The software also doesn’t know of its own accord when surgery is over, they add, but it’s a straightforward matter for the anesthesiologist to manage that process.

One of the most important challenges any AI system is likely to continue to face, Schamberg says, is whether the data it is being fed about patient unconsciousness is perfectly accurate. Another active area of research in the Brown lab at MIT and MGH is in improving the interpretation of data sources, such as brain wave signals, to improve the quality of patient monitoring data under anesthesia.

In addition to Schamberg, Badgeley, and Brown, the paper’s other authors are Benyamin Meschede-Krasa and Ohyoon Kwon.

The JPB Foundation and the National Institutes of Health funded the study.

Read More

Automate a shared bikes and scooters classification model with Amazon SageMaker Autopilot

Amazon SageMaker Autopilot makes it possible for organizations to quickly build and deploy an end-to-end machine learning (ML) model and inference pipeline with just a few lines of code or even without any code at all with Amazon SageMaker Studio. Autopilot offloads the heavy lifting of configuring infrastructure and the time it takes to build an entire pipeline, including feature engineering, model selection, and hyperparameter tuning.

In this post, we show how to go from raw data to a robust and fully deployed inference pipeline with Autopilot.

Solution overview

We use Lyft’s public dataset on bikesharing for this simulation to predict whether or not a user participates in the Bike Share for All program. This is a simple binary classification problem.

We want to showcase how easy it is to build an automated and real-time inference pipeline to classify users based on their participation in the Bike Share for All program. To this end, we simulate an end-to-end data ingestion and inference pipeline for an imaginary bikeshare company operating in the San Francisco Bay Area.

The architecture is broken down into two parts: the ingestion pipeline and the inference pipeline.

We primarily focus on the ML pipeline in the first section of this post, and review the data ingestion pipeline in the second part.

Prerequisites

To follow along with this example, complete the following prerequisites:

  1. Create a new SageMaker notebook instance.
  2. Create an Amazon Kinesis Data Firehose delivery stream with an AWS Lambda transform function. For instructions, see Amazon Kinesis Firehose Data Transformation with AWS Lambda. This step is optional and only needed to simulate data streaming.

Data exploration

Let’s download and visualize the dataset, which is located in a public Amazon Simple Storage Service (Amazon S3) bucket and static website:

# The dataset is located in a public bucket and static s3 website.
# https://www.lyft.com/bikes/bay-wheels/system-data

import pandas as pd
import numpy as np
import os
from time import sleep

!wget -q -O '201907-baywheels-tripdata.zip' https://s3.amazonaws.com/baywheels-data/201907-baywheels-tripdata.csv.zip
!unzip -q -o 201907-baywheels-tripdata.zip
csv_file = os.listdir('.')
data = pd.read_csv('201907-baywheels-tripdata.csv', low_memory=False)
data.head()

The following screenshot shows a subset of the data before transformation.

The last column of the data contains the target we want to predict, which is a binary variable taking either a Yes or No value, indicating whether the user participates in the Bike Share for All program.

Let’s take a look at the distribution of our target variable for any data imbalance.

# For plotting
%matplotlib inline
import matplotlib.pyplot as plt
#!pip install seaborn # If you need this library
import seaborn as sns
display(sns.countplot(x='bike_share_for_all_trip', data=data))

As shown in the graph above, the data is imbalanced, with fewer people participating in the program.

We need to balance the data to prevent an over-representation bias. This step is optional because Autopilot also offers an internal approach to handle class imbalance automatically, which defaults to an F1 score validation metric. Additionally, if you choose to balance the data yourself, you can use more advanced techniques for handling class imbalance, such as SMOTE or GANs; a minimal SMOTE sketch follows.
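
If you do choose to oversample instead, a minimal sketch with the imbalanced-learn package might look like this (assuming X holds numerically encoded features and y the binary target; we don’t use this approach in the rest of this post):

# Optional alternative to downsampling: oversample the minority class with SMOTE.
# Assumes X (numerically encoded features) and y (binary target) already exist.
import pandas as pd
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_balanced, y_balanced = smote.fit_resample(X, y)
print(pd.Series(y_balanced).value_counts())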

For this post, we downsample the majority class (No) as our data balancing technique.

The following code enriches the data and under-samples the overrepresented class:

df = data.copy()
df.drop(columns=['rental_access_method'], inplace=True)

df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])

# Adding some day breakdown
df = df.assign(day_of_week=df.start_time.dt.dayofweek,
                            hour_of_day=df.start_time.dt.hour,
                            trip_month=df.start_time.dt.month)
# Breaking the day into 4 parts: morning, afternoon, evening, and night (the default)
conditions = [
    (df['hour_of_day'] >= 5) & (df['hour_of_day'] < 12),
    (df['hour_of_day'] >= 12) & (df['hour_of_day'] < 18),
    (df['hour_of_day'] >= 18) & (df['hour_of_day'] < 21),
]
choices = ['morning', 'afternoon', 'evening']
df['part_of_day'] = np.select(conditions, choices, default='night')
df.dropna(inplace=True)

# Downsampling the majority to rebalance the data
# We are getting about an even distribution
df.sort_values(by='bike_share_for_all_trip', inplace=True)
slice_point = int(df['bike_share_for_all_trip'].value_counts()['Yes'] * 2.1)
df = df[-slice_point:]
# The data is balanced now. Let's reshuffle the data
df = df.sample(frac=1).reset_index(drop=True)

We deliberately left our categorical features not encoded, including our binary target value. This is because Autopilot takes care of encoding and decoding the data for us as part of the automatic feature engineering and pipeline deployment, as we see in the next section.

The following screenshot shows a sample of our data.

The data in the following graphs looks otherwise normal, with a bimodal distribution representing the two peaks for the morning hours and the afternoon rush hours, as you would expect. We also observe low activities on weekends and at night.

In the next section, we feed the data to Autopilot so that it can run an experiment for us.

Build a binary classification model

Autopilot requires that we specify the input and output destination buckets. It uses the input bucket to load the data and the output bucket to save the artifacts, such as feature engineering and the generated Jupyter notebooks. We retain 5% of the dataset to evaluate and validate the model’s performance after the training is complete and upload 95% of the dataset to the S3 input bucket. See the following code:

import sagemaker
import boto3

# Let's define our storage.
# We will use the default sagemaker bucket and will enforce encryption 
 
bucket = sagemaker.Session().default_bucket()  # SageMaker default bucket. 
#Encrypting the bucket
s3 = boto3.client('s3')
SSEConfig={
        'Rules': [
            {
                'ApplyServerSideEncryptionByDefault': {
                    'SSEAlgorithm': 'AES256',
                }
            },
        ]
    }
s3.put_bucket_encryption(Bucket=bucket, ServerSideEncryptionConfiguration=SSEConfig)

prefix = 'sagemaker-automl01'                  # prefix for the bucket
role = sagemaker.get_execution_role()          # IAM role object to use by SageMaker
sagemaker_session = sagemaker.Session()        # Sagemaker API
region = sagemaker_session.boto_region_name    # AWS Region

# Where we will load our data 
input_path = "s3://{}/{}/automl_bike_train_share-1".format(bucket, prefix) 
output_path = "s3://{}/{}/automl_bike_output_share-1".format(bucket, prefix)

# Splitting the data into train/test sets.
# We will use 95% of the data for training and the remainder for testing.
slice_point = int(df.shape[0] * 0.95) 
training_set = df[:slice_point] # 95%
testing_set = df[slice_point:]  # 5%

# Just making sure we have split it correctly
assert training_set.shape[0] + testing_set.shape[0] == df.shape[0]

# Let's save the data locally and upload it to our s3 data location
training_set.to_csv('bike_train.csv')
testing_set.to_csv('bike_test.csv', header=False)

# Uploading the training set to the input bucket
sagemaker.s3.S3Uploader.upload(local_path='bike_train.csv', desired_s3_uri=input_path)

After we upload the data to the input destination, it’s time to start Autopilot:

from sagemaker.automl.automl import AutoML
# You give your job a name and provide the s3 path where you uploaded the data
bike_automl_binary = AutoML(role=role, 
                         target_attribute_name='bike_share_for_all_trip', 
                         output_path=output_path,
                         max_candidates=30)
# Starting the training 
bike_automl_binary.fit(inputs=input_path, 
                       wait=False, logs=False)

All we need to start experimenting is to call the fit() method. Autopilot needs the input and output S3 location and the target attribute column as the required parameters. After feature processing, Autopilot calls SageMaker automatic model tuning to find the best version of a model by running many training jobs on your dataset. We added the optional max_candidates parameter to limit the number of candidates to 30, which is the number of training jobs that Autopilot launches with different combinations of algorithms and hyperparameters in order to find the best model. If you don’t specify this parameter, it defaults to 250.

We can observe the progress of Autopilot with the following code:

# Let's monitor the progress. This will take a while, so go grab some coffee.
from time import sleep

def check_job_status():
    return bike_automl_binary.describe_auto_ml_job()['AutoMLJobStatus']

def describe():
    return bike_automl_binary.describe_auto_ml_job()

while True:
    print(check_job_status(), describe()['AutoMLJobSecondaryStatus'], end='** ')
    if check_job_status() in ["Completed", "Failed"]:
        if "Failed" in check_job_status():
            print(describe()['FailureReason'])
        break
    sleep(20)

The training takes some time to complete. While it’s running, let’s look at the Autopilot workflow.

To find the best candidate, use the following code:

# Let's take a look at the best candidate selected by AutoPilot
from IPython.display import JSON
def jsonView(obj, rootName=None):
    return JSON(obj, root=rootName, expanded=True)

bestCandidate = bike_automl_binary.describe_auto_ml_job()['BestCandidate']
display(jsonView(bestCandidate['FinalAutoMLJobObjectiveMetric'], 'FinalAutoMLJobObjectiveMetric'))

The following screenshot shows our output.

Our model achieved a validation accuracy of 96%, so we’re going to deploy it. We could add a condition such that we only use the model if the accuracy is above a certain level.

Inference pipeline

Before we deploy our model, let’s examine our best candidate and what’s happening in our pipeline. See the following code:

display(jsonView(bestCandidate['InferenceContainers'], 'InferenceContainers'))

The following diagram shows our output.

Autopilot has built the model and has packaged it in three different containers, each sequentially running a specific task: transform, predict, and reverse-transform. This multi-step inference is possible with a SageMaker inference pipeline.

A multi-step inference can also chain multiple inference models. For instance, one container can perform principal component analysis before passing the data to the XGBoost container.
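
As a rough sketch of how such a chain could be assembled by hand with the SageMaker Python SDK (the image URIs and model artifacts below are placeholders, not the containers Autopilot generated for us):

from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel

# Placeholder models: a preprocessing container followed by an XGBoost container.
preprocess_model = Model(image_uri='<preprocessing-image-uri>',
                         model_data='s3://<bucket>/<prefix>/preprocess/model.tar.gz',
                         role=role,
                         sagemaker_session=sagemaker_session)
xgb_model = Model(image_uri='<xgboost-image-uri>',
                  model_data='s3://<bucket>/<prefix>/xgb/model.tar.gz',
                  role=role,
                  sagemaker_session=sagemaker_session)

# The containers run sequentially behind a single endpoint.
pipeline_model = PipelineModel(name='example-inference-pipeline',
                               role=role,
                               models=[preprocess_model, xgb_model],
                               sagemaker_session=sagemaker_session)
# pipeline_model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')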

Deploy the inference pipeline to an endpoint

The deployment process involves just a few lines of code:

# We chose to define an endpoint name.
from datetime import datetime as dt
today = str(dt.today())[:10]
endpoint_name='binary-bike-share-' + today
endpoint = bike_automl_binary.deploy(initial_instance_count=1,
                                  instance_type='ml.m5.xlarge',
                                  endpoint_name=endpoint_name,
                                  candidate=bestCandidate,
                                  wait=True)

Let’s configure our endpoint for prediction with a predictor:

from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer
csv_serializer = CSVSerializer()
csv_deserializer = CSVDeserializer()
# Initialize the predictor
predictor = sagemaker.predictor.Predictor(endpoint_name=endpoint_name, 
                                                  sagemaker_session=sagemaker.Session(),
                                                  serializer=csv_serializer,
                                                  deserializer=csv_deserializer
                                                  )

Now that we have our endpoint and predictor ready, it’s time to use the testing data we set aside and test the accuracy of our model. We start by defining a utility function that sends the data one line at a time to our inference endpoint and gets a prediction in return. Because we have an XGBoost model, we drop the target variable before sending the CSV line to the endpoint. Additionally, we removed the header from the testing CSV before looping through the file, which is another requirement for XGBoost on SageMaker. See the following code:

# The function takes 3 arguments: the file containing the test set,
# the predictor, and finally the number of lines to send for prediction.
# The function returns a DataFrame with two columns: Inferred and Observed.
def get_inference(file, predictor, n=1):
    inferred = []
    actual = []
    with open(file, 'r') as csv_file:
        for i in range(n):
            line = csv_file.readline().split(',')
            try:
                # Remove the target variable from the csv line before predicting
                observed = line.pop(14).strip('\n')
                actual.append(observed)
            except IndexError:
                pass
            obj = ','.join(line)

            predicted = predictor.predict(obj)[0][0]
            inferred.append(predicted)
    data = {'Inferred': pd.Series(inferred), 'Observed': pd.Series(actual)}
    return pd.DataFrame(data=data)

n = testing_set.shape[0]  # The size of the testing data
inference_df = get_inference('bike_test.csv', predictor, n)

inference_df['Binary_Result'] = (inference_df['Observed'] == inference_df['Inferred'])
display(inference_df.head())

The following screenshot shows our output.

Now let’s calculate the accuracy of our model.

See the following code:

count_binary = inference_df['Binary_Result'].value_counts()
accuracy = count_binary[True]/n
print('Accuracy:', accuracy)

We get an accuracy of 92%. This is slightly lower than the 96% obtained during the validation step, but it’s still high enough. We don’t expect the accuracy to be exactly the same because the test is performed with a new dataset.

Data ingestion

We downloaded the data directly and configured it for training. In real life, you may have to send the data directly from the edge device into the data lake and have SageMaker load it directly from the data lake into the notebook.

Kinesis Data Firehose is a good option and the most straightforward way to reliably load streaming data into data lakes, data stores, and analytics tools. It can capture, transform, and load streaming data into Amazon S3 and other AWS data stores.

For our use case, we create a Kinesis Data Firehose delivery stream with a Lambda transformation function to do some lightweight data cleaning as it traverses the stream. See the following code:

# Data processing libraries
import pandas as pd  # Data processing
import numpy as np
import base64
from io import StringIO


def lambda_handler(event, context):
    output = []
    print('Received', len(event['records']), 'Records')
    for record in event['records']:

        payload = base64.b64decode(record['data']).decode('utf-8')
        df = pd.read_csv(StringIO(payload), index_col=0)

        df.drop(columns=['rental_access_method'], inplace=True)

        df['start_time'] = pd.to_datetime(df['start_time'])
        df['end_time'] = pd.to_datetime(df['end_time'])

        # Adding some day breakdown
        df = df.assign(day_of_week=df.start_time.dt.dayofweek,
                                 hour_of_day=df.start_time.dt.hour,
                                 trip_month=df.start_time.dt.month)
        # Breaking the day into 4 parts: morning, afternoon, evening, and night (the default)
        conditions = [
            (df['hour_of_day'] >= 5) & (df['hour_of_day'] < 12),
            (df['hour_of_day'] >= 12) & (df['hour_of_day'] < 18),
            (df['hour_of_day'] >= 18) & (df['hour_of_day'] < 21),
        ]
        choices = ['morning', 'afternoon', 'evening']
        df['part_of_day'] = np.select(conditions, choices, default='night')
        df.dropna(inplace=True)

        # Downsampling the majority to rebalance the data
        # We are getting about an even distribution
        df.sort_values(by='bike_share_for_all_trip', inplace=True)
        slice_point = int(df['bike_share_for_all_trip'].value_counts()['Yes'] * 2.1)
        df = df[-slice_point:]
        # The data is balanced now. Let's reshuffle the data
        df = df.sample(frac=1).reset_index(drop=True)

        data = base64.b64encode(bytes(df.to_csv(), 'utf-8')).decode("utf-8")
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': data

        }
        output.append(output_record)
    print('Returned', len(output), 'Records')
    print('Event', event)

    return {'records': output}

This Lambda function performs a light transformation of the data streamed from the devices into the data lake. It expects a CSV-formatted data file.

For the ingestion step, we download the data and simulate a data stream to Kinesis Data Firehose with a Lambda transform function and into our S3 data lake.

Let’s simulate streaming a few lines:

# Saving the data in one file.
file = '201907-baywheels-tripdata.csv' 
data.to_csv(file)

# Stream the data 'n' lines at a time.
# Only run this for a minute and stop the cell.
import random
import boto3
# Kinesis Data Firehose client (the delivery stream name below is the author's)
client = boto3.client('firehose')

def streamer(file, n):
    with open(file, 'r') as csvfile:  
        header = next(csvfile)
        data = header
        counter = 0
        loop = True
        while loop == True:
            for i in range(n):
                line = csvfile.readline()
                data+=line
                # We reached the end of the csv file.
                if line == '':
                    loop = False
            counter+=n
            # Use your kinesis streaming name
            stream = client.put_record(DeliveryStreamName='firehose12-DeliveryStream-OJYW71BPYHF2', Record={"Data": bytes(data, 'utf-8')})
            data = header
            print( file, 'HTTPStatusCode: '+ str(stream['ResponseMetadata']['HTTPStatusCode']), 'csv_lines_sent: ' + str(counter), end=' -*- ')
            
            sleep(random.randrange(1, 3))
        return
# Streaming for 500 lines at a time. You can change this number up and down.
streamer(file, 500)

# We can now load our data as a DataFrame because it’s streamed into the S3 data lake:
# Getting data from s3 location where it was streamed.
STREAMED_DATA = 's3://firehose12-deliverybucket-11z0ya3patrod/firehose/2020'
csv_uri = sagemaker.s3.S3Downloader.list(STREAMED_DATA)
in_memory_string = [sagemaker.s3.S3Downloader.read_file(file) for file in csv_uri]
in_memory_csv = [pd.read_csv(StringIO(file), index_col=0) for file in in_memory_string]
streamed_df = pd.concat(in_memory_csv)
display(streamed_df.tail())

Clean up

It’s important to delete all the resources used in this exercise to minimize cost. The following code deletes the SageMaker inference endpoint we created, as well as the training and testing data we uploaded:

# Delete the SageMaker inference endpoint
predictor.delete_endpoint()

# Delete s3 data
s3 = boto3.resource('s3')
ml_bucket = sagemaker.Session().default_bucket()
delete_data = s3.Bucket(ml_bucket).objects.filter(Prefix=prefix).delete()

Conclusion

ML engineers, data scientists, and software developers can use Autopilot to build and deploy an inference pipeline with little to no ML programming experience. Autopilot saves time and resources, using data science and ML best practices. Large organizations can now shift engineering resources away from infrastructure configuration towards improving models and solving business use cases. Startups and smaller organizations can get started on machine learning with little to no ML expertise.

We recommend learning more about other important features SageMaker has to offer, such as the Amazon SageMaker Feature Store, which integrates with Amazon SageMaker Pipelines to add feature search, discovery, and reuse to automated ML workflows. You can run multiple Autopilot simulations with different feature or target variants in your dataset. You could also approach this as a dynamic vehicle allocation problem in which your model tries to predict vehicle demand based on time (such as time of day or day of the week) or location, or a combination of both.


About the Authors

Doug Mbaya is a Senior Solutions Architect with a focus on data and analytics. Doug works closely with AWS partners, helping them integrate data and analytics solutions in the cloud. Doug’s prior experience includes supporting AWS customers in the ride sharing and food delivery segment.

Valerio Perrone is an Applied Science Manager working on Amazon SageMaker Automatic Model Tuning and Autopilot.

Read More

Boost your model’s accuracy using self-supervised learning with TensorFlow Similarity

Posted by Elie Bursztein and Owen Vallis, Google

TensorFlow Similarity now supports key self-supervised learning algorithms to help you boost your model’s accuracy when you don’t have a lot of labeled data.

Basic Self-Supervised Training.

Often when training a new machine learning classifier, we have a lot more unlabeled data, such as photos, than labeled examples. Self-supervised learning techniques aim at leveraging those unlabeled data to learn useful data representations to boost classifier accuracy via a pre-training phase on those unlabeled examples. The ability to tap into abundant unlabeled data can significantly improve model accuracy in some cases.

Perhaps the most well known example of successful self-supervised training are transformer models, such as BERT, that learn meaningful language representations by pre-training on very large quantities of text, e.g., wikipedia or the web.

Self-supervised learning can be applied to any type of data and at various data scales. For example, if you have only a few hundred labeled images, using self-supervised learning can boost your model accuracy by pre-training on a medium-sized dataset such as ImageNet. For example, SimCLR uses the ImageNet ILSVRC-2012 dataset for training the representations and then evaluates the transfer learning performance on 12 other image datasets such as CIFAR, Oxford-IIIT Pets, Food-101, and others. Self-supervised learning works at larger scales as well, where pre-training on billions of examples also improves accuracy, including for text transformers and vision transformers.

High level overview of how self-supervised learning works for images.

At its core, self-supervised learning works by contrasting two augmented “views” of the same example. The model objective is to maximize the similarity between these views to learn representations that are useful for downstream tasks, such as training a supervised classifier. In practice, after pre-training on a large corpus of unlabeled images, training an image classifier is done by adding a single softmax dense layer on top of the frozen pre-trained representation and training as usual using a small number of labeled examples.

Examples of pairs of augmented views on CIFAR10 from the hello world notebook.
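
As a minimal illustration of that final fine-tuning step, the sketch below freezes a stand-in encoder and adds a single softmax layer on top (the backbone here is an untrained placeholder for a real self-supervised pre-trained encoder, and the labeled dataset variables are hypothetical):

import tensorflow as tf

# Stand-in for an encoder pre-trained with SimCLR, SimSiam or Barlow Twins.
backbone = tf.keras.applications.ResNet50(include_top=False, pooling="avg", weights=None)
backbone.trainable = False  # freeze the pre-trained representation

# Single softmax dense layer on top of the frozen representation.
classifier = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(10, activation="softmax"),  # e.g., 10 classes for CIFAR10
])

classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])

# Train as usual on the small labeled set (x_labeled, y_labeled are placeholders).
# classifier.fit(x_labeled, y_labeled, epochs=10, batch_size=64)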

TensorFlow Similarity currently provides three key approaches for learning self-supervised representations: SimCLR, SimSiam, and Barlow Twins. All three work out of the box. TensorFlow Similarity also provides all the necessary components to implement additional forms of unsupervised learning, including callbacks, metrics, and data samplers.

You can start exploring how to leverage self-supervised learning with the hello world notebook, which demonstrates how to double the accuracy on CIFAR10.

Read More