AWS DeepRacer Evo and Sensor Kit now available for purchase

AWS DeepRacer is a fully autonomous 1/18th scale race car powered by reinforcement learning (RL) that gives machine learning (ML) developers of all skill levels the opportunity to learn and build their ML skills in a fun and competitive way. AWS DeepRacer Evo includes new features and capabilities to help you learn more about ML through the addition of sensors that enable object avoidance and head-to-head racing. Starting today, while supplies last, developers can purchase AWS DeepRacer Evo for a limited-time, discounted price of $399, a savings of $199 off the regular bundle price of $598, and the AWS DeepRacer Sensor Kit for $149, a savings of $100 off the regular price of $249. Both are available on Amazon.com for shipping in the USA only.

What is AWS DeepRacer Evo?

AWS DeepRacer Evo is the next generation in autonomous racing. It comes fully equipped with stereo cameras and a LiDAR sensor to enable object avoidance and head-to-head racing, giving you everything you need to take your racing to the next level. These additional sensors allow for the car to handle more complex environments and take actions needed for new racing experiences. In object avoidance races, you use the sensors to detect and avoid obstacles placed on the track. In head-to-head, you race against another car on the same track and try to avoid it while still turning in the best lap time.

Forward-facing left and right cameras make up the stereo cameras, which help the car learn depth information in images. It can then use this information to sense and avoid objects it approaches on the track. The backward-facing LiDAR sensor detects objects behind and beside the car.

The AWS DeepRacer Evo car, available on Amazon.com, includes the original AWS DeepRacer car, an additional 4-megapixel camera module that forms stereo vision with the original camera, a scanning LiDAR, a shell that fits both the stereo cameras and the LiDAR, and a few accessories and easy-to-use tools for quick installation. If you already own an AWS DeepRacer car, you can upgrade it to the same capabilities as AWS DeepRacer Evo with the AWS DeepRacer Sensor Kit.

AWS DeepRacer Evo under the hood

The following table summarizes the details of AWS DeepRacer Evo.

CAR 1/18th scale 4WD monster truck chassis
CPU Intel Atom™ Processor
MEMORY 4 GB RAM
STORAGE 32 GB (expandable)
WI-FI 802.11ac
CAMERA 2x 4 MP cameras with MJPEG
LIDAR 360-degree LiDAR sensor with a 12-meter scanning radius
SOFTWARE Ubuntu OS 16.04.3 LTS, Intel® OpenVINO™ toolkit, ROS Kinetic
DRIVE BATTERY 7.4V/1100mAh lithium polymer
COMPUTE BATTERY 13600 mAh USB-C PD
PORTS 4x USB-A, 1x USB-C, 1x Micro-USB, 1x HDMI
INTEGRATED SENSORS Accelerometer and Gyroscope

Getting started with AWS DeepRacer Evo

You can get your car ready to hit the track in five simple (and fun) steps. For full instructions, see Getting Started with AWS DeepRacer.

Step 1: Install the sensor kit

The first step is to set up the car by reconfiguring the sensors. The existing camera shifts to one side to allow room for the second camera to create a stereo configuration, and the LiDAR is mounted on a bracket above the battery and connects via USB between the two cameras.

Step 2: Connect and test drive

Connect any device to the same Wi-Fi network as your AWS DeepRacer car and navigate to its IP address in your browser. After you upgrade to the latest software version, use the device console to take a test drive.

Step 3: Train a model

Now it’s time to get hands-on with ML by training an RL model on the AWS DeepRacer console. To create a model using the new AWS DeepRacer Evo sensors, select the appropriate sensor configuration in Your Garage, train and evaluate the model, clone, and iterate to improve the model’s performance.
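
Under the hood, an AWS DeepRacer model is guided by a reward function that you write in Python on the console. The following is only a minimal sketch of a centerline-following reward; the parameter names (track_width, distance_from_center) follow the DeepRacer reward function interface as I understand it, so double-check them against the console's built-in examples before training.

    def reward_function(params):
        """Illustrative reward: prefer staying close to the centerline."""
        track_width = params['track_width']                     # assumed parameter name
        distance_from_center = params['distance_from_center']   # assumed parameter name

        # Three bands around the centerline, rewarded progressively less.
        marker_1 = 0.1 * track_width
        marker_2 = 0.25 * track_width
        marker_3 = 0.5 * track_width

        if distance_from_center <= marker_1:
            reward = 1.0
        elif distance_from_center <= marker_2:
            reward = 0.5
        elif distance_from_center <= marker_3:
            reward = 0.1
        else:
            reward = 1e-3  # likely off track

        return float(reward)

Cloning a model and iterating then amounts to adjusting a function like this (along with the action space and training hyperparameters) and re-training.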

Step 4: Load the model onto the device

You can download the trained model from the AWS DeepRacer console to your local computer, and then upload it to the AWS DeepRacer vehicle from the Models section of the device console.

Step 5: Start racing

Now the rubber hits the road! In the Control vehicle page on the device console, you can select autonomous driving, choose the model you want to race with, make adjustments, and choose Start vehicle to shift into gear!

Building a DIY track

Now you’re ready to race, and every race car needs a race track! For a fun activity, you can build a track for your AWS DeepRacer Evo at home.

  1. Lay down tape to mark one border of a straight track (the length varies depending on your available space).
  2. Measure a width of approximately 24”, excluding the tape borders.
  3. Lay down a parallel line of tape and match the length.
  4. Place the vehicle at one edge of the track and get ready to race!

After you build your track, you can train your model on the console and start racing. Try more challenging races by placing objects (such as a box or toy) on the track and moving them around.

For more information about building tracks, see AWS DeepRacer Track Design Templates.

Once you have the basics of racing down, you can spend more time improving your model and getting around the track with greater success.

Optimizing racing performance

Whether you want to go faster, round corners more smoothly, or stop or start faster, model optimization is the key to success in object avoidance and head-to-head racing. You can also experiment with new strategies:

  • Defensive driver – Your car is penalized whenever its position is within a certain range to any other object
  • Blocker – When your car detects a car behind it, it’s incentivized to stay in the same lane to prevent passing

The level of training complexity and time also impact the behavior of the car in different situations. Variables like the number of botcars on the training track, whether botcars are static or moving, and how often they change lanes all affect the model’s performance. There is so much more you can do to train your model and have lots of fun!
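
As a concrete illustration of the “defensive driver” idea, a reward function can subtract a penalty whenever the car gets within a safety radius of another object. The sketch below is not an official AWS example; the parameter names x, y, and objects_location reflect my reading of the sensor-enabled DeepRacer parameters and should be verified against the console documentation.

    import math

    def reward_function(params):
        """Illustrative 'defensive driver' reward: penalize proximity to objects."""
        reward = 1.0

        agent_x, agent_y = params['x'], params['y']    # assumed parameter names
        objects = params.get('objects_location', [])   # assumed: list of (x, y) positions

        SAFE_DISTANCE = 0.8  # meters; tune for your track and speed

        for obj_x, obj_y in objects:
            d = math.hypot(agent_x - obj_x, agent_y - obj_y)
            if d < SAFE_DISTANCE:
                # Penalize more the deeper the car is inside the safety radius.
                reward -= 0.5 * (1.0 - d / SAFE_DISTANCE)

        return float(max(reward, 1e-3))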

Join the race to win glory and prizes!

There are plenty of chances to compete against your fellow racers right now! Submit your model to compete in the AWS DeepRacer Virtual Circuit and try out object avoidance and head-to-head racing. Throughout the 2020 season, the number of objects and bots on the track increases, requiring you to optimize your use of sensors to top the leaderboard. Hundreds of developers have extended their ML journey by competing in object avoidance and head-to-head Virtual Circuit races in 2020 so far.

For more information about an AWS DeepRacer competition from earlier in the year, check out the F1 ProAm DeepRacer event. You can also learn more about AWS DeepRacer in upcoming AWS Summit Online events. Sign in to the AWS DeepRacer console now to learn more and start your ML journey.


About the Author

Dan McCorriston is a Senior Product Marketing Manager for AWS Machine Learning. He is passionate about technology, collaborating with developers, and creating new methods of expanding technology education. Out of the office he likes to hike, cook and spend time with his family.

Read More

Carnegie Mellon University at ICML 2020

Carnegie Mellon University is proud to present 44 papers at the 37th International Conference on Machine Learning (ICML 2020), which will be held virtually this week. CMU is also involved in organizing 5 workshops at the conference, and our faculty and researchers are giving invited talks at 6 workshops.

Here is a quick overview of the areas our researchers are working on, mirrored in the section headings below: learning theory, general machine learning, trustworthy machine learning, deep learning, applications, optimization, and probabilistic inference.

We are also proud to collaborate with many other researchers in academia and industry, as you can see from the author lists below.

Publications

Check out the full list of papers below, along with their presentation times and links to preprints and code.

Learning Theory

Familywise Error Rate Control by Interactive Unmasking
Boyan Duan (Carnegie Mellon University); Aaditya Ramdas (Carnegie Mellon University); Larry Wasserman (Carnegie Mellon University)
Learning Theory, Tue Jul 14 07:00 AM — 07:45 AM & Tue Jul 14 06:00 PM — 06:45 PM (PDT)

Stochastic Regret Minimization in Extensive-Form Games
Gabriele Farina (Carnegie Mellon University); Christian Kroer (Columbia University); Tuomas Sandholm (CMU, Strategy Robot, Inc., Optimized Markets, Inc., Strategic Machine, Inc.)
Learning Theory, Tue Jul 14 07:00 AM — 07:45 AM & Tue Jul 14 06:00 PM — 06:45 PM (PDT)

Strategyproof Mean Estimation from Multiple-Choice Questions
Anson Kahng (Carnegie Mellon University); Gregory Kehne (Carnegie Mellon University); Ariel D Procaccia (Harvard University)
Learning Theory, Tue Jul 14 08:00 AM — 08:45 AM & Tue Jul 14 07:00 PM — 07:45 PM (PDT)

On Learning Language-Invariant Representations for Universal Machine Translation
Han Zhao (Carnegie Mellon University); Junjie Hu (Carnegie Mellon University); Andrej Risteski (CMU)
Learning Theory, Wed Jul 15 05:00 AM — 05:45 AM & Wed Jul 15 04:00 PM — 04:45 PM (PDT)

Class-Weighted Classification: Trade-offs and Robust Approaches
Ziyu Xu (Carnegie Mellon University); Chen Dan (Carnegie Mellon University); Justin Khim (Carnegie Mellon University); Pradeep Ravikumar (Carnegie Mellon University)
Learning Theory, Wed Jul 15 08:00 AM — 08:45 AM & Wed Jul 15 09:00 PM — 09:45 PM (PDT)

Sparsified Linear Programming for Zero-Sum Equilibrium Finding
Brian H Zhang (Carnegie Mellon University); Tuomas Sandholm (CMU, Strategy Robot, Inc., Optimized Markets, Inc., Strategic Machine, Inc.)
Learning Theory, Thu Jul 16 06:00 AM — 06:45 AM & Thu Jul 16 06:00 PM — 06:45 PM (PDT)

Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification
Chen Dan (Carnegie Mellon University); Yuting Wei (CMU); Pradeep Ravikumar (Carnegie Mellon University)
Learning Theory, Thu Jul 16 06:00 AM — 06:45 AM & Thu Jul 16 06:00 PM — 06:45 PM (PDT)

Uniform Convergence of Rank-weighted Learning
Justin Khim (Carnegie Mellon University); Liu Leqi (Carnegie Mellon University); Adarsh Prasad (Carnegie Mellon University); Pradeep Ravikumar (Carnegie Mellon University)
Learning Theory, Thu Jul 16 07:00 AM — 07:45 AM & Thu Jul 16 06:00 PM — 06:45 PM (PDT)

Online Control of the False Coverage Rate and False Sign Rate
Asaf Weinstein (The Hebrew University of Jerusalem); Aaditya Ramdas (Carnegie Mellon University)
Online Learning, Active Learning, and Bandits, Tue Jul 14 10:00 AM — 10:45 AM & Tue Jul 14 09:00 PM — 09:45 PM (PDT)

On conditional versus marginal bias in multi-armed bandits
Jaehyeok Shin (Carnegie Mellon University); Aaditya Ramdas (Carnegie Mellon University); Alessandro Rinaldo (Carnegie Mellon University)
Online Learning, Active Learning, and Bandits, Wed Jul 15 08:00 AM — 08:45 AM & Wed Jul 15 07:00 PM — 07:45 PM (PDT)

General Machine Learning

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
Tong Yu (Carnegie Mellon University); Branislav Kveton (Google Research); Zheng Wen (DeepMind); Ruiyi Zhang (Duke University); Ole J. Mengshoel (Carnegie Mellon University)
Online Learning, Active Learning, and Bandits, Tue Jul 14 10:00 AM — 10:45 AM & Tue Jul 14 10:00 PM — 10:45 PM (PDT)

The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Alnur Ali (Stanford University); Edgar Dobriban (University of Pennsylvania); Ryan Tibshirani (Carnegie Mellon University)
Supervised Learning, Thu Jul 16 09:00 AM — 09:45 AM & Thu Jul 16 08:00 PM — 08:45 PM (PDT)

Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling
David Woodruff (CMU); Amir Zandieh (EPFL)
General Machine Learning Techniques, Tue Jul 14 02:00 PM — 02:45 PM & Wed Jul 15 03:00 AM — 03:45 AM (PDT)

InfoGAN-CR and Model Centrality: Self-supervised Model Training and Selection for Disentangling GANs [code]
Zinan Lin (Carnegie Mellon University); Kiran K Thekumparampil (University of Illinois at Urbana-Champaign); Giulia Fanti (CMU); Sewoong Oh (University of Washington)
Representation Learning, Wed Jul 15 08:00 AM — 08:45 AM & Wed Jul 15 08:00 PM — 08:45 PM (PDT)

LTF: A Label Transformation Framework for Correcting Label Shift
Jiaxian Guo (The University of Sydney); Mingming Gong (University of Melbourne); Tongliang Liu (The University of Sydney); Kun Zhang (Carnegie Mellon University); Dacheng Tao (The University of Sydney)
Transfer, Multitask and Meta-learning, Tue Jul 14 07:00 AM — 07:45 AM & Tue Jul 14 07:00 PM — 07:45 PM (PDT)

Optimizing Dynamic Structures with Bayesian Generative Search
Minh Hoang (Carnegie Mellon University); Carleton Kingsford (Carnegie Mellon University)
Transfer, Multitask and Meta-learning, Thu Jul 16 06:00 AM — 06:45 AM & Thu Jul 16 05:00 PM — 05:45 PM (PDT)

Label-Noise Robust Domain Adaptation
Xiyu Yu (Baidu Inc.); Tongliang Liu (The University of Sydney); Mingming Gong (University of Melbourne); Kun Zhang (Carnegie Mellon University); Kayhan Batmanghelich (University of Pittsburgh); Dacheng Tao (The University of Sydney)
Unsupervised and Semi-Supervised Learning, Wed Jul 15 05:00 AM — 05:45 AM & Wed Jul 15 07:00 PM — 07:45 PM (PDT)

Input-Sparsity Low Rank Approximation in Schatten Norm
Yi Li (Nanyang Technological University); David Woodruff (Carnegie Mellon University)
Unsupervised and Semi-Supervised Learning, Thu Jul 16 06:00 AM — 06:45 AM & Thu Jul 16 06:00 PM — 06:45 PM (PDT)

A Pairwise Fair and Community-preserving Approach to k-Center Clustering
Brian Brubach (University of Maryland); Darshan Chakrabarti (Carnegie Mellon University); John P Dickerson (University of Maryland); Samir Khuller (Northwestern University); Aravind Srinivasan (University of Maryland College Park); Leonidas Tsepenekas (University of Maryland, College Park)
Unsupervised and Semi-Supervised Learning, Thu Jul 16 06:00 AM — 06:45 AM & Thu Jul 16 05:00 PM — 05:45 PM (PDT)

Poisson Learning: Graph Based Semi-Supervised Learning At Very Low Label Rates
Jeff Calder (University of Minnesota); Brendan Cook (University of Minnesota); Matthew Thorpe (University of Manchester); Dejan Slepcev (Carnegie Mellon University)
Unsupervised and Semi-Supervised Learning, Thu Jul 16 07:00 AM — 07:45 AM & Thu Jul 16 06:00 PM — 06:45 PM (PDT)

Trustworthy Machine Learning

Explaining Groups of Points in Low-Dimensional Representations [code]
Gregory Plumb (CMU); Jonathan Terhorst (U-M LSA); Sriram Sankararaman (UCLA); Ameet Talwalkar (CMU)
Accountability, Transparency and Interpretability, Tue Jul 14 07:00 AM — 07:45 AM & Tue Jul 14 06:00 PM — 06:45 PM (PDT)

Overfitting in adversarially robust deep learning [code]
Leslie Rice (Carnegie Mellon University); Eric Wong (Carnegie Mellon University); Zico Kolter (Carnegie Mellon University)
Adversarial Examples, Tue Jul 14 08:00 AM — 08:45 AM & Tue Jul 14 07:00 PM — 07:45 PM (PDT)

Adversarial Robustness Against the Union of Multiple Perturbation Models
Pratyush Maini (IIT Delhi); Eric Wong (Carnegie Mellon University); Zico Kolter (Carnegie Mellon University)
Adversarial Examples, Wed Jul 15 09:00 AM — 09:45 AM & Wed Jul 15 09:00 PM — 09:45 PM (PDT)

Characterizing Distribution Equivalence and Structure Learning for Cyclic and Acyclic Directed Graphs
AmirEmad Ghassami (UIUC); Alan Yang (University of Illinois at Urbana-Champaign); Negar Kiyavash (École Polytechnique Fédérale de Lausanne); Kun Zhang (Carnegie Mellon University)
Causality, Tue Jul 14 07:00 AM — 07:45 AM & Tue Jul 14 08:00 PM — 08:45 PM (PDT)

Certified Robustness to Label-Flipping Attacks via Randomized Smoothing
Elan Rosenfeld (Carnegie Mellon University); Ezra Winston (Carnegie Mellon University); Pradeep Ravikumar (Carnegie Mellon University); Zico Kolter (Carnegie Mellon University)
Trustworthy Machine Learning, Tue Jul 14 09:00 AM — 09:45 AM & Tue Jul 14 08:00 PM — 08:45 PM (PDT)

FACT: A Diagnostic for Group Fairness Trade-offs
Joon Sik Kim (Carnegie Mellon University); Jiahao Chen (JPMorgan AI Research); Ameet Talwalkar (CMU)
Fairness, Equity, Justice, and Safety, Thu Jul 16 06:00 AM — 06:45 AM & Thu Jul 16 05:00 PM — 05:45 PM (PDT)

Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing
Sanghamitra Dutta (CMU); Dennis Wei (IBM Research); Hazar Yueksel (IBM Research); Pin-Yu Chen (IBM Research); Sijia Liu (IBM Research); Kush R Varshney (IBM Research)
Fairness, Equity, Justice, and Safety, Thu Jul 16 07:00 AM — 07:45 AM & Thu Jul 16 06:00 PM — 06:45 PM (PDT)

Deep Learning

Combining Differentiable PDE Solvers and Graph Neural Networks for Fluid Flow Prediction [code]
Filipe de Avila Belbute-Peres (Carnegie Mellon University); Thomas D. Economon (SU2 Foundation); Zico Kolter (Carnegie Mellon University)
Deep Learning – General, Wed Jul 15 05:00 AM — 05:45 AM & Wed Jul 15 04:00 PM — 04:45 PM (PDT)

Optimizing Data Usage via Differentiable Rewards
Xinyi Wang (Carnegie Mellon University); Hieu Pham (Carnegie Mellon University); Paul Michel (Carnegie Mellon University); Antonios  Anastasopoulos (Carnegie Mellon University); Jaime Carbonell (Carnegie Mellon University); Graham Neubig (Carnegie Mellon University)
Deep Learning – Algorithms, Thu Jul 16 08:00 AM — 08:45 AM & Thu Jul 16 07:00 PM — 07:45 PM (PDT)

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
Nikunj Saunshi (Princeton University); Yi Zhang (Princeton); Mikhail Khodak (Carnegie Mellon University); Sanjeev Arora (Princeton University)
Deep Learning – Theory, Tue Jul 14 10:00 AM — 10:45 AM & Tue Jul 14 09:00 PM — 09:45 PM (PDT)

Stabilizing Transformers for Reinforcement Learning
Emilio Parisotto (Carnegie Mellon University); Francis Song (DeepMind); Jack Rae (Deepmind); Razvan Pascanu (Google Deepmind); Caglar Gulcehre (DeepMind); Siddhant Jayakumar (DeepMind); Max Jaderberg (DeepMind); Raphaël Lopez Kaufman (DeepMind); Aidan Clark (DeepMind); Seb Noury (DeepMind); Matthew Botvinick (Google); Nicolas Heess (DeepMind); Raia Hadsell (Deepmind)
Reinforcement Learning – Deep RL, Wed Jul 15 05:00 AM — 05:45 AM & Wed Jul 15 04:00 PM — 04:45 PM (PDT)

Planning to Explore via Self-Supervised World Models [code]
Ramanan Sekar (University of Pennsylvania); Oleh Rybkin (University of Pennsylvania); Kostas Daniilidis (University of Pennsylvania); Pieter Abbeel (UC Berkeley); Danijar Hafner (Google); Deepak Pathak (CMU, FAIR)
Reinforcement Learning – Deep RL, Wed Jul 15 08:00 AM — 08:45 AM & Wed Jul 15 07:00 PM — 07:45 PM (PDT)

One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control [code]
Wenlong Huang (UC Berkeley); Igor Mordatch (Google); Deepak Pathak (CMU, FAIR)
Reinforcement Learning – Deep RL, Thu Jul 16 08:00 AM — 08:45 AM & Thu Jul 16 08:00 PM — 08:45 PM (PDT)

VideoOneNet: Bidirectional Convolutional Recurrent OneNet with Trainable Data Steps for Video Processing
Zoltán Á Milacski (Eötvös Loránd University); Barnabas Poczos (Carnegie Mellon University); Andras Lorincz (Eötvös Loránd University)
Sequential, Network, and Time-Series Modeling, Wed Jul 15 02:00 PM — 02:45 PM & Thu Jul 16 01:00 AM — 01:45 AM (PDT)

Applications

An EM Approach to Non-autoregressive Conditional Sequence Generation
Zhiqing Sun (Carnegie Mellon University); Yiming Yang (Carnegie Mellon University)
Applications – Language, Speech and Dialog, Tue Jul 14 08:00 AM — 08:45 AM & Tue Jul 14 07:00 PM — 07:45 PM (PDT)

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation [code]
Junjie Hu (Carnegie Mellon University); Sebastian Ruder (DeepMind); Aditya Siddhant (Google Research); Graham Neubig (Carnegie Mellon University); Orhan Firat (Google); Melvin Johnson (Google)
Applications – Language, Speech and Dialog, Tue Jul 14 10:00 AM — 10:45 AM & Tue Jul 14 09:00 PM — 09:45 PM (PDT)

Learning Factorized Weight Matrix for Joint Filtering
Xiangyu Xu (Carnegie Mellon University); Yongrui Ma (SenseTime); Wenxiu Sun (SenseTime Research)
Applications – Computer Vision, Thu Jul 16 03:00 PM — 03:45 PM & Fri Jul 17 04:00 AM — 04:45 AM (PDT)

Uncertainty-Aware Lookahead Factor Models for Quantitative Investing
Lakshay Chauhan (Euclidean Technologies); John Alberg (Euclidean Technologies LLC); Zachary Lipton (Carnegie Mellon University)
Applications – Other, Tue Jul 14 08:00 AM — 08:45 AM & Tue Jul 14 08:00 PM — 08:45 PM (PDT)

Learning Robot Skills with Temporal Variational Inference
Tanmay Shankar (Facebook AI Research); Abhinav Gupta (CMU/FAIR)
Applications – Other, Thu Jul 16 06:00 AM — 06:45 AM & Thu Jul 16 05:00 PM — 05:45 PM (PDT)

Optimization

Nearly Linear Row Sampling Algorithm for Quantile Regression
Yi Li (Nanyang Technological University); Ruosong Wang (Carnegie Mellon University); Lin Yang (UCLA); Hanrui Zhang (Duke University)
Optimization – Large Scale, Parallel and Distributed, Tue Jul 14 09:00 AM — 09:45 AM & Tue Jul 14 08:00 PM — 08:45 PM (PDT)

The Non-IID Data Quagmire of Decentralized Machine Learning
Kevin Hsieh (Microsoft Research); Amar Phanishayee (Microsoft Research); Onur Mutlu (ETH Zurich); Phillip B Gibbons (CMU)
Optimization – Large Scale, Parallel and Distributed, Wed Jul 15 08:00 AM — 08:45 AM & Wed Jul 15 09:00 PM — 09:45 PM (PDT)

Refined bounds for algorithm configuration: The knife-edge of dual class approximability
Maria-Florina Balcan (Carnegie Mellon University); Tuomas Sandholm (CMU, Strategy Robot, Inc., Optimized Markets, Inc., Strategic Machine, Inc.); Ellen Vitercik (Carnegie Mellon University)
Optimization – General, Thu Jul 16 06:00 AM — 06:45 AM & Thu Jul 16 06:00 PM — 06:45 PM (PDT)

Probabilistic Inference

Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting [code]
Niccolo Dalmasso (Carnegie Mellon University); Rafael Izbicki (UFSCar); Ann Lee (Carnegie Mellon University)
Probabilistic Inference – Models and Probabilistic Programming, Tue Jul 14 07:00 AM — 07:45 AM & Tue Jul 14 06:00 PM — 06:45 PM (PDT)

Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models [code]
Rares-Darius Buhai (MIT); Yoni Halpern (Google); Yoon Kim (Harvard University); Andrej Risteski (CMU); David Sontag (MIT)
Probabilistic Inference – Models and Probabilistic Programming, Thu Jul 16 06:00 AM — 06:45 AM & Thu Jul 16 05:00 PM — 05:45 PM (PDT)

Workshops

Check out the full list of organized workshops below, along with their times and links to the program.

Invited Speakers

Participatory Approaches to Machine Learning
Alexandra Chouldechova (CMU)
Fri Jul 17 06:00 AM — 01:45 PM (PDT)

Workshop on AI for Autonomous Driving
Drew Bagnell (Aurora and CMU)
Fri Jul 17 05:00 AM — 03:00 PM (PDT)

Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond
Zico Kolter (CMU)
Sat Jul 18 05:50 AM — 02:30 PM (PDT)

Real World Experiment Design and Active Learning
Aaditya Ramdas (CMU)
Sat Jul 18 07:00 AM — 03:35 PM (PDT)

Incentives in Machine Learning
Nihar Shah (CMU)
Sat Jul 18 08:00 AM — 11:00 AM (PDT)

2nd ICML Workshop on Human in the Loop Learning (HILL)
Christian Lebiere (CMU); Pradeep Ravikumar, (CMU)
Sat Jul 18 11:00 AM — 03:00 PM (PDT)

Organizers

Federated Learning for User Privacy and Data Confidentiality
Nathalie Baracaldo (IBM Research Almaden, USA); Olivia Choudhury (Amazon, USA); Gauri Joshi (Carnegie Mellon University, USA); Ramesh Raskar (MIT Media Lab, USA); Shiqiang Wang (IBM T. J. Watson Research Center, USA); Han Yu (Nanyang Technological University, Singapore)
Sat Jul 18 05:45 AM — 02:35 PM (PDT)

MLRetrospectives: A Venue for Self-Reflection in ML Research
Ryan Lowe (Mila / McGill University); Jessica Forde (Brown University); Jesse Dodge (CMU); Mayoore Jaiswal (IBM Research); Rosanne Liu (Uber AI Labs); Joelle Pineau (Mila / McGill University / Facebook AI); Yoshua Bengio (Mila)
Sat Jul 18 05:50 AM — 02:30 PM (PDT)

Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond
Jian Tang (HEC Montreal & MILA); Le Song (Georgia Institute of Technology); Jure Leskovec (Stanford University); Renjie Liao (University of Toronto); Yujia Li (DeepMind); Sanja Fidler (University of Toronto, NVIDIA); Richard Zemel (U Toronto); Ruslan Salakhutdinov (CMU)
Sat Jul 18 05:50 AM — 02:30 PM (PDT)

Workshop on Learning in Artificial Open Worlds
William H. Guss (CMU and OpenAI); Katja Hofmann (Microsoft); Brandon Houghton (CMU and OpenAI); Noburu (Sean) Kuno (Microsoft); Ruslan Salakhutdinov (CMU); Kavya Srinet (Facebook AI Research); Arthur Szlam (Facebook AI Research)
Sat Jul 18 07:00 AM — 02:00 PM (PDT)

Real World Experiment Design and Active Learning
Ilija Bogunovic (ETH Zurich); Willie Neiswanger (Carnegie Mellon University); Yisong Yue (Caltech)
Sat Jul 18 07:00 AM — 03:35 PM (PDT)

Read More

Letting robots manipulate cables

For humans, it can be challenging to manipulate thin flexible objects like ropes, wires, or cables. But if these problems are hard for humans, they are nearly impossible for robots. As a cable slides between the fingers, its shape is constantly changing, and the robot’s fingers must be constantly sensing and adjusting the cable’s position and motion.

Standard approaches have used a series of slow and incremental deformations, as well as mechanical fixtures, to get the job done. Recently, a group of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) pursued the task from a different angle, in a manner that more closely mimics us humans. The team’s new system uses a pair of soft robotic grippers with high-resolution tactile sensors (and no added mechanical constraints) to successfully manipulate freely moving cables.

One could imagine using a system like this for both industrial and household tasks, to one day enable robots to help us with things like tying knots, wire shaping, or even surgical suturing. 

The team’s first step was to build a novel two-fingered gripper. The opposing fingers are lightweight and quick moving, allowing nimble, real-time adjustments of force and position. On the tips of the fingers are vision-based “GelSight” sensors, built from soft rubber with embedded cameras. The gripper is mounted on a robot arm, which can move as part of the control system.

The team’s second step was to create a perception-and-control framework to allow cable manipulation. For perception, they used the GelSight sensors to estimate the pose of the cable between the fingers, and to measure the frictional forces as the cable slides. Two controllers run in parallel: one modulates grip strength, while the other adjusts the gripper pose to keep the cable within the gripper.
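
The researchers have not released this control code here, but the structure they describe (tactile perception feeding two parallel controllers) can be sketched roughly as follows. Everything in this snippet, from the proportional-controller stand-ins to the numeric setpoints, is hypothetical and only meant to make the architecture concrete.

    from dataclasses import dataclass

    @dataclass
    class PController:
        """Minimal proportional controller used as a stand-in for the real policies."""
        gain: float
        setpoint: float

        def update(self, measurement: float) -> float:
            return self.gain * (self.setpoint - measurement)

    # Two controllers running in parallel, as described above (all numbers are made up).
    grip_controller = PController(gain=0.5, setpoint=0.2)  # target friction force (arbitrary units)
    pose_controller = PController(gain=1.0, setpoint=0.0)  # keep the cable centered at offset 0

    def control_step(friction_estimate: float, cable_offset: float):
        """One step of a hypothetical cable-following loop.

        friction_estimate and cable_offset would come from the GelSight sensors;
        the returned commands would go to the gripper (force) and the arm (pose).
        """
        grip_force_delta = grip_controller.update(friction_estimate)
        pose_correction = pose_controller.update(cable_offset)
        return grip_force_delta, pose_correction

    # Example: the cable has drifted 5 mm off-center and friction is slightly low.
    print(control_step(friction_estimate=0.15, cable_offset=0.005))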

When mounted on the arm, the gripper could reliably follow a USB cable starting from a random grasp position. Then, in combination with a second gripper, the robot can move the cable “hand over hand” (as a human would) in order to find the end of the cable. It could also adapt to cables of different materials and thicknesses.

As a further demo of its prowess, the robot performed an action that humans routinely do when plugging earbuds into a cell phone. Starting with a free-floating earbud cable, the robot was able to slide the cable between its fingers, stop when it felt the plug touch its fingers, adjust the plug’s pose, and finally insert the plug into the jack. 

“Manipulating soft objects is so common in our daily lives, like cable manipulation, cloth folding, and string knotting,” says Yu She, MIT postdoc and lead author on a new paper about the system. “In many cases, we would like to have robots help humans do this kind of work, especially when the tasks are repetitive, dull, or unsafe.” 

String me along 

Cable following is challenging for two reasons. First, it requires controlling the “grasp force” (to enable smooth sliding), and the “grasp pose” (to prevent the cable from falling from the gripper’s fingers).  

This information is hard to capture from conventional vision systems during continuous manipulation, because it’s usually occluded, expensive to interpret, and sometimes inaccurate. 

What’s more, this information can’t be directly observed with just vision sensors, hence the team’s use of tactile sensors. The gripper’s joints are also flexible — protecting them from potential impact. 

The algorithms can also be generalized to different cables with various physical properties like material, stiffness, and diameter, and also to those at different speeds. 

When comparing different controllers applied to the team’s gripper, their control policy retained the cable in hand over longer distances than three alternatives. For example, the “open-loop” controller followed only 36 percent of the total length; it easily lost the cable when it curved and needed many regrasps to finish the task.

Looking ahead 

The team observed that it was difficult to pull the cable back when it reached the edge of the finger, because of the convex surface of the GelSight sensor. Therefore, they hope to improve the finger-sensor shape to enhance the overall performance. 

In the future, they plan to study more complex cable manipulation tasks such as cable routing and cable inserting through obstacles, and they want to eventually explore autonomous cable manipulation tasks in the auto industry.

Yu She wrote the paper alongside MIT PhD students Shaoxiong Wang, Siyuan Dong, and Neha Sunil; Alberto Rodriguez, MIT associate professor of mechanical engineering; and Edward Adelson, the John and Dorothy Wilson Professor in the MIT Department of Brain and Cognitive Sciences.

Read More

Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

Many neural network architectures that underlie various artificial intelligence systems today bear an interesting similarity to the early computers a century ago.
Just as early computers were specialized circuits for specific purposes like solving linear systems or cryptanalysis, so too does the trained neural network generally function as a specialized circuit for performing a specific task, with all parameters coupled together in the same global scope.

One might naturally wonder what it might take for learning systems to scale in complexity in the same way as programmed systems have.
And if the history of how abstraction enabled computer science to scale gives any indication, one possible place to start would be to consider what it means to build complex learning systems at multiple levels of abstraction, where each level of learning is the emergent consequence of learning from the layer below.

This post discusses our recent paper that introduces a framework for societal decision-making, a perspective on reinforcement learning through the lens of a self-organizing society of primitive agents.
We prove the optimality of an incentive mechanism for engineering the society to optimize a collective objective.
Our work also provides suggestive evidence that the local credit assignment scheme of the decentralized reinforcement learning algorithms we develop to train the society facilitates more efficient transfer to new tasks.

Stanford AI Lab Papers and Talks at ICML 2020

The International Conference on Machine Learning (ICML) 2020 is being hosted virtually from July 13th – July 18th. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

List of Accepted Papers

Active World Model Learning in Agent-rich Environments with Progress Curiosity


Authors: Kuno Kim, Megumi Sano, Julian De Freitas, Nick Haber, Daniel Yamins

Contact: khkim@cs.stanford.edu

Links: Video

Keywords: curiosity, active learning, world models, animacy, attention


Graph Structure of Neural Networks


Authors: Jiaxuan You, Jure Leskovec, Kaiming He, Saining Xie

Contact: jiaxuan@stanford.edu

Keywords: neural network design, network science, deep learning


A Distributional Framework For Data Valuation


Authors: Amirata Ghorbani, Michael P. Kim, James Zou

Contact: jamesz@stanford.edu

Links: Paper

Keywords: shapley value, data valuation, machine learning, data markets


A General Recurrent State Space Framework for Modeling Neural Dynamics During Decision-Making


Authors: David Zoltowski, Jonathan Pillow, Scott Linderman

Contact: scott.linderman@stanford.edu

Links: Paper

Keywords: computational neuroscience, dynamical systems, variational inference


An Imitation Learning Approach for Cache Replacement


Authors: Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn

Contact: evanliu@cs.stanford.edu

Links: Paper

Keywords: imitation learning, cache replacement, benchmark


An Investigation of Why Overparameterization Exacerbates Spurious Correlations


Authors: Shiori Sagawa*, Aditi Raghunathan*, Pang Wei Koh*, Percy Liang

Contact: ssagawa@cs.stanford.edu

Links: Paper

Keywords: robustness, spurious correlations, overparameterization


Better Depth-Width Trade-offs for Neural Networks through the Lens of Dynamical Systems.


Authors: Vaggos Chatziafratis, Sai Ganesh Nagarajan, Ioannis Panageas

Contact: vaggos@cs.stanford.edu

Links: Paper

Keywords: expressivity, depth, width, dynamical systems


Bridging the Gap Between f-GANs and Wasserstein GANs


Authors: Jiaming Song, Stefano Ermon

Contact: jiaming.tsong@gmail.com

Links: Paper

Keywords: gans, generative models, f-divergence, wasserstein distance


Concept Bottleneck Models


Authors: Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang

Contact: pangwei@cs.stanford.edu

Links: Paper

Keywords: concepts, intervention, interpretability


Domain Adaptive Imitation Learning


Authors: Kuno Kim, Yihong Gu, Jiaming Song, Shengjia Zhao, Stefano Ermon

Contact: khkim@cs.stanford.edu

Links: Paper

Keywords: imitation learning, domain adaptation, reinforcement learning, generative adversarial networks, cycle consistency


Encoding Musical Style with Transformer Autoencoders


Authors: Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel

Contact: kechoi@cs.stanford.edu

Links: Paper | Blog Post | Video

Keywords: sequential, network, and time-series modeling; applications – music


Fair Generative Modeling via Weak Supervision


Authors: Kristy Choi, Aditya Grover, Trisha Singh, Rui Shu, Stefano Ermon

Contact: kechoi@cs.stanford.edu

Links: Paper | Video

Keywords: deep learning – generative models and autoencoders; fairness, equity, justice, and safety


Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods


Authors: Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré

Contact: danfu@cs.stanford.edu

Links: Paper | Blog Post | Video

Keywords: weak supervision, latent variable models


Flexible and Efficient Long-Range Planning Through Curious Exploration


Authors: Aidan Curtis, Minjian Xin, Dilip Arumugam, Kevin Feigelis, Daniel Yamins

Contact: yamins@stanford.edu

Links: Paper | Blog Post | Video

Keywords: planning, deep learning, sparse reinforcement learning, curiosity


FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis


Authors: Aman Sinha, Matthew O’Kelly, Hongrui Zheng, Rahul Mangharam, John Duchi, Russ Tedrake

Contact: amans@stanford.edu, mokelly@seas.upenn.edu

Links: Paper | Video

Keywords: distributional robustness, online learning, autonomous driving, reinforcement learning, simulation, mcmc


Goal-Aware Prediction: Learning to Model what Matters


Authors: Suraj Nair, Silvio Savarese, Chelsea Finn

Contact: surajn@stanford.edu

Links: Paper

Keywords: reinforcement learning, visual planning, robotics


Graph-based, Self-Supervised Program Repair from Diagnostic Feedback


Authors: Michihiro Yasunaga, Percy Liang

Contact: myasu@cs.stanford.edu

Links: Paper | Blog Post | Video

Keywords: program repair, program synthesis, self-supervision, pre-training, graph


Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions


Authors: Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Anthony Celi, Emma Brunskill, Finale Doshi-Velez

Contact: gottesman@fas.harvard.edu

Links: Paper

Keywords: reinforcement learning, off-policy evaluation, interpretability


Learning Near Optimal Policies with Low Inherent Bellman Error


Authors: Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

Contact: zanette@stanford.edu

Links: Paper

Keywords: reinforcement learning, exploration, function approximation


Maximum Likelihood With Bias-Corrected Calibration is Hard-To-Beat at Label Shift Domain Adaptation


Authors: Amr Alexandari*, Anshul Kundaje†, Avanti Shrikumar*† (*co-first †co-corresponding)

Contact: avanti@cs.stanford.edu, amr.alexandari@gmail.com, akundaje@stanford.edu

Links: Paper | Blog Post | Video

Keywords: domain adaptation, label shift, calibration, maximum likelihood


NGBoost: Natural Gradient Boosting for Probabilistic Prediction


Authors: Tony Duan*, Anand Avati*, Daisy Yi Ding, Sanjay Basu, Andrew Ng, Alejandro Schuler

Contact: avati@cs.stanford.edu

Links: Paper

Keywords: gradient boosting, uncertainty estimation, natural gradient


On the Expressivity of Neural Networks for Deep Reinforcement Learning


Authors: Kefan Dong, Yuping Luo, Tianhe Yu, Chelsea Finn, Tengyu Ma

Contact: kefandong@gmail.com

Links: Paper

Keywords: reinforcement learning


On the Generalization Effects of Linear Transformations in Data Augmentation


Authors: Sen Wu, Hongyang Zhang, Gregory Valiant, Christopher Ré

Contact: senwu@cs.stanford.edu

Links: Paper | Blog Post | Video

Keywords: data augmentation, generalization


Predictive Coding for Locally-Linear Control


Authors: Rui Shu*, Tung Nguyen*, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung Bui

Contact: ruishu@stanford.edu

Links: Paper | Video

Keywords: representation learning, information theory, generative models, planning, control


Robustness to Spurious Correlations via Human Annotations


Authors: Megha Srivastava, Tatsunori Hashimoto, Percy Liang

Contact: megha@cs.stanford.edu

Links: Paper

Keywords: robustness, distribution shift, crowdsourcing, human-in-the-loop


Sample Amplification: Increasing Dataset Size even when Learning is Impossible


Authors: Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant

Contact: shivamgarg@stanford.edu

Links: Paper | Video

Keywords: learning theory, sample amplification, generative models


Scalable Identification of Partially Observed Systems with Certainty-Equivalent EM


Authors: Kunal Menda, Jean de Becdelièvre, Jayesh K. Gupta, Ilan Kroo, Mykel J. Kochenderfer, Zachary Manchester

Contact: kmenda@stanford.edu

Links: Paper | Video

Keywords: system identification; time series and sequence models


The Implicit and Explicit Regularization Effects of Dropout


Authors: Colin Wei, Sham Kakade, Tengyu Ma

Contact: colinwei@stanford.edu

Links: Paper

Keywords: dropout, deep learning theory, implicit regularization


Training Deep Energy-Based Models with f-Divergence Minimization


Authors: Lantao Yu, Yang Song, Jiaming Song, Stefano Ermon

Contact: lantaoyu@cs.stanford.edu

Links: Paper

Keywords: energy-based models; f-divergences; deep generative models


Two Routes to Scalable Credit Assignment without Weight Symmetry


Authors: Daniel Kunin*, Aran Nayebi*, Javier Sagastuy-Brena*, Surya Ganguli, Jonathan M. Bloom, Daniel L. K. Yamins

Contact: jvrsgsty@stanford.edu

Links: Paper | Video

Keywords: learning rules, computational neuroscience, machine learning


Understanding Self-Training for Gradual Domain Adaptation


Authors: Ananya Kumar, Tengyu Ma, Percy Liang

Contact: ananya@cs.stanford.edu

Links: Paper | Video

Keywords: domain adaptation, self-training, semi-supervised learning


Understanding and Mitigating the Tradeoff between Robustness and Accuracy


Authors: Aditi Raghunathan*, Sang Michael Xie*, Fanny Yang, John C. Duchi, Percy Liang

Contact: aditir@stanford.edu, xie@cs.stanford.edu

Links: Paper | Video

Keywords: adversarial examples, adversarial training, robustness, accuracy, tradeoff, robust self-training


Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling


Authors: Yao Liu, Pierre-Luc Bacon, Emma Brunskill

Contact: yaoliu@stanford.edu

Links: Paper

Keywords: reinforcement learning, off-policy evaluation, importance sampling


Visual Grounding of Learned Physical Models


Authors: Yunzhu Li, Toru Lin*, Kexin Yi*, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba

Contact: liyunzhu@mit.edu

Links: Paper | Video

Keywords: intuitive physics, visual grounding, physical reasoning


Learning to Simulate Complex Physics with Graph Networks


Authors: Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, Peter W. Battaglia

Contact: rexying@stanford.edu

Links: Paper

Keywords: simulation, graph neural networks


Coresets for Data-Efficient Training of Machine Learning Models


Authors: Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec

Contact: baharanm@cs.stanford.edu

Links: Paper | Video

Keywords: Coresets, Data-efficient training, Submodular optimization, Incremental gradient methods


Which Tasks Should be Learned Together in Multi-Task Learning


Authors: Trevor Standley, Amir Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

Contact: tstand@cs.stanford.edu

Links: Paper | Video

Keywords: machine learning, multi-task learning, computer vision


Accelerated Message Passing for Entropy-Regularized MAP Inference



Contact: jnl@stanford.edu

Links: Paper

Keywords: graphical models, map inference, message passing, optimization


We look forward to seeing you at ICML 2020!

Read More

Grounding Natural Language Instructions to Mobile UI Actions

Posted by Yang Li, Research Scientist, Google Research

Mobile devices offer a myriad of functionalities that can assist in everyday activities. However, many of these functionalities are not easily discoverable or accessible to users, forcing users to look up how to perform a specific task — how to turn on the traffic mode in Maps or change notification settings in YouTube, for example. While searching the web for detailed instructions for these questions is an option, it is still up to the user to follow these instructions step-by-step and navigate UI details through a small touchscreen, which can be tedious and time consuming, and results in reduced accessibility. What if one could design a computational agent to turn these language instructions into actions and automatically execute them on the user’s behalf?

In “Mapping Natural Language Instructions to Mobile UI Action Sequences”, published at ACL 2020, we present the first step towards addressing the problem of automatic action sequence mapping, creating three new datasets used to train deep learning models that ground natural language instructions to executable mobile UI actions. This work lays the technical foundation for task automation on mobile devices that would alleviate the need to maneuver through UI details, which may be especially valuable for users who are visually or situationally impaired. We have also open-sourced our model code and data pipelines through our GitHub repository, in order to spur further developments among the research community.

Constructing Language Grounding Models
People often provide one another with instructions in order to coordinate joint efforts and accomplish tasks involving complex sequences of actions, for example, following a recipe to bake a cake, or having a friend walk you through setting up a home network. Building computational agents able to help with similar interactions is an important goal that requires true language grounding in the environments in which the actions take place.

The learning task addressed here is to predict a sequence of actions for a mobile platform given a set of instructions, a sequence of screens produced as the system transitions from one screen to another, as well as the set of interactive elements on those screens. Training such a model end-to-end would require paired language-action data, which is difficult to acquire at a large scale.

Instead, we deconstruct the problem into two sequential steps: an action phrase-extraction step and a grounding step.

The workflow of grounding language instructions to executable actions.

The action phrase-extraction step identifies the operation, object and argument descriptions from multi-step instructions using a Transformer model with area attention for representing each description phrase. Area attention allows the model to attend to a group of adjacent words in the instruction (a span) as a whole for decoding a description.

The action phrase extraction model takes a word sequence of a natural language instruction and outputs a sequence of spans (denoted in red boxes) that indicate the phrases describing the operation, the object and the argument of each action in the task.

Next, the grounding step matches the extracted operation and object descriptions with a UI object on the screen. Again, we use a Transformer model, but in this case, it contextually represents UI objects and grounds object descriptions to them.

The grounding model takes the extracted spans as input and grounds them to executable actions, including the object an action is applied to, given the UI screen at each step during execution.
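
To make the two-step pipeline concrete, here is a toy sketch of the grounding idea: represent the extracted object-description span and each on-screen UI object as vectors, then score them by similarity. The real system uses Transformer encoders (with area attention on the instruction side); the random embeddings, example phrase, and UI object names below are all made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy word embeddings standing in for the Transformer encoders in the paper.
    vocab = ["wifi", "toggle", "button", "network", "settings"]
    emb = {w: rng.normal(size=16) for w in vocab}

    def encode(tokens):
        # Mean-pool token embeddings; the paper uses contextual representations instead.
        return np.mean([emb[t] for t in tokens], axis=0)

    # Output of the (assumed) phrase-extraction step: an object-description span.
    object_phrase = ["wifi", "toggle"]

    # UI objects on the current screen, each with a short text description.
    ui_objects = {
        "switch_wifi": ["wifi", "toggle", "button"],
        "item_network": ["network", "settings"],
    }

    # Grounding step: score each UI object against the phrase and normalize with softmax.
    phrase_vec = encode(object_phrase)
    scores = np.array([encode(desc) @ phrase_vec for desc in ui_objects.values()])
    probs = np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()
    best = list(ui_objects)[int(np.argmax(probs))]
    print(best, dict(zip(ui_objects, probs.round(3))))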

Results
To investigate the feasibility of this task and the effectiveness of our approach, we construct three new datasets to train and evaluate our model. The first dataset includes 187 multi-step English instructions for operating Pixel phones along with their corresponding action-screen sequences, and enables assessment of full task performance on naturally occurring instructions, which is used for testing end-to-end grounding quality. For action phrase extraction training and evaluation, we obtain English “how-to” instructions that can be found abundantly on the web and annotate phrases that describe each action. To train the grounding model, we synthetically generate 295K single-step commands to UI actions, covering 178K different UI objects across 25K mobile UI screens from a public Android UI corpus.

A Transformer with area attention obtains 85.56% accuracy for predicting span sequences that completely match the ground truth. The phrase extractor and grounding model together obtain 89.21% partial and 70.59% complete accuracy for matching ground-truth action sequences on the more challenging task of mapping language instructions to executable actions end-to-end. We also evaluated alternative methods and representations of UI objects, such as using a graph convolutional network (GCN) or a feedforward network, and found those that can represent an object contextually in the screen lead to better grounding accuracy. The new datasets, models and results provide an important first step on the challenging problem of grounding natural language instructions to mobile UI actions.

Conclusion
This research, and language grounding in general, is an important step for translating multi-stage instructions into actions on a graphical user interface. Successful application of task automation to the UI domain has the potential to significantly improve accessibility, where language interfaces might help individuals who are visually impaired perform tasks with interfaces that are predicated on sight. This also matters for situational impairment when one cannot access a device easily while encumbered by tasks at hand.

By deconstructing the problem into action phrase extraction and language grounding, progress on either can improve full task performance and it alleviates the need to have language-action paired datasets, which are difficult to collect at scale. For example, action span extraction is related to both semantic role labeling and extraction of multiple facts from text and could benefit from innovations in span identification and multitask learning. Reinforcement learning that has been applied in previous grounding work may help improve out-of-sample prediction for grounding in UIs and improve direct grounding from hidden state representations. Although our datasets were based on Android UIs, our approach can be applied generally to instruction grounding on other user interface platforms. Lastly, our work provides a technical foundation for investigating user experiences in language-based human computer interaction.

Acknowledgements
Many thanks to my collaborators on this work at Google Research. Xin Zhou and Jiacong He contributed substantially to the data pipelines and the creation of the datasets. Yuan Zhang and Jason Baldridge provided much valuable advice for the project and contributed to the presentation of the work. Gang Li provided generous help for creating open-source datasets. Many thanks to Ashwin Kakarla, Muqthar Mohammad and Mohd Majeed for their help with the annotations.

Read More

TensorFlow 2 meets the Object Detection API

Posted by Vivek Rathod and Jonathan Huang, Google Research


At the TF Dev Summit earlier this year, we mentioned that we are making more of the TF ecosystem compatible so your favorite libraries and models work with TF 2.x. Today we are happy to announce that the TF Object Detection API (OD API) officially supports TensorFlow 2!

Over the last year we’ve been migrating our TF Object Detection API models to be TensorFlow 2 compatible. If you are a frequent visitor to the Object Detection API GitHub repository, you may have already seen bits and pieces of these new models. Our codebase offers tight Keras integration, access to distribution strategies, easy debugging with eager execution; all the goodies that one might expect from a TensorFlow 2 codebase. Specifically, this release includes:

  • New binaries for train/eval/export that are eager mode compatible.
  • A suite of TF2 compatible (Keras-based) models; this includes migrations of our most popular TF1 models (e.g., SSD with MobileNet, RetinaNet, Faster R-CNN, Mask R-CNN), as well as a few new architectures for which we will only maintain TF2 implementations: (1) CenterNet – a simple and effective anchor-free architecture based on the recent Objects as Points paper by Zhou et al, and (2) EfficientDet — a recent family of SOTA models discovered with the help of Neural Architecture Search.
  • COCO pre-trained weights for all of the models provided as TF2-style object-based checkpoints.
  • Access to DistributionStrategies for distributed training: traditionally, we have mainly relied on asynchronous training for our TF1 models. We now support synchronous training as the primary strategy; our TF2 models are designed to be trainable using sync multi-GPU and TPU platforms.
  • Colab demonstrations of eager mode compatible few-shot training and inference.
  • First-class support for keypoint estimation, including multi-class estimation, more data augmentation support, better visualizations, and COCO evaluation.

If you’d like to get your feet wet immediately, we recommend checking out our shiny new Colab demos (for inference and few-shot training). As a fun example, we’ve included a tutorial demonstrating how to train a rubber ducky detector using fine-tuning based few-shot training (with just five example images!).
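
For reference, inference with an exported TF2 detection model typically comes down to a few lines. The path below is a placeholder, and the output dictionary keys (detection_boxes, detection_scores, detection_classes) are the ones I recall from the OD API's standard export signature, so verify them against the Colab demos.

    import numpy as np
    import tensorflow as tf

    # Placeholder path to a model exported with the TF2 export binary.
    detect_fn = tf.saved_model.load("exported_model/saved_model")

    # Dummy 640x640 RGB image; in practice, load a real image as uint8.
    image = np.zeros((640, 640, 3), dtype=np.uint8)
    input_tensor = tf.convert_to_tensor(image)[tf.newaxis, ...]  # add a batch dimension

    detections = detect_fn(input_tensor)

    # Output keys assumed from the OD API's usual SavedModel signature.
    boxes = detections["detection_boxes"][0].numpy()
    scores = detections["detection_scores"][0].numpy()
    classes = detections["detection_classes"][0].numpy().astype(int)

    for box, score, cls in zip(boxes, scores, classes):
        if score > 0.5:
            print(f"class {cls} at {box} (score {score:.2f})")
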
Our philosophy for this migration was to expose all the benefits of TF2 and Keras, while continuing to support our wide user base still using TF1. We believe that there might be many teams out there grappling with similar migration projects, so we thought that a few words about our thought process and approach here might be useful even for non object-detection TensorFlow users.

Users of our codebase now belong to three categories: (1) new users who want to leverage new features (eager mode training, Distribution Strategies) and new models, (2) existing TF1 users who want to migrate to TF2, and (3) existing TF1 users who prefer not to migrate just yet. To support all three categories of users, we have followed a number of strategies detailed below:

  • Refactor low-level core and meta-architecture to work in both TF1 and TF2. We realized most of our codebase could be shared across TF1 and TF2 (e.g., bounding box arithmetic, loss functions, input pipelines, visualization code, etc); where possible, we’ve tried to ensure that our code is agnostic about whether it is run under TF1 or TF2.
  • Treat feature extractors/backbones as being specific to either TF1 or TF2. We continue to maintain our TF1 backbones which are implemented in tf-slim, and introduce TF2 backbones implemented in Keras. Then depending on the version of TensorFlow that a user is running, these models will be either enabled or disabled.
  • Leverage community-maintained existing backbone implementations. Instead of re-implementing backbone architectures (e.g. MobileNet or ResNet) in Keras, our models depend on implementations in the Keras applications collection – a set of community-maintained canned architectures. We have also verified that our new Keras backbones maintain or surpass the accuracy of comparable tf-slim backbones (at least for the models that were already in the OD API).
  • Increase unit test coverage to cover GPU/TPU, TF1 and TF2. Given that we now need to ensure functionality on multiple platforms (GPU and TPU) as well as across TF versions, we’ve designed a new and flexible unit testing framework that tests OD API functions under all four settings ({GPU, TPU}x{TF1, TF2}), while allowing for certain tests to be disabled (e.g. input pipelines are not tested on TPU)
  • Separate front-end binaries (training loops, exporters) for TF1 and TF2. We have added a separate entry point for TF2 models (in the form of new TF2 training and export binaries) which can be run in eager mode, leveraging various DistributionStrategies.
  • No changes to the frontend config language. In order to make migration from TF1 to TF2 as easy as possible for our users, we’ve worked hard to ensure that model specifications using OD API’s config language produce equivalent model architectures in both TF1 and TF2 and that models can be trained to the same level of numerical performance under both TF versions. As an example, if you have an existing ResNet-50 based RetinaNet model config that is trainable using TF1 binaries, then to train the same model with TF2 binaries, you would simply change the name of the feature extractor in the config (in this case from ssd_resnet50_v1_fpn to ssd_resnet50_v1_fpn_keras); all other hyperparameter specifications would remain unchanged.

This release is just one example of making the TF ecosystem TF2 compatible and easier to use. Over the next few months, we will continue to migrate large-scale codebases from TF1 to TF2. In addition, we are working to provide a more integrated, end-to-end experience in the TF ecosystem for researchers looking for easy-to-use modeling, starting with a unified computer vision library coming soon.

As always, please feel free to reach out with questions and feedback via GitHub. We appreciate help from the open source community. In particular, if you are a prior TF1.x user of the TensorFlow Object Detection API and there is a feature that you really like that you don’t see supported in the TF2 pipelines, we encourage you to let us know as this may help us to prioritize as we continue to release features/models.

Acknowledgements

This release is the result of a close collaboration among a number of teams within Google Research. In particular we want to highlight the contributions of the following individuals: first, a special thanks to Tomer Kaftan and Yanhui Liang for initiating this entire effort and doing most of the early heavy lifting. We also thank our main OD API contributors: Vighnesh Birodkar, Ronny Votel, Zhichao Lu, Yu-hui Chen, Sergi Caelles Prat, Jordi Pont-Tuset, Austin Myers. We are also grateful to many other contributors including: Sudheendra Vijayanarasimhan, Sara Beery, Shan Yang, Anjali Sridhar, Kathy Ruan, Karmel Allison, Allen Lavoie, Lu He, Yixin Shi, Derek Chow, David Ross, Pengchong Jin, Jaeyoun Kim, Jing Li, Mingxing Tan, Dan Kondratyuk, Kaushik Shivakumar, Yiming Shi and Tina Tian. Finally we also thank our interns and summer of code students for their contributions: Kathy Ruan, Kaushik Shivakumar, Yiming Shi, Vishnu Banna, Akhil Chinnakotla, and Anirudh Vegesana.

Read More

Tools for language access during COVID-19

Translation services make it easier to communicate with someone who doesn’t speak the same language, whether you’re traveling abroad or living in a new country. But in the context of a global pandemic, government and health officials urgently need to deliver vital information to their communities, and every member of the community needs access to information in a language they understand. In the U.S. alone, that means reaching 51 million migrants in at least 350 languages, with information ranging from how to keep people and their families safe, to financial, employment or food resources.

To better understand the challenges in addressing these translation needs, we conducted a research study, and interviewed health and government officials responsible for disseminating critical information. We assessed the current shortcomings in providing this information in the relevant languages, and how translation tools could help mitigate them.

The struggle for language access 

When organizations—from health departments to government agencies—update information on a website, it needs to be quickly accessible in a wide variety of languages. We learned that these organizations are struggling to keep up with the high volume of rapidly-changing content and lack the resources to translate this content into the needed languages. 

Officials, who are already spread thin, can barely keep up with the many updates surrounding COVID-19—from the evolving scientific understanding, to daily policy amendments, to new resources for the public. Nearly all new information is coming in as PDFs several times a day, and many officials report not being able to offer professional translation for all needed languages. This is where machine translation can serve as a useful tool.  

How machine translation can help

Machine translation is an automated way to translate text or speech from one language to another. It can take volumes of data and provide translations into a large number of supported languages. Although not intended to fully replace human translators, it can provide value when immediate translations are needed for a wide variety of languages.

If you’re looking to translate content on the web, you have several options.

Use your browser

Many popular browsers offer translation capabilities, which are either built in (e.g. Chrome) or require installing an add-on or extension (e.g. Microsoft Edge or Firefox). To translate web content in Chrome, all you have to do is go to a webpage in another language, then click “Translate” at the top.

Use a website translation widget

If you are a webmaster of a government, non-profit, and/or non-commercial website (e.g. academic institutions), you may be eligible to sign up for the Google Translate Website Translator widget. This tool translates web page content into 100+ different languages. To find out more, please visit the webmasters blog.

Upload PDFs and documents

Google Translate supports translating many different document formats (.doc, .docx, .odf, .pdf, .ppt, .pptx, .ps, .rtf, .txt, .xls, .xlsx). By simply uploading the document, you can get a translated version in the language that you choose.

Millions of people need translations of resources at this time. Google’s researchers, designers and product developers are listening. We are continuously looking for ways to improve our products and come to people’s aid as we navigate the pandemic. 

Read More

Not So Taxing: Intuit Uses AI to Make Tax Day Easier

Understanding the U.S. tax code can take years of study — it’s 80,000 pages long. Software company Intuit has decided that it’s a job for AI.

Ashok Srivastava, its senior vice president and chief data officer, spoke to AI Podcast host Noah Kravitz about how the company is utilizing machine learning to help customers with taxes and aid small businesses through the financial effects of COVID-19.

To help small businesses, Intuit has a range of programs such as the Intuit Aid Assist Program, which helps business owners figure out if they’re eligible for loans from the government. Other programs include cash flow forecasting, which estimates how much money businesses will have within a certain time frame.

And in the long term, Intuit is working on a machine learning program capable of using photos of financial documents to automatically extract necessary information and fill in tax documents.

Key Points From This Episode:

  • Intuit frequently uses an AI technique called knowledge engineering, which converts written regulations or rules into code, providing the information behind programs such as TurboTax.
  • Intuit also provides natural language processing and chatbot services, which use a customer’s questions as well as their feedback and product usage to determine the best reply.

Tweetables:

“We’re spending our time not only analyzing data, but thinking about new ways that we can use artificial intelligence in order to help small businesses.” — Ashok Srivastava [10:23]

“Data and artificial intelligence are going to come into play again and again to help people … make the best decisions about their own financial future.” — Ashok Srivastava [26:43]

You Might Also Like

AI Startup Brings Computer Vision to Customer Service

When your appliances break, the last thing you want to do is spend an hour on the phone trying to reach a customer service representative. Using computer vision, Drishyam.AI is eliminating service lines to help consumers more quickly.

Dial A for AI: Charter Boosts Customer Service with AI

Charter Communications is working to make customer service smarter before an operator even picks up the phone. Senior Director of Wireless Engineering Jared Ritter speaks about Charter’s perspective on customer relations.

What’s in Your Wallet? For Capital One, the Answer is AI

Nitzan Mekel, managing vice president of machine learning at Capital One, explains how the banking giant is integrating AI and machine learning into customer-facing applications such as fraud-monitoring and detection, call center operations and customer experience.

Tune in to the AI Podcast

Get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn. If your favorite isn’t listed here, drop us a note.

Make the AI Podcast Better

Have a few minutes to spare? Fill out this listener survey. Your answers will help us make a better podcast.

The post Not So Taxing: Intuit Uses AI to Make Tax Day Easier appeared first on The Official NVIDIA Blog.

Read More