For October, we are spotlighting Akshitha Sriraman, Assistant Professor, Department of Electrical and Computer Engineering at Carnegie Mellon University (CMU).
Google at ECCV 2022
Google is proud to be a Platinum Sponsor of the European Conference on Computer Vision (ECCV 2022), a premier forum for the dissemination of research in computer vision and machine learning (ML). This year, ECCV 2022 will be held as a hybrid event, in person in Tel Aviv, Israel with virtual attendance as an option. Google has a strong presence at this year’s conference with over 60 accepted publications and active involvement in a number of workshops and tutorials. We look forward to sharing some of our extensive research and expanding our partnership with the broader ML research community.
Registered for ECCV 2022? We hope you’ll visit our on-site or virtual booths to learn more about the research we’re presenting at ECCV 2022, including several demos and opportunities to connect with our researchers. Learn more about Google’s research being presented at ECCV 2022 below (Google affiliations in bold).
Organizing Committee
Program Chairs include: Moustapha Cissé
Awards Paper Committee: Todd Zickler
Area Chairs include: Ayan Chakrabarti, Tali Dekel, Alireza Fathi, Vittorio Ferrari, David Fleet, Dilip Krishnan, Michael Rubinstein, Cordelia Schmid, Deqing Sun, Federico Tombari, Jasper Uijlings, Ming-Hsuan Yang, Todd Zickler
Accepted Publications
NeuMesh: Learning Disentangled Neural Mesh-Based Implicit Field for Geometry and Texture Editing
Bangbang Yang, Chong Bao, Junyi Zeng, Hujun Bao, Yinda Zhang, Zhaopeng Cui, Guofeng Zhang
Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Networks
Zihang Zou, Boqing Gong, Liqiang Wang
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris N. Metaxas
Waymo Open Dataset: Panoramic Video Panoptic Segmentation
Jieru Mei, Alex Zhu, Xinchen Yan, Hang Yan, Siyuan Qiao, Yukun Zhu, Liang-Chieh Chen, Henrik Kretzschmar
PRIF: Primary Ray-Based Implicit Function
Brandon Yushan Feng, Yinda Zhang, Danhang Tang, Ruofei Du, Amitabh Varshney
LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling
Boyan Jiang, Xinlin Ren, Mingsong Dou, Xiangyang Xue, Yanwei Fu, Yinda Zhang
k-Means Mask Transformer (see blog post)
Qihang Yu*, Siyuan Qiao, Maxwell D Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
MaxViT: Multi-Axis Vision Transformer (see blog post)
Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, Yinxiao Li
E-Graph: Minimal Solution for Rigid Rotation with Extensibility Graphs
Yanyan Li, Federico Tombari
RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation
Ruida Zhang, Yan Di, Zhiqiang Lou, Fabian Manhardt, Federico Tombari, Xiangyang Ji
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
Huseyin Coskun, Alireza Zareian, Joshua L Moore, Federico Tombari, Chen Wang
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin*
Adaptive Transformers for Robust Few-Shot Cross-Domain Face Anti-spoofing
Hsin-Ping Huang, Deqing Sun, Yaojie Liu, Wen-Sheng Chu, Taihong Xiao, Jinwei Yuan, Hartwig Adam, Ming-Hsuan Yang
DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning
Zifeng Wang*, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister
BLT: Bidirectional Layout Transformer for Controllable Layout Generation
Xiang Kong, Lu Jiang, Huiwen Chang, Han Zhang, Yuan Hao, Haifeng Gong, Irfan Essa
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer
Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma
Learning Visibility for Robust Dense Human Body Estimation
Chun-Han Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang
Are Vision Transformers Robust to Patch Perturbations?
Jindong Gu, Volker Tresp, Yao Qin
PseudoAugment: Learning to Use Unlabeled Data for Data Augmentation in Point Clouds
Zhaoqi Leng, Shuyang Cheng, Ben Caine, Weiyue Wang, Xiao Zhang, Jonathon Shlens, Mingxing Tan, Dragomir Anguelov
Structure and Motion from Casual Videos
Zhoutong Zhang, Forrester Cole, Zhengqi Li, Noah Snavely, Michael Rubinstein, William T. Freeman
PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map
Chenfeng Xu, Tian Li, Chen Tang, Lingfeng Sun, Kurt Keutzer, Masayoshi Tomizuka, Alireza Fathi, Wei Zhan
Novel Class Discovery Without Forgetting
Joseph K J, Sujoy Paul, Gaurav Aggarwal, Soma Biswas, Piyush Rai, Kai Han, Vineeth N Balasubramanian
Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning
Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas
PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks
Nan Ding, Xi Chen, Tomer Levinboim, Soravit Changpinyo, Radu Soricut
InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images
Zhengqi Li, Qianqian Wang*, Noah Snavely, Angjoo Kanazawa*
Generalizable Patch-Based Neural Rendering (see blog post)
Mohammed Suhail*, Carlos Esteves, Leonid Sigal, Ameesh Makadia
LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds
Minghua Liu, Yin Zhou, Charles R. Qi, Boqing Gong, Hao Su, Dragomir Anguelov
The Missing Link: Finding Label Relations Across Datasets
Jasper Uijlings, Thomas Mensink, Vittorio Ferrari
Learning Instance-Specific Adaptation for Cross-Domain Segmentation
Yuliang Zou, Zizhao Zhang, Chun-Liang Li, Han Zhang, Tomas Pfister, Jia-Bin Huang
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
Medhini Narasimhan*, Arsha Nagrani, Chen Sun, Michael Rubinstein, Trevor Darrell, Anna Rohrbach, Cordelia Schmid
On Label Granularity and Object Localization
Elijah Cole, Kimberly Wilber, Grant Van Horn, Xuan Yang, Marco Fornoni, Pietro Perona, Serge Belongie, Andrew Howard, Oisin Mac Aodha
Disentangling Architecture and Training for Optical Flow
Deqing Sun, Charles Herrmann, Fitsum Reda, Michael Rubinstein, David J. Fleet, William T. Freeman
NewsStories: Illustrating Articles with Visual Summaries
Reuben Tan, Bryan Plummer, Kate Saenko, J.P. Lewis, Avneesh Sud, Thomas Leung
Improving GANs for Long-Tailed Data Through Group Spectral Regularization
Harsh Rangwani, Naman Jaswani, Tejan Karmali, Varun Jampani, Venkatesh Babu Radhakrishnan
Planes vs. Chairs: Category-Guided 3D Shape Learning Without Any 3D Cues
Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, James Rehg
A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch
Patsorn Sangkloy, Wittawat Jitkrittum, Diyi Yang, James Hays
Learned Monocular Depth Priors in Visual-Inertial Initialization
Yunwen Zhou, Abhishek Kar, Eric L. Turner, Adarsh Kowdle, Chao Guo, Ryan DuToit, Konstantine Tsotsos
How Stable are Transferability Metrics Evaluations?
Andrea Agostinelli, Michal Pandy, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari
Data-Free Neural Architecture Search via Recursive Label Calibration
Zechun Liu*, Zhiqiang Shen, Yun Long, Eric Xing, Kwang-Ting Cheng, Chas H. Leichner
Fast and High Quality Image Denoising via Malleable Convolution
Yifan Jiang*, Bartlomiej Wronski, Ben Mildenhall, Jonathan T. Barron, Zhangyang Wang, Tianfan Xue
Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation
Jogendra Nath Kundu, Suvaansh Bhambri, Akshay R Kulkarni, Hiran Sarkar, Varun Jampani, Venkatesh Babu Radhakrishnan
Learning Online Multi-Sensor Depth Fusion
Erik Sandström, Martin R. Oswald, Suryansh Kumar, Silvan Weder, Fisher Yu, Cristian Sminchisescu, Luc Van Gool
Hierarchical Semantic Regularization of Latent Spaces in StyleGANs
Tejan Karmali, Rishubh Parihar, Susmit Agrawal, Harsh Rangwani, Varun Jampani, Maneesh K Singh, Venkatesh Babu Radhakrishnan
RayTran: 3D Pose Estimation and Shape Reconstruction of Multiple Objects from Videos with Ray-Traced Transformers
Michał J Tyszkiewicz, Kevis-Kokitsi Maninis, Stefan Popov, Vittorio Ferrari
Neural Video Compression Using GANs for Detail Synthesis and Propagation
Fabian Mentzer, Eirikur Agustsson, Johannes Ballé, David Minnen, Nick Johnston, George Toderici
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn, Rui Qian, Kimberly Wilber, Hartwig Adam, Oisin Mac Aodha, Serge Belongie
Implicit Neural Representations for Image Compression
Yannick Strümpler, Janis Postels, Ren Yang, Luc Van Gool, Federico Tombari
3D Compositional Zero-Shot Learning with DeCompositional Consensus
Muhammad Ferjad Naeem, Evin Pınar Örnek, Yongqin Xian, Luc Van Gool, Federico Tombari
FindIt: Generalized Localization with Natural Language Queries (see blog post)
Weicheng Kuo, Fred Bertsch, Wei Li, AJ Piergiovanni, Mohammad Saffar, Anelia Angelova
A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation
Wuyang Chen*, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou
Improved Masked Image Generation with Token-Critic
Jose Lezama, Huiwen Chang, Lu Jiang, Irfan Essa
Learning Discriminative Shrinkage Deep Networks for Image Deconvolution
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis*, Scott Wisdom, Tal Remez, John Hershey
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer, Alexey Gritsenko, Austin C Stone, Maxim Neumann, Dirk Weißenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby
COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality
Honglu Zhou, Asim Kadav, Aviv Shamsian, Shijie Geng, Farley Lai, Long Zhao, Ting Liu, Mubbasir Kapadia, Hans Peter Graf
Video Question Answering with Iterative Video-Text Co-tokenization (see blog post)
AJ Piergiovanni, Kairo Morton*, Weicheng Kuo, Michael S. Ryoo, Anelia Angelova
Class-Agnostic Object Detection with Multi-modal Transformer
Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang
FILM: Frame Interpolation for Large Motion (see blog post)
Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru, Brian Curless
Compositional Human-Scene Interaction Synthesis with Semantic Control
Kaifeng Zhao, Shaofei Wang, Yan Zhang, Thabo Beeler, Siyu Tang
Workshops
LatinX in AI
Mentors include: José Lezama
Keynote Speakers include: Andre Araujo
AI for Creative Video Editing and Understanding
Keynote Speakers include: Tali Dekel, Negar Rostamzadeh
Learning With Limited and Imperfect Data (L2ID)
Invited Speakers include: Xiuye Gu
Organizing Committee includes: Sadeep Jayasumana
International Challenge on Compositional and Multimodal Perception (CAMP)
Program Committee includes: Edward Vendrow
Self-Supervised Learning: What is Next?
Invited Speakers include: Mathilde Caron, Arsha Nagrani
Organizers include: Andrew Zisserman
3rd Workshop on Adversarial Robustness In the Real World
Invited Speakers include: Ekin Dogus Cubuk
Organizers include: Xinyun Chen, Alexander Robey, Nataniel Ruiz, Yutong Bai
AV4D: Visual Learning of Sounds in Spaces
Invited Speakers include: John Hershey
Challenge on Mobile Intelligent Photography and Imaging (MIPI)
Invited Speakers include: Peyman Milanfar
Robust Vision Challenge 2022
Organizing Committee includes: Alina Kuznetsova
Computer Vision in the Wild
Challenge Organizers include: Yi-Ting Chen, Ye Xia
Invited Speakers include: Yin Cui, Yongqin Xian, Neil Houlsby
Self-Supervised Learning for Next-Generation Industry-Level Autonomous Driving (SSLAD)
Organizers include: Fisher Yu
Responsible Computer Vision
Organizing Committee includes: Been Kim
Invited Speakers include: Emily Denton
Cross-Modal Human-Robot Interaction
Invited Speakers include: Peter Anderson
ISIC Skin Image Analysis
Organizing Committee includes: Yuan Liu
Steering Committee includes: Yuan Liu, Dale Webster
Invited Speakers include: Yuan Liu
Observing and Understanding Hands in Action
Sponsored by Google
Autonomous Vehicle Vision (AVVision)
Speakers include: Fisher Yu
Visual Perception for Navigation in Human Environments: The JackRabbot Human Body Pose Dataset and Benchmark
Organizers include: Edward Vendrow
Language for 3D Scenes
Invited Speakers include: Jason Baldridge
Organizers include: Leonidas Guibas
Designing and Evaluating Computer Perception Systems (CoPe)
Organizers include: Andrew Zisserman
Learning To Generate 3D Shapes and Scenes
Panelists include: Pete Florence
Advances in Image Manipulation
Program Committee includes: George Toderici, Ming-Hsuan Yang
TiE: Text in Everything
Challenge Organizers include: Shangbang Long, Siyang Qin
Invited Speakers include: Tali Dekel, Aishwarya Agrawal
Instance-Level Recognition
Organizing Committee: Andre Araujo, Bingyi Cao, Tobias Weyand
Invited Speakers include: Mathilde Caron
What Is Motion For?
Organizing Committee: Deqing Sun, Fitsum Reda, Charles Herrmann
Invited Speakers include: Tali Dekel
Neural Geometry and Rendering: Advances and the Common Objects in 3D Challenge
Invited Speakers include: Ben Mildenhall
Visual Object-Oriented Learning Meets Interaction: Discovery, Representations, and Applications
Invited Speakers include: Klaus Greff, Thomas Kipf
Organizing Committee includes: Leonidas Guibas
Vision with Biased or Scarce Data (VBSD)
Program Committee includes: Yizhou Wang
Multiple Object Tracking and Segmentation in Complex Environments
Invited Speakers include: Xingyi Zhou, Fisher Yu
3rd Visual Inductive Priors for Data-Efficient Deep Learning Workshop
Organizing Committee includes: Ekin Dogus Cubuk
DeeperAction: Detailed Video Action Understanding and Anomaly Recognition
Advisors include: Rahul Sukthankar
Sign Language Understanding Workshop and Sign Language Recognition, Translation & Production Challenge
Organizing Committee includes: Andrew Zisserman
Speakers include: Andrew Zisserman
Ego4D: First-Person Multi-Modal Video Understanding
Invited Speakers include: Michal Irani
AI-Enabled Medical Image Analysis: Digital Pathology & Radiology/COVID19
Program Chairs include: Po-Hsuan Cameron Chen
Workshop Partner: Google Health
Visual Object Tracking Challenge (VOT 2022)
Technical Committee includes: Christoph Mayer
Assistive Computer Vision and Robotics
Technical Committee includes: Maja Mataric
Human Body, Hands, and Activities from Egocentric and Multi-View Cameras
Organizers include: Francis Engelmann
Frontiers of Monocular 3D Perception: Implicit x Explicit
Panelists include: Pete Florence
Tutorials
Self-Supervised Representation Learning in Computer Vision
Invited Speakers include: Ting Chen
Neural Volumetric Rendering for Computer Vision
Organizers include: Ben Mildenhall, Pratul Srinivasan, Jon Barron
Presenters include: Ben Mildenhall, Pratul Srinivasan
New Frontiers in Efficient Neural Architecture Search!
Speakers include: Ruochen Wang
*Work done while at Google.
How AI can help in the fight against breast cancer
In 2020, 2.3 million people were diagnosed with breast cancer and 685,000 died from the disease globally. Early cancer detection is key to better health outcomes, but screenings are work-intensive, and patients often find getting mammograms and waiting for results stressful.
In response to these challenges, Google Health and Northwestern Medicine partnered in 2021 on a clinical research study to explore whether artificial intelligence (AI) models can reduce the time to diagnosis during the screening process, narrowing the assessment gap and improving the patient experience. This work is among the first prospective randomized controlled studies for AI in breast cancer screening, and the results will be published in early 2023.
Behind this work are scientists and researchers united in the fight against breast cancer. We spoke with Dr. Sunny Jansen, a technical program manager at Google, and Sally Friedewald, MD, the division chief of Breast and Women’s Imaging at Northwestern University Feinberg School of Medicine, about how they hope this work will help screening providers catch cancer earlier and improve the patient experience.
What were you hoping to achieve with this work in the fight against breast cancer?
Dr. Jansen: Like so many of us, I know how breast cancer can impact families and communities, and how critical early detection can be. The experiences of so many around me have influenced my work in this area. I hope that AI can make the future of breast cancer screening easier, faster, more accurate — and, ultimately, more accessible for women globally.
So we sought to understand how AI can reduce diagnostic delays and help patients receive diagnoses as soon as possible by streamlining care into a single visit. For patients with abnormal findings at screening, the diagnostic delay to get additional imaging tests is typically a couple of weeks in the U.S. Often, the results are normal after the additional imaging tests, but that waiting period can be nerve-racking. Additionally, it can be harder for some patients to come back to get additional imaging tests, which exacerbates delays and leads to disparities in the timeliness of care.
Dr. Friedewald: I anticipate an increase in the demand for screenings and challenges in having enough providers with the necessary specialized training. Using AI, we can identify patients who need additional imaging when they are still in the clinic. We can expedite their care, and, in many cases, eliminate the need for return visits. Patients who aren’t flagged still receive the care they need as well. This translates into operational efficiencies and ultimately leads to patients getting a breast cancer diagnosis faster. We already know the earlier treatment starts, the better.
What were your initial beliefs about applying AI to identify breast cancer? How have these changed through your work on this project?
Dr. Jansen: Most existing publications about AI and breast cancer analyze AI performance retrospectively by reviewing historical datasets. While retrospective studies have a lot of value, they don’t necessarily represent how AI works in the real world. Sally decided early on that it would be important to do a prospective study, incorporating AI into real-world clinical workflows and measuring the impact. I wasn’t sure what to expect!
Dr. Friedewald: Computer-aided detection (CAD), which was developed a few decades ago to help radiologists identify cancers via mammogram, has proven to be helpful in some environments. Overall, in the U.S., CAD has not resulted in increased cancer detection. I was concerned that AI would be similar to CAD in efficacy. However, AI gathers data in a fundamentally different way. I am hopeful that with this new information we can identify cancers earlier with the ultimate goal of saving lives.
The research will be published in early 2023. What did you find most inspiring and hopeful about what you learned?
Dr. Jansen: The patients who consented to participate in the study inspired me. Clinicians and scientists must conduct quality real-world research so that the best ideas can be identified and moved forward, and we need patients as equal partners in our research.
Dr. Friedewald: Agreed! There’s an appetite to improve our processes and make screening easier and less anxiety-provoking. I truly believe that if we can streamline care for our patients, we will decrease the stress associated with screening and hopefully improve access for those who need it.
Additionally, AI has the potential to go beyond the prioritization of patients who need care. By prospectively identifying patients who are at higher risk of developing breast cancer, AI could help us determine patients that might need a more rigorous screening regimen. I am looking forward to collaborating with Google on this topic and others that could ultimately improve cancer survival.
Lessons learned from 10 years of DynamoDB
Prioritizing predictability over efficiency, adapting data partitioning to traffic, and continuous verification are a few of the principles that help ensure stability, availability, and efficiency.
PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations
Evolution strategy (ES) is a family of optimization techniques inspired by the ideas of natural selection: a population of candidate solutions is evolved over generations to better adapt to an optimization objective. ES has been applied to a variety of challenging decision-making problems, such as legged locomotion, quadcopter control, and even power system control.
Compared to gradient-based reinforcement learning (RL) methods like proximal policy optimization (PPO) and soft actor-critic (SAC), ES has several advantages. First, ES directly explores in the space of controller parameters, while gradient-based methods often explore within a limited action space, which indirectly influences the controller parameters. More direct exploration has been shown to boost learning performance and enable large scale data collection with parallel computation. Second, a major challenge in RL is long-horizon credit assignment, e.g., when a robot accomplishes a task in the end, determining which actions it performed in the past were the most critical and should be assigned a greater reward. Since ES directly considers the total reward, it relieves researchers from needing to explicitly handle credit assignment. In addition, because ES does not rely on gradient information, it can naturally handle highly non-smooth objectives or controller architectures where gradient computation is non-trivial, such as meta–reinforcement learning. However, a major weakness of ES-based algorithms is their difficulty in scaling to problems that require high-dimensional sensory inputs to encode the environment dynamics, such as training robots with complex vision inputs.
In this work, we propose “PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations”, a learning algorithm that combines representation learning and ES to effectively solve high dimensional problems in a scalable way. The core idea is to leverage predictive information, a representation learning objective, to obtain a compact representation of the high-dimensional environment dynamics, and then apply Augmented Random Search (ARS), a popular ES algorithm, to transform the learned compact representation into robot actions. We tested PI-ARS on the challenging problem of visual-locomotion for legged robots. PI-ARS enables fast training of performant vision-based locomotion controllers that can traverse a variety of difficult environments. Furthermore, the controllers trained in simulated environments successfully transfer to a real quadruped robot.
PI-ARS trains reliable visual-locomotion policies that are transferable to the real world.
Predictive Information
A good representation for policy learning should be both compressive, so that ES can focus on solving a much lower dimensional problem than learning from raw observations would entail, and task-critical, so the learned controller has all the necessary information needed to learn the optimal behavior. For robotic control problems with high-dimensional input space, it is critical for the policy to understand the environment, including the dynamic information of both the robot itself and its surrounding objects.
As such, we propose an observation encoder that preserves information from the raw input observations that allows the policy to predict the future states of the environment, thus the name predictive information (PI). More specifically, we optimize the encoder such that the encoded version of what the robot has seen and planned in the past can accurately predict what the robot will see and the reward it will receive in the future. One mathematical tool to describe such a property is mutual information, which measures the amount of information we obtain about one random variable X by observing another random variable Y. In our case, X and Y would be what the robot saw and planned in the past, and what the robot sees and is rewarded in the future. Directly optimizing the mutual information objective is a challenging problem because we usually only have access to samples of the random variables, but not their underlying distributions. In this work we follow a previous approach that uses InfoNCE, a contrastive variational bound on mutual information, to optimize the objective.
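To make the contrastive objective concrete, here is a minimal, self-contained sketch of an InfoNCE-style loss in NumPy. It is illustrative only, not the implementation used in PI-ARS; the function name, embedding sizes, and temperature are assumptions. The core idea it shows: each encoded past should score highest against its own encoded future, with the other futures in the batch serving as negatives.

```python
import numpy as np

def info_nce_loss(past_emb: np.ndarray, future_emb: np.ndarray, temperature: float = 0.1) -> float:
    """Each encoded past should score highest against its own encoded future;
    the other futures in the batch act as negatives."""
    past = past_emb / np.linalg.norm(past_emb, axis=1, keepdims=True)
    future = future_emb / np.linalg.norm(future_emb, axis=1, keepdims=True)
    logits = past @ future.T / temperature            # (batch, batch) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))        # matched pairs lie on the diagonal

# Toy usage: 8 (past, future) embedding pairs of dimension 32.
rng = np.random.default_rng(0)
past, future = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
print(info_nce_loss(past, future))
```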
Predictive Information with Augmented Random Search
Next, we combine PI with Augmented Random Search (ARS), an algorithm that has shown excellent optimization performance for challenging decision-making tasks. At each iteration of ARS, it samples a population of perturbed controller parameters, evaluates their performance in the testing environment, and then computes a gradient that moves the controller towards the ones that performed better.
We use the learned compact representation from PI to connect PI and ARS, which we call PI-ARS. More specifically, ARS optimizes a controller that takes the learned compact representation as input and predicts appropriate robot commands to achieve the task. Optimizing a controller with a smaller input space allows ARS to find the optimal solution more efficiently. Meanwhile, we use the data collected during ARS optimization to further improve the learned representation, which is then fed into the ARS controller in the next iteration.
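For intuition, the following is a simplified sketch of a single ARS-style update in NumPy. It is not the exact ARS or PI-ARS implementation; it only shows the mechanics described above: sample random perturbation directions, evaluate the controller shifted in both directions, and step along the reward-weighted average of those directions, scaled by the spread of the observed rewards.

```python
import numpy as np

def ars_step(theta, evaluate, n_dirs=8, noise_std=0.02, step_size=0.02, rng=None):
    """One simplified ARS-style update: theta is a flat parameter vector and
    evaluate maps parameters to an episode return."""
    rng = rng or np.random.default_rng()
    deltas = rng.normal(size=(n_dirs, theta.size))
    r_pos = np.array([evaluate(theta + noise_std * d) for d in deltas])
    r_neg = np.array([evaluate(theta - noise_std * d) for d in deltas])
    sigma_r = np.concatenate([r_pos, r_neg]).std() + 1e-8   # reward spread keeps step sizes stable
    update = ((r_pos - r_neg)[:, None] * deltas).mean(axis=0)
    return theta + step_size / sigma_r * update

# Toy usage: the "return" is the negative distance to a target parameter vector,
# so repeated updates should drive theta toward the target.
target, theta = np.ones(16), np.zeros(16)
for _ in range(500):
    theta = ars_step(theta, lambda p: -np.linalg.norm(p - target))
print(round(float(np.linalg.norm(theta - target)), 3))
```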
Visual-Locomotion for Legged Robots
We evaluate PI-ARS on the problem of visual-locomotion for legged robots. We chose this problem for two reasons: visual-locomotion is a key bottleneck for legged robots to be applied in real-world applications, and the high-dimensional vision-input to the policy and the complex dynamics in legged robots make it an ideal test-case to demonstrate the effectiveness of the PI-ARS algorithm. A demonstration of our task setup in simulation can be seen below. Policies are first trained in simulated environments, and then transferred to hardware.
Experiment Results
We first evaluate the PI-ARS algorithm on four challenging simulated tasks:
- Uneven stepping stones: The robot needs to walk over uneven terrain while avoiding gaps.
- Quincuncial piles: The robot needs to avoid gaps both in front and sideways.
- Moving platforms: The robot needs to walk over stepping stones that are randomly moving horizontally or vertically. This task illustrates the flexibility of learning a vision-based policy in comparison to explicitly reconstructing the environment.
- Indoor navigation: The robot needs to navigate to a random location while avoiding obstacles in an indoor environment.
As shown below, PI-ARS is able to significantly outperform ARS in all four tasks in terms of the total task reward it can obtain (by 30-50%).
We further deploy the trained policies to a real Laikago robot on two tasks: random stepping stone and indoor navigation. We demonstrate that our trained policies can successfully handle real-world tasks. Notably, the success rate of the random stepping stone task improved from 40% in the prior work to 100%.
PI-ARS trained policy enables a real Laikago robot to navigate around obstacles.
Conclusion
In this work, we present a new learning algorithm, PI-ARS, that combines gradient-based representation learning with gradient-free evolutionary strategy algorithms to leverage the advantages of both. PI-ARS enjoys the effectiveness, simplicity, and parallelizability of gradient-free algorithms, while relieving a key bottleneck of ES algorithms in handling high-dimensional problems by optimizing a low-dimensional representation. We apply PI-ARS to a set of challenging visual-locomotion tasks, on which PI-ARS significantly outperforms the state of the art. Furthermore, we validate the policy learned by PI-ARS on a real quadruped robot. It enables the robot to walk over randomly placed stepping stones and navigate in an indoor space with obstacles. Our method opens the possibility of incorporating modern large neural network models and large-scale data into the field of evolutionary strategy for robotics control.
Acknowledgements
We would like to thank our paper co-authors: Ofir Nachum, Tingnan Zhang, Sergio Guadarrama, and Jie Tan. We would also like to thank Ian Fischer and John Canny for valuable feedback.
Create synthetic data for computer vision pipelines on AWS
Collecting and annotating image data is one of the most resource-intensive tasks on any computer vision project. It can take months at a time to fully collect, analyze, and experiment with image streams at the level you need in order to compete in the current marketplace. Even after you’ve successfully collected data, you still have a constant stream of annotation errors, poorly framed images, small amounts of meaningful data in a sea of unwanted captures, and more. These major bottlenecks are why synthetic data creation needs to be in the toolkit of every modern engineer. By creating 3D representations of the objects we want to model, we can rapidly prototype algorithms while concurrently collecting live data.
In this post, I walk you through an example of using the open-source animation library Blender to build an end-to-end synthetic data pipeline, using chicken nuggets as an example. The following image is an illustration of the data generated in this blog post.
What is Blender?
Blender is an open-source 3D graphics software primarily used in animation, 3D printing, and virtual reality. It has an extremely comprehensive rigging, animation, and simulation suite that allows the creation of 3D worlds for nearly any computer vision use case. It also has an extremely active support community where most, if not all, user errors are solved.
Set up your local environment
We install two versions of Blender: one on a local machine with access to a GUI, and the other on an Amazon Elastic Compute Cloud (Amazon EC2) P2 instance.
Install Blender and ZPY
Install Blender from the Blender website.
Then complete the following steps:
- Run the following commands:
- Copy the necessary Python headers into the Blender version of Python so that you can use other non-Blender libraries:
- Override your Blender version and force installs so that the Blender-provided Python works:
- Download `zpy` and install from source:
- Change the NumPy version to `>=1.19.4` and `scikit-image>=0.18.1` to make the install on `3.10.2` possible and so you don’t get any overwrites:
- To ensure compatibility with Blender 3.2, go into `zpy/render.py` and comment out the following two lines (for more information, refer to Blender 3.0 Failure #54):
- Next, install the `zpy` library:
- Download the add-ons version of `zpy` from the GitHub repo so you can actively run your instance:
- Save a file called `enable_zpy_addon.py` in your `/home` directory and run the enablement command, because you don’t have a GUI to activate it. If `zpy-addon` doesn’t install (for whatever reason), you can install it via the GUI.
- In Blender, on the Edit menu, choose Preferences.
- Choose Add-ons in the navigation pane and activate `zpy`.
You should see a page open in the GUI, and you’ll be able to choose ZPY. This will confirm that Blender is loaded.
AliceVision and Meshroom
Install AliceVision and Meshroom from their respective GitHub repos:
FFmpeg
Your system should have `ffmpeg`, but if it doesn’t, you’ll need to download it.
Instant Meshes
You can either compile the library yourself or download the available pre-compiled binaries (which is what I did) for Instant Meshes.
Set up your AWS environment
Now we set up the AWS environment on an EC2 instance. We repeat the steps from the previous section, but only for Blender and `zpy`.
- On the Amazon EC2 console, choose Launch instances.
- Choose your AMI. There are a few options from here. We can either choose a standard Ubuntu image, pick a GPU instance, and then manually install the drivers and get everything set up, or we can take the easy route and start with a preconfigured Deep Learning AMI and only worry about installing Blender. For this post, I use the second option, and choose the latest version of the Deep Learning AMI for Ubuntu (Deep Learning AMI (Ubuntu 18.04) Version 61.0).
- For Instance type, choose p2.xlarge.
- If you don’t have a key pair, create a new one or choose an existing one.
- For this post, use the default settings for network and storage.
- Choose Launch instances.
- Choose Connect and find the instructions to log in to our instance from SSH on the SSH client tab.
- Connect with SSH:
ssh -i "your-pem" ubuntu@IPADDRESS.YOUR-REGION.compute.amazonaws.com
Once you’ve connected to your instance, follow the same installation steps from the previous section to install Blender and `zpy`.
Data collection: 3D scanning our nugget
For this step, I use an iPhone to record a 360-degree video at a fairly slow pace around my nugget. I stuck a chicken nugget onto a toothpick and taped the toothpick to my countertop, and simply rotated my camera around the nugget to get as many angles as I could. The faster you film, the less likely you are to get good images to work with, depending on the shutter speed.
After I finished filming, I sent the video to my email and extracted the video to a local drive. From there, I used `ffmpeg` to chop the video into frames to make Meshroom ingestion much easier:
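The exact command isn’t reproduced here; the following sketch shows one way to do the frame extraction by calling ffmpeg from Python. The video filename, frame rate, and output folder are assumptions for illustration.

```python
import subprocess
from pathlib import Path

video = "nugget_360.mp4"           # hypothetical filename for the phone video
out_dir = Path("nugget_images")    # folder Meshroom will ingest
out_dir.mkdir(exist_ok=True)

# Pull roughly two frames per second; Meshroom only needs a few hundred stills.
subprocess.run(
    ["ffmpeg", "-i", video, "-vf", "fps=2", str(out_dir / "frame_%04d.jpg")],
    check=True,
)
```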
Open Meshroom and use the GUI to drag the `nugget_images` folder to the pane on the left. From there, choose Start and wait a few hours (or less) depending on the length of the video and whether you have a CUDA-enabled machine.
You should see something like the following screenshot when it’s almost complete.
Data collection: Blender manipulation
When our Meshroom reconstruction is complete, complete the following steps:
- Open the Blender GUI and on the File menu, choose Import, then Wavefront (.obj), and select the texture file you created in Meshroom. The file should be saved in `path/to/MeshroomCache/Texturing/uuid-string/texturedMesh.obj`.
- Load the file and observe the monstrosity that is your 3D object. Here is where it gets a bit tricky.
- Scroll to the top right side and choose the Wireframe icon in Viewport Shading.
- Select your object on the right viewport and make sure it’s highlighted, scroll over to the main layout viewport, and either press Tab or manually choose Edit Mode.
- Next, maneuver the viewport in such a way as to allow yourself to be able to see your object with as little as possible behind it. You’ll have to do this a few times to really get it correct.
- Click and drag a bounding box over the object so that only the nugget is highlighted.
- After it’s highlighted like in the following screenshot, we separate our nugget from the 3D mass by left-clicking, choosing Separate, and then Selection. We now move over to the right, where we should see two textured objects: `texturedMesh` and `texturedMesh.001`.
- Our new object should be `texturedMesh.001`, so we choose `texturedMesh` and choose Delete to remove the unwanted mass.
- Choose the object (`texturedMesh.001`) on the right, move to our viewer, and choose the object, Set Origin, and Origin to Center of Mass.
Now, if we want, we can move our object to the center of the viewport (or simply leave it where it is) and view it in all its glory. Notice the large black hole where we didn’t really get good film coverage from! We’re going to need to correct for this.
To clean our object of any pixel impurities, we export our object to an .obj file. Make sure to choose Selection Only when exporting.
Data collection: Clean up with Instant Meshes
Now we have two problems: our image has a pixel gap created by our poor filming that we need to clean up, and our image is incredibly dense (which will make generating images extremely time-consuming). To tackle both issues, we need to use a piece of software called Instant Meshes to extrapolate our pixel surface to cover the black hole and also to shrink the total object to a smaller, less dense size.
- Open Instant Meshes and load our recently saved `nugget.obj` file.
- Under Orientation field, choose Solve.
- Under Position field, choose Solve.
Here’s where it gets interesting: if you explore your object and notice that the criss-cross lines of the Position solver look disjointed, you can choose the comb icon under Orientation field and redraw the lines properly.
- Choose Solve for both Orientation field and Position field.
- If everything looks good, export the mesh, name it something like `nugget_refined.obj`, and save it to disk.
Data collection: Shake and bake!
Because our low-poly mesh doesn’t have any image texture associated with it and our high-poly mesh does, we either need to bake the high-poly texture onto the low-poly mesh, or create a new texture and assign it to our object. For the sake of simplicity, we’re going to create an image texture from scratch and apply it to our nugget.
I used Google image search for nuggets and other fried things in order to get a high-res image of the surface of a fried object. I found a super high-res image of a fried cheese curd and made a new image full of the fried texture.
With this image, I’m ready to complete the following steps:
- Open Blender and load the new `nugget_refined.obj` the same way you loaded your initial object: on the File menu, choose Import, Wavefront (.obj), and choose the `nugget_refined.obj` file.
- Next, go to the Shading tab. At the bottom you should notice two boxes with the titles Principled BSDF and Material Output.
- On the Add menu, choose Texture and Image Texture. An Image Texture box should appear.
- Choose Open Image and load your fried texture image.
- Drag your mouse between Color in the Image Texture box and Base Color in the Principled BSDF box.
Now your nugget should be good to go!
Data collection: Create Blender environment variables
Now that we have our base nugget object, we need to create a few collections and environment variables to help us in our process.
- Left-click on the hand scene area and choose New Collection.
- Create the following collections: BACKGROUND, NUGGET, and SPAWNED.
- Drag the nugget to the NUGGET collection and rename it nugget_base.
Data collection: Create a plane
We’re going to create a background object from which our nuggets will be generated when we’re rendering images. In a real-world use case, this plane is where our nuggets are placed, such as a tray or bin.
- On the Add menu, choose Mesh and then Plane.
From here, we move to the right side of the page and find the orange box (Object Properties). - In the Transform pane, for XYZ Euler, set X to 46.968, Y to 46.968, and Z to 1.0.
- For both Location and Rotation, set X, Y, and Z to 0.
Data collection: Set the camera and axis
Next, we’re going to set our cameras up correctly so that we can generate images.
- On the Add menu, choose Empty and Plain Axis.
- Name the object Main Axis.
- Make sure our axis is 0 for all the variables (so it’s directly in the center).
- If you have a camera already created, drag that camera to under Main Axis.
- Choose Item and Transform.
- For Location, set X to 0, Y to 0, and Z to 100.
Data collection: Here comes the sun
Next, we add a Sun object.
- On the Add menu, choose Light and Sun.
The location of this object doesn’t necessarily matter as long as it’s centered somewhere over the plane object we’ve set. - Choose the green lightbulb icon in the bottom right pane (Object Data Properties) and set the strength to 5.0.
- Repeat the same procedure to add a Light object and put it in a random spot over the plane.
Data collection: Download random backgrounds
To inject randomness into our images, we download as many random textures from texture.ninja as we can (for example, bricks). Download to a folder within your workspace called `random_textures`. I downloaded about 50.
Generate images
Now we get to the fun stuff: generating images.
Image generation pipeline: Object3D and DensityController
Let’s start with some code definitions:
We first define a basic container class with some important properties. This class mainly exists to allow us to create a BVH tree (a way to represent our nugget object in 3D space), where we’ll need to use the `BVHTree.overlap` method to see if two independently generated nugget objects are overlapping in our 3D space. More on this later.
The second piece of code is our density controller. This serves as a way to bound ourselves to the rules of reality and not the 3D world. For example, in the 3D Blender world, objects in Blender can exist inside each other; however, unless someone is performing some strange science on our chicken nuggets, we want to make sure no two nuggets are overlapping by a degree that makes it visually unrealistic.
We use our `Plane` object to spawn a set of bounded invisible cubes that can be queried at any given time to see if the space is occupied or not.
See the following code:
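The post’s original snippet isn’t reproduced here; as a stand-in, the following minimal sketch (meant to run inside Blender’s Python) illustrates the container idea: wrap a Blender object, build a world-space BVH tree from its evaluated mesh, and expose an overlap test. The class and object names are hypothetical.

```python
# Run inside Blender's Python; names are illustrative, not the post's actual code.
import bpy
from mathutils.bvhtree import BVHTree

class Object3D:
    """Wraps a Blender object and exposes a world-space BVH tree for overlap queries."""

    def __init__(self, obj: bpy.types.Object):
        self.obj = obj
        # Evaluate modifiers, then move vertices into world space so that two
        # different objects can be tested against each other.
        depsgraph = bpy.context.evaluated_depsgraph_get()
        mesh = obj.evaluated_get(depsgraph).to_mesh()
        verts = [obj.matrix_world @ v.co for v in mesh.vertices]
        polys = [tuple(p.vertices) for p in mesh.polygons]
        self.bvh = BVHTree.FromPolygons(verts, polys)

    def overlaps(self, other: "Object3D") -> bool:
        # BVHTree.overlap returns the list of intersecting face-index pairs.
        return len(self.bvh.overlap(other.bvh)) > 0

# Usage idea: Object3D(nugget_a).overlaps(Object3D(nugget_b))
```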
In the following snippet, we select the nugget and create a bounding cube around it. This cube represents the size of a single pseudo-voxel of our pseudo-kdtree object. We need to use the `bpy.context.view_layer.update()` function because when this code is run from inside a function or script rather than from the Blender GUI, it seems that the `view_layer` isn’t automatically updated.
Next, we slightly update our cube object so that its length and width are square, as opposed to the natural size of the nugget it was created from:
Now we use our updated cube object to create a plane that can volumetrically hold `num_objects` nuggets:
We take our plane object and create a giant cube of the same length and width as our plane, with the height of our nugget cube, CUBE1:
From here, we want to create voxels from our cube. We take the number of cubes we would need to fit `num_objects` and then cut them from our cube object. We look for the upward-facing mesh face of our cube, and then pick that face to make our cuts. See the following code:
Lastly, we calculate the center of the top-face of each cut we’ve made from our big cube and create actual cubes from those cuts. Each of these newly created cubes represents a single piece of space to spawn or move nuggets around our plane. See the following code:
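As an illustration of the idea (not the post’s actual code), the following plain-Python sketch computes the center of each cell in a square grid laid over the plane, giving one candidate spawn slot per nugget. The sizes are arbitrary example values.

```python
def spawn_cell_centers(plane_size: float, cell_size: float, z: float = 0.0):
    """Lay a square grid over a plane centered at the origin and return the
    (x, y, z) center of every cell: one candidate spawn slot per nugget."""
    cells_per_side = int(plane_size // cell_size)
    first_center = -plane_size / 2 + cell_size / 2
    return [
        (first_center + ix * cell_size, first_center + iy * cell_size, z)
        for ix in range(cells_per_side)
        for iy in range(cells_per_side)
    ]

# A 40 x 40 plane with 5 x 5 cells gives 64 candidate slots.
centers = spawn_cell_centers(plane_size=40.0, cell_size=5.0)
print(len(centers), centers[0])
```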
Next, we develop an algorithm that understands which cubes are occupied at any given time, finds which objects overlap with each other, and moves overlapping objects separately into unoccupied space. We won’t be able to get rid of all overlaps entirely, but we can make it look real enough.
See the following code:
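Again as a stand-in for the omitted snippet, here is a pure-Python sketch of the relocation idea: given the current placements, the overlapping pairs found via the BVH tests, and the still-unoccupied grid cells, move one object from each overlapping pair into a free cell. The names and data structures are hypothetical.

```python
import random

def resolve_overlaps(positions, overlapping_pairs, free_cells):
    """positions:         {object_name: (x, y, z)} current placements
    overlapping_pairs: (name_a, name_b) tuples found via the BVH overlap tests
    free_cells:        unoccupied cell centers left over from the spawn grid"""
    for _, name_b in overlapping_pairs:
        if not free_cells:
            break  # no free space left; accept the remaining overlaps
        # Relocate the second object of the pair to a randomly chosen free cell.
        positions[name_b] = free_cells.pop(random.randrange(len(free_cells)))
    return positions

# Example: two nuggets landed in the same cell; push the second one elsewhere.
placed = {"nugget_01": (0.0, 0.0, 0.0), "nugget_02": (0.0, 0.0, 0.0)}
print(resolve_overlaps(placed, [("nugget_01", "nugget_02")], [(5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]))
```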
Image generation pipeline: Cool runnings
In this section, we break down what our `run` function is doing.
We initialize our `DensityController` and create something called a saver using the `ImageSaver` from `zpy`. This allows us to seamlessly save our rendered images to any location of our choosing. We then add our `nugget` category (and if we had more categories, we would add them here). See the following code:
Next, we need to make a source object from which we spawn copies of our nugget; in this case, it’s the `nugget_base` that we created:
Now that we have our base nugget, we’re going to save the world poses (locations) of all the other objects so that after each rendering run, we can use these saved poses to reinitialize a render. We also move our base nugget completely out of the way so that the kdtree doesn’t sense a space being occupied. Finally, we initialize our kdtree-cube objects. See the following code:
The following code collects our downloaded backgrounds from texture.ninja so that they can be randomly projected onto our plane:
Here is where the magic begins. We first regenerate our kdtree-cubes for this run so that we can start fresh:
We use our density controller to generate a random spawn point for our nugget, create a copy of `nugget_base`, and move the copy to the randomly generated spawn point:
Next, we randomly jitter the size of the nugget, the mesh of the nugget, and the scale of the nugget so that no two nuggets look the same:
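The original jitter code isn’t reproduced here; the following hedged sketch (run inside Blender’s Python) shows one plausible way to randomize a copy’s scale, orientation, and mesh using standard bpy calls. The object name, value ranges, and modifier choices are illustrative assumptions.

```python
# Run inside Blender's Python; the object name, ranges, and modifiers are assumptions.
import random
import bpy

nugget = bpy.data.objects["nugget_copy_01"]   # hypothetical name of the spawned copy

# Non-uniform scale so no two nuggets share exact proportions.
base = random.uniform(0.85, 1.15)
nugget.scale = (base * random.uniform(0.9, 1.1),
                base * random.uniform(0.9, 1.1),
                base * random.uniform(0.9, 1.1))

# Random spin around Z plus a slight tilt.
nugget.rotation_euler = (random.uniform(-0.1, 0.1),
                         random.uniform(-0.1, 0.1),
                         random.uniform(0.0, 6.28318))

# A subdivision + noise-driven displace pair is one simple way to perturb the mesh itself.
subsurf = nugget.modifiers.new(name="jitter_subsurf", type='SUBSURF')
subsurf.levels = 1
displace = nugget.modifiers.new(name="jitter_displace", type='DISPLACE')
displace.texture = bpy.data.textures.new("jitter_noise", type='CLOUDS')
displace.strength = random.uniform(0.0, 0.05)
```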
We turn our nugget copy into an `Object3D` object, where we use the BVH tree functionality to see if our plane intersects or overlaps any face or vertex on our nugget copy. If we find an overlap with the plane, we simply move the nugget upwards on its Z axis. See the following code:
Now that all nuggets are created, we use our `DensityController` to move nuggets around so that we have a minimum number of overlaps, and those that do overlap aren’t hideous looking:
In the following code, we restore the `Camera` and `Main Axis` poses and randomly select how far the camera is from the `Plane` object:
We decide how randomly we want the camera to travel along the `Main Axis`. Depending on whether we want it to be mainly overhead or whether we care very much about the angle from which it sees the board, we can adjust the `top_down_mostly` parameter depending on how well our training model is picking up the signal of “What even is a nugget anyway?”
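As an illustrative stand-in for the omitted snippet, the following sketch randomizes the rig inside Blender’s Python: it spins the Main Axis empty around Z, limits the tilt when a top-down bias is wanted, and varies the camera’s height above the plane. The object names and angle ranges are assumptions.

```python
# Run inside Blender's Python; object names and angle ranges are illustrative.
import math
import random
import bpy

axis = bpy.data.objects["Main Axis"]
camera = bpy.data.objects["Camera"]

# Spin the rig around Z so the camera orbits the plane; keep the tilt small
# when we mostly want overhead shots.
top_down_mostly = True
max_tilt = math.radians(15 if top_down_mostly else 60)
axis.rotation_euler = (random.uniform(0.0, max_tilt),
                       random.uniform(0.0, max_tilt),
                       random.uniform(0.0, 2 * math.pi))

# Vary how far above the plane the camera sits.
camera.location.z = random.uniform(70.0, 130.0)
```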
In the following code, we do the same thing with the `Sun` object, and randomly pick a texture for the `Plane` object:
Finally, we hide all the objects that we don’t want to be rendered: the `nugget_base` and our entire cube structure:
Lastly, we use `zpy` to render our scene, save our images, and then save our annotations. For this post, I made some small changes to the `zpy` annotation library for my specific use case (annotation per image instead of one file per project), but you shouldn’t have to for the purposes of this post.
Voila!
Run the headless creation script
Now that we have our saved Blender file, our created nugget, and all the supporting information, let’s zip our working directory and either `scp` it to our GPU machine or upload it via Amazon Simple Storage Service (Amazon S3) or another service:
Log in to your EC2 instance and decompress your working_blender folder:
Now we create our data in all its glory:
The script should run for 500 images, and the data is saved in `/path/to/working_blender_dir/nugget_data`.
The following code shows a single annotation created with our dataset:
Conclusion
In this post, I demonstrated how to use the open-source animation library Blender to build an end-to-end synthetic data pipeline.
There are a ton of cool things you can do in Blender and AWS; hopefully this demo can help you on your next data-starved project!
References
- Easily Clean Your 3D Scans (blender)
- Instant Meshes: A free quad-based autoretopology program
- How to 3D Scan an Object for Synthetic Data
- Generate synthetic data with Blender and Python
About the Author
Matt Krzus is a Sr. Data Scientist at Amazon Web Services in the AWS Professional Services group.
Enable CI/CD of multi-Region Amazon SageMaker endpoints
Amazon SageMaker and SageMaker inference endpoints provide the capability to train and deploy your AI and machine learning (ML) workloads. With inference endpoints, you can deploy your models for real-time or batch inference. The endpoints support various types of ML models hosted using AWS Deep Learning Containers or your own containers with custom AI/ML algorithms. When you launch SageMaker inference endpoints with multiple instances, SageMaker distributes the instances across multiple Availability Zones (in a single Region) for high availability.
In some cases, however, to ensure the lowest possible latency for customers in diverse geographical areas, you may need to deploy inference endpoints in multiple Regions. Multi-Regional deployment of SageMaker endpoints and other related application and infrastructure components can also be part of a disaster recovery strategy for your mission-critical workloads, aimed at mitigating the risk of a Regional failure.
SageMaker Projects implements a set of pre-built MLOps templates that can help manage endpoint deployments. In this post, we show how you can extend an MLOps SageMaker Projects pipeline to enable multi-Regional deployment of your AI/ML inference endpoints.
Solution overview
SageMaker Projects deploys both training and deployment MLOPs pipelines; you can use these to train a model and deploy it using an inference endpoint. To reduce complexity and cost of a multi-Region solution, we assume that you train the model in a single Region and deploy inference endpoints in two or more Regions.
This post presents a solution that slightly modifies a SageMaker project template to support multi-Region deployment. To better illustrate the changes, the following figure displays both a standard MLOps pipeline created automatically by SageMaker (Steps 1-5) as well as changes required to extend it to a secondary Region (Steps 6-11).
The SageMaker Projects template automatically deploys a boilerplate MLOps solution, which includes the following components:
- Amazon EventBridge monitors AWS CodeCommit repositories for changes and starts a run of AWS CodePipeline if a code commit is detected.
- If there is a code change, AWS CodeBuild orchestrates the model training using SageMaker training jobs.
- After the training job is complete, the SageMaker model registry registers and catalogs the trained model.
- To prepare for the deployment stage, CodeBuild extends the default AWS CloudFormation template configuration files with parameters of an approved model from the model registry.
- Finally, CodePipeline runs the CloudFormation templates to deploy the approved model to the staging and production inference endpoints.
The following additional steps modify the MLOps Projects template to enable the AI/ML model deployment in the secondary Region:
- A replica of the Amazon Simple Storage Service (Amazon S3) bucket in the primary Region storing model artifacts is required in the secondary Region.
- The CodePipeline template is extended with more stages to run a cross-Region deployment of the approved model.
- As part of the cross-Region deployment process, the CodePipeline template uses a new CloudFormation template to deploy the inference endpoint in a secondary Region. The CloudFormation template deploys the model from the model artifacts from the S3 replica bucket created in Step 6.
- Steps 9–11 (optional): Create resources in Amazon Route 53, Amazon API Gateway, and AWS Lambda to route application traffic to inference endpoints in the secondary Region.
Prerequisites
Create a SageMaker project in your primary Region (us-east-2 in this post). Complete the steps in Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines until the section Modifying the sample code for a custom use case.
Update your pipeline in CodePipeline
In this section, we discuss how to add manual CodePipeline approval and cross-Region model deployment stages to your existing pipeline created for you by SageMaker.
- On the CodePipeline console in your primary Region, find and select the pipeline containing your project name and ending with deploy. This pipeline has already been created for you by SageMaker Projects. You modify this pipeline to add AI/ML endpoint deployment stages for the secondary Region.
- Choose Edit.
- Choose Add stage.
- For Stage name, enter
SecondaryRegionDeployment
. - Choose Add stage.
- In the
SecondaryRegionDeployment
stage, choose Add action group.In this action group, you add a manual approval step for model deployment in the secondary Region. - For Action name, enter
ManualApprovaltoDeploytoSecondaryRegion
. - For Action provider, choose Manual approval.
- Leave all other settings at their defaults and choose Done.
- In the
SecondaryRegionDeployment
stage, choose Add action group (afterManualApprovaltoDeploytoSecondaryRegion
).In this action group, you add a cross-Region AWS CloudFormation deployment step. You specify the names of build artifacts that you create later in this post. - For Action name, enter
DeploytoSecondaryRegion
. - For Action provider, choose AWS Cloud Formation.
- For Region, enter your secondary Region name (for example,
us-west-2
). - For Input artifacts, enter
BuildArtifact
. - For ActionMode, enter
CreateorUpdateStack
. - For StackName, enter
DeploytoSecondaryRegion
. - Under Template, for Artifact Name, select
BuildArtifact
. - Under Template, for File Name, enter
template-export-secondary-region.yml
. - Turn Use Configuration File on.
- Under Template, for Artifact Name, select
BuildArtifact
. - Under Template, for File Name, enter
secondary-region-config-export.json
. - Under Capabilities, choose
CAPABILITY_NAMED_IAM
. - For Role, choose
AmazonSageMakerServiceCatalogProductsUseRole
created by SageMaker Projects. - Choose Done.
- Choose Save.
- If a Save pipeline changes dialog appears, choose Save again.
Modify IAM role
We need to add additional permissions to the AWS Identity and Access Management (IAM) role `AmazonSageMakerServiceCatalogProductsUseRole` created by AWS Service Catalog to enable CodePipeline and S3 bucket access for cross-Region deployment.
- On the IAM console, choose Roles in the navigation pane.
- Search for and select
AmazonSageMakerServiceCatalogProductsUseRole
. - Choose the IAM policy under Policy name:
AmazonSageMakerServiceCatalogProductsUseRole-XXXXXXXXX
. - Choose Edit Policy and then JSON.
- Modify the AWS CloudFormation permissions to allow CodePipeline to sync the S3 bucket in the secondary Region. You can replace the existing IAM policy with the updated one from the following GitHub repo (see lines 16-18, 198, and 213).
- Choose Review policy.
- Choose Save changes.
Add the deployment template for the secondary Region
To spin up an inference endpoint in the secondary Region, the `SecondaryRegionDeployment` stage needs a CloudFormation template (for `endpoint-config-template-secondary-region.yml`) and a configuration file (`secondary-region-config.json`).
The CloudFormation template is configured entirely through parameters; you can further modify it to fit your needs. Similarly, you can use the config file to define the parameters for the endpoint launch configuration, such as the instance type and instance count:
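The linked config file isn’t reproduced here; as a rough illustration of its role, the following Python snippet writes a configuration file of the same flavor. The parameter names and values are placeholders; use the keys expected by the actual CloudFormation template in the post’s repository.

```python
import json

# Placeholder keys: substitute the parameter names the actual template expects.
config = {
    "Parameters": {
        "EndpointInstanceType": "ml.m5.large",
        "EndpointInstanceCount": "1",
    }
}

with open("secondary-region-config.json", "w") as f:
    json.dump(config, f, indent=2)
```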
To add these files to your project, download them from the provided links and upload them to Amazon SageMaker Studio in the primary Region. In Studio, choose File Browser and then the folder containing your project name and ending with modeldeploy
.
Upload these files to the deployment repository’s root folder by choosing the upload icon. Make sure the files are located in the root folder as shown in the following screenshot.
Modify the build Python file
Next, we need to adjust the deployment `build.py` file to enable SageMaker endpoint deployment in the secondary Region to do the following:
- Retrieve the location of model artifacts and Amazon Elastic Container Registry (Amazon ECR) URI for the model image in the secondary Region
- Prepare a parameter file that is used to pass the model-specific arguments to the CloudFormation template that deploys the model in the secondary Region
You can download the updated `build.py` file and replace the existing one in your folder. In Studio, choose File Browser and then the folder containing your project name and ending with `modeldeploy`. Locate the `build.py` file and replace it with the one you downloaded.
The CloudFormation template uses the model artifacts stored in a S3 bucket and the Amazon ECR image path to deploy the inference endpoint in the secondary Region. This is different from the deployment from the model registry in the primary Region, because you don’t need to have a model registry in the secondary Region.
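As a hedged sketch of what that lookup involves (not the actual build.py changes), the following snippet resolves the framework container image for the secondary Region with the SageMaker Python SDK and points at a model artifact in the replica bucket. The bucket name and object key are placeholders.

```python
import sagemaker

secondary_region = "us-west-2"

# Resolve the AWS-managed container image for the chosen framework in the secondary Region.
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",     # matches the FRAMEWORK environment variable
    region=secondary_region,
    version="1.0-1",         # matches MODEL_VERSION
)

# Placeholder path: the model artifact synced into the replica bucket.
model_data_url = "s3://sagemaker-project-X-XXXXXXXX-replica/model/model.tar.gz"
print(image_uri, model_data_url)
```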
Modify the buildspec file
`buildspec.yml` contains instructions run by CodeBuild. We modify this file to do the following:
- Install the SageMaker Python library needed to support the code run
- Pass through the `--secondary-region` and model-specific parameters to `build.py`
- Add the S3 bucket content sync from the primary to secondary Regions
- Export the secondary Region CloudFormation template and associated parameter file as artifacts of the CodeBuild step
Open the `buildspec.yml` file from the model deploy folder and make the highlighted modifications as shown in the following screenshot.
Alternatively, you can download the following `buildspec.yml` file to replace the default file.
Add CodeBuild environment variables
In this step, you add configuration parameters required for CodeBuild to create the model deployment configuration files in the secondary Region.
- On the CodeBuild console in the primary Region, find the project containing your project name and ending with deploy. This project has already been created for you by SageMaker Projects.
- Choose the project and on the Edit menu, choose Environment.
- In the Advanced configuration section, deselect Allow AWS CodeBuild to modify this service role so it can be used with this build project.
- Add the following environment variables, defining the names of the additional CloudFormation templates, secondary Region, and model-specific parameters:
- EXPORT_TEMPLATE_NAME_SECONDARY_REGION – For Value, enter template-export-secondary-region.yml and for Type, choose PlainText.
- EXPORT_TEMPLATE_SECONDARY_REGION_CONFIG – For Value, enter secondary-region-config-export.json and for Type, choose PlainText.
- AWS_SECONDARY_REGION – For Value, enter us-west-2 and for Type, choose PlainText.
- FRAMEWORK – For Value, enter xgboost (replace with your framework) and for Type, choose PlainText.
- MODEL_VERSION – For Value, enter 1.0-1 (replace with your model version) and for Type, choose PlainText.
- Copy the value of ARTIFACT_BUCKET into Notepad or another text editor. You need this value in the next step.
- Choose Update environment.
You need the values you specified for model training for FRAMEWORK and MODEL_VERSION. For example, to find these values for the Abalone model used in the MLOps boilerplate deployment, open Studio and, on the File Browser menu, open the folder with your project name ending with modelbuild. Navigate to pipelines/abalone and open the pipeline.py file. Search for sagemaker.image_uris.retrieve and copy the relevant values.
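For reference, the relevant call in pipeline.py for the Abalone example looks roughly like the following; the framework and version arguments are the values to reuse for the FRAMEWORK and MODEL_VERSION variables (the region shown is a placeholder):

```python
from sagemaker import image_uris

# The framework ("xgboost") and version ("1.0-1") are the values to reuse
# for the FRAMEWORK and MODEL_VERSION CodeBuild environment variables.
image_uri = image_uris.retrieve(
    framework="xgboost",
    region="us-east-1",   # placeholder: the pipeline's primary Region
    version="1.0-1",
)
```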
Create an S3 replica bucket in the secondary Region
We need to create an S3 bucket to hold the model artifacts in the secondary Region. SageMaker uses this bucket to get the latest version of the model when spinning up an inference endpoint. You only need to do this once. CodeBuild automatically syncs the content of the bucket in the primary Region to the replica bucket with each pipeline run. You can create the bucket on the console as described in the following steps, or script it (a boto3 sketch follows the list).
- On the Amazon S3 console, choose Create bucket.
- For Bucket name, enter the value of ARTIFACT_BUCKET copied in the previous step and append -replica to the end (for example, sagemaker-project-X-XXXXXXXX-replica).
- For AWS Region, enter your secondary Region (us-west-2).
- Leave all other values at their default and choose Create bucket.
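If you prefer to script this step, a minimal boto3 equivalent is sketched below; the bucket name is a placeholder, and buckets created outside us-east-1 require a LocationConstraint:

```python
import boto3

SECONDARY_REGION = "us-west-2"
# Placeholder name: use your ARTIFACT_BUCKET value with "-replica" appended.
REPLICA_BUCKET = "sagemaker-project-X-XXXXXXXX-replica"

s3 = boto3.client("s3", region_name=SECONDARY_REGION)
s3.create_bucket(
    Bucket=REPLICA_BUCKET,
    CreateBucketConfiguration={"LocationConstraint": SECONDARY_REGION},
)
```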
Approve a model for deployment
The deployment stage of the pipeline requires an approved model to start. This approval is required for the deployment in the primary Region.
- In Studio (primary Region), choose SageMaker resources in the navigation pane.
- For Select the resource to view, choose Model registry.
- Choose the model group whose name starts with your project name.
- In the right pane, check the model version, stage, and status.
- If the status shows pending, choose the model version and then choose Update status.
- Change status to Approved, then choose Update status.
Deploy and verify the changes
All the changes required for multi-Region deployment of your SageMaker inference endpoint are now complete and you can start the deployment process.
- In Studio, save all the files you edited, choose Git, and choose the repository containing your project name and ending with deploy.
- Choose the plus sign to make changes.
- Under Changed, add build.py and buildspec.yml.
- Under Untracked, add endpoint-config-template-secondary-region.yml and secondary-region-config.json.
. - Enter a comment in the Summary field and choose Commit.
- Push the changes to the repository by choosing Push.
Pushing these changes to the CodeCommit repository triggers a new pipeline run, because an EventBridge rule monitors for pushed commits. After a few moments, you can monitor the run by navigating to the pipeline on the CodePipeline console.
Make sure to provide manual approval for deployment to production and the secondary Region.
You can verify that the secondary Region endpoint has been created on the SageMaker console by choosing Dashboard in the navigation pane and confirming the endpoint status under Recent activity.
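You can also check the endpoint status programmatically from the secondary Region; the following boto3 sketch simply lists the most recently created endpoints:

```python
import boto3

# Point the SageMaker client at the secondary Region.
sm = boto3.client("sagemaker", region_name="us-west-2")

# Print the name and status of the most recently created endpoints.
response = sm.list_endpoints(SortBy="CreationTime", SortOrder="Descending",
                             MaxResults=10)
for endpoint in response["Endpoints"]:
    print(endpoint["EndpointName"], endpoint["EndpointStatus"])
```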
Add API Gateway and Route 53 (Optional)
You can optionally follow the instructions in Call an Amazon SageMaker model endpoint using Amazon API Gateway and AWS Lambda to expose the SageMaker inference endpoint in the secondary Region as an API using API Gateway and Lambda.
Clean up
To delete the SageMaker project, see Delete an MLOps Project using Amazon SageMaker Studio. To make sure the secondary inference endpoint is removed, go to the AWS CloudFormation console and delete the related stacks in your primary and secondary Regions; deleting these stacks destroys the SageMaker inference endpoints.
Conclusion
In this post, we showed how an MLOps specialist can modify a preconfigured MLOps template for their own multi-Region deployment use case, such as deploying workloads in multiple geographies or as part of implementing a multi-Region disaster recovery strategy. With this deployment approach, you don’t need to configure services in the secondary Region, and you can reuse the CodePipeline and CodeBuild setups in the primary Region for cross-Region deployment. Additionally, you can save on costs by continuing to train your models in the primary Region while using SageMaker inference in multiple Regions to scale your AI/ML deployment globally.
Please let us know your feedback in the comments section.
About the Authors
Mehran Najafi, PhD, is a Senior Solutions Architect for AWS focused on AI/ML and SaaS solutions at Scale.
Steven Alyekhin is a Senior Solutions Architect for AWS focused on MLOps at Scale.
Building the Future of TensorFlow
Posted by the TensorFlow team
We’ve started planning the future of TensorFlow! In this article, we’d like to share our vision.
We open-sourced TensorFlow nearly seven years ago, on November 9, 2015. Since then, thanks to thousands of open-source contributors and our incredible community of Google Developer Experts, community organizers, researchers, and educators around the globe, TensorFlow has come to define its category.
Today, TensorFlow is the most-used machine learning platform, adopted by millions of developers. It’s the 3rd most-starred software repository on GitHub (right behind Vue and React) and the most-downloaded machine learning package on PyPI. It has brought machine learning to the mobile ecosystem: TFLite now runs on four billion devices (maybe on yours, too!). TensorFlow has also brought machine learning to the Web: TensorFlow.js is now downloaded 170 thousand times weekly.
Across Google’s product lineup, TensorFlow powers virtually all production machine learning, including Search, Gmail, YouTube, Maps, Play, Ads, Photos, and many more. Beyond Google, at other Alphabet companies, TensorFlow and Keras enable the machine intelligence in Waymo’s self-driving cars.
In the broader industry, TensorFlow powers machine learning systems at thousands of companies, including most of the largest machine learning users in the world – Apple, ByteDance, Netflix, Tencent, Twitter, and countless more. And in the research world, every month, Google Scholar is indexing over 3,000 new scientific publications that mention TensorFlow or Keras.
Today, our user base and developer ecosystem are larger than ever, and growing!
We see the growth of TensorFlow not just as an achievement to celebrate, but as an opportunity to go further and deliver more value for the machine learning community.
Our goal is to provide the best machine learning platform on the planet. Software that will become a new superpower in the toolbox of every developer. Software that will turn machine learning from a niche craft into an industry as mature as web development.
To achieve this, we listen to the needs of our users, anticipate new industry trends, iterate on our APIs, and work to make it increasingly easy for you to innovate at scale. In the same way that TensorFlow originally helped the rise of deep learning, we want to continue to facilitate the evolution of machine learning by giving you the platform that lets you push the boundaries of what’s possible. Machine learning is evolving rapidly, and so is TensorFlow.
Today, we’re excited to announce we’ve started working on the next iteration of TensorFlow that will enable the next decade of machine learning development. We are building on TensorFlow’s class-leading capabilities, and focusing on four pillars.
Four pillars of TensorFlow
Fast and scalable
- XLA Compilation. We are focusing on XLA compilation and aim to make most model training and inference workflows faster on GPU and CPU, building on XLA’s performance wins on TPU. We intend for XLA to become the industry-standard deep learning compiler, and we’ve opened it up to open-source collaboration as part of the OpenXLA initiative.
- Distributed computing. We are investing in DTensor, a new API for large-scale model parallelism. DTensor unlocks the future of ultra-large model training and deployment and allows you to develop your model as if you were training on a single device, even while using multiple clients. DTensor will be unified with the tf.distribute API, allowing for flexible model and data parallelism.
- Performance optimization. Besides compilation, we are also further investing in algorithmic performance optimization techniques such as mixed-precision and reduced-precision computation, which can deliver considerable speedups on GPUs and TPUs (a short Keras mixed-precision example follows this list).
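The mixed-precision support mentioned above is already available through the Keras API; here is a minimal sketch of enabling it globally (standard tf.keras.mixed_precision usage, not a new API being announced here):

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Run most ops in float16 while keeping variables in float32.
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    layers.Dense(256, activation="relu", input_shape=(32,)),
    # Keep the final layer's outputs in float32 for numerical stability.
    layers.Dense(10, dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```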
Applied ML
- Developer resources. We are adding more code examples, guides, and documentation for popular and emerging applied ML use cases. We aim to increasingly reduce the barrier to entry of ML and turn it into a tool in the hands of every developer.
Ready to deploy
- C++ API for applications. We are developing a public TF2 C++ API for native server-side inference as part of a C++ application.
- Deploy JAX models. We are making it easier for you to deploy models developed using JAX with TensorFlow Serving, and to mobile and the web with TensorFlow Lite and TensorFlow.js.
Simplicity
- NumPy API. As the field of ML has expanded over the last few years, TensorFlow’s API surface has also increased, not always in ways that are consistent or simple to understand. We are actively working on consolidating and simplifying these APIs. For example, we will be adopting the NumPy API standard for numerics (a brief tf.experimental.numpy example follows this list).
- Easier debugging. A framework isn’t just its API surface, it’s also its debugging experience. We aim to minimize the time-to-solution for developing any applied ML system by focusing on better debugging capabilities.
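TensorFlow already ships a NumPy-compatible numerics API under tf.experimental.numpy, which gives a flavor of the direction described above; a brief sketch (illustrative only, not the future consolidated API):

```python
import tensorflow.experimental.numpy as tnp

# Opt in to NumPy-style type promotion and ndarray methods on tf.Tensor.
tnp.experimental_enable_numpy_behavior()

x = tnp.arange(6.0).reshape(2, 3)   # a tf.Tensor with a NumPy-style interface
w = tnp.ones((3, 4))
y = tnp.matmul(x, w) + 1.0          # NumPy-style math, running on TF's backend
print(y.shape)                      # (2, 4)
```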
The future of TensorFlow will be 100% backwards-compatible
We want TensorFlow to serve as a bedrock for the machine learning industry to build upon. We see API stability as our most important feature. Whether you are an engineer who relies on TensorFlow as part of your product or a builder of a TensorFlow ecosystem package, you should be able to upgrade to the latest TensorFlow version and immediately start benefiting from its new features and performance improvements, without fear that your existing codebase might break. As such, we commit to full backwards compatibility from TensorFlow 2 to the next version: your TensorFlow 2 code will run as-is. There will be no conversion script to run, no manual changes to apply.
Timeline
We plan to release a preview of the new TensorFlow capabilities in Q2 2023 and will release the production version later in the year. We will publish regular updates on our progress in the meantime. You can follow our progress via the TensorFlow blog, and on the TensorFlow YouTube channel.
Your feedback is welcome
We want to hear from you! For questions or feedback, please reach out via the TensorFlow forum.
Deep learning with light
Ask a smart home device for the weather forecast, and it takes several seconds for the device to respond. One reason this latency occurs is because connected devices don’t have enough memory or power to store and run the enormous machine-learning models needed for the device to understand what a user is asking of it. The model is stored in a data center that may be hundreds of miles away, where the answer is computed and sent to the device.
MIT researchers have created a new method for computing directly on these devices, which drastically reduces this latency. Their technique shifts the memory-intensive steps of running a machine-learning model to a central server where components of the model are encoded onto light waves.
The waves are transmitted to a connected device using fiber optics, which enables tons of data to be sent lightning-fast through a network. The receiver then employs a simple optical device that rapidly performs computations using the parts of a model carried by those light waves.
This technique leads to more than a hundredfold improvement in energy efficiency when compared to other methods. It could also improve security, since a user’s data do not need to be transferred to a central location for computation.
This method could enable a self-driving car to make decisions in real-time while using just a tiny percentage of the energy currently required by power-hungry computers. It could also allow a user to have a latency-free conversation with their smart home device, be used for live video processing over cellular networks, or even enable high-speed image classification on a spacecraft millions of miles from Earth.
“Every time you want to run a neural network, you have to run the program, and how fast you can run the program depends on how fast you can pipe the program in from memory. Our pipe is massive — it corresponds to sending a full feature-length movie over the internet every millisecond or so. That is how fast data comes into our system. And it can compute as fast as that,” says senior author Dirk Englund, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) and member of the MIT Research Laboratory of Electronics.
Joining Englund on the paper is lead author and EECS grad student Alexander Sludds; EECS grad student Saumil Bandyopadhyay, Research Scientist Ryan Hamerly, as well as others from MIT, the MIT Lincoln Laboratory, and Nokia Corporation. The research is published today in Science.
Lightening the load
Neural networks are machine-learning models that use layers of connected nodes, or neurons, to recognize patterns in datasets and perform tasks, like classifying images or recognizing speech. But these models can contain billions of weight parameters, which are numeric values that transform input data as they are processed. These weights must be stored in memory. At the same time, the data transformation process involves billions of algebraic computations, which require a great deal of power to perform.
The process of fetching data (the weights of the neural network, in this case) from memory and moving them to the parts of a computer that do the actual computation is one of the biggest limiting factors to speed and energy efficiency, says Sludds.
“So our thought was, why don’t we take all that heavy lifting — the process of fetching billions of weights from memory — move it away from the edge device and put it someplace where we have abundant access to power and memory, which gives us the ability to fetch those weights quickly?” he says.
The neural network architecture they developed, Netcast, involves storing weights in a central server that is connected to a novel piece of hardware called a smart transceiver. This smart transceiver, a thumb-sized chip that can receive and transmit data, uses technology known as silicon photonics to fetch trillions of weights from memory each second.
It receives weights as electrical signals and imprints them onto light waves. Since the weight data are encoded as bits (1s and 0s) the transceiver converts them by switching lasers; a laser is turned on for a 1 and off for a 0. It combines these light waves and then periodically transfers them through a fiber optic network so a client device doesn’t need to query the server to receive them.
“Optics is great because there are many ways to carry data within optics. For instance, you can put data on different colors of light, and that enables a much higher data throughput and greater bandwidth than with electronics,” explains Bandyopadhyay.
Trillions per second
Once the light waves arrive at the client device, a simple optical component known as a broadband “Mach-Zehnder” modulator uses them to perform super-fast, analog computation. This involves encoding input data from the device, such as sensor information, onto the weights. Then it sends each individual wavelength to a receiver that detects the light and measures the result of the computation.
The researchers devised a way to use this modulator to do trillions of multiplications per second, which vastly increases the speed of computation on the device while using only a tiny amount of power.
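Conceptually, the client-side step is an analog multiply-accumulate between the streamed weights and the locally held inputs. The following toy NumPy sketch illustrates only the arithmetic idea; it does not model the optics or the Netcast hardware:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Server" side: weights that would be streamed to the client over fiber.
weights = rng.standard_normal((4, 8))     # one layer's weight matrix

# "Client" side: local activations (e.g., sensor data) held on the edge device.
activations = rng.standard_normal(8)

# The modulator-and-detector step amounts to multiplying each incoming weight
# row by the local activations, then accumulating the result at the receiver.
outputs = np.array([np.sum(row * activations) for row in weights])

# This is equivalent to the usual matrix-vector product.
assert np.allclose(outputs, weights @ activations)
```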
“In order to make something faster, you need to make it more energy efficient. But there is a trade-off. We’ve built a system that can operate with about a milliwatt of power but still do trillions of multiplications per second. In terms of both speed and energy efficiency, that is a gain of orders of magnitude,” Sludds says.
They tested this architecture by sending weights over an 86-kilometer fiber that connects their lab to MIT Lincoln Laboratory. Netcast enabled machine-learning with high accuracy — 98.7 percent for image classification and 98.8 percent for digit recognition — at rapid speeds.
“We had to do some calibration, but I was surprised by how little work we had to do to achieve such high accuracy out of the box. We were able to get commercially relevant accuracy,” adds Hamerly.
Moving forward, the researchers want to iterate on the smart transceiver chip to achieve even better performance. They also want to miniaturize the receiver, which is currently the size of a shoe box, down to the size of a single chip so it could fit onto a smart device like a cell phone.
“Using photonics and light as a platform for computing is a really exciting area of research with potentially huge implications on the speed and efficiency of our information technology landscape,” says Euan Allen, a Royal Academy of Engineering Research Fellow at the University of Bath, who was not involved with this work. “The work of Sludds et al. is an exciting step toward seeing real-world implementations of such devices, introducing a new and practical edge-computing scheme whilst also exploring some of the fundamental limitations of computation at very low (single-photon) light levels.”
The research is funded, in part, by NTT Research, the National Science Foundation, the Air Force Office of Scientific Research, the Air Force Research Laboratory, and the Army Research Office.