How Amp on Amazon used data to increase customer engagement, Part 2: Building a personalized show recommendation platform using Amazon SageMaker

Amp is a new live radio app from Amazon. With Amp, you can host your own radio show and play songs from the Amazon Music catalog, or tune in and listen to shows other Amp users are hosting. In an environment where content is plentiful and diverse, it’s important to tailor the user experience to each user’s individual taste, so they can easily find shows they like and discover new content they would enjoy.

Amp uses machine learning (ML) to provide personalized recommendations for live and upcoming Amp shows on the app’s home page. The recommendations are computed by a Random Forest model that uses features representing the popularity of a show (such as listen and like counts), the popularity of a creator (such as the total number of times their recent shows were played), and a user’s personal affinities to a show’s topic and creator. Affinities are computed either implicitly from the user’s behavioral data or explicitly from topics of interest (such as pop music, baseball, or politics) as provided in their user profiles.
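As a purely illustrative sketch (not the production Amp model), the ranking step can be thought of as scoring each candidate show for a user with a trained random forest and sorting by predicted engagement probability. The feature names and training data below are hypothetical:

```python
# Illustrative only: a toy random forest ranker over hypothetical features.
# The real Amp model, features, and training data are not shown in this post.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["listen_count", "like_count", "creator_recent_plays",
            "topic_affinity", "creator_affinity"]  # hypothetical feature names

# X_train: one row per (user, show) pair; y_train: 1 if the user engaged with the show
rng = np.random.default_rng(0)
X_train = rng.random((1000, len(FEATURES)))
y_train = rng.integers(0, 2, size=1000)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

def rank_shows(feature_rows, show_ids):
    """Return show IDs ordered by predicted engagement probability."""
    scores = model.predict_proba(feature_rows)[:, 1]
    ranked = sorted(zip(show_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [show_id for show_id, _ in ranked]
```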

This is Part 2 of a series on using data analytics and ML for Amp and creating a personalized show recommendation list platform. The platform has shown a 3% boost to customer engagement metrics tracked (liking a show, following a creator, enabling upcoming show notifications) since its launch in May 2022.

Refer to Part 1 to learn how behavioral data was collected and processed using the data and analytics systems.

Solution overview

The ML-based show recommender for Amp has five main components, as illustrated in the following architecture diagram:

  1. The Amp mobile app.
  2. Back-end services that gather the behavioral data, such as likes and follows, as well as broadcast show-related information such as status updates when shows go live.
  3. Real-time ingestion of behavioral and show data, and real-time (online) feature computing and storage.
  4. Batch (offline) feature computing and storage.
  5. A Recommender System that handles incoming requests from the app backend for getting a list of shows. This includes real-time inference to rank shows based on personalized and non-personalized features.

This post focuses on components 3, 4, and 5 of this architecture.

The following diagram shows the high-level architecture and its components.

In the following sections, we provide more details regarding real-time feature computing, batch feature computing, real-time inference, operational health, and the outcomes we observed.

Real-time feature computing

Some features, such as the like and listen count for a show, need to be streamed in continuously and used as is, whereas others, such as the number of listening sessions longer than 5 minutes, need to be transformed in real time as raw session data is streamed. These types of features, whose values must be computed at inference time, are referred to as point-in-time (PIT) features. Data for PIT features needs to be updated quickly, and the latest version should be written and read with low latency (under 20 milliseconds per user for 1,000 shows). The data also needs to be in durable storage, because missing or partial data can lead to deteriorated recommendations and a poor customer experience. In addition to the read/write latency requirement, PIT features also require low reflection time. Reflection time is the time it takes for a feature to become available to read after the contributing events were emitted, for example, the time between a listener liking a show and the PIT LikeCount feature being updated.

The sources of this data are the backend services directly serving the app. Some of the data is transformed into metrics that are then broadcast via Amazon Simple Notification Service (Amazon SNS) to downstream listeners such as the ML feature transformation pipeline. An in-memory database such as Amazon MemoryDB is an ideal service for durable storage and ultra-fast performance at high volumes. The compute component that transforms and writes features to MemoryDB is AWS Lambda. App traffic follows daily and weekly patterns of peaks and dips depending on the time and day, and Lambda scales automatically with the incoming volume of events. The independent nature of each individual metric transformation also makes Lambda, which is a stateless service on its own, a good fit for this problem. Placing Amazon Simple Queue Service (Amazon SQS) between Amazon SNS and Lambda not only prevents message loss but also acts as a buffer for unexpected bursts of traffic that preconfigured Lambda concurrency limits may not be sufficient to serve.
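The following is a minimal sketch of such a Lambda handler, assuming SNS messages delivered through SQS and a hypothetical event schema and Redis key layout; MemoryDB is Redis compatible, so a standard Redis client is used:

```python
# Sketch of a Lambda handler that consumes SNS messages delivered through SQS
# and updates point-in-time (PIT) features in MemoryDB (Redis-compatible).
# The event fields, key layout, and endpoint are illustrative assumptions.
import json
import os
import redis

# MemoryDB cluster endpoint, supplied through an environment variable
r = redis.Redis(host=os.environ["MEMORYDB_ENDPOINT"], port=6379, ssl=True)

def handler(event, context):
    for record in event["Records"]:             # SQS batch
        body = json.loads(record["body"])       # SQS message body
        message = json.loads(body["Message"])   # original SNS payload
        show_id = message["showId"]

        if message["eventType"] == "LIKE":
            # Increment the PIT LikeCount feature for this show
            r.hincrby(f"show:{show_id}:features", "LikeCount", 1)
        elif message["eventType"] == "LISTEN_SESSION" and message["durationSec"] > 300:
            # Count listening sessions longer than 5 minutes
            r.hincrby(f"show:{show_id}:features", "LongSessionCount", 1)
```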

Batch feature computing

Features that use historical behavioral data to represent a user’s ever-evolving taste are more complex to compute and can’t be computed in real time. These features are computed by a batch process that runs periodically, for example once a day. Data for batch features should support fast querying for filtering and aggregation, and may span long periods of time, so it is larger in volume. Because batch features are also retrieved and sent as inputs for real-time inference, they should still be read with low latency.

Collecting raw data for batch feature computing doesn’t have the sub-minute reflection time requirement that PIT features have, which makes it feasible to buffer the events longer and transform metrics in batch. This solution utilized Amazon Kinesis Data Firehose, a managed service for quickly ingesting streaming data into several destinations, including Amazon Simple Storage Service (Amazon S3) for persisting metrics to the S3 data lake to be utilized in offline computations. Kinesis Data Firehose provides an event buffer and Lambda integration to easily collect, batch transform, and persist these metrics to Amazon S3 to be used later by the batch feature computation. Batch feature computations don’t have the same low-latency read/write requirements as PIT features, which makes Amazon S3 the better choice because it provides low-cost, durable storage for these large volumes of business metrics.

Our initial ML model uses 21 batch features computed daily using data captured in the past 2 months. This data includes both playback and app engagement history per user, and grows with the number of users and frequency of app usage. Feature engineering at this scale requires an automated process to pull the required input data, process it in parallel, and export the outcome to persistent storage. The processing infrastructure is needed only for the duration of the computations. SageMaker Processing provides prebuilt Docker images that include Apache Spark and other dependencies needed to run distributed data processing jobs at large scale. The underlying infrastructure for a Processing job is fully managed by SageMaker. Cluster resources are provisioned for the duration of your job, and cleaned up when a job is complete.
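The following is a minimal sketch of launching such a job with the SageMaker Python SDK; the script name, S3 paths, IAM role, and instance settings are placeholders rather than the actual Amp configuration:

```python
# Sketch of launching a distributed batch feature computation with SageMaker
# Processing and the prebuilt Spark container. Bucket names, the script, and
# the IAM role are placeholders for illustration.
from sagemaker.spark.processing import PySparkProcessor

spark_processor = PySparkProcessor(
    base_job_name="amp-batch-features",
    framework_version="3.1",          # Spark version of the prebuilt image
    role="arn:aws:iam::111122223333:role/SageMakerProcessingRole",
    instance_type="ml.m5.4xlarge",
    instance_count=10,                # cluster exists only for the job's duration
)

spark_processor.run(
    submit_app="compute_batch_features.py",   # PySpark feature engineering script
    arguments=[
        "--input-path", "s3://amp-data-lake/metrics/",
        "--output-path", "s3://amp-features/batch/",
        "--lookback-days", "60",
    ],
)
```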

Each step in the batch process—data gathering, feature engineering, feature persistence—is part of a workflow that requires error handling, retries, and state transitions in between. With AWS Step Functions, you can create a state machine and split your workflow into several steps of preprocessing and postprocessing, as well as a step to persist the features into SageMaker Feature Store or the other data to Amazon S3. A state machine in Step Functions can be triggered via Amazon EventBridge to automate batch computing to run at a set schedule, such as once every day at 10:00 PM UTC.
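As a sketch of the scheduling piece, the following boto3 calls create an EventBridge rule that starts a hypothetical batch feature state machine every day at 10:00 PM UTC, assuming the state machine and an IAM role that allows EventBridge to start it already exist:

```python
# Sketch of scheduling the batch feature state machine with Amazon EventBridge;
# the rule name, state machine ARN, and role ARN are placeholders.
import boto3

events = boto3.client("events")

# Run the batch feature workflow once a day at 22:00 UTC
events.put_rule(
    Name="amp-batch-feature-schedule",
    ScheduleExpression="cron(0 22 * * ? *)",
    State="ENABLED",
)

events.put_targets(
    Rule="amp-batch-feature-schedule",
    Targets=[{
        "Id": "batch-feature-state-machine",
        "Arn": "arn:aws:states:us-east-1:111122223333:stateMachine:AmpBatchFeatures",
        "RoleArn": "arn:aws:iam::111122223333:role/EventBridgeInvokeStepFunctions",
    }],
)
```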

After the features are computed, they need to be versioned and stored to be read during inference as well as model retraining. Rather than build your own feature storage and management service, you can use SageMaker Feature Store. Feature Store is a fully managed, purpose-built repository to store, share, and manage features for ML models. It stores the history of ML features in the offline store (Amazon S3) and also provides APIs to an online store to allow low-latency reads of the most recent features. The offline store can serve the historical data for further model training and experimentation, and the online store can be called by your customer-facing APIs to get features for real-time inference. As we evolve our services to provide more personalized content, we anticipate training additional ML models and, with the help of Feature Store, searching for, discovering, and reusing features among these models.
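The following is a minimal sketch of this pattern using the SageMaker Python SDK and boto3, assuming a feature group (the name and schema here are hypothetical) has already been created with an online store enabled:

```python
# Sketch of writing batch features to SageMaker Feature Store and reading the
# latest values back at inference time. The feature group name and schema are
# assumptions for illustration; the group must already exist with this schema.
import boto3
import pandas as pd
from sagemaker import Session
from sagemaker.feature_store.feature_group import FeatureGroup

# Offline + online ingestion of the daily batch features
feature_group = FeatureGroup(name="amp-user-affinity-features",
                             sagemaker_session=Session())
daily_features = pd.DataFrame([{
    "user_id": "user-12345",
    "topic_affinity_pop": 0.82,
    "creator_affinity_score": 0.65,
    "event_time": "2022-06-01T22:00:00Z",
}])
feature_group.ingest(data_frame=daily_features, max_workers=4, wait=True)

# Low-latency read from the online store during real-time inference
featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")
record = featurestore_runtime.get_record(
    FeatureGroupName="amp-user-affinity-features",
    RecordIdentifierValueAsString="user-12345",
)
features = {f["FeatureName"]: f["ValueAsString"] for f in record["Record"]}
```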

Real-time inference

Real-time inference usually requires hosting ML models behind endpoints. You could do this using web servers or containers, but this requires ML engineering effort and infrastructure to manage and maintain. SageMaker makes it easy to deploy ML models to real-time endpoints. SageMaker lets you train and upload ML models and host them by creating and configuring SageMaker endpoints. Real-time inference satisfies the low-latency requirements for ranking shows as they are browsed on the Amp home page.
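The following is a minimal sketch of deploying a trained model to a SageMaker real-time endpoint with the SageMaker Python SDK; the container image, model artifact location, role, and endpoint name are placeholders:

```python
# Sketch of hosting a trained model on a SageMaker real-time endpoint,
# assuming the model artifact has already been uploaded to Amazon S3 and a
# compatible inference container image is available in Amazon ECR.
from sagemaker.model import Model

model = Model(
    image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/amp-recommender:latest",
    model_data="s3://amp-models/show-recommender/model.tar.gz",
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
)

predictor = model.deploy(
    initial_instance_count=2,
    instance_type="ml.c5.xlarge",
    endpoint_name="amp-show-recommender",
)
```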

In addition to managed hosting, SageMaker provides managed endpoint scaling. SageMaker inference allows you to define an auto scaling policy with minimum and maximum instance counts and a target utilization to trigger the scaling. This way, you can easily scale in or out as demand changes.
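A sketch of such a policy with boto3 follows; the endpoint and variant names, capacity bounds, and target value are illustrative rather than Amp’s production settings:

```python
# Sketch of attaching a target tracking auto scaling policy to the endpoint's
# production variant.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/amp-show-recommender/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=10,
)

autoscaling.put_scaling_policy(
    PolicyName="amp-recommender-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,   # invocations per instance before scaling out
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
    },
)
```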

Operational health

The number of events this system handles for real-time feature computing changes with the natural pattern of app usage (higher or lower traffic based on the time of day or day of the week). Similarly, the number of requests it receives for real-time inference scales with the number of concurrent app users. These services also get unexpected peaks in traffic due to self-promotion on social media by popular creators. Although it’s important to ensure the system can scale up and down to serve the incoming traffic successfully and frugally, it’s also important to monitor operational metrics and alert on any unexpected operational issues to prevent loss of data and service to customers. Monitoring the health of these services is straightforward using Amazon CloudWatch. Vital service health metrics such as faults and latency of operations, as well as utilization metrics such as memory, disk, and CPU usage, are available out of the box using CloudWatch. Our development team uses metrics dashboards and automated monitoring to ensure we can serve our clients with high availability (99.8%) and low latency (less than 200 milliseconds end-to-end to get recommended shows per user).
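As an illustration, the following boto3 call creates a CloudWatch alarm on the endpoint’s model latency; the alarm name, threshold, and SNS topic are assumptions for the sketch:

```python
# Sketch of a CloudWatch alarm on SageMaker endpoint latency, assuming an SNS
# topic for notifications already exists; thresholds here are illustrative.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="amp-recommender-high-latency",
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "amp-show-recommender"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    Statistic="Average",
    Period=60,                       # evaluate one-minute windows
    EvaluationPeriods=3,
    Threshold=200_000.0,             # ModelLatency is reported in microseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:amp-ops-alerts"],
)
```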

Measuring the outcome

Prior to the ML-based show recommender described in this post, a simpler heuristic algorithm ranked Amp shows based on a user’s personal topics of interest that are self-reported on their profile. We set up an A/B test to measure the impact of switching to an ML-based recommender that uses data from a user’s past app interactions. We identified improvements in metrics such as listening duration and the number of engagement actions (liking a show, following a show creator, turning on notifications) as indicators of success. A/B testing with 50% of users receiving show recommendations ranked for them via the ML-based recommender has shown a 3% boost to customer engagement metrics and a 0.5% improvement in playback duration.

Conclusion

With purpose-built services, the Amp team was able to release the personalized show recommendation API as described in this post to production in under 3 months. The system also scales well for the unpredictable loads created by well-known show hosts or marketing campaigns that could generate an influx of users. The solution uses managed services for processing, training, and hosting, which helps reduce the time spent on day-to-day maintenance of the system. We’re also able to monitor all these managed services via CloudWatch to ensure the continued health of the systems in production.

A/B testing the first version of Amp’s ML-based recommender against a rule-based approach (which sorts shows by a customer’s topics of interest only) has shown that the ML-based recommender exposes customers to higher-quality content from more diverse topics, which results in a higher number of follows and enabled notifications. The Amp team is continuously working towards improving the models to provide highly relevant recommendations.

For more information about Feature Store, visit Amazon SageMaker Feature Store and check out other customer use cases in the AWS Machine Learning Blog.


About the authors

Tulip Gupta is a Solutions Architect at Amazon Web Services. She works with Amazon to design, build, and deploy technology solutions on AWS. She assists customers in adopting best practices while deploying solutions on AWS, and is an Analytics and ML enthusiast. In her spare time, she enjoys swimming, hiking, and playing board games.

David Kuo is a Solutions Architect at Amazon Web Services. He works with AWS customers to design, build and deploy technology solutions on AWS. He works with Media and Entertainment customers and has interests in machine learning technologies. In his spare time, he wonders what he should do with his spare time.

Manolya McCormick is a Sr Software Development Engineer for Amp on Amazon. She designs and builds distributed systems using AWS to serve customer facing applications. She enjoys reading and cooking new recipes in her spare time.

Jeff Christophersen is a Sr. Data Engineer for Amp on Amazon. He works to design, build, and deploy Big Data solutions on AWS that drive actionable insights. He assists internal teams in adopting scalable and automated solutions, and is an Analytics and Big Data enthusiast. In his spare time, when he is not on a pair of skis you can find him on his mountain bike.

Read More

How Amp on Amazon used data to increase customer engagement, Part 1: Building a data analytics platform

Amp, the new live radio app from Amazon, is a reinvention of radio featuring human-curated live audio shows. It’s designed to provide a seamless customer experience to listeners and creators by debuting interactive live audio shows from your favorite artists, radio DJs, podcasters, and friends.

However, as a new product in a new space for Amazon, Amp needed more relevant data to inform their decision-making process. Amp wanted a scalable data and analytics platform to enable easy access to data and perform machine learning (ML) experiments for live audio transcription, content moderation, feature engineering, and a personal show recommendation service, and to inspect or measure business KPIs and metrics.

This post is the first in a two-part series. Part 1 shows how data was collected and processed using the data and analytics platform, and Part 2 shows how the data was used to create show recommendations using Amazon SageMaker, a fully managed ML service. The personalized show recommendation list service has shown a 3% boost to customer engagement metrics tracked (such as liking a show, following a creator, or enabling upcoming show notifications) since its launch in May 2022.

Solution overview

The data sources for Amp can be broadly categorized as either streaming (near-real time) or batch (point in time). The source data is emitted from Amp-owned systems or other Amazon systems. The two different data types are as follows:

  • Streaming data – This type of data mainly consists of follows, notifications (regarding users’ friends, favorite creators, or shows), activity updates, live show interactions (call-ins, co-hosts, polls, in-app chat), real-time updates on live show activities (live listen count, likes), live audio playback metrics, and other clickstream metrics from the Amp application. Amp stakeholders require this data to power ML processes or predictive models, content moderation tools, and product and program dashboards (for example, trending shows). Streaming data enables Amp customers to conduct and measure experimentation.
  • Batch data – This data mainly consists of catalog data, show or creator metadata, and user profile data. Batch data enables more point-in-time reporting and analytics vs. real-time.

The following diagram illustrates the high-level architecture.

The Amp data and analytics platform can be broken down into three high-level systems:

  • Streaming data ingestion, stream processing and transformation, and stream storage
  • Batch data ingestion, batch processing and transformation, and batch storage
  • Business intelligence (BI) and analytics

In the following sections, we discuss each component in more detail.

Streaming data ingestion, processing, transformation, and storage

Amp created a serverless streaming ingestion pipeline capable of tapping into data from sources without the need for infrastructure management, as shown in the following diagram.

The pipeline was able to ingest the Amp show catalog data (what shows are available on Amp) and pass it to the data lake for two different use cases: one for near-real-time analytics, and one for batch analytics.

As part of the ingestion pipeline, the Amp team has an Amazon Simple Queue Service (Amazon SQS) queue that receives messages from an upstream Amazon Simple Notification Service (Amazon SNS) topic that contains information on changes to shows in the catalog. These changes could be the addition of new shows or adjustments to existing ones that have been scheduled.

When the message is received by the SQS queue, it triggers the AWS Lambda function to make an API call to the Amp catalog service. The Lambda function retrieves the desired show metadata, filters the metadata, and then sends the output metadata to Amazon Kinesis Data Streams. Amazon Kinesis Data Firehose receives the records from the data stream. Kinesis Data Firehose then invokes a secondary Lambda function to perform a data transformation that flattens the JSON records received and writes the transformed records to an Amazon Simple Storage Service (Amazon S3) data lake for consumption by Amp stakeholders.
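The following is a minimal sketch of such a Firehose transformation Lambda function; the nested record fields being flattened are hypothetical, not the actual Amp catalog schema:

```python
# Sketch of the secondary Lambda function that Kinesis Data Firehose invokes to
# flatten nested JSON show records before they are written to the S3 data lake.
import base64
import json

def handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Flatten the nested show metadata into a single-level record
        flattened = {
            "show_id": payload["show"]["id"],
            "title": payload["show"]["title"],
            "creator_id": payload["show"]["creator"]["id"],
            "status": payload["status"],
            "updated_at": payload["updatedAt"],
        }

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(flattened) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })

    return {"records": output}
```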

Kinesis Data Firehose enabled buffering and writing data to Amazon S3 every 60 seconds. This helped Amp teams make near-real-time programming decisions that impacted external customers.

The streaming ingestion pipeline supported the goals of performance, availability, scalability, and the flexibility to send data to multiple downstream applications or services. The services used contributed to these goals as follows:

  • Kinesis Data Streams handles streaming data ingestion when necessary. Kinesis Data Streams supported these objectives by enabling the Amp team to quickly ingest data for analytics with minimal operational load. As a fully managed service, it reduced operational overhead, and Amp was able to scale with the product needs.
  • Lambda enabled the team to create lightweight functions to run API calls and perform data transformations.
  • Because Kinesis Data Firehose is a managed service, it was able to handle all the scaling, sharding, and monitoring needs of the streaming data without any additional overhead for the team.

Batch data ingestion, processing, transformation, and storage

Amp created a transient batch (point in time) ingestion pipeline capable of data ingestion, processing and transformation, and storage, as shown in the following diagram.

A transient extract, transform, and load (ETL) and extract, load, and transform (ELT) job approach was implemented because of the batch nature of these workloads and unknown data volumes. As a part of the workflow automation, Amazon SQS was used to trigger a Lambda function. The Lambda function then activated the AWS Glue crawler to infer the schema and data types. The crawler wrote the schema metadata to the AWS Glue Data Catalog, providing a unified metadata store for data sharing.

The ETL and ELT jobs were required to run on either a set schedule or an event-driven workflow. To handle these needs, Amp used Amazon Managed Workflows for Apache Airflow (Amazon MWAA). Apache Airflow is an open-source Python-based workflow management platform. Amazon MWAA is a fully managed service that automatically handles scaling. It provides sequencing, error handling, retry logic, and state. With Amazon MWAA, Amp was able to take advantage of the benefits of Airflow for job orchestration without having to manage or maintain dedicated Airflow servers. Additionally, by using Amazon MWAA, Amp was able to create a code repository and workflow pipeline stored in Amazon S3 that Amazon MWAA could access. The pipeline allowed Amp data engineers to easily deploy Airflow DAGs or PySpark scripts across multiple environments.
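The following is a minimal sketch of a daily DAG that could run on Amazon MWAA; the DAG ID, schedule, and task contents are placeholders:

```python
# Minimal sketch of an Airflow DAG that could run on Amazon MWAA to orchestrate
# a daily ETL job; task contents and names are placeholders for illustration.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_etl(**context):
    # Placeholder: trigger the Spark job (for example, an EMR on EKS job like
    # the one sketched below) and wait for it to complete.
    ...

with DAG(
    dag_id="amp_daily_catalog_etl",
    start_date=datetime(2022, 5, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    etl_task = PythonOperator(task_id="run_etl", python_callable=run_etl)
```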

Amp used Amazon EMR on Amazon Elastic Kubernetes Service (Amazon EKS) to configure and manage containers for their data processing and transformation jobs. Due to the unique nature of the Amp service, the initial expected data volumes that would be processed were relatively unknown. To provide flexibility as the service evolved, the team decided to go with Amazon EMR on EKS to eliminate any unnecessary operational overhead required to bootstrap and scale Amazon EMR for data processing. This approach allowed them to run transient hybrid EMR clusters backed by a mix of AWS Fargate and Amazon Elastic Compute Cloud (Amazon EC2) nodes, where all system tasks and workloads were offloaded to Fargate, while Amazon EC2 handled all the Apache Spark processing and transformation. This provided the flexibility to have a cluster with one node running, while the Amazon EKS auto scaler dynamically instantiated and bootstrapped any additional EC2 nodes that were required for the job. When the job was complete, they were automatically deleted by the cluster auto scaler. This pattern eliminated the need for the team to manage any of the cluster bootstrap actions or scaling required to respond to evolving workloads.
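The following is a minimal sketch of submitting a transient Spark job to EMR on EKS with boto3; the virtual cluster ID, execution role, script location, and Spark settings are placeholders:

```python
# Sketch of submitting a transient Spark job run to an EMR on EKS virtual cluster.
import boto3

emr_containers = boto3.client("emr-containers")

response = emr_containers.start_job_run(
    name="amp-catalog-transform",
    virtualClusterId="abcd1234example",
    executionRoleArn="arn:aws:iam::111122223333:role/EMRContainersJobRole",
    releaseLabel="emr-6.5.0-latest",
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://amp-code/transform_catalog.py",
            "sparkSubmitParameters": "--conf spark.executor.instances=4 "
                                     "--conf spark.executor.memory=4G",
        }
    },
)
print(response["id"])   # job run ID for tracking the transient job
```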

Amazon S3 was used as the central data lake, and data was stored in Apache Parquet (Parquet) format. Parquet is a columnar format, which speeds up data retrieval and provides efficient data compression. Amazon S3 provided the flexibility, scalability, and security needs for Amp. With Amazon S3, the Amp team was able to centralize data storage in one location and federate access to the data virtually across any service or tool within or outside of AWS. The data lake was split into two S3 buckets: one for raw data ingestion and one for transformed data output. Amazon EMR performed the transformation from raw data to transformed data. With Amazon S3 as the central data lake, Amp was able to securely expose and share the data with other teams across Amp and Amazon.

To simplify data definition, table access provisioning, and the addition and removal of tables, they used AWS Glue crawlers and the AWS Glue Data Catalog. Because Amp is a new service and constantly evolving, the team needed a way to easily define, access, and manage the tables in the data lake. The crawlers handled data definition (including schema changes) and the addition and removal of tables, while the Data Catalog served as a unified metadata store.

Business intelligence and analytics

The following diagram illustrates the architecture for the BI and analytics component.

Amp chose to store the data in the S3 data lake, and not in the data warehouse. This enabled them to access it in a unified manner through the AWS Glue Data Catalog and provided greater flexibility for data consumers. This resulted in faster data access across a variety of services or tools. With data being stored in Amazon S3, it also reduced data warehouse infrastructure costs, because the costs are a function of the compute type and the amount of data stored.

The Amazon Redshift RA3 node type was used as the compute layer to enable stakeholders to query data stored in Amazon S3. Amazon Redshift RA3 nodes decouple storage and compute, and are designed for an access pattern through the AWS Glue Data Catalog. RA3 nodes introduce Amazon Redshift Managed Storage, which is Amazon S3 backed. The combination of these features enabled Amp to right-size the clusters and provide better query performance for their customers while minimizing costs.

Amazon Redshift configuration was automated using a Lambda function, which connected to a given cluster and ran parameterized SQL statements. The SQL statements contained the logic to deploy schemas, user groups, and users, while AWS Secrets Manager was used to automatically generate, store, and rotate Amazon Redshift user passwords. The underlying configuration variables were stored in Amazon DynamoDB. The Lambda function retrieved the variables and requested temporary Amazon Redshift credentials to perform the configuration. This process enabled the Amp team to set up Amazon Redshift clusters in a consistent manner.
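The following is a simplified sketch of this configuration pattern, using the Amazon Redshift Data API (which issues temporary credentials for the specified database user) rather than a direct database connection; the DynamoDB table, configuration shape, and SQL are illustrative assumptions, not the actual Amp implementation:

```python
# Sketch of the configuration Lambda pattern: read configuration variables from
# DynamoDB and run SQL against the cluster through the Redshift Data API, which
# handles temporary credentials for the given database user.
import boto3

dynamodb = boto3.resource("dynamodb")
redshift_data = boto3.client("redshift-data")

def handler(event, context):
    # Configuration variables (schemas, user groups) stored in DynamoDB
    config = dynamodb.Table("amp-redshift-config").get_item(
        Key={"cluster_id": event["cluster_id"]}
    )["Item"]

    for schema in config["schemas"]:
        redshift_data.execute_statement(
            ClusterIdentifier=event["cluster_id"],
            Database=config["database"],
            DbUser=config["admin_user"],   # temporary credentials issued for this user
            Sql=f"CREATE SCHEMA IF NOT EXISTS {schema}",
        )
```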

Business outcomes

Amp was able to achieve the following business outcomes:

  • Business reporting – Standard reporting required to run the business, such as daily flash reports, aggregated business review mechanisms, or project and program updates.
  • Product reporting – Specific reporting required to enable the inspection or measurement of key product KPIs and metrics. This included visual reports through dashboards such as marketing promotion effectiveness, app engagement metrics, and trending shows.
  • ML experimentation – Enabled downstream Amazon teams to use this data to support experimentation or generate predictions and recommendations. For example, ML experimentations like a personalized show recommendation list, show categorization, and content moderation helped with Amp’s user retention.

Key benefits

By implementing a scalable, cost-efficient architecture, Amp was able to achieve the following:

  • Limited operational complexity – They built a flexible system that used AWS managed services wherever possible.
  • Use the languages of data – Amp was able to support the two most common data manipulation languages, Python and SQL, to perform platform operations, conduct ML experiments, and generate analytics. With this support, developers at Amp were able to use languages they were familiar with.
  • Enable experimentation and measurement – Amp allowed developers to quickly generate the datasets needed to conduct experiments and measure the results. This helps in optimizing the Amp customer experience.
  • Build to learn but design to scale – Amp is a new product that is finding its market fit, and the team was able to focus their initial energy on building just enough features to get feedback. This enabled them to pivot toward the right product-market fit with each launch. They were able to build incrementally, but plan for the long term.

Conclusion

In this post, we saw how Amp created their data analytics platform using user behavioral data from streaming and batch data sources. The key factors that drove the implementation were the need to provide a flexible, scalable, cost-efficient, and effort-efficient data analytics platform. Design choices were made by evaluating various AWS services.

Part 2 of this series shows how we used this data and built out the personalized show recommendation list using SageMaker.

As next steps, we recommend doing a deep dive into each stage of your data pipeline system and making design choices that would be cost-effective and scalable for your needs. For more information, you can also check out other customer use cases in the AWS Analytics Blog.

If you have feedback about this post, submit it in the comments section.


About the authors

Tulip Gupta is a Solutions Architect at Amazon Web Services. She works with Amazon to design, build, and deploy technology solutions on AWS. She assists customers in adopting best practices while deploying solutions on AWS, and is an Analytics and ML enthusiast. In her spare time, she enjoys swimming, hiking, and playing board games.

David Kuo is a Solutions Architect at Amazon Web Services. He works with AWS customers to design, build and deploy technology solutions on AWS. He works with Media and Entertainment customers and has interests in machine learning technologies. In his spare time, he wonders what he should do with his spare time.

Manolya McCormick is a Sr Software Development Engineer for Amp on Amazon. She designs and builds distributed systems using AWS to serve customer facing applications. She enjoys reading and cooking new recipes in her spare time.

Jeff Christophersen is a Sr. Data Engineer for Amp on Amazon. He works to design, build, and deploy Big Data solutions on AWS that drive actionable insights. He assists internal teams in adopting scalable and automated solutions, and is an Analytics and Big Data enthusiast. In his spare time, when he is not on a pair of skis you can find him on his mountain bike.

Read More

Build repeatable, secure, and extensible end-to-end machine learning workflows using Kubeflow on AWS

This is a guest blog post cowritten with athenahealth.

athenahealth is a leading provider of network-enabled software and services for medical groups and health systems nationwide. Its electronic health records, revenue cycle management, and patient engagement tools allow anytime, anywhere access, driving better financial outcomes for its customers and enabling its provider customers to deliver better quality care.

In the artificial intelligence (AI) space, athenahealth uses data science and machine learning (ML) to accelerate business processes and provide recommendations, predictions, and insights across multiple services. From its first implementation in automated document services, touchlessly processing millions of provider-patient documents, to its more recent work in virtual assistants and improving revenue cycle performance, athenahealth continues to apply AI to help drive efficiency, service capabilities, and better outcomes for providers and their patients.

This blog post demonstrates how athenahealth uses Kubeflow on AWS (an AWS-specific distribution of Kubeflow) to build and streamline an end-to-end data science workflow that preserves essential tooling, optimizes operational efficiency, increases data scientist productivity, and sets the stage for extending their ML capabilities more easily.

Kubeflow is the open-source ML platform dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable. Kubeflow achieves this by incorporating relevant open-source tools that integrate well with Kubernetes. Some of these projects include Argo for pipeline orchestration, Istio for service mesh, Jupyter for notebooks, Spark, TensorBoard, and Katib. Kubeflow Pipelines helps build and deploy portable, scalable ML workflows that can include steps like data extraction, preprocessing, model training, and model evaluation in the form of repeatable pipelines.

AWS is contributing to the open-source Kubeflow community by providing its own Kubeflow distribution (called Kubeflow on AWS) that helps organizations like athenahealth build highly reliable, secure, portable, and scalable ML workflows with reduced operational overhead through integration with AWS managed services. AWS provides various Kubeflow deployment options like deployment with Amazon Cognito, deployment with Amazon Relational Database Service (Amazon RDS) and Amazon Simple Storage Service (Amazon S3), and vanilla deployment. For details on service integration and available add-ons for each of these options, refer to Deployment.

Today, Kubeflow on AWS provides a clear path to using Kubeflow, augmented with AWS managed services such as Amazon EKS, Amazon RDS, Amazon S3, and Amazon Cognito.

Many AWS customers are taking advantage of the Kubeflow on AWS distribution, including athenahealth.

Here, the athenahealth MLOps team discusses the challenges they encountered and the solutions they created in their Kubeflow journey.

Challenges with the previous ML environment

Prior to our adoption of Kubeflow on AWS, our data scientists used a standardized set of tools and a process that allowed flexibility in the technology and workflow used to train a given model. Example components of the standardized tooling include a data ingestion API, security scanning tools, the CI/CD pipeline built and maintained by another team within athenahealth, and a common serving platform built and maintained by the MLOps team. However, as our use of AI and ML matured, the variety of tools and infrastructure created for each model grew. Although we were still able to support the existing process, we saw the following challenges on the horizon:

  • Maintenance and growth – Reproducing and maintaining model training environments took more effort as the number of deployed models increased. Each project maintained detailed documentation that outlined how each script was used to build the final model. In many cases, this was an elaborate process involving 5 to 10 scripts with several outputs each. These had to be manually tracked with detailed instructions on how each output would be used in subsequent processes. Maintaining this over time became cumbersome. Moreover, as the projects became more complex, the number of tools also increased. For example, most models utilized Spark and TensorFlow with GPUs, which required a larger variety of environment configurations. Over time, users would switch to newer versions of tools in their development environments but then couldn’t run older scripts when those versions became incompatible. Consequently, maintaining and augmenting older projects required more engineering time and effort. In addition, as new data scientists joined the team, knowledge transfers and onboarding took more time, because synchronizing local environments included many undocumented dependencies. Switching between projects faced the same issues because each model had its own workflows.
  • Security – We take security seriously, and therefore prioritize compliance with all contractual, legal, and regulatory obligations associated with ML and data science. Data must be utilized, stored, and accessed in specific ways, and we have embedded robust processes to ensure our practices comply with our legal obligations as well as align with industry best practices. Prior to Kubeflow adoption, ensuring that data was stored and accessed in a specific way involved regular verification across multiple, diverse workflows. We knew that we could improve efficiencies by consolidating these diverse workflows onto a single platform. However, that platform would need to be flexible enough to integrate well with our standardized tooling.
  • Operations – We also saw an opportunity to increase operational efficiency and management through centralizing the logging and monitoring of the workflows. Because each team had developed their own tools, we collected this information from each workflow individually and aggregated them.

The data science team evaluated various solutions for consolidating the workflows. In addition to addressing these requirements, we looked for a solution that would integrate seamlessly with the existing standardized infrastructure and tools. We selected Amazon EKS and Kubeflow on AWS as our workflow solution.

The data scientist development cycle incorporating Kubeflow

A data science project begins with a clean slate: no data, no code, only the business problem that can be solved with ML. The first task is a proof of concept (POC) to discover if the data holds enough signal to make an ML model effective at solving the business problem, starting with querying for the raw dataset from our Snowflake data warehouse. This stage is iterative, and the data scientists use Kubernetes pods or Kubeflow Jupyter notebooks during this process.

Our Kubeflow cluster uses the Karpenter cluster autoscaler, which makes spinning up resources easy for data scientists because they only need to focus on defining the desired instance types, while the provisioning work is done by a set of predefined Karpenter provisioners. We have separate provisioners for CPU and GPU instance types, and all the instances supported by Amazon EKS fall in one of these two categories as per our provisioner configuration. The data scientists choose instance types using node selectors, and Karpenter takes care of node lifecycle management.

After the query is developed, the data scientists extract the raw data to a location on Amazon S3, then launch a Jupyter notebook from the AWS Kubeflow UI to explore the data. The goal is to create the feature set that will be used to train the first model. This allows the data scientists to determine if there is enough signal in the data to fulfill the customer’s business need.

After the results are satisfactory, the data scientists move to the next stage of the development cycle and turn their discoveries into a robust pipeline. They convert the POC code into production-quality code that runs at scale. To ensure compliance through using approved libraries, a container is created with the appropriate base Docker image. For our data scientists, we have found that providing a standard Python, TensorFlow, and Spark base image gives sufficient flexibility for most, if not all, workloads. They can then use the Dockerfile of their component to further customize their development environment. This Dockerfile is then utilized by the CI/CD process to build the components image that will be used in production, therefore maintaining consistency between development and production environments.

We have a tool that gives data scientists the ability to launch their development environment in a pod running on Kubernetes. When this pod is running, the data scientists can then attach the Visual Studio Code IDE directly to the pod and debug their model code. After they have the code running successfully, they can then push their changes to git and a new development environment is created with the most recent changes.

The standard data science pipeline consists of stages that include extraction, preprocessing, training, and evaluation. Each stage in the pipeline appears as a component in Kubeflow, which consists of a Kubernetes pod that runs a command with some information passed in as parameters. These parameters can either be static values or references to output from a previous component. The Docker image used in the pod is built from the CI/CD process. Details on this process appear in the CI/CD workflow discussed in the next section.
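The following is a minimal sketch of how such stages can be expressed with the Kubeflow Pipelines (KFP) v1 SDK; the component images, commands, and parameters are placeholders rather than athenahealth’s actual pipeline definition:

```python
# Minimal sketch of mapping pipeline stages to Kubeflow components with the
# KFP v1 SDK; image URIs, commands, and parameters are illustrative only.
import kfp
from kfp import dsl

@dsl.pipeline(name="training-pipeline",
              description="extract -> preprocess -> train")
def training_pipeline(dataset_uri: str, model_name: str):
    extract = dsl.ContainerOp(
        name="extract",
        image="registry.example.com/extract:ci-build-123",   # image built by CI/CD
        command=["python", "extract.py"],
        arguments=["--dataset-uri", dataset_uri],
        file_outputs={"raw_data": "/tmp/raw_data_path.txt"},
    )

    preprocess = dsl.ContainerOp(
        name="preprocess",
        image="registry.example.com/preprocess:ci-build-123",
        command=["python", "preprocess.py"],
        # A reference to the previous component's output becomes this step's input
        arguments=["--raw-data", extract.outputs["raw_data"]],
        file_outputs={"features": "/tmp/features_path.txt"},
    )

    dsl.ContainerOp(
        name="train",
        image="registry.example.com/train:ci-build-123",
        command=["python", "train.py"],
        arguments=["--features", preprocess.outputs["features"],
                   "--model-name", model_name],
    )

if __name__ == "__main__":
    kfp.compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")
```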

Development Cycle on Kubeflow. The development workflow starts on the left with the POC. The completed model is deployed to the athenahealth model serving platform running on Amazon ECS.

CI/CD process supporting automated workflows

As part of our CI/CD process, we use Jenkins to build and test all Kubeflow component images in parallel. On successful completion, the pipeline component template contains reference pointers to the images, and the resulting pipeline is uploaded to Kubeflow. Parameters in the Jenkins pipeline allow users to launch the pipelines and run their model training tests after successful builds.

Alternatively, to maintain a short development cycle, data scientists can also launch the pipeline from their local machine, modifying any pipeline parameters they may be experimenting with.

Tooling exists to ensure the reference pointers from the CI/CD build are utilized by default. If there is a deployable artifact in the repo, then the CI/CD logic will continue to deploy the artifact to the athenahealth model serving platform (the Prediction Service) running on Amazon ECS with AWS Fargate. After all these stages have passed, the data scientist merges the code to the primary branch. The pipelines and deployable artifacts are then pushed to production.

CI/CD Deployment workflow. This diagram describes the Data Science build and deployment workflow. The CI/CD process is driven by Jenkins.

Security

In consolidating our data science workflows, we were able to centralize our approach to securing the training pipeline. In this section, we discuss our approach to data and cluster security.

Data security

Data security is of the utmost importance at athenahealth. For this reason, we develop and maintain infrastructure that is fully compliant with the regulations and standards that protect the security and integrity of these data.

To ensure we meet data compliance standards, we provision our AWS infrastructure in accordance with our athenahealth enterprise guidelines. The two main stores for data are Amazon RDS for highly scalable pipeline metadata and Amazon S3 for pipeline and model artifacts. For Amazon S3, we ensure the buckets are encrypted, HTTPS endpoints are enforced, and the bucket policies and AWS Identity and Access Management (IAM) roles follow the principles of least privilege when permitting access to the data. This is true for Amazon RDS data as well: encryption is always enabled, and the security groups and credential access follow the principle of least privilege. This standardization ensures that only authorized parties have access to the data, and this access is tracked.

In addition to these measures, the platform also undergoes security threat assessments and continuous security and compliance scans.

We also address data retention requirements via data lifecycle management for all S3 buckets that contain sensitive data. This policy automatically moves data to Amazon S3 Glacier after 30 days of creation. Exceptions to this are managed through data retrieval requests and are approved or denied on a case-by-case basis. This ensures that all workflows comply with the data retention policy. This also solves the problem with recovering data if a model performs poorly, and retraining is required, or when a new model must be evaluated against a historical iteration of an older model’s dataset.
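A minimal sketch of such a lifecycle policy with boto3 follows; the bucket name and prefix are placeholders:

```python
# Sketch of the data retention policy described above: transition objects to
# Amazon S3 Glacier 30 days after creation.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="ml-pipeline-artifacts",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-sensitive-data-after-30-days",
            "Filter": {"Prefix": "datasets/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]
    },
)
```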

For restricting access to Amazon S3 and Amazon RDS from within Kubeflow on AWS and Amazon EKS, we use IRSA (IAM Roles for Service Accounts), which provides IAM-based permission provisioning for resources within Kubernetes. Each tenant in Kubeflow has a unique pre-created service account which we bind to an IAM role created specifically to fulfill the tenant access requirements. User access to tenants is also restricted using the Amazon Cognito user pools group membership for each user. When a user is authenticated to the cluster, the generated token contains group claims, and Kubernetes RBAC uses this information to allow or deny access to a particular resource in the cluster. This setup is explained in more detail in the next section.

Cluster security using multi-user isolation

As we noted in the previous section, data scientists perform exploratory data analyses, run data analytics, and train ML models. To allocate resources, organize data, and manage workflows based on projects, Kubeflow on AWS provides isolation based on Kubernetes namespaces. This isolation works for interacting with the Kubeflow UI; however, it doesn’t provide any tooling to control access to the Kubernetes API using Kubectl. This means that user access can be controlled on the Kubeflow UI but not over the Kubernetes API via Kubectl.

The architecture described in the following diagram addresses this issue by unifying access to projects in Kubeflow based on group memberships. To achieve this, we took advantage of the Kubeflow on AWS manifests, which have integration with Amazon Cognito user pools. In addition, we use Kubernetes role-based access control (RBAC) to control authorization within the cluster. The user permissions are provisioned based on Amazon Cognito group membership. This information is passed to the cluster with the token generated by the OIDC client. This process is simplified thanks to the built-in Amazon EKS functionality that allows associating OIDC identity providers to authenticate with the cluster.

By default, Amazon EKS authentication is performed by the IAM authenticator, which is a tool that enables authenticating with an EKS cluster using IAM credentials. This authentication method has its merits; however, it’s not suitable for our use case because athenahealth uses Microsoft Azure Active Directory for identity service across the organization.

Kubernetes namespace isolation. Data Scientists can obtain membership to a single or multiple groups as needed for their work. Access is reviewed on a regular basis and removed as appropriate.

Azure Active Directory, being an enterprise-wide identity service, is the source of truth for controlling user access to the Kubeflow cluster. The setup for this includes creating an Azure Enterprise Application that acts as service principal and adding groups for various tenants that require access to the cluster. This setup on Azure is mirrored in Amazon Cognito by setting up a federated OIDC identity provider that outsources authentication responsibility to Azure. The access to Azure groups is controlled by SailPoint IdentityIQ, which sends access requests to the project owner to allow or deny as appropriate. In the Amazon Cognito user pool, two application clients are created: one is used to set up the authentication for the Kubernetes cluster using the OIDC identity provider, and the other to secure Kubeflow authentication into the Kubeflow UI. These clients are configured to pass group claims upon authentication with the cluster, and these group claims are used alongside RBAC to set up authorization within the cluster.

Kubernetes RBAC role bindings are set up between groups and the cluster role Kubeflow-edit, which is created upon installing Kubeflow in the cluster. This role binding ensures any user interacting with the cluster after logging in via OIDC can access the namespaces they have permissions for as defined in their group claims. Although this works for users interacting with the cluster using Kubectl, the Kubeflow UI currently doesn’t provision access to users based on group membership because it doesn’t use RBAC. Instead, it uses the Istio Authorization Policy resource to control access for users. To overcome this challenge, we developed a custom controller that synchronizes users by polling Amazon Cognito groups and adds or removes corresponding role bindings for each user rather than by group. This setup enables users to have the same level of permissions when interacting with both the Kubeflow UI and Kubectl.
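The following is a simplified sketch of the group-sync idea behind such a controller, assuming a hypothetical group-to-namespace mapping and omitting the reconciliation and error handling a real controller needs:

```python
# Sketch of syncing Amazon Cognito group membership to per-user RoleBindings
# bound to the kubeflow-edit ClusterRole. Names, the mapping, and the user
# identity attribute are simplified assumptions, not the production controller.
import boto3
from kubernetes import client, config

cognito = boto3.client("cognito-idp")
config.load_incluster_config()                 # the controller runs in the cluster
rbac = client.RbacAuthorizationV1Api()

USER_POOL_ID = "us-east-1_examplePool"
GROUP_TO_NAMESPACE = {"team-a": "team-a", "team-b": "team-b"}   # hypothetical mapping

def sync_group(group_name, namespace):
    users = cognito.list_users_in_group(
        UserPoolId=USER_POOL_ID, GroupName=group_name
    )["Users"]

    for user in users:
        email = next(a["Value"] for a in user["Attributes"] if a["Name"] == "email")
        binding = {
            "metadata": {"name": f"kubeflow-edit-{user['Username']}",
                         "namespace": namespace},
            "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                        "kind": "ClusterRole", "name": "kubeflow-edit"},
            "subjects": [{"apiGroup": "rbac.authorization.k8s.io",
                          "kind": "User", "name": email}],
        }
        # A real controller would also handle existing bindings and removals
        rbac.create_namespaced_role_binding(namespace=namespace, body=binding)

for group, ns in GROUP_TO_NAMESPACE.items():
    sync_group(group, ns)
```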

Operational efficiency

In this section, we discuss how we took advantage of the open source and AWS tools available to us to manage and debug our workflows as well as to minimize the operational impact of upgrading Kubeflow.

Logging and monitoring

For logging, we utilize FluentD to push all our container logs to Amazon OpenSearch Service and system metrics to Prometheus. We then use Kibana and the Grafana UI for searching and filtering logs and metrics. The following diagram describes how we set this up.

Kubeflow Logging. We use both the Grafana UI and Kibana to view and sift through the logs.

The following screenshot is a Kibana UI view from our pipeline.

Sample Kibana UI View. Kibana allows for customized views.

Safe Kubeflow cluster upgrades

As we onboard users to Kubeflow on AWS, we maintain a reliable and consistent user experience while allowing the MLOps team to stay agile with releasing and integrating new features. On the surface, Kustomize appears modular enough to let us work on and upgrade one component at a time without impacting others, which would allow us to add new capabilities with minimal disruption to users. However, in practice there are scenarios where the best approach is to simply spin up a new Kubernetes cluster rather than applying component-level upgrades to existing clusters. We found two use cases where it made more sense to create completely new clusters:

  • Upgrading to a new Kubernetes version. Although AWS does provide in-place cluster upgrades, it becomes difficult to test whether each of the Kubeflow and Kubernetes resources works as intended and whether the manifests retain backward compatibility.
  • Upgrading Kubeflow to a newer release in which several features have been added or modified; in this case, it’s rarely advisable to perform in-place upgrades on an existing Kubernetes cluster.

In addressing this issue, we developed a strategy that enables us to have safe cluster replacements without impacting any existing workloads. To achieve this, we had to meet the following criteria:

  • Separate the Kubeflow storage and compute resources so that the pipeline metadata, pipeline artifacts, and user data are retained when deprovisioning the older cluster
  • Integrate with Kubeflow on AWS manifests so that when a Kubeflow version upgrade occurs, minimal changes are required
  • Have an effortless way to roll back if things go wrong after cluster upgrade
  • Have a simple interface to promote a candidate cluster to production

The following diagram illustrates this architecture.

Safe Kubeflow Cluster Upgrade. Once testing of the Kubeflow Candidate is successful, it is promoted to Kubeflow Prod through an update to Route 53.

Kubeflow on AWS manifests come pre-packaged with Amazon RDS and Amazon S3 integrations. With these managed services acting as common data stores, we can set up a blue-green deployment strategy. To achieve this, we ensured that the pipeline metadata is persisted in Amazon RDS, which works independently of the EKS cluster, and the pipeline logs and artifacts are persisted in Amazon S3. In addition to pipeline metadata and artifacts, we also set up FluentD to route pod logs to Amazon OpenSearch Service.

This ensures that the storage layer is completely separated from the compute layer and thereby enables testing changes during Kubeflow version updates on a completely new EKS cluster. After all the tests are successful, we’re able to simply change the Amazon Route 53 DNS record to the candidate cluster hosting Kubeflow. Also, we keep the old cluster running as backup for a few days just in case we need to roll back.
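As an illustration, promoting the candidate cluster can be as simple as an UPSERT of the Kubeflow DNS record; the hosted zone ID, record name, and load balancer address below are placeholders:

```python
# Sketch of promoting the candidate cluster by repointing the Kubeflow DNS
# record in Route 53.
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",
    ChangeBatch={
        "Comment": "Promote candidate Kubeflow cluster to production",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "kubeflow.example.com",
                "Type": "CNAME",
                "TTL": 60,
                "ResourceRecords": [
                    # Ingress load balancer of the candidate EKS cluster
                    {"Value": "candidate-ingress-1234567890.us-east-1.elb.amazonaws.com"},
                ],
            },
        }],
    },
)
```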

Benefits of Amazon EKS and Kubeflow on AWS for our ML pipeline

Amazon EKS and the Kubeflow on AWS package moved our development workflow to a pattern that strongly encourages repeatable model training. These tools allow us to have fully defined clusters with fully defined tenants and run fully defined code.

A lot of wins from building this platform are less quantitative and have more to do with how the workflows have improved for both platform developers and users. For example, MinIO was replaced with direct access to Amazon S3, which moves us closer to our original workflows and reduces the number of services we must maintain. We are also able to utilize Amazon RDS as the backend for Kubeflow, which enables easier migrations between clusters and gives us the ability to back up our pipelines nightly.

We also found the improvements in the Kubeflow integration with AWS managed services beneficial. For example, with Amazon RDS, Amazon S3, and Amazon Cognito preconfigured in the Kubeflow on AWS manifests, we save time and effort updating to newer distributions of Kubeflow. When we used to modify the official Kubeflow manifests manually, updating to a new version would take several weeks, from design to testing.

Switching to Amazon EKS gives us the opportunity to define our cluster in Kustomize (now part of Kubectl) and Terraform. It turns out that for platform work, Kubernetes and Terraform are very easy to work with after putting in enough time to learn. After many iterations, the tools available to us make it very easy to perform standard platform operations like upgrading a component or swapping out an entire development cluster. Compared to running jobs off raw Amazon Elastic Compute Cloud (Amazon EC2) instances, it’s hard to overstate what an enormous difference it makes to have well-defined pods with guaranteed resource cleanup and retry mechanisms built in.

Kubernetes provides great security standards, and we have only scratched the surface of what the multi-user isolation allows us to do. We see multi-user isolation as a pattern that has more payoff in the future when the training platform produces production-level data, and we bring on developers from outside our team.

Meanwhile, Kubeflow allows us to have reproducible model training. Even with the same data, no training produces identical models, but we have the next best thing. With Kubeflow, we know exactly what code and data were used to train a model. Onboarding has greatly improved because each step in our pipeline is clearly and programmatically defined. When new data scientists have the task of fixing a bug, they need much less handholding because there is a clear structure to how outputs of code are used between stages.

Using Kubeflow also yields a lot of performance improvements compared to running on a single EC2 instance. Often in model training, data scientists need different tools and optimizations for preprocessing and training. For example, preprocessing is often run using distributed data processing tools, like Spark, whereas training is often run using GPU instances. With Kubeflow pipelines, they can specify different instance types for different stages in the pipeline. This allows them to use the powerful GPU instances in one stage and a fleet of smaller machines for distributed processing in another stage. Also, because Kubeflow pipelines describe the dependencies between stages, the pipelines can run stages in parallel.

Finally, because we created a process for adding tenants to the cluster, there is now a more formal way to register teams to a tenant on the cluster. Because we use Kubecost to track costs in our EKS cluster, it allows us to attribute cost to a single project rather than having cost attributed at the account level, which includes all data science projects. Kubecost presents a report of the money spent per namespace, which is tightly coupled to the tenant or team that is responsible for running the pipeline.

Despite all the benefits, we would caution that this kind of migration should only be undertaken if there is total buy-in from users. Users who put in the time get a lot of benefits from using Amazon EKS and Kubernetes, but there is a significant learning curve.

Conclusion

With the implementation of the Kubeflow on AWS pipeline in our end-to-end ML infrastructure, we were able to consolidate and standardize our data science workflows while retaining our essential tooling (such as CI/CD and model serving). Our data scientists can now move between projects based on this workflow without the overhead of learning how to maintain a completely different toolset. For some of our models, we were also pleasantly surprised at the speed of the new workflow (five times faster), which allowed for more training iterations and consequently produced models with better predictions.

We also have established a solid foundation to augment our MLOps capabilities and scale the number and size of our projects. For example, as we harden our governance posture in model lineage and tracking, we have reduced our focus from over 15 workflows to just one. And when the Log4shell vulnerability came to light in late 2021, we were able to focus on a single workflow and quickly remediate as needed (performing Amazon Elastic Container Registry (Amazon ECR) scans, upgrading Amazon OpenSearch Service, updating our tooling, and more) with minimal impact to the ongoing work by the data scientists. As AWS and Kubeflow enhancements become available, we can incorporate them as we see fit.

This brings us to an important and understated aspect of our Kubeflow on AWS adoption. One of the critical outcomes of this journey is the ability to roll out upgrades and enhancements to Kubeflow seamlessly for our data scientists. Although we discussed our approach to this earlier, we also rely on the Kubeflow manifests provided by AWS. We started our Kubeflow journey as a proof of concept in 2019, prior to the release of version 1.0.0. (We’re currently on 1.4.1, evaluating 1.5. AWS is already working on the 1.6 version.) In the intervening 3 years, there have been at least six releases with substantial content. Through their disciplined approach to integrating and validating these upgrades and releasing the manifests on a predictable, reliable schedule, the Kubeflow team at AWS has been crucial in enabling the athenahealth MLOps team to plan our development roadmap, and consequently our resource allocations and areas of focus, further into the future with greater confidence.

You can follow the AWS Labs GitHub repository to track all AWS contributions to Kubeflow. You can also find AWS teams on the Kubeflow #AWS Slack Channel; your feedback there helps AWS prioritize the next features to contribute to the Kubeflow project.


About the authors

Kanwaljit Khurmi is a Senior Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions on AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.

Tyler Kalbach is a Principal Member of Technical Staff at athenahealth. Tyler has approximately 7 years of experience in Analytics, Data Science, Neural Networks, and development of Machine Learning applications in the Healthcare space. He has contributed to several Machine Learning solutions that are currently serving production traffic. Currently working as a Principal Data Scientist in athenahealth’s Engineering organization, Tyler has been part of the team that has built the new Machine Learning Training Platform for athenahealth from the inception of that effort.

Victor Krylov is a Principal Member of Technical Staff at athenahealth. Victor is an engineer and scrum master, helping data scientists build secure fast machine learning pipelines. In athenahealth he has worked on interfaces, clinical ordering, prescriptions, scheduling, analytics and now machine learning. He values cleanly written and well unit tested code, but has an unhealthy obsession with code one-liners. In his spare time he enjoys listening to podcasts while walking his dog.

Sasank Vemuri is a Lead Member of Technical Staff at athenahealth. He has experience working with developing data driven solutions across domains such as healthcare, insurance and bioinformatics. Sasank currently works with designing and developing machine learning training and inference platforms on AWS and Kubernetes that help with training and deploying ML solutions at scale.

Anu Tumkur is an Architect at athenahealth. Anu has over two decades of architecture, design, and development experience building software products in machine learning, cloud operations, big data, real-time distributed data pipelines, ad tech, data analytics, and social media analytics. Anu currently works as an architect in athenahealth’s Product Engineering organization on the Machine Learning Platform and Data Pipeline teams.

William Tsen is a Senior Engineering Manager at athenahealth. He has over 20 years of engineering leadership experience building solutions in healthcare IT, big data distributed computing, intelligent optical networks, real-time video editing systems, enterprise software, and group healthcare underwriting. William currently leads two awesome teams at athenahealth, the Machine Learning Operations and DevOps engineering teams, in the Product Engineering organization.

Read More

A Multi-Axis Approach for Vision Transformer and MLP Models

Convolutional neural networks have been the dominant machine learning architecture for computer vision since the introduction of AlexNet in 2012. Recently, inspired by the evolution of Transformers in natural language processing, attention mechanisms have been prominently incorporated into vision models. These attention methods boost some parts of the input data while minimizing other parts so that the network can focus on small but important parts of the data. The Vision Transformer (ViT) has created a new landscape of model designs for computer vision that is completely free of convolution. ViT regards image patches as a sequence of words, and applies a Transformer encoder on top. When trained on sufficiently large datasets, ViT demonstrates compelling performance on image recognition.

While convolutions and attention are both sufficient for good performance, neither of them is necessary. For example, MLP-Mixer adopts a simple multi-layer perceptron (MLP) to mix image patches across all the spatial locations, resulting in an all-MLP architecture. It is a competitive alternative to existing state-of-the-art vision models in terms of the trade-off between accuracy and the computation required for training and inference. However, both ViT and the MLP models struggle to scale to higher input resolution because their computational complexity increases quadratically with respect to the image size.

Today we present a new multi-axis approach that is simple and effective, improves on the original ViT and MLP models, can better adapt to high-resolution, dense prediction tasks, and can naturally adapt to different input sizes with high flexibility and low complexity. Based on this approach, we have built two backbone models for high-level and low-level vision tasks. We describe the first in “MaxViT: Multi-Axis Vision Transformer”, to be presented in ECCV 2022, and show it significantly improves the state of the art for high-level tasks, such as image classification, object detection, segmentation, quality assessment, and generation. The second, presented in “MAXIM: Multi-Axis MLP for Image Processing” at CVPR 2022, is based on a UNet-like architecture and achieves competitive performance on low-level imaging tasks including denoising, deblurring, dehazing, deraining, and low-light enhancement. To facilitate further research on efficient Transformer and MLP models, we have open-sourced the code and models for both MaxViT and MAXIM.

A demo of image deblurring using MAXIM frame by frame.

Overview
Our new approach is based on multi-axis attention, which decomposes the full-size attention used in ViT (each pixel attends to all the pixels) into two sparse forms: local and (sparse) global. As shown in the figure below, multi-axis attention consists of a sequential stack of block attention and grid attention. The block attention works within non-overlapping windows (small patches in intermediate feature maps) to capture local patterns, while the grid attention works on a sparsely sampled uniform grid for long-range (global) interactions. The window sizes of the grid and block attention can be fully controlled as hyperparameters to ensure linear computational complexity with respect to the input size.

The proposed multi-axis attention conducts blocked local attention and dilated global attention sequentially, followed by an FFN, with only linear complexity. Pixels of the same color are attended to together.
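
As a rough sketch of why the complexity is linear (our own back-of-the-envelope accounting, using P and G to denote the block and grid sizes; these symbols are not from the papers): for an H × W feature map,

\[
\underbrace{O\big((HW)^{2}\big)}_{\text{full self-attention}}
\quad\longrightarrow\quad
\underbrace{O\big(HW \cdot P^{2}\big)}_{\text{block attention}}
\;+\;
\underbrace{O\big(HW \cdot G^{2}\big)}_{\text{grid attention}},
\]

so once P and G are fixed as hyperparameters, the total cost grows linearly with the number of pixels HW rather than quadratically, which is what makes the approach practical for high-resolution inputs.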

Such low-complexity attention significantly widens applicability to many vision tasks, especially high-resolution visual prediction, demonstrating greater generality than the original attention used in ViT. We build two backbone instantiations out of this multi-axis attention approach: MaxViT and MAXIM, for high-level and low-level tasks, respectively.

MaxViT
In MaxViT, we first build a single MaxViT block (shown below) by concatenating MBConv (proposed by EfficientNet, V2) with the multi-axis attention. This single block can encode local and global visual information regardless of input resolution. We then simply stack repeated blocks composed of attention and convolutions in a hierarchical architecture (similar to ResNet, CoAtNet), yielding our homogeneous MaxViT architecture. Notably, MaxViT is distinguished from previous hierarchical approaches in that it can “see” globally throughout the entire network, even in the earlier, high-resolution stages, demonstrating stronger model capacity on various tasks.

The meta-architecture of MaxViT.

MAXIM
Our second backbone, MAXIM, is a generic UNet-like architecture tailored for low-level image-to-image prediction tasks. MAXIM explores parallel designs of the local and global approaches using the gated multi-layer perceptron (gMLP) network (a patch-mixing MLP with a gating mechanism). Another contribution of MAXIM is the cross-gating block, which can be used to apply interactions between two different input signals. This block can serve as an efficient alternative to the cross-attention module because it only employs cheap gated MLP operators to interact with various inputs, without relying on computationally heavy cross-attention. Moreover, all the proposed components in MAXIM, including the gated MLP and cross-gating blocks, enjoy linear complexity with respect to image size, making it even more efficient when processing high-resolution pictures.

Results
We demonstrate the effectiveness of MaxViT on a broad range of vision tasks. On image classification, MaxViT achieves state-of-the-art results under various settings: with only ImageNet-1K training, MaxViT attains 86.5% top-1 accuracy; with ImageNet-21K (14M images, 21k classes) pre-training, MaxViT achieves 88.7% top-1 accuracy; and with JFT (300M images, 18k classes) pre-training, our largest model MaxViT-XL achieves a high accuracy of 89.5% with 475M parameters.

Performance comparison of MaxViT with state-of-the-art models on ImageNet-1K. Top: Accuracy vs. FLOPs performance scaling with 224×224 image resolution. Bottom: Accuracy vs. parameters scaling curve under ImageNet-1K fine-tuning setting.

For downstream tasks, MaxViT as a backbone delivers favorable performance on a broad spectrum of tasks. For object detection and segmentation on the COCO dataset, the MaxViT backbone achieves 53.4 AP, outperforming other base-level models while requiring only about 60% of the computational cost. For image aesthetics assessment, the MaxViT model advances the state-of-the-art MUSIQ model by 3.5% in terms of linear correlation with human opinion scores. The standalone MaxViT building block also demonstrates effective performance on image generation, achieving better FID and IS scores on the ImageNet-1K unconditional generation task with significantly fewer parameters than the state-of-the-art model, HiT.

The UNet-like MAXIM backbone, customized for image processing tasks, has also demonstrated state-of-the-art results on 15 out of 20 tested datasets, including denoising, deblurring, deraining, dehazing, and low-light enhancement, while requiring a comparable or smaller number of parameters and FLOPs than competitive models. Images restored by MAXIM show more recovered detail with fewer visual artifacts.

Visual results of MAXIM for image deblurring, deraining, and low-light enhancement.

Summary
Work over the last two or so years has shown that ConvNets and Vision Transformers can achieve similar performance. Our work presents a unified design that takes advantage of the best of both worlds, efficient convolution and sparse attention, and demonstrates that a model built on top of it, namely MaxViT, can achieve state-of-the-art performance on a variety of vision tasks. More importantly, MaxViT scales well to very large data sizes. We also show that an alternative multi-axis design using MLP operators, MAXIM, achieves state-of-the-art performance on a broad range of low-level vision tasks.

Even though we present our models in the context of vision tasks, the proposed multi-axis approach can easily extend to language modeling to capture both local and global dependencies in linear time. Motivated by the work here, we expect that it is worthwhile to study other forms of sparse attention in higher-dimensional or multimodal signals such as videos, point clouds, and vision-language models.

We have open-sourced the code and models of MAXIM and MaxViT to facilitate future research on efficient attention and MLP models.

Acknowledgments
We would like to thank our co-authors: Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, and Alan Bovik. We would also like to acknowledge the valuable discussion and support from Xianzhi Du, Long Zhao, Wuyang Chen, Hanxiao Liu, Zihang Dai, Anurag Arnab, Sungjoon Choi, Junjie Ke, Mauricio Delbracio, Irene Zhu, Innfarn Yoo, Huiwen Chang, and Ce Liu.

Read More

NVIDIA Hopper Sweeps AI Inference Benchmarks in MLPerf Debut

In their debut on the MLPerf industry-standard AI benchmarks, NVIDIA H100 Tensor Core GPUs set world records in inference on all workloads, delivering up to 4.5x more performance than previous-generation GPUs.

The results demonstrate that Hopper is the premium choice for users who demand utmost performance on advanced AI models.

Additionally, NVIDIA A100 Tensor Core GPUs and the NVIDIA Jetson AGX Orin module for AI-powered robotics continued to deliver overall leadership inference performance across all MLPerf tests: image and speech recognition, natural language processing and recommender systems.

The H100, aka Hopper, raised the bar in per-accelerator performance across all six neural networks in the round. It demonstrated leadership in both throughput and speed in separate server and offline scenarios.

Hopper performance on MLPerf AI inference tests
NVIDIA H100 GPUs set new high watermarks on all workloads in the data center category.

The NVIDIA Hopper architecture delivered up to 4.5x more performance than NVIDIA Ampere architecture GPUs, which continue to provide overall leadership in MLPerf results.

Thanks in part to its Transformer Engine, Hopper excelled on the popular BERT model for natural language processing. It’s among the largest and most performance-hungry of the MLPerf AI models.

These inference benchmarks mark the first public demonstration of H100 GPUs, which will be available later this year. The H100 GPUs will participate in future MLPerf rounds for training.

A100 GPUs Show Leadership

NVIDIA A100 GPUs, available today from major cloud service providers and systems manufacturers, continued to show overall leadership in mainstream performance on AI inference in the latest tests.

A100 GPUs won more tests than any submission in data center and edge computing categories and scenarios. In June, the A100 also delivered overall leadership in MLPerf training benchmarks, demonstrating its abilities across the AI workflow.

Since their July 2020 debut on MLPerf, A100 GPUs have advanced their performance by 6x, thanks to continuous improvements in NVIDIA AI software.

NVIDIA AI is the only platform to run all MLPerf inference workloads and scenarios in data center and edge computing.

Users Need Versatile Performance

The ability of NVIDIA GPUs to deliver leadership performance on all major AI models makes users the real winners. Their real-world applications typically employ many neural networks of different kinds.

For example, an AI application may need to understand a user’s spoken request, classify an image, make a recommendation and then deliver a response as a spoken message in a human-sounding voice. Each step requires a different type of AI model.

The MLPerf benchmarks cover these and other popular AI workloads and scenarios — computer vision, natural language processing, recommendation systems, speech recognition and more. The tests ensure users will get performance that’s dependable and flexible to deploy.

Users rely on MLPerf results to make informed buying decisions, because the tests are transparent and objective. The benchmarks enjoy backing from a broad group that includes Amazon, Arm, Baidu, Google, Harvard, Intel, Meta, Microsoft, Stanford and the University of Toronto.

Orin Leads at the Edge

In edge computing, NVIDIA Orin ran every MLPerf benchmark, winning more tests than any other low-power system-on-a-chip. And it showed up to a 50% gain in energy efficiency compared to its debut on MLPerf in April.

In the previous round, Orin ran up to 5x faster than the prior-generation Jetson AGX Xavier module, while delivering an average of 2x better energy efficiency.

Orin leads MLPerf in edge inference
Orin delivered up to 50% gains in energy efficiency for AI inference at the edge.

Orin integrates into a single chip an NVIDIA Ampere architecture GPU and a cluster of powerful Arm CPU cores. It’s available today in the NVIDIA Jetson AGX Orin developer kit and production modules for robotics and autonomous systems, and supports the full NVIDIA AI software stack, including platforms for autonomous vehicles (NVIDIA Hyperion), medical devices (Clara Holoscan) and robotics (Isaac).

Broad NVIDIA AI Ecosystem

The MLPerf results show NVIDIA AI is backed by the industry’s broadest ecosystem in machine learning.

More than 70 submissions in this round ran on the NVIDIA platform.  For example, Microsoft Azure submitted results running NVIDIA AI on its cloud services.

In addition, 19 NVIDIA-Certified Systems appeared in this round from 10 systems makers, including ASUS, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Lenovo and Supermicro.

Their work shows users can get great performance with NVIDIA AI both in the cloud and in servers running in their own data centers.

NVIDIA partners participate in MLPerf because they know it’s a valuable tool for customers evaluating AI platforms and vendors. Results in the latest round demonstrate that the performance they deliver to users today will grow with the NVIDIA platform.

All the software used for these tests is available from the MLPerf repository, so anyone can get these world-class results. Optimizations are continuously folded into containers available on NGC, NVIDIA’s catalog for GPU-accelerated software. That’s where you’ll also find NVIDIA TensorRT, used by every submission in this round to optimize AI inference.

The post NVIDIA Hopper Sweeps AI Inference Benchmarks in MLPerf Debut appeared first on NVIDIA Blog.

Read More

Content moderation using machine learning: the server-side part

Posted by Jen Person, Senior Developer Relations Engineer, TensorFlow

Welcome to part 2 of my dual approach to content moderation! In this post, I show you how to implement content moderation using machine learning in a server-side environment. If you’d like to see how to implement this moderation client-side, check out part 1.

Remind me: what are we doing here again?

In short, anonymity can create some distance between people in a way that allows them to say things they wouldn’t say in person. That is to say, there are tons of trolls out there. And let’s be honest: we’ve all typed something online we wouldn’t actually say IRL at least once! Any website that takes public text input can benefit from some form of moderation. Client-side moderation has the benefit of instant feedback, but server-side moderation cannot be bypassed like client-side might, so I like to have both.

This project picks up where part 1 left off, but you can also start here with a fresh copy of the Firebase Text Moderation demo code. The website in the Firebase demo showcases content moderation through a basic guestbook using a server-side content moderation system implemented through a Realtime Database-triggered Cloud Function. This means that the guestbook data is stored in the Firebase Realtime Database, a NoSQL database. The Cloud Function is triggered whenever data is written to a certain area of the database. We can choose what code runs when that event is triggered. In our case, we will use the Text Toxicity Classifier model to determine if the text written to the database is inappropriate, and then remove it from the database if needed. With this model, you can evaluate text on different labels of unwanted content, including identity attacks, insults, and obscenity. You can try out the demo to see the classifier in action.

If you prefer to start at the end, you can follow along in a completed version of the project on GitHub.

Server-side moderation

The Firebase text moderation example I used as my starting point doesn’t include any machine learning. Instead, it checks for the presence of profanity from a list of words and then replaces them with asterisks using the bad-words npm package. I thought about blending this approach with machine learning (more on that later), but I decided to just wipe the slate clean and replace the code of the Cloud Function altogether. Start by navigating to the Cloud Functions folder of the Text Moderation example:

cd textmoderation/functions

Open index.js and delete its contents. In index.js, add the following code:

const functions = require('firebase-functions');
const toxicity = require('@tensorflow-models/toxicity');

exports.moderator = functions.database.ref('/messages/{messageId}').onCreate(async (snapshot, context) => {
  const message = snapshot.val();

  // Verify that the snapshot has a value
  if (!message) {
    return;
  }
  functions.logger.log('Retrieved message content: ', message);

  // Run moderation checks on the message and delete if needed.
  const moderateResult = await moderateMessage(message.text);
  functions.logger.log(
    'Message has been moderated. Does message violate rules? ',
    moderateResult
  );
});

This code runs any time a message is added to the database. It gets the text of the message, passes it to a function called `moderateMessage`, and logs the result in `moderateResult`. If you’re interested in learning more about Cloud Functions and the Realtime Database, then check out the Firebase documentation.

Add the Text Toxicity Classifier model

Depending on your development environment, you probably have some sort of error now since we haven’t actually written a function called moderateMessage yet. Let’s fix that. Below your Cloud Function trigger function, add the following code:

exports.moderator = functions.database.ref('/messages/{messageId}').onCreate(async (snapshot, context) => {
  //…
  // Your other function code is here.
});

async function moderateMessage(message) {
  const threshold = 0.9;

  let model = await toxicity.load(threshold);

  const messages = [message];

  let predictions = await model.classify(messages);

  for (let item of predictions) {
    for (let i in item.results) {
      if (item.results[i].match === true) {
        return true;
      }
    }
  }
  return false;
}

This function does the following:

  1. Sets the threshold for the model to 0.9. The threshold is the minimum prediction confidence the model needs before it labels a prediction true or false, that is, before it decides the text does or does not contain the given type of toxic content. The scale for the threshold is 0 to 1.0. In this case, I set the threshold to 0.9, which means the model only reports true or false when it is at least 90% confident in its findings.
  2. Loads the model, passing in the threshold. Once loaded, the result is assigned to the `model` variable.
  3. Puts the message into an array called messages, as an array is the object type that the classify function accepts.
  4. Calls classify on the messages array.
  5. Iterates through the prediction results. predictions is an array of objects, each representing a different label of unwanted language. You may only care about specific labels rather than iterating through them all (see the sketch after this list). For example, if your use case is a website for hosting the transcripts of rap battles, you probably don’t want to detect and remove insults.
  6. Checks if the content is a match for that label. If the match value is true, the model has detected the given type of unwanted language and the function returns true. There’s no need to keep checking the rest of the results, since the content has already been deemed inappropriate.
  7. If the function iterates through all the results and no label’s match is set to true, the function returns false, meaning no undesirable language was found. The match value can also be null; in that case, it isn’t true, so the text is considered acceptable language. I will talk more about the null option in a future post.
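
For example, here is a minimal variation of moderateMessage (my own sketch, not part of the Firebase sample) that only acts on a chosen subset of labels; the label names follow those exposed by the @tensorflow-models/toxicity package:

// Only enforce a subset of the model's labels.
const LABELS_TO_ENFORCE = ['identity_attack', 'threat', 'severe_toxicity'];

async function moderateMessageForLabels(message) {
  const threshold = 0.9;
  const model = await toxicity.load(threshold);
  const predictions = await model.classify([message]);

  // Each prediction has a label (such as 'insult') and a results array.
  return predictions.some((prediction) =>
    LABELS_TO_ENFORCE.includes(prediction.label) &&
    prediction.results.some((result) => result.match === true)
  );
}
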
If you completed part 1 of this tutorial, then these steps probably sound familiar. The server-side code is very similar to the client-side code. This is one of the things that I like about TensorFlow.js: it’s often straightforward to transition code from the client to server and vice versa.

Complete the Cloud Functions code

Back in your Cloud Function, you now know that based on the code we wrote for moderateMessage, the value of moderateResult will be true or false: true if the message is considered toxic by the model, and false if it does not detect toxicity with certainty greater than 90%. Now add code to delete the message from the database if it is deemed toxic:

  // Run moderation checks on the message and delete if needed.
  const moderateResult = await moderateMessage(message.text);
  functions.logger.log(
    'Message has been moderated. Does message violate rules? ',
    moderateResult
  );

  if (moderateResult === true) {
    var modRef = snapshot.ref;
    try {
      await modRef.remove();
    } catch (error) {
      functions.logger.error('Remove failed: ' + error.message);
    }
  }

This code does the following:

  1. Checks if moderateResult is true, meaning that the message written to the guestbook is inappropriate.
  2. If the value is true, it removes the data from the database using the remove function from the Realtime Database SDK.
  3. Logs an error if one occurs.

Deploy the code

To deploy the Cloud Function, you can use the Firebase CLI. If you don’t have it, you can install it using the following npm command:

npm install -g firebase-tools

Once installed, use the following command to log in:

firebase login

Run this command to connect the app to your Firebase project:

firebase use --add

From here, you can select your project in the list, connect Firebase to an existing Google Cloud project, or create a new Firebase project.

Once the project is configured, use the following command to deploy your Cloud Function:

firebase deploy

Once deployment is complete, the logs include the link to your hosted guestbook. Write some guestbook entries. If you followed part 1 of the blog, you will need to either delete the moderation code from the website and deploy again, or manually add guestbook entries to the Realtime Database in the Firebase console.

You can view your Cloud Functions logs in the Firebase console.

Building on the example

I have a bunch of ideas for ways to build on this example. Here are just a few. Let me know which ideas you would like to see me build, and share your suggestions as well! The best ideas come from collaboration.

Get a queue

I mentioned that the “match” value of a language label can be true, false, or null without going into detail on the significance of the null value. If the match value is null, the model cannot determine whether the language is toxic within the given threshold. One way to limit the number of null values is to lower this threshold. For example, if you change the threshold value to 0.8, then the model will label the match value as true if it is at least 80% certain that the text contains language that fits the label. My website example treats a null match the same as false, allowing that text through the filter. But since the model isn’t sure whether that text is appropriate, it’s probably a good idea to get some eyes on it. You could add these posts to a queue for review, and then approve or deny them as needed (a rough sketch of one way to do this follows). I said “you” here, but I guess I mean “me”. If you think this would be an interesting use case to explore, let me know! I’m happy to write about it if it would be useful.
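
Here is what that could look like as a sketch (my own illustration built on the earlier code; the three-way verdict and the /reviewQueue path are assumptions, not part of the Firebase sample):

// A variation of moderateMessage that distinguishes "unsure" from "toxic".
async function moderateMessageWithQueue(message) {
  const threshold = 0.9;
  const model = await toxicity.load(threshold);
  const predictions = await model.classify([message]);

  let needsReview = false;
  for (const prediction of predictions) {
    for (const result of prediction.results) {
      if (result.match === true) {
        return 'block';       // clearly unwanted language: delete the post
      }
      if (result.match === null) {
        needsReview = true;   // the model isn't confident for this label
      }
    }
  }
  return needsReview ? 'review' : 'allow';
}

// Inside the Cloud Function trigger, act on the verdict:
//   const verdict = await moderateMessageWithQueue(message.text);
//   if (verdict === 'block') {
//     await snapshot.ref.remove();
//   } else if (verdict === 'review') {
//     // Copy the post to a hypothetical /reviewQueue path for a human to approve or deny.
//     await snapshot.ref.root.child(`reviewQueue/${context.params.messageId}`).set(message);
//   }

A reviewer could then read from /reviewQueue and either write an approved post back to /messages or discard it.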

What’s in ‘store

The Firebase moderation sample that I used as the foundation of my project uses Realtime Database. I prefer to use Firestore because of its structure, scalability, and security. Firestore’s structure is well suited for implementing a queue because I could keep a subcollection of posts awaiting review inside the collection of posts. If you’d like to see the website using Firestore, let me know.
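
If I did switch, a minimal sketch of the Firestore-triggered version might look like the following (my own illustration; the guestbook collection name and the text field are assumptions):

const admin = require('firebase-admin');
admin.initializeApp();

exports.moderatorFirestore = functions.firestore
  .document('guestbook/{postId}')
  .onCreate(async (snapshot, context) => {
    const post = snapshot.data();
    if (!post || !post.text) {
      return;
    }

    const moderateResult = await moderateMessage(post.text);
    if (moderateResult === true) {
      // Delete the post, or move it into a review subcollection instead.
      await snapshot.ref.delete();
    }
  });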

Don’t just eliminate – moderate!

One of the things I like about the original Firebase moderation sample is that it sanitizes the text rather than just deleting the post. You could run text through the sanitizer before checking for toxic language through the text toxicity model. If the sanitized text is deemed appropriate, then it could overwrite the original text. If it still doesn’t meet the standards of decent discourse, then you could still delete it. This might save some posts from otherwise being deleted.
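
Roughly, that flow could look like the following sketch (my own illustration; it assumes the bad-words npm package from the original sample and the moderateMessage function from earlier):

const Filter = require('bad-words');
const filter = new Filter();

async function moderateAndSanitize(text) {
  // First pass: replace listed profanity with asterisks.
  const sanitized = filter.clean(text);

  // Second pass: run the sanitized text through the toxicity model.
  const stillToxic = await moderateMessage(sanitized);

  // If the sanitized version passes, overwrite the original text;
  // otherwise signal that the post should be deleted.
  return stillToxic ? { action: 'delete' } : { action: 'overwrite', text: sanitized };
}

In the overwrite case, the trigger could call snapshot.ref.update({ text: result.text }) instead of removing the post.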

What’s in a name?

You’ve probably noticed that my moderation functionality doesn’t extend to the name field. This means that even a halfway-clever troll could easily get around the filter by cramming all of their expletives into that name field. That’s a good point and I trust that you will use some type of moderation on all fields that users interact with. Perhaps you use an authentication method to identify users so they aren’t provided a field for their name. Anyway, you get it: I didn’t add moderation to the name field, but in a production environment, you definitely want moderation on all fields.

Build a better fit

When you test out real-world text samples on your website, you might find that the text toxicity classifier model doesn’t quite fit your needs. Since each social space is unique, there will be specific language that you are looking to include and exclude. You can address these needs by training the model on new data that you provide.

If you enjoyed this article and would like to learn more about TensorFlow.js, then there are a ton of things you can do:

Read More