Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision, and Speech Recognition, because of their powerful representational capacity. However, harnessing this representational capacity effectively requires a large amount of data, strong regularization, or both, to mitigate overfitting. Recently, the power of the Transformer has been unlocked by self-supervised pretraining strategies based on masked autoencoders, which rely on reconstructing masked inputs, directly or contrastively, from unmasked content. This… Apple Machine Learning Research
Self-Conditioning Pre-Trained Language Models
In this paper we aim to investigate the mechanisms that guide text generation with pre-trained Transformer-based Language Models (TLMs). Grounded on the Product of Experts formulation by Hinton (1999), we describe a generative mechanism that exploits expert units which naturally exist in TLMs. Such units are responsible for detecting concepts in the input and conditioning text generation on such concepts. We describe how to identify expert units and how to activate them during inference in order to induce any desired concept in the generated output. We find that the activation of a… Apple Machine Learning Research
Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation
Estimating dimensional emotions, such as activation, valence and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech seems to be possible, the same for valence remains challenging. Previous research has shown that the use of lexical information can improve valence estimation performance.
Lexical information can be obtained from pre-trained acoustic models, where the learned representations can improve valence estimation from speech. We investigate the use of pre-trained model representations… Apple Machine Learning Research
Leveraging Entity Representations, Dense-Sparse Hybrids, and Fusion-in-Decoder for Cross-Lingual Question Answering
We describe our two-stage system for the Multilingual Information Access (MIA) 2022 Shared Task on Cross-Lingual Open-Retrieval Question Answering. The first stage consists of multilingual passage retrieval with a hybrid dense and sparse retrieval strategy. The second stage consists of a reader which outputs the answer from the top passages returned by the first stage. We show the efficacy of using entity representations, sparse retrieval signals to help dense retrieval, and Fusion-in-Decoder. On the development set, we obtain 43.46 F1 on XOR-TyDi QA and 21.99 F1 on MKQA, for an average F1… Apple Machine Learning Research
ICML 2022
Apple Machine Learning Research
Efficient Representation Learning via Adaptive Context Pooling
Self-attention mechanisms model long-range context by using pairwise attention between all input tokens. In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing… Apple Machine Learning Research
LiDAR 3D point cloud labeling with Velodyne LiDAR sensor in Amazon SageMaker Ground Truth
LiDAR is a key enabling technology in growing autonomous markets, such as robotics, industrial, infrastructure, and automotive. LiDAR delivers precise 3D data about its environment in real time to provide “vision” for autonomous solutions. For autonomous vehicles (AVs), nearly every carmaker uses LiDAR to augment camera and radar systems for a comprehensive perception stack capable of safely navigating complex roadway environments. Computer vision systems can use the 3D maps generated by LiDAR sensors for object detection, object classification, and scene segmentation. Like any other supervised machine learning (ML) system, the point cloud data generated by LiDAR sensors should be labeled correctly in order for the ML model to make correct inferences. This allows AVs to operate smoothly and efficiently, avoiding incidents and collisions with objects, pedestrians, vehicles, and other road users.
In this post, we demonstrate how to label 3D point cloud data generated by Velodyne LiDAR sensors using Amazon SageMaker Ground Truth. We break down the process of sending data for annotation so that you can obtain precise, high-quality results.
The code for this example is available on GitHub.
Solution overview
SageMaker Ground Truth is a data labeling service that you can use to create high-quality labeled datasets for various types of ML use cases. SageMaker Ground Truth is a capability in Amazon SageMaker, which is a comprehensive and fully managed ML service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready environment.
In addition to LiDAR data, we also include camera images, using the sensor fusion feature in SageMaker Ground Truth to deliver robust visual information about the scenes that annotators are labeling. Through sensor fusion, annotators can adjust labels in the 3D scene as well as in 2D images. It delivers the unique capability to ensure that annotations in LiDAR data are mirrored in 2D imagery, making the process more efficient.
With SageMaker Ground Truth, Velodyne LiDAR’s 3D point cloud data generated by a Velodyne LiDAR sensor mounted on a vehicle can be labeled for tracking moving objects. In this challenging use case, we can follow the trajectory of an object like a car or a pedestrian in a dynamic environment, while our point of reference is also moving. In this case, our point of reference is a car that is equipped with Velodyne LiDAR.
To perform this task, we walk through the following topics:
- Velodyne technology
- The dataset
- Creating a labeling job
- The point cloud sequence input manifest file
- Building the sequence input manifest file
- The label category configuration file
- Specifying the job resources
- Completing a labeling job
Prerequisites
To implement the solution in this post, you must have the following prerequisites:
- An AWS account for running the code.
- An Amazon Simple Storage Service (Amazon S3) bucket you can write to. The bucket must be in the same Region as the SageMaker notebook instance. We can also define a valid S3 prefix; all the files related to this experiment are stored under that prefix in our bucket. We must attach a CORS policy to this bucket. For instructions, refer to Configuring cross-origin resource sharing (CORS). A sample configuration is sketched after this list.
- An AWS Identity and Access Management (IAM) role to access SageMaker.
- A SageMaker notebook instance.
- Familiarity with the Ground Truth 3D point cloud labeling job.
- Familiarity with Python and NumPy.
- Basic understanding of SageMaker.
- Basic familiarity with the AWS Command Line Interface (AWS CLI).
- The code and associated dataset.
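Because the policy itself isn't reproduced here, the following is a minimal sketch of the kind of CORS configuration Ground Truth needs, applied with boto3 instead of the console editor. The bucket name is a placeholder; confirm the exact rules against the CORS documentation referenced in the prerequisites.

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket name -- replace with the bucket used for this labeling job.
BUCKET = "your-ground-truth-bucket"

# Minimal CORS rule so the Ground Truth worker portal can GET frame and image
# objects from the bucket. Tighten AllowedOrigins for production use.
cors_configuration = {
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET"],
            "AllowedOrigins": ["*"],
            "ExposeHeaders": ["Access-Control-Allow-Origin"],
        }
    ]
}

s3.put_bucket_cors(Bucket=BUCKET, CORSConfiguration=cors_configuration)
```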
Velodyne technology
LiDAR can be divided into different categories, including scanning LiDAR and flash LiDAR. Conventionally, scanning LiDAR uses mechanical rotation to spin the sensor for 360-degree detection. Velodyne, which invented the industry's first 3D LiDAR, continues to innovate and launch new rotational products with cutting-edge technology. Velodyne's Ultra Puck is a scanning LiDAR sensor that uses Velodyne's patented surround-view technology. It provides a full 360-degree environmental view to deliver accurate real-time 3D data. The Ultra Puck has a compact form factor and delivers the real-time object detection needed for safe navigation and reliable operation. With a combination of optimal power and high performance, this sensor provides distance and calibrated reflectivity measurements at all rotational angles. It's an ideal solution for robotics, mapping, security, driver assistance, and autonomous navigation.

Besides the LiDAR sensor itself, Velodyne has created the Vella Development Kit (VDK), a collection of tools, hardware, and documentation that facilitates access to Velodyne's autonomy software stack. The VDK can be configured for different custom interfaces and environments, providing you with a broad range of applications for increased autonomy and improved safety.
Additionally, the VDK can reduce the upfront work you would have to otherwise put in to enable an end-to-end data collection and annotation pipeline by providing the following necessary capabilities:
- Clock synchronization between LiDAR, odometry, and camera frames
- Calibration of the LiDAR-to-vehicle 5-DOF extrinsic parameters (z is not observable)
- Calibration of the LiDAR-to-camera extrinsic, camera intrinsic, and distortion parameters
- Collection of motion-compensated (intra-frame or multi-frame), synchronized LiDAR point clouds and camera images
To develop vehicle-based perception capabilities, Velodyne's software team has set up its own data collection vehicle, with one of its Ultra Puck LiDAR units, a camera, and GPS/IMU sensors mounted to the vehicle hood. In the subsequent steps, we refer to the team's internal processes, which use the VDK to prepare, collect, and annotate the data needed for vehicle-based perception, as an example for other customers trying to solve their own perception use cases.
Clock synchronization
Accurate clock synchronization of the LiDAR, odometry, and camera outputs can be crucial for any multi-sensor application that combines those data streams. For best results, you should use a PTP synchronization system with a primary clock that is supported by all sensors. One advantage of PTP is the ability to synchronize multiple devices to high accuracy with a single timing source. Such a system can achieve synchronization accuracy better than 1 microsecond. Other solutions include PPS distribution and per-device time sources. As an alternative, the VDK supports software synchronization using time-of-arrival timestamping, which can be a great way to get an application off the ground quickly in the absence of proper clock synchronization infrastructure. This can result in timestamping errors on the order of 1–10 milliseconds, due to a combination of latency and queuing delays at various levels of the network infrastructure and host operating system, which may or may not be acceptable depending on the application.
LiDAR vehicle calibration
The LiDAR vehicle calibration estimates the extrinsic position of the LiDAR in the vehicle frame along five axes. The z value is unobservable; therefore, you must measure it independently. Our process is a targetless calibration technique, but it works well in an environment where the ground is relatively flat and contains contiguous static object features rather than dynamic (vehicles, pedestrians) or non-contiguous (shrubs and bushes) features. Think of a parking lot with few obstacles and buildings with flat facades. The presence of geometric structures is ideal for improving the calibration quality. The user is required to drive in predefined driving patterns indicated by the VDK to expose most of the parameters. One minute of data is sufficient for this calibration. After the data is uploaded to Velodyne's platform service, the calibration takes place in the cloud and the result is made available within 24 hours. For the purposes of this notebook, the calibration parameters have already been processed and provided.
The LiDAR dataset
The dataset and resources used in this notebook are provided by Velodyne. This dataset contains one continuous scene from an autonomous vehicle experiment driving around on a highway in California. The entire scene contains 60 frames. The dataset contents are as follows:
- lidar_cam_calib_vlp32_06_10_2021.yaml – Camera calibration information, one camera only
- images/ – Camera footage for each frame
- poses/ – Pose JSON file containing LiDAR extrinsic matrix for each frame
- rectified_scans_local/ – .pcd files in the LiDAR sensor's local coordinate system
Next, download the dataset locally and upload it to the S3 bucket we defined in the initialization section (the exact commands are in the accompanying notebook):
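The following is a hedged sketch of the upload step only; the dataset download itself is handled in the notebook, and the local directory, bucket, and prefix names here are placeholders.

```python
import os
import boto3

s3 = boto3.client("s3")

# Placeholders -- the actual dataset location and bucket/prefix are defined in
# the notebook's initialization section.
LOCAL_DIR = "velodyne_dataset"
BUCKET = "your-ground-truth-bucket"
PREFIX = "velodyne-object-tracking"

# Assume the dataset has already been downloaded and unpacked into LOCAL_DIR
# (the notebook provides that step), then upload every file to S3.
for root, _, files in os.walk(LOCAL_DIR):
    for name in files:
        local_path = os.path.join(root, name)
        key = os.path.join(PREFIX, os.path.relpath(local_path, LOCAL_DIR))
        s3.upload_file(local_path, BUCKET, key)
        print(f"Uploaded s3://{BUCKET}/{key}")
```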
Create a labeling job
As the next step, we need to create a data labeling job in SageMaker Ground Truth. We select the task type as object tracking. For more information about 3D point cloud labeling task types, refer to 3D Point Cloud Task types. To create an object tracking point cloud labeling job, we need to add the following resources as the labeling job inputs:
- Point cloud sequence input manifest – A JSON file defining the point cloud frame sequence and associated sensor fusion data. For more information, see Create a Point Cloud Sequence Input Manifest.
- Input manifest file – The input file for the labeling job. Each line of the manifest file contains a link to a sequence file defined in the point cloud sequence input manifest.
- Label category configuration file – This file is used to specify your labels, label category, frame attributes, and worker instructions. For more information, see Create a Labeling Category Configuration File with Label Category and Frame Attributes.
- Predefined AWS resources – Includes the following:
  - Pre-annotation Lambda ARN – Refer to PreHumanTaskLambdaArn.
  - Annotation consolidation ARN – The AWS Lambda function used to consolidate labels from different workers. Refer to AnnotationConsolidationLambdaArn.
  - Workforce ARN – Defines which workforce type we want to use. Refer to Create and Manage Workforces for more details.
- HumanTaskUiArn – Defines the worker UI template used to render the labeling job. This should have a format similar to arn:aws:sagemaker:<region>:123456789012:human-task-ui/PointCloudObjectTracking.
Keep in mind the following:
- There should not be an entry for the UiTemplateS3Uri parameter.
- Your LabelAttributeName must end in -ref. For example, ot-labels-ref.
- The number of workers specified in NumberOfHumanWorkersPerDataObject should be 1.
- 3D point cloud labeling doesn't support active learning, so we shouldn't specify values for parameters in LabelingJobAlgorithmsConfig.
- 3D point cloud object tracking labeling jobs can take multiple hours to complete. You should specify a longer time limit for these labeling jobs in TaskTimeLimitInSeconds (up to 7 days, or 604,800 seconds).
Point cloud sequence input manifest file
The following are the most important steps in generating a sequence input manifest file:
- Convert the 3D points to a world coordinate system.
- Generate the sensor extrinsic matrix to enable the sensor fusion feature in SageMaker Ground Truth.
The LiDAR sensor is mounted on a moving vehicle (ego vehicle), which captures the data in its own frame of reference. To perform object tracking, we need to convert this data to a global frame of reference to account for the moving ego vehicle itself. This is the world coordinate system.
Sensor fusion is a feature in SageMaker Ground Truth that synchronizes the 3D point cloud frame side by side with the camera frame. This provides visual context for human labelers and allows labelers to adjust annotation in 3D and 2D images synchronously. For instructions on matrix transformation, refer to Labeling data for 3D object tracking and sensor fusion in Amazon SageMaker Ground Truth.
The generate_transformed_pcd_from_point_cloud function performs the coordinate translation and then generates the 3D point data file, which SageMaker Ground Truth can consume.
To translate the data from the local/sensor coordinate system to the global coordinate system, multiply each point in a 3D frame by the extrinsic matrix for the LiDAR sensor.
SageMaker Ground Truth renders the 3D point cloud data in either Compact Binary Pack (.bin) or ASCII (.txt) format. Files in these formats need to contain the location (x, y, and z coordinates) of all points that make up the frame and, optionally, additional per-point values such as intensity and color (i, r, g, b).
To read more about SageMaker Ground Truth accepted raw 3D data formats, see Accepted Raw 3D Data Formats.
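As an illustration of this step, here is a minimal numpy sketch of the local-to-world transform and of writing a frame in the ASCII (x y z i) layout described above. It is a simplified stand-in for the notebook's generate_transformed_pcd_from_point_cloud function, and the function and file names are illustrative.

```python
import numpy as np

def transform_points_to_world(points_xyzi, lidar_extrinsic):
    """Transform an N x 4 array of (x, y, z, intensity) points from the sensor
    frame to the world frame using the 4 x 4 LiDAR extrinsic (pose) matrix."""
    xyz = points_xyzi[:, :3]
    homogeneous = np.hstack([xyz, np.ones((xyz.shape[0], 1))])   # N x 4
    world_xyz = (lidar_extrinsic @ homogeneous.T).T[:, :3]       # N x 3
    return np.hstack([world_xyz, points_xyzi[:, 3:4]])           # keep intensity

def write_ascii_frame(points_xyzi, path):
    """Write one frame in the ASCII format Ground Truth accepts:
    one point per line as 'x y z i'."""
    np.savetxt(path, points_xyzi, fmt="%.6f")
```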
Build the sequence input manifest file
The next step is to build the point cloud sequence input manifest file. The steps listed in this section are also available in the notebook.
- Read the point cloud data from the .pcd file, the LiDAR extrinsic matrix from the pose file, and the camera extrinsic, intrinsic, and distortion data from the camera calibration .yaml file.
- Perform a per-frame transform of the raw point cloud to the global frame of reference. Generate and store an ASCII (.txt) file for each frame in Amazon S3.
- Extract the ego vehicle pose from the LiDAR extrinsic matrix.
- Build the sensor position in the global coordinate system by extracting the camera pose from the camera inverse extrinsic matrix.
- Provide camera calibration parameters (such as distortion and skew).
- Build the array of data frames. Reference the ASCII file location, define the vehicle position in the world coordinate system, and so on.
- Create the sequence manifest file sequence.json.
- Create our input manifest file. Each line identifies a single sequence file we just uploaded (see the sketch after this list).
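To make the structure concrete, here is a hedged sketch of a one-frame sequence file and the matching input manifest line. The field names follow the Ground Truth point cloud sequence schema, but the bucket, prefix, timestamps, and pose values are placeholders; refer to Create a Point Cloud Sequence Input Manifest for the authoritative format.

```python
import json

BUCKET = "your-ground-truth-bucket"   # placeholder
PREFIX = "velodyne-object-tracking"   # placeholder

sequence = {
    "seq-no": 1,
    "prefix": f"s3://{BUCKET}/{PREFIX}/frames/",
    "number-of-frames": 60,
    "frames": [
        {
            "frame-no": 0,
            "unix-timestamp": 1623355200.0,   # placeholder timestamp
            "frame": "frame_0000.txt",        # per-frame ASCII point cloud
            "format": "text/xyzi",
            # Ego vehicle pose extracted from the LiDAR extrinsic matrix.
            "ego-vehicle-pose": {
                "position": {"x": 0.0, "y": 0.0, "z": 0.0},
                "heading": {"qx": 0.0, "qy": 0.0, "qz": 0.0, "qw": 1.0},
            },
            # "images": [...]  camera image path, pose, and intrinsics go here
            # to enable sensor fusion.
        }
        # ...one entry per frame in the 60-frame scene
    ],
}

with open("sequence.json", "w") as f:
    json.dump(sequence, f)

# The input manifest contains one line per sequence file.
with open("manifest.json", "w") as f:
    f.write(json.dumps({"source-ref": f"s3://{BUCKET}/{PREFIX}/sequence.json"}) + "\n")
```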
Label category configuration file
Our label category configuration file is used to specify labels, or classes, for our labeling job. When we use the object detection or object tracking task types, we can also include label attributes in our label category configuration file. Workers can assign one or more attributes we provide to annotations to give more information about that object. For example, we may want to use the attribute occluded to have workers identify when an object is partially obstructed. Let's look at an example of the label category configuration file for an object detection or object tracking labeling job:
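Because the original example file isn't reproduced here, the following is a hedged sketch of what such a configuration can look like, written as a Python dict and saved as JSON. The labels, attribute names, and instructions are illustrative; see Create a Labeling Category Configuration File with Label Category and Frame Attributes for the exact schema.

```python
import json

label_category_config = {
    "documentVersion": "2020-03-01",
    "labels": [
        {"label": "Car"},
        {"label": "Pedestrian"},
        {
            "label": "Truck",
            # Attributes workers can set per annotation, e.g. occlusion state.
            "categoryAttributes": [
                {
                    "name": "occluded",
                    "type": "string",
                    "enum": ["partially", "fully", "not occluded"],
                }
            ],
        },
    ],
    "instructions": {
        "shortInstruction": "Draw a tight cuboid around each object and track it across frames.",
        "fullInstruction": "Label every car, pedestrian, and truck in the sequence.",
    },
}

with open("label_category_config.json", "w") as f:
    json.dump(label_category_config, f)
```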
Specify the job resources
As the next step, we specify various labeling job resources:
- Human task UI ARN – HumanTaskUiArn is a resource that defines the worker task template used to render the worker UI and tools for the labeling job. This attribute is defined under UiConfig, and the resource name is configured by Region and task type.
- Work team resource – In this example, we use private team resources. For instructions, refer to Create a Private Workforce (Amazon Cognito Console). When we're done, we put our resource ARN in the WorkteamArn parameter (see the sketch after this list).
- Pre-annotation Lambda ARN and post-annotation Lambda ARN – See the sketch after this list.
- HumanTaskConfig – We use this to specify our work team and configure our labeling job task. Feel free to update the task description in the sketch after this list.
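The following sketch pulls these resources together. All account IDs and ARNs are placeholders: look up the Region-specific HumanTaskUiArn, pre-annotation Lambda, and annotation consolidation Lambda values in the linked documentation, and substitute the ARN of your own private work team.

```python
import boto3

region = boto3.session.Session().region_name

# Region- and task-specific AWS-owned resources. The account IDs below are
# placeholders -- look up the actual values for your Region in the SageMaker
# Ground Truth documentation.
human_task_ui_arn = (
    f"arn:aws:sagemaker:{region}:123456789012:human-task-ui/PointCloudObjectTracking"
)
pre_human_task_lambda_arn = (
    f"arn:aws:lambda:{region}:123456789012:function:PRE-3DPointCloudObjectTracking"
)
annotation_consolidation_lambda_arn = (
    f"arn:aws:lambda:{region}:123456789012:function:ACS-3DPointCloudObjectTracking"
)

# Private work team created in the SageMaker console (placeholder ARN).
workteam_arn = (
    f"arn:aws:sagemaker:{region}:111122223333:workteam/private-crowd/my-private-team"
)

human_task_config = {
    "WorkteamArn": workteam_arn,
    "UiConfig": {"HumanTaskUiArn": human_task_ui_arn},
    "PreHumanTaskLambdaArn": pre_human_task_lambda_arn,
    "AnnotationConsolidationConfig": {
        "AnnotationConsolidationLambdaArn": annotation_consolidation_lambda_arn
    },
    "TaskTitle": "Velodyne 3D point cloud object tracking",
    "TaskDescription": "Track vehicles and pedestrians across the LiDAR sequence",
    "NumberOfHumanWorkersPerDataObject": 1,      # must be 1 for point cloud jobs
    "TaskTimeLimitInSeconds": 604800,            # up to 7 days
}
```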
Create the labeling job
Next, we create the labeling request, as shown in the following code:
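A hedged sketch of that request is shown below. It reuses the placeholder bucket, prefix, and human_task_config from the earlier sketches, and the job name, label attribute name, and role ARN are illustrative.

```python
job_name = "velodyne-point-cloud-object-tracking"   # placeholder job name

ground_truth_request = {
    "LabelingJobName": job_name,
    "LabelAttributeName": "ot-labels-ref",   # must end in "-ref" for point cloud jobs
    "InputConfig": {
        "DataSource": {
            "S3DataSource": {"ManifestS3Uri": f"s3://{BUCKET}/{PREFIX}/manifest.json"}
        }
    },
    "OutputConfig": {"S3OutputPath": f"s3://{BUCKET}/{PREFIX}/output/"},
    "LabelCategoryConfigS3Uri": f"s3://{BUCKET}/{PREFIX}/label_category_config.json",
    "RoleArn": "arn:aws:iam::111122223333:role/SageMakerGroundTruthRole",   # placeholder
    "HumanTaskConfig": human_task_config,
}
```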
Finally, we create the labeling job:
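Under the same assumptions, the final call looks roughly like this:

```python
import boto3

sagemaker_client = boto3.client("sagemaker")
sagemaker_client.create_labeling_job(**ground_truth_request)

# Poll the job status while annotators work through the sequence.
status = sagemaker_client.describe_labeling_job(LabelingJobName=job_name)
print(status["LabelingJobStatus"])
```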
Complete a labeling job
When our labeling job is ready, we can add ourselves to our private work team and experiment with the worker portal. We should receive an email with the portal link, our user name, and a temporary password. When we log in, we choose the labeling job from the list and open the worker portal. (It may take a few minutes for a new labeling job to show up in the portal.) More information on how to set up workers and worker instructions can be found here and here, respectively.
When we’re are done with the labeling job, we can choose Submit, and then view the output data in the S3 output location we specified earlier.
Conclusion
In this post, we showed how to create a 3D point cloud labeling job for object tracking on data captured with Velodyne's LiDAR sensor. We followed step-by-step instructions and ran the provided code to create a SageMaker Ground Truth labeling job to label the 3D point cloud data. The labels created with this job can be used to train object detection, object recognition, and object tracking models commonly used in autonomous vehicle scenarios.
If you are interested in labeling 3D point cloud data captured via Velodyne’s LiDAR sensor, follow the steps in this article to label the data using Amazon SageMaker Ground Truth.
About the Authors
Sharath Nair leads the Computer Vision team that focuses on building perception algorithms for some of Velodyne's software products, such as Object Detection & Tracking, Semantic Segmentation, and SLAM. Prior to Velodyne, Sharath worked on autonomous vehicles and robotics and has been involved in this space for the past 6 years.
Oliver Monson is a Senior Data Operations Manager at Velodyne Lidar, responsible for the data pipelines and acquisition strategies that support the development of perception software. Prior to Velodyne, Oliver managed operational teams executing on HD mapping, geospatial, and archaeological applications.
John Kua is Director of Software Engineering at Velodyne, overseeing the System Integration and Robotics, Vella Go, and Software Production teams. Prior to joining Velodyne, John spent over a decade building multimodal sensor platforms for a wide range of 3D localization and mapping applications in commercial and government settings. These platforms included a wide array of sensors, including visible light, thermal, and hyperspectral cameras, lidar, GPS, IMUs, and even gamma-ray spectrometers and imagers.
Sally Frykman, Chief Marketing Officer at Velodyne, oversees the strategic development and execution of global marketing and communications programs that advance the company’s innovative vision and goals. Her multifaceted role encompasses a wide array of responsibilities, including promotion of the Velodyne brand, thought leadership development, and robust sales lead generation fueled by highly engaging digital marketing. Previously, Sally worked in public education and social work.
Nitin Wagh is Sr. Business Development Manager for Amazon AI. He likes the opportunity to help customers understand machine learning and the power of Augmented AI in the AWS Cloud. In his spare time, he loves spending time with his family on outdoor activities.
James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James's work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.
Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from The University of Texas at Austin and an MS in Computer Science from the Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and take long road trips.
How events like Prime Day helped Amazon navigate the pandemic
The SCOT science team used lessons from the past, and improved existing tools, to contend with “a peak that lasted two years.”
Windfall: Omniverse Accelerates Turning Wind Power Into Clean Hydrogen Fuel
Engineers are using the NVIDIA Omniverse 3D simulation platform as part of a proof of concept that promises to become a model for putting green energy to work around the world.
Dubbed Gigastack, the pilot project — led by a consortium that includes Phillips 66 and Denmark-based renewable energy company Ørsted — will create low-emission fuel for the energy company’s Humber refinery in England.
Hydrogen is expected to play a critical role as the world moves to reduce its dependence on fossil fuels over the coming years. The market for hydrogen fuel is predicted to grow over 45x to $90 billion by 2030, up from $1.8 billion today.
The Gigastack project aims to showcase how green energy can be woven into complex, industrial energy infrastructure on a massive scale and accelerate net-zero emissions progress.
To make that happen, new kinds of collaboration are vital, explained Ahsan Yousufzai, global head of business development for energy at NVIDIA, during a conversation about the project in an on-demand panel discussion at NVIDIA GTC.
“To meet global sustainability targets, the entire energy ecosystem needs to work together,” Yousufzai said. “For that, technologies like AI and digital twins will play a major role.”
The system, now in the planning stages, will draw power from Ørsted's massive 1,218-megawatt Hornsea One offshore wind farm, the largest in the world upon its completion in January last year.
Hornsea will be connected to ITM Power’s Gigastack electrolyzer facility, which will use electrolysis to turn water into clean, renewable hydrogen fuel.
That fuel, in turn, will be put to work at Phillips 66’s Humber refinery, decarbonizing one of the U.K.’s largest industrial facilities.
The project is unique because of its scale — with plans to eventually ramp up Gigastack into a massive 1-gigawatt electrolyzer system — and because of its potential to become a blueprint for deploying electrolyzer technology for wider decarbonization.
Weaving all these elements together, however, requires tight collaboration between team members from Element Energy, ITM Power, Ørsted, Phillips 66 and Worley.
Worley, one of the largest global providers of engineering and professional services to the oil, gas, mining, power, and infrastructure industries, turned to Aspen Technology's Aspen OptiPlant, sophisticated software that's a workhorse for planning and optimizing some of the world's most complex infrastructure.
“When you have a finite amount of money to be spent, you want to maximize the number of options on how facilities can be designed, fabricated and constructed,” explained Vishal Mehta, senior vice president at Worley.
“This is the importance of rapid optioneering, where you’re able to run AI models and engines with not only mathematics but also visual representation,” Mehta said. “People can come up with ideas and, in real time, move them around with mathematical equations changing in the background.”
Worley relied on AspenTech’s OptiPlant to develop a 3D conceptual layout of the Gigastack green hydrogen project. The industrial optimization software combines decades of process modeling expertise with cutting-edge AI and machine learning.
The next step is connecting OptiPlant's sophisticated physics-based plant piping and layout capabilities with Omniverse to build a 3D conceptual layout of the plant, potentially allowing teams to work together on plant design in real time by connecting their various 3D software tools, datasets, and teams.
“With a traditional model review, it’s one person leading the way, but here we have this opportunity for everybody to be immersed in the facility,” said Sonali Singh, vice president of product management for performance engineering at AspenTech. “They can really all collaborate by looking at their individual priorities.”
Omniverse can be the platform on which they further build the digital twin of the growing facility, enabling the connection of simulation data and AI models, capturing knowledge from the human and AI collaborators working on the project, and bringing intelligent optimization.
To learn more, watch the on-demand GTC session and explore the Gigastack project.
Find out how Siemens Gamesa and Zenotech are accelerating offshore wind farm simulations with NVIDIA’s full-stack technologies.
The post Windfall: Omniverse Accelerates Turning Wind Power Into Clean Hydrogen Fuel appeared first on NVIDIA Blog.
Intuitive physics learning in a deep-learning model inspired by developmental psychology
Despite significant effort, current AI systems pale in their understanding of intuitive physics, in comparison to even very young children. In the present work, we address this AI problem, specifically by drawing on the field of developmental psychology.