September 2022 – Page 9

Continental and AEye Join NVIDIA DRIVE Sim Sensor Ecosystem, Providing Rich Capabilities for AV Development

Autonomous vehicle sensors require the same rigorous testing and validation as the car itself, and one simulation platform is up to the task.

Global tier-1 supplier Continental and software-defined lidar maker AEye announced this week at NVIDIA GTC that they will migrate their intelligent lidar sensor model into NVIDIA DRIVE Sim. The companies are the latest to join the extensive ecosystem of sensor makers using NVIDIA’s end-to-end, cloud-based simulation platform for technology development.

Continental offers a full suite of cameras, radars and ultrasonic sensors, as well as its recently launched short-range flash lidar, some of which are incorporated into the NVIDIA Hyperion autonomous-vehicle development platform.

Last year, Continental and AEye announced a collaboration in which the tier-1 supplier would use the lidar maker’s software-defined architecture to produce a long-range sensor. Now, the companies are contributing this sensor model to DRIVE Sim, helping to bring their vision to the industry.

DRIVE Sim is built on the NVIDIA Omniverse platform for connecting and building custom 3D pipelines, providing physically based digital twin environments to develop and validate autonomous vehicles. DRIVE Sim is open and modular — users can create their own extensions or choose from a rich library of sensor plugins from ecosystem partners.

In addition to providing sensor models, partners use the platform to validate their own sensor architectures.

By joining this rich community of DRIVE Sim users, Continental and AEye can now rapidly simulate edge cases in varying environments to test and validate lidar performance.

A Lidar for All Seasons

AEye and Continental are creating HRL 131, a high-performance, long-range lidar for both passenger cars and commercial vehicles that’s software configurable and can adapt to various driving environments.

The lidar incorporates dynamic performance modes where the laser scan pattern adapts for any automated driving application, including highway driving or dense urban environments in all weather conditions, including direct sun, night, rain, snow, fog, dust and smoke. It features a range of more than 300 meters for detecting vehicles and 200 meters for detecting pedestrians, and is slated for mass production in 2024.

The simulated Continental HRL131 long-range lidar sensor, built on AEye’s 4Sight intelligent sensing platform, running in NVIDIA DRIVE Sim.

With DRIVE Sim, developers can recreate obstacles with their exact physical properties and place them in complex highway environments. They can determine which lidar performance modes are suitable for the chosen application based on uncertainties experienced in a particular scenario.

Once identified and tuned, performance modes can be activated on the fly using external cues such as speed, location or even vehicle pitch, which can change with loading conditions, tire-pressure variations and suspension modes.

The ability to simulate performance characteristics of a software-defined lidar model adds even greater flexibility to DRIVE Sim, further accelerating robust autonomous vehicle development.

‘’With the scalability and accuracy of NVIDIA DRIVE Sim, we’re able to validate our long-range lidar technology efficiently,’’ said Gunnar Juergens, head of product line, lidar, at Continental. ‘’It’s a robust tool for the industry to train, test and validate safe self-driving solutions’’

The post Continental and AEye Join NVIDIA DRIVE Sim Sensor Ecosystem, Providing Rich Capabilities for AV Development appeared first on NVIDIA Blog.

Building safer dialogue agents

In our latest paper, we introduce Sparrow – a dialogue agent that’s useful and reduces the risk of unsafe and inappropriate answers. Our agent is designed to talk with a user, answer questions, and search the internet using Google when it’s helpful to look up evidence to inform its responses.Read More

Building safer dialogue agents

Amazon Comprehend Targeted Sentiment adds synchronous support

Earlier this year, Amazon Comprehend, a natural language processing (NLP) service that uses machine learning (ML) to discover insights from text, launched the Targeted Sentiment feature. With Targeted Sentiment, you can identify groups of mentions (co-reference groups) corresponding to a single real-world entity or attribute, provide the sentiment associated with each entity mention, and offer the classification of the real-world entity based on a pre-determined list of entities.

Today, we’re excited to announce the new synchronous API for targeted sentiment in Amazon Comprehend, which provides a granular understanding of the sentiments associated with specific entities in input documents.

In this post, we provide an overview of how you can get started with the Amazon Comprehend Targeted Sentiment synchronous API, walk through the output structure, and discuss three separate use cases.

Targeted sentiment use cases

Real-time targeted sentiment analysis in Amazon Comprehend has several applications to enable accurate and scalable brand and competitor insights. You can use targeted sentiment for business-critical processes such as live market research, producing brand experience, and improving customer satisfaction.

The following is an example of using targeted sentiment for a movie review.

“Movie” is the primary entity, identified as type movie, and is mentioned two more times as “movie” and the pronoun “it.” The Targeted Sentiment API provides the sentiment towards each entity. Green refers to a positive sentiment, red for negative, and blue for neutral.

Traditional analysis provides sentiment of the overall text, which in this case is mixed. With targeted sentiment, you can get more granular insights. In this scenario, the sentiment towards the movie is both positive and negative: positive in regards to the actors, but negative in relation to the overall quality. This can provide targeted feedback for the film team, such as to exercise more diligence in script writing, but to consider the actors for future roles.

Prominent applications of real-time sentiment analysis will vary across industries. It includes extracting marketing and customer insights from live social media feeds, videos, live events, or broadcasts, understanding emotions for research purposes, or deterring cyberbullying. Synchronous targeted sentiment drives business value by providing real-time feedback within seconds so that you can make decisions in real time.

Let’s take a closer look at these various real-time targeted sentiment analysis applications and how different industries may use them:

Scenario 1 – Opinion mining of financial documents to determine sentiment towards a stock, person, or organization
Scenario 2 – Real-time call center analytics to determine granular sentiment in customer interactions
Scenario 3 – Monitoring organization or product feedback across social media and digital channels, and providing real-time support and resolutions

In the following sections, we discuss each use case in more detail.

Scenario 1: Financial opinion mining and trading signal generation

Sentiment analysis is crucial for market-makers and investment firms when building trading strategies. Determining granular sentiment can help traders infer what reaction the market may have towards global events, business decisions, individuals, and industry direction. This sentiment can be a determining factor on whether to buy or sell a stock or commodity.

To see how we can use the Targeted Sentiment API in these scenarios, let’s look at a statement from Federal Reserve Chair Jerome Powell on inflation.

As we can see in the example, understanding the sentiment towards inflation can inform a buy or sell decision. In this scenario, it can be inferred from the Targeted Sentiment API that Chair Powell’s opinion on inflation is negative, and this is most likely going to result in higher interest rates slowing economic growth. For most traders, this could result in a sell decision. The Targeted Sentiment API can provide traders faster and more granular insight than a traditional document review, and in an industry where speed is crucial, it can result in substantial business value.

The following is a reference architecture for using targeted sentiment in financial opinion mining and trading signal generation scenarios.

Scenario 2: Real-time contact center analysis

A positive contact center experience is crucial in delivering a strong customer experience. To help ensure positive and productive experiences, you can implement sentiment analysis to gauge customer reactions, the changing customer moods through the duration of the interaction, and the effectiveness of contact center workflows and employee training. With the Targeted Sentiment API, you can get granular information within your contact center sentiment analysis. Not only can we determine the sentiment of the interaction, but now we can see what caused the negative or positive reaction and take the appropriate action.

We demonstrate this with the following transcripts of a customer returning a malfunctioning toaster. For this example, we show sample statements that the customer is making.

As we can see, the conversation starts off fairly negative. With the Targeted Sentiment API, we’re able to determine the root cause of the negative sentiment and see it’s regarding a malfunctioning toaster. We can use this information to run certain workflows, or route to different departments.

Through the conversation, we can also see the customer wasn’t receptive to the offer of a gift card. We can use this information to improve agent training, reevaluate if we should even bring up the topic in these scenarios, or decide if this question should only be asked with a more neutral or positive sentiment.

Lastly, we can see that the service that was provided by the agent was received positively even though the customer was still upset about the toaster. We can use this information to validate agent training and reward strong agent performance.

The following is a reference architecture incorporating targeted sentiment into real-time contact center analytics.

Scenario 3: Monitoring social media for customer sentiment

Social media reception can be a deciding factor for product and organizational growth. Tracking how customers are reacting to company decisions, product launches, or marketing campaigns is critical in determining effectiveness.

We can demonstrate how to use the Targeted Sentiment API in this scenario by using Twitter reviews of a new set of headphones.

In this example, there are mixed reactions to the launch of the headphones, but there is a consistent theme of the sound quality being poor. Companies can use this information to see how users are reacting to certain attributes and see where product improvements should be made in future iterations.

The following is a reference architecture using the Targeted Sentiment API for social media sentiment analysis.

Get started with Targeted Sentiment

To use targeted sentiment on the Amazon Comprehend console, complete the following steps:

On the Amazon Comprehend console, choose Launch Amazon Comprehend.
For Input text, enter any text that you want to analyze.
Choose Analyze.

After the document has been analyzed, the output of the Targeted Sentiment API can be found on the Targeted sentiment tab in the Insights section. Here you can see the analyzed text, each entity’s respective sentiment, and the reference group it’s associated with.

In the Application integration section, you can find the request and response for the analyzed text.

Programmatically use Targeted Sentiment

To get started with the synchronous API programmatically, you have two options:

detect-targeted-sentiment – This API provides the targeted sentiment for a single text document
batch-detect-targeted-sentiment – This API provides the targeted sentiment for a list of documents

You can interact with the API with the AWS Command Line Interface (AWS CLI) or through the AWS SDK. Before we get started, make sure that you have configured the AWS CLI, and have the required permissions to interact with Amazon Comprehend.

The Targeted Sentiment synchronous API requires two request parameters to be passed:

LanguageCode – The language of the text
Text or TextList – The UTF-8 text that is processed

The following code is an example for the detect-targeted-sentiment API:

{
"LanguageCode": "string", 
"Text": "string"
}

The following is an example for the batch-detect-targeted-sentiment API:

{

"LanguageCode": "string", 
"TextList": ["string"]

}

Now let’s look at some sample AWS CLI commands.

The following code is an example for the detect-targeted-sentiment API:

aws comprehend 
--region us-east-2 
detect-targeted-sentiment  
--text "I like the burger but service was bad" 
--language-code en

The following is an example for the batch-detect-targeted-sentiment API:

aws comprehend 
--region us-east-2 
batch-detect-targeted-sentiment 
--text-list "We loved the Seashore Hotel! It was clean and the staff was friendly. However, the Seashore was a little too noisy at night." "I like the burger but service is bad" 
--language-code en

The following is a sample Boto3 SDK API call:

import boto3
import subprocess

session = boto3.Session()
comprehend_client = session.client(service_name='comprehend', region_name='us-east-2')

The following is an example of the detect-targeted-sentiment API:

response = comprehend_client.detect_targeted_sentiment(
LanguageCode='en',
Text = "I like the burger but service was bad"
)
print(response)

The following is an example of the batch-detect-targeted-sentiment API:

response = comprehend_client.batch_detect_targeted_sentiment(
    LanguageCode='en',
    TextList = ["I like the burger but service was bad","The staff was really sweet though"]
)

For more details about the API syntax, refer to the Amazon Comprehend Developer Guide.

API response structure

The Targeted Sentiment API provides a simple way to consume the output of your jobs. It provides a logical grouping of the entities (entity groups) detected, along with the sentiment for each entity. The following are some definitions of the fields that are in the response:

Entities – The significant parts of the document. For example, Person, Place, Date, Food, or Taste.
Mentions – The references or mentions of the entity in the document. These can be pronouns or common nouns such as “it,” “him,” “book,” and so on. These are organized in order by location (offset) in the document.
DescriptiveMentionIndex – The index in Mentions that gives the best depiction of the entity group. For example, “ABC Hotel” instead of “hotel,” “it,” or other common noun mentions.
GroupScore – The confidence that all the entities mentioned in the group are related to the same entity (such as “I,” “me,” and “myself” referring to one person).
Text – The text in the document that depicts the entity.
Type – A description of what the entity depicts.
Score – The model confidence that this is a relevant entity.
MentionSentiment – The actual sentiment found for the mention.
Sentiment – The string value of positive, neutral, negative, or mixed.
SentimentScore – The model confidence for each possible sentiment.
BeginOffset – The offset into the document text where the mention begins.
EndOffset – The offset into the document text where the mention ends.

For a more detailed breakdown, refer to Extract granular sentiment in text with Amazon Comprehend Targeted Sentiment or Output file organization.

Conclusion

Sentiment analysis remains crucial for organizations for a myriad of reasons—from tracking customer sentiment over time for businesses, to inferring whether a product is liked or disliked, to understanding opinions of users of a social network towards certain topics, or even predicting the results of campaigns. Real-time targeted sentiment can be effective for businesses, allowing them to go beyond overall sentiment analysis to explore insights to drive customer experiences using Amazon Comprehend.

To learn more about Targeted Sentiment for Amazon Comprehend, refer to Targeted sentiment.

About the authors

Raj Pathak is a Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Document Extraction, Contact Center Transformation and Computer Vision.

Wrick Talukdar is a Senior Architect with Amazon Comprehend Service team. He works with AWS customers to help them adopt machine learning on a large scale. Outside of work, he enjoys reading and photography.

Inside AI: NVIDIA DRIVE Ecosystem Creates Pioneering In-Cabin Features With NVIDIA DRIVE IX

As personal transportation becomes electrified and automated, time in the vehicle has begun to resemble that of a living space rather than a mind-numbing commute.

Companies are creating innovative ways for drivers and passengers to make the most of this experience, using the flexibility and modularity of NVIDIA DRIVE IX. In-vehicle technology companies Cerence, Smart Eye, Rightware and DSP Concepts are now using the platform to deliver intelligent features for every vehicle occupant.

These partners are joining a diverse ecosystem of companies developing on DRIVE IX, including Soundhound, Jungo and VisionLabs, providing cutting-edge solutions for any in-vehicle need.

DRIVE IX provides an open software stack for cockpit solution providers to build and deploy features that will turn personal vehicles into interactive environments, enabling intelligent assistants, graphic user interfaces and immersive media and entertainment.

Intelligent Assistants

AI isn’t just transforming the way people drive, but also how they interact with cars.

Using speech, gestures and advanced graphical user interfaces, passengers can communicate with the vehicle through an AI assistant as naturally as they would with a human.

An intelligent assistant will help to operate vehicle functions more intuitively, warn passengers in critical situations and provide services such as giving local updates like the weather forecast, making reservations and phone calls, and managing calendars.

Conversational AI Interaction

Software partner Cerence is enabling AI-powered, voice-based interaction with the Cerence Assistant, its intelligent in-car assistant platform.

Cerence Assistant uses sensor data to serve drivers throughout their daily journeys, notifying them, for example, when fuel or battery levels are low and navigating to the nearest gas or charging station.

It features robust speech recognition, natural language understanding and text-to-speech capabilities, enhancing the driver experience.

Using DRIVE IX, it can empower both embedded and cloud-based natural language processing on the same architecture, ensuring drivers have access to important capabilities regardless of connectivity.

Cerence Assistant also supports major global markets and related languages, and is customizable for brand-specific speech recognition and personalization.

Gesture Recognition

In addition to speech, passengers can interact with AI assistants via gesture, which relies on interior sensing technologies.

Smart Eye is a global leader in AI-based driver monitoring and interior sensing solutions. Its production driver-monitoring system is already in 1 million vehicles on the roads around the world, and will be incorporated in the upcoming Polestar 3, set to be revealed in October.

Working with DRIVE IX, Smart Eye’s technology makes it possible to detect eye movements, facial expressions, body posture and gestures, bringing insight into people’s behavior, activities and mood.

Using NVIDIA GPU technology, Smart Eye has been able to speed up its cabin-monitoring system — which consists of 10 deep neural networks running in parallel — by more than 10x.

This interior sensing is critical for safety — ensuring driver attention is on the road when it should be and detecting left-behind children or pets — and it customizes and enhances the entire mobility experience for comfort, wellness and entertainment.

Graphical User Interface

AI assistants can communicate relevant information easily and clearly with high-resolution graphical interfaces. Using DRIVE IX, Rightware is creating a seamless visual experience across all cockpit and infotainment domains.

Rightware’s automotive human-machine interface tool Kanzi One helps designers bring amazing real-time 3D graphical user interfaces into the car and hides the underlying operating system and framework complexity. Automakers can completely customize the vehicle’s user interface with Kanzi One, providing a brand-specific signature UI.

Enhancing visual user interfaces by audio features is equally important for interacting with the vehicle. The Audio Weaver development platform from DSP Concepts can be integrated into the DRIVE IX advanced sound engine. It provides a sound design toolchain and audio framework to graphically design features.

Creators and sound artists can design a car- and brand-specific sound experience without the hassle of writing complex code for low-level audio features from scratch.

Designing in Simulation

With NVIDIA DRIVE Sim on Omniverse, developers can integrate, refine and test all these new features in the virtual world before implementing them in vehicles.

Interior sensing companies can build driver- or occupant-monitoring models in the cockpit using the DRIVE Replicator synthetic-data generation tool on DRIVE Sim. Partners providing content for vehicle displays can first develop on a rich, simulated screen configuration.

Access to this virtual vehicle platform can significantly accelerate end-to-end development of intelligent in-vehicle technology.

By combining the flexibility of DRIVE IX with leading in-cabin solutions providers, spending time in the car can become a luxury rather than a chore.

The post Inside AI: NVIDIA DRIVE Ecosystem Creates Pioneering In-Cabin Features With NVIDIA DRIVE IX appeared first on NVIDIA Blog.

View Synthesis with Transformers

Posted by Carlos Esteves and Ameesh Makadia, Research Scientists, Google Research

A long-standing problem in the intersection of computer vision and computer graphics, view synthesis is the task of creating new views of a scene from multiple pictures of that scene. This has received increased attention [1, 2, 3] since the introduction of neural radiance fields (NeRF). The problem is challenging because to accurately synthesize new views of a scene, a model needs to capture many types of information — its detailed 3D structure, materials, and illumination — from a small set of reference images.

In this post, we present recently published deep learning models for view synthesis. In “Light Field Neural Rendering” (LFNR), presented at CVPR 2022, we address the challenge of accurately reproducing view-dependent effects by using transformers that learn to combine reference pixel colors. Then in “Generalizable Patch-Based Neural Rendering” (GPNR), to be presented at ECCV 2022, we address the challenge of generalizing to unseen scenes by using a sequence of transformers with canonicalized positional encoding that can be trained on a set of scenes to synthesize views of new scenes. These models have some unique features. They perform image-based rendering, combining colors and features from the reference images to render novel views. They are purely transformer-based, operating on sets of image patches, and they leverage a 4D light field representation for positional encoding, which helps to model view-dependent effects.

We train deep learning models that are able to produce new views of a scene given a few images of it. These models are particularly effective when handling view-dependent effects like the refractions and translucency on the test tubes. This animation is compressed; see the original-quality renderings here. Source: Lab scene from the NeX/Shiny dataset.

Overview

The input to the models consists of a set of reference images and their camera parameters (focal length, position, and orientation in space), along with the coordinates of the target ray whose color we want to determine. To produce a new image, we start from the camera parameters of the input images, obtain the coordinates of the target rays (each corresponding to a pixel), and query the model for each.

Instead of processing each reference image completely, we look only at the regions that are likely to influence the target pixel. These regions are determined via epipolar geometry, which maps each target pixel to a line on each reference frame. For robustness, we take small regions around a number of points on the epipolar line, resulting in the set of patches that will actually be processed by the model. The transformers then act on this set of patches to obtain the color of the target pixel.

Transformers are especially useful in this setting since their self-attention mechanism naturally takes sets as inputs, and the attention weights themselves can be used to combine reference view colors and features to predict the output pixel colors. These transformers follow the architecture introduced in ViT.

To predict the color of one pixel, the models take a set of patches extracted around the epipolar line of each reference view. Image source: LLFF dataset.

Light Field Neural Rendering

In Light Field Neural Rendering (LFNR), we use a sequence of two transformers to map the set of patches to the target pixel color. The first transformer aggregates information along each epipolar line, and the second along each reference image. We can interpret the first transformer as finding potential correspondences of the target pixel on each reference frame, and the second as reasoning about occlusion and view-dependent effects, which are common challenges of image-based rendering.

LFNR uses a sequence of two transformers to map a set of patches extracted along epipolar lines to the target pixel color.

LFNR improved the state-of-the-art on the most popular view synthesis benchmarks (Blender and Real Forward-Facing scenes from NeRF and Shiny from NeX) with margins as large as 5dB peak signal-to-noise ratio (PSNR). This corresponds to a reduction of the pixel-wise error by a factor of 1.8x. We show qualitative results on challenging scenes from the Shiny dataset below:

LFNR reproduces challenging view-dependent effects like the rainbow and reflections on the CD, reflections, refractions and translucency on the bottles. This animation is compressed; see the original quality renderings here. Source: CD scene from the NeX/Shiny dataset.

Prior methods such as NeX and NeRF fail to reproduce view-dependent effects like the translucency and refractions in the test tubes on the Lab scene from the NeX/Shiny dataset. See also our video of this scene at the top of the post and the original quality outputs here.

Generalizing to New Scenes

One limitation of LFNR is that the first transformer collapses the information along each epipolar line independently for each reference image. This means that it decides which information to preserve based only on the output ray coordinates and patches from each reference image, which works well when training on a single scene (as most neural rendering methods do), but it does not generalize across scenes. Generalizable methods are important because they can be applied to new scenes without needing to retrain.

We overcome this limitation of LFNR in Generalizable Patch-Based Neural Rendering (GPNR). We add a transformer that runs before the other two and exchanges information between points at the same depth over all reference images. For example, this first transformer looks at the columns of the patches from the park bench shown above and can use cues like the flower that appears at corresponding depths in two views, which indicates a potential match. Another key idea of this work is to canonicalize the positional encoding based on the target ray, because to generalize across scenes, it is necessary to represent quantities in relative and not absolute frames of reference. The animation below shows an overview of the model.

GPNR consists of a sequence of three transformers that map a set of patches extracted along epipolar lines to a pixel color. Image patches are mapped via the linear projection layer to initial features (shown as blue and green boxes). Then those features are successively refined and aggregated by the model, resulting in the final feature/color represented by the gray rectangle. Park bench image source: LLFF dataset.

To evaluate the generalization performance, we train GPNR on a set of scenes and test it on new scenes. GPNR improved the state-of-the-art on several benchmarks (following IBRNet and MVSNeRF protocols) by 0.5–1.0 dB on average. On the IBRNet benchmark, GPNR outperforms the baselines while using only 11% of the training scenes. The results below show new views of unseen scenes rendered with no fine-tuning.

GPNR-generated views of held-out scenes, without any fine tuning. This animation is compressed; see the original quality renderings here. Source: IBRNet collected dataset.

Details of GPNR-generated views on held-out scenes from NeX/Shiny (left) and LLFF (right), without any fine tuning. GPNR reproduces more accurately the details on the leaf and the refractions through the lens when compared against IBRNet.

Future Work

One limitation of most neural rendering methods, including ours, is that they require camera poses for each input image. Poses are not easy to obtain and typically come from offline optimization methods that can be slow, limiting possible applications, such as those on mobile devices. Research on jointly learning view synthesis and input poses is a promising future direction. Another limitation of our models is that they are computationally expensive to train. There is an active line of research on faster transformers which might help improve our models’ efficiency. For the papers, more results, and open-source code, you can check out the projects pages for “Light Field Neural Rendering” and “Generalizable Patch-Based Neural Rendering“.

Potential Misuse

In our research, we aim to accurately reproduce an existing scene using images from that scene, so there is little room to generate fake or non-existing scenes. Our models assume static scenes, so synthesizing moving objects, such as people, will not work.

Acknowledgments

All the hard work was done by our amazing intern – Mohammed Suhail – a PhD student at UBC, in collaboration with Carlos Esteves and Ameesh Makadia from Google Research, and Leonid Sigal from UBC. We are thankful to Corinna Cortes for supporting and encouraging this project.

Our work is inspired by NeRF, which sparked the recent interest in view synthesis, and IBRNet, which first considered generalization to new scenes. Our light ray positional encoding is inspired by the seminal paper Light Field Rendering and our use of transformers follow ViT.

Video results are from scenes from LLFF, Shiny, and IBRNet collected datasets.

In-home wireless device tracks disease progression in Parkinson’s patients

Parkinson’s disease is the fastest-growing neurological disease, now affecting more than 10 million people worldwide, yet clinicians still face huge challenges in tracking its severity and progression.

Clinicians typically evaluate patients by testing their motor skills and cognitive functions during clinic visits. These semisubjective measurements are often skewed by outside factors — perhaps a patient is tired after a long drive to the hospital. More than 40 percent of individuals with Parkinson’s are never treated by a neurologist or Parkinson’s specialist, often because they live too far from an urban center or have difficulty traveling.

In an effort to address these problems, researchers from MIT and elsewhere demonstrated an in-home device that can monitor a patient’s movement and gait speed, which can be used to evaluate Parkinson’s severity, the progression of the disease, and the patient’s response to medication.

The device, which is about the size of a Wi-Fi router, gathers data passively using radio signals that reflect off the patient’s body as they move around their home. The patient does not need to wear a gadget or change their behavior. (A recent study, for example, showed that this type of device could be used to detect Parkinson’s from a person’s breathing patterns while sleeping.)

The researchers used these devices to conduct a one-year at-home study with 50 participants. They showed that, by using machine-learning algorithms to analyze the troves of data they passively gathered (more than 200,000 gait speed measurements), a clinician could track Parkinson’s progression and medication response more effectively than they would with periodic, in-clinic evaluations.

“By being able to have a device in the home that can monitor a patient and tell the doctor remotely about the progression of the disease, and the patient’s medication response so they can attend to the patient even if the patient can’t come to the clinic — now they have real, reliable information — that actually goes a long way toward improving equity and access,” says senior author Dina Katabi, the Thuan and Nicole Pham Professor in the Department of Electrical Engineering and Computer Science (EECS), and a principle investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT Jameel Clinic.

The co-lead authors are EECS graduate students Yingcheng Liu and Guo Zhang. The research is published today in Science Translational Medicine.

A human radar

This work utilizes a wireless device previously developed in the Katabi lab that analyzes radio signals that bounce off people’s bodies. It transmits signals that use a tiny fraction of the power of a Wi-Fi router — these super-low-power signals don’t interfere with other wireless devices in the home. While radio signals pass through walls and other solid objects, they are reflected off humans due to the water in our bodies.

This creates a “human radar” that can track the movement of a person in a room. Radio waves always travel at the same speed, so the length of time it takes the signals to reflect back to the device indicates how the person is moving.

The device incorporates a machine-learning classifier that can pick out the precise radio signals reflected off the patient even when there are other people moving around the room. Advanced algorithms use these movement data to compute gait speed — how fast the person is walking.

Because the device operates in the background and runs all day, every day, it can collect a massive amount of data. The researchers wanted to see if they could apply machine learning to these datasets to gain insights about the disease over time.

They gathered 50 participants, 34 of whom had Parkinson’s, and conducted a one-year study of in-home gait measurements Through the study, the researchers collected more than 200,000 individual measurements that they averaged to smooth out variability due to the conditions irrelevant to the disease. (For example, a patient may hurry up to answer an alarm or walk slower when talking on the phone.)

They used statistical methods to analyze the data and found that in-home gait speed can be used to effectively track Parkinson’s progression and severity. For instance, they showed that gait speed declined almost twice as fast for individuals with Parkinson’s, compared to those without.

“Monitoring the patient continuously as they move around the room enabled us to get really good measurements of their gait speed. And with so much data, we were able to perform aggregation that allowed us to see very small differences,” Zhang says.

Better, faster results

Drilling down on these variabilities offered some key insights. For instance, the researchers showed that daily fluctuations in a patient’s walking speed correspond with how they are responding to their medication — walking speed may improve after a dose and then begin to decline after a few hours, as the medication impact wears off.

“This enables us to objectively measure how your mobility responds to your medication. Previously, this was very cumbersome to do because this medication effect could only be measured by having the patient keep a journal,” Liu says.

A clinician could use these data to adjust medication dosage more effectively and accurately. This is especially important since drugs used to treat disease symptoms can cause serious side effects if the patient receives too much.

The researchers were able to demonstrate statistically significant results regarding Parkinson’s progression after studying 50 people for just one year. By contrast, an often-cited study by the Michael J. Fox Foundation involved more than 500 individuals and monitored them for more than five years, Katabi says.

“For a pharmaceutical company or a biotech company trying to develop medicines for this disease, this could greatly reduce the burden and cost and speed up the development of new therapies,” she adds.

Katabi credits much of the study’s success to the dedicated team of scientists and clinicians who worked together to tackle the many difficulties that arose along the way. For one, they began the study before the Covid-19 pandemic, so team members initially visited people’s homes to set up the devices. When that was no longer possible, they developed a user-friendly phone app to remotely help participants as they deployed the device at home.

Through the course of the study, they learned to automate processes and reduce effort, especially for the participants and clinical team.

This knowledge will prove useful as they look to deploy devices in at-home studies of other neurological disorders, such as Alzheimer’s, ALS, and Huntington’s. They also want to explore how these methods could be used, in conjunction with other work from the Katabi lab showing that Parkinson’s can be diagnosed by monitoring breathing, to collect a holistic set of markers that could diagnose the disease early and then be used to track and treat it.

“This radio-wave sensor can enable more care (and research) to migrate from hospitals to the home where it is most desired and needed,” says Ray Dorsey, a professor of neurology at the University of Rochester Medical Center, co-author of Ending Parkinson’s, and a co-author of this research paper. “Its potential is just beginning to be seen. We are moving toward a day where we can diagnose and predict disease at home. In the future, we may even be able to predict and ideally prevent events like falls and heart attacks.”

This work is supported, in part, by the National Institutes of Health and the Michael J. Fox Foundation.

In-home wireless device tracks disease progression in Parkinson’s patients

The co-lead authors are EECS graduate students Yingcheng Liu and Guo Zhang. The research is published today in Science Translational Medicine.

A human radar

Better, faster results

Through the course of the study, they learned to automate processes and reduce effort, especially for the participants and clinical team.

This work is supported, in part, by the National Institutes of Health and the Michael J. Fox Foundation.

Run machine learning enablement events at scale using AWS DeepRacer multi-user account mode

This post was co-written by Marius Cealera, Senior Partner Solutions Architect at AWS, Zdenko Estok, Cloud Architect at Accenture and Sakar Selimcan, Cloud Architect at Accenture.

Machine learning (ML) is a high-stakes business priority, with companies spending $306 billion on ML applications in the past 3 years. According to Accenture, companies that scale ML across a business can achieve nearly triple the return on their investments. But too many companies aren’t achieving the value they expected. Scaling ML effectively for the long term requires the professionalization of the industry and the democratization of ML literacy across the enterprise. This requires more accessible ML training, speaking to a larger number of people with diverse backgrounds.

This post shows how companies can introduce hundreds of employees to ML concepts by easily running AWS DeepRacer events at scale.

Run AWS DeepRacer events at scale

AWS DeepRacer is a simple and fun way to get started with reinforcement learning (RL), an ML technique where an agent, such as a physical or virtual AWS DeepRacer vehicle, discovers the optimal actions to take in a given environment. You can get started with RL quickly with hands-on tutorials that guide you through the basics of training RL models and testing them in an exciting, autonomous car racing experience.

“We found the user-friendly nature of DeepRacer allowed our enablement sessions to reach parts of our organizations that are usually less inclined to participate in AI/ML events,” says Zdenko Estok, a Cloud Architect at Accenture. “Our post-event statistics indicate that up to 75% of all participants to DeepRacer events are new to AI/ML and 50% are new to AWS.”

Until recently, organizations hosting private AWS DeepRacer events had to create and assign AWS accounts to every event participant. This often meant securing and monitoring usage across hundreds or even thousands of AWS accounts. The setup and participant onboarding was cumbersome and time-consuming, often limiting the size of the event. With AWS DeepRacer multi-user account management, event organizers can provide hundreds of participants access to AWS DeepRacer using a single AWS account, simplifying event management and improving the participant experience.

Build a solution around AWS DeepRacer multi-user account management

You can use AWS DeepRacer multi-user account management to set usage quotas on training hours, monitor spending on training and storage, enable and disable training, and view and manage models for every event participant. In addition, when combined with an enterprise identity provider (IdP), AWS DeepRacer multi-user account management provides a quick and frictionless onboarding experience for event participants. The following diagram explains what such a setup looks like.

The solution assumes access to an AWS account.

To set up your account with AWS DeepRacer admin permissions for multi-user, follow the steps in Set up your account with AWS DeepRacer admin permissions for multi-user to attach the AWS Identity and Access Management (IAM) AWS DeepRacer Administrator policy, AWSDeepRacerAccountAdminAccess, to the user, group, or role used to administer the event. Next, navigate to the AWS DeepRacer console and activate multi-user account mode.

By activating multi-user account mode, you enable participants to train models on the AWS DeepRacer console, with all training and storage charges billed to the administrator’s AWS account. By default, a sponsoring account in multi-user mode is limited to 100 concurrent training jobs, 100 concurrent evaluation jobs, 1,000 cars, and 50 private leaderboards, shared among all sponsored profiles. You can increase these limits by contacting Customer Service.

This setup also relies on using an enterprise IdP with AWS IAM Identity Center (Successor to AWS Single Sign-On) enabled. For information on setting up IAM Identity Center with an IdP, see Enable IAM Identity Center and Connect to your external identity provider. Note that different IdPs may require slightly different setup steps. Consult your IdP’s documentation for more details.

The solution depicted here works as follows:

Event participants are directed to a dedicated event portal. This can be a simple webpage where participants can enter their enterprise email address in a basic HTML form and choose Register. Registered participants can use this portal to access the AWS DeepRacer console. You can further personalize this page to gather additional user data (such as the user’s DeepRacer AWS profile or their level of AI and ML knowledge) or to add event marketing and training materials.
The event portal registration form calls a customer API endpoint that stores email addresses in Amazon DynamoDB through AWS AppSync. For more information, refer to Attaching a Data Source for a sample CloudFormation template on setting up AWS AppSync with DynamoDB and calling the API from a browser client.
For every new registration, an Amazon DynamoDB Streams event triggers an AWS Lambda function that calls the IdP’s API (in this case, the Azure Active Directory API) to add the participant’s identity in a dedicated event group that was previously set up with IAM Identity Center. The IAM Identity Center permission set controls the level of access racers have in the AWS account. At a minimum, this permission set should include the AWSDeepRacerDefaultMultiUserAccess managed policy. For more information, refer to Permission sets and AWS DeepRacer managed policies.
If the IdP call is successful, the same Lambda function sends an email notification using Amazon Pinpoint, informing the participant the registration was successful and providing the AWS Management Console access URL generated in IAM Identity Center. For more information, refer to Send email by using the Amazon Pinpoint API.
When racers choose this link, they’re asked to authenticate with their enterprise credentials, unless their current browser session is already authenticated. After authentication, racers are redirected to the AWS DeepRacer console where they can start training AWS DeepRacer models and submit them to virtual races.
Event administrators use the AWS DeepRacer console to create and manage races. Race URLs can be shared with the racers through a Lambda-generated email, either as part as the initial registration flow or as a separate notification. Event administrators can monitor and limit usage directly on the AWS DeepRacer console, including estimated spending and training model hours. Administrators can also pause racer sponsorship and delete models.
Finally, administrators can disable multi-user account mode after the event ends and remove participant access to the AWS account either by removing the users from IAM Identity Center or by disabling the setup in the external IdP.

Conclusion

AWS DeepRacer events are a great way to raise interest and increase ML knowledge across all pillars and levels of an organization. This post explains how you can couple AWS DeepRacer multi-user account mode with IAM Identity Center and an enterprise IdP to run AWS DeepRacer events at scale with minimum administrative effort, while ensuring a great participant experience.

The solution presented in this post was developed and used by Accenture to run the world’s largest private AWS DeepRacer event in 2021, with more than 2,000 racers. By working with the Accenture AWS Business Group (AABG), a strategic collaboration by Accenture and AWS, you can learn from the cultures, resources, technical expertise, and industry knowledge of two leading innovators, helping you accelerate the pace of innovation to deliver disruptive products and services. Connect with our team at accentureaws@amazon.com to engage with a network of specialists steeped in industry knowledge and skilled in strategic AWS services in areas ranging from big data to cloud native to ML.

About the authors

Marius Cealera is a senior partner solutions architect at AWS. He works closely with the Accenture AWS Business Group (AABG) to develop and implement innovative cloud solutions. When not working, he enjoys being with his family, biking and trekking in and around Luxembourg.

Zdenko Estok works as a cloud architect and DevOps engineer at Accenture. He works with AABG to develop and implement innovative cloud solutions, and specializes in Infrastructure as Code and Cloud Security. Zdenko likes to bike to the office and enjoys pleasant walks in nature.

Selimcan “Can” Sakar is a cloud first developer and solution architect at Accenture Germany with focus on emerging technologies such as AI/ML, IoT, and Blockchain. Can suffers from Gear Acquisition Syndrome (aka G.A.S.) and likes to pursuit new instruments, bikes and sim-racing equipment in his free time.

A Lidar for All Seasons

Targeted sentiment use cases

Scenario 1: Financial opinion mining and trading signal generation

Scenario 2: Real-time contact center analysis

Scenario 3: Monitoring social media for customer sentiment

Get started with Targeted Sentiment

Programmatically use Targeted Sentiment

API response structure

Conclusion

About the authors

Intelligent Assistants

Conversational AI Interaction

Gesture Recognition

Graphical User Interface

Designing in Simulation

Overview

Light Field Neural Rendering

Generalizing to New Scenes

Future Work

Potential Misuse

Acknowledgments

Run AWS DeepRacer events at scale

Build a solution around AWS DeepRacer multi-user account management

Conclusion

About the authors

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.