A whale of a tale about responsibility and AI

A couple of years ago, Google AI for Social Good’s Bioacoustics team created an ML model that helps the scientific community detect humpback whale calls in acoustic recordings. This tool, developed in partnership with the National Oceanic and Atmospheric Administration, helps biologists study whale behaviors, patterns, populations and potential human interactions.

We realized other researchers could use this model for their work, too — it could help them better understand the oceans and protect key biodiversity areas. We wanted to share this model freely, but struggled with a dilemma: on one hand, it could help ocean scientists; on the other, we worried about whale poachers and other bad actors. What if they used our shared knowledge in a way we didn’t intend?

We decided to consult experts in the field to help us responsibly open source this machine learning model. We worked with Google’s Responsible Innovation team to apply our AI Principles — a guide to responsibly developing technology — to make a decision.

The team gave us the guidance we needed to open source a machine learning model that could be socially beneficial and was built and tested for safety, while also upholding high standards of scientific excellence for marine biologists and researchers worldwide.

On Earth Day — and every day — putting the AI Principles into practice is important to the communities we serve, on land and in the sea. 

Curious about diving deeper? You can use AI to explore thousands of hours of humpback whale songs and make your own discoveries with our Pattern Radio. You can also see our collaboration with the National Oceanic and Atmospheric Administration of the United States, as well as our work with Fisheries and Oceans Canada (DFO) to apply machine learning to protect killer whales in the Salish Sea.

Read More

Cultivating AI: AgTech Industry Taps NVIDIA GPUs to Protect the Planet

What began as a budding academic movement into farm AI projects has now blossomed into a field of startups creating agriculture technology with a positive social impact for Earth.

Whether it’s the threat to honey bees worldwide from varroa mites, devastation to citrus markets from citrus greening, or contamination of groundwater caused by agrochemicals — AI startups are enlisting NVIDIA GPUs to help solve these problems.

With Earth Day upon us, here’s a look at some of the work of developers, researchers and entrepreneurs who are harnessing NVIDIA GPUs to protect the planet.

The Bee’s Knees: Parasite Prevention 

Bees are under siege from varroa mites, parasites that are destroying their colonies. And saving the world’s honeybee population is about a lot more than just honey: bees have become so scarce that farmers of all kinds now need to rent them to get their crops pollinated.

Beewise, a startup based in Israel, has developed robotic hives that use computer vision to identify and treat infestations. In December, TIME magazine named the Beewise Beehome to its “Best Inventions of 2020” list. Others are using deep learning to better understand hives and explore improved hive designs.

Orange You Glad AI Helps

If it weren’t for AI, that glass of orange juice for breakfast might be a puckery one. A rampant “citrus greening” disease is decimating orchards and souring fruit worldwide. Thankfully, University of Florida researchers are developing computer vision for smart sprayers of agrochemicals, which are now being licensed and deployed in pilot tests by CCI, an agricultural equipment company.

The system can adjust in real time, switching the application of crop protection products or fertilizers on or off and adjusting the amount sprayed based on the plant’s size.

SeeTree, based in Israel, is tackling citrus greening, too. It offers a GPU-driven tree analytics platform of image recognition algorithms, sensors, drones and a data collection app.

The startup uses the NVIDIA Jetson TX2 to process images, with CUDA as the interface for the cameras at orchards. The TX2 enables fruit detection across orchards and provides farms with a yield-estimation tool.

AI Land of Sky Blue Water

Bilberry, located in Paris, develops weed recognition powered by the NVIDIA Jetson edge AI platform for precision application of herbicides. The startup has helped customers reduce the usage of chemicals by as much as 92 percent.

FarmWise, based in San Francisco, offers farmers an AI-driven robotic machine for pulling weeds rather than spraying them, reducing groundwater contamination.

Also, John Deere-owned Blue River offers precision spraying of crops to reduce the usage of agrochemicals harmful to land and water.

And two students from India last year developed Nindamani, an AI-driven, weed-removal robot prototype that took top honors at the AI at the Edge Challenge on Hackster.io.

Milking AI for Dairy Farmers 

AI is going to the cows, too. Advanced Animal Diagnostics, based in Morrisville, North Carolina, offers a portable testing device to predict animal performance and detect infections in cattle before they take hold. Its tests are processed on NVIDIA GPUs in the cloud. The machine can help reduce usage of antibiotics.

Similarly, SomaDetect aims to improve milk production with AI. The Halifax, Nova Scotia, company runs deep learning models on NVIDIA GPUs to analyze milk images.

Photo courtesy of Mark Kelly on Unsplash


Read More

Green for Good: How We’re Supporting Sustainability Efforts in India

When a community embraces sustainability, it can reap multiple benefits: gainful employment for vulnerable populations, more resilient local ecosystems and a cleaner environment.

This Earth Day, we’re announcing our four latest corporate social responsibility investments in India, home to more than 2,700 NVIDIANs. These initiatives are part of our multi-year efforts in the country, which focus on investing in social innovation, job creation and climate action.

Last year, we funded projects that aided migrant workers affected by COVID-19, increased green cover, furthered sustainable waste management processes and improved livelihoods through job creation.

The organizations we’re supporting this year are:

Foundation for Ecological Security

This project will build 10 water-harvesting structures and a dozen systems for diversion-based irrigation, a technique to irrigate farms by redirecting water from rivers or streams. It will benefit around 4,000 individuals from vulnerable migrant households by increasing vegetative cover and the irrigation potential of the land. The foundation will also create community-based initiatives to augment rural household income through the sale of non-timber forest products such as medicinal plants, leaves or honey.

Impact Guru Foundation

We’re supporting the organization Grow-Trees’ efforts to plant local, non-invasive trees in the Dalma Wildlife Sanctuary, home to dozens of endangered Asiatic elephants. Located in the northeastern state of Jharkhand, this project will employ tribal women and other villagers to plant more than 26,000 trees to improve the environment and reinstate elephant migration routes.

Naandi Foundation

Hyderabad-based Naandi is securing sustainable livelihoods for tribal communities by encouraging the organic farming of coffee and other crops. We’re funding this Rockefeller Foundation Award-winning project to transform depleted soil into carbon-rich landscapes, improving plant health for 3,000 acres of coffee farms and boosting coffee quality to a gourmet product that increases the income of thousands of farming families.

Energy Harvest Charitable Trust

Energy Harvest aims to reduce open-field burning by connecting small farmers with machinery owners and straw buyers, paving paths for alternative energy sources using agricultural waste. The initiative — which will use AI and edge devices to identify farm fires and track local emission levels — will create dozens of employment opportunities, benefit more than 100 farmers and improve air quality by saving hundreds of acres from burning.

NVIDIA has previously funded projects in India that provided education programs for underprivileged youth, taught computer skills to young women, supported people with disabilities and opened 50 community libraries in remote areas. Many of these initiatives have centered on communities near our three offices: Bangalore, Hyderabad and Pune.

Learn more about corporate social responsibility at NVIDIA.


Read More

GFN Thursday Drops the Hammer with ‘Vermintide 2 Chaos Wastes’ Free Expansion, ‘Immortals Fenyx Rising The Lost Gods’ DLC

GFN Thursday is our ongoing commitment to bringing great PC games and service updates to our members each week. Every Thursday, we share updates on what’s new in the cloud — games, exclusive features, and news on GeForce NOW.

This week, that includes the latest updates for two popular games: Fatshark’s free Warhammer: Vermintide 2 Chaos Wastes expansion and The Lost Gods DLC for Ubisoft’s Immortals Fenyx Rising.

Since GeForce NOW is streaming the PC version of these games, members receive the full experience — with expansion and DLC support — and can play with and against millions of other PC players.

The GeForce NOW library also grows by 15 games this week, with game releases from NewCore Games, Ubisoft and THQ.

A High-Stakes Adventure

GeForce NOW members can explore the Chaos Wastes with their fellow heroes in Warhammer: Vermintide 2’s new rogue-lite-inspired game mode. Teams of up to four are built from the ground up, working together on tactics while preparing for the unexpected. As the team progresses, the rewards grow greater. Failure is not an option.

Explore the Chaos Wastes together with your fellow heroes in Warhammer: Vermintide 2’s new rogue-lite inspired game mode.

Discover 15 new locations in the free expansion and prepare for an extra challenge as cursed areas — with changing landscapes influenced by the current ruler — bring a more sinister threat.

Warhammer: Vermintide 2 Chaos Wastes is streaming now on GeForce NOW.

God’s Eye View of the Lost Gods

Immortals Fenyx Rising – The Lost Gods, the third narrative DLC, launched today and is streaming now on GeForce NOW. Unfolding entirely from an overhead, god’s-eye perspective, the adventure centers on Ash, a new mortal champion, following a series of catastrophic disasters.

Meet a new hero, Ash, embarking on an epic journey to reunite the Greek gods in Immortals Fenyx Rising’s newest DLC.

Ash’s mission is to travel to a new land, the Pyrite Island, to find and reunite the gods who left Olympos in a huff after a falling-out with Zeus. These “lost gods,” including Poseidon and Hades, will all need to be convinced to return to the Pantheon and restore balance to the world. Naturally, there are plenty of monsters standing between them and Ash, which players can dispatch using a new, brawler-inspired combat system.

Get Your Game On

It’s a busy GFN Thursday this week with 15 games joining the GeForce NOW library today.

Turnip Boy Commits Tax Evasion, releasing day-and-date with the Steam launch, is one of 15 new games this GFN Thursday.

Turnip Boy Commits Tax Evasion (Steam)

Launching on Steam today, play as an adorable yet trouble-making turnip. Avoid paying taxes, solve plantastic puzzles, harvest crops and battle massive beasts all in a journey to tear down a corrupt vegetable government!

Warhammer 40,000: Inquisitor – Martyr (Steam)

Enter the Chaos-infested Caligari Sector and purge the unclean with the most powerful agents of the Imperium of Man! Warhammer 40,000: Inquisitor – Martyr is a grim, action-RPG featuring multiple classes of the Inquisition who will carry out the Emperor’s will.

Anno 2070 (Steam) and Anno 2205 (Steam)

Two games from Ubisoft’s long-running city-building franchise, Anno, release on GeForce NOW today. Anno 2070 offers a new world full of challenges, where you’ll need to master resources, diplomacy and trade in the most comprehensive economic management system in the Anno series.

In Anno 2205, you join humankind’s next step into the future with the promise to build a better tomorrow. You conquer Earth, establishing rich, bustling cities and grand industrial complexes, but to secure the prosperity of your people, you must travel into space.

In addition, members can look for the following:

What are you playing? Let us know on Twitter using #GFNThursday, or in the comments below.


Read More

Mooning Over Selene: NVIDIA’s Julie Bernauer Talks Setting Up One of World’s Fastest Supercomputers

Though admittedly prone to breaking kitchen appliances like ovens and microwaves, Julie Bernauer — senior solutions architect for machine learning and deep learning at NVIDIA — led the small team that successfully built Selene, the world’s fifth-fastest supercomputer.

Adding to an already impressive feat, Bernauer’s team brought up Selene as the world went into lockdown in early 2020. Using skeleton crews, social distancing protocols and remote cable validation, they achieved in a few weeks what typically takes months with a larger install team.

 

Bernauer told NVIDIA AI Podcast host Noah Kravitz that the goal in creating Selene was primarily to support NVIDIA’s researchers. Referencing her time as a doctoral student, Bernauer explained how researchers are often kept from working on larger models by the expense and infrastructure involved.

With Selene, the infrastructure is modular: it can be scaled up or down depending on what users require, and it allows different types of research to be performed simultaneously. Bernauer said that Selene is currently proving most useful for autonomous vehicle and language modeling research.

Going forward, Bernauer envisions some of the power and efficiency of systems like Selene becoming more available on widely accessible devices, such as laptops or edge products such as cars.

Key Points From This Episode:

  • Selene’s unique, pandemic-safe installation is further explained in an NVIDIA blog detailing the specific efforts of Bernauer’s team and the lessons learned from past NVIDIA supercomputers such as SATURNV and Circe.
  • Bernauer joined NVIDIA in 2015, after spending 15 years in academia. She obtained her Ph.D. in structural genomics from Université Paris-Sud, after which she studied with Nobel Prize winner Michael Levitt at Stanford.

Tweetables:

“[Selene] is an infrastructure that people can share, where we can do different types of research at a time” — Julie Bernauer [8:30]

“We did [Selene] for ourselves, but we also did it … to figure out how to make a product better by going through the experience” — Julie Bernauer [13:27]

You Might Also Like:

NVIDIA’s Marc Hamilton on Building the Cambridge-1 Supercomputer During a Pandemic

Marc Hamilton, vice president of solutions architecture and engineering at NVIDIA, speaks about overseeing the construction of the U.K.’s most powerful supercomputer, Cambridge-1. Built on the NVIDIA DGX SuperPOD architecture, the system will be used by AstraZeneca, GSK, Oxford Nanopore and more.

Hugging Face’s Sam Shleifer Talks Natural Language Processing

Hugging Face is more than just an adorable emoji — it’s a company that’s demystifying AI by transforming the latest developments in deep learning into usable code. Research engineer Sam Shleifer talks about the company’s NLP technology, which is used at over 1,000 companies.

NVIDIA’s Bryan Catanzaro on the Latest from NVIDIA Research

Bryan Catanzaro, vice president of applied deep learning research at NVIDIA, walks through some of the latest developments at NVIDIA research … as well as shares a story involving Andrew Ng and cats.


Read More

How we’re minimizing AI’s carbon footprint


The book that led to my visit to Google.

When I first visited Google back in 2002, I was a computer science professor at UC Berkeley. My colleague John Hennessy and I were updating our textbook on computer architecture, and Larry Page — who rode a hot-rodded electric scooter at the time — agreed to show me how his then three-year-old company designed its computing for Search. I remember the setup was lean yet powerful: just 6,000 low-cost PC servers and 12,000 PC disks answering 70 million queries around the world, every day. It was my first real look at how Google built its computer systems from the ground up, optimizing for efficiency at every level.

When I joined the company in 2016, it was with the goal of helping research how to maximize the efficiency of computer systems built specifically for artificial intelligence. Last year, Google set an ambitious goal of operating on 24/7 carbon-free energy, everywhere, by the end of the decade. But at the same time, machine learning systems are quickly becoming larger and more capable. What will be the environmental impact of those systems — and how can we neutralize that impact going forward? 

Today, we’re publishing a detailed analysis that addresses both of those questions. It’s an account of the energy and carbon costs of training six state-of-the-art ML models, including five of our own. (Training a model is like building infrastructure: You spend the energy to train the model once, after which it’s used and reused many times, possibly by hundreds of millions of people.) To our knowledge, it’s the most thorough evaluation of its kind yet published. And while we had reason to believe our systems were efficient, we were encouraged by just how efficient they turned out to be.

For instance, we found that developing the Evolved Transformer model, a more efficient version of the popular Transformer architecture for ML, emitted nearly 100 times less carbon dioxide equivalent than a widely cited estimate. Of the roughly 12.7 terawatt-hours of electricity that Google uses every year, less than 1/200th of a percent was spent training our most computationally demanding models.

What’s more, our analysis found that there already exist many ways to develop and train ML systems even more efficiently: Specially designed models, processors and data centers can dramatically reduce energy requirements, while the right selection of energy sources can go a long way to reduce the carbon that’s emitted during training. In fact, the right combination of model, processor, data center and energy source can reduce the carbon footprint of training an ML system by 1000 times. 

There’s no one easy trick for achieving a reduction that large, so let’s unpack that figure. Minimizing a system’s carbon footprint is a two-part problem: first, you minimize the energy the system consumes; then, you supply that energy from the cleanest source possible.

Our analysis took a closer look at GShard and Switch Transformer, two models recently developed at Google Research. They’re the largest models we’ve ever created, but they both use a technique called sparse activation that enables them to only use a small fraction of their total architecture for a given task. It’s a bit like how your brain uses a small fraction of its 100 billion neurons to help you read this sentence. The result is that these sparse models consume less than one tenth the energy that you’d expect of similarly sized dense models — without sacrificing accuracy.
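To make the idea concrete, here is a minimal numpy sketch of sparse activation in the style of mixture-of-experts routing, the family of techniques that GShard and Switch Transformer build on. The function names, shapes and top-k gating scheme are illustrative assumptions rather than Google's implementation; the point is simply that each token touches only k of the experts, so compute (and therefore energy) scales with k rather than with the model's full parameter count.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_expert_layer(tokens, gate_w, expert_ws, k=1):
    """Route each token to its top-k experts; the other experts stay idle.

    tokens:    (n_tokens, d_model) activations
    gate_w:    (d_model, n_experts) gating weights
    expert_ws: list of (d_model, d_model) expert weight matrices

    Only k of len(expert_ws) experts run per token, so compute (and energy)
    grows with k, not with the total parameter count.
    """
    scores = softmax(tokens @ gate_w)             # (n_tokens, n_experts)
    top_k = np.argsort(-scores, axis=-1)[:, :k]   # chosen experts per token
    out = np.zeros_like(tokens)
    for e, w in enumerate(expert_ws):
        routed = (top_k == e).any(axis=-1)        # tokens sent to expert e
        if routed.any():
            out[routed] += (tokens[routed] @ w) * scores[routed, e:e + 1]
    return out

rng = np.random.default_rng(0)
d_model, n_experts = 64, 8
x = rng.standard_normal((16, d_model))
y = sparse_expert_layer(
    x,
    rng.standard_normal((d_model, n_experts)),
    [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)],
    k=1,
)
print(y.shape)  # (16, 64): each token activated only 1 of the 8 experts
```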

But to minimize ML’s energy use, you need more than just efficient models — you also need efficient processors and data centers to train and serve them. Google’s Tensor Processing Units (TPUs) are specifically designed for machine learning, which makes them up to five times more efficient than off-the-shelf processors. And the cloud computing data centers that house those TPUs are up to twice as efficient as typical enterprise data centers. 

Once you’ve minimized your energy requirements, you have to think about where that energy originates. The electricity a data center consumes is determined by the grid where it’s located. And depending on what resources were used to generate the electricity on that grid, this may emit carbon. 

The carbon intensity of grids varies greatly across regions, so it really matters where models are trained. For instance, the mix of energy supplying Google’s Iowa data center produces 0.080 kg of CO2e per kilowatt-hour of electricity, when combining the electricity supplied by the grid and produced by Google’s wind farms in Iowa. That’s 5.4 times less than the U.S. average.
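As a back-of-the-envelope illustration of that two-part calculation (emissions equal energy consumed times grid carbon intensity), here is a short sketch. The 10 MWh training run is a made-up figure purely for illustration; the two intensity values follow from the numbers quoted above (the Iowa mix, and 5.4 times that for the implied U.S. average).

```python
# Back-of-the-envelope version of the two-part calculation described above:
# emissions = energy consumed * grid carbon intensity.
IOWA_KG_CO2E_PER_KWH = 0.080                          # figure quoted in this post
US_AVG_KG_CO2E_PER_KWH = IOWA_KG_CO2E_PER_KWH * 5.4   # ~0.43, implied U.S. average

def training_emissions_kg(energy_kwh, grid_intensity_kg_per_kwh):
    """Carbon footprint of a training run on a given grid."""
    return energy_kwh * grid_intensity_kg_per_kwh

energy_kwh = 10_000  # hypothetical 10 MWh training run, for illustration only
print(training_emissions_kg(energy_kwh, IOWA_KG_CO2E_PER_KWH))    # 800.0 kg CO2e
print(training_emissions_kg(energy_kwh, US_AVG_KG_CO2E_PER_KWH))  # 4320.0 kg CO2e
```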

Any one of these four factors — models, chips, data centers and energy sources — can have a sizable effect on the costs associated with developing an ML system. But their cumulative impact can be enormous.
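And because the four levers multiply rather than add, a very large combined reduction follows from improvements that each look modest on their own. The rough sketch below simply multiplies the upper-bound figures quoted earlier in this post; it is not a measurement of any single training run, and the result lands on the same order of magnitude as the combined reduction cited above.

```python
# Rough, illustrative multiplication of the four levers, using the
# upper-bound improvements quoted earlier in this post.
sparse_vs_dense_model = 10     # sparse models: less than 1/10th the energy of dense models
tpu_vs_generic_chip   = 5      # TPUs: up to 5x more efficient than off-the-shelf processors
cloud_vs_typical_dc   = 2      # cloud data centers: up to 2x as efficient as enterprise DCs
clean_vs_average_grid = 5.4    # Iowa energy mix vs. the U.S. average carbon intensity

combined = (sparse_vs_dense_model * tpu_vs_generic_chip *
            cloud_vs_typical_dc * clean_vs_average_grid)
print(combined)  # 540.0: the same order of magnitude as the ~1000x figure above
```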

When John and I updated our textbook with what we’d learned on our visit to Google back in 2002, we wrote that “reducing the power per PC [server]” presented “a major opportunity for the future.” Nearly 20 years later, Google has found many opportunities to streamline its systems — but plenty remain to be seized. As a result of our analysis, we’ve already begun shifting where we train our computationally intensive ML models. We’re optimizing data center efficiency by shifting compute tasks to times when low-carbon power sources are most plentiful. Our Oklahoma data center, in addition to receiving its energy from cleaner sources, will house many of our next generation of TPUs, which are even more efficient than their predecessors. And sparse activation is just one example of the algorithmic ingenuity Google is using to design ML models that work smarter, not harder.

Read More

Database researchers discuss research award opportunity in next-generation data infrastructure

On April 19, 2021, Facebook launched a request for proposals (RFP) on next-generation data infrastructure. With this RFP, which closes on June 2, 2021, the Facebook Core Data and Data Infra teams hope to deepen their ties to the academic research community by seeking out innovative solutions to the challenges that remain in data management. To provide an inside look from the teams behind the RFP, we reached out to Stavros Harizopoulos and Shrikanth Shankar, who are leading the effort within their respective teams.

Shankar is a Director of Engineering on the Core Data team, which builds and supports the online data serving stack for Facebook, providing the databases, caches, and worldwide distribution that power Facebook, Instagram, WhatsApp, and more. Harizopoulos is a Software Engineer within Data Infrastructure, which delivers efficient platforms and end-user tools for the collection, management, and analysis of Facebook data. In this Q&A, Shankar and Harizopoulos contextualize the RFP by providing more background on database research at Facebook. They also discuss what inspired this RFP and where people can stay updated about what their teams are up to.

Q: What does database research look like at Facebook, and how has it evolved over the years?

A: Facebook has had a long history of making contributions to the database space — Hive, Presto, RocksDB, and MyRocks all being examples of innovative work that started within the company. The scale we run at and the unique constraints of our workloads make many existing solutions infeasible and provide a perspective that leads to new ideas. This has become increasingly true over the years as the company has grown and new challenges associated with this scale have shown up. We aspire to continue our tradition of building new, innovative database technologies.

Q: What’s the goal of this RFP?

A: As businesses and organizations become increasingly data driven and products and services are further built around intelligence derived from data, the need for highly reliable, flexible, and efficient data infrastructure becomes even more important. Modern data infrastructure architectures inherit from decades of database research, but recent trends and developments, such as the decoupling of compute and storage and the need to operate efficiently at global scale, as well as the emergence of new use cases such as data science and machine learning workloads, pose new challenges and opportunities.

With this RFP, we seek out innovative approaches to a number of problems that have the potential to set the defining characteristics of next-generation data infrastructure. Many of these problems are not unique to Facebook, and we are keen to learn about the great research done in this area as well as to strengthen our relations with academia.

Q: How does this RFP fit into the bigger picture for database research at Facebook?

A: Defining the underpinnings of data infrastructure that is reliable, resilient, flexible, efficient, and performant at global scale is at the core of database research at Facebook. Our research efforts, however, extend to several directions along modeling, managing, and visualizing different types of data, ranging from structured data to machine-generated logs and time-series data. We innovate in areas such as data storage and indexing, query processing, data modeling, transaction processing, and distributed systems, as well as novel approaches to privacy and security in data management.

Q: What inspired this RFP?

A: While we share our experiences by writing and publishing papers, and in turn benefit from all the innovation in the database space, we’ve seen a couple of ways we could make this exchange better. Concretely, we’ve seen that certain areas may not be perceived externally as being impactful or important even when they are critical for us. On our side, we recognize that the solutions we have in place or are considering may be limited by our specific systems and the history behind them. We began this RFP process as a way to collaborate with academia by highlighting specific problems and looking for innovative approaches that tackle these issues.

Q: Where can people stay updated and learn more?

A: We actively participate each year in major database conferences, such as ICDE, SIGMOD, VLDB, and CIDR. This is where the academic community can reach out to us with questions and ideas. We also contribute a lot of our work through open source. Here are some examples:

Applications for the Next-Generation Data Infrastructure RFP close on June 2, 2021, and winners will be announced the following month. To receive updates about new research award opportunities and deadline notifications, subscribe to our RFP email list.


Read More

MaX-DeepLab: Dual-Path Transformers for End-to-End Panoptic Segmentation

Posted by Huiyu Wang, Student Researcher and Liang-Chieh Chen, Research Scientist, Google Research

Panoptic segmentation is a computer vision task that unifies semantic segmentation (assigning a class label to each pixel) and instance segmentation (detecting and segmenting each object instance). A core task for real-world applications, panoptic segmentation predicts a set of non-overlapping masks along with their corresponding class labels (i.e., category of object, like “car”, “traffic light”, “road”, etc.) and is generally accomplished using multiple surrogate sub-tasks that approximate (e.g., by using box detection methods) the goals of panoptic segmentation.

An example image and its panoptic segmentation masks from the Cityscapes dataset.
Previous methods approximate panoptic segmentation with a tree of surrogate sub-tasks.

Each surrogate sub-task in this proxy tree introduces extra manually-designed modules, such as anchor design rules, box assignment rules, non-maximum suppression (NMS), thing-stuff merging, etc. Although there are good solutions to individual surrogate sub-tasks and modules, undesired artifacts are introduced when these sub-tasks come together in a pipeline for panoptic segmentation, especially in challenging conditions (e.g., two people with similar bounding boxes will trigger NMS, resulting in a missing mask).

Previous efforts, such as DETR, attempted to solve some of these issues by simplifying the box detection sub-task into an end-to-end operation, which is more computationally efficient and results in fewer undesired artifacts. However, the training process still relies heavily on box detection, which does not align with the mask-based definition of panoptic segmentation. Another line of work completely removes boxes from the pipeline, which has the benefit of removing an entire surrogate sub-task along with its associated modules and artifacts. For example, Axial-DeepLab predicts pixel-wise offsets to predefined instance centers, but the surrogate sub-task it uses encounters challenges with highly deformable objects, which have a large variety of shapes (e.g., a cat), or nearby objects with close centers in the image plane, e.g. the image below of a dog seated in a chair.

When the centers of the dog and the chair are close to each other, Axial-DeepLab merges them into one object.

In “MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers”, to be presented at CVPR 2021, we propose the first fully end-to-end approach for the panoptic segmentation pipeline, directly predicting class-labeled masks by extending the Transformer architecture to this computer vision task. Dubbed MaX-DeepLab for extending Axial-DeepLab with a Mask Xformer, our method employs a dual-path architecture that introduces a global memory path, allowing for direct communication with any convolution layers. As a result, MaX-DeepLab shows a significant 7.1% panoptic quality (PQ) gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. MaX-DeepLab achieves the state-of-the-art 51.3% PQ on COCO test-dev set, without test time augmentation.

MaX-DeepLab is fully end-to-end: It predicts panoptic segmentation masks directly from images.

End-to-End Panoptic Segmentation
Inspired by DETR, our model directly predicts a set of non-overlapping masks and their corresponding semantic labels, with output masks and classes optimized with a PQ-style objective. Specifically, PQ is defined as the recognition quality (whether or not the predicted class is correct) times the segmentation quality (whether the predicted mask is correct), and we define a similarity metric between two class-labeled masks in exactly the same way. The model is trained directly by maximizing this similarity between ground truth masks and predicted masks via one-to-one matching. This direct modeling of panoptic segmentation enables end-to-end training and inference, removing the hand-coded priors that are necessary in existing box-based and box-free methods.

MaX-DeepLab directly predicts N masks and N classes with a CNN and a mask transformer.
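To give a feel for the PQ-style objective, below is a minimal numpy sketch of one plausible form of the similarity between a predicted class-labeled mask and a ground-truth one: the predicted probability of the correct class (recognition quality) times a soft Dice overlap between the masks (segmentation quality). The exact loss and the one-to-one matching details are in the paper; the helper names here are illustrative.

```python
import numpy as np

def soft_dice(pred_mask, gt_mask, eps=1e-6):
    """Soft Dice overlap between a predicted mask (probabilities in [0, 1])
    and a binary ground-truth mask, both flattened to 1-D arrays."""
    intersection = (pred_mask * gt_mask).sum()
    return (2 * intersection + eps) / (pred_mask.sum() + gt_mask.sum() + eps)

def pq_style_similarity(pred_class_probs, pred_mask, gt_class, gt_mask):
    """Recognition quality (probability assigned to the correct class) times
    segmentation quality (mask overlap), mirroring PQ = RQ x SQ."""
    recognition = pred_class_probs[gt_class]
    segmentation = soft_dice(pred_mask, gt_mask)
    return recognition * segmentation

# Toy example: a 4-pixel image with one ground-truth mask of class index 1.
class_probs = np.array([0.1, 0.8, 0.1])        # predicted class distribution
pred_mask   = np.array([0.9, 0.8, 0.1, 0.0])   # predicted soft mask
gt_mask     = np.array([1.0, 1.0, 0.0, 0.0])   # ground-truth binary mask
print(pq_style_similarity(class_probs, pred_mask, gt_class=1, gt_mask=gt_mask))
# ~0.72: the score is high only when both the class and the mask are right
```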

Dual-Path Transformer
Instead of stacking a traditional transformer on top of a convolutional neural network (CNN), we propose a dual-path framework for combining CNNs with transformers. Specifically, we enable any CNN layer to read and write to global memory by using a dual-path transformer block. This proposed block adopts all four types of attention between the CNN-path and the memory-path, and can be inserted anywhere in a CNN, enabling communication with the global memory at any layer. MaX-DeepLab also employs a stacked-hourglass-style decoder that aggregates multi-scale features into a high-resolution output. The output is then multiplied with the global memory features to form the mask set prediction. The classes for the masks are predicted with another branch of the mask transformer.

An overview of the dual-path transformer architecture.
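As a rough illustration of the dual-path idea, the numpy sketch below keeps only the two cross-path attention directions (pixels reading from the global memory, and memory slots reading from the feature map) and omits the pixel-to-pixel and memory-to-memory attention, projections and normalization of the real block. All names and shapes are illustrative assumptions, not the published implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    """Plain scaled dot-product attention."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))
    return weights @ values

def dual_path_block(pixel_feats, memory_feats):
    """One simplified cross-path exchange.

    pixel_feats:  (H*W, d) CNN-path features (flattened feature map)
    memory_feats: (N, d)   global memory, one slot per predicted mask
    """
    # pixel-to-memory: every pixel reads from the global memory
    new_pixels = pixel_feats + attend(pixel_feats, memory_feats, memory_feats)
    # memory-to-pixel: every memory slot reads from the feature map
    new_memory = memory_feats + attend(memory_feats, pixel_feats, pixel_feats)
    return new_pixels, new_memory

rng = np.random.default_rng(0)
H, W, d, N = 8, 8, 32, 10          # N memory slots -> up to N predicted masks
pixels, memory = dual_path_block(rng.standard_normal((H * W, d)),
                                 rng.standard_normal((N, d)))

# Mask prediction: multiply the decoder output with the memory features,
# giving one mask logit map per memory slot, as described above.
mask_logits = pixels @ memory.T    # (H*W, N)
print(mask_logits.shape)
```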

Results
We evaluate MaX-DeepLab on one of the most challenging panoptic segmentation datasets, COCO, against both of the state-of-the-art box-free (Axial-DeepLab) and box-based (DetectoRS) methods. MaX-DeepLab, without test time augmentation, achieves the state-of-the-art result of 51.3% PQ on the test-dev set.

Comparison on COCO test-dev set.

This result surpasses Axial-DeepLab by 7.1% PQ in the box-free regime and DetectoRS by 1.7% PQ, bridging the gap between box-based and box-free methods for the first time. For a consistent comparison with DETR, we also evaluated a lightweight version of MaX-DeepLab that matches the number of parameters and computations of DETR. The lightweight MaX-DeepLab outperforms DETR by 3.3% PQ on the val set and 3.0% PQ on the test-dev set. In addition, we performed extensive ablation studies and analyses on our end-to-end formulation, model scaling, dual-path architectures, and loss functions. We also found that MaX-DeepLab does not require DETR's extra-long training schedule.

As an example in the figure below, MaX-DeepLab correctly segments a dog sitting on a chair. Axial-DeepLab relies on a surrogate sub-task of regressing object center offsets. It fails because the centers of the dog and the chair are close to each other. DetectoRS classifies object bounding boxes, instead of masks, as a surrogate sub-task. It filters out the chair mask because the chair bounding box has a low confidence.

A case study for MaX-DeepLab and state-of-the-art box-free and box-based methods.

Another example shows how MaX-DeepLab correctly segments images with challenging conditions.

MaX-DeepLab correctly segments the overlapping zebras. This case is also challenging for other methods since the zebras have similar bounding boxes and nearby object centers. (credit & license)

Conclusion
We have shown for the first time that panoptic segmentation can be trained end-to-end. MaX-DeepLab directly predicts masks and classes with a mask transformer, removing the need for many hand-designed priors such as object bounding boxes, thing-stuff merging, etc. Equipped with a PQ-style loss and a dual-path transformer, MaX-DeepLab achieves the state-of-the-art result on the challenging COCO dataset, closing the gap between box-based and box-free methods.

Acknowledgements
We are thankful to our co-authors, Yukun Zhu, Hartwig Adam, and Alan Yuille. We also thank Maxwell Collins, Sergey Ioffe, Jiquan Ngiam, Siyuan Qiao, Chen Wei, Jieneng Chen, and the Mobile Vision team for the support and valuable discussions.

Read More

The Future’s So Bright: NVIDIA DRIVE Shines at Auto Shanghai

NVIDIA DRIVE-powered cars electrified the atmosphere this week at Auto Shanghai.

The global auto show is the oldest in China and has become the stage to debut the latest vehicles. And this year, automakers, suppliers and startups developing on NVIDIA DRIVE brought a new energy to the event with a wave of intelligent electric vehicles and self-driving systems.

The automotive industry is transforming into a technology industry — next-generation lineups will be completely programmable and connected to a network, supported by software engineers who will invent new software and services for the life of the car.

Just as the battery capacity of an electric vehicle provides miles of range, the computing capacity of these new vehicles will give years of new delight.

EVs for Everyday

Automakers have been introducing electric vehicle technology with one or two specialized models. Now, these lineups are becoming diversified, with an EV for every taste.

The all-new Mercedes-Benz EQB.

Joining the recently launched EQS flagship sedan and EQA SUV on the showfloor, the Mercedes-Benz EQB adds a new flavor to the all-electric EQ family. The compact SUV brings smart electromobility in a family size, with seven seats and AI features.

The latest generation MBUX AI cockpit, featured in the Mercedes-Benz EQB.

Like its EQA sibling, the EQB features the latest generation MBUX AI cockpit, powered by NVIDIA DRIVE. The high-performance system includes an augmented reality head-up display, AI voice assistant and rich interactive graphics to enable the driver to enjoy personalized, intelligent features.

EV maker Xpeng is bringing its new energy technology to the masses with the P5 sedan. It joins the P7 sports sedan in offering intelligent mobility with NVIDIA DRIVE.

The Xpeng P5.

The P5 will be the first to bring Xpeng’s Navigation Guided Pilot (NGP) capabilities to public roads. The automated driving system leverages the automaker’s full-stack XPILOT 3.5, powered by NVIDIA DRIVE AGX Xavier. The new architecture processes data from 32 sensors — including two lidars, 12 ultrasonic sensors, five millimeter-wave radars and 13 high-definition cameras — integrated into 360-degree dual-perception fusion to handle challenging and complex road conditions.

Also making its auto show debut was the NIO ET7, which was first unveiled during a company event in January. The ET7 is the first vehicle that features NIO’s Adam supercomputer, which leverages four NVIDIA DRIVE Orin processors to achieve more than 1,000 trillion operations per second (TOPS).

The NIO ET7.

The flagship vehicle leapfrogs current model capabilities, with more than 600 miles of battery range and advanced autonomous driving. With Adam, the ET7 can perform point-to-point autonomy, using 33 sensors and high-performance compute to continuously expand the domains in which it operates — from urban to highway driving to battery swap stations.

Elsewhere on the show floor, SAIC’s R Auto exhibited the intelligent ES33. This smart, futuristic vehicle, equipped with R-Tech, leverages the high performance of NVIDIA DRIVE Orin to deliver automated driving features for a safer, more convenient ride.

The R-Auto ES33.

SAIC- and Alibaba-backed IM Motors — which stands for intelligence in motion — also made its auto show debut with the electric L7 sedan and SUV, powered by NVIDIA DRIVE. These first two vehicles will have autonomous parking and other automated driving features, as well as a 93kWh battery that comes standard.

The IM Motors L7.

Improving Intelligence

In addition to automaker reveals, suppliers and self-driving startups showcased their latest technology built on NVIDIA DRIVE.

The scalable ZF ProAI Supercomputer.

Global supplier ZF continued to push the bounds of autonomous driving performance with the latest iteration of its ProAI Supercomputer. With NVIDIA DRIVE Orin at its core, the scalable autonomous driving compute platform supports systems with level 2 capabilities all the way to full self-driving, with up to 1,000 TOPS of performance.

A Momenta test vehicle with MPilot automated driving system.

Autonomous driving startup Momenta demonstrated the newest capabilities of MPilot, its autopilot and valet parking system. The software, designed for mass-production vehicles, leverages DRIVE Orin to enhance production efficiency and streamline time to market.

From advanced self-driving systems to smart, electric vehicles of all sizes, the NVIDIA DRIVE ecosystem stole the show this week at Auto Shanghai.


Read More