Learning to Route by Task for Efficient Inference

Scaling large language models has resulted in significant quality improvements in natural language understanding (T5), generation (GPT-3), and multilingual neural machine translation (M4). One common approach to building a larger model is to increase the depth (number of layers) and width (layer dimensionality), simply enlarging existing dimensions of the network. Such dense models take an input sequence (divided into smaller components, called tokens) and pass every token through the full network, activating every layer and parameter. While these large, dense models have achieved state-of-the-art results on multiple natural language processing (NLP) tasks, their training cost increases linearly with model size.

An alternative, and increasingly popular, approach is to build sparsely activated models based on a mixture of experts (MoE) (e.g., GShard-M4 or GLaM), where each token passed to the network follows a separate subnetwork by skipping some of the model parameters. The choice of how to distribute the input tokens to each subnetwork (the “experts”) is determined by small router networks that are trained together with the rest of the network. This allows researchers to increase model size (and hence, performance) without a proportional increase in training cost.

While this is an effective strategy at training time, sending the tokens of a long sequence to multiple experts makes inference computationally expensive, because the experts have to be distributed among a large number of accelerators. For example, serving the 1.2T-parameter GLaM model requires 256 TPU-v3 chips. Much like dense models, the number of processors needed to serve an MoE model still scales linearly with respect to the model size, increasing compute requirements while also resulting in significant communication overhead and added engineering complexity.

In “Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference”, we introduce a method called Task-level Mixture-of-Experts (TaskMoE) that takes advantage of the quality gains of model scaling while still being efficient to serve. Our solution is to train a large multi-task model from which we then extract smaller, stand-alone per-task subnetworks suitable for inference with no loss in model quality and with significantly reduced inference latency. We demonstrate the effectiveness of this method for multilingual neural machine translation (NMT) compared to other mixture-of-experts models and to models compressed using knowledge distillation.

Training Large Sparsely Activated Models with Task Information
We train a sparsely activated model, where router networks learn to send tokens of each task-specific input to different subnetworks of the model associated with the task of interest. For example, in the case of multilingual NMT, every token of a given language is routed to the same subnetwork. This differs from other recent approaches, such as the sparsely gated mixture of expert models (e.g., TokenMoE), where router networks learn to send different tokens in an input to different subnetworks independent of task.
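To make the routing concrete, here is a minimal sketch of task-level routing in plain NumPy. The sizes, stand-in weights and function names below are illustrative assumptions rather than the paper's implementation; the point is only that the router scores experts from the task identity, so every token of a task shares the same small set of experts.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, D_MODEL, NUM_TASKS = 32, 2, 64, 30  # toy sizes

# Stand-ins for learned parameters: one feed-forward "expert" per slot,
# a task embedding table, and a router that maps a task embedding to
# one logit per expert.
expert_weights = rng.normal(size=(NUM_EXPERTS, D_MODEL, D_MODEL)) * 0.02
task_embeddings = rng.normal(size=(NUM_TASKS, D_MODEL))
router_weights = rng.normal(size=(D_MODEL, NUM_EXPERTS))

def experts_for_task(task_id: int) -> np.ndarray:
    """All tokens of a task share the same TOP_K experts."""
    logits = task_embeddings[task_id] @ router_weights
    return np.argsort(logits)[-TOP_K:]

def task_moe_layer(tokens: np.ndarray, task_id: int) -> np.ndarray:
    """Send every token of this task through its TOP_K experts and average."""
    chosen = experts_for_task(task_id)
    outputs = [np.tanh(tokens @ expert_weights[e]) for e in chosen]
    return np.mean(outputs, axis=0)

# Example: a batch of 8 tokens from one language pair (task 3).
tokens = rng.normal(size=(8, D_MODEL))
print(task_moe_layer(tokens, task_id=3).shape)  # (8, 64)
```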

Inference: Bypassing Distillation by Extracting Subnetworks
A consequence of this difference in training between TaskMoE and models like TokenMoE is in how we approach inference. Because TokenMoE follows the practice of distributing tokens of the same task to many experts at both training and inference time, it is still computationally expensive at inference.

For TaskMoE, we dedicate a smaller subnetwork to a single task identity during training and inference. At inference time, we extract subnetworks by discarding unused experts for each task. TaskMoE and its variants enable us to train a single large multi-task network and then use a separate subnetwork at inference time for each task without using any additional compression methods post-training. We illustrate the process of training a TaskMoE network and then extracting per-task subnetworks for inference below.

During training, tokens of the same language are routed to the same expert based on language information (either source, target or both) in task-based MoE. Later, during inference we extract subnetworks for each task and discard unused experts.
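A minimal sketch of the extraction step, with an invented routing table and expert indices: once the per-task routing decisions are fixed, serving a task only requires the experts it actually uses.

```python
# Toy per-task routing table: for each MoE layer, the experts the router
# assigned to that task during training (layer and expert indices are
# illustrative, not taken from the paper).
task_routing = {
    "en->fr": {1: [4, 17], 3: [4, 22], 5: [9, 17]},
    "en->de": {1: [4, 9], 3: [11, 22], 5: [9, 30]},
}

def experts_to_keep(task_id: str) -> set[tuple[int, int]]:
    """Return the (layer, expert) pairs needed to serve one task; every other
    expert can be discarded before deployment."""
    return {(layer, e)
            for layer, experts in task_routing[task_id].items()
            for e in experts}

# Serving "en->fr" keeps 6 expert FFNs instead of 3 MoE layers x 32 experts.
print(sorted(experts_to_keep("en->fr")))
```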

To demonstrate this approach, we train models based on the Transformer architecture. Similar to GShard-M4 and GLaM, we replace the feedforward network of every other transformer layer with a Mixture-of-Experts (MoE) layer that consists of multiple identical feedforward networks, the “experts”. For each task, the routing network, trained along with the rest of the model, keeps track of the task identity of all input tokens and chooses a certain number of experts per layer (two in this case) to form the task-specific subnetwork. The baseline dense Transformer model has 143M parameters and 6 layers on both the encoder and decoder. The TaskMoE and TokenMoE models that we train are also 6 layers deep, but with 32 experts in every MoE layer, for a total of 533M parameters. We train our models using publicly available WMT datasets, with over 431M sentences across 30 language pairs from different language families and scripts. We point the reader to the full paper for further details.
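As a purely illustrative picture of this layout (not the released model code), the encoder alternates dense feed-forward blocks with MoE blocks:

```python
NUM_LAYERS = 6  # encoder depth used in the post; the decoder mirrors it

def layer_plan(num_layers: int = NUM_LAYERS) -> list[str]:
    """Replace the feed-forward network of every other layer with an MoE layer."""
    return ["MoE FFN (32 experts, top-2 per task)" if i % 2 else "dense FFN"
            for i in range(num_layers)]

for i, ffn_kind in enumerate(layer_plan()):
    print(f"encoder layer {i}: self-attention + {ffn_kind}")
```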

Results
In order to demonstrate the advantage of using TaskMoE at inference time, we compare the throughput, or the number of tokens decoded per second, for TaskMoE, TokenMoE, and a baseline dense model. Once the subnetwork for each task is extracted, TaskMoE is 7x smaller than the 533M-parameter TokenMoE model, and it can be served on a single TPUv3 core instead of the 64 cores required for TokenMoE. We see that TaskMoE has a peak throughput twice as high as that of TokenMoE models. In addition, on inspecting the TokenMoE model, we find that 25% of the inference time is spent on inter-device communication, while virtually no time is spent in communication by TaskMoE.

Comparing the throughput of TaskMoE with TokenMoE across different batch sizes. The maximum batch size for TokenMoE is 1024 as opposed to 4096 for TaskMoE and the dense baseline model. Here, TokenMoE has one instance distributed across 64 TPUv3 cores, while TaskMoE and the baseline model have one instance on each of the 64 cores.

A popular approach to building a smaller network that still performs well is through knowledge distillation, in which a large teacher model trains a smaller student model with the goal of matching the teacher’s performance. However, this method comes at the cost of additional computation needed to train the student from the teacher. So, we also compare TaskMoE to a baseline TokenMoE model that we compress using knowledge distillation. The compressed TokenMoE model has a size comparable to the per-task subnetwork extracted from TaskMoE.

We find that in addition to being a simpler method that does not need any additional training, TaskMoE improves upon a distilled TokenMoE model by 2.1 BLEU on average across all languages in our multilingual translation model. We note that distillation retains 43% of the performance gains achieved from scaling a dense multilingual model to a TokenMoE, whereas extracting the smaller subnetwork from the TaskMoE model results in no loss of quality.

BLEU scores (higher is better) comparing a distilled TokenMoE model to the TaskMoE and TokenMoE models with 12 layers (6 on the encoder and 6 on the decoder) and 32 experts. While both approaches improve upon a multilingual dense baseline, TaskMoE improves upon the baseline by 3.1 BLEU on average while distilling from TokenMoE improves upon the baseline by 1.0 BLEU on average.

Next Steps
The quality improvements often seen when scaling machine learning models have incentivized the research community to work toward advancing scaling technology to enable efficient training of large models. The emerging need to train models capable of generalizing to multiple tasks and modalities only increases the need for scaling models even further. However, the practicality of serving these large models remains a major challenge. Efficiently deploying large models is an important direction of research, and we believe TaskMoE is a promising step towards more inference-friendly algorithms that retain the quality gains of scaling.

Acknowledgements
We would like to first thank our coauthors – Yanping Huang, Ankur Bapna, Maxim Krikun, Dmitry Lepikhin and Minh-Thang Luong. We would also like to thank Wolfgang Macherey, Yuanzhong Xu, Zhifeng Chen and Macduff Richard Hughes for their helpful feedback. Special thanks to the Translate and Brain teams for their useful input and discussions, and the entire GShard development team for their foundational contributions to this project. We would also like to thank Tom Small for creating the animations for the blog post.

Read More

From Imagination to Animation, How an Omniverse Creator Makes Films Virtually

Editor’s note: This post is one in a series that features individual creators and developers who use NVIDIA Omniverse to boost their artistic processes.

Jae Solina

Growing up in the Philippines, award-winning filmmaker Jae Solina says he turned to movies for a reminder that the world was much larger than himself and his homeland.

He started the popular YouTube channel JSFILMZ a decade ago as a way to share home videos he made for fun.

Since then, he’s expanded the channel to showcase his computer graphics-based movies, which have won the Best Animation and Best Super Short Film awards at the Las Vegas Independent Film Festival.

He also posts tutorials for virtual filmmaking with tools, including NVIDIA Omniverse — a physically accurate 3D design collaboration platform exclusively available with NVIDIA RTX GPUs and part of the NVIDIA Studio suite of creator tools.

Making tutorials is a way of paying it forward for Solina, as he is self-taught, gaining his computer graphics skills from other artists’ YouTube videos.

Solina now lives in Las Vegas with his wife and two kids, balancing filmmaking with part-time school and a full-time job.

“The only thing stopping you from creating something is your effort and imagination,” he said. “There are so many free tools like Blender or Omniverse that are readily available, enabling us to create what we want.”

Virtual Film Production

Solina creates computer graphics-based animation films, which can usually take large amounts of time and money, he said. NVIDIA Omniverse eases this process.

“With Omniverse, I don’t have to wait a full week to render a 30-second animation,” Solina said. “The rendering speed in Omniverse is superb and saves me a lot of time, which is important when balancing my filmmaking, non-creative work and family.”

Solina uses an NVIDIA GeForce RTX 3060 GPU, as well as Omniverse apps like Audio2Face, Create and Machinima to create his films virtually.

He also uses Omniverse Connectors for 3D applications like Blender and Autodesk Maya, as well as Reallusion’s iClone and Character Creator, with which he edits motion-capture data.

As a solo filmmaker, Solina said his main challenge is finding virtual assets — like characters and environments — that are photorealistic enough to use for movies.

“My process can definitely be a bit backwards, since the ideal method would be to write a script and then find the assets to make the story come alive,” he said. “But when I’m limited in my resources, I have to think of a storyline that fits a character or an environment I find.”

New support for the Omniverse ecosystem provided by 3D marketplaces and digital asset libraries helps solve this challenge — with thousands of Omniverse-ready assets for creators, all based on the Universal Scene Description (USD) format.

Looking forward, Solina plans to create a short film entirely inside Omniverse.

Explore the NVIDIA Omniverse Instagram, gallery, forums and Medium channel. Check out Omniverse tutorials on Twitter and YouTube, and join our Discord server and Twitch channel to chat with the community.


Read More

“Hey, Alexa! Are you trustworthy?”

A family gathers around their kitchen island to unbox the digital assistant they just purchased. They will be more likely to trust this new voice-user interface, which might be a smart speaker like Amazon’s Alexa or a social robot like Jibo, if it exhibits some humanlike social behaviors, according to a new study by researchers in MIT’s Media Lab.

The researchers found that family members tend to think a device is more competent and emotionally engaging if it can exhibit social cues, like moving to orient its gaze at a speaking person. In addition, their study revealed that branding — specifically, whether the manufacturer’s name is associated with the device — has a significant effect on how members of a family perceive and interact with different voice-user interfaces.

When a device has a higher level of social embodiment, such as the ability to give verbal and nonverbal social cues through motion or expression, family members also interacted with one another more frequently while engaging with the device as a group, the researchers found.

Their results could help designers create voice-user interfaces that are more engaging and more likely to be used by members of a family in the home, while also improving the transparency of these devices. The researchers also outline ethical concerns that could come from certain personality and embodiment designs.

“These devices are new technology coming into the home and they are still very under-explored,” says Anastasia Ostrowski, a research assistant in the Personal Robotics Group in the Media Lab, and lead author of the paper. “Families are in the home, so we were very interested in looking at this from a generational approach, including children and grandparents. It was super interesting for us to understand how people are perceiving these, and how families interact with these devices together.”

Coauthors include Vasiliki Zygouras, a recent Wellesley College graduate working in the Personal Robotics Group at the time of this research; Research Scientist Hae Won Park; Cornell University graduate student Jenny Fu; and senior author Cynthia Breazeal, professor of media arts and sciences, director of MIT RAISE, and director of the Personal Robotics Group, as well as a developer of the Jibo robot. The paper is published today in Frontiers in Robotics and AI.

“The human-centered insights of this work are relevant to the design of all kinds of personified AI devices, from smart speakers and intelligent agents to personal robots,” says Breazeal.

Investigating interactions

This work grew out of an earlier study where the researchers explored how people use voice-user interfaces at home. At the start of the study, users familiarized themselves with three devices before taking one home for a month. The researchers noticed that people spent more time interacting with a Jibo social robot than they did with the smart speakers, Amazon Alexa and Google Home. They wondered why people engaged more with the social robot.

To get to the bottom of this, they designed three experiments that involved family members interacting as a group with different voice-user interfaces. Thirty-four families, comprising 92 people between the ages of 4 and 69, participated in the studies.

The experiments were designed to mimic a family’s first encounter with a voice-user interface. Families were video recorded as they interacted with three devices, working through a list of 24 actions (like “ask about the weather” or “try to learn the agent’s opinions”). Then they answered questions about their perception of the devices and categorized the voice-user interfaces’ personalities.

In the first experiment, participants interacted with a Jibo robot, Amazon Echo, and Google Home, with no modifications. Most found the Jibo to be far more outgoing, dependable, and sympathetic. Because the users perceived that Jibo had a more humanlike personality, they were more likely to interact with it, Ostrowski explains.

An unexpected result

In the second experiment, researchers set out to understand how branding affected participants’ perspectives. They changed the “wake word” (the word the user says aloud to engage the device) of the Amazon Echo to “Hey, Amazon!” instead of “Hey, Alexa!,” but kept the “wake word” the same for the Google Home (“Hey, Google!”) and the Jibo robot (“Hey, Jibo!”). They also provided participants with information about each manufacturer. When branding was taken into account, users viewed Google as more trustworthy than Amazon, despite the fact that the devices were very similar in design and functionality.

“It also drastically changed how much people thought the Amazon device was competent or like a companion,” Ostrowski says. “I was not expecting it to have that big of a difference between the first and second study. We didn’t change any of the abilities, how they function, or how they respond. Just the fact that they were aware the device is made by Amazon made a huge difference in their perceptions.”

Changing the “wake word” of a device can have ethical implications. A personified name, which can make a device seem more social, could mislead users by masking the connection between the device and the company that made it, which is also the company that now has access to the user’s data, she says.

In the third experiment, the team wanted to see how interpersonal movement affected the interactions. For instance, the Jibo robot turns its gaze to the individual who is speaking. For this study, the researchers used the Jibo along with an Amazon Echo Show (a rectangular screen) with the modified wake word “Hey, Computer,” and an Amazon Echo Spot (a sphere with a circular screen) that had a rotating flag on top which sped up when someone called its wake word, “Hey, Alexa!”

Users found the modified Amazon Echo Spot to be no more engaging than the Amazon Echo Show, suggesting that repetitive movement without social embodiment may not be an effective way to increase user engagement, Ostrowski says.

Fostering deeper relationships

Deeper analysis of the third study also revealed that users interacted more among themselves, like glancing at each other, laughing together, or having side conversations, when the device they were engaging with had more social abilities.

“In the home, we have been wondering how these systems promote engagement between users. That is always a big concern for people: How are these devices going to shape people’s relationships? We want to design systems that can promote a more flourishing relationship between people,” Ostrowski says.

The researchers used their insights to lay out several voice-user interface design considerations, including the importance of developing warm, outgoing, and thoughtful personalities; understanding how the wake word influences user acceptance; and conveying nonverbal social cues through movement.

With these results in hand, the researchers want to continue exploring how families engage with voice-user interfaces that have varying levels of functionality. For instance, they might conduct a study with three different social robots. They would also like to replicate these studies in a real-world environment and explore which design features are best suited for specific interactions.

This research was funded by the Media Lab Consortia.

Read More

Scaling Vision with Sparse Mixture of Experts

Advances in deep learning over the last few decades have been driven by a few key elements. With a small number of simple but flexible mechanisms (i.e., inductive biases such as convolutions or sequence attention), increasingly large datasets, and more specialized hardware, neural networks can now achieve impressive results on a wide range of tasks, such as image classification, machine translation, and protein folding prediction.

However, the use of large models and datasets comes at the expense of significant computational requirements. Yet, recent works suggest that large model sizes might be necessary for strong generalization and robustness, so training large models while limiting resource requirements is becoming increasingly important. One promising approach involves the use of conditional computation: rather than activating the whole network for every single input, different parts of the model are activated for different inputs. This paradigm has been featured in the Pathways vision and recent works on large language models, but it has not been well explored in the context of computer vision.

In “Scaling Vision with Sparse Mixture of Experts”, we present V-MoE, a new vision architecture based on a sparse mixture of experts, which we then use to train the largest vision model to date. We transfer V-MoE to ImageNet and demonstrate accuracy matching the state of the art while using about 50% fewer resources than models of comparable performance. We have also open-sourced the code to train sparse models and provided several pre-trained models.

Vision Mixture of Experts (V-MoEs)
Vision Transformers (ViT) have emerged as one of the best architectures for vision tasks. ViT first partitions an image into equally-sized square patches. These are called tokens, a term inherited from language models. Still, compared to the largest language models, ViT models are several orders of magnitude smaller in terms of number of parameters and compute.
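For illustration, here is a minimal NumPy sketch of this tokenization step, with the image shape and patch size chosen arbitrarily:

```python
import numpy as np

def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, equally sized square patches.

    Each patch becomes one "token". Assumes H and W are divisible by `patch`.
    """
    h, w, c = image.shape
    return (image
            .reshape(h // patch, patch, w // patch, patch, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(-1, patch * patch * c))

img = np.zeros((224, 224, 3), dtype=np.float32)
print(patchify(img).shape)  # (196, 768): 14 x 14 tokens, each of length 16*16*3
```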

To massively scale vision models, we replace some of the dense feedforward layers (FFN) in the ViT architecture with a sparse mixture of independent FFNs (which we call experts). A learnable router layer selects which experts are used for every individual token (and how they are weighted). That is, different tokens from the same image may be routed to different experts. Each token is routed to at most K (typically 1 or 2) experts, among a total of E experts (in our experiments, E is typically 32). This allows scaling the model’s size while keeping its computation per token roughly constant. The figure below shows the structure of the encoder blocks in more detail.

V-MoE Transformer Encoder block.
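Below is a minimal NumPy sketch of the per-token routing inside such a block, with toy sizes and stand-in weights; the real model is a Transformer trained end to end on accelerators, but the gating logic follows the description above: each token receives a softmax score per expert and is processed only by its top K.

```python
import numpy as np

rng = np.random.default_rng(0)
E, K, D = 32, 2, 64  # experts per MoE layer, experts per token, token dim (toy)

router_w = rng.normal(size=(D, E)) * 0.02     # learnable router (stand-in)
expert_w = rng.normal(size=(E, D, D)) * 0.02  # one FFN stand-in per expert

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vmoe_layer(tokens: np.ndarray) -> np.ndarray:
    """Route each token to its own top-K experts, weighted by the router gates."""
    gates = softmax(tokens @ router_w)         # (num_tokens, E)
    topk = np.argsort(gates, axis=-1)[:, -K:]  # per-token expert indices
    out = np.zeros_like(tokens)
    for t in range(tokens.shape[0]):
        for e in topk[t]:
            out[t] += gates[t, e] * np.tanh(tokens[t] @ expert_w[e])
    return out

tokens = rng.normal(size=(196, D))  # e.g., one image's worth of patch tokens
print(vmoe_layer(tokens).shape)     # (196, 64)
```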

Experimental Results
We first pre-train the model once on JFT-300M, a large dataset of images. The left plot below shows our pre-training results for models of all sizes: from the small S/32 to the huge H/14.

We then transfer the model to new downstream tasks (such as ImageNet), by using a new head (the last layer in a model). We explore two transfer setups: either fine-tuning the entire model on all available examples of the new task, or freezing the pre-trained network and tuning only the new head using a few examples (known as few-shot transfer). The right plot in the figure below summarizes our transfer results to ImageNet, training on only 5 images per class (called 5-shot transfer).

JFT-300M Precision@1 and ImageNet 5-shot accuracy. Colors represent different ViT variants and markers represent either standard ViT (●), or V-MoEs (▸) with expert layers on the last n even blocks. We set n=2 for all models, except V-MoE-H where n=5. Higher indicates better performance, with more efficient models being to the left.

In both cases, the sparse model strongly outperforms its dense counterpart at a given amount of training compute (shown by the V-MoE line being above the ViT line), or achieves similar performance much faster (shown by the V-MoE line being to the left of the ViT line).

To explore the limits of vision models, we trained a 15-billion parameter model with 24 MoE layers (out of 48 blocks) on an extended version of JFT-300M. This massive model — the largest to date in vision as far as we know — achieved 90.35% test accuracy on ImageNet after fine-tuning, near the current state-of-the-art.

Priority Routing
In practice, due to hardware constraints, it is not efficient to use buffers with a dynamic size, so models typically use a pre-defined buffer capacity for each expert. Assigned tokens beyond this capacity are dropped and not processed once the expert becomes “full”. As a consequence, higher capacities yield higher accuracy, but they are also more computationally expensive.

We leverage this implementation constraint to make V-MoEs faster at inference time. By decreasing the total combined buffer capacity below the number of tokens to be processed, the network is forced to skip processing some tokens in the expert layers. Instead of choosing the tokens to skip in some arbitrary fashion (as previous works did), the model learns to sort tokens according to an importance score. This maintains high quality predictions while saving a lot of compute. We refer to this approach as Batch Priority Routing (BPR), illustrated below.

Under high capacity, both vanilla and priority routing work well as all patches are processed. However, when the buffer size is reduced to save compute, vanilla routing selects arbitrary patches to process, often leading to poor predictions. BPR smartly prioritizes important patches resulting in better predictions at lower computational costs.
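A rough sketch of the priority idea under a fixed per-expert buffer, with invented scores and shapes; real implementations batch this on accelerators, but the ordering logic is the same: visit tokens from most to least important and drop whatever no longer fits.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_priority_keep(gates: np.ndarray, capacity: int) -> np.ndarray:
    """Decide which tokens each expert processes when its buffer is too small.

    `gates` holds the router's softmax scores, shape (num_tokens, num_experts);
    each token's chosen expert is its argmax. Instead of filling expert buffers
    in arbitrary token order, tokens are visited from highest to lowest routing
    score, so the least important tokens are the ones dropped when slots run out.
    """
    importance = gates.max(axis=-1)             # per-token priority score
    order = np.argsort(-importance)             # most important tokens first
    kept = np.zeros(gates.shape[0], dtype=bool)
    used = np.zeros(gates.shape[1], dtype=int)  # buffer slots used per expert
    for t in order:
        e = gates[t].argmax()
        if used[e] < capacity:
            kept[t] = True
            used[e] += 1
    return kept  # tokens with kept == False skip the expert layer entirely

gates = rng.random(size=(196, 8))  # stand-in routing scores for 196 tokens
kept = batch_priority_keep(gates, capacity=10)
print(int(kept.sum()), "of 196 tokens processed by the experts")
```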

Dropping the right tokens turns out to be essential to deliver high-quality and more efficient inference predictions. When the expert capacity decreases, performance quickly decreases with the vanilla routing mechanism. Conversely, BPR is much more robust to low capacities.

Performance versus inference capacity buffer size (or ratio) C for a V-MoE-H/14 model with K=2. Even for large C’s, BPR improves performance; at low C the difference is quite significant. BPR is competitive with dense models (ViT-H/14) by processing only 15-30% of the tokens.

Overall, we observed that V-MoEs are highly flexible at inference time: for instance, one can decrease the number of selected experts per token to save time and compute, without any further training on the model weights.

Exploring V-MoEs
Because much is yet to be discovered about the internal workings of sparse networks, we also explored the routing patterns of the V-MoE.

One hypothesis is that routers would learn to discriminate and assign tokens to experts on semantic grounds (the “car” expert, the “animal” expert, and so on). To test this, below we show plots for two different MoE layers (one very early in the network, and another closer to the head). The x-axis corresponds to each of the 32 experts, and the y-axis shows the ID of the image classes (from 1 to 1000). Each entry in the plot shows how often an expert was selected for tokens corresponding to a specific image class, with darker colors indicating higher frequency. While in the early layers there is little correlation, later in the network each expert receives and processes tokens from only a handful of classes. Therefore, we can conclude that some semantic clustering of the patches emerges in the deeper layers of the network.

Higher routing decisions correlate with image classes. We show two MoE layers of a V-MoE-H/14. The x-axis corresponds to the 32 experts in a layer. The y-axis are the 1000 ImageNet classes; orderings for both axes are different across plots (to highlight correlations). For each pair (expert e, class c) we show the average routing weight for the tokens corresponding to all images with class c for that particular expert e.
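The frequencies behind such a plot can be tallied directly from routing logs; the sketch below uses randomly generated stand-in logs rather than real router outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, NUM_EXPERTS, NUM_TOKENS = 1000, 32, 100_000

# Stand-in routing logs: the image class each token came from and the expert
# it was sent to. Real logs would come from running the trained router over
# a labeled dataset.
classes = rng.integers(0, NUM_CLASSES, size=NUM_TOKENS)
experts = rng.integers(0, NUM_EXPERTS, size=NUM_TOKENS)

# Count (class, expert) co-occurrences, then normalize per class to get the
# selection frequencies visualized in the plot described above.
counts = np.zeros((NUM_CLASSES, NUM_EXPERTS))
np.add.at(counts, (classes, experts), 1)
freq = counts / counts.sum(axis=1, keepdims=True).clip(min=1)
print(freq.shape)  # (1000, 32): rows are image classes, columns are experts
```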

Final Thoughts
We train very large vision models using conditional computation, delivering significant improvements in representation and transfer learning for relatively little training cost. Alongside V-MoE, we introduced BPR, which requires the model to process only the most useful tokens in the expert layers.

We believe this is just the beginning of conditional computation at scale for computer vision; extensions include multi-modal and multi-task models, scaling up the expert count, and improving transfer of the representations produced by sparse models. Heterogeneous expert architectures and conditional variable-length routes are also promising directions. Sparse models can especially help in data-rich domains such as large-scale video modeling. We hope our open-source code and models help attract and engage researchers new to this field.

Acknowledgments
We thank our co-authors: Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, and Neil Houlsby. We thank Alex Kolesnikov, Lucas Beyer, and Xiaohua Zhai for providing continuous help and details about scaling ViT models. We are also grateful to Josip Djolonga, Ilya Tolstikhin, Liam Fedus, and Barret Zoph for feedback on the paper; James Bradbury, Roy Frostig, Blake Hechtman, Dmitry Lepikhin, Anselm Levskaya, and Parker Schuh for invaluable support helping us run our JAX models efficiently on TPUs; and many others from the Brain team for their support. Finally, we would also like to thank and acknowledge Tom Small for the awesome animated figure used in this post.

Read More

Advancing genomics to better understand and treat disease

Genome sequencing can help us better understand, diagnose and treat disease. For example, healthcare providers are increasingly using genome sequencing to diagnose rare genetic diseases, such as elevated risk for breast cancer or pulmonary arterial hypertension, which are estimated to affect roughly 8% of the population.

At Google Health, we’re applying our technology and expertise to the field of genomics. Here are recent research and industry developments we’ve made to help quickly identify genetic disease and foster the equity of genomic tests across ancestries. This includes an exciting new partnership with Pacific Biosciences to further advance genomic technologies in research and the clinic.

Helping identify life-threatening disease when minutes matter

Genetic diseases can cause critical illness, and in many cases, a timely identification of the underlying issue can allow for life-saving intervention. This is especially true in the case of newborns. Genetic or congenital conditions affect nearly 6% of births, but clinical sequencing tests to identify these conditions typically take days or weeks to complete.

We recently worked with the University of California Santa Cruz Genomics Institute to build a method – called PEPPER-Margin-DeepVariant – that can analyze data from Oxford Nanopore sequencers, one of the fastest commercial sequencing technologies used today. This week, the New England Journal of Medicine published a study led by the Stanford University School of Medicine detailing the use of this method to identify suspected disease-causing variants in five critical newborn intensive care unit (NICU) cases.

In the fastest cases, a likely disease-causing variant was identified less than 8 hours after sequencing began, compared to the prior fastest time of 13.5 hours. In five cases, the method influenced patient care. For example, the team quickly turned around a diagnosis of Poirier–Bienvenu neurodevelopmental disorder for one infant, allowing for timely, disease-specific treatment.

Time required to sequence and analyze individuals in the pilot study. Disease-causing variants were identified in patient IDs 1, 2, 8, 9, and 11.

Applying machine learning to maximize the potential in sequencing data

Looking forward, new sequencing instruments can lead to dramatic breakthroughs in the field. We believe machine learning (ML) can further unlock the potential of these instruments. Our new research partnership with Pacific Biosciences (PacBio), a developer of genomic sequence platforms, is a great example of how Google’s machine learning and algorithm development tools can help researchers unlock more information from sequencing data.

PacBio’s long-read HiFi sequencing provides the most comprehensive view of genomes, transcriptomes and epigenomes. Using PacBio’s technology in combination with DeepVariant, our award-winning variant detection method, researchers have been able to accurately identify diseases that are otherwise difficult to diagnose with alternative methods.

Additionally, we developed a new open source method called DeepConsensus that, in combination with PacBio’s sequencing platforms, creates more accurate reads of sequencing data. This boost in accuracy will help researchers apply PacBio’s technology to more challenges, such as the final completion of the Human Genome and assembling the genomes of all vertebrate species.

Supporting more equitable genomics resources and methods

Like other areas of health and medicine, the genomics field grapples with health equity issues that, if not addressed, could exclude certain populations. For example, the overwhelming majority of participants in genomic studies have historically been of European ancestry. As a result, the genomics resources that scientists and clinicians use to identify and filter genetic variants and to interpret the significance of these variants are not equally powerful across individuals of all ancestries.

In the past year, we’ve supported two initiatives aimed at improving methods and genomics resources for under-represented populations. We collaborated with 23andMe to develop an improved resource for individuals of African ancestry, and we worked with the UCSC Genomics Institute to develop pangenome methods, with this work recently published in Science.

In addition, we recently published two open-source methods that improve genetic discovery by more accurately identifying disease labels and improving the use of health measurements in genetic association studies.

We hope that our work developing and sharing these methods with those in the field of genomics will improve overall health and the understanding of biology for everyone. Working together with our collaborators, we can apply this work to real-world applications.

Read More

How Retailers Meet Tough Challenges Using NVIDIA AI 

At the National Retail Federation’s annual trade show, conversations tend to touch on recurring themes: “Will we be able to stock must-have products for next Christmas?” “What incentives can I offer to loyal workers?” and “What happens to my margins if Susie Consumer purchases three of the same dresses online and returns two?”

The $26 trillion global retail industry is undergoing accelerated change, brought on by the pandemic and rapidly changing consumer habits. Now, it’s looking for accelerated problem solving using NVIDIA AI to address increasingly acute labor, logistics and supply chain challenges that are accompanying those changes.

Working with an ecosystem of more than 100 startups, equipment providers and software partners, NVIDIA offers an AI Enterprise platform for retailers and quick-service restaurants that helps speed the creation of intelligent stores, AI-driven forecasting, interactive chatbots, voice-enabled order taking and hyperpersonalized recommendations, and logistics and store optimization using digital twin technologies for simulation.

A Labor Crisis

Labor shortages have become a critical issue. In September and October, accommodation and food services businesses lost 1.6 million, or 6.2 percent, of their workforce, while 1.4 million people quit their retail jobs, according to the U.S. Bureau of Labor Statistics.

One way to address the problem is by creating autonomous shopping experiences. AiFi, AWM and Trigo’s autonomous shopping platforms, each shown at NRF, provide a seamless store checkout process. Customers can walk into a store, grab the items they want and pay with their mobile phone on their way out. Beyond addressing labor shortages, these autonomous stores provide live inventory management and prevent shrink.

Store associates are the face of retail organizations, so it makes sense to reduce the time they spend on tasks that aren’t customer facing, such as performing inventory counts or scanning for out-of-stock items. Spacee is using computer vision and AI to help retailers handle these basic, repetitive tasks.

NVIDIA partners Everseen and Graymatics provide asset protection applications at the point of sale to reduce shrinkage and provide customers a faster self-checkout experience. Deep North’s store analytics application is used for queue management, to optimize labor scheduling and store merchandising, resulting in increased sales.

All these startups are using the NVIDIA AI platform to deliver real-time recommendations in stores and distribution centers.

NVIDIA Tokkio conversational AI avatars and the NVIDIA Riva conversational AI framework, as well as recommendation engines based on the NVIDIA Merlin application framework, also help improve the customer experience and solve labor shortages by allowing for automated order taking and upsell based on customer shopping history.

Vistry.AI is delivering drive-thru automated order taking with a speech and recommendation engine, as well as computer vision applications that predict queues, predict when orders are ready, ensure food freshness and accelerate curbside pickup.

A Broken Supply Chain

The supply chain is the lifeblood of the retail industry; it was on life support for many in 2021 as attempts to recover from pandemic-related shutdowns around the world were stymied by trucker and dock worker shortages, inclement weather and persistent shortfalls in key food and electronics components.

According to a November report from Adobe Digital Insights, online shoppers in October were met with more than 2 billion out-of-stock messages — double the rate reported in October 2020.

With consumers more likely than not to go to — and possibly stay with — a competitor, retailers are investing heavily in predictive analytics to gain real-time insights for forecasting and ordering, from point of embarkation through to individual store shelves and distribution centers.

Dematic, a global materials handling company, and startups Kinetic Vision and Osaro are other key companies that use the NVIDIA AI platform to develop edge AI applications that add intelligence to automated warehouse systems. From computer vision AI applications to autonomous forklifts to pick-and-place robots, these AI applications improve distribution center throughput and reduce equipment downtime. And with NVIDIA Fleet Command, these solutions can be remotely deployed and managed securely and at scale in hundreds of distribution centers.

Improving Logistics

To help the $9 trillion logistics industry efficiently route goods from distribution centers to stores and from stores to homes, NVIDIA in November announced its NVIDIA ReOpt AI software.

NVIDIA ReOpt is an accelerated solver for machine learning that optimizes vehicle route planning and logistics in real time. Working with the NVIDIA ReOpt team, Domino’s Pizza implemented a real-time predictive system that helps it meet important delivery standards for customers eager for dinner.

Retail Goes AI 

The NVIDIA AI Enterprise platform is helping retailers weather challenges expected to continue well beyond 2022. With consumers increasingly demanding what the industry calls an omnichannel experience, one that lets them order online and pick up at curbside or have items delivered speedily to their homes, balancing supply with demand has increased the need for fast, actionable insights.

As consumers move from seeking goods and services to experiences, the depth and breadth of interaction between customers and retailers is requiring AI to complement human interaction. It’s a shift that has moved from wishlist to deployment.


Read More

AI Startup to Take a Bite Out of Fast-Food Labor Crunch

Addressing a growing labor crisis among quick-service restaurants, startup Vistry is harnessing AI to automate the process of taking orders.

The company will share its story at the NRF Big Show, the annual industry gathering of the National Retail Federation in New York, starting Jan. 16.

“They’re closing restaurants because there is not enough labor,” said Atif Kureishy, CEO of Vistry, which is a member of the NVIDIA Inception startup accelerator program.

At the same time, customers are placing orders in more ways than ever: for pickup, in drive-throughs and via delivery services, as well as in dining rooms.

“There are new store formats, new configurations, new digital capabilities,” Kureishy said.

To help restaurants keep up, Kureishy, a veteran of both NASA and Booz Allen Hamilton, assembled a team that includes veterans of the semiconductor industry and Ivy League neuroscience programs.

While restaurant labor shortages are grabbing headlines, Vistry is tackling an opportunity driven by a labor shortage demographers have been predicting for decades.

As a result, the quick-service dining industry, which does $300 billion in sales each year in the United States alone, is just one of the industries that will need to find ways to get more done with fewer people over the long term.

To address this, Vistry is working to build an AI-enabled automated order-taking solution. It’s harnessing the latest natural language processing for menu understanding and speech and recommendation systems to deliver faster, more accurate order-taking and more relevant, personalized offers.

The system relies on NVIDIA Riva, a collection of technologies for building speech AI applications. It includes natural language understanding and speech recognition and synthesis capabilities. It also uses computer vision technology optimized with the NVIDIA Metropolis application framework.

Vistry’s platform, powered by the NVIDIA Jetson edge AI platform and NVIDIA A2 Tensor Core GPUs, goes beyond just an automated order-taking kiosk.

Vistry’s computer vision applications also help restaurants automate curbside check-ins. They can speed up drive-throughs and better predict how long it will take for customer orders to be ready. And they will track and trace orders for customers relying on food delivery services, Kureishy explains.

“Buyer behaviors are changing — the guest experience is not solely in the dining room anymore,” Kureishy said.

“Pandemic uncertainty continues to impact dining, along with consumer expectations of a seamless, operationally excellent experience on every platform and touchpoint,” said Susan Beardslee, principal analyst with ABI Research. “Behind the scenes, providers must enable integrated, near real-time digital solutions to address everything from supplies to staffing to delivery optimization.”

Vistry promises its solutions will be easy to deploy, fully integrated with existing restaurant systems, secure and private. They’ll also provide sophisticated real-time dashboards, so restaurant operators can better understand a growing number of sales channels — from drive-through lines to dining rooms to delivery services.

“All the expectations have changed, all of us want food faster, and we want to make sure the quality is preserved,” Kureishy said.

Vistry is helping quick-service restaurants reduce drive-through lines, predict when customers’ orders are ready, ensure food freshness, deliver curbside orders faster and optimize restaurant performance using AI and its analytics dashboard.

Who’s hungry?

Learn more about NVIDIA’s AI solutions for quick-service restaurants


Read More

GFN Thursday: ‘Fortnite’ Comes to iOS Safari and Android Through NVIDIA GeForce NOW via Closed Beta

Starting next week, Fortnite on GeForce NOW will launch in a limited-time closed beta for mobile, all streamed through the Safari web browser on iOS and the GeForce NOW Android app.

The beta is open for registration for all GeForce NOW members, and will help test our server capacity, graphics delivery and new touch controls performance. Members will be admitted to the beta in batches over the coming weeks.

‘Fortnite’ Streaming Gameplay Comes to Mobile Through iOS Safari and Android With Touch Inputs

Alongside the amazing team at Epic Games, we’ve been working to enable a touch-friendly version of Fortnite for mobile delivered through the cloud. While PC games in the GeForce NOW library are best experienced on mobile with a gamepad, the introduction of touch controls built by the GeForce NOW team offers more options for players, starting with Fortnite.

Beginning today, GeForce NOW members can sign up for a chance to join the Fortnite limited-time closed beta for mobile devices. Not an existing member? No worries. Register for a GeForce NOW membership and sign up to become eligible for the closed beta once the experience starts rolling out next week. Upgrade to a Priority or RTX 3080 membership to receive priority access to gaming servers. A paid GeForce NOW membership is not required to participate.

Fortnite Chapter 3 on GeForce NOW
You could say the world is a little upside down in Fortnite Chapter 3.

For tips on gameplay mechanics or a refresher on playing Fortnite with touch controls, check out Fortnite’s Getting Started page.

More Touch Games

And we’re just getting started. Cloud-to-mobile gaming is a great opportunity for publishers to get touch-friendly versions of their games into more gamers’ hands. PC games and game engines that support Windows touch events, like Unreal Engine 4, can easily enable mobile touch support on GeForce NOW.

We’re working with additional publishers to add more touch-enabled games to GeForce NOW. And we look forward to more publishers streaming full PC versions of their games to mobile devices with built-in touch support — reaching millions through the Android app and Safari on iOS.

GFN Thursday Releases

The Anacrusis on GeForce NOW
Take on a four-player, first-person shooter set aboard a starship stranded at the edge of explored space in The Anacrusis.

GFN Thursday always means more games. Members can find these and more streaming on the cloud this week:

We make every effort to launch games on GeForce NOW as close to their release as possible, but, in some instances, games may not be available immediately.

What are you planning to play this weekend? Let us know on Twitter or in the comments below.


Read More