Posted by Jason Wei and Yi Tay, Research Scientists, Google Research, Brain Team
The field of natural language processing (NLP) has been revolutionized by language models trained on large amounts of text data. Scaling up the size of language models often leads to improved performance and sample efficiency on a range of downstream NLP tasks. In many cases, the performance of a large language model can be predicted by extrapolating the performance trend of smaller models. For instance, the effect of scale on language model perplexity has been empirically shown to span more than seven orders of magnitude.
On the other hand, performance for certain other tasks does not improve in a predictable fashion. For example, the GPT-3 paper showed that the ability of language models to perform multi-digit addition has a flat scaling curve (approximately random performance) for models from 100M to 13B parameters, at which point the performance jumped substantially. Given the growing use of language models in NLP research and applications, it is important to better understand abilities such as these that can arise unexpectedly.
In “Emergent Abilities of Large Language Models,” recently published in the Transactions on Machine Learning Research (TMLR), we discuss the phenomena of emergent abilities, which we define as abilities that are not present in small models but are present in larger models. More specifically, we study emergence by analyzing the performance of language models as a function of language model scale, as measured by total floating point operations (FLOPs), or how much compute was used to train the language model. However, we also explore emergence as a function of other variables, such as dataset size or number of model parameters (see the paper for full details). Overall, we present dozens of examples of emergent abilities that result from scaling up language models. The existence of such emergent abilities raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.
Emergent Prompted Tasks
First we discuss emergent abilities that may arise in prompted tasks. In such tasks, a pre-trained language model is given a prompt for a task framed as next word prediction, and it performs the task by completing the response. Without any further fine-tuning, language models can often perform tasks that were not seen during training.
Example of few-shot prompting on movie review sentiment classification. The model is given one example of a task (classifying a movie review as positive or negative) and then performs the task on an unseen example.
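To make the setup concrete, here is a minimal sketch of few-shot sentiment classification framed as next-word prediction. The `complete` function is a hypothetical stand-in for any large language model's text-completion interface; it is not part of a specific library.

```python
# Minimal sketch of few-shot prompting for sentiment classification.
# `complete` is a hypothetical wrapper around a language model's
# text-completion API; the prompt format mirrors the figure above.

def build_prompt(review: str) -> str:
    # One in-context example, then the unseen review to classify.
    return (
        "Review: This movie was a delight from start to finish.\n"
        "Sentiment: positive\n\n"
        f"Review: {review}\n"
        "Sentiment:"
    )

def classify(review: str, complete) -> str:
    # The model performs the task purely by predicting the next word.
    return complete(build_prompt(review)).strip().lower()
```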
We call a prompted task emergent when it unpredictably surges from random performance to above-random at a specific scale threshold. Below we show three examples of prompted tasks with emergent performance: multi-step arithmetic, taking college-level exams, and identifying the intended meaning of a word. In each case, language models perform poorly with very little dependence on model size up to a threshold at which point their performance suddenly begins to excel.
The ability to perform multi-step arithmetic (left), succeed on college-level exams (middle), and identify the intended meaning of a word in context (right) all emerge only for models of sufficiently large scale. The models shown include LaMDA, GPT-3, Gopher, Chinchilla, and PaLM.
Performance on these tasks only becomes non-random for models of sufficient scale — for instance, above 10^22 training FLOPs for the arithmetic and multi-task NLU tasks, and above 10^24 training FLOPs for the word-in-context task. Note that although the scale at which emergence occurs can differ across tasks and models, no model showed smooth improvement on any of these tasks. Dozens of other emergent prompted tasks are listed in our paper.
Emergent Prompting Strategies
The second class of emergent abilities encompasses prompting strategies that augment the capabilities of language models. Prompting strategies are broad paradigms for prompting that can be applied to a range of different tasks. They are considered emergent when they fail for small models and can only be used by a sufficiently-large model.
One example of an emergent prompting strategy is called “chain-of-thought prompting”, in which the model is prompted to generate a series of intermediate reasoning steps before giving the final answer. Chain-of-thought prompting enables language models to perform tasks requiring complex reasoning, such as multi-step math word problems. Notably, models acquire the ability to do chain-of-thought reasoning without being explicitly trained to do so. An example of chain-of-thought prompting is shown in the figure below.
Chain of thought prompting enables sufficiently large models to solve multi-step reasoning problems.
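As a minimal sketch (again using a hypothetical `complete` function as the model interface), the only change from standard prompting is that the in-context exemplar spells out intermediate reasoning before the final answer; the worked example below is illustrative.

```python
# Sketch of chain-of-thought prompting: the exemplar shows intermediate
# reasoning steps, so the model is encouraged to produce its own steps
# before stating the final answer.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def chain_of_thought_answer(question: str, complete) -> str:
    prompt = COT_EXEMPLAR + f"Q: {question}\nA:"
    return complete(prompt)
```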
The empirical results of chain-of-thought prompting are shown below. For smaller models, applying chain-of-thought prompting does not outperform standard prompting, for example, when applied to GSM8K, a challenging benchmark of math word problems. However, for large models (~10^24 FLOPs), chain-of-thought prompting substantially improves performance in our tests, reaching a 57% solve rate on GSM8K.
Chain-of-thought prompting is an emergent ability — it fails to improve performance for small language models, but substantially improves performance for large models. Here we illustrate the difference between standard and chain-of-thought prompting at different scales for two language models, LaMDA and PaLM.
Implications of Emergent Abilities
The existence of emergent abilities has a range of implications. For example, because emergent few-shot prompted abilities and strategies are not explicitly encoded in pre-training, researchers may not know the full scope of few-shot prompted abilities of current language models. Moreover, the emergence of new abilities as a function of model scale raises the question of whether further scaling will potentially endow even larger models with new emergent abilities.
Identifying emergent abilities in large language models is a first step in understanding such phenomena and their potential impact on future model capabilities. Why does scaling unlock emergent abilities? Because computational resources are expensive, can emergent abilities be unlocked via other methods without increased scaling (e.g., better model architectures or training techniques)? Will new real-world applications of language models become unlocked when certain abilities emerge? Analyzing and understanding the behaviors of language models, including emergent behaviors that arise from scaling, is an important research question as the field of NLP continues to grow.
Acknowledgements
It was an honor and privilege to work with Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus.
When legendary computer scientist Jim Gray accepted the Turing Award in 1999, he laid out a dozen long-range information technology research goals. One of those goals called for the creation of trouble-free server systems or, in Gray’s words, to “build a system used by millions of people each day and yet administered and managed by a single part-time person.”
Gray envisioned a self-organizing “server in the sky” that would store massive amounts of data, and refresh or download data as needed. Today, with the emergence and rapid advancement of artificial intelligence (AI), machine learning (ML) and cloud computing, and Microsoft’s development of Cloud Intelligence/AIOps, we are closer than we have ever been to realizing that vision—and moving beyond it.
Over the past fifteen years, the most significant paradigm shift in the computing industry has been the migration to cloud computing, which has created unprecedented digital transformation opportunities and benefits for business, society, and human life.
The implication is profound: cloud computing platforms have become part of the world’s basic infrastructure. As a result, the non-functional properties of cloud computing platforms, including availability, reliability, performance, efficiency, security, and sustainability, have become immensely important. Yet the distributed nature, massive scale, and high complexity of cloud computing platforms—ranging from storage to networking, computing and beyond—present huge challenges to building and operating such systems.
Cloud Intelligence/AIOps (“AIOps” for brevity) aims to innovate AI/ML technologies to help design, build, and operate complex cloud platforms and services at scale—effectively and efficiently.
AIOps has three pillars, each with its own goal:
AI for Systems to make intelligence a built-in capability to achieve high quality, high efficiency, self-control, and self-adaptation with less human intervention.
AI for Customers to leverage AI/ML to create unparalleled user experiences and achieve exceptional user satisfaction using cloud services.
AI for DevOps to infuse AI/ML into the entire software development lifecycle to achieve high productivity.
Where did the research on AIOps begin?
Gartner, a leading industry analyst firm, first coined the term AIOps (Artificial Intelligence for IT Operations) in 2017. According to Gartner, AIOps is the application of machine learning and data science to IT operation problems. While Gartner’s AIOps concept focuses only on DevOps, Microsoft’s Cloud Intelligence/AIOps research has a much broader scope, including AI for Systems and AI for Customers.
The broader scope of Microsoft’s Cloud Intelligence/AIOps stems from the Software Analytics research we proposed in 2009, which seeks to enable software practitioners to explore and analyze data to obtain insightful and actionable information for data-driven tasks related to software and services. We started to focus our Software Analytics research on cloud computing in 2014 and named this new topic Cloud Intelligence (Figure 1). In retrospect, Software Analytics is about the digital transformation of the software industry itself, such as empowering practitioners to use data-driven approaches and technologies to develop software, operate software systems, and engage with customers.
What is the AIOps problem space?
There are many scenarios around each of the three pillars of AIOps. Some example scenarios include predictive capacity forecasting for efficient and sustainable services, monitoring service health status, and detecting health issues in a timely manner in AI for Systems; ensuring code quality and preventing defective builds from being deployed into production in AI for DevOps; and providing effective customer support in AI for Customers. Across all these scenarios, there are four major problem categories that, taken together, constitute the AIOps problem space: detection, diagnosis, prediction, and optimization (Figure 2). Specifically, detection aims to identify unexpected system behaviors (or anomalies) in a timely manner. Given a symptom and its associated artifacts, the goal of diagnosis is to localize the cause of service issues and find the root cause. Prediction attempts to forecast future system behaviors, customer workload patterns, DevOps activities, and so on. Lastly, optimization tries to identify the optimal strategies or decisions required to achieve certain performance targets related to system quality, customer experience, and DevOps productivity.
Each problem has its own challenges. Take detection as an example. To ensure service health at runtime, it is important for engineers to continuously monitor various metrics and detect anomalies in a timely manner. In the development process, to ensure the quality of the continuous integration/continuous delivery (CI/CD) practice, engineers need to create mechanisms to catch defective builds and prevent them from being deployed to other production sites.
Both scenarios require timely detection, and in both there are common challenges for conducting effective detection. For example, time series data and log data are the most common data forms. Yet they are often multi-dimensional, there may be noise in the data, and they often have different detection requirements—all of which can pose significant challenges to reliable detection.
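As a toy illustration of the detection problem on a single monitored metric (not a Microsoft production detector), a rolling z-score flags points that deviate sharply from recent history:

```python
import numpy as np

def rolling_zscore_anomalies(values, window=60, threshold=4.0):
    """Flag indices whose value deviates from the trailing window mean
    by more than `threshold` standard deviations. A toy baseline for
    timely anomaly detection on one service metric time series."""
    values = np.asarray(values, dtype=float)
    flags = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = history.mean(), history.std()
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            flags.append(i)
    return flags
```

Real detectors must also cope with multi-dimensional data, noise, and scenario-specific requirements, which is exactly where the challenges described above arise.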
Microsoft Research: Our AIOps vision
Microsoft is conducting continuous research in each of the AIOps problem categories. Our goal for this research is to empower cloud systems to be more autonomous, more proactive, more manageable, and more comprehensive across the entire cloud stack.
Making cloud systems more autonomous
AIOps strives to make cloud systems more autonomous in order to minimize human operations and rule-based decisions, which significantly helps reduce the user impact of system issues, improve operational decisions, and lower maintenance costs. This is achieved by automating DevOps as much as possible, including build, deployment, monitoring, and diagnosis. For example, the purpose of safe deployment is to catch a defective build early and prevent it from rolling out to production and causing significant customer impact. Doing this manually can be extremely labor intensive and time consuming for engineers, because anomalous behaviors have a variety of patterns that may change over time, and not all anomalous behaviors are caused by a new build, which can introduce false positives.
At Microsoft Research, we used transfer learning and active learning techniques to develop a safe deployment solution that overcomes these challenges. We’ve been running the solution in Microsoft Azure, and it has been highly effective at helping to catch defective builds – achieving more than 90% precision and near 100% recall in production over a period of 18 months.
Root cause analysis is another way that AIOps is reducing human operations in cloud systems. To shorten the mitigation time, engineers in cloud systems must quickly identify the root causes of emerging incidents. Owing to the complex structure of cloud systems, however, incidents often contain only partial information and can be triggered by many services and components simultaneously, which forces engineers to spend extra time diagnosing the root causes before any effective actions can be taken. By leveraging advanced contrast-mining algorithms, we have implemented autonomous incident-diagnosis systems, including HALO and Outage Scope, to reduce response time and increase accuracy in incident diagnosis tasks. These systems have been integrated in both Azure and Microsoft 365 (M365), which has considerably improved engineers’ ability to handle incidents in cloud systems.
Making cloud systems more proactive
AIOps makes cloud systems more proactive by introducing the concept of proactive design. In the design of a proactive system, an ML-based prediction component is added to the traditional system. The prediction component takes the input signals, does the necessary processing, and outputs the future status of the system: for example, what the capacity status of cluster A will look like next week, whether a disk will fail in a few days, or how many virtual machines (VMs) of a particular type will be needed in the next hour.
Knowing the future status makes it possible for the system to proactively avoid negative system impacts. For example, engineers can live migrate the services on an unhealthy computing node to a healthy one to reduce VM downtime, or pre-provision a certain number of VMs of a particular type for the next hour to reduce the latency of VM provisioning. In addition, AI/ML techniques can enable systems to learn over time which decision to make.
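As a deliberately simple sketch of such a prediction component (hypothetical inputs, far simpler than a production forecaster), one could extrapolate the next hour's VM demand from recent hourly request counts:

```python
import numpy as np

def forecast_next_hour(hourly_requests, lookback=24):
    """Toy proactive-design forecaster: fit a linear trend to the last
    `lookback` hourly VM request counts and extrapolate one step ahead.
    Real systems would use far richer models and cross-layer signals."""
    y = np.asarray(hourly_requests[-lookback:], dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    return max(0.0, slope * len(y) + intercept)
```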
As an example of proactive design, we built a system called Narya, which proactively mitigated potential hardware failures to reduce service interruption and minimize customer impact. Narya, which is in production in Microsoft Azure, performs prediction on hardware failures and uses a bandit algorithm to decide which mitigation action to take.
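The details of Narya are in its paper; as a generic illustration of the bandit idea (hypothetical action names and a toy reward model, not Narya's algorithm), an epsilon-greedy learner over mitigation actions might look like this:

```python
import random

class EpsilonGreedyMitigation:
    """Toy epsilon-greedy bandit over mitigation actions for a node
    predicted to fail (e.g., 'live_migrate', 'soft_reboot', 'no_action').
    Rewards would come from observed customer impact after acting."""

    def __init__(self, actions, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {a: 0 for a in actions}
        self.values = {a: 0.0 for a in actions}

    def choose(self):
        # Mostly exploit the best-known action, occasionally explore.
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, action, reward):
        # Incremental mean of observed rewards for this action.
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n
```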
Making cloud systems more manageable
AIOps makes cloud systems more manageable by introducing the notion of tiered autonomy. Each tier represents a set of operations that require a certain level of human expertise and intervention. These tiers range from the top tier of autonomous routine operations to the bottom tier, which requires deep human expertise to respond to rare and complex problems.
AI-driven automation often cannot handle such problems. By building AIOps solutions targeted at each tier, we can make cloud platforms easier to manage across the long tail of rare problems that inevitably arise in complex systems. Furthermore, the tiered design ensures that autonomous systems are developed from the start to evaluate certainty and risk, and that they have safe fallbacks when automation fails or the platform faces a previously unseen set of circumstances, such as the unforeseen increase in demand in 2020 due to the COVID-19 pandemic.
As an example of tiered autonomy, we built Safe On-Node Learning (SOL), a framework for safe learning and actuation on server nodes for the top tier. As another example, we are exploring how to predict the commands that operators should perform to mitigate incidents, while considering the associated certainty and risks of those commands when the top-tier automation fails to prevent the incidents.
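A minimal, purely illustrative sketch of this tiered gating (not the SOL framework or the incident-mitigation system): an automated action runs only when the model's certainty is high and its estimated risk is low; everything else escalates to a human operator.

```python
def tiered_decision(predicted_command, certainty, risk,
                    min_certainty=0.9, max_risk=0.2):
    """Toy gate for tiered autonomy: automate only when confident and
    low-risk, otherwise fall back to a human operator."""
    if certainty >= min_certainty and risk <= max_risk:
        return ("automate", predicted_command)
    return ("escalate_to_operator", predicted_command)
```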
Making AIOps more comprehensive across the cloud stack
AIOps can also be made more comprehensive by spanning the cloud stack—from the lowest infrastructure layers (such as network and storage) through the service layer (such as the scheduler and database) and on to the application layer. The benefit of applying AIOps more broadly would be a significant increase in the capability for holistic diagnosis, optimization, and management.
Microsoft services built on top of Azure are called first-party (1P) services. A 1P setting, which is often used to optimize system resources, is particularly suited to a more comprehensive approach to AIOps. This is because with the 1P setting a single entity has visibility into, and control over, the layers of the cloud stack, which enables engineers to amplify the AIOps impact. Examples of 1P services at Microsoft include large and established services such as Office 365, relatively new but sizeable services such as Teams, and up and coming services such as Windows 365 Cloud PC. These 1P services typically account for a significant share of resource usage, such as wide-area network (WAN) traffic and compute cores.
As an example of applying a more comprehensive AIOps approach to the 1P setting, the OneCOGS project, which is a joint effort of Azure, M365, and MSR, considers three broad opportunities for optimization:
Modeling users and their workload using signals cutting across the layers — such as using the user’s messaging activity rather than fixed working hours to predict when a Cloud PC user will be active — thereby increasing accuracy and enabling appropriate allocation of system resources.
Jointly optimizing the application and the infrastructure to achieve cost savings and more.
Taming the complexity of data and configuration, thereby democratizing AIOps.
The AIOps methodologies, technologies and practices used for cloud computing platforms and 1P services are also applicable to third-party (3P) services on the cloud stack. To achieve this, further research and development are needed to make AIOps methods and techniques more general and/or easily adaptable. For example, when operating cloud services, detecting anomalies in multi-dimensional space and the subsequent fault localization are common monitoring and diagnosis problems.
Motivated by the real-world needs of Azure and M365, we proposed the technique AiDice, which automatically detects anomalies in multi-dimensional space, and HALO, a hierarchy-aware approach to locating fault-indicating combinations that uses telemetry data collected from cloud systems. In addition to deploying AiDice and HALO in Azure and M365, we’re also collaborating with product team partners to make AiDice and HALO AIOps services that can be leveraged by third-party services.
Conclusion
AIOps is a rapidly emerging technology trend and an interdisciplinary research direction across the systems, software engineering, and AI/ML communities. With years of research on Cloud Intelligence, Microsoft Research has built up rich technology assets in detection, diagnosis, prediction, and optimization. Through close collaboration with Azure and M365, we have deployed some of our technologies in production, which has created significant improvements in the reliability, performance, and efficiency of Azure and M365 while increasing the productivity of developers working on these products. In addition, we are collaborating with colleagues in academia and industry to promote AIOps research and practice. For example, through these joint efforts we have organized three editions of the AIOps Workshop at premier academic conferences: AAAI 2020, ICSE 2021, and MLSys 2022.
Moving forward, we believe that as a new dimension of innovation, Cloud Intelligence/AIOps will play an increasingly important role in making cloud systems more autonomous, more proactive, more manageable, and more comprehensive across the entire cloud stack. Ultimately, Cloud Intelligence/AIOps will help us make our vision for the future of the cloud a reality.
Business analysts work with data: they analyze, explore, and understand it to achieve effective business outcomes. To address business problems, they often rely on machine learning (ML) practitioners such as data scientists for techniques such as building ML models from existing data and generating predictions. However, this isn’t always possible, as data scientists are typically tied up with their own tasks and don’t have the bandwidth to help the analysts.
To work independently and achieve your goals as a business analyst, it helps to have easy-to-use, intuitive, visual tools that apply ML without requiring you to know the underlying details or write code. Such tools can help you solve your business problems and achieve the desired outcomes.
With the goal of helping you and your organization become more effective and use ML without writing code, we introduced Amazon SageMaker Canvas. This is a no-code ML solution that helps you build accurate ML models without the need to learn about technical details, such as ML algorithms and evaluation metrics. SageMaker Canvas offers a visual, intuitive interface that lets you import data, train ML models, perform model analysis, and generate ML predictions, all without writing a single line of code.
When using SageMaker Canvas to experiment, you may encounter data quality issues such as missing values or the wrong problem type. These issues may not be discovered until quite late in the process, after training an ML model. To alleviate this challenge, SageMaker Canvas now supports data validation. This feature proactively checks for issues in your data and provides guidance on resolutions.
In this post, we’ll demonstrate how you can use the data validation capability within SageMaker Canvas prior to model building. As the name suggests, this feature validates your dataset, reports issues, and provides useful pointers to fix them. By using better quality data, you will end up with a better performing ML model.
Validate data in SageMaker Canvas
Data Validation is a new feature in SageMaker Canvas to proactively check for potential data quality issues. After you import the data and select a target column, you’re given a choice to validate your data as shown here:
If you choose to validate your data, Canvas analyzes your data for numerous conditions including:
Too many unique labels in your target column – for the category prediction model type
Too many unique labels in your target column for the number of rows in your data – for the category prediction model type
Wrong model type for your data – the model type doesn’t fit the data you’re predicting in the Target column
Too many invalid rows – missing values in your target column
All feature columns are text columns – they will be dropped for standard builds
Too few columns – too few columns in your data
No complete rows – all of the rows in your data contain missing values
One or more column names contain double underscores – SageMaker can’t handle (__) in the column header
Details for each validation criterion are provided in later sections of this post.
If all of the checks are passed, then you’ll get the following confirmation: “No issues have been found in your dataset”.
If any issue is found, you’ll get a notification so that you can view and understand it. This surfaces data quality issues early and lets you address them immediately, before you waste time and resources further along in the process.
You can make your adjustments and keep validating your dataset until all of the issues are addressed.
Validate target column and model types
When you’re building an ML model in SageMaker Canvas, several data quality issues related to the target column may cause your model build to fail. SageMaker Canvas checks for different kinds of problems that may impact your target column.
The first check for your target column is Wrong model type for your data. For example, if a 2-category prediction model is selected but your target column has more than 2 unique labels, then SageMaker Canvas will provide the following validation warning.
If the model type is 2-category or 3+ category prediction, you should also watch for too many unique labels in your target column. The maximum number of unique classes is 2,000; if the column you select as your Target has more than 2,000 unique values, then Canvas will provide the following validation warning.
In addition to too many unique target labels, you should also beware of too many unique target labels for the number of rows in your data. SageMaker Canvas enforces that the ratio of unique target labels to the total number of rows be less than 10%. This makes sure you have enough representation for each category to build a high-quality model and reduces the potential for overfitting. Your model is considered to be overfitting when it predicts well on the training data but not on new data it hasn’t seen before. Refer here to learn more.
Finally, the last check for the target column is too many invalid rows. If your target column has more than 10% of the data missing or invalid, then it will impact your model performance, and in some cases cause your model build to fail. The following example has many missing values (>90% missing) in the target column, and you get the following validation warning.
If you get any of the above warnings for your target column, then work through the following questions to mitigate the issues:
Are you using the right target column?
Did you select the correct model type?
Can you increase the number of rows in your dataset per target label?
Can you consolidate/group similar labels together?
Can you fill-in the missing/invalid values?
Do you have enough data that you can drop the missing/invalid values?
If none of the above options clears the warning, then you should consider using a different dataset.
Aside from the target column, you may run into data quality issues with other data columns (feature columns) as well. Feature columns are the input data used to make an ML prediction.
Every dataset should have at least 1 feature column and 1 target column (2 columns in total). Otherwise, SageMaker Canvas will give you a Too few columns in your data warning. You must satisfy this requirement before you can proceed with building a model.
After that, you must make sure that your data has at least 1 numeric column. If not, then you’ll get the all feature columns are text columns warning. This is because text columns are usually dropped during standard builds, leaving the model with no features to train on, which causes the model build to fail. You can use SageMaker Canvas to encode some of the text columns as numbers, or use a quick build instead of a standard build.
The third type of warning you may get for feature columns is No complete rows. This validation checks if you have at least one row with no missing values. SageMaker Canvas requires at least one complete row, otherwise your quick build will fail. Try to fill in the missing values before building the model.
The last type of validation is One or more column names contain double underscores. This is a SageMaker Canvas specific requirement. If you have double underscores (__) in your column headers, then this will cause your quick build to fail. Rename the columns to remove any double underscores, and then try again.
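If you want to pre-screen a dataset before importing it into SageMaker Canvas, several of these checks can be approximated locally with pandas. This is an illustrative sketch, not the Canvas implementation; the thresholds mirror the ones described above, and the file and column names are hypothetical.

```python
import pandas as pd

def pre_screen(df: pd.DataFrame, target: str) -> list:
    """Approximate a few of the Canvas data validation checks locally.
    Illustrative only; Canvas performs its own validation on import."""
    issues = []
    if df.shape[1] < 2:
        issues.append("Too few columns: need at least 1 feature and 1 target.")
    if df[target].nunique() > 2000:
        issues.append("Too many unique labels in the target column (>2,000).")
    if df[target].nunique() / len(df) >= 0.10:
        issues.append("Too many unique labels for the number of rows (>=10%).")
    if df[target].isna().mean() > 0.10:
        issues.append("Too many invalid rows: >10% of target values are missing.")
    if df.dropna().empty:
        issues.append("No complete rows: every row has at least one missing value.")
    if any("__" in c for c in df.columns):
        issues.append("One or more column names contain double underscores (__).")
    return issues

# Example (hypothetical file and column names):
# issues = pre_screen(pd.read_csv("customers.csv"), target="churn")
```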
Clean up
To avoid incurring future session charges, log out of SageMaker Canvas.
Conclusion
SageMaker Canvas is a no-code ML solution that allows business analysts to create accurate ML models and generate predictions through a visual, point-and-click interface. We showed you how SageMaker Canvas helps you ensure data quality and mitigate data issues by proactively validating the dataset. By identifying issues early, SageMaker Canvas helps you build quality ML models and reduce build iterations without expertise in data science and programming. To learn more about this new feature, refer to the SageMaker Canvas documentation.
To get started and learn more about SageMaker Canvas, refer to the following resources:
Hariharan Suresh is a Senior Solutions Architect at AWS. He is passionate about databases, machine learning, and designing innovative solutions. Prior to joining AWS, Hariharan was a product architect, core banking implementation specialist, and developer, and worked with BFSI organizations for over 11 years. Outside of technology, he enjoys paragliding and cycling.
Sainath Miriyala is a Senior Technical Account Manager at AWS working for automotive customers in the US. Sainath is passionate about designing and building large-scale distributed applications using AI/ML. In his spare time Sainath spends time with family and friends.
James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.
While large-scale neural language models, such as GPT2 and BART, have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (e.g., greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in the human corpus (e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probability of repetitive tokens and their previous repetitions… (Apple Machine Learning Research)
The holiday season is approaching, and GeForce NOW has everyone covered. This GFN Thursday brings an easy way to give the gift of gaming with GeForce NOW gift cards, for yourself or for a gamer in your life.
Plus, stream 10 new games from the cloud this week, including the first story downloadable content (DLC) for Dying Light 2.
No Time Like the Present
For those seeking the best present to give any gamer, look no further than a GeForce NOW membership.
With digital gift cards, NVIDIA makes it easy for anyone to give an upgrade to GeForce PC performance in the cloud at any time of the year. And just in time for the holidays, physical gift cards will be available as well. For a limited time, these new $50 physical gift cards will ship with a special GeForce NOW holiday gift box at no additional cost, perfect to put in someone’s stocking.
These new gift cards can be redeemed for the membership level of preference, whether for three months of an RTX 3080 membership or six months of a Priority membership. Both let PC gamers stream over 1,400 games from popular digital gaming stores like Steam, Epic Games Store, Ubisoft Connect, Origin and GOG.com, all from GeForce-powered PCs in the cloud.
That means high-performance streaming on nearly any device, including PCs, Macs, Android mobile devices, iOS devices, SHIELD TV and Samsung and LG TVs. GeForce NOW is the only way to play Genshin Impact on Macs, one of the 100 free-to-play games in the GeForce NOW library.
RTX 3080 members get extra gaming goodness with dedicated access to the highest-performance servers, eight-hour gaming sessions and the ability to stream up to 4K at 60 frames per second or 1440p at 120 FPS, all at ultra-low latency.
Gift cards can be redeemed with an active GFN membership. Gift one to yourself or a buddy for hours of fun cloud gaming.
Dying Light 2’s “Bloody Ties” DLC is available now, and GeForce NOW members can stream it today.
Embark on a new story adventure and gain access to “The Carnage Hall” — an old opera building full of challenges and quests — including surprising new weapon types, character interactions and more discoveries to uncover.
Priority and RTX 3080 members can explore Villedor with NVIDIA DLSS and RTX ON for cinematic, real-time ray tracing — all while keeping an eye on their meter to avoid becoming infected themselves.
Put a Bow on It
There’s always a new adventure streaming from the cloud. Here are the 10 titles joining the GeForce NOW library this week:
On May 25, 2021, we announced the PyTorch Enterprise Support Program (ESP) that enabled providers to develop and offer tailored enterprise-grade support to their customers.
Under the program, certified service providers could develop and offer tailored enterprise-grade support to their customers by contributing hotfixes and other improvements requested by PyTorch enterprise users developing models in production at scale for mission-critical applications. However, after evaluating community feedback, we found that ongoing ESP support was not necessary at this time, and we will immediately divert these resources to other areas to improve the user experience for the entire community.
Today, we are removing the PyTorch long-term support (LTS 1.8.2) download link from the “Start Locally” download option on the “Get Started” page in order to simplify the user experience. You can still download previous versions of PyTorch, from the first public release up to the latest one. Please note that LTS 1.8.2 is only supported for Python while it is being deprecated. If there are any updates to ESP/LTS, we will cover them in future blogs.
This paper was accepted at the workshop “Causality for Real-world Impact” at NeurIPS 2022.
The Apple Watch encourages users to stand throughout the day by delivering a notification to the user’s wrist if they have been sitting for the first 50 minutes of an hour. This simple behavioral intervention exemplifies the classical definition of a nudge as a choice architecture that alters behavior without forbidding options or significantly changing economic incentives. In order to estimate from observational data the causal effect of the notification on the user’s standing probability throughout… (Apple Machine Learning Research)
This paper was accepted at the workshop “Machine Learning 4 Physical Sciences” at NeurIPS 2022.
Hybrid modelling reduces the misspecification of expert physical models with a machine learning (ML) component learned from data. Similarly to many ML algorithms, hybrid model performance guarantees are limited to the training distribution. To address this limitation, here we introduce a hybrid data augmentation strategy, termed expert augmentation. Based on a probabilistic formalization of hybrid modelling, we demonstrate that expert augmentation improves generalization. We validate the practical… (Apple Machine Learning Research)
Posted by Peter H. Li, Research Scientist, and Sven Dorkenwald, Student Researcher, Connectomics at Google
Mapping the wiring and firing activity of the human brain is fundamental to deciphering how we think — how we sense the world, learn, decide, remember, and create — as well as what issues can arise in brain disease or dysfunction. Recent efforts have delivered publicly available brain maps (high-resolution 3D mapping of brain cells and their connectivities) at unprecedented quality and scale, such as H01, a 1.4 petabyte nanometer-scale digital reconstruction of a sample of human brain tissue from Harvard / Google, and the cubic millimeter mouse cortex dataset from our colleagues at the MICrONS consortium.
To interpret brain maps at this scale requires multiple layers of analysis, including the identification of synaptic connections, cellular subcompartments, and cell types. Machine learning and computer vision technology have played a central role in enabling these analyses, but deploying such systems is still a laborious process, requiring hours of manual ground truth labeling by expert annotators and significant computational resources. Moreover, some important tasks, such as identifying the cell type from only a small fragment of axon or dendrite, can be challenging even for human experts, and have not yet been effectively automated.
Today, in “Multi-Layered Maps of Neuropil with Segmentation-Guided Contrastive Learning”, we are announcing Segmentation-Guided Contrastive Learning of Representations (SegCLR), a method for training rich, generic representations of cellular morphology (the cell’s shape) and ultrastructure (the cell’s internal structure) without laborious manual effort. SegCLR produces compact vector representations (i.e., embeddings) that are applicable across diverse downstream tasks (e.g., local classification of cellular subcompartments, unsupervised clustering), and are even able to identify cell types from only small fragments of a cell. We trained SegCLR on both the H01 human cortex dataset and the MICrONS mouse cortex dataset, and we are releasing the resulting embedding vectors, about 8 billion in total, for researchers to explore.
From brain cells segmented out of a 3D block of tissue, SegCLR embeddings capture cellular morphology and ultrastructure and can be used to distinguish cellular subcompartments (e.g., dendritic spine versus dendrite shaft) or cell types (e.g., pyramidal versus microglia cell).
Representing Cellular Morphology and Ultrastructure
SegCLR builds on recent advances in self-supervised contrastive learning. We use a standard deep network architecture to encode inputs comprising local 3D blocks of electron microscopy data (about 4 micrometers on a side) into 64-dimensional embedding vectors. The network is trained via a contrastive loss to map semantically related inputs to similar coordinates in the embedding space. This is close to the popular SimCLR setup, except that we also require an instance segmentation of the volume (tracing out individual cells and cell fragments), which we use in two important ways.
First, the input 3D electron microscopy data are explicitly masked by the segmentation, forcing the network to focus only on the central cell within each block. Second, we leverage the segmentation to automatically define which inputs are semantically related: positive pairs for the contrastive loss are drawn from nearby locations on the same segmented cell and trained to have similar representations, while inputs drawn from different cells are trained to have dissimilar representations. Importantly, publicly available automated segmentations of the human and mouse datasets were sufficiently accurate to train SegCLR without requiring laborious review and correction by human experts.
SegCLR is trained to represent rich cellular features without manual labeling. Top: The SegCLR architecture maps local masked 3D views of electron microscopy data to embedding vectors. Only the microscopy volume and a draft automated instance segmentation are required. Bottom: The segmentation is also used to define positive versus negative example pairs, whose representations are pushed closer together (positives, blue arrows) or further apart (negatives, red arrows) during training.
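The full training setup is described in the paper; the following is a simplified Python sketch (not the exact SegCLR objective or architecture) of the two roles the segmentation plays: masking each input block to its central cell, and a SimCLR-style contrastive loss in which positives are two nearby views of the same segmented cell.

```python
import numpy as np

def masked_input(em_block, seg_block, segment_id):
    """Zero out voxels that do not belong to the central cell, so the
    encoder focuses on one segmented object per input block."""
    return em_block * (seg_block == segment_id)

def contrastive_loss(z_a, z_b, temperature=0.1):
    """Simplified InfoNCE-style loss. Row i of z_a and row i of z_b are
    embeddings of two nearby views of the SAME segmented cell (a positive
    pair); all other rows act as negatives. Shapes: (batch, dim)."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching index along the diagonal is the positive for each anchor.
    return -np.mean(np.diag(log_probs))
```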
Reducing Annotation Training Requirements by Three Orders of Magnitude
SegCLR embeddings can be used in diverse downstream settings, whether supervised (e.g., training classifiers) or unsupervised (e.g., clustering or content-based image retrieval). In the supervised setting, embeddings simplify the training of classifiers, and can greatly reduce ground truth labeling requirements. For example, we found that for identifying cellular subcompartments (axon, dendrite, soma, etc.) a simple linear classifier trained on top of SegCLR embeddings outperformed a fully supervised deep network trained on the same task, while using only about one thousand labeled examples instead of millions.
We assessed the classification performance for axon, dendrite, soma, and astrocyte subcompartments in the human cortex dataset via mean F1-Score, while varying the number of training examples used. Linear classifiers trained on top of SegCLR embeddings matched or exceeded the performance of a fully supervised deep classifier (horizontal line), while using a fraction of the training data.
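A rough sketch of that workflow with scikit-learn (hypothetical arrays standing in for the released embedding vectors and a small set of expert subcompartment labels; not the exact evaluation code from the paper):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# embeddings: (n_examples, 64) SegCLR vectors; labels: subcompartment ids
# for a small labeled subset (e.g., ~1,000 examples). Hypothetical arrays.
def train_subcompartment_classifier(train_emb, train_labels,
                                    test_emb, test_labels):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_emb, train_labels)
    preds = clf.predict(test_emb)
    return clf, f1_score(test_labels, preds, average="macro")
```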
Distinguishing Cell Types, Even from Small Fragments
Distinguishing different cell types is an important step towards understanding how brain circuits develop and function in health and disease. Human experts can learn to identify some cortical cell types based on morphological features, but manual cell typing is laborious and ambiguous cases are common. Cell typing also becomes more difficult when only small fragments of cells are available, which is common for many cells in current connectomic reconstructions.
Human experts manually labeled cell types for a small number of proofread cells in each dataset. In the mouse cortex dataset, experts labeled six neuron types (top) and four glia types (not shown). In the human cortex dataset, experts labeled two neuron types (not shown) and four glia types (bottom). (Rows not to scale with each other.)
We found that SegCLR accurately infers human and mouse cell types, even for small fragments. Prior to classification, we collected and averaged embeddings within each cell over a set aggregation distance, defined as the radius from a central point. We found that human cortical cell types can be identified with high accuracy for aggregation radii as small as 10 micrometers, even for types that experts find difficult to distinguish, such as microglia (MGC) versus oligodendrocyte precursor cells (OPC).
SegCLR can classify cell types, even from small fragments. Left: Classification performance over six human cortex cell types for shallow ResNet models trained on SegCLR embeddings for different sized cell fragments. Aggregation radius zero corresponds to very small fragments with only a single embedding. Cell type performance reaches high accuracy (0.938 mean F1-Score) for fragments with aggregation radii of only 10 micrometers (boxed point). Right: Class-wise confusion matrix at 10 micrometers aggregation radius. Darker shading along the diagonal indicates that predicted cell types agree with expert labels in most cases. AC: astrocyte; MGC: microglia cell; OGC: oligodendrocyte cell; OPC: oligodendrocyte precursor cell; E: excitatory neuron; I: inhibitory neuron.
In the mouse cortex, ten cell types could be distinguished with high accuracy at aggregation radii of 25 micrometers.
Left: Classification performance over the ten mouse cortex cell types reaches 0.832 mean F1-Score for fragments with aggregation radius 25 micrometers (boxed point). Right: The class-wise confusion matrix at 25 micrometers aggregation radius. Boxes indicate broad groups (glia, excitatory neurons, and inhibitory interneurons). P: pyramidal cell; THLC: thalamocortical axon; BC: basket cell; BPC: bipolar cell; MC: Martinotti cell; NGC: neurogliaform cell.
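The aggregation step described above can be sketched as follows, assuming hypothetical arrays of per-location embeddings and their 3D coordinates (in micrometers) for a single cell fragment:

```python
import numpy as np

def aggregate_embeddings(embeddings, coords, center, radius_um=10.0):
    """Average the embedding vectors of all locations within `radius_um`
    micrometers of a central point on the cell fragment, producing one
    vector to feed to the cell-type classifier. Illustrative only."""
    dists = np.linalg.norm(coords - center, axis=1)
    selected = embeddings[dists <= radius_um]
    return selected.mean(axis=0) if len(selected) else None
```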
Finally, we showed how SegCLR can be used for automated analysis of brain connectivity by cell typing the synaptic partners of reconstructed cells throughout the mouse cortex dataset. Knowing the connectivity patterns between specific cell types is fundamental to interpreting large-scale connectomic reconstructions of brain wiring, but this typically requires manual tracing to identify partner cell types. Using SegCLR, we replicated brain connectivity findings that previously relied on intensive manual tracing, while extending their scale in terms of the number of synapses, cell types, and brain areas analyzed. (See the paper for further details.)
SegCLR automated analysis of brain connectivity. Top: An example mouse pyramidal cell, with synapse locations color-coded according to whether the synaptic partner was classified as inhibitory (blue), excitatory (red), or unknown (black). Inset shows higher detail of the soma and proximal dendrites. Bottom: We counted how many upstream synaptic partners were classified as thalamocortical axons, which bring input from sensory systems to the cortex. We found that thalamic input arrives primarily at cortical layer L4, the canonical cortical input layer, and preferentially targets primary visual area V1, rather than higher visual areas (HVA).
What’s Next?
SegCLR captures rich cellular features and can greatly simplify downstream analyses compared to working directly with raw image and segmentation data. We are excited to see what the community can discover using the ~8 billion embeddings we are releasing for the human and mouse cortical datasets (example access code; browsable human and mouse views in Neuroglancer). By reducing complex microscopy data to rich and compact embedding representations, SegCLR opens many novel avenues for biological insight, and may serve as a link to complementary modalities for high-dimensional characterization at the cellular and subcellular levels, such as spatially-resolved transcriptomics.
Anyone who’s taken a photo with a digital camera is likely familiar with a “noisy” image: discolored spots that make the photo lose clarity and sharpness.
Many photographers have tips and tricks to reduce noise in images, including fixing the settings on the camera lens or taking photos in different lighting. But it isn’t just photographs that can look discolored — noise is common in computer graphics, too.
Noise refers to the random variations of brightness and color that aren’t part of the original image. Removing noise from imagery — which is becoming more common in the field of image processing and computer vision — is known as denoising.
Image denoising uses advanced algorithms to remove noise from graphics and renders, making a huge difference to the quality of images. Photorealistic visuals and immersive renders could not be possible without denoising technology.
What Is Denoising?
In computer graphics, images can be made up of both useful information and noise. The latter reduces clarity. The ideal end product of denoising would be a crisp image that only preserves the useful information. When denoising an image, it’s also important to keep visual details and components such as edges, corners, textures and other sharp structures.
To reduce noise without affecting the visual details, three types of signals in an image must be targeted by denoising:
Diffuse — scattered lighting reflected in all directions;
Specular or reflections — lighting reflected in a particular direction; and
Infinite light-source shadows — sunlight, shadows and any other visible light source.
To create the clearest image, a user must cast thousands of rays in directions following the diffuse and specular signals. Often in real-time ray tracing, however, only one ray per pixel, or even fewer, is used.
Denoising is necessary in real-time ray tracing because ray counts must be kept relatively low to maintain interactive performance.
How Does Denoising Work?
Image denoising is commonly based on three techniques: spatial filtering, temporal accumulation, and machine learning and deep learning reconstruction.
Spatial filtering selectively alters parts of an image by reusing similar neighboring pixels. The advantage of spatial filtering is that it doesn’t produce temporal lag, which is the inability to immediately respond to changing flow conditions. However, spatial filtering introduces blurriness and muddiness, as well as temporal instability, which refers to flickering and visual imperfections in the image.
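As a toy illustration of spatial filtering (not the NRD implementation), a Gaussian filter replaces each pixel with a weighted average of its neighbors, which suppresses noise at the cost of some sharpness:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def spatial_denoise(noisy_image, sigma=1.5):
    """Toy spatial filter on a single-channel image: each pixel becomes a
    Gaussian-weighted average of its neighbors. Reduces noise but adds
    blur, which is why production denoisers combine several techniques."""
    return gaussian_filter(np.asarray(noisy_image, dtype=float), sigma=sigma)
```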
Temporal accumulation reuses data from the previous frame to determine if there are any artifacts — or visual anomalies — in the current frame that can be corrected. Although temporal accumulation introduces temporal lag, it doesn’t produce blurriness. Instead, it adds temporal stability to reduce flickering and artifacts over multiple frames.
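Temporal accumulation can likewise be sketched as an exponential moving average over frames. This toy version omits the motion-vector reprojection and history rejection that real-time denoisers rely on:

```python
import numpy as np

def temporal_accumulate(history, current_frame, alpha=0.1):
    """Blend the current noisy frame into the accumulated history.
    Smaller alpha gives more temporal stability but more temporal lag."""
    current = np.asarray(current_frame, dtype=float)
    if history is None:
        return current
    return (1.0 - alpha) * history + alpha * current
```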
Machine learning and deep learning reconstruction uses a neural network to reconstruct the signal. The neural network is trained using various noisy and reference signals. Though the reconstructed signal for a single frame can look complete, it can become temporally unstable over time, so a form of temporal stabilization is needed.
Denoising in Images
Denoising provides users with immediate visual feedback, so they can see and interact with graphics and designs. This allows them to experiment with variables like light, materials, viewing angle and shadows.
Solutions like NVIDIA Real-Time Denoisers (NRD) make denoising techniques more accessible for developers to integrate into pipelines. NRD is a spatio-temporal denoising library that’s agnostic to application programming interfaces and designed to work with low rays per pixel.
NRD uses input signals and environmental conditions to deliver results comparable to ground-truth images. See NRD in action below:
With NRD, developers can achieve real-time results using a limited budget of rays per pixel. In the video above, viewers can see the heavy lifting that NRD does in real time to resolve image noise.
Popular games such as Dying Light 2 and Hitman III use NRD for denoising.
NRD supports the denoising of diffuse, specular or reflections, and shadow signals. The denoisers included in NRD are:
ReBLUR — based on the idea of self-stabilizing, recurrent blurring. It’s designed to work with diffuse and specular signals generated with low ray budgets.
SIGMA — a fast shadow denoiser. It supports shadows from any type of light source, like the sun and local lights.
ReLAX — preserves lighting details produced by NVIDIA RTX Direct Illumination, a framework that enables developers to render scenes with millions of dynamic area lights in real time. ReLAX also yields better temporal stability and remains responsive to changing lighting conditions.