NeurIPS 2022: Seven Microsoft Research Papers Selected for Oral Presentations
Microsoft is proud to be a platinum sponsor of the 36th annual conference on Neural Information Processing Systems (NeurIPS), which is widely regarded as the world’s most prestigious research conference on artificial intelligence and machine learning.
Microsoft has a strong presence at NeurIPS again this year, with more than 150 of our researchers participating in the conference and 122 of our research papers accepted. Our researchers are also taking part in 10 workshops, four competitions and a tutorial.
In one of the workshops, AI for Science: Progress and Promises, a panel of leading researchers will discuss how artificial intelligence and machine learning have the potential to advance scientific discovery. The panel will include two Microsoft researchers: Max Welling, Vice President and Distinguished Scientist, Microsoft Research AI4Science, who will serve as moderator, and Peter Lee, Corporate Vice President, Microsoft Research and Incubations.
Of the 122 Microsoft research papers accepted for the conference, seven have been selected for oral presentations during the virtual NeurIPS experience the week of December 4th. The oral presentations provide a deeper dive into each of the featured research topics.
In addition, two other Microsoft research papers received Outstanding Paper Awards for NeurIPS 2022. One of those papers, Gradient Estimation with Discrete Stein Operators, explains how researchers developed a gradient estimator that achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations, which has the potential to improve problem solving in machine learning. In the other paper, A Neural Corpus Indexer for Document Retrieval, researchers demonstrate that an end-to-end deep neural network that unifies training and indexing stages can significantly improve the recall performance of traditional document retrieval methods.
Below we have provided the titles, authors and abstracts for all seven of the Microsoft research papers chosen for oral presentations at NeurIPS, with links to additional information for those who want to explore the topics more fully:
Uni[MASK]: Unified Inference in Sequential Decision Problems
Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin
Abstract: Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the UniMASK framework, which provides a unified way to specify models which can be trained on many different sequential decision-making tasks. We show that a single UniMASK model is often capable of carrying out many tasks with performance similar to or better than single-task models. Additionally, after fine tuning, our UniMASK models consistently outperform comparable single-task models.
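To make the masking idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of how different visibility masks over a toy trajectory of states and actions correspond to different sequential decision-making tasks:

```python
import numpy as np

T = 4  # timesteps in a toy trajectory of states s_0..s_3 and actions a_0..a_3

def make_mask(task: str) -> dict:
    """Return visibility masks (1 = token given to the model, 0 = token to predict)."""
    states = np.ones(T, dtype=int)
    actions = np.ones(T, dtype=int)
    if task == "behavior_cloning":         # states given, predict every action
        actions[:] = 0
    elif task == "forward_dynamics":       # first state and actions given, predict later states
        states[1:] = 0
    elif task == "waypoint_conditioning":  # only start and goal states given
        states[1:-1] = 0
        actions[:] = 0
    elif task == "random_mask":            # BERT-style random masking for pre-training
        states = np.random.binomial(1, 0.5, T)
        actions = np.random.binomial(1, 0.5, T)
    return {"states_visible": states, "actions_visible": actions}

for task in ["behavior_cloning", "forward_dynamics", "waypoint_conditioning"]:
    print(task, make_mask(task))
```

A single model trained over many such maskings can then be queried at inference time with whichever mask matches the task at hand.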
K-LITE: Learning Transferable Visual Models with External Knowledge
Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao
Abstract: The new generation of state-of-the-art computer vision systems is trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, based on the broad concept coverage achieved through a large-scale data collection process. Alternatively, we argue that learning with external knowledge about images is a promising way that leverages a much more structured source of supervision and offers sample efficiency.
In this paper, we propose K-LITE (Knowledge-augmented Language-Image Training and Evaluation), a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in natural language with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts; In evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones) to enable zero-shot and few-shot transfer of the pre-trained models. We study the performance of K-LITE on two important computer vision problems, image classification and object detection, benchmarking on 20 and 13 different existing datasets, respectively. The proposed knowledge-augmented models show significant improvement in transfer learning performance over existing methods. Our code is released at https://github.com/microsoft/klite.
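The knowledge-augmentation step can be pictured with a small sketch. The toy dictionary below stands in for WordNet/Wiktionary glosses and is purely illustrative; the released implementation lives at the GitHub link above.

```python
# A minimal sketch of knowledge-augmented prompting in the spirit of K-LITE.
# The glosses below are a toy stand-in for WordNet/Wiktionary entries.
external_knowledge = {
    "tench":  "a freshwater game fish of Europe and western Asia",
    "beagle": "a small short-legged smooth-coated breed of hound",
    "abacus": "a calculator that performs arithmetic by sliding counters along rods",
}

def knowledge_augmented_prompt(class_name: str) -> str:
    """Enrich a bare class name with an external-knowledge gloss before text encoding."""
    gloss = external_knowledge.get(class_name, "")
    prompt = f"a photo of a {class_name}"
    return f"{prompt}, which is {gloss}" if gloss else prompt

prompts = [knowledge_augmented_prompt(c) for c in ["tench", "beagle", "laptop"]]
# These prompts would then be fed to the text encoder of a CLIP-style model and
# compared against image embeddings for zero-shot or few-shot classification.
print(prompts)
```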
Extreme Compression for Pre-trained Transformers Made Simple and Efficient
Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He
Abstract: Extreme compression, particularly ultra-low bit precision (binary/ternary) quantization, has been proposed to fit large NLP models on resource-constrained devices. However, to preserve the accuracy for such aggressive compression schemes, cutting-edge methods usually introduce complicated compression pipelines, e.g., multi-stage expensive knowledge distillation with extensive hyperparameter tuning. Also, they oftentimes focus less on smaller transformer models that have already been heavily compressed via knowledge distillation and lack a systematic study to show the effectiveness of their methods.
In this paper, we perform a comprehensive systematic study to measure the impact of many key hyperparameters and training strategies from previous works. As a result, we find that previous baselines for ultra-low bit precision quantization are significantly under-trained. Based on our study, we propose a simple yet effective compression pipeline for extreme compression.
Our simplified pipeline demonstrates that:
(1) we can skip the pre-training knowledge distillation to obtain a 5-layer BERT while achieving better performance than previous state-of-the-art methods, like TinyBERT;
(2) extreme quantization plus layer reduction is able to reduce the model size by 50x, resulting in new state-of-the-art results on GLUE tasks.
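For readers unfamiliar with ternary quantization, the sketch below shows the basic operation in isolation (per-tensor thresholding and scaling, in the style of ternary weight networks); it illustrates only the quantization step, not the paper's pipeline with distillation and layer reduction.

```python
import torch

def ternarize(weight: torch.Tensor, threshold_ratio: float = 0.7) -> torch.Tensor:
    """Toy ternary quantization: map weights to {-alpha, 0, +alpha}."""
    delta = threshold_ratio * weight.abs().mean()                   # per-tensor threshold
    mask = (weight.abs() > delta).float()                           # which weights survive
    sign = weight.sign() * mask
    alpha = (weight.abs() * mask).sum() / mask.sum().clamp(min=1)   # scaling factor
    return alpha * sign                                             # dequantized ternary weights

w = torch.randn(768, 768)
w_q = ternarize(w)
print("unique levels:", torch.unique(w_q).numel())                  # 3 levels: -alpha, 0, +alpha
print("relative error:", ((w - w_q).norm() / w.norm()).item())
```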
On the Complexity of Adversarial Decision Making
Dylan J Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan
Abstract: A central problem in online learning and decision making—from bandits to reinforcement learning—is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision-making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result is to show—via new upper and lower bounds—that the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. in the stochastic counterpart to our setting, is necessary and sufficient to obtain low regret for adversarial decision making. However, compared to the stochastic setting, one must apply the Decision-Estimation Coefficient to the convex hull of the class of models (or, hypotheses) under consideration. This establishes that the price of accommodating adversarial rewards or dynamics is governed by the behavior of the model class under convexification, and recovers a number of existing results, both positive and negative. En route to obtaining these guarantees, we provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures, including the Information Ratio of Russo and Van Roy and the Exploration-by-Optimization objective of Lattimore and György.
Maximum Class Separation as Inductive Bias in One Matrix
Tejaswi Kasarla, Gertjan J. Burghouts, Max van Spengler, Elise van der Pol, Rita Cucchiara, Pascal Mettes
Abstract: Maximizing the separation between classes constitutes a well-known inductive bias in machine learning and a pillar of many traditional algorithms. By default, deep networks are not equipped with this inductive bias and therefore many alternative solutions have been proposed through differential optimization. Current approaches tend to optimize classification and separation jointly: aligning inputs with class vectors and separating class vectors angularly.
This paper proposes a simple alternative: encoding maximum separation as an inductive bias in the network by adding one fixed matrix multiplication before computing the softmax activations. The main observation behind our approach is that separation does not require optimization but can be solved in closed-form prior to training and plugged into a network. We outline a recursive approach to obtain the matrix consisting of maximally separable vectors for any number of classes, which can be added with negligible engineering effort and computational overhead. Despite its simple nature, this one matrix multiplication provides real impact. We show that our proposal directly boosts classification, long-tailed recognition, out-of-distribution detection, and open-set recognition, from CIFAR to ImageNet. We find empirically that maximum separation works best as a fixed bias; making the matrix learnable adds nothing to the performance. The closed-form implementation and code to reproduce the experiments are available on GitHub.
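The recursive approach mentioned in the abstract amounts to constructing the vertices of a regular simplex, which can be written in a few lines. The sketch below follows that standard construction and is not the authors' released code; the fixed matrix is then applied before the softmax.

```python
import numpy as np

def max_separation_matrix(num_classes: int) -> np.ndarray:
    """Recursively build a (num_classes-1, num_classes) matrix whose columns are
    unit vectors with maximal pairwise separation (regular-simplex vertices with
    pairwise cosine similarity of -1/(num_classes-1))."""
    if num_classes == 2:
        return np.array([[1.0, -1.0]])
    sub = max_separation_matrix(num_classes - 1)                       # (C-2, C-1)
    top = np.concatenate(([1.0], np.full(num_classes - 1, -1.0 / (num_classes - 1))))
    scale = np.sqrt(1.0 - 1.0 / (num_classes - 1) ** 2)
    bottom = np.concatenate((np.zeros((num_classes - 2, 1)), scale * sub), axis=1)
    return np.vstack((top, bottom))

P = max_separation_matrix(10)            # e.g., CIFAR-10
print(np.round(P.T @ P, 3))              # off-diagonals are all -1/9: maximal separation
# In a network, the fixed matrix is applied before the softmax:
#   logits = features @ W   (shape [batch, C-1]);   class_scores = logits @ P
```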
Censored Quantile Regression Neural Networks for Distribution-Free Survival Analysis
Tim Pearce, Jong-Hyeon Jeong, Yichen Jia, Jun Zhu
Abstract: This paper considers doing quantile regression on censored data using neural networks (NNs). This adds to the survival analysis toolkit by allowing direct prediction of the target variable, along with a distribution-free characterization of uncertainty, using a flexible function approximator. We begin by showing how an algorithm popular in linear models can be applied to NNs. However, the resulting procedure is inefficient, requiring sequential optimization of an individual NN at each desired quantile. Our major contribution is a novel algorithm that simultaneously optimizes a grid of quantiles output by a single NN. To offer theoretical insight into our algorithm, we show firstly that it can be interpreted as a form of expectation-maximization, and secondly that it exhibits a desirable 'self-correcting' property. Experimentally, the algorithm produces quantiles that are better calibrated than existing methods on 10 out of 12 real datasets.
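As background, the building block for optimizing a grid of quantiles with one network is the pinball (quantile) loss. The sketch below shows that piece only and deliberately omits the censoring handling that is the paper's main contribution.

```python
import torch

def pinball_loss_grid(preds: torch.Tensor, target: torch.Tensor, quantiles: torch.Tensor):
    """Pinball (quantile) loss averaged over a grid of quantiles.

    preds:     [batch, n_quantiles] outputs of a single network head
    target:    [batch] observed times (censoring handling is omitted in this toy sketch)
    quantiles: [n_quantiles], e.g. torch.linspace(0.1, 0.9, 9)
    """
    err = target.unsqueeze(1) - preds                              # [batch, n_quantiles]
    loss = torch.maximum(quantiles * err, (quantiles - 1) * err)   # asymmetric penalty
    return loss.mean()

quantiles = torch.linspace(0.1, 0.9, 9)
net = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 9))
x, t = torch.randn(32, 16), torch.rand(32)
loss = pinball_loss_grid(net(x), t, quantiles)
loss.backward()
```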
Learning (Very) Simple Generative Models Is Hard
Sitan Chen, Jerry Li, Yuanzhi Li
Abstract: Motivated by the recent empirical successes of deep generative models, we study the computational complexity of the following unsupervised learning problem. For an unknown neural network \(F:\mathbb{R}^d\to\mathbb{R}^{d'}\), let \(D\) be the distribution over \(\mathbb{R}^{d'}\) given by pushing the standard Gaussian \(\mathcal{N}(0,\mathrm{Id}_d)\) through \(F\). Given i.i.d. samples from \(D\), the goal is to output any distribution close to \(D\) in statistical distance.
We show under the statistical query (SQ) model that no polynomial-time algorithm can solve this problem even when the output coordinates of \(F\) are one-hidden-layer ReLU networks with \(\log(d)\) neurons. Previously, the best lower bounds for this problem simply followed from lower bounds for supervised learning and required at least two hidden layers and \(\mathrm{poly}(d)\) neurons [Daniely-Vardi '21, Chen-Gollakota-Klivans-Meka '22].
The key ingredient in our proof is an ODE-based construction of a compactly supported, piecewise-linear function \(f\) with polynomially bounded slopes such that the pushforward of \(\mathcal{N}(0,1)\) under \(f\) matches all low-degree moments of \(\mathcal{N}(0,1)\).
The post NeurIPS 2022: Seven Microsoft Research Papers Selected for Oral Presentations appeared first on Microsoft Research.
Research Focus: Week of November 28, 2022
Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
Knowledge distillation (KD) is effective in compressing large pre-trained language models, where we train a small student model to mimic the output distribution of a large teacher model (e.g., BERT, GPT-X). KD relies on hand-designed student model architectures that require several trials and pre-specified compression rates. In our paper, Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models, we discuss AutoDistil, a new technique pioneered by Microsoft Research that leverages advances in neural architecture search (NAS) to automatically generate a suite of compressed models with variable computational cost (e.g., varying sizes, FLOPs and latency). NAS for distillation addresses the challenge of hand-engineering compressed model architectures for diverse deployment environments with variable resource constraints by providing an automated framework. AutoDistil-generated compressed models obtain up to a 41x reduction in FLOPs with limited regression in task performance, and a 6x FLOPs reduction with performance parity with the large teacher model. Given any state-of-the-art compressed model, AutoDistil finds a compressed variant with a better trade-off between task performance and computational cost during inference.
Neuron with steady response leads to better generalization
Qiang Fu, Lun Du, Haitao Mao, Xu Chen, Wei Fang, Shi Han, Dongmei Zhang
Improving models’ ability to generalize is one of the most important research problems in machine learning. Deep neural networks with diverse architectures have been invented and widely applied to various domains and tasks. Our goal was to study and identify the fundamental properties commonly shared by different kinds of deep neural networks, and then design a generic technique applicable for all of them to improve their generalization.
In this paper, we study the characteristics of individual neurons' responses during training, at the neuron level of granularity. We find that keeping the response of activated neurons stable for the same class helps improve models' ability to generalize. This is a new regularization perspective based on the neuron-level, class-dependent response distribution. We also observed that vanilla models usually lack good steadiness of intra-class responses. Based on these observations, we designed a generic regularization method, Neuron Steadiness Regularization (NSR), to reduce large intra-class neuron response variance. NSR is computationally efficient and applicable to various architectures and tasks. Significant improvements are obtained in extensive experiments with multiple types of datasets and various network architectures. We will continue this research to further improve model generalization.
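One plausible reading of an intra-class variance penalty, written as a toy PyTorch function, is shown below. This is an interpretation for illustration only, not the paper's exact formulation of NSR.

```python
import torch

def neuron_steadiness_penalty(activations: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Toy intra-class steadiness penalty: for each class, penalize the variance of
    each neuron's response across samples of that class (one reading of the NSR idea).

    activations: [batch, n_neurons] responses of one hidden layer
    labels:      [batch] class labels
    """
    penalty = torch.tensor(0.0)
    classes = labels.unique()
    for c in classes:
        group = activations[labels == c]                 # responses for one class
        if group.shape[0] > 1:
            penalty = penalty + group.var(dim=0, unbiased=False).mean()
    return penalty / classes.numel()

# Usage sketch: total_loss = cross_entropy + lambda_nsr * neuron_steadiness_penalty(h, y)
h, y = torch.relu(torch.randn(64, 128)), torch.randint(0, 10, (64,))
print(neuron_steadiness_penalty(h, y))
```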
Long-form video-language pre-training with multimodal temporal contrastive learning
Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu
Huge numbers of videos on diverse topics and of various lengths are shared on social media. Analyzing and understanding these videos is an important but challenging problem. Previous work on action and scene recognition has been limited to certain labels, while neglecting the rich semantic and dynamic information in other videos. Inspired by the cross-modal pre-training paradigm in the image-language domain (e.g., CLIP, Florence), researchers have explored video-language joint pre-training, which has mainly relied on short-form videos.
In this research, we propose a Long-Form VIdeo-LAnguage pre-training model (LF-VILA) to explore long-form video representation learning, and train it on a long-form video-language dataset (LF-VILA-8M) built on the basis of our newly collected video-language dataset (HD-VILA-100M). We then design a Multimodal Temporal Contrastive (MTC) loss to capture the temporal relation between video clips and single sentences. We also propose the Hierarchical Temporal Window Attention (HTWA) mechanism on the video encoder to reduce the training time by one-third. Our model achieves significant improvements on nine benchmarks, including paragraph-to-video retrieval, long-form video question-answering, and action recognition tasks. In the future, we will explore using it in broader scenarios, such as ego-centric video understanding.
Microsoft Research Causality and ML team features multiple papers and workshops at NeurIPS 2022
Parikshit Bansal, Ranveer Chandra, Eleanor Dillon, Saloni Dash, Rui Ding, Darren Edge, Adam Foster, Wenbo Gong, Shi Han, Agrin Hilmkil, Joel Jennings, Jian Jiao, Emre Kıcıman, Hua Li, Chao Ma, Sara Malvar, Robert Ness, Nick Pawlowski, Yashoteja Prabhu, Eduardo Rodrigues, Amit Sharma, Swati Sharma, Cheng Zhang, Dongmei Zhang
Identifying causal effects is an integral part of scientific inquiry, helping us to understand everything from educational outcomes to the effects of social policies to risk factors for diseases. Questions of cause-and-effect are also critical for the design and data-driven improvement and evaluation of business and technological systems we build today. The intersection of causal analysis and machine learning is driving rapid advances. Microsoft researchers are excited to be presenting three papers at NeurIPS, along with workshops on new methods and their applications. This includes work improving deep methods for causal discovery, applying causal insights to improve responsible language models, and improving soil carbon modeling with causal approaches. To accelerate research and broaden adoption of the latest causal methods, Microsoft researchers are co-organizing the Workshop on Causality for Real-world Impact and releasing new no-code interactive ShowWhy tools for causal discovery and analysis. We encourage NeurIPS attendees to learn more via the links below or stop by the Microsoft booth for demos and talks.
Main conference papers
Workshop papers
Workshop on Causality for Real-world Impact
- Making A Causal AI Suite for Decision-Making
- The Counterfactual-Shapley Value: Attributing Change in System Metrics
- Counterfactual Generation Under Confounding
- Deep End-to-end Causal Inference
- Rhino: Deep Causal Temporal Relationship Learning with history-dependent noise
- Causal Reasoning in the Presence of Latent Confounders via Neural ADMG Learning
Workshop on Tackling Climate Change with Machine Learning
Workshop on Distribution Shifts
Workshop on Understanding Deep Learning Through Empirical Falsification (“I can’t believe it’s not better”)
We’ll be participating in the panel.
Causal AI Software Resources
- Download: Causal No-Code Tools (ShowWhy)
New research on generative models
Two papers covering new research on generative models will be presented at NeurIPS 2022.
Vikas Raunak, Matt Post, Arul Menezes
The first paper, Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models, presents recommendations on the evaluation of state-of-the-art generative models for constrained generation tasks. The progress on generative models has been rapid in recent years. These large-scale models have had three impacts: 1) The fluency of generation in both language and vision modalities has rendered common average-case evaluation metrics much less useful in diagnosing system errors; 2) The same substrate models now form the basis of a number of applications, driven both by the utility of their representations as well as phenomena such as in-context learning, which raise the abstraction level of interacting with such models; 3) The user expectations around these models have made the technical challenge of out-of-domain generalization much less excusable in practice. However, our evaluation methodologies haven't adapted to these changes. More concretely, while the associated utility and methods of interacting with generative models have expanded, a similar expansion has not been observed in their evaluation practices. In this paper, we argue that the scale of generative models could be exploited to raise the abstraction level at which evaluation itself is conducted and provide recommendations for the same. Our recommendations are based on leveraging specifications as a powerful instrument to evaluate generation quality and are readily applicable to a variety of tasks.
- Publication: Rank-One Editing of Encoder-Decoder Models
The second paper is Rank-One Editing of Encoder-Decoder Models. Here, we look at large sequence-to-sequence models for tasks such as neural machine translation (NMT), which are usually trained over hundreds of millions of samples. However, training is just the origin of a model's life-cycle. Real-world deployments of models require further behavioral adaptations as new requirements emerge or shortcomings become known. Typically, in the space of model behaviors, behavior deletion requests are addressed through model retraining, whereas model fine-tuning is used to address behavior addition requests. Both procedures are instances of data-based model intervention. In this work, we present a preliminary study investigating rank-one editing as a direct intervention method for behavior deletion requests in encoder-decoder transformer models. We propose four editing tasks for NMT and show that the proposed editing algorithm achieves high efficacy, while requiring only a single positive example to fix an erroneous (negative) model behavior. This research therefore explores a path towards fixing the deleterious behaviors of encoder-decoder models for tasks such as translation, making them safer and more reliable without investing in a huge computational budget.
- Venue: The Second Workshop On Interactive Learning For Natural Language Processing (InterNLP 2022)
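For intuition, the generic rank-one intervention on a weight matrix looks like the sketch below. How the key/value pair is chosen for translation behaviors is the subject of the paper above; this toy code shows only the algebra of the update.

```python
import torch

def rank_one_edit(weight: torch.Tensor, key: torch.Tensor, new_value: torch.Tensor):
    """Apply a rank-one update so that weight @ key maps to new_value.

    Generic intervention W' = W + (v_new - W k) k^T / ||k||^2; selecting the
    key/value pair for a specific model behavior is the hard part and is not shown.
    """
    residual = new_value - weight @ key                    # what the edit must add
    update = torch.outer(residual, key) / key.dot(key)     # rank-one correction
    return weight + update

W = torch.randn(512, 512)
k, v_new = torch.randn(512), torch.randn(512)
W_edited = rank_one_edit(W, k, v_new)
print(torch.allclose(W_edited @ k, v_new, atol=1e-3))      # the edited mapping now holds
```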
Award Winner: A Neural Corpus Indexer for Document Retrieval
Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Hao Sun, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, Xing Xie, Hao Allen Sun, Weiwei Deng, Qi Zhang, Mao Yang
Note: this paper was named an Outstanding Paper at NeurIPS 2022
Current state-of-the-art document retrieval solutions typically follow an index-retrieve paradigm, where the index is not directly optimized towards the final target. The proposed Neural Corpus Indexer (NCI) model, instead, leverages a sequence-to-sequence architecture, which serves as a model-based index that takes a query as input and outputs the most relevant document identifiers. For the first time, we demonstrate that an end-to-end differentiable document retrieval model can significantly outperform both sparse inverted index and dense retrieval methods. Specifically, NCI achieves +17.6% and +16.8% relative enhancement for Recall@1 on NQ320k dataset and R-Precision on TriviaQA dataset respectively, and a competitive MRR without using an explicit re-ranking model. This work has received a NeurIPS 2022 Outstanding Paper award.
The pipeline is composed of three stages. In the first stage, documents are encoded into semantic identifiers by a hierarchical k-means algorithm. In the second stage, a query generation model is employed to prepare training pairs. In the third stage, the NCI is trained with cross-entropy and consistency-based regularization losses. To further align with the hierarchical nature of the semantic identifiers, a weight adaptation mechanism is introduced to make the decoder aware of semantic prefixes. During inference, the top N relevant documents can be easily obtained via beam search. The proposed approach introduces architectural and training choices that demonstrate the promising future of neural indexers as a viable alternative, and the open questions discussed in the paper can serve as inspiration for future research.
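A toy version of the first indexing stage can make the idea of prefix-structured semantic identifiers concrete. The sketch below is an illustration with random stand-in embeddings, not the NCI implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def hierarchical_ids(embeddings, doc_idx=None, k=4, max_leaf=8, prefix=()):
    """Recursively cluster document embeddings with k-means so each document gets a
    semantic identifier (a tuple of cluster digits) whose prefixes are shared by
    semantically similar documents."""
    if doc_idx is None:
        doc_idx = np.arange(len(embeddings))
    if len(doc_idx) <= max_leaf:                          # leaf: enumerate remaining docs
        return {int(d): prefix + (i,) for i, d in enumerate(doc_idx)}
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(embeddings[doc_idx])
    ids = {}
    for c in range(k):
        members = doc_idx[labels == c]
        ids.update(hierarchical_ids(embeddings, members, k, max_leaf, prefix + (c,)))
    return ids

docs = np.random.randn(200, 64)                           # stand-in document embeddings
semantic_ids = hierarchical_ids(docs)
print(semantic_ids[0])                                    # e.g. (2, 0, 5): a prefix-coded ID
# The seq2seq model is then trained to decode such identifiers from queries, and beam
# search over the decoder yields the top-N documents at query time.
```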
Microsoft Research career opportunities – come join us!
We’re hiring for multiple roles including internships and researchers at all levels in multiple Microsoft Research labs. Join us and work on causal ML, precision health, genomics, deep learning, robotics, or computational chemistry. If you’re attending the conference, stop by the Microsoft booth (Expo Hall G, Booth #202) to speak with researchers and recruiters about working at Microsoft and open job opportunities. Or you can browse our current openings at NeurIPS 2022 – Microsoft Research career opportunities.
The post Research Focus: Week of November 28, 2022 appeared first on Microsoft Research.
Research trends in privacy, security and cryptography
Trust is essential for people and organizations to use technology with confidence. At Microsoft, we strive to earn the trust of our customers, employees, communities, and partners by committing to privacy, security, the responsible use of AI, and transparency.
At Microsoft Research, we take on this challenge by creating and using state-of-the-art tools and technologies that support a proactive, integrated approach to security across all layers of the digital estate.
Threats to cybersecurity are constant and they continue to grow, impacting organizations and individuals everywhere. Attack tools are readily available and well-funded adversaries now have the capability to cause unprecedented harm. These threats help explain why U.S. President Joe Biden issued an executive order in 2021 calling for cybersecurity improvements. Similarly, the European Union recently called for stronger protection of its information and communication technology (ICT) supply chains.
Against that backdrop, Microsoft Research is focused on what comes next in security and privacy. New and emerging computing frontiers, like the metaverse and web3, will require consistent advances in identity, transparency and other security principles, in order to learn from the past and unlock these technologies’ potential. Developments in quantum computing and advances in machine learning and artificial intelligence offer great potential to advance science and the human condition. Our research aims to ensure that future breakthroughs come with robust safety and privacy protections, even as they accelerate profound changes and new business opportunities.
At Microsoft Research, we pursue ambitious projects to improve the privacy and security of everyone on the planet. This is the first blog post in a series exploring the work we do in privacy, security and cryptography. In future installments, we will dive deeper into the research challenges we are addressing, and the opportunities we see.
Digital identities
While the internet was not originally built with an identity layer, digital identities have grown to become foundational elements of today's web, and they impact people's lives even beyond the digital world. Our research is aimed at modernizing digital identities and building more robust, usable, private and secure user-centric identity systems, putting each of us in control of our own digital identities.
This work includes researching cryptographic algorithms that enable privacy-preserving open-source user-centric identity systems. Such systems would let people present cryptographically signed electronic claims and selectively choose which information they wish to disclose, while preventing tracking of people between presentations of the claim. Our approach would preserve an individual’s privacy and work with existing web protocols to provide easy and safe access to a wide range of resources and activities.
Our research also includes investigating innovative ways for people to manage their identity secrets reliably and safely without having to provide any centralized party with full access to them. Success in this area will also require scalable and verifiable methods to distribute identity public keys, so people can know who exactly they are interacting with.
Media provenance and authenticity
Advances in graphics and machine learning algorithms have enabled the creation of easy-to-use tools for editing images, video and audio. While useful in many ways, this technology has also enabled fraud and manipulation of digital images and media – or deepfakes. Early fakes were easy to spot, but current versions are becoming nearly impossible for machines or people to detect. The potential proliferation of fakes that are indistinguishable from reality undermines society's trust in everything we see and hear.
Rather than trying to detect fakes, Microsoft Research has developed technology to determine the source of any digital media and whether it has been altered. We do this by adding digitally signed manifests to video, audio or images. The source of these media objects might be well-known news organizations, governments or even individuals using apps on mobile devices.
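As a rough illustration of the signed-manifest idea (not the actual manifest format used by Microsoft and its standards partners), one can hash the media bytes, record provenance claims, and sign the result. The media bytes and the claim below are hypothetical placeholders.

```python
import hashlib, json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Stand-in for the actual media file contents.
media_bytes = b"\x89PNG...example image bytes..."

manifest = {
    "media_sha256": hashlib.sha256(media_bytes).hexdigest(),
    "claim": "captured on device X, no edits applied",   # hypothetical provenance claim
}

signing_key = Ed25519PrivateKey.generate()
payload = json.dumps(manifest, sort_keys=True).encode()
signature = signing_key.sign(payload)

# A consumer re-hashes the media and verifies the signature with the publisher's
# public key; altering the bytes or the manifest breaks verification.
signing_key.public_key().verify(signature, payload)
print("manifest verified")
```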
Since media creation, distribution, and consumption are complex and involve many industries, Microsoft has helped form a standards organization to stipulate how these signatures are added to media objects. We are also working with news organizations such as the BBC, New York Times, and CBC to promote media provenance as a mitigation for misinformation on social media networks.
Hardware security foundations
To promote cyber-resilience, we are developing systems that can detect a cyberattack and safely shut down, protecting data and blocking the attacker. These systems are designed to be repaired quickly and securely if compromised. They are built with simple hardware features that provide very high levels of protection for repair and recovery modules. To enable reliable detection of compromised systems, we are also developing storage features that can be used to protect security event logs, making it harder for attackers to cover their tracks.
Security analytics
Modern-day computers and networks are under constant attack by hackers of all kinds. In this seemingly never-ending cat-and-mouse contest, securing and defending today’s global systems is a multi-billion-dollar enterprise. Managing the massive quantities of security data collected is increasingly challenging, which creates an urgent need for disruptive innovation in security analytics.
We are investigating a transformer-based approach to modeling and analyzing large-scale security data. Applying and tuning such models is a novel field of study that could change the game for security analytics.
Privacy-preserving machine learning
A privacy-preserving AI system should generalize so well that its behavior reveals no personal or sensitive details that may have been contained in the original data on which it was trained.
How close can we get to this ideal? Differential privacy can enable analysts to extract useful insights from datasets containing personal information while strengthening privacy protections. The method introduces "statistical noise" that is significant enough to prevent AI models from compromising the privacy of any individual, yet small enough that the models still provide accurate, useful findings. Our recent results show that large language models can be particularly effective differentially private learners.
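A minimal example of the underlying idea is the Laplace mechanism applied to a counting query. This is a textbook illustration, not Microsoft's differentially private training setup.

```python
import numpy as np

def laplace_count(data, predicate, epsilon: float = 1.0) -> float:
    """Laplace mechanism for a counting query: a count has sensitivity 1, so adding
    Laplace(1/epsilon) noise yields an epsilon-differentially-private answer."""
    true_count = sum(1 for record in data if predicate(record))
    return true_count + np.random.laplace(scale=1.0 / epsilon)

ages = [34, 29, 41, 57, 23, 62, 45]                       # toy "sensitive" dataset
noisy = laplace_count(ages, lambda a: a > 40, epsilon=0.5)
print(f"noisy count of people over 40: {noisy:.1f}")
# Smaller epsilon means more noise and stronger privacy. Differentially private model
# training (e.g., DP-SGD for the large language models mentioned above) applies the
# same principle to gradients rather than to query answers.
```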
Another approach, federated learning, enables large models to be trained and fine-tuned on customers’ own devices to protect the privacy of their data, and to respect data boundaries and data-handling policies. At Microsoft Research, we are creating an orchestration infrastructure for developers to deploy cross-platform, cross-device federated learning solutions.
Protecting data in training or fine-tuning is just one piece of the puzzle. Whenever AI is used in a personalized context, it may unintentionally leak information about the target of the personalization. Therefore, we must be able to describe the threat model for a complete deployment of a system with AI components, rather than just a single part of it.
Read more about our work on these and other related topics in an earlier blog post.
Confidential computing
Confidential computing has emerged as a practical solution to securing compute workloads in cloud environments, even from malicious cloud administrators. Azure already offers confidential computing environments in multiple regions, leveraging Trusted Execution Environments (TEEs) available in multiple hardware platforms.
Imagine if all computation were taking place in TEEs, where services would be able to access sensitive data only after they had been attested to perform specific tasks. This is not practical today and much research remains to be done. For example, there are no formal standards to even describe what a TEE is, what kind of programming interface a TEE cloud should have, or how different TEEs should interact.
Additionally, it is important to continuously improve the security guarantees of TEEs. For instance, understanding which side-channel attacks are truly realistic and developing countermeasures remains a major topic for research. Furthermore, we need to continue researching designs for confidential databases, confidential ledgers and confidential storage. Finally, even if we build both confidential computing and storage environments, how can we establish trust in the code that we want to run? As a cloud provider, our customers expect us to work continuously on improving the security of our infrastructure and the services that run on it.
Secure-by-design cloud
In the future, we can imagine Azure customers compiling their software for special hardware with memory tagging capabilities, eliminating problems like buffer overflows for good. To detect compromise, VM memory snapshots could be inspected and studied with AI-powered tools. In the worst case, system security could always be bootstrapped from a minimal hardware root of trust. At Microsoft Research, we are taking a step further and asking how we can build the cloud from the ground up, with security in mind.
New cryptography
The advance of quantum computing presents many exciting potential opportunities. As a leader in both quantum computing development and cryptographic research, Microsoft has a responsibility to ensure that the groundbreaking innovations on the horizon don’t compromise classical (non-quantum) computing systems and information. Working across Microsoft, we are learning more about the weaknesses of classical cryptography and how to build new cryptographic systems strong enough to resist future attacks.
Our active participation in the National Institute of Standards and Technology (NIST) Post-Quantum Cryptography projects has allowed Microsoft Research to examine deeply how the change to quantum-resistant algorithms will impact Microsoft services and Microsoft customers. With over seven years of work in this area, Microsoft Research’s leadership in quantum cryptography will help customers prepare for the upcoming change of cryptographic algorithms.
We’ve joined with the University of Waterloo and others to build a platform for experimenting with the newly proposed cryptographic systems and applying them to real-world protocols and scenarios. We’ve implemented real-world tests of post-quantum cryptography to learn how these new systems will work at scale and how we can deploy them quickly to protect network tunnels. Our specialized hardware implementations and cryptanalysis provide feedback on the new cryptosystems, which improves their performance, making post-quantum cryptosystems smaller and stronger.
ElectionGuard
- Download: ElectionGuard
Advances in cryptography are enabling end-to-end verifiable elections and risk-limiting audits for elections. Our open-source ElectionGuard project uses cryptography to confirm all votes have been correctly counted. Individual voters can see that their vote has been accurately recorded and anyone can check that all votes have been correctly tallied—yet individual ballots are kept secret. Risk-limiting audits use advanced statistical methods that can determine when an election audit has hit a pre-determined level of confidence with greater efficiency than traditional audits.
The cryptography tools that enable verifiable voting are Shamir Secret Sharing, Threshold Encryption, and additive Homomorphic Encryption. The math is interesting, and we will explore that in future blog posts, but there’s much more than math to ElectionGuard.
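As a small taste of that math, here is a toy version of one of those building blocks, Shamir Secret Sharing. This is an illustration only, not ElectionGuard's implementation, and the parameters are chosen for readability rather than production use.

```python
import random

PRIME = 2**127 - 1  # prime field for the arithmetic

def make_shares(secret: int, threshold: int, n_shares: int):
    """Hide `secret` in the constant term of a random polynomial of degree
    threshold-1; any `threshold` shares can reconstruct it, fewer reveal nothing."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def poly(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n_shares + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(secret=123456789, threshold=3, n_shares=5)
print(reconstruct(shares[:3]))           # any 3 of the 5 shares recover 123456789
```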
Securing the future
Through our work, we aim to continue to earn customer trust, striving to ensure that Microsoft’s products and services and our customer’s information will remain safe and secure for years to come.
Forthcoming entries in this blog series will include more details on the areas covered in this post and more. Much of our work is open-source and published, so we will be highlighting our GitHub projects and other ways you can interact directly with our work.
Have a question or topic that you would like to see us address in a future post? Please contact us!
The post Research trends in privacy, security and cryptography appeared first on Microsoft Research.
Research Focus: Week of November 17, 2022
Microsoft Research at NeurIPS 2022
- Event: NeurIPS 2022
Microsoft is a proud platinum sponsor of the 36th annual conference on Neural Information Processing Systems, running from November 28 to December 9. More than 150 of our researchers are involved in presentations, poster sessions, accepted papers, and workshops at this prestigious conference.
We are thrilled to have had more than 100 papers accepted at NeurIPS 2022. Our full roster includes cutting-edge research ranging from differential privacy and reinforcement learning to extreme compression, motion capture and large language models.
This hybrid conference includes an in-person component at the New Orleans Convention Center during the first week, and a virtual component the second week. Check out our list of sessions and plan your schedule.
If you will be attending in person, we hope you will stop by our booth (#202) to chat with our experts, see demos of our latest research and find out about career opportunities with Microsoft.
Project Eclipse shows neighborhood level air pollution inequities in real time and during short-term air quality events
Precious Esie, Madeleine I. G. Daepp, Asta Roseway, and Scott Counts
Air pollution kills over 7 million people globally every year. In U.S. cities, pollution sources are more likely to affect communities of color – but the most impacted communities rarely have the data they need to understand and address this invisible hazard.
- Project: Project Eclipse
Through Project Eclipse, a team from Microsoft Research has worked with the Chicago Department of Public Health and JCDecaux – an advertising agency and the world’s largest provider of outdoor street furniture – to deploy low-cost air quality sensors across the city’s bus shelters. In a new paper published this week in the American Journal of Public Health, the team showed how the citywide network of sensors enabled them to capture and visualize differences in exposures over time and space. The work was led by Precious Esie, a PhD student in epidemiology at Columbia’s Mailman School of Public Health, during her summer internship at Microsoft Research.
Over the month of July 2021, the research team found, pollution disproportionately affected Hispanic and Latinx residents on the West side of the city. But 4th of July spikes disproportionately affected the South side—including places where asthma rates are highest. In late July, wildfire smoke increased exposures across the city as a whole. This work shows how next-generation environmental sensing can help public health agencies target interventions when and where they are most needed.
Microsoft Research contributes to industry supply chain standards
Kiran Karunakaran, Principal Product Manager, Azure Security Engineering
Supply Chain Integrity, Transparency and Trust (SCITT) is an industry-standards initiative for managing the compliance of goods and services across end-to-end supply chains, allowing organizations to build, deploy and consume interoperable and automated solutions to manage end-to-end supply chain compliance. SCITT was initiated as a response to United States Executive Order on Improving the Nation’s Cybersecurity (EO14028), and was formally adopted into the Internet Engineering Task Force (IETF) in March 2022. Over the last eight months, SCITT has been one of the fastest growing initiatives within IETF and became a formal working group in October 2022.
Microsoft Research is actively contributing to the IETF SCITT Architecture and SCITT Receipt Format Internet Drafts and will be providing and collaborating on several SCITT-related open-source software offerings. This includes the donation of a SCITT Emulator that allows SCITT implementers to experiment with SCITT APIs and message formats. Microsoft is also open-sourcing our SCITT implementation prototype that runs on the Confidential Consortium Framework, providing visibility into one of the many possible implementations of SCITT.
A SCITT IETF Working Group session was held at IETF115 on Nov 10th. To learn more about the community efforts, blogs, and implementations around SCITT, please visit SCITT.io.
- Download: SCITT API emulator
- Download: SCITT implementation prototype
2010 paper lands “Test of Time” award for Microsoft researcher Mike Chieh-Jan Liang and co-authors
Mike Chieh-Jan Liang, a principal researcher in the Systems and Networking Research Group (Asia), is part of a team that won a “Test of Time” award for the 2010 paper: Design and Evaluation of a Versatile and Efficient Receiver-Initiated Link Layer for Low-Power Wireless.
The research introduced A-MAC, a receiver-initiated link layer for low-power wireless networks that supported several services under a unified architecture, more efficiently and scalably than prior approaches.
Co-authors on the paper include Prabal Dutta (University of Michigan), Stephen Dawson-Haggerty (University of California, Berkeley), Yin Chen (Johns Hopkins University), and Andreas Terzis (Johns Hopkins University).
The Test of Time award is given by ACM SenSys in recognition of papers that are at least 10 years old and have had long-lasting impact on networked embedded sensing system science and engineering.
The post Research Focus: Week of November 17, 2022 appeared first on Microsoft Research.
Cloud Intelligence/AIOps – Infusing AI into Cloud Computing Systems
When legendary computer scientist Jim Gray accepted the Turing Award in 1999, he laid out a dozen long-range information technology research goals. One of those goals called for the creation of trouble-free server systems or, in Gray’s words, to “build a system used by millions of people each day and yet administered and managed by a single part-time person.”
Gray envisioned a self-organizing “server in the sky” that would store massive amounts of data, and refresh or download data as needed. Today, with the emergence and rapid advancement of artificial intelligence (AI), machine learning (ML) and cloud computing, and Microsoft’s development of Cloud Intelligence/AIOps, we are closer than we have ever been to realizing that vision—and moving beyond it.
Over the past fifteen years, the most significant paradigm shift in the computing industry has been the migration to cloud computing, which has created unprecedented digital transformation opportunities and benefits for business, society, and human life.
The implication is profound: cloud computing platforms have become part of the world’s basic infrastructure. As a result, the non-functional properties of cloud computing platforms, including availability, reliability, performance, efficiency, security, and sustainability, have become immensely important. Yet the distributed nature, massive scale, and high complexity of cloud computing platforms—ranging from storage to networking, computing and beyond—present huge challenges to building and operating such systems.
What is Cloud Intelligence/AIOps?
Cloud Intelligence/AIOps (“AIOps” for brevity) aims to innovate AI/ML technologies to help design, build, and operate complex cloud platforms and services at scale—effectively and efficiently.
AIOps has three pillars, each with its own goal:
- AI for Systems to make intelligence a built-in capability to achieve high quality, high efficiency, self-control, and self-adaptation with less human intervention.
- AI for Customers to leverage AI/ML to create unparalleled user experiences and achieve exceptional user satisfaction using cloud services.
- AI for DevOps to infuse AI/ML into the entire software development lifecycle to achieve high productivity.
Where did the research on AIOps begin?
Gartner, a leading industry analyst firm, first coined the term AIOps (Artificial Intelligence for IT Operations) in 2017. According to Gartner, AIOps is the application of machine learning and data science to IT operation problems. While Gartner’s AIOps concept focuses only on DevOps, Microsoft’s Cloud Intelligence/AIOps research has a much broader scope, including AI for Systems and AI for Customers.
The broader scope of Microsoft’s Cloud Intelligence/AIOps stems from the Software Analytics research we proposed in 2009, which seeks to enable software practitioners to explore and analyze data to obtain insightful and actionable information for data-driven tasks related to software and services. We started to focus our Software Analytics research on cloud computing in 2014 and named this new topic Cloud Intelligence (Figure 1). In retrospect, Software Analytics is about the digital transformation of the software industry itself, such as empowering practitioners to use data-driven approaches and technologies to develop software, operate software systems, and engage with customers.
What is the AIOps problem space?
There are many scenarios around each of the three pillars of AIOps. Some example scenarios include predictive capacity forecasting for efficient and sustainable services, monitoring service health status, and detecting health issues in a timely manner in AI for Systems; ensuring code quality and preventing defective build deployed into production in AI for DevOps; and providing effective customer support in AI for Customers. Across all these scenarios, there are four major problem categories that, taken together, constitute the AIOps problem space: detection, diagnosis, prediction, and optimization (Figure 2). Specifically, detection aims to identify unexpected system behaviors (or anomalies) in a timely manner. Given the symptom and associated artifacts, the goal of diagnosis is to localize the cause of service issues and find the root cause. Prediction attempts to forecast system behaviors, customer workload patterns, or DevOps activities, and so on. Lastly, optimization tries to identify the optimal strategies or decisions required to achieve certain performance targets related to system quality, customer experience and DevOps productivity.
Each problem has its own challenges. Take detection as an example. To ensure service health at runtime, it is important for engineers to continuously monitor various metrics and detect anomalies in a timely manner. In the development process, to ensure the quality of the continuous integration/continuous delivery (CI/CD) practice, engineers need to create mechanisms to catch defective builds and prevent them from being deployed to other production sites.
Both scenarios require timely detection, and in both there are common challenges for conducting effective detection. For example, time series data and log data are the most common data forms. Yet they are often multi-dimensional, there may be noise in the data, and they often have different detection requirements—all of which can pose significant challenges to reliable detection.
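To ground the detection problem, here is a deliberately simple baseline detector for a single service metric: a rolling z-score, assumed purely for illustration. Production AIOps detectors must additionally cope with the multi-dimensional, noisy data and varied requirements described above.

```python
import numpy as np

def rolling_zscore_anomalies(series: np.ndarray, window: int = 60, threshold: float = 4.0):
    """Flag points that deviate from a trailing-window mean by more than
    `threshold` standard deviations (a toy baseline, not a production detector)."""
    anomalies = []
    for t in range(window, len(series)):
        hist = series[t - window:t]
        mu, sigma = hist.mean(), hist.std() + 1e-8
        if abs(series[t] - mu) / sigma > threshold:
            anomalies.append(t)
    return anomalies

# Toy latency metric: steady noise plus an injected spike at t = 500.
latency = np.random.normal(100, 5, 1000)
latency[500] += 60
print(rolling_zscore_anomalies(latency))                  # should include 500
```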
Microsoft Research: Our AIOps vision
Microsoft is conducting continuous research in each of the AIOps problem categories. Our goal for this research is to empower cloud systems to be more autonomous, more proactive, more manageable, and more comprehensive across the entire cloud stack.
Making cloud systems more autonomous
AIOps strives to make cloud systems more autonomous, minimizing human operations and rule-based decisions, which significantly helps reduce the user impact caused by system issues, enables better operational decisions, and reduces maintenance costs. This is achieved by automating DevOps as much as possible, including build, deployment, monitoring, and diagnosis. For example, the purpose of safe deployment is to catch a defective build early to prevent it from rolling out to production and resulting in significant customer impact. Doing this manually can be extremely labor-intensive and time-consuming for engineers, because anomalous behaviors have a variety of patterns that may change over time, and not all anomalous behaviors are caused by a new build, which may introduce false positives.
At Microsoft Research, we used transfer learning and active learning techniques to develop a safe deployment solution that overcomes these challenges. We’ve been running the solution in Microsoft Azure, and it has been highly effective at helping to catch defective builds – achieving more than 90% precision and near 100% recall in production over a period of 18 months.
Root cause analysis is another way that AIOps is reducing human operations in cloud systems. To shorten the mitigation time, engineers in cloud systems must quickly identify the root causes of emerging incidents. Owing to the complex structure of cloud systems, however, incidents often contain only partial information and can be triggered by many services and components simultaneously, which forces engineers to spend extra time diagnosing the root causes before any effective actions can be taken. By leveraging advanced contrast-mining algorithms, we have implemented autonomous incident-diagnosis systems, including HALO and Outage Scope, to reduce response time and increase accuracy in incident diagnosis tasks. These systems have been integrated in both Azure and Microsoft 365 (M365), which has considerably improved engineers’ ability to handle incidents in cloud systems.
Making cloud systems more proactive
AIOps makes cloud systems more proactive by introducing the concept of proactive design. In the design of a proactive system, an ML-based prediction component is added to the traditional system. The prediction system takes the input signals, does the necessary processing, and outputs the future status of the system. For example, what the capacity status of cluster A looks like next week, whether a disk will fail in a few days, or how many virtual machines (VMs) of a particular type will be needed in the next hour.
Knowing the future status makes it possible for the system to proactively avoid negative system impacts. For example, engineers can live migrate the services on an unhealthy computing node to a healthy one to reduce VM downtime, or pre-provision a certain number of VMs of a particular type for the next hour to reduce the latency of VM provisioning. In addition, AI/ML techniques can enable systems to learn over time which decision to make.
As an example of proactive design, we built a system called Narya, which proactively mitigated potential hardware failures to reduce service interruption and minimize customer impact. Narya, which is in production in Microsoft Azure, performs prediction on hardware failures and uses a bandit algorithm to decide which mitigation action to take.
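To illustrate the bandit component in generic terms, here is a textbook epsilon-greedy sketch with hypothetical action names and rewards; it is not Narya's production algorithm.

```python
import random

class EpsilonGreedyBandit:
    """Generic epsilon-greedy bandit over candidate mitigation actions."""
    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.counts = {a: 0 for a in actions}
        self.values = {a: 0.0 for a in actions}            # running mean reward per action

    def choose(self):
        if random.random() < self.epsilon:                 # explore
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])   # exploit

    def update(self, action, reward):
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

bandit = EpsilonGreedyBandit(["live_migrate", "soft_reboot", "mark_unallocatable"])
for _ in range(1000):
    action = bandit.choose()
    # Hypothetical reward: how much customer impact was avoided after the action.
    reward = {"live_migrate": 0.9, "soft_reboot": 0.6, "mark_unallocatable": 0.4}[action]
    bandit.update(action, reward + random.gauss(0, 0.1))
print(bandit.values)
```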
Making cloud systems more manageable
AIOps makes cloud systems more manageable by introducing the notion of tiered autonomy. Each tier represents a set of operations that require a certain level of human expertise and intervention. These tiers range from the top tier of autonomous routine operations to the bottom tier, which requires deep human expertise to respond to rare and complex problems.
AI-driven automation often cannot handle such problems. By building AIOps solutions targeted at each tier, we can make cloud platforms easier to manage across the long tail of rare problems that inevitably arise in complex systems. Furthermore, the tiered design ensures that autonomous systems are developed from the start to evaluate certainty and risk, and that they have safe fallbacks when automation fails or the platform faces a previously unseen set of circumstances, such as the unforeseen increase in demand in 2020 due to the COVID-19 pandemic.
As an example of tiered autonomy, we built Safe On-Node Learning (SOL), a framework for safe learning and actuation on server nodes for the top tier. As another example, we are exploring how to predict the commands that operators should perform to mitigate incidents, while considering the associated certainty and risks of those commands when the top-tier automation fails to prevent the incidents.
Making AIOps more comprehensive across the cloud stack
AIOps can also be made more comprehensive by spanning the cloud stack—from the lowest infrastructure layers (such as network and storage) through the service layer (such as the scheduler and database) and on to the application layer. The benefit of applying AIOps more broadly would be a significant increase in the capability for holistic diagnosis, optimization, and management.
Microsoft services built on top of Azure are called first-party (1P) services. A 1P setting, which is often used to optimize system resources, is particularly suited to a more comprehensive approach to AIOps. This is because with the 1P setting a single entity has visibility into, and control over, the layers of the cloud stack, which enables engineers to amplify the AIOps impact. Examples of 1P services at Microsoft include large and established services such as Office 365, relatively new but sizeable services such as Teams, and up and coming services such as Windows 365 Cloud PC. These 1P services typically account for a significant share of resource usage, such as wide-area network (WAN) traffic and compute cores.
As an example of applying a more comprehensive AIOps approach to the 1P setting, the OneCOGS project, which is a joint effort of Azure, M365, and MSR, considers three broad opportunities for optimization:
- Modeling users and their workload using signals cutting across the layers—such as using the user’s messaging activity versus fixed working hours to predict when a Cloud PC user will be active—thereby increasing accuracy and enabling appropriate allocation of system resources.
- Jointly optimizing the application and the infrastructure to achieve cost savings and more.
- Taming the complexity of data and configuration, thereby democratizing AIOps.
The AIOps methodologies, technologies and practices used for cloud computing platforms and 1P services are also applicable to third-party (3P) services on the cloud stack. To achieve this, further research and development are needed to make AIOps methods and techniques more general and/or easily adaptable. For example, when operating cloud services, detecting anomalies in multi-dimensional space and the subsequent fault localization are common monitoring and diagnosis problems.
Motivated by the real-world needs of Azure and M365, we proposed the technique AiDice, which automatically detects anomalies in multi-dimensional space, and HALO, a hierarchy-aware approach to locating fault-indicating combinations that uses telemetry data collected from cloud systems. In addition to deploying AiDice and HALO in Azure and M365, we’re also collaborating with product team partners to make AiDice and HALO AIOps services that can be leveraged by third-party services.
Conclusion
AIOps is a rapidly emerging technology trend and an interdisciplinary research direction across the systems, software engineering, and AI/ML communities. With years of research on Cloud Intelligence, Microsoft Research has built up rich technology assets in detection, diagnosis, prediction, and optimization. And through close collaboration with Azure and M365, we have deployed some of our technologies in production, which has created significant improvements in the reliability, performance, and efficiency of Azure and M365 while increasing the productivity of developers working on these products. In addition, we are collaborating with colleagues in academia and industry to promote AIOps research and practice. For example, through these joint efforts we have organized three editions of the AIOps Workshop at premier academic conferences: AAAI 2020, ICSE 2021, and MLSys 2022.
Moving forward, we believe that as a new dimension of innovation, Cloud Intelligence/AIOps will play an increasingly important role in making cloud systems more autonomous, more proactive, more manageable, and more comprehensive across the entire cloud stack. Ultimately, Cloud Intelligence/AIOps will help us make our vision for the future of the cloud a reality.
The post Cloud Intelligence/AIOps – Infusing AI into Cloud Computing Systems appeared first on Microsoft Research.
Research Focus: Week of November 7, 2022
Microsoft Turing Universal Language Representation model, T-ULRv6, tops both XTREME and GLUE leaderboards with a single model
Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song
The most recent addition to Microsoft’s Turing Universal Language Representation Model family (T-ULRv6) has achieved the top position on both the Google XTREME and GLUE leaderboards, the first time that a single multilingual model has demonstrated state-of-the-art capabilities in both English and multilingual understanding tasks. The result of a collaboration between the Microsoft Turing team and Microsoft Research, the T-ULRv6 XXL model is based on “XY-LENT,” which leverages X-Y (non-English-centric) bitexts and incorporates the key innovations of XLM-E, the improved training data and vocabulary, and the advanced fine-tuning technique of XTune. Furthermore, to enable scaling to XXL-sized models, T-ULRv6 leverages the memory optimization benefits afforded by ZeRO. To effectively utilize X-Y bitexts, the team adopted a novel sampling strategy and reconstructed the vocabulary using VoCap, which ensures an efficient distribution of data across languages and helps mitigate the sparse sampling distributions of previous work. The XXL model variant outperforms both XLM-R XXL and mT5 XXL while being ~2x and ~3x smaller, respectively.
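The details of the sampling strategy and the VoCap vocabulary construction are in the paper; purely as a generic illustration of why balancing data across languages matters, the sketch below shows temperature-smoothed sampling, a common technique in multilingual pretraining and only an analogy here, not the T-ULRv6 procedure.

```python
import numpy as np

# Raw corpus sizes per language (illustrative numbers, not the T-ULRv6 corpus).
corpus_tokens = {"en": 1_000_000, "de": 200_000, "sw": 5_000}

def smoothed_sampling_probs(sizes: dict, alpha: float = 0.3) -> dict:
    """Generic temperature-smoothed sampling (not the VoCap procedure):
    raising relative corpus sizes to a power alpha < 1 up-weights
    low-resource languages so they are not starved during pretraining."""
    totals = np.array(list(sizes.values()), dtype=float)
    p = totals / totals.sum()
    q = p ** alpha
    q /= q.sum()
    return dict(zip(sizes.keys(), q))

print(smoothed_sampling_probs(corpus_tokens))
```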
T-ULRv6 powers the language universalization of Microsoft Bing, enabling users to search and discover information across languages and domains. T-ULRv6 will soon enhance other Microsoft products with its multilingual capabilities.
XTREME, or Cross-lingual TRansfer Evaluation of Multilingual Encoders, is a benchmark covering 40 typologically diverse languages across 12 language families, with nine tasks that require reasoning about syntax or semantics.
GLUE – or the General Language Understanding Evaluation benchmark – is a collection of resources for training, evaluating, and analyzing natural language understanding systems.
PACT: Perception-Action Causal Transformer for autoregressive robotics pretraining
Rogerio Bonatti, Sai Vemprala, Shuang Ma, Felipe Frujeri, Shuhang Chen, Ashish Kapoor
Recent advances in machine learning architectures have induced a paradigm shift from task-specific models towards large general-purpose networks. For instance, in the past few years we have witnessed a revolution in the domains of natural language and computer vision with models such as GPT-3, BERT and DALL-E. The field of robotics, however, is still dominated by single-purpose system architectures whose modules and connections, whether traditional or learning-based, require significant human design expertise. Inspired by these large pre-trained models, this work introduces a general-purpose robotics representation that can serve as a starting point for multiple tasks for a mobile agent, such as navigation, mapping and localization.
We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion. Through autoregressive prediction of states and actions over time, our model implicitly encodes dynamics and behaviors for a particular robot. This representation can then function as a single starting point to achieve distinct tasks through fine-tuning with minimal data.
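As a rough illustration of the general idea, and not the PACT architecture itself, the toy sketch below runs a causal transformer over interleaved state and action embeddings and predicts each action autoregressively; all dimensions, layer sizes, and the loss are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCausalSAModel(nn.Module):
    """Minimal causal transformer over interleaved state/action tokens."""
    def __init__(self, state_dim=16, action_dim=4, d_model=64, n_layers=2):
        super().__init__()
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, action_dim)

    def forward(self, states, actions):
        # Interleave tokens as s_0, a_0, s_1, a_1, ... along the time axis.
        s, a = self.embed_state(states), self.embed_action(actions)
        B, T, D = s.shape
        tokens = torch.stack([s, a], dim=2).reshape(B, 2 * T, D)
        # Causal mask so each token only attends to earlier tokens.
        mask = torch.triu(torch.full((2 * T, 2 * T), float("-inf")), diagonal=1)
        h = self.backbone(tokens, mask=mask)
        # Predict a_t from the hidden state at the s_t position (even indices).
        return self.predict_action(h[:, 0::2, :])

model = TinyCausalSAModel()
states = torch.randn(8, 10, 16)   # batch of 8 trajectories, 10 steps each
actions = torch.randn(8, 10, 4)
pred_actions = model(states, actions)
loss = F.mse_loss(pred_actions, actions)  # behavior-cloning-style objective
print(loss.item())
```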
Microsoft Research and NHS Scotland conduct world’s first clinical trials of Holoportation-based 3D telemedicine system to increase access to healthcare
Steven Lo, Spencer Fowers, Kwame Darko, Thiago Spina, Catriona Graham, Andrea Britto, Anna Rose, David Tittsworth, Aileen McIntyre, Chris O’Dowd, Roma Maguire, Wayne Chang, David Young, Amber Hoak, Robin Young, Mark Dunlop, Levi Ankrah, Martina Messow, Opoku Ampomah, Ben Cutler, Roma Armstrong, Ruchi Lalwani, Ruairidh Davison, Sophie Bagnall, Whitney Hudson, Mike Shepperd, Jonny Johnson, 3DTM (3D Telemedicine) Collaborative research group
The COVID-19 pandemic has increased the use of remote health consultations and underscored the need for a better system. Current 2D telemedicine engagements fail to replicate the fluency or authenticity of in-person consultations. Real-time 3D telemedicine has previously been proposed only within a research setting, with constraints on complexity, bandwidth and technology.
This research reports on an international collaboration covering the participatory development and first validated clinical use worldwide of a novel, real-time 360-degree 3D telemedicine system. NHS Greater Glasgow and Clyde has been working with Microsoft since 2019 to assess how health care could leverage Microsoft’s 3D telemedicine, with a focus on plastic surgery patients and Microsoft’s Holoportation communication technology.
This research was designed to compare validated outcome measures of a patient-centered 3D telemedicine system with a 2D system, assess alignment with an in-person consultation, and to ensure safety, reliability and clinical concordance. In three separate studies, the 3D system improved patient metrics in comparison to 2D telemedicine, suggesting that remote consultations could get closer to the experience of face-to-face consultations.
Interactive code generation via test-driven user-intent formalization
Shuvendu Lahiri, Aaditya Naik, Georgios Sakkas, Piali Choudhury, Curtis von Veh, Madan Musuvathi, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao
Automatic code generation from natural language intent using large language models is disrupting coding. However, the correctness of the resulting code with respect to user intent expressed in natural language is difficult to establish because natural language lacks formal semantics. In this project, we investigate the problem of neural specification generation (i.e., generating partial formal specifications that match the intent expressed in natural language), and incorporating such specifications during the coding process to improve trust in human-written or AI-generated code.
We instantiate this framework starting with unit tests; tests serve as weak yet formal specifications of a module. We can leverage the abundance of human-written unit tests to train models. Further, these specifications (tests) can be checked using concrete execution without the need for more sophisticated abstract interpreters. In prior work on TOGA, we demonstrated a neural model for synthesizing test oracles for a method and illustrated its use in finding functional bugs in code. In this work on TiCoder, we describe an interactive workflow for formalizing informal user intent through such model-generated tests, improving the accuracy, correctness, and understanding of the generated code.
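As a simplified illustration of this kind of workflow, and not the TiCoder system itself, the sketch below filters hypothetical model-generated code candidates using user-approved, model-proposed tests; the candidates and tests are hard-coded stand-ins for model outputs.

```python
# Candidate implementations standing in for model-generated code.
candidates = {
    "cand_a": "def absdiff(a, b):\n    return a - b\n",
    "cand_b": "def absdiff(a, b):\n    return abs(a - b)\n",
}

# Model-proposed tests; in an interactive workflow, the user approves the ones
# that match their intent before the tests are used for filtering.
proposed_tests = [
    ("absdiff(5, 3) == 2", True),    # user approves
    ("absdiff(3, 5) == 2", True),    # user approves
    ("absdiff(3, 5) == -2", False),  # user rejects: not the intended behavior
]
approved = [expr for expr, ok in proposed_tests if ok]

def passes_all(source: str, tests) -> bool:
    env: dict = {}
    exec(source, env)                       # define the candidate function
    return all(eval(t, env) for t in tests) # concrete execution as the checker

surviving = [name for name, src in candidates.items() if passes_all(src, approved)]
print("candidates consistent with approved tests:", surviving)  # ['cand_b']
```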
The post Research Focus: Week of November 7, 2022 appeared first on Microsoft Research.
Research Focus: Week of October 24, 2022
Meet the 2022 recipients of the Microsoft Research Global PhD Fellowship
Microsoft is thrilled to announce the 2022 Microsoft Research Global PhD Fellows from around the world. The program aims to empower the next generation of computing-related research talent. Microsoft recognizes the value of diversity in computing and aims to increase the pipeline of talent receiving advanced degrees in computing-related fields to build a stronger, more inclusive computing research community. We currently offer PhD fellowships in Asia-Pacific, Canada and the United States, EMEA (Europe, Middle East, Africa), Latin America, and Australia and New Zealand.
Making the most of text semantics to improve biomedical vision-language processing
Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel Coelho de Castro, Anton Schwaighofer, Stephanie Hyland, Maria Teodora Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez-Valle, Hoifung Poon, Ozan Oktay
Multi-modal data abounds in biomedicine, such as radiology images and reports. Interpreting this data at scale is essential for improving clinical care workflows and accelerating clinical research. With its complex semantics, biomedical text poses additional challenges in vision-language modelling, and previous work has used insufficiently adapted models that lack domain-specific language understanding. In this study, we show that principled textual semantic modelling can substantially improve contrastive learning in biomedical vision-language processing (VLP). We release a language model (CXR-BERT) that achieves state-of-the-art results in radiology natural language inference through its improved vocabulary and novel language pretraining objective. Furthermore, we propose a self-supervised joint VLP approach (BioViL) with a focus on better text modelling. It establishes new state-of-the-art results on a wide range of publicly available benchmarks, in part by leveraging our novel domain-specific language model. As part of this study, a new dataset (MS-CXR) is released to facilitate the study of complex semantic modelling in biomedical VLP, which includes locally aligned phrase grounding annotations by radiologists. A broad evaluation, including on this new dataset, shows that our contrastive learning approach outperforms prior methods in segmentation tasks, despite only using a global-alignment objective.
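As a generic illustration of the global-alignment objective mentioned above, and not the exact BioViL or CXR-BERT training code, the sketch below computes a symmetric image-text contrastive (InfoNCE-style) loss over a batch of paired embeddings.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text contrastive loss: matched pairs share an index,
    so the targets lie on the diagonal of the similarity matrix."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(len(img))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: random embeddings stand in for image-encoder and text-encoder features.
loss = contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```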
How AI Happens podcast: A conversation with AI4Science Senior Director Bonnie Kruft
Microsoft’s AI4Science Senior Director Bonnie Kruft was interviewed for a recent podcast in the “How AI Happens” series. Tune in to learn about her journey from earning a Ph.D. focused on quantum chemistry to working in AI and machine learning. She explains how she first discovered her love of data science, and how her Ph.D. equipped her with the skills she needed to succeed. The conversation also covers the data science approach to problem-solving, deep learning emulators and the impact that machine learning could have on the natural sciences.
Recent researcher awards and accomplishments
Ronen Eldan wins prestigious New Horizons in Mathematics Prize
Ronen Eldan, of Microsoft Research and the Weizmann Institute of Science, was awarded the prestigious New Horizons in Mathematics Prize by The Breakthrough Prize Foundation. Eldan was recognized for creating the stochastic localization method, which has led to significant progress in several open problems in high-dimensional geometry and probability, including Jean Bourgain’s slicing problem and the KLS conjecture.
The Breakthrough Prize Foundation and its founding sponsors – Sergey Brin, Priscilla Chan and Mark Zuckerberg, Julia and Yuri Milner, and Anne Wojcicki – announced the 2023 award winners in September. The foundation highlights game-changing discoveries in fundamental physics, life sciences and mathematics, along with early-career scientists who have made significant contributions to their fields.
Gary J. Sullivan named Fellow of the Society of Motion Picture and Television Engineers
Microsoft’s Gary J. Sullivan was recognized as a 2022 Fellow of the Society of Motion Picture and Television Engineers (SMPTE). The membership grade of fellow is awarded to individuals who have, by proficiency and contributions, attained an outstanding rank among engineers or executives in the motion-picture, television or related industries, according to SMPTE.
Sullivan is a principal video and image technology standardization program manager at Microsoft Research in Redmond, Washington. At Microsoft, he has been the originator and lead designer of the DirectX Video Acceleration (DXVA) video decoding feature of Microsoft Windows and in the international standardization community, he has led team projects that have been recognized by three Emmy Awards. His standardization work includes chairing or co-chairing various projects related to media compression in the JPEG, MPEG, and VCEG standards groups, including the AVC (H.264), HEVC (H.265) and VVC (H.266) video compression codec design projects. He is currently the chair of ISO/IEC JTC 1/SC 29, which oversees the work of JPEG and MPEG, and the Rapporteur of video and image coding in ITU-T SG16.
The post Research Focus: Week of October 24, 2022 appeared first on Microsoft Research.
ECCV 2022 highlights: Advancing the foundations of mixed reality
By Microsoft Mixed Reality & AI Labs in Cambridge and Zurich
Computer vision is one of the most remarkable developments to emerge from the field of computer science. It’s among the most rapidly growing areas in the technology landscape and has the potential to significantly impact the way people live and work. Advances at the intersection of machine learning (ML) and computer vision have been accelerating in recent years, leading to significant progress in numerous fields, including healthcare, robotics, the automotive industry, and augmented reality (AR). Microsoft is proud to be a prominent contributor to computer vision research.
Microsoft researchers have long been collaborating with academics and experts in the field on numerous computer vision projects with the goal of expanding what’s possible and helping people achieve more. One example is PeopleLens, a head-worn device that helps children who are blind or have low vision more easily interact in social situations by identifying people around them through spatialized audio. Another example is Swin Transformer. This computer vision architecture attains high accuracy in object detection and provides an opportunity to unify computer vision and natural language processing (NLP) architectures—increasing the capacity and adaptability of computer vision models.
Microsoft Research is excited to share some of its newest work in this space at the European Conference on Computer Vision (ECCV) 2022, with 45 accepted papers that will be presented through live presentations, tutorials, and poster sessions. This post highlights two of these papers, which showcase the latest research from Microsoft and its collaborators. One involves increasing the number of facial landmarks for more accurate 3D face reconstruction, achieving state-of-the-art results while decreasing the required compute power. The other introduces a dataset that takes advantage of the capabilities of AR devices for visual localization and mapping driven by real-world AR scenarios.
3D face reconstruction with dense landmarks
Facial landmarks are points that correspond across all faces, and they often play a key role in face analysis. Researchers frequently rely on them when performing basic computer vision tasks, such as estimating head position and identifying gaze direction and more generally the position in space of all the details of the face. Facial landmarks include such areas as the tip of the nose, corners of the eyes, and points along the jawline. Typically, public datasets that practitioners use to train ML models contain annotations for 68 facial landmarks. However, numerous aspects of human faces are not precisely represented by 68 landmarks alone, and additional methods are often needed to supplement landmark detection, adding complexity to the training workflow and increasing the required compute power.
- GitHub: Dense Landmarks
With the goal of achieving accurate 3D face reconstruction, we propose increasing the number of facial landmarks. In our paper “3D Face Reconstruction with Dense Landmarks,” we introduce a method to accurately predict 703 facial landmarks, more than 10 times as many as are commonly used, covering the entire face in great detail, including the eyes, ears, and teeth, as shown in Figure 1. We show that these landmarks are predicted very precisely when visible, and that when they are occluded, for example when someone lifts a coffee mug to their lips, we can still estimate their locations and what the hidden part of the face looks like. We can use these landmarks to constrain a model-fitting problem to efficiently and precisely estimate all aspects of a face model, shown in the right-most column in Figure 2. This includes the head pose and eye gaze, as well as the identity of the person whose face is being reconstructed, for example, the thickness of the lips and the shape of the nose.
This simple pipeline comprises only dense landmark prediction and continuous mathematical optimization, allowing for extreme compute efficiency and enabling the entire system to run at over 150 frames per second on a single core of a laptop.
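As a toy illustration of how detected landmarks constrain a model-fitting problem through continuous optimization, the sketch below recovers a simple 2D similarity transform from 703 noisy landmark correspondences by least squares; a real system fits a full parametric 3D face model (identity, expression, pose) in the same spirit, just with many more parameters.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy "face model": a fixed template of 703 2D landmarks. Fitting recovers the
# scale, rotation, and translation that best explain the detected landmarks.
template = np.random.default_rng(1).random((703, 2))
true_params = np.array([1.2, 0.3, 0.5, -0.2])  # scale, angle (rad), tx, ty

def transform(params, pts):
    s, theta, tx, ty = params
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * pts @ R.T + np.array([tx, ty])

# Simulated detections: transformed template plus small landmark noise.
detected = transform(true_params, template) + 0.001 * np.random.standard_normal(template.shape)

residuals = lambda p: (transform(p, template) - detected).ravel()
fit = least_squares(residuals, x0=np.array([1.0, 0.0, 0.0, 0.0]))
print("recovered parameters:", np.round(fit.x, 3))  # close to true_params
```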
Increasing privacy, fairness, and efficiency with synthetic data
In computer vision, and particularly in face reconstruction, there are understandable concerns about anonymity when training ML models because the training data often comes from real people. Our proposed method significantly reduces these concerns because it trains ML models only on synthetic data, in contrast to methods that include images of real people in their training datasets. When we built the synthetic data pipeline, we still needed to preserve the privacy of the several hundred subjects whose data we used, and we took care to acquire their consent. Obtaining consent at that scale is feasible, unlike obtaining it from the thousands (or even tens of thousands) of subjects that would have been required had we used real data.
It’s especially challenging, if not impossible, to preserve the privacy of people appearing in “found images” online, where the subject is often unknown. Using synthetic data helps us protect the privacy of data subjects and the rights of photographers and content creators. It’s another tool we can use in our mission to build technology in an ethical and responsible manner. Additionally, because people’s private information is not included in our dataset, if the ML model were to be attacked, only synthetic data would be subject to compromise.
Synthetic data also provides an opportunity to address inclusivity and fairness. Largely because the distribution of the data is fully controlled, ML practitioners can manage the fairness of representation by including diverse samples in their datasets, and all the data needed to do this would be perfectly labeled. For further details on how we build the synthetics model and training data and our approach to capturing the diversity of the human population, please see our face analysis paper.
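As a small, hypothetical illustration of that control, the sketch below draws a dataset specification with a uniform distribution over a handful of placeholder attributes; the attribute names and buckets are illustrative only, not the ones used in our pipeline.

```python
import random
from collections import Counter

# In a synthetic pipeline the attribute distribution is a design choice, so
# balanced representation can be enforced by construction.
attribute_space = {
    "appearance_bucket": ["1", "2", "3", "4", "5", "6"],
    "age_bucket": ["18-29", "30-44", "45-59", "60+"],
}

def sample_balanced(n: int, space: dict, seed: int = 0) -> list:
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]

dataset_spec = sample_balanced(10_000, attribute_space)
print(Counter(s["appearance_bucket"] for s in dataset_spec))  # roughly uniform
```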
There are other advantages to using synthetic data to train ML models, as well. For example, these models require a lot of data, giving rise to numerous difficulties that practitioners must navigate to obtain this data, such as the logistics of finding the number of people required, scheduling time in a lab, and situating multiple cameras to capture the various angles of a person’s face. These concerns are greatly reduced with synthetic data.
In addition, because data doesn’t need to be sourced from a real person, the iteration speed to improve the quality of the 3D face reconstruction is remarkably high, creating a robust workflow. And it isn’t necessary to apply quality assurance (QA) processes on each labeled image when using synthetic data—another cost- and time-saving benefit. Another advantage is the increase in accuracy, speed, and cost-effectiveness in labeling data. It would be nearly impossible to ask someone to consistently label 703 landmarks in a set of images.
Face analysis is a foundational piece for many ML systems, such as facial recognition and controlling avatars, and using a method that provides both accuracy and efficiency while also addressing privacy and fairness concerns pushes the boundaries of the state of the art. Up until now, there hasn’t been much work, if any, on methods that can yield this level of quality with only synthetic data. The ability to achieve 3D face reconstruction using dense landmarks and synthetic data has the potential to truly transform what’s possible with ML.
Acknowledgments
This research was conducted by Erroll Wood, Tadas Baltrušaitis, Charlie Hewitt, Matthew Johnson, Jingjing Shen, Nikola Milosavljević, Daniel Wilde, Stephan Garbin, Chirag Raman, Jamie Shotton, Toby Sharp, Ivan Stojiljković, Tom Cashman, and Julien Valentin.
LaMAR: Benchmarking localization and mapping for augmented reality
To unlock the full potential of augmented reality (AR), anyone using a mixed reality headset should be able to place virtual content in the physical world, share it with others, and expect it to remain in place over time. However, before they can augment digital content in the real world in the form of holograms, AR devices need to build a digital map of the physical 3D world. These devices then position, or re-localize, themselves with respect to this map, as illustrated in Figure 4, which allows them to retrieve previously placed holograms and show them to the user at a designated location. The computer vision foundations enabling these capabilities are called mapping and visual localization.
In general, research in visual localization focuses on single images, usually carefully selected views of famous attractions, shown on the left in Figure 5. However, this doesn’t reflect real AR scenarios—the combination of AR devices and applications—and the opportunity they provide. AR devices can locally map the environment and provide spatially registered sequences rather than single images, as shown in the image on the right in Figure 5. These sequences can also include additional data, like inertial or radio signals from sensors, which are typically available on modern AR devices, such as Microsoft HoloLens 2. Yet it’s challenging to use such sequences for localization because they are typically just collected during normal device usage and not generally aimed at facilitating localization.
To close this gap, we introduce a new benchmark, the first to focus on this more realistic setting for AR, with the understanding that visual re-localization is a key element for compelling, shared, and persistent AR experiences. Given the spatial scale of the environment for typical AR scenarios, such as navigating an airport or inspecting a factory, we had to design a pipeline that could automatically compute the ground-truth camera positions of real AR sequences captured by a variety of readily available AR devices, such as the HoloLens or iPhone. By evaluating state-of-the-art methods on our benchmark, we offer novel insights on current research and provide avenues for future work in the field of visual localization and mapping for AR.
This research is a result of a two-year collaboration between the Microsoft Mixed Reality & AI Lab in Zurich and ETH Zurich (Swiss Federal Institute of Technology) and will be published at ECCV 2022 in the paper, “LaMAR: Benchmarking Localization and Mapping for Augmented Reality.” We will also be giving a tutorial called Localization and Mapping for AR at ECCV.
Developing a large-scale AR dataset
To enable the research community to address the specifics of mapping and visual localization in the context of AR, we collected multi-sensor data streams from modern AR devices. These sensor streams come with camera poses (the camera’s position and orientation) from the on-device tracker at each instant. They also contain images, depth measurements, samples from inertial measurement units (IMUs), and radio signals. Exploiting these signals can lead to more efficient algorithms. For example, radio signals such as Wi-Fi or Bluetooth can simplify image retrieval. Similarly, sequence localization can exploit the temporal aspect of sensor streams to provide more spatial context, which can lead to more accurate estimates of camera poses. This typifies the realistic use case of a user launching an AR application and streaming sensor data to localize the camera with respect to a previously built map, and it reflects how AR applications built on mixed reality cloud services, like Azure Spatial Anchors, work.
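As a simplified, hypothetical illustration of how radio signals can aid retrieval, the sketch below ranks mapped keyframes by the similarity of their stored Wi-Fi fingerprints to the query's fingerprint before any visual matching is attempted; the identifiers, signal values, and distance function are made up for the example.

```python
import numpy as np

# Each mapped keyframe stores a Wi-Fi fingerprint: {BSSID: RSSI in dBm}.
map_fingerprints = {
    "keyframe_001": {"ap:aa": -40, "ap:bb": -70},
    "keyframe_002": {"ap:aa": -80, "ap:cc": -50},
    "keyframe_003": {"ap:bb": -45, "ap:cc": -65},
}

def fingerprint_distance(q: dict, ref: dict, missing_rssi: float = -100.0) -> float:
    """L2 distance over the union of observed access points; unseen APs are
    treated as very weak. A crude stand-in for real radio-based retrieval."""
    aps = set(q) | set(ref)
    return float(np.linalg.norm([q.get(a, missing_rssi) - ref.get(a, missing_rssi) for a in aps]))

query = {"ap:aa": -42, "ap:bb": -68}
ranked = sorted(map_fingerprints, key=lambda k: fingerprint_distance(query, map_fingerprints[k]))
# Only the top radio matches are passed on to the more expensive visual image retrieval.
print("candidate keyframes for visual matching:", ranked[:2])
```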
The initial release of the LaMAR dataset contains more than 100 hours of recordings covering 45,000 square meters (484,000 square feet) captured over the course of two years using the head-mounted HoloLens 2 and handheld iPhone/iPad devices. The data was captured at various indoor and outdoor locations (a historical building, a multi-story office building, and part of a city center) and represents typical AR scenarios. It includes changes in illumination and the movement of objects—either slowly, such as the placement of a book on a desk, or more quickly, like anonymized people walking down a sidewalk.
Automatically aligning AR sequences to establish ground truth
To estimate the ground-truth camera poses, we aligned the captured data with reference 3D models of the locations, as shown in Figure 8. These reference models were captured using NavVis M6 and VLX mapping systems, both equipped with laser scanners (lidars) that generate dense, textured, and highly accurate 3D models of the locations. To align the data, we developed a robust pipeline that does not require manual labeling or setting up custom infrastructure, such as fiducial markers, which enabled us to robustly handle crowd-sourced data captured over extended periods by a variety of AR devices.
The actual alignment process is fully automatic and utilizes the on-device real-time tracker of AR devices, which provides camera poses in their local coordinate system. We aligned each captured sequence individually with the dense ground truth reference model, as illustrated in Figure 9. Once completed, all camera poses were refined jointly by optimizing the visual constraints within and across sequences.
Evaluating localization and mapping in the context of AR
We evaluated current state-of-the-art approaches in the single-frame setting, localizing i) single images obtained from phones and ii) single images and full camera rigs from HoloLens 2. Then we adapted these state-of-the-art approaches to take advantage of radio signals. Finally, we designed baselines, building on these methods and utilizing the on-device real-time tracker in a multi-frame localization setting corresponding to a real-world AR application. The results show that the performance of state-of-the-art methods can be significantly improved by including the additional data streams generally available on modern AR devices, as shown in Figure 10.
For a compelling user experience, AR applications should strive to retrieve and visualize content as quickly as possible after starting a session. To quantify this, we introduce a new metric called time-to-recall, which measures the sequence duration needed for successful localization. This encourages researchers to develop algorithms to accurately localize the camera as quickly as possible, as shown in Figure 11.
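As a toy version of this metric, and not the benchmark's exact definition, the sketch below measures the elapsed time until the first successful localization in a stream of per-frame outcomes.

```python
def time_to_recall(timestamps, localized, start_time=0.0):
    """Return the elapsed time until the first successful localization;
    a toy stand-in for the benchmark's time-to-recall metric."""
    for t, ok in zip(timestamps, localized):
        if ok:
            return t - start_time
    return float("inf")  # the session never localized

# Frames arrive every 0.1 s; localization first succeeds at t = 0.4 s.
ts = [0.1, 0.2, 0.3, 0.4, 0.5]
ok = [False, False, False, True, True]
print(f"time-to-recall: {time_to_recall(ts, ok):.1f} s")
```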
Using the LaMAR benchmark
LaMAR is the first benchmark that focuses on a realistic setup for visual localization and mapping using AR devices. The evaluation results show enormous potential for leveraging posed sequences instead of single frames and for leveraging other sensor modalities, like radio signals, to localize the camera and map the environment.
Researchers can access the LaMAR benchmark, evaluation server, implementations of the ground-truth pipeline, as well as baselines with additional sensory data at the LaMAR Benchmark page. We hope this work inspires future research in developing localization and mapping algorithms tailored to real AR scenarios.
Acknowledgments
This research was conducted by Paul-Edouard Sarlin, Mihai Dusmanu, Johannes L. Schönberger, Pablo Speciale, Lukas Gruber, Viktor Larsson, Ondrej Miksik, and Marc Pollefeys.
The post ECCV 2022 highlights: Advancing the foundations of mixed reality appeared first on Microsoft Research.