Heart rate variability (HRV) is a practical and noninvasive measure of autonomic nervous system activity, which plays an essential role in cardiovascular health. However, using HRV to assess physiology status is challenging. Even in clinical settings, HRV is sensitive to acute stressors such as physical activity, mental stress, hydration, alcohol, and sleep. Wearable devices provide convenient HRV measurements, but the irregularity of measurements and uncaptured stressors can bias conventional analytical methods. To better interpret HRV measurements for downstream healthcare applications, we…Apple Machine Learning Research
Providing Insights for Open-Response Surveys via End-to-End Context-Aware Clustering
Teachers often conduct surveys in order to collect data from a predefined group of students to gain insights into topics of interest. When analyzing surveys with open-ended textual responses, it is extremely time-consuming, labor-intensive, and difficult to manually process all the responses into an insightful and comprehensive report. In the analysis step, traditionally, the teacher has to read each of the responses and decide on how to group them in order to extract insightful information. Even though it is possible to group the responses only using certain keywords, such an approach would…Apple Machine Learning Research
Microsoft Research Summit 2022: What’s Next for Technology and Humanity?
Today, we are experiencing waves of breakthroughs in computing that are transforming just about every aspect of our lives. Artificial intelligence is changing the way we develop and create. Human language technologies are revolutionizing the workflows of healthcare professionals. Deep learning is accelerating our ability to understand and predict natural phenomena, from atomic to galactic scales. Meanwhile, the foundations of cloud computing are undergoing a reinvention from the atoms up.
Realizing the benefits of these new breakthroughs demands that we come together in new ways across the global research community. The vibrancy of invention and innovation increasingly lies at the intersections among traditional research disciplines, from the highly theoretical and to the immediately applicable. Ensuring that the continuing advancement of technology is beneficial to all requires communication, collaboration and co-innovation across the communities that create new technologies and those that aim to use them to improve their lives.
That’s why I’m excited to invite you to join us for this year’s Microsoft Research Summit, which will take place on October 18-20, 2022. This virtual event is where the global research community convenes to explore how emerging research might best address societal challenges and have significant impact on our lives in the coming years. This year’s event will feature over 120 speakers, including researchers and leaders from across the research community at Microsoft, alongside partners and collaborators from industry, academia and government who are advancing the frontiers of research in computing and across the sciences.
Each of our three days will begin with a plenary session during which we’ll explore the potential impact of deep learning on scientific discovery, the opportunity to use technology to make healthcare more precise and accessible, and the re-invention of foundational technologies to enable the cloud of the future. These plenaries will lead into tracks that dive deeper into research that spans from more efficient and adaptable AI, to technologies that amplify human creativity and help foster a more sustainable society.
For further details – and to register to attend – check out the Microsoft Research Summit website.
We hope you will join us.
The post Microsoft Research Summit 2022: What’s Next for Technology and Humanity? appeared first on Microsoft Research.
CCF: Bringing efficiency and usability to a decentralized trust model
Online trust has come a long way since the time of centralized databases, where information was concentrated in one location and the security and validation of that information relied on a core set of people and systems. While convenient, this model of centralized management and oversight had a number of drawbacks. Trust depended on how the workflows of those systems were established and the skillset and integrity of the people involved. It created opportunities for such issues as duplicate digital transactions, human error, and bias, as witnessed in recent history in the financial industry. In response to these systemic issues, a now-famous paper published in late 2008 proposed a distributed ledger, where new transactions could be added and validated only through participant consensus. This model of decentralized trust and execution would become known as distributed ledger technology, or blockchain, and it offered a more trustworthy alternative to centrally managed databases and a new way to store and decentralize data.
In a distributed trust model, network participants validate transactions over a network by performing computation on those transactions themselves and comparing the outputs. While their identities are private and those performing the transactions typically have pseudonyms, the transactions themselves are public, greatly limiting the use cases for decentralized computation systems. One use case where decentralized computation doesn’t work involves handling financial transactions so that they’re compliant with Know Your Client (KYC) standards and anti-money laundering (AML) regulations while also respecting privacy laws. Another involves managing medical records, where multiple organizations, such as healthcare providers and insurers, jointly govern the system.
Distributed trust with centralized confidential computation
While blockchain provided a more reliable option to centralized databases, it isn’t a perfect solution. The Confidential Computing team at Microsoft Research wanted to build a system that retained the advantages of decentralized trust while keeping transactions confidential. This meant we had to develop a way to centralize computation. At the time, no system offered these capabilities.
To tackle this issue, we developed Confidential Consortium Framework (CCF), a framework for building highly available stateful services that require centralized computation while providing decentralized trust. CCF is based on a distributed trust model like that of blockchain while maintaining data confidentiality through secure centralized computation. This centralized confidential computation model also provides another benefit—it addresses the substantial amount of energy used in blockchain and other distributed computation environments.
As widely reported in the media, blockchain comes at a great environmental cost. Cryptocurrency—the most widespread implementation of blockchain—requires a significant amount of computing power to verify transactions. According to the Cambridge Center for Alternative Finance (CCAF), bitcoin, the most common cryptocurrency, as of this writing, currently consumes slightly over 92 terawatt hours per year—0.41 percent of global electricity production, more than the annual energy draw of countries like Belgium or the Philippines.
Our goal was to develop a framework that reduced the amount of computing power it takes to run a distributed system and make it much more efficient, requiring no more energy than the cost of running the actual computation.
To apply the technology in a way that people can use, we worked with the Azure Security team to build Azure confidential ledger, an Azure service developed on CCF that manages sensitive data records in a highly secure way. In this post, we discuss the motivations behind CCF, the problems we set out to solve, and the approaches we took to solve them. We also explain our approach in supporting the development of Azure confidential ledger using CCF.
Overcoming a bias for blockchain
We discovered a strong bias for blockchain as we explained our research to different groups that were interested in this technology, including other teams at Microsoft, academic researchers exploring blockchain consensus, and external partners looking for enterprise-ready blockchain solutions. This bias was in the form of certain assumptions about what was needed to build a distributed ledger: that all transactions had to be public, that computation had to be geographically distributed, and that it had to be resilient to Byzantine faults from executors. First recognizing these biases and then countering them were some of the biggest challenges we had to surmount.
We worked to show how CCF broke from each of these assumptions while still providing an immutable ledger with distributed trust. We also had to prove that there were important use cases for maintaining confidentiality in a distributed trust system. We went through multiple rounds of discussion, explaining how the technology we wanted to build was different from traditional blockchains, why it was a worthwhile investment, and what the benefits were. Through these conversations, we discovered that many of our colleagues were just as frustrated as we were by the very issues in blockchain we were setting out to solve.
Additionally, we encountered skepticism from internal partner teams, who needed more than a research paper to be convinced that we could successfully accomplish our research goals and support our project. There were healthy doubts about the performance that was possible when executing inside an encrypted and isolated memory space, the ability to build a functional and useable system with minimal components that needed to be trusted, and how much of the internal complexity it was possible to hide from operators and users. Early versions of CCF and sample apps were focused on proving we could overcome those risks. We built basic proofs of concept and gave numerous demonstrations showing how we could implement distributed trust with centralized confidential computation. In the end, it was the strength of these demos that helped us get the resources we needed to pursue our research.
Building the compute stack
Another challenge involved was reimagining a secure compute stack for an enclave—the secured portion of the hardware’s processor and memory. At the time, enclaves were very resource constrained compared with traditional hardware, and we could run only small amounts of code on very little memory.
In addition, capabilities are limited when performing computation in an enclave. For example, the code can’t access anything outside the enclave, and it’s difficult to get the code to communicate with an external system. This challenge required us to design and build an entire compute stack from scratch with all the elements needed to establish consensus, implement transactional storage, establish runtimes for user languages, and so on.
Another consideration was the need to build a system that people could use. As researchers, we wanted our work to have real impact, but it was tempting to push the state of the art in the area of confidential computing research and develop very elaborate technology in these enclaves. However, these types of innovations cannot be deployed in actual products because they’re exceedingly difficult to explain and apply. We had committed to creating something that product teams could implement and use as a foundation for building real systems and products, so we worked to calibrate the guarantees and threat model so that our system could be used in actual products.
Establishing a root of trust with CCF
CCF strengthens the trust boundary in scenarios in which both distributed trust and data confidentiality are needed by decreasing the size of the trusted computing base (TCB)—the components of a computing environment that must be trusted for the appropriate level of security to be applied—reducing the attack surface. Specifically, CCF allows operators to greatly decrease or even eliminate their presence in the TCB, depending on the governance configuration.
Instead of a social root of trust—such as a cloud service provider or the participant consensus used in blockchain networks—CCF relies on trusted hardware to enforce transaction integrity and confidentiality, which creates a trusted execution environment (TEE). These TEEs are isolated memory spaces that are kept encrypted at all times, even when data is executing. The memory chip itself strictly enforces this memory encryption. Data in TEEs is never readable.
Decentralized trust is underpinned by remote attestation, providing the guarantee to a remote entity that all computation of user data takes place in a publicly verifiable TEE. The combination of this attestation with the isolated and encrypted TEE creates a distributed trust environment. Nodes in the network establish mutual trust by verifying their respective attestations, which affirm that they’re running the expected code in a TEE. The operator starting the nodes, which can be automated or manual, indicates where in the network they can find each other.
Service governance is performed by a flexible consortium, which is separate from the operator. CCF uses a ledger to provide offline trust. All transactions are reflected in a tamper-protected ledger that users can review to audit service governance and obtain universally verifiable transaction receipts, which can verify the consistency of the service and prove the execution of transactions to other users. This is particularly valuable for users who need to comply with specific laws and regulations.
Laying the foundation for Azure confidential ledger
We collaborated with the Azure Security team to refine and improve CCF so that it could be used as a foundation for building new Azure services for confidential computing. We applied Azure API standards and ensured that CCF complied with Azure best practices, including enabling it to log operations and perform error reporting and long-running queries. We then developed a prototype of an Azure application, and from this, the Azure Security team developed Azure confidential ledger, the first generally available managed service built on CCF, which provides tamper-protected audit logging that can be cryptographically verified.
Looking forward
We were pleasantly surprised by how quickly we discovered new use cases for CCF and Azure confidential ledger, both within Microsoft and with third-party users. Now, most of the use cases are those we had not initially foreseen, from atmospheric carbon removal to securing machine learning logs. We’re extremely excited by the potential for CCF to have much more impact than we had originally planned or expected when we first started on this journey, and we’re looking forward to discovering some of the countless ways in which it can be applied.
The post CCF: Bringing efficiency and usability to a decentralized trust model appeared first on Microsoft Research.
Interspeech 2022: The growth of interdisciplinary research
Cyclic training of speech synthesis and speech recognition models and language understanding for better speech prosody are just a few examples of cross-pollination in speech-related fields.Read More
Reinventing the Wheel: Gatik’s Apeksha Kumavat Accelerates Autonomous Delivery for Wal-Mart and More
As consumers expect faster, cheaper deliveries, companies are turning to AI to rethink how they move goods.
Foremost among these new systems are “hub-and-spoke,” or middle-mile, operations, where companies place distribution centers closer to retail operations for quicker access to inventory. However, faster delivery is just part of the equation. These systems must also be low-cost for consumers.
Autonomous delivery company Gatik seeks to provide lasting solutions for faster and cheaper shipping. By automating the routes between the hub — the distribution center — and the spokes — retail stores — these operations can run around the clock efficiently and with minimal investment.
Gatik co-founder and Chief Engineer Apeksha Kumavat joined NVIDIA’s Katie Burke Washabaugh on the latest episode of the AI Podcast to walk through how the company is developing autonomous trucks for middle-mile delivery.
Kumavat also discussed the progress of commercial pilots with companies such as Walmart and Georgia-Pacific.
She’ll elaborate on Gatik’s autonomous vehicle development in a virtual session at NVIDIA GTC on Tuesday, Sept. 20. Register free to learn more.
You Might Also Like
Driver’s Ed: How Waabi Uses AI, Simulation to Teach Autonomous Vehicles to Drive
Teaching the AI brains of autonomous vehicles to understand the world as humans do requires billions of miles of driving experience. The road to achieving this astronomical level of driving leads to the virtual world. Learn how Waabi uses powerful high-fidelity simulations to train and develop production-level autonomous vehicles.
Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans
Driving enjoyment and autonomous driving capabilities can complement one another in intelligent, sustainable vehicles. Learn about the automaker’s plans to unveil its third vehicle, the Polestar 3, the tech inside it, and what the company’s racing heritage brings to the intersection of smarts and sustainability.
GANTheftAuto: Harrison Kinsley on AI-Generated Gaming Environments
Humans playing games against machines is nothing new, but now computers can develop their own games for people to play. Programming enthusiast and social media influencer Harrison Kinsley created GANTheftAuto, an AI-based neural network that generates a playable chunk of the classic video game, Grand Theft Auto V.
Subscribe to the AI Podcast: Now Available on Amazon Music
The AI Podcast is now available through Amazon Music.
In addition, get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.
Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.
The post Reinventing the Wheel: Gatik’s Apeksha Kumavat Accelerates Autonomous Delivery for Wal-Mart and More appeared first on NVIDIA Blog.
How our principles helped define AlphaFold’s release
Our Operating Principles have come to define both our commitment to prioritising widespread benefit, as well as the areas of research and applications we refuse to pursue. These principles have been at the heart of our decision making since DeepMind was founded, and continue to be refined as the AI landscape changes and grows. They are designed for our role as a research-driven science company and consistent with Google’s AI principles.Read More
How our principles helped define AlphaFold’s release
Our Operating Principles have come to define both our commitment to prioritising widespread benefit, as well as the areas of research and applications we refuse to pursue. These principles have been at the heart of our decision making since DeepMind was founded, and continue to be refined as the AI landscape changes and grows. They are designed for our role as a research-driven science company and consistent with Google’s AI principles.Read More
LOLNeRF: Learn from One Look
An important aspect of human vision is our ability to comprehend 3D shape from the 2D images we observe. Achieving this kind of understanding with computer vision systems has been a fundamental challenge in the field. Many successful approaches rely on multi-view data, where two or more images of the same scene are available from different perspectives, which makes it much easier to infer the 3D shape of objects in the images.
There are, however, many situations where it would be useful to know 3D structure from a single image, but this problem is generally difficult or impossible to solve. For example, it isn’t necessarily possible to tell the difference between an image of an actual beach and an image of a flat poster of the same beach. However it is possible to estimate 3D structure based on what kind of 3D objects occur commonly and what similar structures look like from different perspectives.
In “LOLNeRF: Learn from One Look”, presented at CVPR 2022, we propose a framework that learns to model 3D structure and appearance from collections of single-view images. LOLNeRF learns the typical 3D structure of a class of objects, such as cars, human faces or cats, but only from single views of any one object, never the same object twice. We build our approach by combining Generative Latent Optimization (GLO) and neural radiance fields (NeRF) to achieve state-of-the-art results for novel view synthesis and competitive results for depth estimation.
Combining GLO and NeRF
GLO is a general method that learns to reconstruct a dataset (such as a set of 2D images) by co-learning a neural network (decoder) and table of codes (latents) that is also an input to the decoder. Each of these latent codes re-creates a single element (such as an image) from the dataset. Because the latent codes have fewer dimensions than the data elements themselves, the network is forced to generalize, learning common structure in the data (such as the general shape of dog snouts).
NeRF is a technique that is very good at reconstructing a static 3D object from 2D images. It represents an object with a neural network that outputs color and density for each point in 3D space. Color and density values are accumulated along rays, one ray for each pixel in a 2D image. These are then combined using standard computer graphics volume rendering to compute a final pixel color. Importantly, all these operations are differentiable, allowing for end-to-end supervision. By enforcing that each rendered pixel (of the 3D representation) matches the color of ground truth (2D) pixels, the neural network creates a 3D representation that can be rendered from any viewpoint.
We combine NeRF with GLO by assigning each object a latent code and concatenating it with standard NeRF inputs, giving it the ability to reconstruct multiple objects. Following GLO, we co-optimize these latent codes along with network weights during training to reconstruct the input images. Unlike standard NeRF, which requires multiple views of the same object, we supervise our method with only single views of any one object (but multiple examples of that type of object). Because NeRF is inherently 3D, we can then render the object from arbitrary viewpoints. Combining NeRF with GLO gives it the ability to learn common 3D structure across instances from only single views while still retaining the ability to recreate specific instances of the dataset.
Camera Estimation
In order for NeRF to work, it needs to know the exact camera location, relative to the object, for each image. Unless this was measured when the image was taken, it is generally unknown. Instead, we use the MediaPipe Face Mesh to extract five landmark locations from the images. Each of these 2D predictions correspond to a semantically consistent point on the object (e.g., the tip of the nose or corners of the eyes). We can then derive a set of canonical 3D locations for the semantic points, along with estimates of the camera poses for each image, such that the projection of the canonical points into the images is as consistent as possible with the 2D landmarks.
Example MediaPipe landmarks and segmentation masks (images from CelebA). |
Hard Surface and Mask Losses
Standard NeRF is effective for accurately reproducing the images, but in our single-view case, it tends to produce images that look blurry when viewed off-axis. To address this, we introduce a novel hard surface loss, which encourages the density to adopt sharp transitions from exterior to interior regions, reducing blurring. This essentially tells the network to create “solid” surfaces, and not semi-transparent ones like clouds.
We also obtained better results by splitting the network into separate foreground and background networks. We supervised this separation with a mask from the MediaPipe Selfie Segmenter and a loss to encourage network specialization. This allows the foreground network to specialize only on the object of interest, and not get “distracted” by the background, increasing its quality.
Results
We surprisingly found that fitting only five key points gave accurate enough camera estimates to train a model for cats, dogs, or human faces. This means that given only a single view of your beloved cats Schnitzel, Widget and friends, you can create a new image from any other angle.
Top: example cat images from AFHQ. Bottom: A synthesis of novel 3D views created by LOLNeRF. |
Conclusion
We’ve developed a technique that is effective at discovering 3D structure from single 2D images. We see great potential in LOLNeRF for a variety of applications and are currently investigating potential use-cases.
Interpolation of feline identities from linear interpolation of learned latent codes for different examples in AFHQ. |
Code Release
We acknowledge the potential for misuse and importance of acting responsibly. To that end, we will only release the code for reproducibility purposes, but will not release any trained generative models.
Acknowledgements
We would like to thank Andrea Tagliasacchi, Kwang Moo Yi, Viral Carpenter, David Fleet, Danica Matthews, Florian Schroff, Hartwig Adam and Dmitry Lagun for continuous help in building this technology.
Get up to Speed: Five Reasons Not to Miss NVIDIA CEO Jensen Huang’s GTC Keynote Sept. 20
Think fast. Enterprise AI, new gaming technology, the metaverse and the 3D internet, and advanced AI technologies tailored to just about every industry are all coming your way.
NVIDIA founder and CEO Jensen Huang’s keynote at NVIDIA GTC on Tuesday, Sept. 20, is the best way to get ahead of all these trends.
NVIDIA’s virtual technology conference, which takes place Sept. 19-22, sits at the intersections of business and technology, science and the arts in a way no other event can.
This GTC will focus on neural graphics — which bring together AI and visual computing to create stunning new possibilities — the metaverse, an update on large language models, and the changes coming to every industry with the latest generation of recommender systems.
The free online gathering features speakers from every corner of industry, academia and research.
Speakers include Johnson & Johnson CTO Rowena Yao; Boeing Vice President Linda Hapgood; Polestar COO Dennis Nobelius; Deutsche Bank CTO Bernd Leukert; UN Assistant Secretary-General Ahunna Eziakonwa; UC San Diego distinguished professor Henrik Christensen, and hundreds more.
For those who want to get hands on, GTC features developer sessions for newbies and veteran developers.
Two-hour training labs are included for those who sign up for a free conference pass. Those who want to dig deeper can sign up for one of 21 full-day virtual hands-on workshops at a special price of $149, and for group purchases of more than five seats, we are offering a special of $99 per seat.
Finally, GTC offers networking opportunities that bring together people working on the most challenging problems of our time from all over the planet.
Register free and start loading up your calendar with content today.
The post Get up to Speed: Five Reasons Not to Miss NVIDIA CEO Jensen Huang’s GTC Keynote Sept. 20 appeared first on NVIDIA Blog.