We are delighted to announce the results of the first phase of our joint research partnership with Moorfields Eye Hospital, which could potentially transform the management of sight-threatening eye disease.

The results, published online in Nature Medicine (open access full text, see end of blog), show that our AI system can quickly interpret eye scans from routine clinical practice with unprecedented accuracy. It can correctly recommend how patients should be referred for treatment for over 50 sight-threatening eye diseases as accurately as world-leading expert doctors. These are early results, but they show that our system could handle the wide variety of patients found in routine clinical practice. In the long term, we hope this will help doctors quickly prioritise patients who need urgent treatment, which could ultimately save sight.

A more streamlined process

Currently, eyecare professionals use optical coherence tomography (OCT) scans to help diagnose eye conditions. These 3D images provide a detailed map of the back of the eye, but they are hard to read and need expert analysis to interpret. The time it takes to analyse these scans, combined with the sheer number of scans that healthcare professionals have to go through (over 1,000 a day at Moorfields alone), can lead to lengthy delays between scan and treatment, even when someone needs urgent care.

Read More
Objects that Sound
Visual and audio events tend to occur together: a musician plucking guitar strings and the resulting melody; a wine glass shattering and the accompanying crash; the roar of a motorcycle as it accelerates. These visual and audio stimuli are concurrent because they share a common cause. Understanding the relationship between visual events and their associated sounds is a fundamental way that we make sense of the world around us.

In Look, Listen, and Learn and Objects that Sound (to appear at ECCV 2018), we explore this observation by asking: what can be learnt by looking at and listening to a large number of unlabelled videos? By constructing an audio-visual correspondence learning task that enables visual and audio networks to be jointly trained from scratch, we demonstrate that: the networks are able to learn useful semantic concepts; the two modalities can be used to search one another (e.g. to answer the question, "Which sound fits well with this image?"); and the object making the sound can be localised.

Limitations of previous cross-modal learning approaches

Learning from multiple modalities is not new; historically, researchers have largely focused on image-text or audio-vision pairings.

Read More
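To give a flavour of the audio-visual correspondence idea described in this entry, the sketch below pairs a small image network with a small audio (spectrogram) network and trains a classifier to decide whether a frame and an audio clip come from the same video. This is a minimal sketch under assumed layer sizes and names (including `AVCorrespondenceNet`), not the architecture from the paper.

```python
# Minimal sketch of an audio-visual correspondence (AVC) style task:
# two sub-networks embed an image frame and an audio spectrogram, and a
# small head predicts whether they come from the same video.
# Layer sizes and names are illustrative, not those of the paper.
import torch
import torch.nn as nn

class AVCorrespondenceNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        # Vision subnetwork: embeds an RGB frame.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Audio subnetwork: embeds a 1-channel log-spectrogram.
        self.audio = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Correspondence head: do the two embeddings belong together?
        self.head = nn.Linear(2 * embed_dim, 2)

    def forward(self, frames, spectrograms):
        v = self.vision(frames)
        a = self.audio(spectrograms)
        return self.head(torch.cat([v, a], dim=1))

# Toy training step on random "matching" vs "mismatched" pairs.
model = AVCorrespondenceNet()
frames = torch.randn(8, 3, 64, 64)    # batch of video frames
spectros = torch.randn(8, 1, 64, 64)  # batch of audio spectrograms
labels = torch.randint(0, 2, (8,))    # 1 = same video, 0 = different
loss = nn.CrossEntropyLoss()(model(frames, spectros), labels)
loss.backward()
```

Because the labels come for free (a frame and its own soundtrack correspond; a frame paired with audio from another video does not), both subnetworks can be trained from scratch without any human annotation.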
Measuring abstract reasoning in neural networks
Neural network-based models continue to achieve impressive results on longstanding machine learning problems, but establishing their capacity to reason about abstract concepts has proven difficult. Building on previous efforts to capture this important feature of general-purpose learning systems, our latest paper sets out an approach for measuring abstract reasoning in learning machines, and reveals some important insights about the nature of generalisation itself.

Read More
DeepMind papers at ICML 2018
The 2018 International Conference on Machine Learning will take place in Stockholm, Sweden from 10-15 July. For those attending and planning the week ahead, we are sharing a schedule of DeepMind presentations at ICML (you can download a pdf version here). We look forward to the many engaging discussions, ideas, and collaborations that are sure to arise from the conference!

Efficient Neural Audio Synthesis

Authors: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Sander Dieleman, Aaron van den Oord, Koray Kavukcuoglu

Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating desired samples. Efficient sampling for this class of models at the cost of little to no loss in quality has however remained an elusive task. With a focus on text-to-speech synthesis, we show that compact recurrent architectures, a remarkably high degree of weight sparsification and a novel reordering of the variables greatly reduce sampling latency while maintaining high audio fidelity. We first describe a compact single-layer recurrent neural network, the WaveRNN, with a novel dual softmax layer that matches the quality of the state-of-the-art WaveNet model.

Read More
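To illustrate the dual softmax idea mentioned in the abstract, the sketch below splits a 16-bit audio sample into coarse and fine 8-bit parts and predicts each part with its own 256-way softmax over a shared recurrent state. It is a simplified illustration under assumed layer sizes and structure, not the WaveRNN implementation itself (in particular, it omits the conditioning of the fine prediction on the sampled coarse value).

```python
# Simplified sketch of a dual-softmax output for 16-bit audio: each sample is
# split into a coarse (high) byte and a fine (low) byte, and each byte gets
# its own 256-way softmax. Sizes and structure are illustrative only.
import torch
import torch.nn as nn

class DualSoftmaxHead(nn.Module):
    def __init__(self, hidden_size=896):
        super().__init__()
        # Single recurrent layer over the previous coarse/fine values.
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.to_coarse = nn.Linear(hidden_size, 256)  # logits for high 8 bits
        self.to_fine = nn.Linear(hidden_size, 256)    # logits for low 8 bits

    def forward(self, prev_samples):
        # prev_samples: (batch, time, 2) previous coarse/fine values in [-1, 1]
        h, _ = self.rnn(prev_samples)
        return self.to_coarse(h), self.to_fine(h)

def split_coarse_fine(samples_16bit):
    """Split unsigned 16-bit samples into coarse and fine 8-bit parts."""
    coarse = samples_16bit // 256
    fine = samples_16bit % 256
    return coarse, fine

# Example: derive targets from raw samples and run the model on a toy history.
samples = torch.randint(0, 65536, (4, 100))
coarse_targets, fine_targets = split_coarse_fine(samples)
model = DualSoftmaxHead()
prev = torch.rand(4, 100, 2) * 2 - 1          # fake normalised sample history
coarse_logits, fine_logits = model(prev)
print(coarse_logits.shape, fine_logits.shape)  # (4, 100, 256) each
```

Predicting two 8-bit distributions instead of one 65,536-way distribution keeps the output layer small, which is part of what makes per-sample generation fast enough for real-time synthesis.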
DeepMind Health Response to Independent Reviewers’ Report 2018
When we set up DeepMind Health we believed that pioneering technology should be matched with pioneering oversight. That's why when we launched in February 2016, we did so with an unusual and additional mechanism: a panel of Independent Reviewers, who meet regularly throughout the year to scrutinise our work. This is an innovative approach within tech companies – one that forces us to question not only what we are doing, but how and why we are doing it – and we believe that their robust challenges make us better.

In their report last year, the Independent Reviewers asked us important questions about our engagement with stakeholders, data governance, and the behavioural elements that need to be considered when deploying new technologies in clinical environments. We've done a lot over the past twelve months to address these questions, and we're really proud that this year's Annual Report recognises the progress we've made.

Of course, this year's report includes a series of new recommendations for areas where we can continue to improve, which we'll be working on in the coming months. In particular:

We're developing our longer-term business model and roadmap, and look forward to sharing our ideas once they're further ahead. Rather than charging for the early stages of our work, our first priority has been to prove that our technologies can help improve patient care and reduce costs.

Read More
Neural scene representation and rendering
There is more than meets the eye when it comes to how we understand a visual scene: our brains draw on prior knowledge to reason and to make inferences that go far beyond the patterns of light that hit our retinas. For example, when entering a room for the first time, you instantly recognise the items it contains and where they are positioned. If you see three legs of a table, you will infer that there is probably a fourth leg with the same shape and colour hidden from view. Even if you can't see everything in the room, you'll likely be able to sketch its layout, or imagine what it looks like from another perspective.

These visual and cognitive tasks are seemingly effortless to humans, but they represent a significant challenge to our artificial systems. Today, state-of-the-art visual recognition systems are trained using large datasets of annotated images produced by humans. Acquiring this data is a costly and time-consuming process, requiring individuals to label every aspect of every object in each scene in the dataset. As a result, often only a small subset of a scene's overall contents is captured, which limits the artificial vision systems trained on that data.

Read More
Royal Free London publishes findings of legal audit in use of Streams
Last July, the Information Commissioner concluded an investigation into the use of the Streams app at the Royal Free London NHS Foundation Trust. As part of the investigation the Royal Free signed up to a set of undertakings, one of which was to commission a third party to audit the Royal Free's current data processing arrangements with DeepMind, to ensure that they fully complied with data protection law and respected the privacy and confidentiality rights of its patients.

You can read the full report on the Royal Free's website here, and the Information Commissioner's Office's response here. The report also has three recommendations that relate to DeepMind Health:

It recommends a minor amendment to our information processing agreement to contain an express obligation on us to inform the Royal Free if, in our opinion, the Royal Free's instructions infringe data protection laws. We're working with the Royal Free to make this change to the agreement.

It recommends that we continue to review and audit the activity of staff who have been approved for remote access to these systems.

It recommends that the Royal Free terminate the historical memorandum of understanding (MOU) with DeepMind. This was originally signed in January 2016 to detail the services that we then planned to develop with the Trust.

Read More
Prefrontal cortex as a meta-reinforcement learning system
Recently, AI systems have mastered a range of video games such as the Atari classics Breakout and Pong. But as impressive as this performance is, AI still relies on the equivalent of thousands of hours of gameplay to reach and surpass the performance of human video game players. In contrast, we can usually grasp the basics of a video game we have never played before in a matter of minutes.

The question of why the brain is able to do so much more with so much less has given rise to the theory of meta-learning, or learning to learn. It is thought that we learn on two timescales: in the short term we focus on learning about specific examples, while over longer timescales we learn the abstract skills or rules required to complete a task. It is this combination that is thought to help us learn efficiently and apply that knowledge rapidly and flexibly to new tasks. Recreating this meta-learning structure in AI systems, called meta-reinforcement learning, has proven very fruitful in facilitating fast, one-shot learning in our agents (see our paper and closely related work from OpenAI). However, the specific mechanisms that allow this process to take place in the brain are still largely unexplained in neuroscience.

Read More
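As a loose sketch of the meta-reinforcement learning setup described in this entry, the code below shows a recurrent agent whose input at each step includes the current observation together with the previous action and reward, so that the recurrent state can implement a fast, within-episode learning process on top of the slow weight updates. The class name, sizes, and input encoding are assumptions made for illustration, not the architecture used in the paper.

```python
# Sketch of a meta-RL style agent: an LSTM receives the observation plus the
# previous action (one-hot) and previous reward, so its recurrent state can
# carry task-specific information learned within an episode.
# Names and sizes are illustrative only.
import torch
import torch.nn as nn

class MetaRLAgent(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden_size=64):
        super().__init__()
        self.n_actions = n_actions
        self.lstm = nn.LSTMCell(obs_dim + n_actions + 1, hidden_size)
        self.policy = nn.Linear(hidden_size, n_actions)  # action logits
        self.value = nn.Linear(hidden_size, 1)            # state-value estimate

    def forward(self, obs, prev_action, prev_reward, state):
        prev_action_onehot = nn.functional.one_hot(prev_action, self.n_actions).float()
        x = torch.cat([obs, prev_action_onehot, prev_reward.unsqueeze(-1)], dim=-1)
        h, c = self.lstm(x, state)
        return self.policy(h), self.value(h), (h, c)

# One step of interaction on toy inputs.
agent = MetaRLAgent(obs_dim=10, n_actions=4)
obs = torch.randn(1, 10)
prev_action = torch.tensor([0])
prev_reward = torch.tensor([0.0])
state = (torch.zeros(1, 64), torch.zeros(1, 64))
logits, value, state = agent(obs, prev_action, prev_reward, state)
action = torch.distributions.Categorical(logits=logits).sample()
```

The slow, weight-based learning happens across many tasks during training; the fast learning happens inside the LSTM state, which adapts within a single episode without any weight updates.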
Navigating with grid-like representations in artificial agents
Most animals, including humans, are able to flexibly navigate the world they live in: exploring new areas, returning quickly to remembered places, and taking shortcuts. Indeed, these abilities feel so easy and natural that it is not immediately obvious how complex the underlying processes really are. In contrast, spatial navigation remains a substantial challenge for artificial agents, whose abilities are far outstripped by those of mammals.

In 2005, a potentially crucial part of the neural circuitry underlying spatial behaviour was revealed by an astonishing discovery: neurons that fire in a strikingly regular hexagonal pattern as animals explore their environment. This lattice of points is believed to facilitate spatial navigation, similarly to the gridlines on a map. In addition to equipping animals with an internal coordinate system, these neurons – known as grid cells – have recently been hypothesised to support vector-based navigation: that is, enabling the brain to calculate the distance and direction to a desired destination, as the crow flies, allowing animals to make direct journeys between different places even if that exact route had not been followed before.

The group that first discovered grid cells was jointly awarded the 2014 Nobel Prize in Physiology or Medicine for shedding light on how cognitive representations of space might work.

Read More
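As a very simple illustration of vector-based navigation as described in this entry, that is, computing the distance and direction to a goal "as the crow flies", the snippet below derives a goal vector from an estimate of the agent's current position and a remembered goal position. This is a toy calculation for intuition only, not the grid-cell-based agent from the paper.

```python
# Toy illustration of vector-based navigation: given an estimate of the
# current position and a remembered goal position, compute the straight-line
# ("as the crow flies") distance and heading to the goal.
import numpy as np

def goal_vector(current_xy, goal_xy):
    """Return (distance, heading in radians) from the current position to the goal."""
    delta = np.asarray(goal_xy, dtype=float) - np.asarray(current_xy, dtype=float)
    distance = np.linalg.norm(delta)
    heading = np.arctan2(delta[1], delta[0])
    return distance, heading

# Example: agent at (2, 3) heading to a remembered goal at (7, 11).
dist, heading = goal_vector((2.0, 3.0), (7.0, 11.0))
print(f"distance = {dist:.2f}, heading = {np.degrees(heading):.1f} degrees")
```

The hypothesis is that grid cells give the brain a coordinate system in which such a displacement can be read out directly, allowing novel shortcuts rather than retracing previously travelled routes.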
DeepMind, meet Android
We're delighted to announce a new collaboration between DeepMind for Google and Android, the world's most popular mobile operating system. Together, we've created two new features that will be available to people with devices running Android P later this year:

Read More