Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

Authors: J. Rae, J. J. Hunt, T. Harley, I. Danihelka, A. Senior, G. Wayne, A. Graves, T. Lillicrap

We can recall vast numbers of memories, making connections between superficially unrelated events. As you read a novel, you'll likely remember quite precisely the last few things you've read, but also plot summaries, connections and character traits from far back in the novel.

Many machine learning models of memory, such as Long Short-Term Memory, struggle at this sort of task. The computational cost of these models scales quadratically with the number of memories they can store, so they are quite limited in how many memories they can hold. More recently, memory-augmented neural networks such as the Differentiable Neural Computer or Memory Networks have shown promising results by adding memory separate from the computation, solving tasks such as reading short stories and answering questions (e.g. the bAbI tasks).

However, while these new architectures show promising results on small tasks, they use "soft" attention for accessing their memories, meaning that at every timestep they touch every word in memory. So while they can scale to short stories, they're a long way from reading novels.

In this work, we develop a set of techniques that use sparse approximations of such models to dramatically improve their scalability.
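
To give a flavour of the difference, here is a minimal NumPy sketch (not the paper's implementation; the function names `dense_read` and `sparse_read` and the memory sizes are made up for illustration). Dense soft attention computes a weight for every memory slot on every read, whereas a sparse read restricts attention to the k most similar slots; the sketch finds the top-k by brute force for clarity, while the paper's approach relies on approximate nearest-neighbour lookups so that cost grows sublinearly with memory size.

```python
import numpy as np

def dense_read(memory, query):
    """Dense soft-attention read: touches every slot in memory on each timestep."""
    scores = memory @ query                    # similarity to all N slots
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ memory                    # weighted sum over all N slots

def sparse_read(memory, query, k=8):
    """Sparse read: attend only to the k most similar slots.
    Top-k is brute force here; an approximate nearest-neighbour index
    would make the lookup itself sublinear in memory size."""
    scores = memory @ query
    top_k = np.argpartition(scores, -k)[-k:]   # indices of the k best-matching slots
    sub = scores[top_k]
    weights = np.exp(sub - sub.max())
    weights /= weights.sum()
    return weights @ memory[top_k]             # weighted sum over only k slots

# Toy usage: a memory of 10,000 slots, each a 64-dimensional content vector.
rng = np.random.default_rng(0)
memory = rng.standard_normal((10_000, 64))
query = rng.standard_normal(64)
r_dense = dense_read(memory, query)
r_sparse = sparse_read(memory, query, k=8)
print(r_dense.shape, r_sparse.shape)  # both (64,); dense touched 10,000 slots, sparse only 8
```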