Moderation in online communities increases the quality of contributions and decreases antisocial behavior, and the tools that online platforms provide are one way that…
Ten stories from the first half of 2022 that captivated readers
From Josh Miele’s passion for making the world more accessible to improving forecasting by learning quantile functions, these stories resonated with our audience.
Human-centred mechanism design with Democratic AI
In our recent paper, published in Nature Human Behaviour, we provide a proof-of-concept demonstration that deep reinforcement learning (RL) can be used to find economic policies that people will vote for by majority in a simple game. The paper thus addresses a key challenge in AI research – how to train AI systems that align with human values.
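To give a concrete feel for the majority-vote objective this teaser describes, here is a toy sketch in Python. It is not the paper's deep RL setup: the lognormal endowments, the one-parameter redistribution policy, and the hill-climbing search are all invented for illustration of how a policy can be scored by simulated majority vote.

```python
import numpy as np

rng = np.random.default_rng(0)
# Skewed endowments (as with real incomes) put the median player below
# the mean, so redistribution can win a majority of votes.
endowments = rng.lognormal(mean=1.0, sigma=0.8, size=101)

def utility(rate: float, endowment: np.ndarray) -> np.ndarray:
    # Each player keeps (1 - rate) of their endowment plus an equal
    # share of the redistributed pool.
    pool = rate * endowment.mean()
    return (1.0 - rate) * endowment + pool

def majority_prefers(candidate: float, incumbent: float) -> bool:
    # True if more than half the players are better off under `candidate`.
    return np.mean(utility(candidate, endowments) >
                   utility(incumbent, endowments)) > 0.5

policy = 0.0  # start from no redistribution
for _ in range(500):
    proposal = float(np.clip(policy + rng.normal(scale=0.05), 0.0, 1.0))
    if majority_prefers(proposal, policy):
        policy = proposal

print(f"majority-preferred redistribution rate: {policy:.2f}")
```

The search converges to whatever the median voter prefers; the paper's contribution is doing this with deep RL in a much richer game played with real human participants.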
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Controllable generative sequence models with the capability to extract and replicate the style of specific examples enable many applications, including narrating audiobooks in different voices, auto-completing and auto-correcting written handwriting, and generating missing training samples for downstream recognition tasks. However, in an unsupervised style setting, typical training algorithms for controllable sequence generative models suffer from a training-inference mismatch, where the same sample is used as content and style input during training but unpaired samples are given during inference.
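The following minimal, self-contained sketch illustrates the training-inference mismatch the teaser describes (it is not Apple's style-equalization method; the toy "style encoder" and "decoder" stand in for real networks). The point is which sample supplies the style input in each phase.

```python
import numpy as np

def style_encoder(sample: np.ndarray) -> np.ndarray:
    # Toy stand-in for a learned style embedding.
    return sample.mean(axis=0, keepdims=True)

def decode(content: np.ndarray, style: np.ndarray) -> np.ndarray:
    # Toy stand-in for a conditional generator.
    return content + style

content_sample = np.random.randn(10, 4)   # e.g., one writer's sequence
unpaired_style = np.random.randn(10, 4)   # a different writer's sample

# Training: the SAME sample serves as both content and style input, so
# the style branch can cheat by encoding content information.
train_out = decode(content_sample, style_encoder(content_sample))

# Inference: style now comes from an UNPAIRED sample, a condition the
# model never experienced during training. That gap is the mismatch.
infer_out = decode(content_sample, style_encoder(unpaired_style))
```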
Dynamic Memory for Interpretable Sequential Optimisation
Real-world applications of reinforcement learning for recommendation and experimentation face a practical challenge: the relative reward of different bandit arms can evolve over the lifetime of the learning agent. To deal with these non-stationary cases, the agent must forget some historical knowledge, as it may no longer be relevant to minimise regret. We present a solution to handling non-stationarity that is suitable for deployment at scale, to provide business operators with automated adaptive optimisation. Our solution aims to provide interpretable learning that can be trusted by humans…
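As a rough intuition for the "forgetting" the teaser describes (a generic textbook technique, not the paper's dynamic-memory method), here is a Bernoulli Thompson-sampling bandit whose Beta posteriors decay toward the prior each step, so stale evidence loses weight and the agent can track reward rates that drift over time. All names and the drift pattern are illustrative.

```python
import numpy as np

class DiscountedThompsonBandit:
    def __init__(self, n_arms: int, gamma: float = 0.99):
        self.gamma = gamma             # discount < 1 forgets old data
        self.alpha = np.ones(n_arms)   # Beta success counts (+ prior 1)
        self.beta = np.ones(n_arms)    # Beta failure counts (+ prior 1)

    def select(self, rng: np.random.Generator) -> int:
        # Sample a plausible reward rate per arm; play the best sample.
        return int(np.argmax(rng.beta(self.alpha, self.beta)))

    def update(self, arm: int, reward: int) -> None:
        # Decay all counts toward the uniform prior, then add evidence.
        self.alpha = 1 + self.gamma * (self.alpha - 1)
        self.beta = 1 + self.gamma * (self.beta - 1)
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward

rng = np.random.default_rng(0)
bandit = DiscountedThompsonBandit(n_arms=3)
for t in range(2000):
    # True reward rates drift, making the problem non-stationary.
    p = [0.3, 0.5 + 0.2 * np.sin(t / 300), 0.4]
    arm = bandit.select(rng)
    bandit.update(arm, int(rng.random() < p[arm]))
```

Without the discount (gamma = 1) this reduces to standard Thompson sampling, whose accumulated counts would make it slow to abandon an arm whose reward has decayed.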
Better joint representations of image and text
Two methods presented at CVPR achieve state-of-the-art results by imposing additional structure on the representational space.