Learning human objectives by evaluating hypothetical behaviours
December 13, 2019 by admin
DeepMind

We present a new method for training reinforcement learning agents from human feedback in the presence of unknown unsafe states.