
Human-level control through deep reinforcement learning

Overview of attention for article published in Nature, February 2015

About this Attention Score

  • In the top 5% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (99th percentile)
  • High Attention Score compared to outputs of the same age and source (99th percentile)

Readers on

  • Mendeley: 4,192
  • CiteULike: 28
Title
Human-level control through deep reinforcement learning
Published in
Nature, February 2015
DOI 10.1038/nature14236
Pubmed ID
Authors

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis

Abstract

The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
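The abstract's central idea is learning action values end-to-end from raw inputs. As a hedged illustration of the temporal-difference update that the deep Q-network builds on, here is a minimal tabular Q-learning sketch; the toy chain environment, hyperparameters, and all names below are illustrative assumptions, not taken from the paper (where the table is replaced by a deep convolutional network trained on Atari frames).

```python
import random

# Minimal tabular Q-learning sketch (assumption: toy 5-state chain,
# not the paper's Atari setup). States 0..4; reaching state 4 ends the
# episode with reward 1. Actions: 0 = left, 1 = right.
N_STATES = 5
ACTIONS = [0, 1]
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1   # illustrative hyperparameters

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic toy environment (an assumption for this sketch)."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    done = s2 == N_STATES - 1
    return s2, reward, done

random.seed(0)
for _ in range(200):                 # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2, r, done = step(s, a)
        # TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r + (0.0 if done else GAMMA * max(Q[(s2, a_)] for a_ in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# The learned greedy policy moves right from every non-terminal state.
policy = [max(ACTIONS, key=lambda a_: Q[(s, a_)]) for s in range(N_STATES)]
print(policy)
```

In the paper's deep Q-network, the same update drives gradient steps on a neural network that maps pixels to action values, rather than writes to a lookup table.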

Twitter Demographics

The data shown below were collected from the profiles of 936 tweeters who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 4,192 Mendeley readers of this research output.

Geographical breakdown

Country    Count   As %
Brazil         1    <1%
Unknown    4,191   100%

Demographic breakdown

Readers by professional status   Count   As %
Student > Ph. D. Student            94     2%
Student > Master                    90     2%
Student > Bachelor                  47     1%
Researcher                          39    <1%
Other                               22    <1%
Other                               47     1%
Unknown                          3,853    92%

Readers by discipline            Count   As %
Computer Science                   184     4%
Engineering                         66     2%
Unspecified                         10    <1%
Mathematics                         10    <1%
Psychology                          10    <1%
Other                               59     1%
Unknown                          3,853    92%

Attention Score in Context

This research output has an Altmetric Attention Score of 1524. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 13 November 2017.
  • All research outputs: #732 of 8,658,076 outputs
  • Outputs from Nature: #166 of 48,840 outputs
  • Outputs of similar age: #23 of 206,413 outputs
  • Outputs of similar age from Nature: #7 of 924 outputs
Altmetric has tracked 8,658,076 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 99th percentile: it's in the top 5% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 48,840 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 76.0. This one has done particularly well, scoring higher than 99% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 206,413 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 99% of its contemporaries.
We're also able to compare this research output to 924 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 99% of its contemporaries.
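The comparisons above all amount to computing a percentile rank: the share of a reference set (all outputs, same-source outputs, or same-age outputs) whose score this output exceeds. A hedged sketch of that calculation follows; the function name and the definition of "percentile" here are assumptions for illustration, and the peer scores are hypothetical numbers, not Altmetric data.

```python
def percentile_rank(score, peer_scores):
    """Percentage of peers whose score this output exceeds.

    One plausible definition of percentile rank; Altmetric's exact
    method is not specified on this page.
    """
    if not peer_scores:
        return 0.0
    below = sum(1 for s in peer_scores if s < score)
    return 100.0 * below / len(peer_scores)

# Hypothetical peer set: 100 outputs with scores 0..99.
peers = list(range(100))
print(percentile_rank(1524, peers))  # 100.0: a score of 1524 beats every peer
print(percentile_rank(50, peers))    # 50.0: beats exactly half of the peers
```

Changing the reference set (all 8,658,076 outputs, the 48,840 Nature outputs, or the 206,413 same-age outputs) changes the rank but not the score itself, which is why the same score of 1524 yields different positions in each comparison above.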