Human-level control through deep reinforcement learning

Overview of attention for article published in Nature, February 2015

About this Attention Score

  • In the top 5% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (99th percentile)
  • High Attention Score compared to outputs of the same age and source (99th percentile)

Readers on

  • 336 Mendeley
  • 29 CiteULike
Published in Nature, February 2015
DOI 10.1038/nature14236
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis


The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
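The temporal-difference update at the heart of this approach can be sketched in miniature. The snippet below is an illustration, not the paper's method: it uses a tabular Q function on a made-up three-state chain rather than a deep network on Atari pixels, and the hyperparameters (`alpha`, `gamma`) and environment are assumptions for the sake of the example.

```python
import numpy as np

# Tabular sketch of the Q-learning update that a deep Q-network (DQN)
# approximates with a neural network. The three-state chain environment
# and hyperparameters below are illustrative, not from the paper.

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Temporal-difference update: Q(s,a) += alpha * (target - Q(s,a))."""
    target = r + gamma * np.max(Q[s_next])  # bootstrap from best next action
    Q[s, a] += alpha * (target - Q[s, a])

# Toy chain: states 0..2, actions {0: step right, 1: jump right}.
# Reaching state 2 ends the episode with reward 1.
rng = np.random.default_rng(0)
Q = np.zeros((3, 2))
for _ in range(500):
    s = int(rng.integers(0, 2))   # start each episode in state 0 or 1
    a = int(rng.integers(0, 2))   # fully exploratory policy
    s_next = min(s + a + 1, 2)
    r = 1.0 if s_next == 2 else 0.0
    q_update(Q, s, a, r, s_next)
```

In the paper's setting the table is replaced by a convolutional network over raw pixels, and the same update is stabilized with experience replay and a separate target network.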

Twitter Demographics

The data shown below were collected from the profiles of 936 tweeters who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 336 Mendeley readers of this research output.

Geographical breakdown

Country          Count  As %
United States      116   35%
United Kingdom      55   16%
Germany             47   14%
Japan               34   10%
China               23    7%
Spain               19    6%
France              18    5%
Canada              13    4%
Italy               12    4%
Other              132   39%

Demographic breakdown

Readers by professional status        Count  As %
Student > Ph. D. Student               1201  35.7%
Student > Master                        810  24.1%
Researcher                              637  19.0%
Student > Bachelor                      383  11.4%
Other                                   170   5.1%
Other                                   565  16.8%

Readers by discipline                 Count  As %
Computer Science                       1999  59.5%
Engineering                             633  18.8%
Agricultural and Biological Sciences    311   9.3%
Physics and Astronomy                   178   5.3%
Psychology                              153   4.6%
Other                                   492  14.6%

Attention Score in Context

This research output has an Altmetric Attention Score of 1506. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 16 March 2017.
  • All research outputs: of 7,430,338 outputs
  • Outputs from Nature: of 45,528 outputs
  • Outputs of similar age: of 201,493 outputs
  • Outputs of similar age from Nature: of 920 outputs
Altmetric has tracked 7,430,338 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 99th percentile: it's in the top 5% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 45,528 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 69.8. This one has done particularly well, scoring higher than 99% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 201,493 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 99% of its contemporaries.
We're also able to compare this research output to 920 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 99% of its contemporaries.
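The age-matched comparisons above amount to a simple percentile-rank calculation: count how many scores in a comparison pool a given Attention Score beats. The sketch below illustrates this; the pool values are hypothetical (only the score of 1506 comes from this page), and Altmetric's actual scoring weights mention sources in ways this does not capture.

```python
# Percentile-rank sketch of the comparison described above. The pool
# values are made-up contemporary scores for illustration only.

def percentile_rank(score, pool):
    """Percentage of pool entries strictly below `score`."""
    beaten = sum(1 for s in pool if s < score)
    return 100.0 * beaten / len(pool)

contemporaries = [2, 15, 40, 70, 120, 300]   # hypothetical scores
rank = percentile_rank(1506, contemporaries)
```

With a score of 1506 against this toy pool, the paper beats every entry, giving a rank of 100.0.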