
Humans and Deep Networks Largely Agree on Which Kinds of Variation Make Object Recognition Harder

Overview of attention for article published in Frontiers in Computational Neuroscience, August 2016

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • Among the highest-scoring outputs from this source (#42 of 1,393)
  • High Attention Score compared to outputs of the same age (93rd percentile)
  • High Attention Score compared to outputs of the same age and source (93rd percentile)

Mentioned by

  • News: 1 news outlet
  • Blogs: 1 blog
  • X (Twitter): 16 X users
  • Google+: 2 Google+ users

Citations

  • Dimensions: 32 citations

Readers on

  • Mendeley: 101 readers
Title
Humans and Deep Networks Largely Agree on Which Kinds of Variation Make Object Recognition Harder
Published in
Frontiers in Computational Neuroscience, August 2016
DOI 10.3389/fncom.2016.00092
Pubmed ID
Authors

Saeed R. Kheradpisheh, Masoud Ghodrati, Mohammad Ganjtabesh, Timothée Masquelier

Abstract

View-invariant object recognition is a challenging problem that has attracted much attention among the psychology, neuroscience, and computer vision communities. Humans are notoriously good at it, even if some variations are presumably more difficult to handle than others (e.g., 3D rotations). Humans are thought to solve the problem through hierarchical processing along the ventral stream, which progressively extracts more and more invariant visual features. This feed-forward architecture has inspired a new generation of bio-inspired computer vision systems called deep convolutional neural networks (DCNNs), which are currently the best models for object recognition in natural images. Here, for the first time, we systematically compared human feed-forward vision and DCNNs on a view-invariant object recognition task using the same set of images and controlling the kinds of transformation (position, scale, rotation in plane, and rotation in depth) as well as their magnitude, which we call "variation level." We used four object categories: car, ship, motorcycle, and animal. In total, 89 human subjects participated in 10 experiments in which they had to discriminate between two or four categories after rapid presentation with backward masking. We also tested two recent DCNNs (proposed respectively by Hinton's group and Zisserman's group) on the same tasks. We found that humans and DCNNs largely agreed on the relative difficulties of each kind of variation: rotation in depth is by far the hardest transformation to handle, followed by scale, then rotation in plane, and finally position (much easier). This suggests that DCNNs would be reasonable models of human feed-forward vision. In addition, our results show that the variation levels in rotation in depth and scale strongly modulate both humans' and DCNNs' recognition performances. We thus argue that these variations should be controlled in the image datasets used in vision research.
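
For readers who want the flavor of the comparison the abstract describes, a minimal sketch is given below: score a pretrained DCNN on images grouped by variation type (position, scale, in-plane rotation, in-depth rotation) and by variation level, and report accuracy per condition. This is not the authors' pipeline. The directory layout (images/<variation>/<level>/<category>/*.png), the keyword-based readout, and the use of torchvision's vgg16 are illustrative assumptions; the paper tested networks from Hinton's and Zisserman's groups, with classifiers trained on their feature layers, on its own image set.

# Hedged sketch, not the authors' code: per-condition accuracy of a pretrained
# DCNN under controlled variations. Assumes PyTorch/torchvision and an
# images/<variation>/<level>/<category>/*.png directory (both are assumptions).
from collections import defaultdict
from pathlib import Path

import torch
from PIL import Image
from torchvision import models, transforms

CATEGORIES = ["car", "ship", "motorcycle", "animal"]

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def evaluate_by_condition(root, classify):
    """Accuracy per (variation, level), given classify(image) -> category name."""
    correct, total = defaultdict(int), defaultdict(int)
    for img_path in Path(root).rglob("*.png"):
        variation, level, category = img_path.parts[-4:-1]
        pred = classify(Image.open(img_path).convert("RGB"))
        total[(variation, level)] += 1
        correct[(variation, level)] += int(pred == category)
    return {cond: correct[cond] / total[cond] for cond in total}

if __name__ == "__main__":
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
    imagenet_labels = models.VGG16_Weights.IMAGENET1K_V1.meta["categories"]

    def classify(img):
        # Crude keyword readout from ImageNet labels; the paper instead trained
        # classifiers on the networks' feature layers.
        with torch.no_grad():
            logits = model(preprocess(img).unsqueeze(0)).squeeze(0)
        name = imagenet_labels[int(logits.argmax())].lower()
        return next((c for c in CATEGORIES if c in name), "other")

    for (variation, level), acc in sorted(evaluate_by_condition("images", classify).items()):
        print(f"{variation:>18s}  level {level}: {acc:.2f}")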

X Demographics

The data shown below were collected from the profiles of the 16 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for the 101 Mendeley readers of this research output.

Geographical breakdown

Country      Count   As %
Sri Lanka    1       <1%
Germany      1       <1%
Unknown      99      98%

Demographic breakdown

Readers by professional status          Count   As %
Student > Ph.D. Student                 23      23%
Researcher                              19      19%
Student > Master                        12      12%
Student > Bachelor                      9       9%
Student > Doctoral Student              6       6%
Other                                   12      12%
Unknown                                 20      20%

Readers by discipline                   Count   As %
Computer Science                        33      33%
Neuroscience                            18      18%
Engineering                             9       9%
Psychology                              6       6%
Agricultural and Biological Sciences    4       4%
Other                                   6       6%
Unknown                                 25      25%
Attention Score in Context

This research output has an Altmetric Attention Score of 30. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 26 September 2016.
All research outputs: #1,201,368 of 23,964,824 outputs
Outputs from Frontiers in Computational Neuroscience: #42 of 1,393 outputs
Outputs of similar age: #22,922 of 342,199 outputs
Outputs of similar age from Frontiers in Computational Neuroscience: #3 of 33 outputs
Altmetric has tracked 23,964,824 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 94th percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 1,393 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 7.0. This one has done particularly well, scoring higher than 97% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 342,199 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 93% of its contemporaries.
We're also able to compare this research output to 33 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 93% of its contemporaries.
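
The percentile figures above follow directly from the quoted ranks. A minimal sketch of that arithmetic is below; Altmetric's exact rounding and tie-handling rules are not public, so treat this as an approximation rather than their implementation, and expect small discrepancies with the quoted values.

# Approximate the "scored higher than X% of its peers" figures from the ranks
# quoted on this page (rank 1 = highest-scoring output in the cohort).
def percentile_from_rank(rank: int, cohort_size: int) -> float:
    """Share of the cohort that this output outscores, in percent."""
    return 100.0 * (cohort_size - rank) / cohort_size

cohorts = {
    "all research outputs": (1_201_368, 23_964_824),
    "Frontiers in Computational Neuroscience": (42, 1_393),
    "outputs of similar age": (22_922, 342_199),
    "similar age, same source": (3, 33),
}

for name, (rank, size) in cohorts.items():
    print(f"{name}: outscores ~{percentile_from_rank(rank, size):.0f}% of the cohort")
# For example, rank #22,922 of 342,199 gives roughly 93%, consistent with the
# "outputs of similar age" percentile quoted above.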