
Authorship identification of documents with high content similarity

Overview of attention for article published in Scientometrics, February 2018

Mentioned by: 1 X user

Citations: 29 Dimensions

Readers on: 82 Mendeley

Published in
Scientometrics, February 2018
DOI 10.1007/s11192-018-2661-6
Authors

Andi Rexha, Mark Kröll, Hermann Ziak, Roman Kern

Abstract

The goal of our work is inspired by the task of associating segments of text with their real authors. In this work, we focus on analyzing the way humans judge different writing styles. This analysis can help to better understand this process and thus to simulate or mimic such behavior accordingly. Unlike the majority of the work done in this field (e.g., authorship attribution, plagiarism detection), which uses content features, we focus only on the stylometric, i.e., content-agnostic, characteristics of authors. Therefore, we conducted two pilot studies to determine whether humans can identify authorship among documents with high content similarity. The first was a quantitative experiment involving crowd-sourcing, while the second was a qualitative one executed by the authors of this paper. Both studies confirmed that this task is quite challenging. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis on the results of the studies. In the first experiment, we compared the decisions against content features and stylometric features. In the second, the evaluators described the process and the features on which their judgment was based. The findings of our detailed analysis could (1) help to improve algorithms such as automatic authorship attribution as well as plagiarism detection, (2) assist forensic experts or linguists in creating profiles of writers, (3) support intelligence applications in analyzing aggressive and threatening messages, and (4) help editors enforce conformity with, for instance, a journal-specific writing style.
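The abstract distinguishes content features from stylometric, content-agnostic ones without listing them. As a rough illustration only, the sketch below computes a few measurements that are commonly treated as stylometric. The function name `stylometric_features`, the chosen measurements, and the small `FUNCTION_WORDS` set are all assumptions made for this example and are not taken from the paper.

```python
import re
import string

# Small, illustrative set of English function words (an assumption, not from the paper).
FUNCTION_WORDS = {"the", "of", "and", "a", "to", "in", "that", "is", "it", "for"}


def stylometric_features(text: str) -> dict:
    """Return a few style measurements that do not depend on the topic of the text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1  # avoid division by zero on empty input
    n_sentences = len(sentences) or 1
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / n_sentences,
        "function_word_ratio": sum(w in FUNCTION_WORDS for w in words) / n_words,
        "punctuation_per_word": sum(ch in string.punctuation for ch in text) / n_words,
    }


if __name__ == "__main__":
    print(stylometric_features("The cat sat. It is here!"))
```

Two texts on the same topic would share most of their vocabulary, so a comparison on measurements like these looks at how something is written instead of what it is about.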

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 82 Mendeley readers of this research output.

Geographical breakdown

| Country | Count | As % |
|---|---|---|
| Unknown | 82 | 100% |

Demographic breakdown

| Readers by professional status | Count | As % |
|---|---|---|
| Student > Master | 11 | 13% |
| Researcher | 10 | 12% |
| Student > Ph. D. Student | 9 | 11% |
| Student > Bachelor | 6 | 7% |
| Student > Postgraduate | 5 | 6% |
| Other | 17 | 21% |
| Unknown | 24 | 29% |

| Readers by discipline | Count | As % |
|---|---|---|
| Computer Science | 27 | 33% |
| Social Sciences | 6 | 7% |
| Agricultural and Biological Sciences | 3 | 4% |
| Engineering | 3 | 4% |
| Arts and Humanities | 3 | 4% |
| Other | 14 | 17% |
| Unknown | 26 | 32% |
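
The "As %" column in the tables above appears to be each count divided by the 82 Mendeley readers, rounded to the nearest whole percent. That rounding rule is an inference from the listed numbers, not something the page states. A minimal sketch that reproduces the professional-status percentages under this assumption (the helper name `as_percent` is hypothetical):

```python
# Reader counts by professional status, copied from the table above.
counts = {
    "Student > Master": 11,
    "Researcher": 10,
    "Student > Ph. D. Student": 9,
    "Student > Bachelor": 6,
    "Student > Postgraduate": 5,
    "Other": 17,
    "Unknown": 24,
}

total = sum(counts.values())  # the 82 Mendeley readers


def as_percent(count: int, total: int) -> int:
    """Share of the total as a whole percent (assumed rounding rule)."""
    return round(100 * count / total)


shares = {status: as_percent(n, total) for status, n in counts.items()}

if __name__ == "__main__":
    for status, pct in shares.items():
        print(f"{status}: {pct}%")
```

Because each share is rounded independently, the percentages in a column need not sum to exactly 100%.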
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 12 March 2018.
All research outputs: #20,468,008 of 23,026,672 outputs
Outputs from Scientometrics: #2,505 of 2,691 outputs
Outputs of similar age: #377,048 of 439,380 outputs
Outputs of similar age from Scientometrics: #48 of 54 outputs
Altmetric has tracked 23,026,672 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.
So far Altmetric has tracked 2,691 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.8. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 439,380 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 54 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.