How tracking language diversity in Wikipedia gives deeper insights into attention

Wikipedia, with its vast repository of citations to scholarly publications, is a significant source of traffic to the websites it references. It also plays a crucial role in quantifying the attention received by the research outputs it cites. Altmetric’s expansion to multi-language tracking was a step toward a deeper understanding of attention beyond the English-speaking world. Drawing on the webinar “Tracking multiple languages in the Altmetric Explorer,” this blog explains why it is important to track different languages.

Expansion of Altmetric tracking

Improving diversity has been an important principle for Altmetric: it underlines the commitment to open research and inclusivity, and it allows Altmetric to quantify attention as widely as possible. That is why, two years ago, Altmetric expanded its tracking of Wikipedia citations beyond the English language Wikipedia. Before this expansion, the Altmetric Data Insights team investigated how different language versions provide a broader view of research and to what extent academics trust Wikipedia.

Investigating language diversity

There has been an assumption that the English language Wikipedia, being larger and more active than other language versions, is the primary one to consider. However, conversations with academics working in other languages suggest that excluding those languages could mean overlooking crucial sources.

To explore diversity, Mike Taylor, Head of Data Insights at Digital Science, and his colleagues examined 21 different language versions of Wikipedia. Although numerous other Wikipedia editions exist, and Altmetric encompasses more languages than those examined in the study, each language version studied required a minimum of 1,000 editors and 100,000 citations to academic research for inclusion in the research.

Of the 2 million research citations found in the English language Wikipedia, 1.1 million citations were unique to English and not present elsewhere. However, non-English Wikipedia editions cited 1.45 million research articles and books, constituting 42% of the total citations.
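As a rough check on how these figures fit together, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes that the 1.45 million publications cited by non-English editions are not also cited in the English edition, which the webinar summary does not state explicitly, and the variable names are made up for this example.

```python
# Figures quoted in this post, in millions of cited publications.
english_total = 2.0   # cited in the English language Wikipedia
english_only = 1.1    # cited in English and in no other language version
non_english = 1.45    # cited by non-English language versions

# ASSUMPTION (not stated in the post): the 1.45 million are not also cited
# in the English edition, so the two groups can simply be added together.
combined_total = english_total + non_english

# Share of the combined total that comes from non-English editions.
non_english_share = non_english / combined_total

# English citations that also appear in at least one other language version.
shared_with_other_languages = english_total - english_only

print(f"Combined total: {combined_total:.2f} million")
print(f"Non-English share: {non_english_share:.0%}")
print(f"English citations also cited elsewhere: {shared_with_other_languages:.1f} million")
```

Under that assumption, the non-English share works out to roughly the 42% quoted above. If the 1.45 million overlapped with the English citations, the percentage would be calculated against a different total.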

Individual language sites can be examined in detail. For instance, the French language site cites 108,000 research publications exclusively, and the team identified similarly substantial numbers for German, Italian, Japanese, and Spanish. However, the team observed that each language, even those with lower citation counts, makes a unique contribution. For example, Farsi/Persian, Turkish, Vietnamese, and Serbian each cite approximately 12,500 publications that are not referenced elsewhere.

Disciplinary insights 

The team broke the data down into six disciplines, based on Dimensions fields of research, and looked at how distinctive each language's citations were within them. They found, for example, that Polish has a highly distinctive citation base in engineering and technology, medical and health sciences, and social sciences.

In terms of fields that differ across languages, the team found that engineering and technology is probably the most different overall, followed by medical and health sciences. The initial assumption was that citations in the social sciences and humanities would vary more across languages, because of expected differences in linguistic and cultural research as well as references to heritage. Contrary to this assumption, the team found less differentiation in these areas than in the sciences.

Taylor says, “Investigating the publications that are cited in Wikipedia gave us fascinating insights into how different languages represent their own research. The data shows that different languages contribute a unique voice to the representation of research. Excitingly this enables us to start to establish a new way of thinking about Wikipedia content through the eyes of research of scientific citations.”

Getting a deeper view on topics

Looking at individual pages shows that some Wikipedias celebrate the lives of researchers who do not appear in the English language Wikipedia. Elsewhere, approaches differ across languages. For example, the French philosopher and mathematician Émilie du Châtelet has a large page in the English language Wikipedia with many more academic references than the French entry. Yet the French Wikipedia has separate, individual pages for many of her works, and each of these has numerous academic citations.

Taylor adds, “By increasing the diversity that we cover in Altmetric we represent a broader coverage of scholarship and scholars globally than we could by indexing only the English language Wikipedia.”

Altmetric now tracks 35 languages, including the latest additions of Swahili, Afrikaans, Egyptian Arabic, and Uzbek, with each language contributing substantially to the overall count. If you want to listen to the complete webinar, sign in to “Tracking multiple languages in the Altmetric Explorer.”

For more details about Altmetric, contact the team.