On the effect of dropping layers of pre-trained transformer models

Overview of attention for article published in Computer Speech & Language, January 2023

About this Attention Score

In the top 5% of all research outputs scored by Altmetric
One of the highest-scoring outputs from this source (#1 of 431)
High Attention Score compared to outputs of the same age (95th percentile)

Mentioned by

1 news outlet
1 blog
59 X users
1 Redditor

Citations

Dimensions: 24

Readers on

Mendeley: 59

So far, Altmetric has seen 71 X posts from 59 X users, with an upper bound of 252,593 followers.

RT @arxiv_cscl: On the Effect of Dropping Layers of Pre-trained Transformer Models https://t.co/YYZeP5msup

17 Aug 2022

RT @arxiv_cscl: On the Effect of Dropping Layers of Pre-trained Transformer Models https://t.co/YYZeP5msup

16 Aug 2022

Accepted at CSL 2022: On the effect of dropping layers of pre-trained transformer models https://t.co/NcaCLtwYiF! We show that simply dropping the top layers of pretrained models is competitive with expensive and sophisticated distillation methods.

16 Aug 2022

RT @arxiv_cscl: On the Effect of Dropping Layers of Pre-trained Transformer Models https://t.co/YYZeP5msup

16 Aug 2022

This page shows the most recent X posts that mention this research output.

A further 67 X posts are not shown on this page.