RT @arxiv_cscl: On the Effect of Dropping Layers of Pre-trained Transformer Models https://t.co/YYZeP5msup
Accepted at CSL 2022: On the effect of dropping layers of pre-trained transformer models https://t.co/NcaCLtwYiF. We show that simply dropping the top layers of a pre-trained model is competitive with expensive and sophisticated distillation methods.
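The idea of "dropping the top layers" can be illustrated with a minimal sketch. It assumes a toy encoder represented as a plain list of layer functions applied bottom to top; the names `drop_top_layers`, `run_encoder`, and `make_layer` are hypothetical and are not taken from the paper or from any library. With a real pre-trained model you would truncate the model's own stack of encoder layers in the same way, then fine-tune the smaller model as usual.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for one transformer layer: maps a hidden state
# (here just a list of floats) to a new hidden state.
Layer = Callable[[List[float]], List[float]]


def drop_top_layers(layers: Sequence[Layer], k: int) -> List[Layer]:
    """Return a copy of `layers` with the top (last, closest to the output) k layers removed."""
    if not 0 <= k <= len(layers):
        raise ValueError(f"k must be between 0 and {len(layers)}, got {k}")
    # Slice with len(layers) - k instead of -k: layers[:-0] would return an empty list.
    return list(layers[: len(layers) - k])


def run_encoder(layers: Sequence[Layer], hidden: List[float]) -> List[float]:
    """Apply the layers bottom to top, as an encoder stack does."""
    for layer in layers:
        hidden = layer(hidden)
    return hidden


def make_layer(i: int) -> Layer:
    # Toy layer (hypothetical): adds its index to every element,
    # so the output reveals which layers actually ran.
    return lambda h: [x + i for x in h]


if __name__ == "__main__":
    full = [make_layer(i) for i in range(1, 13)]  # 12 layers, as in BERT-base
    pruned = drop_top_layers(full, 4)             # keep only the bottom 8 layers
    print(len(pruned))                            # 8
    print(run_encoder(pruned, [0.0]))             # [36.0], i.e. 1 + 2 + ... + 8
```

Keeping the bottom layers (rather than, say, every other layer) is the strategy the tweet describes; the sketch only shows the truncation step, not the fine-tuning or the comparison against distillation.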