the entire dataset is 1.7 GB compressed, the equivalent of over 3,000 full-length novels. the paper announcing it has been cited over 800 times, in work ranging from authorship analysis to text summarization to "managerial hubris detection" https://t.co/Nk7ZDzEiLK