Similarity encoding for learning with dirty categorical variables

Overview of attention for article published in Machine Learning, June 2018

About this Attention Score

In the top 25% of all research outputs scored by Altmetric
Among the highest-scoring outputs from this source (#35 of 1,245)
High Attention Score compared to outputs of the same age (89th percentile)
High Attention Score compared to outputs of the same age and source (93rd percentile)

So far, Altmetric has seen 25 X posts from 22 X users, with an upper bound of 95,838 followers.

@fetzert Or use data processing techniques for dirty categories https://t.co/lUpPJLVFXE https://t.co/RMSs9nKEWE (Shameless self-plug, but I still think that such directions are really useful)

@francoisfleuret @david_picard For high-cardinality categories, use TargetEncoder (strong baseline in https://t.co/lUpPJLVFXE ) or string-based methods (see https://t.co/RMSs9nKEWE ) Neural-net embeddings require much more data 2/3

@ImadPhd For categorical encoding, I know of our own work, and citations within: https://t.co/UtTE6j4d6Z and https://t.co/5iz7rCUVNe but these are not specific to healthcare.

RT @GaelVaroquaux: Learning on dirty categories? @patricio_cerda presented our work with @balazskegl at @ECMLPKDD His slides: https://t.c…

This page shows the most recent X posts that mention this research output.

A further 21 X posts mention this research output but are not shown here.