Post ALXxq6NiljolbPEE6K by 11011110@mathstodon.xyz
 (DIR) Post #ALXxq6NiljolbPEE6K by 11011110@mathstodon.xyz
       2022-07-15T22:58:35Z
       
       0 likes, 0 repeats
       
       Seriously bad data in Google's GoEmotions dataset (58K Reddit comments categorized by affect): https://www.surgehq.ai//blog/30-percent-of-googles-reddit-emotions-dataset-is-mislabeled, via https://news.ycombinator.com/item?id=32090389
       
       Opinions in the post and comments vary on why the categorization was so inaccurate, including lack of context, farming it out to poorly-paid workers in countries less likely to be familiar with the specific idioms used in the comments, or maybe just that it's a hard problem.
       
 (DIR) Post #ALXxq6rUz1dZ5lA0W0 by EdS@mastodon.sdf.org
       2022-07-16T10:52:49Z
       
       0 likes, 0 repeats
       
       +1 insightful @11011110