Post 9tjDim2i1K1ozPU2KW by puffinus_puffinus@sunbeam.city
(DIR) More posts by puffinus_puffinus@sunbeam.city
(DIR) Post #9qwYRTct2L8XWpnFK4 by puffinus_puffinus@sunbeam.city
2020-01-13T00:45:30Z
3 likes, 10 repeats
⚠️ The Fediverse has been scraped, again ⚠️ Almost six million posts from 363 instances have been scraped. "All the posts with public visibility published by users hosted on Mastodon servers [...] which support the English language" have been scraped along with their metadata, and the "policy, the code of conduct and the prohibited contents of each instance". The dataset is an attempt at creating an open dataset for "research" into algorithms like the ones Facebook uses to identify problematic content, based around users' use of Content Warnings. The dataset can be found here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/R1HKVS It was created by the University of Milan, Italy, apparently for the 13th AAAI: https://aaai.org/ The associated publishing: https://aaai.org/ojs/index.php/ICWSM/article/download/3262/3130/ or https://likeable.space/media/30ae595a191923a1ce84a1e0feac6a3cef5b8669f44e15535ea18c7a5594b93a.pdf?name=Mastodon%20Content%20Warnings%3A%20Inappropriate%20Contents%20in%20a%20Microblogging%20Platform.pdf or DM me for a copy. Related dataset: https://dataverse.mpi-sws.org/dataset.xhtml?persistentId=doi:10.5072/FK2/AMYZGS Original post: https://likeable.space/objects/98fe7449-1776-4343-818d-aaff710a867b @tastytea #FediAdmin #MastoAdmin #MastoDev #Privacy #OpSec #Warning #Fediverse #Mastodon #Scraping
(DIR) Post #9qwYRU0HdLqshOjvn6 by puffinus_puffinus@sunbeam.city
2020-01-13T00:52:34Z
0 likes, 0 repeats
I don't know enough about privacy and online data laws to say, but is there anything that instances can do? What is public is not necessarily for the public eye. I am assuming people will take issue with this dataset including their toots, anonymised or not. It's likely also a personal security issue, because a few large json files that include millions of toots is probably a useful resource to police forces.
(DIR) Post #9qwYRUFWig2rSfs6YC by puffinus_puffinus@sunbeam.city
2020-01-13T01:07:19Z
0 likes, 0 repeats
The irony of course is that the "inappropriate" toots that the study focuses on are just toots that use a Content Warning. So, they throw jokes that use the reveal as a punchline, and posts containing eye contact, in with the same toots they suspect are problematic and breach the code of conduct of instances. The premise is incoherent! Problematic toots are taken down by the admins. The Content Warned toots that were scraped are permitted! A Content Warning does not mean the toot has been identified as problematic or a breach of the instance's code of conduct by the user! The dataset seems to be of little use for what I understand to be its intended purpose. Also, what the fuck is with .social's Content Warning word cloud? They processed the words used in content warnings to bring them back to root definitions and came up with cauliflower???
(DIR) Post #9qwYRUUPpJxGCqpzl2 by mewmew@busshi.moe
2020-01-13T01:40:55Z
1 likes, 0 repeats
@puffinus_puffinus this is the whole point? they were studying what people cw, not what is against the policy of instances. "inappropriate" in this context means "not everyone wants to see it", not "bannable"
(DIR) Post #9qwdLuXWYl2f25RyyG by wowaname@anime.website
2020-01-13T02:35:57.268754Z
0 likes, 0 repeats
@mewmew @puffinus_puffinus i cant reply to this clown given theyre on LS and SBC, but >The dataset contains […] the network of the “follow” relationships turn follower/following info off by default. i'd argue none of this information has any legitimate use for anyone except the user itself >allowed topics assuming their dataset scraped only mastodon, their study is already a fucking bust. lot more diversity when you include other software. on mastodon you either have leftists, or sjws, or Literal Nazis; there isnt much in-between or "internet culture" presence like with gs or pleroma reading the dataset descriptions themselves, it looks like mostly aggregate info was collected; shit you could find off fediverse.network. pretty sure OP is blowing this out of proportion. of course, i hadnt downloaded the sets and went through them, but really most of this information they collected seems of trivial use. i dont understand why they want to know where instances are located (hosted) and what the rules are on each, seems like a stupid fucking study
(DIR) Post #9qwewwpD7rTCmywnz6 by portpupper@social.sakamoto.gq
2020-01-13T02:53:51.384894Z
0 likes, 0 repeats
@puffinus_puffinus Probably some English to Italian to English translation problem.
(DIR) Post #9qwfbdnKmi5jdmTQq8 by mewmew@busshi.moe
2020-01-13T03:01:12Z
1 likes, 0 repeats
@wowaname the dataset does include something like 200mb of user posts - it was a pretty dumb study though, but not malicious or a huge cause for concern
(DIR) Post #9qworKpzvvhR4EHIhs by puffinus_puffinus@sunbeam.city
2020-01-13T01:12:18Z
0 likes, 0 repeats
To see if your instance was scraped, check here: https://pastebin.com/3i0hjCyB Originally posted by https://cursed.technology/@tao/103472753585790488 @tao
(DIR) Post #9qworLGaL4y0OgiX9E by r000t@ligma.pro
2020-01-13T04:44:53Z
1 likes, 0 repeats
@puffinus_puffinus @tao BREAKING NEWS ALERT: Things you post publicly to public timelines might get consumed by the public.
(DIR) Post #9qwt14N8PFeEiFftIW by GatoOscuro@quey.org
2020-01-13T05:29:11Z
0 likes, 0 repeats
@snder 👀 :pepeEZ: @puffinus_puffinus @9@@tastytea
(DIR) Post #9qwt8UPryP9xVwUZQO by GatoOscuro@quey.org
2020-01-13T05:30:35Z
0 likes, 0 repeats
@snder 👀 :pepeEZ: @puffinus_puffinus @9@@tastytea
(DIR) Post #9qwvct5f046SdRf6US by kakol@freespeechextremist.com
2020-01-13T06:00:43.897391Z
0 likes, 0 repeats
@r000t You could say the same for discord and matrix channels, both of them are technically public however if you dump entire chat logs people are going to call you an asshole for doing so. @puffinus_puffinus @tao
(DIR) Post #9qwwCN5NTB4vneRwUC by KitsuneAlicia@octodon.social
2020-01-13T01:44:34Z
0 likes, 0 repeats
@puffinus_puffinus It's especially so because as we said in another thread, this list is almost entirely left-wing instances. There's only about a half dozen "free speech" instances on there and the major ones (Gab, Spinster, Kiwi Farms, Librem, Free Speech Extremist, etc.) aren't there. So it's not just a good resource for law enforcement, it's a literal honeypot for any dictatorship like the USA or China looking to shut down dissent.
(DIR) Post #9qwwCNNSNxXYhiuNfM by mewmew@busshi.moe
2020-01-13T06:07:07Z
0 likes, 0 repeats
@KitsuneAlicia @puffinus_puffinus because this was two years ago! Gab, Spinster, Kiwifarms, Librem, and FSE didn't even exist as Fedi instances at the start of 2019, let alone 2018!
(DIR) Post #9qx6cXsqwcnGMNi8bQ by hoergen@horche.demkontinuum.de
2020-01-13T08:01:57Z
0 likes, 0 repeats
That's why I use the Friendica automatic delete feature.
(DIR) Post #9qxFbzP62V7v0fbOd6 by puffinus_puffinus@sunbeam.city
2020-01-13T09:44:37Z
0 likes, 0 repeats
@GatoOscuro fuck off @snder@quey.org @tastytea
(DIR) Post #9qxIP6LJEGSSd1NQLA by KitsuneAlicia@octodon.social
2020-01-13T10:15:55Z
0 likes, 0 repeats
@mewmew @puffinus_puffinus Pretty sure you're wrong about FSE, at the very least, but go off, I guess. Still doesn't change the nature of the situation for marginalized people here.
(DIR) Post #9qxRZIEGn0gjyd1EEy by GatoOscuro@quey.org
2020-01-13T11:58:33Z
0 likes, 0 repeats
@puffinus_puffinus 🤔 @tastytea
(DIR) Post #9qxS6GOCeP8vpLO1VA by puffinus_puffinus@sunbeam.city
2020-01-13T12:04:23Z
0 likes, 0 repeats
@GatoOscuro you're using the Pepe frog emoji at me, from an instance that has added the Pepe frog. That seems deserving of a "fuck off" to me
(DIR) Post #9qxn6ddiIz0KyQf4fg by m4iler@infosec.exchange
2020-01-13T15:59:56Z
0 likes, 0 repeats
@puffinus_puffinus Come on... "Cheese steak" and it's someone's dick cheese meat. That deserves a CW
(DIR) Post #9qxncL5IpNdqh1gLz6 by ink_slinger@coales.co
2020-01-13T16:03:02Z
0 likes, 0 repeats
@puffinus_puffinus Cauliflower is the most triggering vegetable of all. This is known.
(DIR) Post #9qxncN9d8GI971iq0W by msh@coales.co
2020-01-13T16:05:00Z
0 likes, 0 repeats
@ink_slinger @puffinus_puffinus most especially inaudible cauliflower. Silent but deadly.
(DIR) Post #9qy4p3VxA9oVBYnUVE by it_wasnt_arson@queer.party
2020-01-13T19:18:27Z
0 likes, 0 repeats
@mewmew @puffinus_puffinus Well, that seems to be what they put in the paper
(DIR) Post #9qyG9w7GxkwzolPcoa by r000t@ligma.pro
2020-01-13T21:25:30Z
0 likes, 0 repeats
@kakol @tao @puffinus_puffinus If you care about privacy DO NOT USE MATRIX, and especially DO NOT USE THE DEFAULT WEB CLIENT. It's p much guaranteed to leak your public IP through WebRTC.
(DIR) Post #9qyeig58y4197iW0tE by r000t@ligma.pro
2020-01-14T02:00:42Z
0 likes, 0 repeats
@mewmew @KitsuneAlicia @puffinus_puffinus Gab, Spinster, Kiwifarms, Librem: *exist* Alicia: OMFG! THEY'RE LITERALLY MURDERING MARGINALIZED PEOPLE! EVERYONE NEEDS TO BLOCK THEM AND WE NEED TO HARASS ANYBODY WHO MAKES ANY TOOL THAT CAN BE USED TO SEE THEIR CONTENT! NOBODY SHOULD SEE WHAT'S POSTED THERE! Some College: *Doesn't collect their content in a study about the fediverse* Alicia: OMFG! THEY'RE LITERALLY MURDERING MARGINALIZED PEOPLE! THEY SHOULD BE CHARGED WITH RECKLESS ENDANGERMENT!
(DIR) Post #9qynqYTg1VwNRXp00e by kakol@freespeechextremist.com
2020-01-14T03:42:59.055995Z
1 likes, 0 repeats
@r000t They fixed that ages ago @tao @puffinus_puffinus
(DIR) Post #9qzA5TeKSFuDnvKBW4 by r000t@ligma.pro
2020-01-14T07:52:10Z
0 likes, 0 repeats
@kakol oshit, I'll go have a look later
(DIR) Post #9r1b0dI8GTYzL4mD5s by GatoOscuro@quey.org
2020-01-15T12:03:10Z
0 likes, 0 repeats
@puffinus_puffinus Well, it's well deserved. Is there another problem? :pepeEZ:
(DIR) Post #9r1nTkY5YmHDh9tHEW by puffinus_puffinus@sunbeam.city
2020-01-15T14:22:52Z
0 likes, 1 repeats
@GatoOscuro fuck off
(DIR) Post #9r3xOU6B70qnUSFSaG by feld@bikeshed.party
2020-01-16T15:23:30.342462Z
1 likes, 0 repeats
@puffinus_puffinus @tastytea yes we are doing this all day every day, it's called https://search.social -- why do you care?
(DIR) Post #9r409b5d8X5ugwaK80 by shellkr@mstdn.io
2020-01-16T15:53:33Z
0 likes, 0 repeats
@puffinus_puffinus @tastytea This is not the first time it has been discussed, and previously the message has been to not post publicly what should not be public. This is a hard problem to solve so it is better to be transparent about it. Personally I think purging posts older than two weeks could help discourage a little. It is not a solution but may help. There is a different standard in the works that might solve this but it is not ready yet. https://litepub.social/litepub/spec/intro.html and https://blog.dereferenced.org/what-is-ocap-and-why-should-i-care
(DIR) Post #9tjAyWV5H4XFYZZe1A by strypey@mastodon.nzoss.nz
2020-04-05T08:22:46Z
0 likes, 0 repeats
@puffinus_puffinus > What is public is not necessarily for the public eye That's exactly what "public" means, by definition. This comment is clearly #doublethink.
(DIR) Post #9tjBj1DjOEgO9cczjM by vfrmedia@social.tchncs.de
2020-01-13T01:03:41Z
0 likes, 0 repeats
@puffinus_puffinus I'm also concerned these students have created and published a framework of tools (as well as the dataset) that cops/feds (or anyone else) could make use of to monitor the fediverse. Maybe not in Europe as there isn't even anything /that/ controversial here but perhaps in USA and some other more authoritarian countries. Especially with a large amount of traffic from sex workers indexed where they may be operating in a legal grey area..
(DIR) Post #9tjBj2rTHy47FAEFJQ by strypey@mastodon.nzoss.nz
2020-04-05T08:31:10Z
0 likes, 0 repeats
@vfrmedia > sex workers indexed where they may be operating in a legal grey area.. If this is the case, then making public posts on the internet that reveal this is self-sabotaging. It's about as sensible as drug dealers offering illegal substances for sale in public posts. People need to be educated about #SecurityCulture, so they don't compromise themselves like this. "Don't read" policies are a head-in-the-sand solution, because cops will not respect them.
(DIR) Post #9tjCDPngZCsn4hwJKy by frickhaditcoming@anticapitalist.party
2020-01-13T01:47:34Z
0 likes, 0 repeats
@puffinus_puffinus @tao so could you stop them legally by adding something where the instance license every toot under something that this would be a violation? Technology wise if something is publicly broadcasted this will happen
(DIR) Post #9tjCDQPcIBDwyLgbSa by strypey@mastodon.nzoss.nz
2020-04-05T08:36:30Z
0 likes, 0 repeats
@frickhaditcoming > could you stop them legally by adding something where the instance license every toot under something that this would be a violation? You mean like ARR copyright? Probably not without making it illegal to read the posts using anything other than the web UI of the instance it's hosted on. So other instances and third-party apps would be violating that license every time they show users the posts. IANAL though. @puffinus_puffinus @tao
(DIR) Post #9tjCZdmxEv56pUCBVo by eldaking@mastodon.social
2020-01-13T01:30:27Z
0 likes, 0 repeats
@puffinus_puffinus @tastytea This is, simply put, anti-ethical. Since it is supposedly a scientific study, I would suggest contacting the review board of the university (or something like that). Participation in scientific studies is not something trivial. Usually it is necessary to get a signed form with free and informed consent; implied consent should not be acceptable. And both the allowed uses and the handling of the data are very restricted.
(DIR) Post #9tjCZebIDlLbLbkNyS by strypey@mastodon.nzoss.nz
2020-04-05T08:40:39Z
0 likes, 0 repeats
@eldaking > Usually it is necessary to get a signed form with free and informed consent Once something is published it is, by definition, no longer private. You don't need the informed consent of an author to use their books in a study. The same is true of public-facing web publications, including blogs and microblogs. @puffinus_puffinus @tastytea
(DIR) Post #9tjCiQDh3bhnLwtDiS by bauglir@mastodon.social
2020-01-13T10:52:22Z
0 likes, 2 repeats
@puffinus_puffinus @tastytea People must know that, when they post something publicly on the Internet, they are not like walking along the street, but like deliberately stapling a message to a noticeboard or sending a message to a newspaper. Web tracking is like following you along the street; scraping intentionally published messages is like going to every noticeboard in the street and taking note of the publications. Don't publish anything you don't want to be public...
(DIR) Post #9tjCy2JVhvWjqJ3XAu by puffinus_puffinus@sunbeam.city
2020-01-13T11:33:58Z
0 likes, 0 repeats
@bauglir If you'd bothered to take a second and check the replies to my post, you'd have found that you'd just be joining the reply guys stating the obvious by tooting what you did. I already know, it's obvious, and that's not the point.
(DIR) Post #9tjCy3LJsgZJ2CuUdM by bauglir@mastodon.social
2020-01-13T12:26:30Z
0 likes, 0 repeats
@puffinus_puffinus A bit too aggressive a reply on your side, IMHO. I've already read every single reply to the post and don't think my message is recurrent, but maybe I didn't explain myself accurately: I'm not just saying it's legal, but legal and fully legit. When you post on Mastodon, you're not walking along the street or opening your house's windows: you're deliberately publishing info. It's not info under the public eye: it's info you wanted to be public. There's a tacit permission...
(DIR) Post #9tjCy3rDy45ad9pyMa by bauglir@mastodon.social
2020-01-13T12:36:56Z
0 likes, 0 repeats
@puffinus_puffinus And the intention when publishing info is commonly used by regulators and judges when reviewing this type of case. So, unless there's indexed and automatically searchable personal data within the registries, it's not formally a data file, so no data protection regulation is applicable. And unless you have copyrighted your messages, there's nothing to claim for. I'm not trying to start an argument with anyone: just trying to give info you can find in the Law
(DIR) Post #9tjCy4PxstsWMu5iVs by strypey@mastodon.nzoss.nz
2020-04-05T08:45:06Z
0 likes, 0 repeats
@bauglir > And unless you have copyrighted your messages, there's nothing to claim for. Even if you have, they have to be published for #copyright to apply, and the research these folks are doing would most likely be protected by #FairUse/ #FairDealing. @puffinus_puffinus
(DIR) Post #9tjDim2i1K1ozPU2KW by puffinus_puffinus@sunbeam.city
2020-01-13T12:38:14Z
0 likes, 0 repeats
@bauglir After reading a few posts like your own there is a natural tendency towards an aggressive response. You're missing the point. If you read every reply then I'm surprised you thought you'd added anything much. There's a difference between something being legal and being a good idea. The people on the Fediverse --- many left-leaning people, many people from minorities, many people who are already oppressed in various ways, many people who take issue with the current governance in many countries --- all of these people are potentially under threat by the creation of a database such as this. There is no anonymity in the data and collecting such a large quantity of data in such a convenient way makes it even easier for authorities to track and profile users. I'm not saying this isn't already happening, I'm saying that this happening so blatantly and publicly should be stopped. If you disagree, you're not a comrade
(DIR) Post #9tjDimglcO4SzeE1lg by strypey@mastodon.nzoss.nz
2020-04-05T08:53:32Z
0 likes, 0 repeats
@puffinus_puffinus > all of these people are potentially under threat by the creation of a database such as this. Even if this was literally true (and I think it's an exaggeration at best), the #5Eyes and all other state and corporate spy agencies are most likely building such databases. The only way to prevent that is a) don't have these discussions in public b) build tools for private social networking that use #E2EE etc so such databases are literally impossible to aggregate. @bauglir
(DIR) Post #9tjKFCzrwLsnZu0Kga by wolf480pl@mstdn.io
2020-04-05T10:06:52Z
0 likes, 0 repeats
@bauglir @puffinus_puffinus @tastytea IOW: Fedi is soapbox.
(DIR) Post #9tjeel0Ce05NMncWjw by vfrmedia@social.tchncs.de
2020-04-05T13:55:24Z
0 likes, 0 repeats
@strypey Indeed - I help moderate a forum that was once very popular with young people into raves/doofs and the associated lifestyles, and we had to constantly warn/discourage people against incriminating themselves. Those who didn't, sooner or later got caught. There are lots of people who (erroneously) believe that if the content is on a very busy network or in a foreign country with different law that their domestic cops are easily overwhelmed, but that's not the case..