Post ASL1OS1UpYCyjU47No by stop@me.dm
 (DIR) Post #ASL1OQUqVSUrzvmWqu by stop@me.dm
       2023-02-04T13:49:48Z
       
       0 likes, 2 repeats
       
       In case you wondered (like me) or maybe you haven’t considered it yet… I confirmed that Google is indeed indexing public Mastodon profiles and posts. And it’s doing so rather quickly. I posted about Beau here on Thursday morning, and this is the only place I shared about him. And that post and photo already show up in a search for “Doug Bowman Beau” two days later. I didn’t check until now. So I don’t yet know how long it took to be crawled and indexed. But definitely < 48 hours.
       
 (DIR) Post #ASL1ORJtRfKWYFfIQ4 by simon@fedi.simonwillison.net
       2023-02-04T14:04:03Z
       
       0 likes, 0 repeats
       
       @stop https://me.dm/robots.txt suggests that crawlers are welcome to do that - Google at least would obey a "Disallow: /" in that file
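
A minimal sketch of the robots.txt point above, using Python's standard `urllib.robotparser`. The two robots.txt bodies and the profile URL `https://me.dm/@stop` are illustrative assumptions, not copies of what me.dm actually serves; the sketch only shows how a blanket "Disallow: /" changes the answer for a well-behaved crawler.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical permissive file: only a couple of paths are off limits.
PERMISSIVE = "User-agent: *\nDisallow: /media_proxy/\nDisallow: /interact/\n"

# Hypothetical restrictive file: every path is off limits to every crawler.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Assumed profile URL, used only for illustration.
PROFILE_URL = "https://me.dm/@stop"

if __name__ == "__main__":
    print(is_allowed(PERMISSIVE, "Googlebot", PROFILE_URL))
    print(is_allowed(BLOCK_ALL, "Googlebot", PROFILE_URL))
```

This only models crawlers that choose to honor robots.txt; it says nothing about content that reaches other servers through federation.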
       
 (DIR) Post #ASL1OS1UpYCyjU47No by stop@me.dm
       2023-02-04T13:57:05Z
       
       0 likes, 0 repeats
       
       This is significant, because I remember the days when we (at Twitter) were negotiating with Google about access to the firehose of tweet data. Rapidly getting tweets into the Google index played a large part in boosting the awareness and findability of real-time content that had only been shared on Twitter. Mastodon hosts could confirm whether crawlers are directly hitting their servers. Or Google may have already created multiple instances of their own, solely for the purposes of indexing.
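
One way a host might check for direct crawler traffic, as a rough sketch: scan the web server's access log for requests whose user-agent string mentions Googlebot. The log lines below are made up, and the regular expression assumes the common "combined" log format; a real server may log differently. User-agent strings can be spoofed, so a match is a hint rather than proof.

```python
import re
from collections import Counter

# Assumes the "combined" access-log format:
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_hits(lines, agent_substring="Googlebot"):
    """Count requests per path whose user-agent contains agent_substring."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and agent_substring.lower() in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


# Fabricated sample lines (documentation-range IP addresses), for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [04/Feb/2023:13:50:01 +0000] "GET /@stop HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [04/Feb/2023:13:51:12 +0000] "GET /@stop HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) Firefox/109.0"',
    '192.0.2.1 - - [04/Feb/2023:13:52:30 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for path, count in crawler_hits(SAMPLE_LOG).items():
        print(path, count)
```

If no such requests show up while posts are still being indexed, that would be consistent with the content being picked up some other way, for example through a federating instance.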
       
 (DIR) Post #ASL2ANR5BmzWPBt0F6 by stop@me.dm
       2023-02-04T14:10:55Z
       
       0 likes, 0 repeats
       
       @simon I hadn’t checked that yet, but assumed it might exist, and that it would be written based on host-configured settings. I wonder whether Google figured out it was faster to set up their own instances, sucking in everything they can, but giving priority to certain content (by instance or account). They certainly had multiple flags set on Twitter accounts based on how high profile they were. New/low follower accounts rarely made it into the index in the early days.
       
 (DIR) Post #ASL2NhNmL58w28h1lo by simon@fedi.simonwillison.net
       2023-02-04T14:15:11Z
       
       0 likes, 0 repeats
       
       @stop oh interesting - your suspicion is that Google have custom crawling logic for Mastodon content that takes into account things like follower counts? I had assumed it was just their regular crawler, following links and starting on some of the larger instances like mastodon.social
       
 (DIR) Post #ASL2s2TUU2My8YpeK0 by simon@fedi.simonwillison.net
       2023-02-04T14:18:54Z
       
       0 likes, 0 repeats
       
       @stop this is the most detailed conversation I could find right now about robots.txt in Mastodon - "git blame" shows no policy changes to that file since 2019 https://github.com/mastodon/mastodon/pull/10038
       
 (DIR) Post #ASL39xAmvwXuuHctuq by stop@me.dm
       2023-02-04T14:24:10Z
       
       0 likes, 0 repeats
       
       @simon And it could just be standard link crawling. But if anyone inside is paying attention, given the different nature of public real-time content, they may have gotten out in front of this, and proactively set out a strategy for Fediverse/ActivityPub.