Post ApbSg30BFO04UlDD04 by graveolensa@mathstodon.xyz
 (DIR) Post #Apaj0OHbEsTIOg7dOi by futurebird@sauropods.win
       2024-12-31T02:25:58Z
       
       0 likes, 0 repeats
       
I haven't thought "I should try to build my *own* web spider, then maybe I could find things." since... Well, since 1998. :/
       
 (DIR) Post #Apaj1EqToRGnPuqDz6 by tsturm@famichiki.jp
       2024-12-31T02:46:39Z
       
       0 likes, 0 repeats
       
@futurebird I recently have been thinking of what it would take to run my own spider... for the first time in about 25 years. The search results I'm getting lately are so bad that a DIY spider might actually improve the situation for me.
       
 (DIR) Post #Apaj1JXsL2MJzfAwBE by JessTheUnstill@infosec.exchange
       2024-12-31T02:50:14Z
       
       0 likes, 0 repeats
       
The problem is only partly that Google has gotten so much worse. It's also that SEO, botspam, LLM spam, and affiliate link spam have gotten so good that it's functionally impossible to algorithmically filter them out of the results. So just running your own spider is unlikely to matter much. @tsturm @futurebird
       
 (DIR) Post #Apaj1YVEj8A2Np3XeK by futurebird@sauropods.win
       2024-12-31T02:51:54Z
       
       0 likes, 0 repeats
       
@JessTheUnstill @tsturm Well I'm thinking of doing something a little smaller and more targeted, like this: https://sauropods.win/@futurebird/113744151630008623 Because making a proper full web spider is a massive project. And even my small idea could be too big.
       
 (DIR) Post #ApajsGFdJhT57mtE2a by djsumdog@djsumdog.com
       2024-12-31T03:03:08.782703Z
       
       0 likes, 0 repeats
       
I use Linkding to host my own personal bookmarks. The only crawler I've looked into was Apache Nutch, but I haven't tried running it yet. I did run YaCy for a while, but it just kinda crashed a lot. Recently someone directed me to this, which seems alright: https://wiby.me and I kinda want to try this one out: https://presearch.io/
       
 (DIR) Post #Apal3J11Tlpkpzt0YC by JessTheUnstill@infosec.exchange
       2024-12-31T03:09:59Z
       
       0 likes, 0 repeats
       
Not a bad idea! My (vaguely) related idea is to fork a Fediverse app / make a browser plugin that caches and indexes only the Fediverse posts that I've browsed - whether on my timeline or on the explore page or whatever. Then I could search the content I've had access to, and I don't feel like I'd be violating anyone's privacy by caching and indexing content I've already been allowed to view, exclusively for my own personal use. Obviously, it'd raise other problems if I started crawling and indexing content for public usage, but I think using a computer to augment my own fallible memory would be acceptable, so I can find the posts I wanted to remember 2 weeks later. @futurebird @tsturm
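A minimal sketch of that personal post index, assuming the posts are available as plain text with an ID, author, and URL (the table name and helper functions here are my own invention, not anything from an existing plugin):

```python
import sqlite3

# Personal, local-only index of posts already seen in the browser.
# SQLite's FTS5 gives full-text search with no external service.
def open_index(path="seen_posts.db"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS posts "
        "USING fts5(post_id, author, content, url)"
    )
    return db

def remember(db, post_id, author, content, url):
    # Store a post the user has already been allowed to view.
    db.execute("INSERT INTO posts VALUES (?, ?, ?, ?)",
               (post_id, author, content, url))
    db.commit()

def search(db, query):
    # Full-text search over everything remembered so far.
    return db.execute(
        "SELECT author, url FROM posts WHERE posts MATCH ?", (query,)
    ).fetchall()
```

For example, `remember(db, "1", "alice", "neat crinoid fossil photos", "https://example.com/1")` followed by `search(db, "crinoid")` would find that post two weeks later.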
       
 (DIR) Post #Apal3Nz50QFaFQ11iS by JessTheUnstill@infosec.exchange
       2024-12-31T03:11:06Z
       
       0 likes, 0 repeats
       
Once the content was already cached and indexed, it'd also be possible to strip out just the links to see what interesting stuff had popped up. @futurebird @tsturm
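That link-stripping step could look something like the following stdlib-only sketch, assuming the cached posts are stored as HTML:

```python
from html.parser import HTMLParser

# Pull every outbound link out of an already-cached post's HTML,
# so the cache doubles as a list of interesting links that popped up.
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith("http"):
                self.links.append(href)

def links_in(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```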
       
 (DIR) Post #ApamZqhBtP4GW1gwUK by JessTheUnstill@infosec.exchange
       2024-12-31T03:32:24Z
       
       0 likes, 0 repeats
       
Eh, there's way too many Mastodon/Fediverse server forks. I'm never going to have the time and focus to try and mess with all that. I at least have a chance at making a client-side goodie that lets me search stuff out of my browser cache. @dalias @futurebird @tsturm
       
 (DIR) Post #Apb7WpbrZVTxJfbO1Q by BrettCoulstock@adforward.org
       2024-12-31T07:28:29Z
       
       0 likes, 0 repeats
       
@futurebird This guy wrote his own search engine, and it's fun and interesting to play with. It finds a lot of different and idiosyncratic content... https://www.marginalia.nu/
       
 (DIR) Post #ApbLcmJc8DfCC8nPqC by dahukanna@mastodon.social
       2024-12-31T10:06:26Z
       
       0 likes, 1 repeats
       
@futurebird To remove & externalise bookmark dependency from browsers, I’ve resorted to manually collecting & curating links as I find them, with personal notes+tags reminding me why they are of interest. They’re always 100% searchable & findable. Given the inconsiderate, effectively-DDoS behavior of AI scraper bots, adding to that melee with more robo-indexing may not produce a usable search index - https://mastodon.social/@dahukanna/113741237599333856
       
 (DIR) Post #ApbNBKtk9zL9FdjnDk by futurebird@sauropods.win
       2024-12-31T10:23:56Z
       
       0 likes, 0 repeats
       
@dahukanna I'm thinking of something much more modest: https://sauropods.win/@futurebird/113744151630008623
       
 (DIR) Post #ApbRINYv3LUTMEbwem by dahukanna@mastodon.social
       2024-12-31T11:09:57Z
       
       0 likes, 0 repeats
       
       @futurebird … extract links from within the post and links to the source post?
       
 (DIR) Post #ApbS8SkF1sA8UPk56O by futurebird@sauropods.win
       2024-12-31T11:19:25Z
       
       0 likes, 0 repeats
       
@dahukanna I think so, yes. Basically I want a database of every single link that's been posted to *my* feed. It would also contain any hashtags used with the link and the post ID, so I can go back and see the context. Next I'd strip out all of the "big sites" and focus more on the obscure. Then if I'm curious about, say, # fossils, I would get links mentioned in that context. And if # fossils is used with the tag # crinoids often, I could move laterally and find more links.
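One way that link-plus-hashtag database could be laid out; the schema and helper names below are my own sketch, not anything specified in the thread:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE links (url TEXT, post_id TEXT);   -- every link seen on the feed
CREATE TABLE tags  (url TEXT, tag TEXT);       -- hashtags used alongside it
""")

def add_link(url, post_id, hashtags):
    # Record one link, the post it came from, and its hashtags.
    db.execute("INSERT INTO links VALUES (?, ?)", (url, post_id))
    db.executemany("INSERT INTO tags VALUES (?, ?)",
                   [(url, t) for t in hashtags])

def links_for(tag):
    # Every link that appeared in the context of a given hashtag.
    return [r[0] for r in db.execute(
        "SELECT DISTINCT url FROM tags WHERE tag = ?", (tag,))]

def related_tags(tag):
    # Tags that co-occur with `tag` on the same links --
    # the "move laterally" step (e.g. fossils -> crinoids).
    return [r[0] for r in db.execute(
        "SELECT DISTINCT t2.tag FROM tags t1 JOIN tags t2 "
        "ON t1.url = t2.url WHERE t1.tag = ? AND t2.tag != ?",
        (tag, tag))]
```

Filtering out the "big sites" would then just be a `WHERE url NOT LIKE ...` clause over a blocklist of domains.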
       
 (DIR) Post #ApbSJMIzJO1theBrTU by futurebird@sauropods.win
       2024-12-31T11:21:26Z
       
       0 likes, 1 repeats
       
@dahukanna Importantly, this database would grow over time; it wouldn't be focused on "what's new" ... basically I have a high level of trust in the way people #onhere associate hashtags with links, and I think that'd be a great way to find things. In fact I do it manually often enough, but it's time-consuming. I just want all of the links sometimes.
       
 (DIR) Post #ApbSg30BFO04UlDD04 by graveolensa@mathstodon.xyz
       2024-12-31T11:25:26Z
       
       0 likes, 0 repeats
       
@futurebird I have been trying to collect information on a local web server, so I don't need to have it come over the network every time I want to see it (keep local copies of things that matter!)
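The keep-local-copies habit can be sketched as a fetch-through cache: the first request goes over the network, every later view reads from disk. `CACHE_DIR` and the function names are assumptions of mine:

```python
import hashlib
import os
import urllib.request

CACHE_DIR = "local_copies"  # hypothetical location for the local copies

def cache_path(url):
    # Deterministic on-disk filename for each URL.
    return os.path.join(CACHE_DIR, hashlib.sha256(url.encode()).hexdigest())

def fetch(url):
    # Serve from the local copy if we already have one;
    # only go over the network the first time.
    path = cache_path(url)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = urllib.request.urlopen(url).read()
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return data
```

A local web server pointed at `CACHE_DIR` would then serve the saved copies without touching the network at all.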