Post ApbSg30BFO04UlDD04 by graveolensa@mathstodon.xyz
 (DIR) Post #Apaj0OHbEsTIOg7dOi by futurebird@sauropods.win
       2024-12-31T02:25:58Z
       
       0 likes, 0 repeats
       
I haven't thought "I should try to build my *own* web spider, then maybe I could find things." since... Well, since 1998. :/
       
 (DIR) Post #Apaj1EqToRGnPuqDz6 by tsturm@famichiki.jp
       2024-12-31T02:46:39Z
       
       0 likes, 0 repeats
       
@futurebird I recently have been thinking of what it would take to run my own spider... for the first time in about 25 years. The search results I'm getting lately are so bad that a DIY spider might actually improve the situation for me.
       
 (DIR) Post #Apaj1JXsL2MJzfAwBE by JessTheUnstill@infosec.exchange
       2024-12-31T02:50:14Z
       
       0 likes, 0 repeats
       
The problem is only partly that Google has gotten so much worse. It's also that SEO, botspam, LLM spam, and affiliate link spam have gotten so good that it's functionally impossible to algorithmically filter them out of the results. So just running your own spider is unlikely to matter much. @tsturm @futurebird
       
 (DIR) Post #Apaj1YVEj8A2Np3XeK by futurebird@sauropods.win
       2024-12-31T02:51:54Z
       
       0 likes, 0 repeats
       
@JessTheUnstill @tsturm Well I'm thinking of doing something a little smaller and more targeted, like this: https://sauropods.win/@futurebird/113744151630008623 Because making a proper full web spider is a massive project. And even my small idea could be too big.
       
 (DIR) Post #ApajsGFdJhT57mtE2a by djsumdog@djsumdog.com
       2024-12-31T03:03:08.782703Z
       
       0 likes, 0 repeats
       
I use Linkding to host my own personal bookmarks. The only crawler I've looked into was Apache Nutch, but I haven't tried running it yet. I did run YaCy for a while, but it just kinda crashed a lot. Recently someone directed me to this, which seems alright: https://wiby.me and I kinda want to try this one out: https://presearch.io/
       
 (DIR) Post #Apal3J11Tlpkpzt0YC by JessTheUnstill@infosec.exchange
       2024-12-31T03:09:59Z
       
       0 likes, 0 repeats
       
Not a bad idea! My (vaguely) related idea is to fork a Fediverse app / make a browser plugin that caches and indexes only the Fediverse posts that I've browsed - whether on my timeline or on the explore page or whatever. Then I could search the content I've had access to, and I don't feel like I'd be violating anyone's privacy by caching and indexing content I've already been allowed to view, exclusively for my own personal use. Obviously, it'd raise other problems if I started crawling and indexing content for public usage, but I think using a computer to augment my own fallible memory would be acceptable, so I can find the posts I wanted to remember 2 weeks later. @futurebird @tsturm
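A minimal sketch of that personal post index, assuming the posts are available as plain text with an ID, author, and URL (the table name and helper functions here are my own invention, not anything from an existing plugin):

```python
import sqlite3

# Personal, local-only index of posts already seen in the browser.
# SQLite's FTS5 gives full-text search with no external service.
def open_index(path="seen_posts.db"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS posts "
        "USING fts5(post_id, author, content, url)"
    )
    return db

def remember(db, post_id, author, content, url):
    # Store a post the user has already been allowed to view.
    db.execute("INSERT INTO posts VALUES (?, ?, ?, ?)",
               (post_id, author, content, url))
    db.commit()

def search(db, query):
    # Full-text search over everything remembered so far.
    return db.execute(
        "SELECT author, url FROM posts WHERE posts MATCH ?", (query,)
    ).fetchall()
```

For example, `remember(db, "1", "alice", "neat crinoid fossil photos", "https://example.com/1")` followed by `search(db, "crinoid")` would find that post two weeks later.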
       
 (DIR) Post #Apal3Nz50QFaFQ11iS by JessTheUnstill@infosec.exchange
       2024-12-31T03:11:06Z
       
       0 likes, 0 repeats
       
Once the content was already cached and indexed, it'd also be possible to strip out just the links to see what interesting stuff had popped up. @futurebird @tsturm
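That link-stripping step could look something like the following stdlib-only sketch, assuming the cached posts are stored as HTML:

```python
from html.parser import HTMLParser

# Pull every outbound link out of an already-cached post's HTML,
# so the cache doubles as a list of interesting links that popped up.
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith("http"):
                self.links.append(href)

def links_in(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```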
       
 (DIR) Post #ApamZqhBtP4GW1gwUK by JessTheUnstill@infosec.exchange
       2024-12-31T03:32:24Z
       
       0 likes, 0 repeats
       
Eh, there's way too many Mastodon/Fediverse server forks. I'm never going to have the time and focus to try and mess with all that. I at least have a chance at making a client-side goodie that lets me search stuff out of my browser cache. @dalias @futurebird @tsturm
       
 (DIR) Post #Apb7WpbrZVTxJfbO1Q by BrettCoulstock@adforward.org
       2024-12-31T07:28:29Z
       
       0 likes, 0 repeats
       
@futurebird This guy wrote his own search engine, and it's fun and interesting to play with. It finds a lot of different and idiosyncratic content... https://www.marginalia.nu/
       
 (DIR) Post #ApbLcmJc8DfCC8nPqC by dahukanna@mastodon.social
       2024-12-31T10:06:26Z
       
       0 likes, 1 repeats
       
@futurebird To remove & externalise bookmark dependency from browsers, I’ve resorted to manually collecting & curating links as I find them, with personal notes+tags reminding me why they are of interest. They’re always 100% searchable & findable. Given the inconsiderate, effectively-DDoS behavior of AI scraper bots, adding to that melee with more robo-indexing may not produce a usable search index - https://mastodon.social/@dahukanna/113741237599333856
       
 (DIR) Post #ApbNBKtk9zL9FdjnDk by futurebird@sauropods.win
       2024-12-31T10:23:56Z
       
       0 likes, 0 repeats
       
@dahukanna I'm thinking of something much more modest: https://sauropods.win/@futurebird/113744151630008623
       
 (DIR) Post #ApbRINYv3LUTMEbwem by dahukanna@mastodon.social
       2024-12-31T11:09:57Z
       
       0 likes, 0 repeats
       
       @futurebird … extract links from within the post and links to the source post?
       
 (DIR) Post #ApbS8SkF1sA8UPk56O by futurebird@sauropods.win
       2024-12-31T11:19:25Z
       
       0 likes, 0 repeats
       
@dahukanna I think so, yes. Basically I want a database of every single link that's been posted to *my* feed. It would also contain any hashtags used with the link and the post ID, so I can go back and see the context. Next I'd strip out all of the "big sites" and focus more on the obscure. Then if I'm curious about, say, # fossils, I would get links mentioned in that context. And if # fossils is used with the tag # crinoids often, I could move laterally and find more links.
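One way that link-plus-hashtag database could be laid out; the schema and helper names below are my own sketch, not anything specified in the thread:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE links (url TEXT, post_id TEXT);   -- every link seen on the feed
CREATE TABLE tags  (url TEXT, tag TEXT);       -- hashtags used alongside it
""")

def add_link(url, post_id, hashtags):
    # Record one link, the post it came from, and its hashtags.
    db.execute("INSERT INTO links VALUES (?, ?)", (url, post_id))
    db.executemany("INSERT INTO tags VALUES (?, ?)",
                   [(url, t) for t in hashtags])

def links_for(tag):
    # Every link that appeared in the context of a given hashtag.
    return [r[0] for r in db.execute(
        "SELECT DISTINCT url FROM tags WHERE tag = ?", (tag,))]

def related_tags(tag):
    # Tags that co-occur with `tag` on the same links --
    # the "move laterally" step (e.g. fossils -> crinoids).
    return [r[0] for r in db.execute(
        "SELECT DISTINCT t2.tag FROM tags t1 JOIN tags t2 "
        "ON t1.url = t2.url WHERE t1.tag = ? AND t2.tag != ?",
        (tag, tag))]
```

Filtering out the "big sites" would then just be a `WHERE url NOT LIKE ...` clause over a blocklist of domains.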
       
 (DIR) Post #ApbSJMIzJO1theBrTU by futurebird@sauropods.win
       2024-12-31T11:21:26Z
       
       0 likes, 1 repeats
       
@dahukanna Importantly, this database would grow over time; it wouldn't be focused on "what's new" ... basically I have a high level of trust in the way people #onhere associate hashtags with links, and I think that'd be a great way to find things. In fact I do it manually often enough, but it's time-consuming. I just want all of the links sometimes.
       
 (DIR) Post #ApbSg30BFO04UlDD04 by graveolensa@mathstodon.xyz
       2024-12-31T11:25:26Z
       
       0 likes, 0 repeats
       
@futurebird I have been trying to collect information on a local web server, so I don't need to have it come over the network every time I want to see it (keep local copies of things that matter!)
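The keep-local-copies habit can be sketched as a fetch-through cache: the first request goes over the network, every later view reads from disk. `CACHE_DIR` and the function names are assumptions of mine:

```python
import hashlib
import os
import urllib.request

CACHE_DIR = "local_copies"  # hypothetical location for the local copies

def cache_path(url):
    # Deterministic on-disk filename for each URL.
    return os.path.join(CACHE_DIR, hashlib.sha256(url.encode()).hexdigest())

def fetch(url):
    # Serve from the local copy if we already have one;
    # only go over the network the first time.
    path = cache_path(url)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = urllib.request.urlopen(url).read()
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return data
```

A local web server pointed at `CACHE_DIR` would then serve the saved copies without touching the network at all.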