Post 9wgWIeFYBoZYsuGLtQ by christianbundy@social.coop
 (DIR) Post #9wg3ZtKOU9XVtuOkbo by sir@cmpwn.com
       2020-07-02T13:47:20Z
       
       3 likes, 4 repeats
       
       We really need a FOSS search engine with, and this is important: its own in-house, FOSS crawler
       
 (DIR) Post #9wg3cJ4ziELgLaLZQm by penguin42@mastodon.org.uk
       2020-07-02T13:52:43Z
       
       0 likes, 1 repeats
       
       @sir Where do you store the crawler's data?
       
 (DIR) Post #9wg3h0Gl4lfLSL5ZfU by fakefred@mastodon.technology
       2020-07-02T13:53:35Z
       
       0 likes, 0 repeats
       
       @penguin42 @sir ... On federated servers?
       
 (DIR) Post #9wg4CZpUuhGkinOS12 by OTheB@mastodon.technology
       2020-07-02T13:52:14Z
       
       1 likes, 0 repeats
       
       @sir "SearchHut"?
       
 (DIR) Post #9wg4Vdq4EQ34eqkb0S by amael@social.linux.pizza
       2020-07-02T14:02:36Z
       
       0 likes, 1 repeats
       
       I 100% agree with you, I would be glad to work on it!
       
 (DIR) Post #9wg4X1ZjSu58Z5llU8 by simon@fosstodon.org
       2020-07-02T13:54:18Z
       
       0 likes, 0 repeats
       
       @sir a search engine that searches only (independent) blogs would be great too
       
 (DIR) Post #9wg4f8uM5QS5ciHpxo by sir@cmpwn.com
       2020-07-02T13:54:34Z
       
       0 likes, 0 repeats
       
       @fakefred @penguin42 federating a search engine would be pretty difficult, but I would be interested in seeing some research around community ownership of the crawled data.
       
 (DIR) Post #9wg4yyrj8LC6GkW6dM by aktivismoEstasMiaLuo@activism.openworlds.info
       2020-07-02T13:54:50Z
       
       0 likes, 0 repeats
       
       @sir #searx is a FOSS search engine & #yacy is a FOSS crawler, and they work together.
       
 (DIR) Post #9wg5Cyy7Xy3idwBT6m by amk@mastodon.amk.ie
       2020-07-02T13:55:25Z
       
       0 likes, 0 repeats
       
       @sir let's hope spider/ask.moe will free us from this search engine prison. I've been using qwant, which seems to make similar promises to ddg, but it's also not FOSS, which is a shame.
       
 (DIR) Post #9wg5rOaLiy47fuKrya by selea@social.linux.pizza
       2020-07-02T14:17:48Z
       
       0 likes, 1 repeats
       
       @aktivismoEstasMiaLuo How do they work together? @sir
       
 (DIR) Post #9wg61TBTTsX2pkVKTY by aktivismoEstasMiaLuo@activism.openworlds.info
       2020-07-02T14:19:36Z
       
       0 likes, 0 repeats
       
       @selea @sir I just know that I've encountered some searx instances that source indexes from a yacy instance running on the same host. I've not installed it myself.
       
 (DIR) Post #9wg6DzstjALtj8mxtY by selea@social.linux.pizza
       2020-07-02T14:21:53Z
       
       0 likes, 1 repeats
       
       @aktivismoEstasMiaLuo Oh, I did not know that! Thank you very much! @sir
       
 (DIR) Post #9wg7K3Yx43qapJucgi by penguin42@mastodon.org.uk
       2020-07-02T14:34:15Z
       
       0 likes, 1 repeats
       
       @sir @fakefred I think that storage and access are the challenge - and trust, if it's federated: you don't want all searches to get redirected to porn sites or other vendors' sites.
       
 (DIR) Post #9wgSX1BpIRrOGk7LZA by flewkey@layer8.space
       2020-07-02T18:31:54Z
       
       0 likes, 1 repeats
       
       @sir The Gigablast search engine published their source code to a git repository a while back, but it definitely needs an overhaul.
       
 (DIR) Post #9wgWIeFYBoZYsuGLtQ by christianbundy@social.coop
       2020-07-02T19:12:30Z
       
       0 likes, 0 repeats
       
       @sir I was literally just working on this! My use-case is that I've contributed lots on GitHub and I want to download all of the repos I've worked on... but I can't get a list of them. Currently fighting with their GraphQL API, but I'd kill for a "give me a list of all repos where a commit is authored by me" search query.
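
       A rough sketch of one way to approximate that query, assuming GitHub's REST commit search endpoint (`/search/commits` with an `author:` qualifier) rather than the GraphQL API mentioned above. The helper names are hypothetical, the response shape (`items[].repository.full_name`) is an assumption about that endpoint, and the search API caps the results it returns per query, so the list may be incomplete for prolific contributors. No request is made here; the second helper only processes already-parsed JSON pages.

```python
from urllib.parse import urlencode


def commit_search_url(username, page=1, per_page=100):
    """Build a commit-search URL for commits authored by `username` (hypothetical helper)."""
    query = urlencode({"q": f"author:{username}", "per_page": per_page, "page": page})
    return f"https://api.github.com/search/commits?{query}"


def repos_from_commit_search(pages):
    """Collect unique repository names from parsed search-result pages.

    Assumes each page is a dict with an "items" list whose entries carry
    a "repository" object with a "full_name" field.
    """
    seen = set()
    for page in pages:
        for item in page.get("items", []):
            seen.add(item["repository"]["full_name"])
    return sorted(seen)
```

       Fetching the pages (with authentication and rate-limit handling) is left out on purpose; each resulting `owner/name` could then be cloned in a loop.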
       
 (DIR) Post #9wgWS1j4H8P6IhXd44 by sir@cmpwn.com
       2020-07-02T19:14:28Z
       
       0 likes, 0 repeats
       
       @christianbundy that's not what I meant. I meant a FOSS search engine for searching the web at large.
       
 (DIR) Post #9wgX8c29fqJV4Om9L6 by christianbundy@social.coop
       2020-07-02T19:22:07Z
       
       0 likes, 0 repeats
       
       @sir oh! I haven't looked into those in a while, last I saw I think YaCy was state-of-the-art. If you find anything (or build anything) I'd be happy to test.
       
 (DIR) Post #9wgazJWJXMkErThhSq by _1751015@mastodon.host
       2020-07-02T20:06:34Z
       
       1 likes, 1 repeats
       
       @sir
       1) https://yacy.net/ - an implementation of a P2P (peer-to-peer) search engine
       2) https://commoncrawl.org/2020/06/may-june-2020-crawl-archive-now-available/ - they provide a public index and code: https://github.com/commoncrawl
       
 (DIR) Post #9wgbHO9N3tTGQwahpw by sir@cmpwn.com
       2020-07-02T20:07:35Z
       
       0 likes, 0 repeats
       
       @_1751015 where can I play with a search engine powered by this data?
       
 (DIR) Post #9wgnW4RVoZ8f6cqNcm by cuniculus@cmpwn.com
       2020-07-02T22:25:10Z
       
       0 likes, 0 repeats
       
       @sir @_1751015 https://yacy.eric.ovh/
       
 (DIR) Post #9wgng1b5FWPDAz4Ehl by sir@cmpwn.com
       2020-07-02T22:26:36Z
       
       0 likes, 0 repeats
       
       @cuniculus @_1751015 ooof the animations and javascript yikes
       
 (DIR) Post #9wgnmqoVjxU9l7bQrQ by sir@cmpwn.com
       2020-07-02T22:27:10Z
       
       0 likes, 0 repeats
       
       @cuniculus @_1751015 tbh I don't think a distributed search engine is the right approach
       
 (DIR) Post #9wgrW5ljHf3T2FZ9V2 by cuniculus@cmpwn.com
       2020-07-02T23:10:47Z
       
       0 likes, 0 repeats
       
       @sir @_1751015 Yeah, since it requires loads of storage and fat bandwidth
       
 (DIR) Post #9wh38Ibj665JLzJAnY by thatkiwiguy@coffeehouse.institute
       2020-07-03T01:18:18Z
       
       0 likes, 0 repeats
       
       @sir would https://yacy.net be appropriate? Self hosted, DHT, P2P...
       
 (DIR) Post #9wiGSqCDQ1TvseSdyy by katie@mstdn.io
       2020-07-03T15:26:08Z
       
       0 likes, 0 repeats
       
       @aktivismoEstasMiaLuo @selea @sir https://yacy.everdot.org/ defaults to only sourcing the global + a private yacy network and https://searx.everdot.org/ includes a private yacy network by default. One major problem with using the global yacy network is that you have to decide on a cut-off for how long you want to wait for global results and drop slower servers, because some take minutes to respond. That's just too slow. Also, a patch is needed to sort results; the default is first come, first shown.
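
       A minimal sketch of the cut-off and sorting idea from the post above, not YaCy's or searx's actual code. Peers are modelled as hypothetical async callables returning `(score, url)` pairs; the peer names, scores, and the `search_peers` function are assumptions made for illustration. Peers that miss the cut-off are dropped, and the remaining results are sorted by score instead of arrival order.

```python
import asyncio


async def search_peers(peers, query, cutoff_seconds):
    """Query every peer concurrently, drop those slower than the cut-off,
    and return the collected (score, url) results sorted by score."""
    if not peers:
        return []
    tasks = [asyncio.create_task(fn(query)) for fn in peers.values()]
    done, pending = await asyncio.wait(tasks, timeout=cutoff_seconds)
    for task in pending:
        task.cancel()  # peer missed the cut-off: stop waiting for it
    results = []
    for task in done:
        if task.exception() is None:  # skip peers that failed outright
            results.extend(task.result())
    # Rank by score rather than "first come, first shown".
    return sorted(results, key=lambda r: r[0], reverse=True)


# Hypothetical peers used only to exercise the function.
async def fast_peer(query):
    await asyncio.sleep(0.01)
    return [(0.4, "https://a.example/1"), (0.9, "https://a.example/2")]


async def slow_peer(query):
    await asyncio.sleep(5)  # stands in for a peer that takes far too long
    return [(1.0, "https://slow.example/")]


def demo():
    peers = {"fast": fast_peer, "slow": slow_peer}
    return asyncio.run(search_peers(peers, "foss", cutoff_seconds=0.2))
```

       The trade-off is the one described in the post: a short cut-off keeps the search responsive but loses whatever the slow peers would have contributed.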
       
 (DIR) Post #9wiJKVtJKNsGS4L5v6 by katie@mstdn.io
       2020-07-03T15:58:14Z
       
       0 likes, 0 repeats
       
       @sir Even if you have a FOSS search engine with a FOSS crawler, like what's running on https://yacy.everdot.org/, you'll quickly run into performance issues and economic issues. Going FOSS won't automatically bring in advertisement revenue, and that's what Google/Bing/etc. actually do: they are advertisement agencies, not search engines. That's how they afford thousands of servers. There's free software, but there's no such thing as free hardware.
       
 (DIR) Post #9wmMHtJoqSeCjl1G1w by _1751015@mastodon.host
       2020-07-05T14:50:11Z
       
       0 likes, 1 repeats
       
       @sir I don't have information about a search engine using the Common Crawl data. They have a compiled list with references to various small projects that use the data: https://commoncrawl.org/the-data/examples/
       
 (DIR) Post #9wmMVO4rnP19tlSdQe by _1751015@mastodon.host
       2020-07-05T14:52:38Z
       
       0 likes, 1 repeats
       
       @cuniculus @sir YaCy has some niche applications that are interesting. Check the writing here and the comments: https://www.susa.net/wordpress/2020/05/personal-search-engine/
       Personal index of curated URLs + eventually sharing the index - IMO it has advantages over a general purpose search engine.
       
 (DIR) Post #9wopbXXOp0y1xm0qUC by z428@social.tchncs.de
       2020-07-06T19:26:16Z
       
       0 likes, 0 repeats
       
       @sir Agree. But, once again: Maybe this is not so much a F(L)OSS issue but more an issue of handling a large, potentially decentralized / distributed search index at runtime, keeping things available, stable, performant 24x7. Maybe, finally, a situation to understand that our current focus on code and code licensing is important but not *all* it takes to have working technology available...? 🙂