[HN Gopher] YaCy, a distributed Web Search Engine, based on a pe...
       ___________________________________________________________________
        
       YaCy, a distributed Web Search Engine, based on a peer-to-peer
       network
        
       Author : Timothee
       Score  : 248 points
       Date   : 2024-03-06 06:33 UTC (16 hours ago)
        
 (HTM) web link (yacy.net)
 (TXT) w3m dump (yacy.net)
        
       | DrDroop wrote:
       | I once went to a workshop on a Sunday morning at the local
       | makerspace to listen to someone talk about some kind of
       | distributed search engine or something like that. One of the
       | developers came from (I think) Germany to explain this to us, the
       | centralized sheeple. He just gave a demonstration of the thing,
       | like here is the box you type stuff and here are the results.
       | When I started to ask questions about how it worked and all, he
       | sort of acted annoyed, saying it was all too difficult to explain.
       | This was more than ten years ago, and yes I am still angry about
       | it.
        
         | ssijak wrote:
         | At the core it was probably based on peer to peer distributed
         | hash tables, so here you go, read the source
         | https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia...
        
           | belter wrote:
           | 160 bits ought to be enough for anybody :-)
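For readers following the Kademlia link: its routing treats 160-bit node IDs and keys as numbers and measures the distance between them with XOR. The Python below is a toy illustration of that metric under stated assumptions (SHA-1 to derive IDs, k = 3, hypothetical helper names). It is not YaCy's code, and this thread does not establish whether YaCy's own DHT follows Kademlia.

```python
import hashlib

ID_BITS = 160  # Kademlia uses 160-bit node IDs and keys (SHA-1 sized)


def node_id(name: str) -> int:
    """Derive a 160-bit ID from an arbitrary string by hashing it with SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def k_closest(key: int, peers: list[int], k: int = 3) -> list[int]:
    """Return the k peer IDs closest to `key` under the XOR metric."""
    return sorted(peers, key=lambda p: xor_distance(p, key))[:k]


def bucket_index(own: int, other: int) -> int:
    """k-bucket for `other`: position of the highest bit where the two IDs differ (-1 if equal)."""
    return xor_distance(own, other).bit_length() - 1


peers = [node_id(f"peer-{i}") for i in range(20)]
key = node_id("yacy")
closest = k_closest(key, peers)
```

A real lookup would repeatedly ask the closest peers it knows for peers even closer to the key; that iterative step is left out here.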
        
         | albert180 wrote:
         | It's probably him. YaCy is made by a German dude.
        
           | synctext wrote:
           | Impressive 20-year project by one key developer.
           | 
           | See the 20-year anniversary post (in German) by the YaCy founder:
           | https://community.searchlab.eu/t/yacy-vor-20-jahren/1543
        
       | ssijak wrote:
       | A long time ago I worked for a startup called Wowd, which built a
       | distributed search engine. It was acquihired by Facebook.
       | 
       | One of the biggest issues was how to entice people to download and
       | run the client/node.
       | 
       | I half wondered afterwards if slapping some crypto on top of it,
       | mined by running the node and providing resources, would help. My
       | gut says easy yes, but my mind grimaces at the abomination.
        
         | zoklet-enjoyer wrote:
         | We have proof of stake now. The nodes could be run by the chain
         | validators and they get a cut of the staking rewards. Look up
         | how proof of stake works on the Cosmos chain. You could totally
         | do this and I bet it would take off, at least in that section
         | of the Internet that's into Cosmos/Tendermint chains. I'd use
         | it
        
           | ssijak wrote:
           | I was definitely thinking of some kind of proof of stake, not
           | proof of work.
        
           | zoklet-enjoyer wrote:
           | Hahaha one downvote. I love to see it
        
         | worksonmine wrote:
         | > but my mind grimace at the abomination
         | 
         | Why would that be an abomination? It's a perfect use case. As
         | you noticed, people need incentives to volunteer their hardware.
         | If you hate crypto because it's crypto, you can just use fiat
         | instead.
        
           | komali2 wrote:
           | > Like you noticed people need incentives to volunteer their
           | hardware.
           | 
           | I wonder if this is because "volunteer your hardware"
           | projects sometimes involve someone else making money, and if
           | someone else is making money but not you, why should you
           | donate your hardware?
           | 
           | For the truly libre "hardware donation" projects, they seem
           | to be doing ok without financial incentivization. What
           | immediately comes to mind is the petabytes of data flying
           | around on peer to peer systems through torrenting. I know
           | people that spend thousands of dollars a year on upkeep and
           | upgrades for what are essentially super seedbox homelabs (I'm
           | one of them too :P )
           | 
           | There's also communities like soulseek where people keep TBs
           | of music up, often seeking out rare tracks to make available
           | to the community for free.
           | 
           | There's folding@home and seti@home, and I'm sure other
           | similar projects I haven't heard of, where people donate
           | cycles just for the common good.
           | 
           | folding@home is a great example because we can directly
           | compare the people that are "incentivized" to participate
           | with bananocoin, a cryptocurrency rewarded based on work
           | cycles in folding@home. You can see all bananocoin miners
           | here under the banano.cc team:
           | https://stats.foldingathome.org/ That team is in first place
           | for work completed; however, it is only just surpassing the
           | Linus Tech Tips team, and compared to a bunch of other teams
           | (and private "donors") it accounts for a very small % of the
           | work completed for folding@home.
           | 
           | So therefore I disagree that people "need" incentives, there
           | just needs to be no, erm, disincentives, if that's a word.
        
             | shinryuu wrote:
             | > I know people that spend thousands of dollars a year on
             | upkeep and upgrades for what are essentially super seedbox
             | homelabs
             | 
             | And then you end with "there just needs to be no
             | disincentives". If anything spending thousands of dollars a
             | year on upkeep should be a disincentive for most people.
             | You are not most people though, since you do it
             | voluntarily.
        
               | komali2 wrote:
               | I'm a maniac though. I used to run my stack just fine off
               | a Raspberry Pi with a USB hard drive plugged in.
               | 
               | Actually, before that, I used to run it off an old
               | macbook.
               | 
               | Do we need it to be where everyone hosts a node? I just
               | had this conversation with a friend yesterday actually.
               | We were in disagreement about the accessibility of self
               | hosting and federation. He was of the opinion that we
               | should push LLMs to where anyone can type "I want to host
               | a video hosting platform" and chatgpt.exe will find and
               | install jellyfin on their computer and set up a
               | cloudflare tunnel, or whatever.
               | 
               | I'm more of the opinion that we should increase the
               | quality of documentation until the one person just weird
               | and nerdy enough out of a group of 20 will be able to
               | deploy things on leftover hardware, and share with their
               | friends.
               | 
               | What do you think?
        
               | shinryuu wrote:
               | In terms of accessibility I don't think it would be bad
               | per se if chatgpt.exe would be able to help you with
               | that. Though both of us know that there is maintenance
               | involved, and once something catches fire (which will
               | happen at some point), you are kind of helpless.
               | 
               | Something like pikapods.com certainly helps with
               | accessibility, even if it isn't self-hosting per se.
               | 
               | But all of that has little to do with incentives
               | or disincentives. Even with very high accessibility there
               | are disincentives to self-host. It will cost time and
               | money in some way. For some people the intrinsic
               | motivation will override those disincentives. But I think
               | for the majority of people there will still not be enough
               | motivation to do it.
               | 
               | There are more important things to do for them.
        
           | bawolff wrote:
           | I mean, how do you verify nodes are being honest and not just
           | sending fake data for the free crypto (like what happened
           | with seti@home, and there wasn't even money involved)?
           | 
           | Not to mention, where is the value of this coin going to come
           | from? Will people pay to use this search engine? That seems
           | unlikely.
           | 
           | It doesn't sound like the perfect use case to me.
        
             | px43 wrote:
             | By ignoring cryptocurrencies, you've missed out on over 10
             | years of progress in this space. We have things like zero
             | knowledge notaries and data availability sampling proofs.
             | Actively Validated Services are also a thing. Service
             | providers stake some asset, and interested parties can
             | challenge them at certain intervals to ensure that they are
             | properly performing their duties. Through the magic of
             | Merkle trees, and soon Verkle trees (basically Merkle
             | trees, but using vector commitments for super fast proofs),
             | challengers can demand that service providers generate
             | a proof that some data they hold matches some criteria. The
             | nice thing about it is that because it's a zero knowledge
             | proof, the challenger doesn't even need to know what that
             | data is, and what they get back is a succinct proof that
             | they can check very quickly, basically like checking an md5
             | sum for execution correctness.
             | 
             | It's cool shit, you should really look into it.
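The Merkle-tree part of this can be illustrated without any zero-knowledge machinery. A minimal sketch, assuming SHA-256 and a plain binary tree that pairs an odd last node with itself (illustrative choices, not taken from any particular chain): the prover sends a leaf plus the sibling hashes on its path, and anyone who knows only the root can check membership with a handful of hashes. This is an ordinary inclusion proof, not a zero-knowledge one, and the function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All levels of the tree, leaf hashes first and the root level last."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level  # odd node pairs with itself
        level = [h(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each tagged with whether the sibling sits on the right."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):  # last node of an odd level was paired with itself
            sibling = index
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and the proof."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root
```

Production designs also hash leaves and inner nodes with different prefixes to rule out second-preimage tricks, which this sketch omits.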
        
             | worksonmine wrote:
             | That's exactly why blockchain is a good choice. You verify
             | that whatever X sends matches what Y and Z would send
             | before any reward is received. Based on the shared index,
             | every query should return the same results; kids' stuff,
             | really.
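The cross-checking described above can be sketched as a strict-majority vote over the answers several nodes give for the same query. This assumes, as the comment does, a shared deterministic index so that honest nodes return identical result lists; the node names and the idea of rewarding only the agreeing nodes are hypothetical. A colluding majority defeats this check.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Result = Tuple[str, ...]  # ordered result URLs one node returned for a query


def cross_check(responses: Dict[str, Result]) -> Tuple[Optional[Result], List[str], List[str]]:
    """Strict-majority vote over node answers.

    Returns (agreed_result, agreeing_nodes, dissenting_nodes). Without a strict
    majority the query is left undecided: no result, no rewards, no penalties.
    """
    if not responses:
        return None, [], []
    result, votes = Counter(responses.values()).most_common(1)[0]
    if votes * 2 <= len(responses):
        return None, [], []
    agreeing = sorted(n for n, r in responses.items() if r == result)
    dissenting = sorted(n for n, r in responses.items() if r != result)
    return result, agreeing, dissenting
```

With nodes X, Y and Z, a single node returning fabricated results ends up in the dissenting list while Y and Z agree; with only two disagreeing nodes, nothing can be decided.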
             | 
             | The monetization is a nut to crack yes, but Kagi works as a
             | paywalled search engine. Otherwise just serve ads like all
             | the rest already do? Tried and proven model, and in this
             | solution they could be very transparent as there's no
             | corporation behind trying to dupe the users for clicks to
             | maximize profits. I even see the possibility for a hybrid
             | model, don't like ads? Pay for the compute with your own
             | coins.
             | 
             | The value comes from the network, trust and use-case. It
             | doesn't have to be a new coin.
        
             | numpad0 wrote:
             | Agreed; it feels to me that people here are underestimating
             | malice on the Internet. A simple crypto-based search credit
             | system will be overtaken with fake queries and fake data.
             | 
             | I'm not entirely confident that crypto-like reward
             | mechanisms for distributed search are fundamentally flawed
             | and unusable, but both the problem and the solution need to
             | be refined a bit more.
        
               | worksonmine wrote:
               | > Agreed; feels to me that people here is underestimating
               | malice on the Internet.
               | 
               | I don't think we do. We just prefer to put our trust in
               | algorithms and verifiable data sources. It's not like
               | Google et al. are the pinnacle of altruism; there have
               | been cases where the promoted results are faked copies of
               | the actual site you want to visit, fooling less
               | computer-savvy users into installing malware.
               | 
               | The trust is put into the code, same principle as
               | reproducible builds. It doesn't matter where you get the
               | source, as long as the checksum matches. This way the
               | censor side of the problem is solved.
               | 
               | That leaves the spam, which isn't really solved by the
               | big corporations either. Last time I used Google, I got
               | 2-3 pages of the same auto-generated bullshit on every
               | technical search term I tried. This could be fixed by
               | having the main index limited to trusted sites at the
               | expense of discovering new content. The latter can be
               | handled by opt-in indexes. If the goal is to index
               | everything users could have their own filters for sites
               | they don't want.
               | 
               | If you really want to spice it up allow me to maintain my
               | own query function (dangerous and potential exploit yes)
               | that I send to the nodes and I can handle my own ranking.
               | 
               | There's nothing that makes a distributed index more
               | unsafe than one run by Google. If every query picks 2
               | random nodes and compares the results I would trust that
               | query more than current Google execs' opinions of what I'm
               | allowed to see.
        
         | lifty wrote:
         | Not sure why it would be an abomination. This is the exact use
         | case which is a fit for cryptocurrency networks.
        
           | rakoo wrote:
           | You have to look beyond the surface. Cryptocurrencies work
           | specifically to address a system where no node can trust any
           | other node. If I cannot trust any other node, why would I
           | fetch anyone else's index, or ask them for the results of a
           | query, or even talk to them ?
           | 
           | Unless there is a way to trivially verify what others
           | tell you, cryptocurrencies are a dead end.
        
             | mhluongo wrote:
             | You have that issue without cryptocurrencies as well, you'd
             | just be relying on the kindness of users rather than crypto
             | incentives.
             | 
             | You always need a way to hold nodes accountable in a system
             | like this, or it'll be rife with manipulation -- because
             | there's already a strong, innate incentive to manipulate
             | results. Today, we call that industry "SEO".
        
               | rakoo wrote:
               | What you don't understand is that "I don't trust others"
               | is not a terminal state. I'd rather build trust again,
               | create human connections, or rather, put them in front,
               | because there are always connections; nothing works if
               | you trust no one.
               | 
               | Building a societal system where you know you can rely on
               | your peers, you build together, is a more joyful, more
               | resilient, more ecological and also more realistic way of
               | building a thriving society than distrust-by-default that
               | cryptocurrencies live for.
        
               | idiotsecant wrote:
               | Your current fiat currency is not based on love and
               | trust. It's Proof-of-World-Hegemony, which puts crypto
               | based consensus mechanisms to shame in terms of how not
               | based on love and trust it is.
        
               | rakoo wrote:
               | My current fiat currency is absolutely based on trust
               | that the State will uphold any disagreement, even though
               | I know it is not benevolent.
               | 
               | I also don't understand your point. The current world is
               | not what I want, so let's make it worse according to my
               | values ?
        
               | idiotsecant wrote:
               | The value of, for example, the dollar is not based on
               | your trust, at least not at the first order. It's based
               | on the economic and military power backing it up.
        
               | rakoo wrote:
               | Absolutely it is: it is based on the trust we all have
               | that the government will do whatever it takes to
               | guarantee the value of a dollar. Me being able to
               | trade with you in dollars and not, say, in old
               | Zimbabwean dollars rests on the shared assumption that
               | the US State can and will be there.
        
               | Brian_K_White wrote:
               | Sure it is. When someone gives me a dollar, I have no
               | idea if it's fake or stolen.
               | 
               | That sort of thing only gets handled very indirectly and
               | much later and after a bad actor does their bad thing
               | enough times for the surrounding greater population of
               | good actors to notice a pattern.
        
               | Brian_K_White wrote:
               | And I think this is not even stupid either.
               | 
               | Bad actors exist and there must be some process for
               | identifying and dealing with them, but they are not the
               | majority of people and so probably don't have to be the
               | first, last, primary, and only consideration at all
               | times.
               | 
               | IE living in a bomb shelter is not a life worth living,
               | even though, yes, you will be safe from bombs and thieves.
        
               | rakoo wrote:
               | Exactly. If I'll have to depend on someone else anyway
               | (and I will), might as well build trust because a life
               | being cautious about everything and everyone is not worth
               | living. Only those with already vast amounts of money can
               | afford it because they trust (heh) other people working
               | for them to take care of that, but to non-jokingly
               | propose it as a standard for everyone is a dystopia.
        
             | lifty wrote:
             | You should be able to add incentives in the system so that
             | people store the correct index. You can check the incentive
             | design of Filecoin for an example of how you can do that.
             | Obviously it depends on the application how the incentive
             | mechanism should be built.
        
               | rakoo wrote:
               | Filecoin is "easy": it is trivial to verify that the blob
               | you stored is the one I wanted you to store. There is no
               | trivial way to verify that you indexed what I wanted you
               | to index, or that you reply what I wanted you to reply.
               | 
               | I highly dislike monetary incentives because they
               | perpetuate inequalities by design, so here's another
               | incentive: if you store a correct index, I will keep
               | working with you and we can build an awesome system
               | together. We can coordinate by talking to each other
               | rather than trying to get money from each other.
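To illustrate why stored blobs are the "easy" case, here is a simplified challenge-response check: the client sends a fresh random nonce, and the provider must return the hash of the nonce concatenated with the blob, which it can only compute if it still holds the blob. This is a sketch of the general idea, not Filecoin's actual proof-of-replication or proof-of-spacetime, and it has the drawback that the client needs its own copy of the blob to verify. There is no comparably cheap expected answer for "did you index this correctly?", which is the asymmetry described above.

```python
import hashlib
import hmac
import secrets
from typing import Callable


def respond(nonce: bytes, stored_blob: bytes) -> bytes:
    """Provider side: only computable by someone who still holds the blob."""
    return hashlib.sha256(nonce + stored_blob).digest()


def challenge(original_blob: bytes, provider: Callable[[bytes], bytes]) -> bool:
    """Client side: a fresh random nonce stops the provider from replaying an old answer."""
    nonce = secrets.token_bytes(16)
    expected = hashlib.sha256(nonce + original_blob).digest()
    return hmac.compare_digest(expected, provider(nonce))
```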
        
             | zubairq wrote:
             | Interesting comment about how cryptocurrencies can enable a
             | system where no node can trust any other node. Something
             | for me to think about as I am building a peer to peer
             | system (not a search engine though)
        
               | rakoo wrote:
               | Cryptocurrencies only help where no one can trust anyone.
               | But if that's the case, I claim that such a system is not
               | viable in the long term.
        
               | zubairq wrote:
               | Good point. Does this mean that Bitcoin is not viable in
               | the long term?
        
               | rakoo wrote:
               | Bitcoin as a speculating tool lives as long as
               | speculation can live. Bitcoin, or any cryptocurrency as
               | an actual currency exchanged at large scale will not
               | work, or at least not in a democracy.
        
             | miohtama wrote:
             | Cryptocurrencies solve the spam problem, not the trust
             | problem. No one can spam the network with new write data
             | (transactions) because spam would become expensive. Although
             | people still do, and Ethereum is full of spam tokens, meaning
             | the transaction cost is still too low. This was also the use
             | case of Hashcash, an earlier proof-of-work system, which was
             | designed to solve email spam.
             | 
             | You are paying either
             | 
             | - Block space: your transaction to be included in a block
             | 
             | - State: modifying the world state (EVM in Ethereum)
             | 
             | Trust problem is solved by various other means, usually on
             | libp2p level, by banning nodes (IP addresses) that send you
             | bad data, which you can verify by comparing it to data from
             | other peers.
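Hashcash's mechanism is small enough to show directly. A simplified sketch, assuming SHA-256 and an ad-hoc `resource:nonce` string rather than the real Hashcash stamp format (which uses SHA-1 and a versioned header): the sender searches for a nonce whose hash starts with a given number of zero bits, costing about 2^bits hashes on average, while the receiver verifies with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def stamp_digest(resource: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{resource}:{nonce}".encode()).digest()


def mint(resource: str, bits: int) -> int:
    """Sender: brute-force a nonce; expected work is about 2**bits hashes."""
    for nonce in count():
        if leading_zero_bits(stamp_digest(resource, nonce)) >= bits:
            return nonce


def check(resource: str, nonce: int, bits: int) -> bool:
    """Receiver: verification costs a single hash."""
    return leading_zero_bits(stamp_digest(resource, nonce)) >= bits
```

Raising `bits` by one doubles the sender's expected cost while leaving the receiver's cost unchanged, which is the whole anti-spam lever.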
        
               | dumbfounder wrote:
               | They also solve the trust problem through consensus using
               | proof of stake. If there is enough financial skin in the
               | game to behave correctly, then that should be enough to
               | make sure that results are not tainted.
        
               | rakoo wrote:
               | Cryptocurrencies slow down the rate of data not because
               | of spam but because a slower rate means a higher
               | consistency across the network: cryptocurrencies' goal is
               | to agree on a consistent state with peers who do not want
               | to negotiate. If the consistent state is pure garbage
               | then that is not a problem for blockchains, because from
               | blockchains' point of view, everything is fine.
               | 
               | Spam is not a function of rate but of content. Spam can
               | absolutely be sent in a blockchain, as you say, and
               | making the price higher only makes both spam and non-spam
               | more difficult. Spam for me might be actual legit
               | information for you.
               | 
               | Hashcash is another beast, it only has the proof-of-work
               | part, not the money part (contrary to its name) so it's
               | not comparable.
        
             | mattdesl wrote:
             | This seems like something that could be verified through ZK
             | proofs. The data to search could be represented by a public
             | merkle root, and the searching/indexing given the user
             | query could be programmed in a ZKVM like RISC0[1].
             | 
             | [1] https://www.risczero.com/zkvm
        
               | notfed wrote:
               | Most information is not a math equation.
        
         | 6510 wrote:
         | After that, the issue becomes ranking. I should say became,
         | since LLMs could both rank pages and generate them on "demand"
         | to fit the query.
         | 
         | YaCy has so many buttons that I'm not even sure whether it lacks
         | this, but playing around with it, it is very cool to crawl large
         | amounts of pages and serve requests, until you want to do other
         | things with the computer and the background process is too
         | bloated. Something like the turtle mode that torrent clients
         | have would be useful.
         | 
         | Long ago there was a Chinese p2p client with a rootkit that
         | would seed at 1 kb. I haven't used it but was told it worked
         | remarkably well.
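A "turtle mode" is essentially a lower rate cap that can be toggled on and off. One common way to implement such a cap is a token bucket; the class below is a hypothetical sketch (YaCy's actual throttling settings are not covered here) that takes the current time as an argument so it can be tested deterministically. In real use you would pass `time.monotonic()` and call `try_consume` before each fetch or upload.

```python
class TokenBucket:
    """Allow on average `rate` units per second, with bursts up to `capacity` units."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def _refill(self, now: float) -> None:
        # Credit tokens for the time elapsed since the last update, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, amount: float, now: float) -> bool:
        """Take `amount` tokens if available; otherwise the caller should wait or skip work."""
        self._refill(now)
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False

    def set_rate(self, rate: float, now: float) -> None:
        """Switch speed (e.g. into turtle mode), settling elapsed time at the old rate first."""
        self._refill(now)
        self.rate = rate
```

Entering or leaving turtle mode is then a single `set_rate` call, so the crawler loop itself does not change.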
        
         | mdaniel wrote:
         | Nothing new under the sun, as they say:
         | https://www.presearch.io/engine and just as you said I was
         | unwilling to run a closed-source node binary
        
         | colinsane wrote:
         | if the situation is really "nobody will run this software
         | unless i pay them to", then you're doomed regardless. there's
         | nothing wrong with the classic route: package your software for
         | the stores/distros you're familiar with, make your software _as
         | easy to package as humanly possible_ for anyone else who'll
         | come around, document the hell out of it, submit it to the
         | handful of top-level news feeds from which it'll percolate, and
         | then wait. maybe you don't like waiting?
        
       | rasulkireev wrote:
       | Love it. Super easy to self host and use. Now I have a personal
       | Google!
        
       | maxloh wrote:
       | See also: Presearch, another decentralized search engine, which
       | claims that it will be open source. No source code available at the
       | moment though.
       | 
       | https://presearch.com/
        
       | b2bsaas00 wrote:
       | Could this be used for a Torrent search engine?
        
         | fddrdplktrew wrote:
         | if it is not censored, probably?
        
         | worksonmine wrote:
         | Recently there was a distributed tracker on the front page.
         | Probably more what you're looking for.
        
           | BLKNSLVR wrote:
           | Bit Magnet: https://bitmagnet.io
        
           | rakoo wrote:
           | Note that it's not a distributed tracker; it's an
           | indexer/tracker/search engine that _uses_ distributed
           | resources (the nodes in the DHT).
        
         | feverzsj wrote:
         | btdig is still alive.
        
           | qingcharles wrote:
           | btdig has the data, but its search is subpar :(
        
       | vGPU wrote:
       | Has it gotten any better recently?
       | 
       | I run a node but I haven't actually used it as a search engine in
       | a while, as I found the result quality to be exceedingly poor.
        
         | rahen wrote:
         | I remember trying it for a while in 2012, but the results were
         | essentially worthless, probably because there were so few
         | nodes/crawlers back then. I guess the more users there are, the
         | better the results.
        
           | viraptor wrote:
           | Alternatively, ignore the public network (it's still useless)
           | and run it as your own crawler. Seed it with your browsing
           | history, some aggregators like HN, your favourite RSS feeds,
           | etc. and you'll be good.
        
           | WarOnPrivacy wrote:
           | > I remember trying it for a while in 2012, but the results
           | were essentially worthless,
           | 
           | I had mine crawling gov, mil, etc. sites for pages that Google
           | was starting to delist back then. Inbound requests were heavy
           | with porn until I tweaked - IDK, something.
        
         | Avamander wrote:
         | No.
         | 
         | Either it picks up too much garbage if you allow any P2P data
         | exchange (can't allow only outgoing AFAIK) or it kinda only
         | knows about the sites you know about. Which kinda defeats the
         | purpose.
         | 
         | Even assuming you just want a specific index for yourself of
         | your own content then it struggles to display useful snippets
         | about the results, which makes it really tedious to sift
         | through the already poor results.
         | 
         | If you try to proactively blacklist garbage, which is
         | incredibly tedious because there's no quick "delete from index
         | and blocklist" button under index explorer, then you'll soon
         | run into an unmanageable blocklist, the admin interface doesn't
         | handle long lists well. At some point (around 160k blocked
         | domains) Yacy just runs out of heap during startup trying to
         | load it which makes the instance unusable.
         | 
         | It also can't really handle being reverse proxied (accessed
         | securely by both the users and peers).
         | 
         | It also likes to completely deplete disk space or memory, so
         | both have to be forcefully constrained. But that ends up with a
         | nonfunctional instance you can't really manage. It also doesn't
         | separate functionality enough that you could manually delete a
         | corrupt index for example.
         | 
         | Running (z)grep on locally stored web archives works
         | significantly better.
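On the blocklist problem: one way to keep a large domain blocklist cheap to query is to hold it in a hash set and check a host by walking up its parent domains, so `ads.tracker.example` is caught by an entry for `tracker.example`. This is a hypothetical Python sketch, not how YaCy stores its blacklists; memory still grows with the list, but each lookup costs one set probe per label of the host name rather than a scan over all entries.

```python
from typing import Iterable, Set


def load_blocklist(lines: Iterable[str]) -> Set[str]:
    """Build a set of blocked domains from lines (e.g. an open file); skips blanks and # comments."""
    domains = set()
    for line in lines:
        entry = line.strip().lower().rstrip(".")
        if entry and not entry.startswith("#"):
            domains.add(entry)
    return domains


def is_blocked(host: str, blocklist: Set[str]) -> bool:
    """True if `host` or any of its parent domains is in the blocklist."""
    labels = host.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))
```

Because `load_blocklist` accepts any iterable, it can stream a file line by line instead of reading it into memory first.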
        
           | bobajeff wrote:
           | Those are pretty bad issues. I remember using it a long time
           | ago and only remember the results being bad. I've heard that
           | Yacy could be good for searching sites you've already visited
           | but it sounds like even that might not be a good use case for
           | it.
           | 
           | I do understand the taking up of disk space thing. It's hard
           | to store the text of all your sites without it taking up a lot
           | of space unless you can intelligently determine which text is
           | unique and desired. Unless you are just crawling static pages
           | it becomes hard to know what needs to be saved or updated.
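One simple form of "intelligently determine which text is unique" is to store each distinct paragraph once, keyed by its hash, and represent a page as a list of paragraph hashes. Boilerplate such as a navigation bar shared across a site is then stored once, and an unchanged page can be detected on recrawl. A minimal sketch with hypothetical names: it only catches exact duplicates after whitespace normalization (near-duplicates would need something like shingling or SimHash) and never garbage-collects paragraphs that no page references anymore.

```python
import hashlib
from typing import Dict, List


class ParagraphStore:
    """Store each distinct paragraph once; a page is a list of paragraph hashes."""

    def __init__(self) -> None:
        self.paragraphs: Dict[str, str] = {}   # hash -> paragraph text
        self.pages: Dict[str, List[str]] = {}  # url -> ordered paragraph hashes

    @staticmethod
    def _key(paragraph: str) -> str:
        # Collapse whitespace so trivial reformatting doesn't create a "new" paragraph.
        return hashlib.sha256(" ".join(paragraph.split()).encode()).hexdigest()

    def add_page(self, url: str, text: str) -> bool:
        """Record a crawled page; return True if its content differs from the last crawl."""
        hashes = []
        for para in text.split("\n\n"):
            if para.strip():
                key = self._key(para)
                self.paragraphs.setdefault(key, para.strip())
                hashes.append(key)
        changed = self.pages.get(url) != hashes
        self.pages[url] = hashes
        return changed

    def page_text(self, url: str) -> str:
        return "\n\n".join(self.paragraphs[h] for h in self.pages[url])
```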
        
       | RGBCube wrote:
       | curl failed to verify the legitimacy of the server and therefore
       | could not establish a secure connection to it. To learn more
       | about this situation and how to fix it, please visit the web page
       | mentioned above.
       | 
       | Can't seem to access the page.
        
       | gonesilent wrote:
       | Infrasearch / Gonesilent sold to Sun turned into project JXTA and
       | died.
        
         | mdaniel wrote:
         | While trying to read more about it, turns out there's an
         | O'Reilly book, too: https://www.oreilly.com/library/view/jxta-
         | in-a/059600236X/ch... and there's also this
         | https://wiki.wireshark.org/JXTA _(I'm guessing those
         | specification links are in the Wayback Machine, but I didn't chase them)_
        
       | charcircuit wrote:
       | Are the results still being gamed by sites using content keyword
       | stuffing? The last time I used it the searching and ranking
       | technology felt like they were 40 years behind state of the art.
        
         | liotier wrote:
         | In distributed indexing, spam management seems a much bigger
         | problem than the indexing itself.
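As an illustration of how ranking functions can blunt keyword stuffing, the widely used BM25 formula saturates term frequency: repeating a word 200 times gains far less than naive counting suggests, and the added length is penalized by length normalization. The sketch compares BM25's per-term score with naive normalized term frequency for a hypothetical normal page and a hypothetical stuffed page; k1 = 1.2 and b = 0.75 are common defaults, and nothing here claims to describe YaCy's actual ranking.

```python
import math


def bm25_term(tf: int, doc_len: int, avg_len: float, df: int, n_docs: int,
              k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 contribution of one query term to one document's score."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_len)
    # tf appears in both numerator and denominator, so the score saturates
    # towards idf * (k1 + 1) no matter how often the term is repeated.
    return idf * tf * (k1 + 1) / (tf + length_norm)


def naive_tf(tf: int, doc_len: int) -> float:
    """Plain normalized term frequency, which rewards stuffing linearly."""
    return tf / doc_len


# Hypothetical pages: a normal article vs. the same article with the keyword stuffed in.
normal = dict(tf=3, doc_len=500)
stuffed = dict(tf=200, doc_len=700)
corpus = dict(avg_len=500, df=10, n_docs=1000)

bm25_ratio = bm25_term(**stuffed, **corpus) / bm25_term(**normal, **corpus)
naive_ratio = naive_tf(**stuffed) / naive_tf(**normal)
```

With these numbers the stuffed page gains roughly 1.4x under BM25 but about 48x under naive term frequency.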
        
       | boyter wrote:
       | I actually half wrote an RFC of a spec and 2 implementations of a
       | federated search last year, rather than doing the distributed hash
       | table that YaCy does.
       | 
       | I wanted results to be re-rankable by the peers by sharing the
       | scores that went into them. The idea being that with a common
       | protocol based on the ideas of ActivityPub, you could get peer
       | search engines working together to hopefully surface interesting things.
       | 
       | Something I should probably finish and publish at some point. It
       | worked with the hundreds of peers I tested.
       | 
       | The reason I mention this is because I also wanted to add a front
       | end to YaCy, which turned out to be harder than I expected. It's a
       | wonderful project and you can find great stuff through it, but
       | with the way the peers return results, it's sometimes hard to find
       | it again. It's also not quite as hackable as I would have hoped at
       | the time, probably due to the project's age.
       | 
       | I still think there is value in it, though, and I'd love to see
       | YaCy have its protocol documented as an API so people could build
       | implementations in other languages more easily.
        
         | detourdog wrote:
         | I remember the first days of Gopher browsing being like that.
         | Gopher browsing to me was like swinging from vine to vine. The
         | trick was remembering/documenting where each vine went.
        
       | arboles wrote:
       | Sort of hijacking the thread to ask: can YaCy, or something
       | similar, be an alternative to Google's Programmable Search Engine?
       | All I use it for is limiting a search to a medium-sized list of
       | domains. I expect the aspect that makes running a search engine on
       | your own difficult is the lack of resources for crawling. But since
       | I only care about a small list of domains, could I ditch Google's
       | and run my own crawler like YaCy?
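For this narrow use case, the main job of the crawler is to never leave the allowlist. A minimal Python sketch of that domain restriction, assuming a hypothetical `get_links(url)` callback that fetches a page and returns its outbound links (injected so the scoping logic runs without network code). It is not YaCy's crawler, just the technique.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlsplit


def in_scope(url: str, allowed: set[str]) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one.

    `allowed` should hold lowercase domains; urlsplit().hostname is lowercased.
    """
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed)


def crawl(seeds: Iterable[str], allowed: set[str],
          get_links: Callable[[str], Iterable[str]], limit: int = 1000) -> list[str]:
    """Breadth-first crawl that never leaves the allowlisted domains.

    `get_links` is a hypothetical stand-in for "fetch the page and extract
    its links". Returns URLs in the order they were visited.
    """
    seen: set[str] = set()
    queue = deque(u for u in seeds if in_scope(u, allowed))
    visited: list[str] = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)
        for link in get_links(url):
            # Off-list links are dropped here, which keeps the crawl small.
            if link not in seen and in_scope(link, allowed):
                queue.append(link)
    return visited
```

The visited pages would then be fed into whichever index you pick (YaCy, or Elasticsearch/Meilisearch as suggested in the replies); the allowlist check itself stays the same.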
        
         | gtirloni wrote:
         | Is that the deceased code search tool?
         | 
         | You could run Sourcegraph and import/sync those repositories.
         | 
         | Or you could run your own Elasticsearch/Meilisearch and crawl
         | the websites yourself (if you're interested in things other
         | than git repositories).
        
           | arboles wrote:
           | > Is that the deceased code search tool?
           | 
           | No, it's _Programmable_. Though it's not actually
           | programmable. I should've written Custom Search Engine
           | instead; that's also a name for it.
           | 
           | cse.google.com - It's quaint that past the modern landing
           | page, when using the search portal today, you still get some
           | outdated iteration of Google UI design.
           | 
           | It's used, for example, for OSINT searches.[0] At some point it
           | was also used by at least one Wikipedia editor for a custom list
           | of Reliable Sources for Anime & Manga.[1]
           | 
           | [0] https://www.osintme.com/index.php/2020/09/28/
           | 
           | [1] https://gwern.net/me#wikis
        
       | anthk wrote:
       | Ugh, Java. I'll wait for something like what i2pd is for I2P:
       | something called yacyd, written in C, C++ or Go.
        
         | ravenstine wrote:
         | What's your objection to Java?
        
           | anthk wrote:
           | High CPU and RAM usage.
        
       | WarOnPrivacy wrote:
       | Yacy's still around. Nice.
       | 
       | After a year or two of hosting a Yacy instance (2014?), I started
       | winding up on some general (probes, etc.) blacklists.
       | 
       | I also host a small mail server and I was getting mail returned.
       | I'd force an IP swap and a few weeks later it'd be the same. I
       | had to let Yacy go.
        
         | 1oooqooq wrote:
         | So that is how they block a people's search engine/crawler. I
         | didn't think they would use the most complicated method.
         | 
         | They also use blocklists that include every single Tor node
         | (even non-exit nodes) and every VPN under the sun (except for
         | streaming, because why would they? That's why they exist.)
        
       | renegat0x0 wrote:
       | There are already many projects about search:
       | 
       | - https://www.marginalia.nu/
       | 
       | - https://searchmysite.net/
       | 
       | - https://lucene.apache.org/
       | 
       | - Elasticsearch
       | 
       | - https://presearch.com/
       | 
       | - https://stract.com/
       | 
       | - https://wiby.me/
       | 
       | I think all of these projects are fun. I would like to see one
       | succeed in reaching a mainstream level of attention.
       | 
       | I have also been gathering link metadata for some time. Maybe I
       | will use it to feed an eventual self-hosted search engine, or a
       | language model, if I decide to experiment with that.
       | 
       | - domains for seed https://github.com/rumca-js/Internet-Places-Database
       | 
       | - bookmarks seed https://github.com/rumca-js/RSS-Link-Database
       | 
       | - links for year https://github.com/rumca-js/RSS-Link-Database-2024
        
         | fsflover wrote:
         | But which of those projects are distributed and FLOSS?
        
         | legrande wrote:
         | Also these:
         | 
         | https://swisscows.com/en
         | 
         | https://search.disconnect.me/
         | 
         | https://www.ecosia.org/
         | 
         | https://metager.org/
         | 
         | https://searx.space/
        
         | ColinHayhurst wrote:
         | https://www.mojeek.com/ self-disclosure, mojeek team member
        
         | wongarsu wrote:
         | To be fair, of those only Apache Lucene predates YaCy. YaCy is
         | very mature, but in terms of relative popularity for general
         | web search, it probably peaked around 15 years ago.
        
       | buffalobuffalo wrote:
       | I ran YaCy for a while, but not as a node on their distributed
       | search index. I just ran it as a search engine for all my own
       | bookmarks. Unfortunately I never found a particularly good way of
       | getting bookmarks into the system. So eventually I shut it down.
       | Cool idea in theory though.
        
       | fortran77 wrote:
       | Related to this -- I'd love to see individuals making web pages
       | again, and federated search engines indexing them. People don't
       | make their own hobby or fan or art websites anymore, and I think
       | that's partly because nobody will ever find them with the big
       | search engines.
        
         | emrah wrote:
         | I think it would be nice if the search results were
         | "distributed" rather than deterministic.
         | 
         | So when I enter the same keywords, and there are, let's say, 50
         | pages that would each be an equally good result for the search,
         | rather than one page "winning", the search engine would rotate
         | the winner among the many possibilities.
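One way to get that rotation, sketched in Python under stated assumptions: results arrive as `(url, score)` pairs already sorted best-first, and "equally good" is approximated as "within a hypothetical `tolerance` of the top score". Only that tie band is shuffled; everything below it keeps its order, so rotation never promotes a clearly worse page.

```python
from __future__ import annotations

import random


def rotate_ties(results: list[tuple[str, float]], tolerance: float = 0.01,
                rng: random.Random | None = None) -> list[tuple[str, float]]:
    """Shuffle the results whose score is within `tolerance` of the best one.

    `results` must be sorted best-first. The input list is not modified;
    results outside the tie band keep their original relative order.
    """
    if not results:
        return []
    if rng is None:
        rng = random.Random()
    best = results[0][1]
    # Find the end of the contiguous band of "equally good" results.
    cut = 0
    while cut < len(results) and results[cut][1] >= best - tolerance:
        cut += 1
    band = results[:cut]  # slicing copies, so shuffling leaves `results` intact
    rng.shuffle(band)
    return band + results[cut:]
```

Seeding the `random.Random` instance from, say, the query string plus the current date would keep the order stable for a given day while still rotating over time; passing no seed rotates on every request.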
        
       | jrussbowman wrote:
       | Nice to see search projects are still popping up. After a move,
       | family life taking over and getting more interested in Unreal
       | Engine, my poor search engine is now more of an experiment in
       | seeing how well it runs on the basic life-support maintenance
       | updates I do. I'm starting to think I honestly should just take it
       | down and save the $50 a month I spend maintaining it.
       | 
       | But I'll post it in a Hacker News comment, and maybe you all will
       | give it enough traffic that I can get excited about it again, lol
       | 
       | https://www.unscatter.com
        
         | jrussbowman wrote:
         | And for my immature moment of the day, the above comment was
         | comment #69
        
       | fho wrote:
       | I've used it several times over the last decades and never got
       | good results. I think one instance is still running on my old
       | computer at uni :-)
        
       | dredmorbius wrote:
       | Previously:
       | 
       | YaCy - your own search engine |
       | https://news.ycombinator.com/item?id=32597309 | 2 years ago | 93
       | comments
       | 
       | YaCy: Decentralized Web Search |
       | https://news.ycombinator.com/item?id=22246732 | 4 years ago | 41
       | comments
       | 
       | YaCy - The Peer to Peer Search Engine |
       | https://news.ycombinator.com/item?id=17089240 | 6 years ago | 3
       | comments
       | 
       | YaCy: a free distributed search engine |
       | https://news.ycombinator.com/item?id=12433010 | 8 years ago | 24
       | comments
       | 
       | YaCy: Decentralized Web Search |
       | https://news.ycombinator.com/item?id=8746883 | 9 years ago | 29
       | comments
       | 
       | YaCy takes on Google with open source search engine |
       | https://news.ycombinator.com/item?id=3288586 | 12 years ago | 17
       | comments
        
       | treprinum wrote:
       | Is it worth dedicating 1-2 low-power NUCs (4-8 cores) to this on a
       | 250 Mbit/s connection? Or does it need beefier CPUs/network?
        
       | nairboon wrote:
       | If you run YaCy with Docker and it is still a junior peer, does
       | the search return results from the global index or just the one
       | that appears to be 'preinstalled'?
        
       ___________________________________________________________________
       (page generated 2024-03-06 23:01 UTC)