[HN Gopher] Show HN: Nebula - A network agnostic DHT crawler
___________________________________________________________________
Show HN: Nebula - A network agnostic DHT crawler
Author : dennis-tra
Score : 50 points
Date : 2024-03-20 10:54 UTC (12 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| mikae1 wrote:
| Unlucky naming collision with Slack's networking tool Nebula:
| https://github.com/slackhq/nebula
| rad_gruchalski wrote:
| Oh no, what an unfortunate event. The Slack tool uses the
| name already used by OpenNebula. /s
| Nux wrote:
| Not to mention they jinxed Slack(ware) for many.
| sph wrote:
| And the Nebula streaming platform. Which is unfortunate because
| I'm using both.
|
| I get it, nebulae are cool.
| doublerabbit wrote:
| Not forgetting the awesome Nebula game engine
| pdabbadabba wrote:
| I'm sure that this is just because I'm not the target audience, so I
| intend only the very gentlest criticism. But I literally LOLed at
| how completely incomprehensible this README was for me. It has
| really been a while since I've read a paragraph and had literally
| no idea what it was talking about. But here's the winner:
|
| > A network agnostic DHT crawler and monitor. The crawler
| connects to DHT bootstrappers and then recursively follows all
| entries in their k-buckets until all peers have been visited.
|
| Following the Wikipedia link for "DHT" yielded some clues. (Ah.
| Distributed hash table.) But I've still been looking at this for
| several minutes now and am basically just puzzled. But the graphs
| are pretty! Reading the word "amino" a little further down threw
| me off the scent for a bit. But I gather that is actually a
| proper noun, and we aren't really talking about proteins here.
|
| Maybe an initial sentence that makes fewer assumptions about the
| reader's familiarity with the jargon would be helpful.
| Chabsff wrote:
| This is not a particularly egregious example, but it's kind of
| spectacular how everything crypto adjacent revels in
| technobabble.
|
| The detractors of the ecosystem (myself included, to be honest)
| will be quick to point out that obfuscating the tech as magic
| as much as possible, as well as creating an inside group lingo,
| is key to onboarding and retaining people into it. But it's
| fascinating how that percolates throughout the dev community
| behind it as well.
| DanAtC wrote:
| I feel the same way about AI/LLM lingo
| PedroBatista wrote:
| While I agree, there is a whole generation of people who know
| what a DHT is, it's not really that obscure.
|
| I'm talking, of course, about very late 90s and 00s P2P file
| sharing, Kademlia, and torrents, but also the later "eventually
| consistent" databases (remember those?).
|
| The crypto twenty-something hype beasts came way later.
| nodja wrote:
| A DHT is a decentralized key-value database; its most famous
| use is in the BitTorrent protocols. It uses a routing
| algorithm to guarantee that you can find the peers that can
| retrieve the value of a known key, provided that you know at
| least one peer in the network (even if that peer doesn't know
| the value). Essentially the network is split into buckets, and
| it guarantees that you'll either already be connected to a
| peer that knows the value for the key, or to a peer that knows
| a peer whose bucket is closer to the key. You can then
| recursively ask for peers that are closer and closer until you
| find one that knows the key. As you do this search you keep
| track of the peers, so the next time you ask for another key
| you're more likely to know a peer that is closer to it. A
| typical DHT implementation has you keep track of hundreds of
| peers to guarantee the robustness of the network.
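|
| A rough Python sketch of the lookup described above, on a toy
| network. Everything in it is an assumption for illustration:
| 6-bit integer IDs, XOR as the distance metric (as in Kademlia),
| one known peer per bit position standing in for the buckets, and
| the hypothetical names Peer, build_network and lookup. It is not
| the code of any real client.

```python
BITS = 6   # toy ID size; real DHTs use 160 or 256 bits
K = 3      # peers returned per query; an assumed toy value


class Peer:
    def __init__(self, peer_id):
        self.id = peer_id
        self.known = set()   # IDs this peer knows (stand-in for its buckets)
        self.store = {}      # key -> value held by this peer

    def closest(self, key, k=K):
        """Return up to k known peer IDs, nearest to key by XOR distance."""
        return sorted(self.known, key=lambda pid: pid ^ key)[:k]


def build_network():
    """Every peer knows one peer per bit position (its ID with that bit flipped)."""
    network = {i: Peer(i) for i in range(2 ** BITS)}
    for peer in network.values():
        peer.known = {peer.id ^ (1 << bit) for bit in range(BITS)}
    return network


def lookup(network, start_id, key):
    """Keep asking the closest peer seen so far until one holds the key."""
    visited = []
    candidates = {start_id}
    while candidates:
        current = min(candidates, key=lambda pid: pid ^ key)
        candidates.remove(current)
        visited.append(current)
        peer = network[current]
        if key in peer.store:
            return peer.store[key], visited
        # Ask this peer for the peers it knows that are closest to the key.
        candidates.update(p for p in peer.closest(key) if p not in visited)
    return None, visited


network = build_network()
network[45].store[45] = "some value"   # the peer whose ID equals the key holds it
value, path = lookup(network, start_id=0, key=45)
print(value, path)
```

| In this toy layout each hop clears the highest bit in which the
| current peer's ID differs from the key, so the search needs only
| a handful of hops even though the start peer knows 6 of the 64
| peers.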
|
| One issue is that peers go offline and online all the time, so
| the network is ever changing. If you turn off your client for a
| week and then come back, your only hope is that at least one of
| the peers you know is still online. If that's not the case, or
| you're starting the client for the first time, then there's no
| way for you to connect to the network and query for keys. In
| BitTorrent this is not an issue, as most torrents include
| trackers, the original centralized way of finding peers on the
| network. But it seems that each project listed on this page has
| its own separate DHT network that doesn't connect to the main
| network (the one used by BitTorrent), so to connect to these
| networks for the first time you need to use a bootstrap peer.
| This is just a normal peer on the network that is known to be
| always online, usually hosted by the owner of the project, and
| it'll give you a starting point to find other peers in the
| network.
|
| What this project does, in essence, is connect to a bootstrap
| peer, then use the properties of the routing algorithm to
| efficiently find out all the peers that are currently online.
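|
| As a minimal sketch of that crawl idea (not Nebula's actual
| implementation, which is written in Go): start from a bootstrap
| peer, ask each reachable peer for the peers in its routing table,
| and repeat until nothing new turns up. The names crawl, TOY and
| toy_routing_table are hypothetical, and the dict below is a
| made-up stand-in for real network requests.

```python
from collections import deque


def crawl(bootstrap, get_routing_table):
    """Breadth-first walk over peers, starting from a bootstrap peer.

    get_routing_table(peer) must return the peers that `peer` knows, or
    raise ConnectionError if `peer` cannot be reached.
    """
    seen = {bootstrap}
    online, offline = set(), set()
    queue = deque([bootstrap])
    while queue:
        peer = queue.popleft()
        try:
            neighbours = get_routing_table(peer)
        except ConnectionError:
            offline.add(peer)      # listed by someone, but not reachable now
            continue
        online.add(peer)
        for neighbour in neighbours:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return online, offline


# Made-up network: peer -> peers in its routing table (None means offline).
TOY = {
    "boot": ["a", "b"],
    "a": ["boot", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": None,
}


def toy_routing_table(peer):
    table = TOY[peer]
    if table is None:
        raise ConnectionError(peer)
    return table


online, offline = crawl("boot", toy_routing_table)
print(sorted(online), sorted(offline))
```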
| dTal wrote:
| Why is BitTorrent not supported? Perhaps I'm misunderstanding
| this thing but it seems like application #1.
| DanielVZ wrote:
| Because this seems to cater to the cryptocurrency/blockchain
| culture.
| ogurechny wrote:
| My guesses:
|
| a) Many other tools exist for that.
|
| b) BitTorrent DHT nodes are simple and interchangeable. They
| can give you a list of peer addresses associated with a certain
| (torrent) hash -- and only if you know the exact hash. Even
| client versions can't generally be collected (apart from some
| protocol extensions). The only thing you learn about a DHT
| member is that it exists. In contrast, this project is for
| heterogeneous networks in which peers announce various
| services.
|
| c) The number of BitTorrent DHT nodes is... bigger.
|
| d) To collect interesting data from the BitTorrent DHT, one needs
| to observe as many third-party torrent hash requests as
| possible. To do that, multiple nodes are needed. Moreover, they
| need to run for a long time, not just because it takes time to
| make a lot of requests to a lot of nodes, but also because of
| external preference for long-running nodes. Not sure how
| important it is, but, anecdotally, a fresh DHT node sees twice
| as many requests after a week as after a day.
| jzm2k wrote:
| Looks like Nebula uses go-libp2p, and all of the supported
| networks listed in the README use libp2p for their p2p
| networking. Mainline DHT doesn't support the same transport
| protocols that libp2p supports (such as TCP+Yamux+Noise), which
| is probably why Nebula doesn't support BitTorrent.
| crotchfire wrote:
| Because it isn't really network-agnostic.
|
| It only supports IPFS and derivatives thereof.
| ogurechny wrote:
| /me remembers various DHT views, traffic flows, client stats,
| graphs and other data decorations in Azureus. Now that's what I
| call a dashboard.
| crotchfire wrote:
| It isn't really network-agnostic... in fact it doesn't support
| the (by far) largest DHT out there, the Mainline DHT that
| bittorrent uses.
|
| This is just a crawler for DHTs that use IPFS's implementation,
| or at least smell very similar to it.
| pedalpete wrote:
| Can someone explain why we want to crawl and/or monitor? What is
| this used for?
|
| When I think of a crawler, I think of a non-homogeneous network
| (if that is the right term).
|
| But with the blockchain, isn't it the case that each node has an
| entire copy of the blockchain, so you don't need to "crawl" it,
| and it works more like a database?
|
| What am I not understanding about this?
___________________________________________________________________
(page generated 2024-03-20 23:01 UTC)