[HN Gopher] Bitmagnet: A self-hosted BitTorrent indexer, DHT cra...
       ___________________________________________________________________
        
       Bitmagnet: A self-hosted BitTorrent indexer, DHT crawler, and
       torrent search
        
       Author : KoftaBob
       Score  : 300 points
       Date   : 2023-10-05 11:44 UTC (11 hours ago)
        
 (HTM) web link (bitmagnet.io)
 (TXT) w3m dump (bitmagnet.io)
        
       | coolspot wrote:
       | Please remove copyrighted movies from the screenshot on your
       | website. It provides evidence that this program is designed for
       | violating copyright, which makes a DMCA takedown trivial.
        
         | gymbeaux wrote:
         | I don't see any screenshots on the website
        
           | davidcollantes wrote:
           | There were, but they've been removed.
        
         | pipes wrote:
         | And they probably want to remove text like
         | 
         | "It then further enriches this metadata by attempting to
         | classify it and associate it with known pieces of content, such
         | as movies and TV shows. It then allows you to search everything
         | it has indexed."
        
           | dewey wrote:
           | I don't see anything wrong with that if the example titles
           | are under the correct licenses like the often used:
           | https://en.wikipedia.org/wiki/Big_Buck_Bunny which can then
           | be mapped to open databases like
           | https://www.themoviedb.org/movie/10378-big-buck-bunny.
        
         | rsync wrote:
         | Please determine if these images fall under fair-use provisions
         | and, if so, leave them in place.
         | 
         | Bad actors - whoever they may be - need to see your rights
         | constantly reasserted.
        
           | KennyBlanken wrote:
           | The MPAA and other organizations use screenshots that show
           | copyrighted material as "proof" that the tools are used for
           | copyright violation, and then DMCA them.
           | 
           | If you want to help pay for lawyers to fight those DMCA
           | notices with counterclaims and lawsuits, put up or shut up;
           | the FSF, EFF and ACLU have been noticeably disinterested in
           | doing so.
        
           | crazygringo wrote:
           | This has nothing whatsoever to do with fair use.
           | 
           | It's about arguments in court about the intention of the
           | software if sued. Images of copyrighted content indicate
           | intent to infringe copyright. Without those, you can argue
           | it's only meant to find and index Linux image torrents or
           | whatever.
           | 
           | Fair use doesn't enter the picture at all.
        
         | mgdigital wrote:
         | Thanks - I only condone accessing the legal content available
         | on BitTorrent, and my screenshots now embody this moral stance.
        
       | chmod775 wrote:
       | > The DHT crawler is not quite unique to bitmagnet; another open-
       | source project, magnetico was first (as far as I know) to
       | implement a usable DHT crawler, and was a crucial reference point
       | for implementing this feature.
       | 
       | Heh. That was one of my first projects when I was still learning
       | to code back in 2012: https://github.com/laino/shiny-adventure
       | 
         | The DHT crawler/worker lived separately, and I eventually put it
       | here to rescue it from a dying HDD: https://github.com/laino/DHT-
       | Torrent-database-Worker
       | 
       | The code is abhorrent and you absolutely shouldn't use it, but it
       | worked. At least the crawler did - the frontend was never
       | completed.
       | 
       | Since the first implementation of mainline DHT appeared in 2005
       | and crawling that network is really quite an obvious idea, I
       | doubt we (a friend was working on it as well) were first either.
        
         | JP44 wrote:
         | Nothing substantial; I chuckled when I saw the commit history
         | on your linked projects. I don't mean to belittle you (or the
         | purpose/goal of the projects); I genuinely enjoyed the
         | distraction and 'results' from it:
         | 
         | Today saw the first commit in 11 years (since 9 Oct 2012) and 5
         | years (since 24 Nov 2018), respectively, on the two projects. I
         | think your repos might be contenders for some sort of oldest
         | 'still active' or 'never ported to another repo' award.
         | 
         | From what I found in ~10 minutes (Google/GPT), excluding Git
         | projects that existed before spring 2008 (I couldn't get a quick
         | consensus on February vs. April of that year), there's not a lot
         | 
         | (I'll edit this part if sources are requested)
        
       | thelastparadise wrote:
       | Very interesting. Does this approach actually work in practice?
       | 
       | Also what happens if illegal content gets scooped up into the
       | index?
        
         | KoftaBob wrote:
         | Based on my understanding of how the torrent DHT works, all
           | that's happening is that you're requesting metadata on various
         | torrent info hashes, but that's not the same thing as actually
         | downloading/seeding the content in the torrent itself.
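The distinction here can be made concrete: a torrent's infohash is just the SHA-1 of its bencoded info dictionary, so requesting metadata for a hash never touches the content itself. A minimal sketch with a toy info dict and a hand-rolled bencoder (illustrative only, not Bitmagnet's actual code):

```python
import hashlib

def bencode(obj) -> bytes:
    """Minimal bencoder for ints, bytes, lists and dicts (enough for an info dict)."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Bencoded dict keys must appear in sorted order.
        return b"d" + b"".join(bencode(k) + bencode(v)
                               for k, v in sorted(obj.items())) + b"e"
    raise TypeError(type(obj))

# A toy single-file info dict (field names per BEP 3).
info = {b"name": b"example.txt", b"piece length": 16384,
        b"pieces": b"\x00" * 20, b"length": 42}

# The infohash is the SHA-1 of the bencoded info dict; it identifies the
# torrent on the DHT without containing any of the file's content.
infohash = hashlib.sha1(bencode(info)).hexdigest()
print(infohash)  # 40 hex characters
```

Fetching the info dict behind an infohash (BEP 9) is a metadata exchange with peers; downloading the pieces it describes is a separate operation.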
        
           | no_time wrote:
           | >but that's not the same thing as actually
           | downloading/seeding the content in the torrent itself.
           | 
           | The question is whether Law Enforcement and "Intellectual
           | Property" watchdogs make a meaningful distinction between the
           | two in their monitoring tools.
        
             | justinclift wrote:
             | Seeing as they get so much other stuff wrong, it's highly
             | likely they don't care.
        
         | Akashic101 wrote:
         | Considering the screenshot on the linked page shows The Flash
         | among other movies I dont think the author is too concerned
         | about that. I believe its similar to the area that Plex and
         | Jellyfin operate in, mainly that they just provide the
         | framework and tools, what the user does with them is not in
         | their control
        
           | EGreg wrote:
           | How will governments police it when MaidSAFE and other
           | systems distribute totally encrypted content?
        
             | colinsane wrote:
             | BitTorrent already supports encrypted peer connections.
        
               | mixmastamyk wrote:
               | Doesn't seem to work; our ISP has sent nastygrams despite
               | this, even with the site on HTTPS.
        
               | colinsane wrote:
               | > How will governments police it when MaidSAFE and other
               | systems distribute totally encrypted content?
               | 
               | so isn't the answer then "they'll continue to police it
               | the way they already do"? i don't know what a MaidSAFE
               | is, but the context of this discussion is the DHT, and so
               | public (indexable) torrents, and so however you encrypt
               | the content doesn't matter because you have to provide
               | the decryption method to anyone who asks for any of the
               | previous context (public torrents/indexes) to make any
               | sense.
        
               | mixmastamyk wrote:
               | Encrypted connections shouldn't decrypt for anyone who
               | asks, otherwise they have no reason to exist.
               | 
               | The weak point seems to be the tracker or filename, but
               | I've been told HTTPS hides that, so I'm not sure.
        
               | no_time wrote:
               | Encryption without authentication in this case is as good
               | as XORing all outgoing data with a fixed key. So
               | basically useless...
        
               | colinsane wrote:
               | yes, exactly. BitTorrent supports encryption. swapping it
               | out for some other encryption mechanism won't change
               | anything when it comes to government policing because
               | that's already not where the weaknesses lie for those
               | sharing p2p content today. so what was GP's point?
        
         | lobsterslive wrote:
         | It works well in practice. The DHT protocol includes announce
         | messages that broadcast when new files are shared on
         | BitTorrent. It also provides a "geometric" way to find people
         | who are sharing those files. It doesn't include the files
         | themselves, just the torrents, which include a file list and
         | piece hashes.
         | 
         | If you listen to BitTorrent's DHT network, you'll build an
         | index of everything shared on BitTorrent (over time); this will
         | include commercial movies and such.
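The "geometric" way referred to above is Kademlia's XOR metric: the nodes whose IDs are numerically closest (by XOR) to an infohash are the ones responsible for its peer list, and a get_peers lookup walks toward them. A rough sketch with made-up toy IDs:

```python
import hashlib

def node_distance(a: bytes, b: bytes) -> int:
    """Kademlia's 'geometric' metric: XOR of two 160-bit IDs, compared as integers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(target: bytes, nodes: list, k: int = 8) -> list:
    """The k nodes 'closest' to an infohash store its peer list; a
    get_peers lookup iteratively asks for ever-closer nodes."""
    return sorted(nodes, key=lambda n: node_distance(n, target))[:k]

# Toy 160-bit IDs derived from labels (real node IDs are random).
ids = [hashlib.sha1(str(i).encode()).digest() for i in range(20)]
target = hashlib.sha1(b"some torrent infohash").digest()
print([n.hex()[:8] for n in closest_nodes(target, ids, k=3)])
```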
        
           | grepfru_it wrote:
           | >It works well in practice.
           | 
           | Hi, I worked on Gnutella and lots of P2P systems in the early
           | '00s. This will devolve into noise and spam once the number of
           | users who adopt this feature passes a critical mass. With a
           | fully decentralized system there are no gatekeepers, and as
           | such there is no way to filter counterfeit items. While your
           | client will present you with the data you are searching for,
           | you will find out (usually hours later) that your supposed
           | pirated download is actually just a 2-hour loop of Rick Astley
           | (still piracy though, so you are still winning... I think?).
        
             | SparkyMcUnicorn wrote:
             | But you can still pick the option with the most seeders,
             | which should get you what you're looking for most of the
             | time.
             | 
             | The spam problem isn't nonexistent within the centralized
             | services either.
        
               | grepfru_it wrote:
               | Hehe, in a popular P2P client from the '03-'05 period we
               | said the same thing. It turns out there are groups with
               | large amounts of funding that will provide a fake seed
               | count: either by faking metadata so it seems there is a
               | high seed count backed by bogus nodes that refuse
               | connections (which was also genuine behavior from clients
               | on bad ISPs - we saw valid cases in Asia and Eastern
               | Europe), or by actually streaming data (some of them were
               | on good hosts seeding multiple Mbps of bad data).
               | 
               | What I'm saying is that it becomes a numbers game, and
               | those fake seeders usually have deep pockets, financed by
               | the content creators themselves.
        
             | derefr wrote:
             | Once you've discovered a torrent being seeded, is there no
             | way to interrogate the seeders and/or the DHT itself, to
             | find out the oldest active seeder registration on that
             | torrent hash; and then use the time-of-oldest-observed-
             | registration to rank torrents that claim to be "the same
             | thing" in their metadata, but which have different piece-
             | trie-hash-root?
             | 
             | I ask, because a similar heuristic is used in crypto wallet
             | software, visibility-weighting the various "versions" of a
             | crypto token with the same metadata, by (in part) which
             | were oldest-created. (The logic being: scam clones of a
             | thing need to first observe the real thing, before they can
             | clone it. So the real thing will always come first.)
             | 
             | Of course, I'm assuming here that you're searching for an
             | "expected to exist" release of a thing by a specific
             | distributor, where the distributor has a known-to-you
             | structured naming scheme to the files in their releases,
             | and so you'll only be trying to rank "versions" of the
             | torrent that all have identical names under this naming
             | scheme, save for e.g. the [hash] part of the file name
             | being different to match the content. This won't help if
             | you're trying to find e.g. "X song by Y artist, by any
             | distributor."
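The ranking heuristic proposed here could look something like the following sketch. All names and timestamps are hypothetical, and real DHT nodes don't expose announce age directly, so `first_seen` would have to come from your own long-running crawler observations:

```python
from dataclasses import dataclass

@dataclass
class Sighting:
    infohash: str
    name: str        # release name under the distributor's naming scheme
    first_seen: float  # unix time of the oldest announce we observed

def rank_candidates(sightings: list, name: str) -> list:
    """Hypothetical ranking: among torrents claiming the same release name,
    prefer the one whose announces were observed earliest -- a scam clone
    must first observe the real thing, so the real thing comes first."""
    same = [s for s in sightings if s.name == name]
    return sorted(same, key=lambda s: s.first_seen)

seen = [
    Sighting("aaa", "Some.Release-GROUP", 1696000000.0),
    Sighting("bbb", "Some.Release-GROUP", 1696090000.0),  # later clone
    Sighting("ccc", "Other.Release", 1695000000.0),
]
print(rank_candidates(seen, "Some.Release-GROUP")[0].infohash)  # "aaa", the oldest
```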
        
             | thomastjeffery wrote:
             | Gatekeeping is just a bad moderation method in the first
             | place.
             | 
             | What you need is _sorting_ and _categorization_. If you
             | really want to involve authoritative opinions on metadata,
             | then use a web of trust.
        
               | NegativeK wrote:
               | I've yet to see a moderation method that works better
               | than gatekeeping.
        
             | fluoridation wrote:
             | The way to filter out garbage is to download things with
             | lots of seeds, and if you still happen to download garbage,
             | to immediately stop sharing it.
        
               | latchkey wrote:
               | Chicken/egg problem... as mentioned by someone else
               | above...
               | 
               | https://news.ycombinator.com/item?id=37779341
               | 
               | > _New releases of something that just came out tend to
               | suffer from this, though. Sometimes the counterfeits
               | reach escape velocity - the rate of people joining in
               | downloading the counterfeit exceeds the rate of people
               | realizing and stopping, thus giving the illusion of a
               | legit torrent._
        
               | fluoridation wrote:
               | It's possible. I never follow new releases. But back in
               | the ed2k days, I'd say about half of just about any file
               | you cared to search for was fake, regardless of age.
        
               | [deleted]
        
             | danpalmer wrote:
             | I don't think this project changes any of this? Torrents
             | have been around for decades and this hasn't been a problem
             | yet. We can't rule it out entirely, but at this point it
             | seems unlikely to be worthwhile; otherwise we'd see more
             | exploitation.
             | 
             | If the criticism is that a DHT crawler is going to be more
             | subject to this than a website where people upload
             | torrents, that may be the case, but I think the author of
             | this project underestimates the DHT crawling going on. I
             | believe the torrent ecosystem is largely automated and
             | there's little in the way of manual submission or human
             | review going on.
        
               | Retr0id wrote:
               | The "problem" is that most users aren't crawling the DHT
               | to find torrents, right now. The more people start using
               | DHT crawlers as their primary way of finding new
               | torrents, the more incentive there is to spam the DHT
               | with junk, malware, etc. (because there will be more
               | eyeballs on it)
               | 
               | That is, the usefulness of DHT crawling is inversely
               | proportional to how many people are doing it.
        
               | danpalmer wrote:
               | But my second point is that I really think they _are_
               | crawling the DHT, albeit indirectly. There are many
               | torrent websites and they tend to have the same content.
               | It seems fairly clear to me that this is what most
               | torrent sites are doing. Maybe not the major names that
               | users might submit to, but the long tail of other torrent
               | search indexes certainly. It also seems to be what
               | Popcorn Time does.
        
             | hypertele-Xii wrote:
             | While you're technically correct, the protocol is resilient
             | to such an attack, as the number of people participating in a
             | particular torrent is a good indicator of its validity.
             | After all, everyone who was fooled will delete and stop
             | sharing such items.
             | 
             | New releases of something that just came out tend to suffer
             | from this, though. Sometimes the counterfeits reach escape
             | velocity - the rate of people joining in downloading the
             | counterfeit exceeds the rate of people realizing and
             | stopping, thus giving the illusion of a legit torrent.
             | 
             | Currently this problem is being solved by torrent sites'
             | reputation and comment systems. If we imagine a world where
             | only decentralized indexes like Bitmagnet exist, your
             | prediction is 100% accurate. This only works if reputation
             | from a reliable site is bootstrapping the initial
             | popularity of a torrent.
        
               | kevincox wrote:
               | It seems like it would be pretty easy to make it appear
               | that your spam torrent is highly active.
        
               | iinnPP wrote:
               | You are correct.
        
               | grepfru_it wrote:
               | (btw my comment was/is about the DHT crawler)
               | 
               | You are describing a pay-to-play model. The validator is
               | whether the seeder/leech count is high. Well, does DHT
               | provide the aggregate bandwidth of each torrent? If not,
               | you can easily spin up 1000+ nodes and connect them to
               | your torrent. Tada, fake popularity. If bandwidth is
               | known, then you simply raise your costs a bit by running
               | fake clients. There are anti-piracy groups whose entire
               | mandate is to provide noise in the piracy ecosystem. Food
               | for thought: the bandwidth costs for this would be a
               | rounding error for e.g. MGM, Universal, or any major
               | content creator.
               | 
               | DHT does not offer any sort of reputation or comment
               | system. That means going back to centralized torrenting,
               | which is why I suspect DHT crawling has not been a very
               | popular feature.
        
               | fluoridation wrote:
               | > If not, you can easily spin up 1000+ nodes and connect
               | to your torrent. Tada fake popularity. If bandwidth is
               | known, then you simply raise your costs a bit by running
               | fake clients.
               | 
               | Sure, but like the other commenter said, this has been
               | possible for years, and yet public trackers aren't
               | swamped with fake torrents. I think in all my years of
               | using BitTorrent I've only ever found a single fake
               | torrent, where the content was inside an encrypted RAR
               | with no key (obviously there was no way to know it was
               | encrypted ahead of time).
        
           | no_time wrote:
           | >If you listen to BitTorrent's DHT network, you'll build an
           | index of everything shared on BitTorrent (over time),
           | 
           | Correct me if I'm wrong, but as far as I understand,
           | passively listening on DHT would only mean you build up a
           | list of _infohashes_ of everything shared on BitTorrent.
           | You'd actually have to reach out to your DHT peers to learn
           | what files the infohashes actually represent.
           | 
           | Wrapping back to grandparent's question of
           | 
           | >Also what happens if illegal content gets scooped up into
           | the index?
           | 
           | I think this could get dicey if someone announces something
           | very illegal like CP, and your crawler starts asking every
           | peer that announced the infohash about its contents with
           | this[0] protocol. This would put your IP into a pretty awful
           | exclusive club of:
           | 
           | A, other crawlers
           | 
           | B, actual people wanting to download said CP
           | 
           | [0]: https://www.bittorrent.org/beps/bep_0009.html
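The two stages being distinguished here can be sketched as separate steps; only the second ever contacts peers. Function names below are illustrative, not from Bitmagnet:

```python
# Stage 1 vs. stage 2: only stage 2 exposes your IP to the swarm.
observed = set()

def on_get_peers(query: dict) -> None:
    """Stage 1: passively record infohashes from get_peers/announce_peer
    queries arriving at our node. No outbound peer contact, no metadata --
    just a growing list of 20-byte keys."""
    observed.add(query[b"info_hash"])

def resolve_metadata(infohash: bytes) -> dict:
    """Stage 2 (sketch only): reach out to peers announcing this infohash
    and fetch the info dict via the BEP 9 ut_metadata extension. This is
    the step that puts your IP in the peer list alongside real
    downloaders."""
    raise NotImplementedError("requires live peer connections")

on_get_peers({b"info_hash": b"\x01" * 20})
print(len(observed))  # 1: one infohash observed, zero peers contacted
```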
        
             | lobsterslive wrote:
             | > _Correct me if I'm wrong, but as far as I understand,
             | passively listening on DHT would only mean you build up a
             | list of infohashes of everything shared on BitTorrent.
             | You'd actually have to reach out to your DHT peers to learn
             | what files the infohashes actually represent._
             | 
             | Yes, you're correct! I should have stated that: you still
             | need to resolve the metadata from the peers hosting the
             | infohashed files. That's a separate operation from
             | downloading the files' content.
        
             | hattmall wrote:
             | Would this get hashes of items shared on private trackers
             | too?
        
       | forgetm3 wrote:
       | Isn't this much the same as btdig.com which is based on:
       | https://github.com/btdig/dhtcrawler2
       | 
       | I use this service to do security research a fair bit. It'd be
       | nice if there was a higher quality self-hosted version I could
       | use so I'll be watching this project with interest!
        
       | the8472 wrote:
       | It seems like every single one of these things cuts corners and
       | doesn't implement proper, spec-compliant nodes that provide the
       | same services they use. You know, the "peer" in P2P. BEP 51 was
       | designed to make it easier not to trample on the commons, and
       | yet...
        
         | zolland wrote:
         | Without the peers all we have is to!
        
         | mgdigital wrote:
         | Author here. FWIW I wasn't intending this to make it onto HN,
         | having posted about this on Lemmy looking for beta testers. The
         | current version of the app is very much a preview. There's much
         | further work to be done and this will include as far as
         | possible ensuring Bitmagnet is a "good citizen". The
         | suggestions made on the GH issue look largely feasible and I'll
         | get round to looking at them as soon as I can.
         | 
         | The issue and my response on GH: https://github.com/bitmagnet-
         | io/bitmagnet/issues/11
        
         | e12e wrote:
         | Appreciate that you took the time to file an issue:
         | 
         | https://github.com/bitmagnet-io/bitmagnet/issues/11
        
       | askiiart wrote:
       | And the predecessor to this: https://github.com/mgdigital/rarbg-
       | selfhosted
       | 
       | It's now archived due to effort being redirected to Bitmagnet.
        
       | Fnoord wrote:
       | See also: Magnetissimo [1], Torrentinim [2], and Spotweb [3].
       | 
       | [1] https://github.com/sergiotapia/magnetissimo
       | 
       | [2] https://github.com/sergiotapia/torrentinim
       | 
       | [3] https://github.com/spotweb/spotweb
        
         | e12e wrote:
         | Any comments on how these compare? Especially in relation to
         | sibling comment about BEP51?
        
           | diggan wrote:
           | From https://github.com/bitmagnet-io/bitmagnet/issues/11
           | 
           | > The DHT implementation was largely borrowed from Magnetico
           | (https://github.com/boramalper/magnetico) a popular and
           | widely used app which is likewise unable to service these
           | requests.
        
           | Fnoord wrote:
           | You want Torznab support. That is basically metadata you want
           | to export, to import it into your application which holds the
           | database of what you are after. If it is a match, it should
           | attempt to download it via your download client (BitTorrent
           | client).
           | 
           | Torrentinim is the successor of Magnetissimo but it lacks
           | Radarr/Sonarr integration (there is a pull request for
           | Torznab support for both). Spotweb has Newznab support [1],
           | and at Black Friday (soon) there are usually tons of deals
           | available for Newznab indexers.
           | 
           | I don't care about BEP51 as I don't have huge upload. That is
           | also why I prefer Usenet over torrents. But torrents are a
           | useful and sometimes required backup mode. Just not my
           | preferred one.
           | 
           | [1] https://github.com/Spotweb/Spotweb/wiki/Spotweb-als-
           | Newznab-...
        
           | the8472 wrote:
           | From a brief look at each it seems like they're scraping
           | things like torrent websites, usenet or maybe RSS feeds. Not
           | the DHT.
        
       | fallat wrote:
       | It'd be so nice to have a super simple DHT crawler CLI tool, in
       | both implementation and interface.
        
         | the8472 wrote:
         | These things need uptime of hours and days to do it properly
         | and also to stay up to date. There are millions of nodes and
         | torrents and to be non-abusive you have to issue requests at a
         | somewhat sedate pace. And activity kind of moves with the sun
         | due to people who run torrent clients on their home machines.
         | And there are lots of buggy or malicious implementations out
         | there that you have to deal with. So you'd want to run it as a
         | daemon. The CLI would have to be a frontend to the daemon or
         | its database. The UI could be simple. I'm skeptical whether an
         | implementation could be both good and simple.
        
           | derefr wrote:
           | That's if you're imagining a single node to discover the
           | whole DHT. What if you want to fire off a map-reduce of
           | limited-run DHT explorations starting from different DHT ring
           | positions, where each agent just crawls and emits what it
           | finds on stdout as it finds it?
           | 
           | (In a sense, I suppose this would still be a "daemon", but
           | that daemon would be the map-reduce infrastructure.)
        
             | the8472 wrote:
             | I don't quite understand what you're proposing here.
             | Generally you only control and operate ~1 node per IPv4
             | address or per IPv6 /64.
             | 
             | All other nodes are operated by someone else, so they don't
             | cooperate on anything beyond what the protocol specifies.
             | Which means everyone is their own little silo. If you want
             | a list of all currently active torrents (millions) then you
             | have to do it with 1 or a handful of nodes, depending on
             | how many IPs you have. DHTs are not arbitrary-distributed-
             | compute frameworks, they're a quite restrictive get/put
             | service.
             | 
             | BEP51[0] does let you query other nodes for a sample of
             | their keys (infohashes) but what they can offer is limited
             | by their vantage point of the network so you need to go
             | around and ask all those millions of nodes. And since it's
             | all random you can't really "search" for anything, you can
             | only sample. And that just gives you 20-byte keys.
             | Afterwards you need to do a lot of additional work to turn
             | those into human-readable metadata.
             | 
             | [0] http://bittorrent.org/beps/bep_0051.html
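For illustration, the `samples` field in a BEP 51 reply is just a flat string of concatenated 20-byte infohashes, so splitting it into keys is trivial (a sketch of that one step, not a full KRPC parser):

```python
def parse_samples(samples: bytes) -> list:
    """BEP 51 responses carry 'samples': concatenated 20-byte infohashes.
    Split the flat string into hex keys; the separate 'num' and 'interval'
    fields tell you how many keys the node holds and when to re-query."""
    if len(samples) % 20 != 0:
        raise ValueError("samples length must be a multiple of 20")
    return [samples[i:i + 20].hex() for i in range(0, len(samples), 20)]

# Fake response payload: three concatenated 20-byte hashes.
payload = b"\x01" * 20 + b"\x02" * 20 + b"\x03" * 20
print(parse_samples(payload))  # three 40-char hex infohashes
```

As the comment notes, each key then still needs a separate metadata resolution step before it means anything to a human.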
        
               | derefr wrote:
               | I mean, what I'm describing is the same thing that BEP51
               | mentions as a motivation:
               | 
               | > DHT indexing already is possible and done in practice
               | by passively observing get_peers queries. But that
               | approach is inefficient, favoring indexers with lots of
               | unique IP addresses at their disposal. It also
               | incentivizes bad behavior such as spoofing node IDs and
               | attempting to pollute other nodes' routing tables.
               | 
               | If you have a lot of IP addresses (from e.g. AWS Lambda)
               | then you can partition DHT keyspace across a large-N
               | number of nodes and then _very quickly_ discover
               | everything in the keyspace.
               | 
               | The trick is that, since BEP51 exists, you don't need to
               | have all these nodes register themselves into the hash-
               | ring (at arbitrary spoofed positions) to listen. You can
               | just have all these nodes independently probing the hash-
               | ring "from the outside" -- just making short-lived
               | connections to registered nodes (without first
               | registering themselves); handshaking that connection as a
               | spoofed node ID; and then firing off one
               | `sample_infohashes` request, getting a response, and
               | disconnecting. The lack of registration shouldn't make
               | any difference, as long as they don't want anyone to try
               | connecting to _them_.
               | 
               | Which is why I say that these are just "crawler agents",
               | not "nodes" per se. They don't start up P2P at all -- to
               | them, this is a one-shot client/server RPC conversation,
               | like a regular web crawler making HTTP requests!
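A one-shot agent like the one described above only needs to serialize a single KRPC query. Here is a hand-bencoded sketch of the BEP 51 `sample_infohashes` message; actually sending it over UDP and parsing the reply is left out, and `tid` is just an arbitrary transaction ID:

```python
import os

def krpc_sample_infohashes(node_id: bytes, target: bytes, tid: bytes = b"aa") -> bytes:
    """Bencode the one query a one-shot crawler agent fires at a DHT node:
    a dict with a={id, target}, q="sample_infohashes", t=<tid>, y="q"
    (bencoded dict keys in sorted order)."""
    assert len(node_id) == 20 and len(target) == 20
    return (b"d1:ad2:id20:" + node_id + b"6:target20:" + target + b"e" +
            b"1:q17:sample_infohashes" +
            b"1:t" + str(len(tid)).encode() + b":" + tid +
            b"1:y1:qe")

# A random self-assigned node ID and a random target point in the keyspace.
query = krpc_sample_infohashes(os.urandom(20), os.urandom(20))
# To use it: sock.sendto(query, (node_host, node_port)), then read one reply.
print(len(query))
```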
        
               | the8472 wrote:
               | Oh, I already have implemented something[0] like that. It
               | doesn't need lambdas or anything "cloud scale" like that.
               | You "just" need a few dozen to a hundred IP addresses
               | assigned to one machine and run a multi-homed DHT node on
               | that to passively observe traffic from multiple points in
               | the keyspace.
               | 
               | But neither of these approaches is what I'd call a "super
               | simple DHT crawler CLI tool" that the initial comment was
               | asking about. BEP51 is intended to make crawling simple
               | enough that it can run on a single home internet
               | connection, but a proper implementation still isn't
               | trivial.
               | 
               | [0] https://github.com/the8472/mldht
        
       | r3trohack3r wrote:
       | Have been playing around with DHT crawling for a while now,
       | curious how you're getting around the "tiers" of the DHT?
       | 
       | IIUC peers favor nodes they've had longer relationships with to
       | provide stable routes through the DHT.
       | 
       | This means short-lived nodes receive very little traffic; nobody
       | routes much traffic through fresh nodes, preferring nodes
       | they've had longer relationships with.
       | 
       | The longer you stay up, the more you start seeing.
       | 
       | At least this is what I've observed in my projects. The only way
       | I've been able to get anything interesting out of the DHT in the
       | last ~5 years has been to put up a node and leave it up for a
       | long time. If I spin up something, the first day I usually only
       | find a handful of resolvable hashes.
       | 
        | Not to mention the BitTorrent DHT seems very lax in what it
        | will route compared to other DHTs (like IPFS), meaning many of
        | the hashes you receive aren't for torrents at all.
        
       | complianceowl wrote:
       | Guys, please educate me: I want to use torrents, but the thought
       | of downloading something inappropriate by clicking on a deceptive
       | link terrifies me (e.g., a download with the title of an action
       | movie, but it turns out to be something else).
       | 
       | How do you guys handle that risk?
        
         | shepherdjerred wrote:
         | If you're downloading "normal" stuff, then you aren't likely to
         | run into an issue. Stick to reputable sites; reddit can help
         | you figure out what those are.
        
         | dvdkon wrote:
         | Unless you're also afraid of clicking links on the web, you
          | should be fine with torrents. Maybe you could disable
          | seeding by default and turn it on only after verifying the
          | data is authentic; that way you're only downloading and the
          | analogy is complete.
        
         | askiiart wrote:
          | Just use reputable sites and run a good ad blocker, like
          | uBlock Origin.
        
       | LeoPanthera wrote:
       | > Pipe dream features
       | 
       | > In-place seeding: identify files on your computer that are part
       | of an indexed torrent, and allow them to be seeded in place after
       | having moved, renamed or deleted parts of the torrent
       | 
       | Does anything do this already? It would be amazing to point a
       | client at a folder of unstructured junk and have it magically
       | find the right parts.
        
         | Scion9066 wrote:
          | It might be able to do this with the BitTorrent v2 format,
          | as the hashes would be per file.
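A minimal sketch of the matching idea (hypothetical, from neither project): index local files by content hash so a torrent's per-file hash can locate them even after renames or moves. Plain SHA-256 stands in here for the real BitTorrent v2 per-file merkle root, which is computed over 16 KiB leaf blocks rather than the whole file at once.

```python
import hashlib
from pathlib import Path

def index_by_content(root: Path) -> dict[bytes, Path]:
    """Map content-hash -> path for every file under root, so a file can
    be matched to a torrent entry regardless of its current name or
    location. SHA-256 is a stand-in for the v2 per-file root hash."""
    index = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            index[hashlib.sha256(p.read_bytes()).digest()] = p
    return index
```

Given such an index, a client could look up each file hash from a torrent and seed the matching file from wherever it now lives.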
        
         | miffe wrote:
          | DC++ does it, but it's not torrent-based.
        
         | orbisvicis wrote:
          | I've thought about writing a FUSE filesystem that tracks
          | rename/move/delete operations and updates the torrent
          | client. Never had the time, though.
        
       | RIMR wrote:
       | This is really neat. I'll need to check it out. A couple years
       | ago I ran my own instance of Magnetico
       | (https://github.com/boramalper/magnetico), but this project looks
       | a lot more polished.
        
       | doakes wrote:
       | What kind of bandwidth usage should be expected from a DHT
       | crawler like this?
        
         | KennyBlanken wrote:
         | After ~30-60 minutes of running, still less than 100kB/sec
         | combined in and out. However, as others have noted, nodes don't
         | communicate much with nodes that haven't been up for a while
          | (days).
         | 
         | It's using roughly 6% CPU time for the crawler and another 1-2%
         | for postgres, on a second-gen i7.
         | 
          | As a datapoint to set expectations: 4000 torrents have been
          | captured so far, and somewhat surprisingly, they're not
          | necessarily very current.
         | 
         | For example, a certain wildly popular TV series about samurai
         | in space swinging very hot swords around which just had its
         | season ending episode last night (I think)...that ep isn't in
         | my list so far, but the episode prior to it, and the first two
         | episodes, are.
         | 
         | There's a ton of random, low-seed torrents, so it's actually
         | kind of interesting to search by type, year, etc and see what
         | comes up.
        
       | KennyBlanken wrote:
       | I've been unable to get this running; I gave it a postgres user
       | and database, granted it ownership and all permissions on said
       | DB, and there's nothing in the database.
       | 
       | Edit: found the init schema and things seem to be working now:
       | https://github.com/bitmagnet-io/bitmagnet/blob/main/migratio...
       | 
        | It would be really nice to be able to sort by column (size,
        | seeders) and/or to have filters for seeds/downloaders (for
        | example, filtering out anything with fewer than X seeds).
        
       ___________________________________________________________________
       (page generated 2023-10-05 23:00 UTC)