[HN Gopher] I use zip bombs to protect my server
       ___________________________________________________________________
        
       I use zip bombs to protect my server
        
       Author : foxfired
       Score  : 230 points
       Date   : 2025-04-28 22:28 UTC (1 days ago)
        
 (HTM) web link (idiallo.com)
 (TXT) w3m dump (idiallo.com)
        
       | codingdave wrote:
       | Mildly amusing, but it seems like this is thinking that two
       | wrongs make a right, so let us serve malware instead of using a
       | WAF or some other existing solution to the bot problem.
        
         | cratermoon wrote:
         | Something like https://xeiaso.net/notes/2025/anubis-works/
        
           | xena wrote:
           | I did actually try zip bombs at first. They didn't work due
           | to the architecture of how Amazon's scraper works. It just
           | made the requests get retried.
        
             | cookiengineer wrote:
             | Did you also try Transfer-Encoding: chunked and things like
             | HTTP smuggling to serve different content to web browser
             | instances than to scrapers?
        
             | wiredfool wrote:
             | Amazon's scraper has been sending multiple requests per
             | second to my servers for 6+ weeks, and every request has
             | been returned 429.
             | 
             | Amazon's scraper doesn't back off. Meta, google, most of
             | the others with identifiable user agents back off, Amazon
             | doesn't.
        
               | toast0 wrote:
               | If it's easy, sleep 30 before returning 429. Or tcpdrop
               | the connections and don't even send a response or a tcp
               | reset.
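                | 
                | A minimal sketch of that, assuming Flask and a
                | hypothetical is_abusive() check:
                | 
                |     import time
                |     from flask import Flask, abort, request
                | 
                |     app = Flask(__name__)
                | 
                |     @app.before_request
                |     def slow_down_abusers():
                |         # stand-in for whatever detection you already run
                |         if is_abusive(request.remote_addr):
                |             time.sleep(30)
                |             abort(429)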
        
             | deathanatos wrote:
             | So first, let me prefix this by saying I generally don't
             | accept cookies from websites I don't explicitly first
             | allow, my reasoning being "why am I granting disk
             | read/write access to [mostly] shady actors to allow them to
             | track me?"
             | 
             | (I don't think your blog qualifies as shady ... but you're
             | not in my allowlist, either.)
             | 
             | So if I visit https://anubis.techaro.lol/ (from the
             | "Anubis" link), I get an infinite anime cat girl refresh
             | loop -- which honestly isn't the worst thing ever?
             | 
             | But if I go to https://xeiaso.net/blog/2025/anubis/ and
             | click "To test Anubis, click here." ... that one loads just
             | fine.
             | 
             | Neither xeserv.us nor techaro.lol are in my allowlist.
             | Curious that one seems to pass. IDK.
             | 
             | The blog post does have that lovely graph ... but I suspect
             | I'll loop around the "no cookie" loop in it, so the
             | infinite cat girls are somewhat expected.
             | 
             | I was working on an extension that would store cookies
             | _very_ ephemerally for the more malicious instances of
             | this, but I think its design would work here too. (In-RAM
             | cookie jar, burns them after, say, 30s. Persisted long
             | enough to load the page.)
        
               | lcnPylGDnU4H9OF wrote:
               | > Neither xeserv.us nor techaro.lol are in my allowlist.
               | Curious that one seems to pass. IDK.
               | 
               | Is your browser passing a referrer?
        
               | cycomanic wrote:
               | Just FYI temporary containers (Firefox extension) seem to
               | be the solution you're looking for. It essentially
               | generates a new container for every tab you open (subtabs
               | can be either new containers or in the same container).
               | Once the tab is closed it destroys the container and
               | deletes all browsing data (including cookies). You can
               | still whitelist some domains to specific persistent
               | containers.
               | 
               | I used cookie blockers for a long time, but always ended
               | up having to whitelist some sites even though I didn't
               | want their cookies because the site would misbehave
               | without them. Now I just stopped worrying.
        
         | theandrewbailey wrote:
         | WAF isn't the right choice for a lot of people:
         | https://news.ycombinator.com/item?id=43793526
        
           | codingdave wrote:
           | At least, not with the default rules. I read that discussion
           | a few days ago and was surprised how few callouts there were
           | that a WAF is just a part of the infrastructure - it is the
           | rules that people are actually complaining about. I think the
           | problem is that so many apps run on AWS and their default WAF
           | rules have some silly content filtering. And their "security
           | baseline" says that you have to use a WAF and include their
           | default rules, so security teams lock down on those rules
           | without any real thought put into whether or not they make
           | sense for any given scenario.
        
         | chmod775 wrote:
          | Truly one of my favorite thought-terminating proverbs.
         | 
         | "Hurting people is wrong, so you should not defend yourself
         | when attacked."
         | 
         | "Imprisoning people is wrong, so we should not imprison
         | thieves."
         | 
         | Also the modern telling of Robin Hood seems to be pretty
         | generally celebrated.
         | 
         | Two wrongs may not make a right, but often enough a smaller
         | wrong is the best recourse we have to avert a greater wrong.
         | 
         | The spirit of the proverb is referring to wrongs which are
         | unrelated to one another, especially when using one to excuse
         | another.
        
           | zdragnar wrote:
           | > a smaller wrong is the best recourse we have to avert a
           | greater wrong
           | 
           | The logic of terrorists and war criminals everywhere.
        
             | BlackFingolfin wrote:
             | And sometimes one man's terrorist is another's freedom
             | fighter.... (Not to defend terrorism, but it's just not
             | that simple)
        
             | _Algernon_ wrote:
              | And also how functioning governments work:
             | https://en.m.wikipedia.org/wiki/Monopoly_on_violence
             | 
              | Do you really want to live in a society where nobody is
              | allowed to use punishment to discourage bad behaviour in
              | others? That is a game-theoretical disaster...
        
             | toss1 wrote:
             | Defense and Offense are not the same.
             | 
             | Crime and Justice are not the same.
             | 
             | If you cannot figure that out, you _ARE_ a major part of
             | the problem.
             | 
             | Keep thinking until you figure it out for good.
        
             | impulsivepuppet wrote:
             | I admire your deontological zealotry. That said, I think
             | there is an implied virtuous aspect of "internet
             | vigilantism" that feels ignored (i.e. disabling a malicious
              | bot means it does not visit other sites). While I do not
             | absolve anyone from taking full responsibility for their
             | actions, I have a suspicion that terrorists do a bit more
             | than just avert a greater wrong--otherwise, please sign me
             | up!
        
         | imiric wrote:
         | The web is overrun by malicious actors without any sense of
         | morality. Since playing by the rules is clearly not working,
         | I'm in favor of doing anything in my power to waste their
         | resources. I would go a step further and try to corrupt their
         | devices so that they're unable to continue their abuse, but
          | since that would require considerably more effort on my part,
         | a zip bomb is a good low-effort solution.
        
         | bsimpson wrote:
         | There's no ethical ambiguity about serving garbage to malicious
         | traffic.
         | 
         | They made the request. Respond accordingly.
        
           | joezydeco wrote:
           | This is William Gibson's "black ICE" becoming real, and I
           | love it.
           | 
           | https://williamgibson.fandom.com/wiki/ICE
        
           | petercooper wrote:
           | Based on the example in the post, that thinking might need to
           | be extended to "someone happening to be using a blocklisted
           | IP." I don't serve up zip bombs, but I've blocklisted many
           | abusive bots using VPN IPs over the years which have then
           | impeded legitimate users of the same VPNs.
        
       | java-man wrote:
       | I think it's a good idea, but it must be coupled with robots.txt.
        
         | cratermoon wrote:
         | AI scraper bots don't respect robots.txt
        
           | jsheard wrote:
           | I think that's the point, you'd use robots.txt to direct
           | Googlebot/Bingbot/etc away from countermeasures that could
           | potentially mess up your SEO. If other bots ignore the
           | signpost clearly saying not to enter the tarpit, that's their
           | own stupid fault.
        
           | reverendsteveii wrote:
           | The ones that survive do
        
         | forinti wrote:
         | I was looking through my logs yesterday.
         | 
         | Bad bots don't even read robots.txt.
        
       | zzo38computer wrote:
       | I also had the idea of zip bomb to confuse badly behaved scrapers
       | (and I have mentioned it before to some other people, although I
        | have not implemented it). However, maybe instead of 0x00, you
       | might use a different byte value.
       | 
       | I had other ideas too, but I don't know how well some of them
       | will work (they might depend on what bots they are).
        
         | ycombinatrix wrote:
         | The different byte values likely won't compress as well as all
         | 0s unless they are a repeating pattern of blocks.
         | 
         | An alternative might be to use Brotli which has a static
         | dictionary. Maybe that can be used to achieve a high
         | compression ratio.
        
           | zzo38computer wrote:
           | I meant that all of the byte values would be the same (so
           | they would still be repeating), but a different value than
           | zero. However, Brotli could be another idea if the client
           | supports it.
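            | 
            | A quick way to check that, sketched in Python: any single
            | repeated byte value compresses about as well as zeros do
            | under gzip.
            | 
            |     import gzip
            | 
            |     # 100 MB of a single byte; the value barely matters
            |     payload = b"\x41" * (100 * 1024 * 1024)
            |     packed = gzip.compress(payload, compresslevel=9)
            |     print(len(payload) // len(packed))  # roughly 1000:1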
        
       | altairprime wrote:
       | See also (2017) HN, https://news.ycombinator.com/item?id=14707674
        
       | wewewedxfgdf wrote:
        | I protected uploads on one of my applications by creating
        | fixed-size temporary disk partitions of about 10MB each;
        | unzipping into those contains the fallout if someone uploads
        | something too big.
        
         | sidewndr46 wrote:
         | What? You partitioned a disk rather than just not decompressing
         | some comically large file?
        
           | gchamonlive wrote:
            | https://github.com/uint128-t/ZIPBOMB
            | 
            |     2048 yottabyte Zip Bomb
            | 
            |     This zip bomb uses overlapping files and recursion to
            |     achieve 7 layers with 256 files each, with the last
            |     being a 32GB file. It is only 266 KB on disk.
           | 
            | When you realise it's a zip bomb, it's already too late.
            | The file size doesn't betray its contents. Maybe
           | applying some heuristics with ClamAV? But even then it's not
           | guaranteed. I think a small partition to isolate
           | decompression is actually really smart. Wonder if we can
           | achieve the same with overlays.
        
             | sidewndr46 wrote:
             | What are you talking about? You get a compressed file. You
             | start decompressing it. When the amount of bytes you've
             | written exceeds some threshold (say 5 megabytes) just stop
             | decompressing, discard the output so far & delete the
             | original file. That is it.
        
               | gchamonlive wrote:
               | Those files are designed to exhaust the system resources
               | _before_ you can even do these kinds of checks. I 'm not
               | particularly familiar with the ins and outs of
               | compression algorithms, but it's intuitively not strange
               | for me to have a a zip that is carefully crafted so that
               | memory and CPU goes out the window before any check can
               | be done. Maybe someone with more experience can give mode
               | details.
               | 
               | I'm sure though that if it was as simples as _that_ we
               | wouldn 't even have a name for it.
        
               | crazygringo wrote:
               | Not really. It really is that simple. It's just
               | dictionary decompression, and it's just halting it at
               | some limit.
               | 
               | It's just nobody usually implements a limit during
               | decompression because people aren't usually giving you
               | zip bombs. And sometimes you really do want to decompress
               | ginormous files, so limits aren't built in by default.
               | 
               | Your given language might not make it easy to do, but you
               | should pretty much always be able to hack something
               | together using file streams. It's just an extra step is
               | all.
        
               | tremon wrote:
               | That assumes they're using a stream decompressor library
               | and are feeding that stream manually. Solutions that
               | write the received file to $TMP and just run an external
               | tool (or, say, use sendfile()) don't have the option to
               | abort after N decompressed bytes.
        
               | overfeed wrote:
               | > Solutions that write the received file to $TMP and just
               | run an external tool (or, say, use sendfile()) don't have
               | the option to abort after N decompressed bytes
               | 
               | cgroups with hard-limits will let the external tool's
               | process crash without taking down the script or system
               | along with it.
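                | 
                | A per-process variant of the same idea (rlimits rather
                | than cgroups), as a rough Python sketch; the zip name
                | is just a placeholder:
                | 
                |     import resource
                |     import subprocess
                | 
                |     def cap_file_size():
                |         cap = 10 * 1024 * 1024  # 10 MB per file
                |         resource.setrlimit(
                |             resource.RLIMIT_FSIZE, (cap, cap))
                | 
                |     # the child gets SIGXFSZ once it exceeds the cap
                |     subprocess.run(
                |         ["unzip", "upload.zip", "-d", "/tmp/out"],
                |         preexec_fn=cap_file_size)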
        
               | gruez wrote:
               | Depending on the language/library that might not always
               | be possible. For instance python's zip library only
               | provides an extract function, without a way to hook into
               | the decompression process, or limit how much can be
               | written out. Sure, you can probably fork the library to
               | add in the checks yourself, but from a maintainability
               | perspective it might be less work to do with the
               | partition solution.
        
               | banana_giraffe wrote:
                | It also provides an open function for the files in a
                | zip file. I see no reason something like this won't
                | bail after a small limit:
                | 
                |     import zipfile
                | 
                |     with zipfile.ZipFile("zipbomb.zip") as zip:
                |         for name in zip.namelist():
                |             print("working on " + name)
                |             left = 1000000
                |             with open("dest_" + name, "wb") as fdest, \
                |                  zip.open(name) as fsrc:
                |                 while True:
                |                     block = fsrc.read(1000)
                |                     if len(block) == 0:
                |                         break
                |                     fdest.write(block)
                |                     left -= len(block)
                |                     if left <= 0:
                |                         print("too much data!")
                |                         break
        
               | kulahan wrote:
               | Isn't this basically a question about the halting
               | problem? Whatever arbitrary cutoff you chose might not
               | work for all.
        
               | maxbond wrote:
               | That is exactly what OP is doing, they've just
               | implemented it at the operating system/file system level.
        
           | kccqzy wrote:
           | Seems like a good and simple strategy to me. No real
           | partition needed; tmpfs is cheap on Linux. Maybe OP is using
           | tools that do not easily allow tracking the number of
           | uncompressed bytes.
        
           | wewewedxfgdf wrote:
           | Yes I'd rather deal with a simple out of disk space error
           | than perform some acrobatics to "safely" unzip a potential
           | zip bomb.
           | 
           | Also zip bombs are not comically large until you unzip them.
           | 
            | Also, with this approach you can just unpack any sort of
            | compressed file format without giving any thought to
            | whether you are handling it safely.
        
         | warkdarrior wrote:
         | `unzip -p | head -c 10MB`
        
       | ChuckMcM wrote:
       | I sort of did this with ssh where I figured out how to crash an
       | ssh client that was trying to guess the root password. What I got
       | for my trouble was a number of script kiddies ddosing my poor
       | little server. I switched to just identifying 'bad actors' who
       | are clearly trying to do bad things and just banning their IP
       | with firewall rules. That's becoming more challenging with IPV6
       | though.
       | 
       | Edit: And for folks who write their own web pages, you can always
       | create zip bombs that are links on a web page that don't show up
       | for humans (white text on white background with no highlight on
       | hover/click anchors). Bots download those things to have a look
       | (so do crawlers and AI scrapers)
        
         | 1970-01-01 wrote:
            | Why is it harder to firewall them with IPv6? It seems this would
         | be the easier of the two to firewall.
        
           | echoangle wrote:
           | Maybe it's easier to circumvent because getting a new IPv6
           | address is easier than with IPv4?
        
           | firesteelrain wrote:
           | I think they are suggesting the range of IPs to block is too
           | high?
        
             | CBLT wrote:
             | Allow -> Tarpit -> Block should be done by ASN
        
               | carlhjerpe wrote:
                | You probably want to check how many IPs/blocks a
                | provider announces before blocking the entire thing.
                | 
                | It's also not a common metric you can filter on in open
                | firewalls, since you must look up and maintain a cache
                | of IP-to-ASN mappings, which has to be evicted and
                | updated as blocks move around.
        
           | carlhjerpe wrote:
            | Manual banning is about the same, since you just block /56
            | or bigger, entire providers or countries.
           | 
           | Automated banning is harder, you'd probably want a heuristic
           | system and look up info on IPs.
           | 
           | IPv4 with NAT means you can "overban" too.
        
         | j_walter wrote:
         | Check this out if you want to stop this behavior...
         | 
         | https://github.com/skeeto/endlessh
        
         | leephillips wrote:
         | These links do show up for humans who might be using text
         | browsers, (perhaps) screen readers, bookmarklets that list the
         | links on a page, etc.
        
           | ChuckMcM wrote:
           | true, but you can make the link text 'do not click this' or
           | 'not a real link' to let them know. I'm not sure if crawlers
            | have started using LLMs to check pages or not, which would
            | be a problem.
        
       | sgc wrote:
       | I am ignorant as to how most bots work. Could you have a second
       | line of defense for bots that avoid this bomb: Dynamically
       | generate a file from /dev/random and trickle stream it to them,
       | or would they just keep spawning parallel requests? They would
       | never finish streaming it, and presumably give up at some point.
       | The idea would be to make it more difficult for them to detect it
       | was never going to be valid content.
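        | 
        | Roughly, assuming Flask, it might look like this (each
        | response dribbles random bytes out one small chunk per
        | second):
        | 
        |     import os
        |     import time
        |     from flask import Flask, Response
        | 
        |     app = Flask(__name__)
        | 
        |     @app.route("/trap")
        |     def trap():
        |         def trickle():
        |             while True:
        |                 yield os.urandom(16)
        |                 time.sleep(1)
        |         return Response(
        |             trickle(), mimetype="application/octet-stream")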
        
         | shishcat wrote:
         | This will waste your bandwidth and resources too
        
           | sgc wrote:
           | The idea is to trickle it very slowly, like keeping a cat
           | occupied with a ball of fluff in the corner.
        
             | uniqueuid wrote:
             | Cats also have timeouts set for balls of fluff. They
             | usually get bored at some point and either go away or
             | attack you :)
        
             | CydeWeys wrote:
             | Yeah but in the mean time it's tying up a connection on
             | your webserver.
        
             | jeroenhd wrote:
             | If the bot is connecting over IPv4, you only have a couple
             | thousand connections before your server starts needing to
             | mess with shared sockets and other annoying connectivity
             | tricks.
             | 
             | I don't think it's a terrible problem to solve these days,
             | especially if you use one of the tarpitting implementations
             | that use nftables/iptables/eBPF, but if you have one of
             | those annoying Chinese bot farms with thousands of IP
             | addresses hitting your server in turn (Huawei likes to do
             | this), you may need to think twice before deploying this
             | solution.
        
         | uniqueuid wrote:
         | Practically all standard libraries have timeouts set for such
         | requests, unless you are explicitly offering streams which they
         | would skip.
        
         | jerf wrote:
         | You want to consider the ratio of your resource consumption to
         | their resource consumption. If you trickle bytes from
         | /dev/random, you are holding open a TCP connection with some
         | minimal overhead, and that's about what they are doing too.
         | Let's assume they are bright enough to use any of the many
         | modern languages or frameworks that can easily handle 10K/100K
         | connections or more on a modern system. They aren't all that
         | bright but certainly some are. You're basically consuming your
         | resources to their resources 1:1. That's not a winning scenario
         | for you.
         | 
         | The gzip bomb means you serve 10MB but they try to consume vast
         | quantities of RAM on their end and likely crash. Much better
         | ratio.
        
           | 3np wrote:
           | Also might open up a new DoS vector on entropy consumed by
           | /dev/random so it can be worse than 1:1.
        
           | sgc wrote:
           | That's clear. It all comes down to their behavior. Will they
           | sit there waiting to finish this download, or just start
            | sending other requests in parallel until you DoS yourself? My
           | hope is they would flag the site as low-value and go looking
           | elsewhere, on another site.
        
         | tremon wrote:
         | This article on Endlessh also shows how to implement a
         | resource-light http tarpit:
         | https://nullprogram.com/blog/2019/03/22/
        
         | charonn0 wrote:
         | For HTTP/1.1 you could send a "chunked" response. Chunked
         | responses are intended to allow the server to start sending
         | dynamically generated content immediately instead of waiting
         | for the generation process to finish before sending. You could
         | just continue to send chunks until the client gives up or
         | crashes.
         | 
         | [0]: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
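          | 
          | A bare-bones sketch of that with a raw socket (one blocking
          | connection at a time, just to show the chunk framing):
          | 
          |     import socket
          |     import time
          | 
          |     srv = socket.socket()
          |     srv.setsockopt(socket.SOL_SOCKET,
          |                    socket.SO_REUSEADDR, 1)
          |     srv.bind(("0.0.0.0", 8080))
          |     srv.listen()
          | 
          |     while True:
          |         conn, _ = srv.accept()
          |         conn.recv(4096)  # ignore the request
          |         conn.sendall(b"HTTP/1.1 200 OK\r\n"
          |                      b"Transfer-Encoding: chunked\r\n\r\n")
          |         try:
          |             while True:
          |                 # one 16-byte (0x10) chunk per second
          |                 conn.sendall(b"10\r\n" + b"A" * 16 + b"\r\n")
          |                 time.sleep(1)
          |         except OSError:
          |             conn.close()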
        
       | _QrE wrote:
       | There's a lot of creative ideas out there for banning and/or
       | harassing bots. There's tarpits, infinite labyrinths, proof of
       | work || regular challenges, honeypots etc.
       | 
       | Most of the bots I've come across are fairly dumb however, and
       | those are pretty easy to detect & block. I usually use CrowdSec
       | (https://www.crowdsec.net/), and with it you also get to ban the
       | IPs that misbehave on all the other servers that use it before
       | they come to yours. I've also tried turnstile for web pages
       | (https://www.cloudflare.com/application-services/products/tur...)
       | and it seems to work, though I imagine most such products would,
       | as again most bots tend to be fairly dumb.
       | 
       | I'd personally hesitate to do something like serving a zip bomb
       | since it would probably cost the bot farm(s) less than it would
       | cost me, and just banning the IP I feel would serve me better
       | than trying to play with it, especially if I know it's
       | misbehaving.
       | 
       | Edit: Of course, the author could state that the satisfaction of
       | seeing an IP 'go quiet' for a bit is priceless - no arguing
       | against that
        
       | KTibow wrote:
       | It's worth noting that this is a gzip bomb (acts just like a
       | normal compressed webpage), not a classical zip file that uses
       | nested zips to knock out antiviruses.
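        | 
        | For reference, building something like the article's bomb is a
        | few lines of Python (the filename is just a placeholder); the
        | result is a small .gz file that inflates to ~10 GB of zeros
        | when a client honours Content-Encoding: gzip:
        | 
        |     import gzip
        | 
        |     with gzip.open("bomb.gz", "wb", compresslevel=9) as f:
        |         chunk = b"\0" * (1024 * 1024)   # 1 MiB of zeros
        |         for _ in range(10 * 1024):      # ~10 GB total
        |             f.write(chunk)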
        
       | d--b wrote:
       | Zip libraries aren't bomb proof yet? Seems fairly easy to detect
       | and ignore, no?
        
       | harrison_clarke wrote:
       | it'd be cool to have a proof of work protocol baked into http.
       | like, a header that browsers understood
        
       | layer8 wrote:
        | Back when I was a stupid kid, I once did
        | 
        |     ln -s /dev/zero index.html
       | 
       | on my home page as a joke. Browsers at the time didn't like that,
       | they basically froze, sometimes taking the client system down
       | with them.
       | 
       | Later on, browsers started to check for actual content I think,
       | and would abort such requests.
        
         | koolba wrote:
         | I hope you weren't paying for bandwidth by the KiB.
        
           | santoshalper wrote:
           | Nah, back then we paid for bandwidth by the kb.
        
         | sandworm101 wrote:
          | Divide by zero happens to everyone eventually.
         | 
         | https://medium.com/@bishr_tabbaa/when-smart-ships-divide-by-...
         | 
         | "On 21 September 1997, the USS Yorktown halted for almost three
         | hours during training maneuvers off the coast of Cape Charles,
         | Virginia due to a divide-by-zero error in a database
         | application that propagated throughout the ship's control
         | systems."
         | 
         | " technician tried to digitally calibrate and reset the fuel
         | valve by entering a 0 value for one of the valve's component
         | properties into the SMCS Remote Database Manager (RDM)"
        
         | m463 wrote:
         | Sounds like the favicon.ico that would crash the browser.
        
       | jawns wrote:
       | Is there any legal exposure possible?
       | 
       | Like, a legitimate crawler suing you and alleging that you broke
       | something of theirs?
        
         | bilekas wrote:
          | Please, just as a conversation piece, walk me through the
          | potential issues you think there are?
         | 
         | I'll play the side of the defender and you can play the
         | "bot"/bot deployer.
        
           | echoangle wrote:
           | Well creating a bot is not per se illegal, so assuming the
           | maliciousness-detector on the server isn't perfect, it could
           | serve the zip bomb to a legitimate bot. And I don't think
           | it's crazy that serving zip bombs with the stated intent to
           | sabotage the client would be illegal. But I'm not a lawyer,
           | of course.
        
             | bilekas wrote:
             | Disclosure, I'm not a lawyer either. This is all
             | hypothetical high level discussion here.
             | 
             | > it could serve the zip bomb to a legitimate bot.
             | 
              | Can you define the difference between a legitimate bot
              | and a non-legitimate bot for me?
             | 
             | The OP didn't mention it, but if we can assume they have
              | SOME form of robots.txt (a safe assumption given their
              | history), would those bots who ignored the robots be
              | considered legitimate or non-legitimate?
             | 
             | Almost final question, and I know we're not lawyers here,
             | but is there any precedent in case law or anywhere, which
              | defines a 'bad bot' in the eyes of the law?
             | 
             | Final final question, as a bot, do you believe you have a
              | right or a privilege to scrape a website?
        
           | brudgers wrote:
           | Anyone can sue anyone for anything and the side with the most
           | money is most likely to prevail.
        
         | bauruine wrote:
         | >User-agent: *
         | 
         | >Disallow: /zipbomb.html
         | 
          | Legitimate crawlers would skip it this way; only scum
          | ignores robots.txt.
        
           | echoangle wrote:
            | I'm not sure that's enough; robots.txt isn't really legally
            | binding, so if the zip bomb were somehow illegal, guarding
            | it behind a robots.txt rule probably wouldn't make it fine.
        
             | lcnPylGDnU4H9OF wrote:
             | Has any similar case been tried? I'd think that a judge
             | learning the intent of robots.txt and disallow rules is
             | fairly likely to be sympathetic. Seems like it could go
             | either way, I mean. (Jury is probably more a crap-shoot.)
        
             | thephyber wrote:
             | Who, running a crawler which violates robots.txt, is going
             | to prosecute/sue the server owner?
             | 
             | The server owner can make an easy case to the jury that it
             | is a booby trap to defend against trespassers.
        
             | boricj wrote:
             | > robots.txt isn't really legally binding
             | 
             | Neither is the HTTP specification. Nothing is stopping you
             | from running a Gopher server on TCP port 80, should you get
             | into trouble if it happens to crash a particular crawler?
             | 
             | Making a HTTP request on a random server is like uttering a
             | sentence to a random person in a city: some can be helpful,
             | some may tell you to piss off and some might shank you. If
             | you don't like the latter, then maybe don't go around
             | screaming nonsense loudly to strangers in an unmarked area.
        
         | thayne wrote:
         | Disclosure: IANAL
         | 
         | The CFAA[1] prohibits:
         | 
         | > knowingly causes the transmission of a program, information,
         | code, or command, and as a result of such conduct,
         | intentionally causes damage without authorization, to a
         | protected computer;
         | 
         | As far as I can tell (again, IANAL) there isn't an exception if
         | you believe said computer is actively attempting to abuse your
         | system[2]. I'm not sure if a zip bomb would constitute
         | intentional damage, but it is at least close enough to the line
         | that I wouldn't feel comfortable risking it.
         | 
         | [1]: https://www.law.cornell.edu/uscode/text/18/1030
         | 
         | [2]: And of course, you might make a mistake and incorrectly
         | serve this to legitimate traffic.
        
           | jedberg wrote:
           | I don't believe the client counts as a protected computer
           | because they initiated the connection. Also a protected
           | computer is a very specific definition that involves banking
           | and/or commerce and/or the government.
        
         | brudgers wrote:
         | Though anyone can sue anyone, not doing X is the simplest thing
         | that might avoid being sued for doing X.
         | 
         | But if it matters pay your lawyer and if it doesn't matter, it
         | doesn't matter.
        
       | mahi_novice wrote:
        | Do you mind sharing the specs of your DigitalOcean droplet?
        | I'm trying to set one up at a lower cost.
        
         | foxfired wrote:
         | The blog runs on a $6 digital ocean droplet. It's 1GB RAM and
         | 25GB storage. There is a link at the end of the article on how
         | it handles typical HN traffic. Currently at 5% CPU.
        
       | bilekas wrote:
       | > At my old employer, a bot discovered a wordpress vulnerability
       | and inserted a malicious script into our server
       | 
        | I know it's slightly off topic, but it's just so amusing
        | (edit: reassuring) to know I'm not the only one who, within an
        | hour of setting up WordPress, finds a PHP shell magically
        | deployed on their server.
        
         | ianlevesque wrote:
         | Yes, never self host Wordpress if you value your sanity. Even
         | if it's not the first hour it will eventually happen when you
         | forget a patch.
        
           | sunaookami wrote:
           | Hosting WordPress myself for 13 years now and have no problem
           | :) Just follow standard security practices and don't install
           | gazillion plugins.
        
             | carlosjobim wrote:
             | There's a lot of essential functionality missing from
             | WordPress, meaning you have to install plugins. Depending
             | on what you need to do.
             | 
             | But it's such a bad platform that there really isn't any
             | reason for anybody to use WordPress for anything. No matter
             | your use case, there will be a better alternative to
             | WordPress.
        
               | aaronbaugher wrote:
               | Can you recommend an alternative for a non-technical
               | organization, where there's someone who needs to be able
               | to edit pages and upload documents on a regular basis, so
               | they need as user-friendly an interface as possible for
               | that? Especially when they don't have a budget for it,
               | and you're helping them out as a favor? It's so easy to
               | spin up Wordpress for them, but I'm not a fan either.
               | 
               | I've tried Drupal in the past for such situations, but it
               | was too complicated for them. That was years ago, so
               | maybe it's better now.
        
               | shakna wrote:
               | I've had some luck using Decap for that. An initial dev
               | setup, followed by almost never needing support from the
               | PR team running it.
               | 
               | [0] https://decapcms.org/
        
               | donnachangstein wrote:
               | > Can you recommend an alternative for a non-technical
               | organization, where there's someone who needs to be able
               | to edit pages and upload documents on a regular basis, so
               | they need as user-friendly an interface as possible for
               | that
               | 
               | 25 years ago we used Microsoft Frontpage for that, with
               | the web root mapped to a file share that the non-
               | technical secretary could write to and edit it as if it
               | were a word processor.
               | 
               | Somehow I feel we have regressed from that simplicity,
               | with nothing but hand waving to make up for it. This
               | method was declared "obsolete" and ... Wordpress kludges
               | took its place as somehow "better". Someone prove me
               | wrong.
        
               | willyt wrote:
               | Static site with Jekyll?
        
               | socalgal2 wrote:
                | Jekyll and other static site generators do not replace
                | Wordpress any more than Notepad replaces MS Word.
                | 
                | In one, multiple users can log in, edit WYSIWYG,
                | preview, add images, etc, all from one UI. You can
                | access it from any browser, including smartphones and
                | tablets.
                | 
                | In the other, you get to instruct users on git, how to
                | deal with merge conflicts, code review (two people
                | can't easily work on a post like they can in
                | WordPress), previews require a manual build, and you
                | need a local checkout and local build installation to
                | do the build. There's no WYSIWYG, adding images is a
                | manual process of copying a file, figuring out the URL,
                | etc... No smartphone/tablet support. etc....
                | 
                | I switched my blog from a WordPress install to a static
                | site generator because I got tired of having to keep it
                | up to date, but my posting dropped because the friction
                | of posting went way up. I could no longer post from a
                | phone. I couldn't easily add images. I had to build to
                | preview. And I had to submit via git commits and
                | pushes. All of that meant what was easy became tedious.
        
               | carlosjobim wrote:
               | Yes I can. There's an excellent and stable solution
               | called SurrealCMS, made by an indie developer. You
               | connect it by FTP to any traditional web design
               | (HTML+CSS+JS), and the users get a WYSIWYG editor where
               | the published output looks exactly as it looked when
               | editing. It's dirt cheap at $9 per month.
               | 
               | Edit: I actually feel a bit sorry for the SurrealCMS
               | developer. He has a fantastic product that should be an
               | industry standard, but it's fairly unknown.
        
               | realityloop wrote:
                | DrupalCMS is a new project that aims to radically
                | simplify Drupal for end users:
                | https://new.drupal.org/drupal-cms
        
               | wincy wrote:
               | I do custom web dev so am way out of the website hosting
               | game. What are good frameworks now if I want to say,
               | light touch help someone who is slightly technical set up
               | a website? Not full react SPA with an API.
        
               | carlosjobim wrote:
               | By the sound of your question I will guess you want to
               | make a website for a small or medium sized organization?
               | jQuery is probably the only "framework" you should need.
               | 
               | If they are selling anything on their website, it's
               | probably going to be through a cloud hosted third party
               | service and then it's just an embedded iframe on their
               | website.
               | 
               | If you're making an entire web shop for a very large
               | enterprise or something of similar magnitude, then you
               | have to ask somebody else than me.
        
               | felbane wrote:
               | Does anyone actually still use jQuery?
               | 
               | Everything I've built in the past like 5 years has been
               | almost entirely pure ES6 with some helpers like jsviews.
        
               | karaterobot wrote:
               | jQuery's still the third most used web framework, behind
               | React and before NextJS. If you use jQuery to build
               | Wordpress websites, you'd be specializing in popular web
               | technologies in the year 2025.
               | 
               | https://survey.stackoverflow.co/2024/technology#1-web-
               | framew...
        
           | arcfour wrote:
           | Never use that junk if you value your sanity, I think you
           | mean.
        
       | Scoundreller wrote:
       | Attacked Over Tor [2017]
       | 
       | https://www.hackerfactor.com/blog/index.php?/archives/762-At...
        
       | kazinator wrote:
       | I deployed this, instead of my usual honeypot script.
       | 
       | It's not working very well.
       | 
       | In the web server log, I can see that the bots are not
       | downloading the whole ten megabyte poison pill.
       | 
       | They are cutting off at various lengths. I haven't seen anything
        | fetch more than around 1.5 MB of it so far.
       | 
       | Or is it working? Are they decoding it on the fly as a stream,
       | and then crashing? E.g. if something is recorded as having read
        | 1.5 MB, could it have decoded it to 1.5 GB in RAM, on the fly,
       | and crashed?
       | 
       | There is no way to tell.
        
         | MoonGhost wrote:
          | Try a content labyrinth, i.e. infinitely generated content
          | with a bunch of references to other generated pages. It may
          | help against simple wget, at least until bots adapt.
         | 
         | PS: I'm on the bots side, but don't mind helping.
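          | 
          | A rough sketch of such a labyrinth, assuming Flask: every
          | page links to a handful of freshly invented pages.
          | 
          |     import random
          |     from flask import Flask
          | 
          |     app = Flask(__name__)
          | 
          |     @app.route("/maze/<token>")
          |     def maze(token):
          |         links = "".join(
          |             '<li><a href="/maze/%x">more</a></li>'
          |             % random.getrandbits(64)
          |             for _ in range(5))
          |         return ("<html><body><h1>Page %s</h1>"
          |                 "<ul>%s</ul></body></html>"
          |                 % (token, links))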
        
         | unnouinceput wrote:
            | Do they come back? If so, then they detect it and avoid it. If
         | not then they crashed and mission accomplished.
        
           | kazinator wrote:
           | I currently cannot tell without making a little configuration
           | change, because as soon as an IP address is logged as having
           | visited the trap URL (honeypot, or zipbomb or whatever), a
           | log monitoring script bans that client.
           | 
           | Secondly, I know that most of these bots do not come back.
           | The attacks do not reuse addresses against the same server in
           | order to evade almost any conceivable filter rule that is
           | predicated on a prior visit.
        
       | cynicalsecurity wrote:
       | This topic comes up from time to time and I'm surprised no one
       | yet mentioned the usual fearmongering rhetoric of zip bombs being
       | potentially illegal.
       | 
       | I'm not a lawyer, but I'm yet to see a real life court case of a
       | bot owner suing a company or an individual for responding to his
       | malicious request with a zip bomb. The usual spiel goes like
       | this: responding to his malicious request with a malicious
       | response makes you a cybercriminal and allows him (the real
        | cybercriminal) to sue you. Again, except for cheap talk I've never
       | heard of a single court case like this. But I can easily imagine
       | them trying to blackmail someone with such cheap threats.
       | 
       | I cannot imagine a big company like Microsoft or Apple using zip
       | bombs, but I fail to see why zip bombs would be considered bad in
       | any way. Anyone with an experience of dealing with malicious bots
       | knows the frustration and the amount of time and money they steal
       | from businesses or individuals.
        
         | os2warpman wrote:
         | Anyone can sue anyone else for any reason.
         | 
         | This is what trips me up:
         | 
         | >On my server, I've added a middleware that checks if the
         | current request is malicious or not.
         | 
         | There's a lot of trust placed in:
         | 
         | >if (ipIsBlackListed() || isMalicious()) {
         | 
          | Can someone who's been assigned a previously blacklisted IP,
          | or someone using an archiving tool that looks like a bot, be
          | served malware? Is the middleware good enough, or just "good
          | enough so far"?
         | 
         | Close enough to 100% of my internet traffic flows through a
         | VPN. I have been blacklisted by various services upon
         | connecting to a VPN or switching servers on multiple occasions.
        
       | marcusb wrote:
       | Zip bombs are fun. I discovered a vulnerability in a security
       | product once where it wouldn't properly scan a file for malware
       | if the file was or contained a zip archive greater than a certain
       | size.
       | 
       | The practical effect of this was you could place a zip bomb in an
       | office xml document and this product would pass the ooxml file
       | through even if it contained easily identifiable malware.
        
         | secfirstmd wrote:
         | Eh I got news for ya.
         | 
         | The file size problem is still an issue for many big name EDRs.
        
           | marcusb wrote:
           | Undoubtedly. If you go poking around most any security
           | product (the product I was referring to was _not_ in the EDR
            | space), you'll see these sorts of issues all over the place.
        
       | crazygringo wrote:
       | > _For the most part, when they do, I never hear from them again.
        | Why? Well, that's because they crash right after ingesting the
       | file._
       | 
       | I would have figured the process/server would restart, and
       | restart with your specific URL since that was the last one not
       | completed.
       | 
       | What makes the bots avoid this site in the future? Are they
       | really smart enough to hard-code a rule to check for crashes and
       | avoid those sites in the future?
        
         | fdr wrote:
         | Seems like an exponential backoff rule would do the job: I'm
         | sure crashes happen for all sorts of reasons, some of which are
         | bugs in the bot, even on non-adversarial input.
        
       | monster_truck wrote:
       | I do something similar using a script I've cobbled together over
       | the years. Once a year I'll check the 404 logs and add the most
       | popular paths trying to exploit something (ie ancient phpmyadmin
       | vulns) to the shitlist. Requesting 3 of those URLs adds that host
       | to a greylist that only accepts requests to a very limited set of
       | legitimate paths.
        
       | jeroenhd wrote:
       | These days, almost all browsers accept zstd and brotli, so these
       | bombs can be even more effective today!
       | [This](https://news.ycombinator.com/item?id=23496794) old comment
       | showed an impressive 1.2M:1 compression ratio and [zstd seems to
       | be doing even
       | better](https://github.com/netty/netty/issues/14004).
       | 
       | Though, bots may not support modern compression standards. Then
       | again, that may be a good way to block bots: every modern browser
       | supports zstd, so just force that on non-whitelisted browser
       | agents and you automatically confuse scrapers.
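        | 
        | A rough comparison, assuming the third-party brotli and
        | zstandard packages are installed:
        | 
        |     import gzip
        |     import brotli       # pip install brotli
        |     import zstandard    # pip install zstandard
        | 
        |     data = b"\0" * (100 * 1024 * 1024)   # 100 MB of zeros
        |     print("gzip  ", len(gzip.compress(data, 9)))
        |     print("brotli", len(brotli.compress(data, quality=11)))
        |     print("zstd  ", len(
        |         zstandard.ZstdCompressor(level=19).compress(data)))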
        
       | fracus wrote:
       | I'm curious why a 10GB file of all zeroes would compress only to
       | 10MB. I mean theoretically you could compress it to one byte. I
       | suppose the compression happens on a stream of data instead of
       | analyzing the whole, but I'd assume it would still do better than
       | 10MB.
        
         | dagi3d wrote:
          | I get your point (and have no idea why it isn't compressed
         | more), but is the theoretical value of 1 byte correct? With
         | just one single byte, how does it know how big should the file
         | be after being decompressed?
        
           | kulahan wrote:
           | It's a zip bomb, so does the creator care? I just mean from a
           | practical standpoint - overflows and crashes would be a fine
           | result.
        
         | kulahan wrote:
         | There probably aren't any perfectly lossless compression
         | algorithms, I guess? Nothing would ever be all zeroes, so it
         | might not be an edge case accounted for or something? I have no
         | idea, just pulling at strings. Maybe someone smarter can jump
         | in here.
        
         | rtkwe wrote:
          | It'd have to be more than one byte. There's the central
          | directory, zip header, and local header, then the file
          | itself; you also need to tell it how many zeros to make when
          | decompressing the actual file. Most compression algorithms
          | don't work like that, because they're designed for actual
          | files rather than essentially blank files, so you get larger
          | than the absolute minimum compression.
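          | 
          | The overhead is easy to see with Python's zipfile; even a
          | file of pure zeros can't get anywhere near one byte:
          | 
          |     import io
          |     import zipfile
          | 
          |     buf = io.BytesIO()
          |     with zipfile.ZipFile(buf, "w",
          |                          zipfile.ZIP_DEFLATED) as z:
          |         z.writestr("zeros.bin", b"\0" * 1000000)
          | 
          |     # headers + central directory + compressed data:
          |     # on the order of a kilobyte, never a single byte
          |     print(len(buf.getvalue()))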
        
         | philsnow wrote:
         | A compressed file that is only one byte long can only represent
         | maximally 256 different uncompressed files.
         | 
         | Signed, a kid in the 90s who downloaded some "wavelet
         | compression" program from a BBS because it promised to compress
         | all his WaReZ even more so he could then fit moar on his disk.
         | He ran the compressor and hey golly that 500MB ISO fit into
         | only 10MB of disk now! He found out later (after a defrag) that
         | the "compressor" was just hiding data in unused disk sectors
         | and storing references to them. He then learned about Shannon
         | entropy from comp.compression.research and was enlightened.
        
         | ugurs wrote:
        | It requires at least a few bytes; there is no way to represent
        | 10GB of data in 8 bits.
        
       | manmal wrote:
       | > Before I tell you how to create a zip bomb, I do have to warn
       | you that you can potentially crash and destroy your own device
       | 
       | Surely, the device does crash but it isn't destroyed?
        
       ___________________________________________________________________
       (page generated 2025-04-29 23:00 UTC)