[HN Gopher] A 1.5GB String
       ___________________________________________________________________
        
       A 1.5GB String
        
       Author : Backslasher
       Score  : 93 points
       Date   : 2023-04-08 19:26 UTC (3 hours ago)
        
 (HTM) web link (blog.backslasher.net)
 (TXT) w3m dump (blog.backslasher.net)
        
       | usr1106 wrote:
       | Did they buy this domain just for this blog posting about
       | backslashes? No, the first HN submission is from 2017.
       | Coincidence or an agenda?
        
         | rsstack wrote:
         | Coincidence :) I know him in real life and he has used that
         | nickname for a decade+.
        
         | ehPReth wrote:
         | what?
        
           | stingraycharles wrote:
           | The story is about backslashes. The domain is
           | backslasher.net. Seems like a fun coincidence to mention.
        
             | sofaygo wrote:
             | OP's username is also Backslasher.
        
               | mbreese wrote:
               | That's quite the origin story for a username!
        
         | augusto-moura wrote:
         | Maybe the story is not that recent? I will bet on a coincidence
         | though, pretty interesting nonetheless
        
       | Vecr wrote:
        | The formula to calculate the number of needed servers is wrong,
        | as you can't have fractional servers. You should use the
        | pigeonhole principle[0].
       | 
       | [0]: https://en.wikipedia.org/wiki/Pigeonhole_principle
        
         | Dylan16807 wrote:
         | Use it how? The actual pigeonhole principle would conclude that
         | you have multiple users on at least one server, and nothing
         | else.
         | 
         | And anything that calculates individual users is the wrong way
         | to do this math.
        
         | tiler2915072 wrote:
          | As others have pointed out, for the author's discussion this
          | fact doesn't really matter.
          | 
          | However, what does matter is that the author is probably not
          | estimating the average correctly, partly due to this bug. If
          | they scaled without fixing their excessive string storage, they
          | would probably find their estimate to be off.
         | 
         | Even with the fix I'd be surprised if the underlying
         | distribution of data per user is normally distributed.
        
         | sd9 wrote:
         | The only difference in this case is to take the ceiling. The
         | formula seems fine as an approximation.
        
         | nmilo wrote:
         | So round it, this is engineering, not math. The safety buffer
         | factor accounts for any rounding errors anyways.
        
         | [deleted]
        
         | sweettea wrote:
         | The formula for number of needed servers seems like a fine
         | approximation; I'm very confused how you could use the
         | pigeonhole principle. Perhaps to say that if you have N
         | servers, one of them has at least total_users/N users? But
         | assuming you have a decent methodology for balancing users
         | across servers, and can rebalance to use a new server when
         | added, this has basically the same effects as the formula given
         | so I'm clearly missing why you pointed this out.
        
         | IncRnd wrote:
         | That can obviously be rounded. Nobody will really try to
         | purchase 27% of a server. A different issue is that the formula
         | should include "1+Safety Buffer". Otherwise it needs to be run
         | twice, once with the safety buffer and once without. Even so, I
         | didn't take the formulae on this page to be accurate so much as
         | representational.
        
           | gvx wrote:
           | I assumed the "Safety Buffer" is really more of a "Safety
           | Factor", and is equivalent to 1 + what you or I would call a
           | Safety Buffer.
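            | 
            | Concretely, something like this (made-up numbers, not the
            | post's exact formula):
            | 
            |     // Whole servers via ceiling; head-room via a multiplicative
            |     // safety factor, i.e. 1 + the safety buffer.
            |     double users = 1_000_000, bytesPerUser = 5e6, bytesPerServer = 64e9;
            |     double safetyFactor = 1.3;  // a 30% safety buffer
            |     long servers = (long) Math.ceil(
            |             users * bytesPerUser * safetyFactor / bytesPerServer);
            |     System.out.println(servers);  // 102, rather than "101.56... servers"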
        
         | cowthulhu wrote:
         | The formula/metric is clearly an approximation?
        
       | missblit wrote:
       | > Understanding we are memory-bound is crucial, as it directs our
       | efforts towards reducing memory consumption in order to
       | accommodate more users on the server.
       | 
       | While this is true, it bears mentioning that in a service
       | handling many requests they all smear together memory-wise, so
       | _any_ latency savings can also ease memory pressure (a memory
       | allocation has an area: bytes x time). So for instance it can
       | make sense to minimize memory held across an RPC boundary, or
       | even to perform CPU optimizations if they would improve the
        | latency.
       | 
       | > e.g. examining a memdump rather than just measuring overall
       | memory utilization
       | 
       | A flame graph also would have highlighted this (without the need
       | to look at what might be user data), and is usually where I start
       | looking for memory optimization. I mean if you don't know what
       | code is allocating lots of memory where would you even start
       | anyway?
       | 
       | Additionally I've found that looking at outliers, while a good
       | idea, can sometimes lead to wild goose chases if you're not
        | careful. There can be pathological edge cases which are rare
       | enough to not actually matter in practice.
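        | 
        | (A back-of-the-envelope version of the bytes x time point, with
        | made-up numbers: by Little's law, in-flight requests ~= throughput
        | x latency, so the average memory tied up in per-request buffers is
        | roughly throughput x latency x bytes held per request.)
        | 
        |     double requestsPerSecond = 200;
        |     double bytesHeldPerRequest = 10e6;  // 10 MB held per in-flight request
        |     for (double latencySeconds : new double[] {0.5, 0.05}) {
        |         double resident = requestsPerSecond * latencySeconds * bytesHeldPerRequest;
        |         // 0.50 s -> ~1000 MB resident; 0.05 s -> ~100 MB resident
        |         System.out.printf("%.2f s latency -> ~%.0f MB%n",
        |                 latencySeconds, resident / 1e6);
        |     }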
        
       | wadefletch wrote:
       | What is the CSS effect as you enter the page? Are they streaming
       | the CSS itself, or just an animation in the browser? Very cool
       | effect.
        
         | pkage wrote:
          | It's a CSS animation from the Jekyll theme. Pulled from the
          | minified CSS:
          | 
          |     @keyframes intro {
          |         0%   { opacity: 0 }
          |         100% { opacity: 1 }
          |     }
          | 
          | Applied via:
          | 
          |     animation: intro 0.3s both;
          |     animation-delay: 0.15s;
        
       | Dwedit wrote:
        | Meanwhile, Firefox has big problems when you give it a 6MB Data
        | URL: you get crashes or graphical glitches throughout the browser
       | while it tries to draw the URL bar.
        
         | TechBro8615 wrote:
         | That seems like a weird oversight, since Firefox is normally
         | quite good at rendering large assets. For example, I prefer
         | viewing large SVG images (like those generated by dependency
         | graphing tools) in Firefox, because unlike Chrome, it allows me
         | to pan and zoom within the image.
        
       | ww520 wrote:
       | Good analysis.
       | 
        | The previous session data probably needs to be deserialized from
        | JSON to objects first, before adding in the new session object.
        | The whole container can then be serialized to JSON in one shot.
        | This avoids the recursive escape encoding.
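        | 
        | Rough sketch of that, assuming an org.json-style API (made-up
        | names, not the author's actual code):
        | 
        |     import org.json.JSONObject;
        | 
        |     // Parse the stored string back into an object, attach it as a
        |     // real nested object, and serialize the whole thing once.
        |     String previousJson = "{\"screen\":\"login\"}";  // stand-in for stored data
        |     JSONObject previous = new JSONObject(previousJson);
        |     JSONObject session = new JSONObject();
        |     session.put("screen", "checkout");
        |     session.put("previous", previous);   // object, not a string: no re-escaping
        |     String stored = session.toString();  // single serialization pass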
        
       | IncRnd wrote:
       | That's a case of not solving the problem.
       | 
       | Leaving aside the need to send a massive set of screen images for
       | the moment, a severe issue with that protocol is the lack of
       | compression. It doesn't need to use a quadtree[1], although that
       | will significantly speed up the interface when sliding the tree
       | depth to the screen action being taken. Even RLE[2], Huffman[3],
       | or any other standard string compression[4][5] would solve this
       | issue shown in the blog.
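        | 
        | For a sense of scale, a stock gzip pass over highly repetitive
        | text (sketch below with made-up sizes; not saying it's the right
        | fix here) shrinks a megabyte of backslashes to a few kilobytes:
        | 
        |     import java.io.ByteArrayOutputStream;
        |     import java.nio.charset.StandardCharsets;
        |     import java.util.zip.GZIPOutputStream;
        | 
        |     // Runs of identical characters compress extremely well with any
        |     // general-purpose compressor, gzip included.
        |     byte[] raw = "\\".repeat(1_000_000).getBytes(StandardCharsets.UTF_8);
        |     ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        |     try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
        |         gz.write(raw);
        |     }
        |     System.out.println(raw.length + " -> " + compressed.size() + " bytes");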
       | 
       | [1] https://en.wikipedia.org/wiki/Quadtree#Compressed_quadtrees
       | 
       | [2] https://en.wikipedia.org/wiki/Run-length_encoding
       | 
       | [3] https://en.wikipedia.org/wiki/Huffman_coding
       | 
       | [4] https://en.wikipedia.org/wiki/Data_compression#Image
       | 
       | [5]
       | https://en.wikipedia.org/wiki/Lossless_compression#General_p...
        
         | Genbox wrote:
          | Efficient data handling is better than spending cycles on
          | generating a lot of backslashes and then burning cycles on
          | compressing it.
        
       | IngvarLynn wrote:
       | Assuming 3 backslashes per iteration, one iteration per second -
        | it would take 16 years to get to 1.5GB. The problem with this
        | code looks deeper to me.
        
         | cyanydeez wrote:
          | Every quote in a JSON string requires backslashes. This is a
          | geometric explosion. You're assuming very simple state objects.
        
         | NieDzejkob wrote:
         | The growth is exponential - each backslash becomes two in the
         | next iteration. Thus after n iterations we have 2^n - 1
         | backslashes, and we only need 30 iterations to hit a gigabyte
         | (and that's assuming only one quotation mark in the original
         | JSON).
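          | 
          | The doubling is easy to reproduce; rough Java sketch (not the
          | post's code, just the same escape-the-previous-payload pattern):
          | 
          |     // Each pass escapes every quote and backslash in the previous
          |     // payload, so the escape characters roughly double per level.
          |     String s = "{\"id\":1}";
          |     for (int level = 1; level <= 16; level++) {
          |         String escaped = s.replace("\\", "\\\\").replace("\"", "\\\"");
          |         s = "{\"id\":" + level + ",\"previous\":\"" + escaped + "\"}";
          |         if (level % 4 == 0) {
          |             System.out.println(level + " levels -> " + s.length() + " chars");
          |         }
          |     }
          |     // Doubling per level puts ~30 levels into gigabyte territory.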
        
       | tgsovlerkhgsel wrote:
       | A better fix could have been switching to some other (binary)
       | serialization protocol (BSON or similar) - the problem here
       | doesn't seem to have been the size of the history itself, but the
       | exponential growth of the escaping.
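        | 
        | The escaping disappears because binary formats embed nested
        | payloads by length rather than by quoting. A minimal sketch of
        | that framing idea (plain java.io, not actual BSON):
        | 
        |     import java.io.ByteArrayOutputStream;
        |     import java.io.DataOutputStream;
        |     import java.nio.charset.StandardCharsets;
        | 
        |     // Embed the previous frame verbatim behind a length prefix:
        |     // nothing is quoted, so nothing ever needs to be escaped again.
        |     byte[] previousFrame = "{\"id\":1}".getBytes(StandardCharsets.UTF_8);
        |     ByteArrayOutputStream buf = new ByteArrayOutputStream();
        |     try (DataOutputStream out = new DataOutputStream(buf)) {
        |         out.writeUTF("checkout");            // current screen (made-up field)
        |         out.writeInt(previousFrame.length);  // length prefix
        |         out.write(previousFrame);            // raw bytes, untouched
        |     }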
        
         | crdrost wrote:
          | Or, y'know, just
          | 
          |     jo.put("previous", previous);
          | 
          | Rather than
          | 
          |     jo.put("previous", previous.toJson());
         | 
         | Presumably the reason they didn't do this is because they
         | couldn't handle a recursive schema?
        
           | btown wrote:
           | This is reminiscent of some APIs I've had the great pleasure
           | of needing to integrate with, where a SOAP API would take XML
           | containing Username, Password, Method, Data... where Data was
           | a string, itself containing barely-documented XML, but
           | serialized as a string before being placed in the XML. You'd
           | think that if they knew how to stand up a SOAP API, they'd
           | know that XML is literally designed to be nested. Alas, such
           | was not the case. I shudder to think about their security
           | under the hood...
        
           | alephaleph wrote:
           | In the post they say they didn't do this bc they'd just be
           | trading long strings for deeply nested objects.
        
             | lalaithion wrote:
             | Deeply nested objects grow linearly with the size of
             | history, not exponentially.
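              | 
              | A rough comparison, assuming an org.json-style API (made-up
              | snippet, not the post's code):
              | 
              |     import org.json.JSONObject;
              | 
              |     JSONObject nested = new JSONObject().put("id", 1);
              |     JSONObject flattened = new JSONObject().put("id", 1);
              |     for (int id = 2; id <= 20; id++) {
              |         nested = new JSONObject().put("id", id).put("previous", nested);
              |         flattened = new JSONObject().put("id", id)
              |                 .put("previous", flattened.toString());
              |     }
              |     // Nested objects add a constant amount per step (linear)...
              |     System.out.println(nested.toString().length());
              |     // ...while the stringified "previous" compounds (exponential).
              |     System.out.println(flattened.toString().length());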
        
         | benatkin wrote:
         | _sighs in bencode_
        
       | Phelinofist wrote:
       | Why not just restrict the history to a configurable number of
        | entries? Having like 20-50 is probably enough. It might not be
        | 100% correct but the effort:use ratio seems good.
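        | 
        | Rough sketch of the cap, assuming the fixed layout where
        | "previous" is a nested object rather than a string (made-up code):
        | 
        |     // Walk the chain of "previous" entries and drop everything
        |     // older than maxEntries, bounding both size and nesting depth.
        |     static void truncateHistory(org.json.JSONObject session, int maxEntries) {
        |         org.json.JSONObject cursor = session;
        |         for (int kept = 1; kept < maxEntries && cursor.has("previous"); kept++) {
        |             cursor = cursor.getJSONObject("previous");
        |         }
        |         cursor.remove("previous");
        |     }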
        
       | NegativeLatency wrote:
       | > Since JSON fields have quotes, and those quotes need to be
       | escaped when stored in a string, they are stored as \"
       | 
       | I don't think they do though?
        
         | andrewmackrodt wrote:
         | They were serialising the previous object as a string (which
         | may also contain a key called previous which is a string) so
         | quoted values, backslashes and any other escape characters
         | would be escaped. Over multiple screens this grows quickly,
         | e.g. this simple struct which has only an id column grows to
          | 2KB in only 9 steps:
          | 
          |     let obj = { id: 1 }
          |     for (let id = 2; id <= 10; id++) {
          |         obj = { id, previous: JSON.stringify(obj) }
          |     }
        
       ___________________________________________________________________
       (page generated 2023-04-08 23:00 UTC)