[HN Gopher] A 1.5GB String
___________________________________________________________________
A 1.5GB String
Author : Backslasher
Score : 93 points
Date : 2023-04-08 19:26 UTC (3 hours ago)
(HTM) web link (blog.backslasher.net)
(TXT) w3m dump (blog.backslasher.net)
| usr1106 wrote:
| Did they buy this domain just for this blog posting about
| backslashes? No, the first HN submission is from 2017.
| Coincidence or an agenda?
| rsstack wrote:
| Coincidence :) I know him in real life and he has used that
| nickname for a decade+.
| ehPReth wrote:
| what?
| stingraycharles wrote:
| The story is about backslashes. The domain is
| backslasher.net. Seems like a fun coincidence to mention.
| sofaygo wrote:
| OP's username is also Backslasher.
| mbreese wrote:
| That's quite the origin story for a username!
| augusto-moura wrote:
| Maybe the story is not that recent? I will bet on a coincidence
| though, pretty interesting nonetheless
| Vecr wrote:
| The formula to calculate the number of needed servers is wrong,
| as you can't have fractional servers. You should use the
| pigeonhole principle[0].
|
| [0]: https://en.wikipedia.org/wiki/Pigeonhole_principle
| Dylan16807 wrote:
| Use it how? The actual pigeonhole principle would conclude that
| you have multiple users on at least one server, and nothing
| else.
|
| And anything that calculates individual users is the wrong way
| to do this math.
| tiler2915072 wrote:
| As others have pointed out, for the author's discussion this
| fact doesn't really matter.
|
| However, what does matter is that the author is probably not
| estimating the average correctly, partially due to this bug. If
| they scaled without fixing the excessive string storage, they
| would probably find their estimate to be off.
|
| Even with the fix I'd be surprised if the underlying
| distribution of data per user is normally distributed.
| sd9 wrote:
| The only difference in this case is to take the ceiling. The
| formula seems fine as an approximation.
| nmilo wrote:
| So round it, this is engineering, not math. The safety buffer
| factor accounts for any rounding errors anyways.
| [deleted]
| sweettea wrote:
| The formula for number of needed servers seems like a fine
| approximation; I'm very confused how you could use the
| pigeonhole principle. Perhaps to say that if you have N
| servers, one of them has at least total_users/N users? But
| assuming you have a decent methodology for balancing users
| across servers, and can rebalance to use a new server when
| added, this has basically the same effects as the formula given
| so I'm clearly missing why you pointed this out.
| IncRnd wrote:
| That can obviously be rounded. Nobody will really try to
| purchase 27% of a server. A different issue is that the formula
| should include "1+Safety Buffer". Otherwise it needs to be run
| twice, once with the safety buffer and once without. Even so, I
| didn't take the formulae on this page to be accurate so much as
| representational.
| gvx wrote:
| I assumed the "Safety Buffer" is really more of a "Safety
| Factor", and is equivalent to 1 + what you or I would call a
| Safety Buffer.
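| As a sketch of the rounding point (the numbers and the "safety
| factor" reading are illustrative assumptions, not from the
| article):

```javascript
// Illustrative only: per-user memory, server capacity, and the safety
// factor are made-up numbers. Demand is multiplied by the safety factor,
// then rounded up to a whole number of servers with Math.ceil.
function serversNeeded(users, mbPerUser, mbPerServer, safetyFactor) {
  const totalMb = users * mbPerUser * safetyFactor;
  return Math.ceil(totalMb / mbPerServer);
}

serversNeeded(10000, 50, 64000, 1.3); // → 11 (10.16 rounded up)
```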
| cowthulhu wrote:
| The formula/metric is clearly an approximation?
| missblit wrote:
| > Understanding we are memory-bound is crucial, as it directs our
| efforts towards reducing memory consumption in order to
| accommodate more users on the server.
|
| While this is true, it bears mentioning that in a service
| handling many requests they all smear together memory-wise, so
| _any_ latency savings can also ease memory pressure (a memory
| allocation has an area: bytes x time). So for instance it can
| make sense to minimize memory held across an RPC boundary, or
| even to perform CPU optimizations if they would improve the
| latency.
|
| > e.g. examining a memdump rather than just measuring overall
| memory utilization
|
| A flame graph also would have highlighted this (without the need
| to look at what might be user data), and is usually where I start
| looking for memory optimization. I mean if you don't know what
| code is allocating lots of memory where would you even start
| anyway?
|
| Additionally I've found that looking at outliers, while a good
| idea, can sometimes lead to wild goose chases if you're not
| careful. There can be pathological edge cases which are rare
| enough to not actually matter in practice.
| wadefletch wrote:
| What is the CSS effect as you enter the page? Are they streaming
| the CSS itself, or just an animation in the browser? Very cool
| effect.
| pkage wrote:
| It's a CSS animation from the Jekyll theme. Pulled from the
| minified CSS:
|
|     @keyframes intro {
|       0% { opacity: 0 }
|       100% { opacity: 1 }
|     }
|
| Applied via:
|
|     animation: intro 0.3s both;
|     animation-delay: 0.15s;
| Dwedit wrote:
| Meanwhile, Firefox has big problems when you give it a 6MB Data
| URL, you get crashes or graphical glitches throughout the browser
| while it tries to draw the URL bar.
| TechBro8615 wrote:
| That seems like a weird oversight, since Firefox is normally
| quite good at rendering large assets. For example, I prefer
| viewing large SVG images (like those generated by dependency
| graphing tools) in Firefox, because unlike Chrome, it allows me
| to pan and zoom within the image.
| ww520 wrote:
| Good analysis.
|
| The previous session data probably needs to be deserialized from
| JSON into objects first, before adding the new session object.
| The whole container can then be serialized to JSON in one shot.
| This avoids the recursive escape encoding.
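| A sketch of that fix (field names like `previous` are
| assumptions): parse the stored JSON back into an object before
| nesting, then serialize the whole chain once:

```javascript
// Parse the previous session's JSON back into an object before nesting
// it, then serialize the whole chain in one shot. Field names are
// assumed, not taken from the article.
function appendSession(previousJson, newSession) {
  const previous = previousJson === null ? null : JSON.parse(previousJson);
  return JSON.stringify({ ...newSession, previous });
}

let state = appendSession(null, { id: 1 });
state = appendSession(state, { id: 2 });
// state is '{"id":2,"previous":{"id":1,"previous":null}}' — no escaped
// quotes accumulate, so growth is linear in the number of sessions.
```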
| IncRnd wrote:
| That's a case of not solving the problem.
|
| Leaving aside the need to send a massive set of screen images for
| the moment, a severe issue with that protocol is the lack of
| compression. It doesn't need to use a quadtree[1], although that
| will significantly speed up the interface when sliding the tree
| depth to the screen action being taken. Even RLE[2], Huffman[3],
| or any other standard string compression[4][5] would solve this
| issue shown in the blog.
|
| [1] https://en.wikipedia.org/wiki/Quadtree#Compressed_quadtrees
|
| [2] https://en.wikipedia.org/wiki/Run-length_encoding
|
| [3] https://en.wikipedia.org/wiki/Huffman_coding
|
| [4] https://en.wikipedia.org/wiki/Data_compression#Image
|
| [5]
| https://en.wikipedia.org/wiki/Lossless_compression#General_p...
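| For illustration, a minimal run-length encoder (one of the
| standard schemes named above); a long run of backslashes
| collapses to a single count/character pair:

```javascript
// Minimal run-length encoder: collapses each run of a repeated
// character into a [count, character] pair.
function rleEncode(s) {
  const runs = [];
  let i = 0;
  while (i < s.length) {
    let j = i;
    while (j < s.length && s[j] === s[i]) j++; // scan to end of the run
    runs.push([j - i, s[i]]);
    i = j;
  }
  return runs;
}

rleEncode('\\\\\\\\"'); // → [[4, '\\'], [1, '"']] — 4 backslashes, 1 quote
```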
| Genbox wrote:
| Efficient data handling is better than spending cycles on
| generating a lot of backslashes and then burning more cycles on
| compressing them.
| IngvarLynn wrote:
| Assuming 3 backslashes per iteration and one iteration per
| second, it would take 16 years to get to 1.5GB. The problem in
| this code looks deeper to me.
| cyanydeez wrote:
| Every quote in a JSON string requires backslashes. This is a
| geometric explosion. You're assuming very simple state objects.
| NieDzejkob wrote:
| The growth is exponential - each backslash becomes two in the
| next iteration. Thus after n iterations we have 2^n - 1
| backslashes, and we only need 30 iterations to hit a gigabyte
| (and that's assuming only one quotation mark in the original
| JSON).
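| A quick way to watch this in practice (a sketch; the exact
| counts also pick up the wrapping quotes, so they run somewhat
| above 2^n - 1, but the growth is still a doubling per pass):

```javascript
// Repeatedly JSON-stringify a string containing a quote and count the
// backslashes: every existing backslash doubles on each pass, and every
// quote contributes one more.
const countBackslashes = (s) => (s.match(/\\/g) || []).length;

let s = '"';
const counts = [];
for (let i = 0; i < 5; i++) {
  s = JSON.stringify(s);
  counts.push(countBackslashes(s));
}
// counts is [1, 5, 15, 37, 83]: roughly doubling every iteration.
```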
| tgsovlerkhgsel wrote:
| A better fix could have been switching to some other (binary)
| serialization protocol (BSON or similar) - the problem here
| doesn't seem to have been the size of the history itself, but the
| exponential growth of the escaping.
| crdrost wrote:
| Or, y'know, just
|
|     jo.put("previous", previous);
|
| Rather than
|
|     jo.put("previous", previous.toJson());
|
| Presumably the reason they didn't do this is because they
| couldn't handle a recursive schema?
| btown wrote:
| This is reminiscent of some APIs I've had the great pleasure
| of needing to integrate with, where a SOAP API would take XML
| containing Username, Password, Method, Data... where Data was
| a string, itself containing barely-documented XML, but
| serialized as a string before being placed in the XML. You'd
| think that if they knew how to stand up a SOAP API, they'd
| know that XML is literally designed to be nested. Alas, such
| was not the case. I shudder to think about their security
| under the hood...
| alephaleph wrote:
| In the post they say they didn't do this because they'd just
| be trading long strings for deeply nested objects.
| lalaithion wrote:
| Deeply nested objects grow linearly with the size of
| history, not exponentially.
| benatkin wrote:
| _sighs in bencode_
| Phelinofist wrote:
| Why not just restrict the history to a configurable number of
| entries? Having like 20-50 is probably enough. It might not be
| 100% correct, but the effort:use ratio seems good.
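| A sketch of that cap (the cap value and the shape of the
| history are assumptions):

```javascript
// Keep the session history as an array capped at MAX_HISTORY entries,
// newest first, instead of an unbounded nested chain.
const MAX_HISTORY = 20;

function pushScreen(history, screen) {
  // Prepend the new screen and drop anything past the cap.
  return [screen, ...history].slice(0, MAX_HISTORY);
}

let history = [];
for (let id = 1; id <= 50; id++) history = pushScreen(history, { id });
// history.length is 20; history[0].id is 50 (the most recent screen).
```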
| NegativeLatency wrote:
| > Since JSON fields have quotes, and those quotes need to be
| escaped when stored in a string, they are stored as \"
|
| I don't think they do though?
| andrewmackrodt wrote:
| They were serialising the previous object as a string (which
| may also contain a key called previous which is a string) so
| quoted values, backslashes and any other escape characters
| would be escaped. Over multiple screens this grows quickly,
| e.g. this simple struct which has only an id column grows to
| 2KB in only 9 steps:
|
|     let obj = { id: 1 }
|     for (let id = 2; id <= 10; id++) {
|       obj = { id, previous: JSON.stringify(obj) }
|     }
___________________________________________________________________
(page generated 2023-04-08 23:00 UTC)