[HN Gopher] When LIMIT 9 works but LIMIT 10 hangs
       ___________________________________________________________________
        
       When LIMIT 9 works but LIMIT 10 hangs
        
       Author : gmac
       Score  : 173 points
       Date   : 2023-05-31 14:11 UTC (8 hours ago)
        
 (HTM) web link (neon.tech)
 (TXT) w3m dump (neon.tech)
        
       | lanstin wrote:
       | First thing I do when I join a group that has code that writes
       | and reads bytes from the wire is write really big stress tests
       | and keep fixing things till it has 100% reliability, short
        | messages, long messages, slow messages, pipelined messages,
        | etc. etc. There are so many bugs where the code fails if len > N,
       | len % N == 0, len < small N, len < what is read, etc. Like does
       | this mean the driver had never worked for len > 126 or whatever?
        
         | hinkley wrote:
         | Don't forget the perennial classic: len > N && len % N == 1
         | 
         | Due to off by one error, the last byte of a message is never
         | sent if the message is larger than the buffer or the packet
         | size (or not sent until the next message arrives, for stream
         | protocols)
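          | 
          | A minimal sketch of that failure mode (a hypothetical
          | chunked sender, not any real driver's code):

```javascript
// Hypothetical buggy chunked sender: slices a message into CHUNK-sized
// pieces, but the loop condition rounds down, so any trailing partial
// chunk (e.g. len > N && len % N == 1) is never sent.
const CHUNK = 4;

function sendChunksBuggy(msg) {
  const sent = [];
  // Bug: `i + CHUNK <= msg.length` skips the trailing partial chunk.
  for (let i = 0; i + CHUNK <= msg.length; i += CHUNK) {
    sent.push(msg.slice(i, i + CHUNK));
  }
  return Buffer.concat(sent);
}

function sendChunksFixed(msg) {
  const sent = [];
  // Fix: iterate while any bytes remain; slice clamps to msg.length.
  for (let i = 0; i < msg.length; i += CHUNK) {
    sent.push(msg.slice(i, i + CHUNK));
  }
  return Buffer.concat(sent);
}

const msg = Buffer.from("ABCDEFGHI"); // length 9: 9 > 4 && 9 % 4 == 1
console.log(sendChunksBuggy(msg).toString()); // "ABCDEFGH" - last byte lost
console.log(sendChunksFixed(msg).toString()); // "ABCDEFGHI"
```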
        
           | lanstin wrote:
           | Yeah, I had a djb net parser my (work) friends wrote that
           | would end up dropping the connection if the last byte read
           | was the byte before the separating comma.
           | https://en.wikipedia.org/wiki/Netstring
           | 
           | Only happened with very large database reads so that the 1440
           | byte segments hit that spot. New manager and he started
           | mocking me during team meetings on why I was so obsessed with
           | these load tests succeeding and then was amazed at the 2 char
           | fix (adding +1 to the read till length call).
        
         | gmac wrote:
         | Yeah: interestingly, they had a test for the biggest category
         | of frame, but not for the two other categories:
         | https://github.com/nodejs/undici/blob/main/test/websocket/se...
         | 
         | The test I contributed is very specific to the frame fix I
         | made, but I should probably go back and contribute more tests
         | in send.js that test other lengths too.
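          | 
          | For reference, the three frame categories come from RFC 6455's
          | payload-length encoding. A sketch of the boundaries such tests
          | should hit (`encodeLength` is a hypothetical helper, not
          | undici's actual code):

```javascript
// RFC 6455 payload-length encoding: three categories of frame header.
function encodeLength(len) {
  if (len <= 125) {
    return Buffer.from([len]);                // 7-bit length, inline
  } else if (len <= 0xffff) {
    const buf = Buffer.alloc(3);
    buf[0] = 126;                             // marker for 16-bit length
    buf.writeUInt16BE(len, 1);                // bytes 1-2, big-endian
    return buf;
  } else {
    const buf = Buffer.alloc(9);
    buf[0] = 127;                             // marker for 64-bit length
    buf.writeBigUInt64BE(BigInt(len), 1);     // bytes 1-8, big-endian
    return buf;
  }
}

// Boundary lengths a test suite should exercise:
for (const len of [0, 125, 126, 0xffff, 0x10000]) {
  console.log(len, encodeLength(len));
}
```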
        
         | pacaro wrote:
          | This. And profile-guided fuzz testing. Ideally every basis path
          | that serializes/deserializes should be tested.
        
       | charcircuit wrote:
       | I'm surprised there was no mention of "query planner" in the
       | article. That would be the first thing I would check instead of
       | assuming the driver had a bug.
        
       | T3RMINATED wrote:
       | [dead]
        
       | rmbyrro wrote:
       | Why do we seem to like torturing ourselves?..
        
       | barrkel wrote:
       | Rediscovering memory unsafety through performance chasing. Ah
       | well.
        
         | munchler wrote:
         | Seriously. Any design that encourages aliasing of mutable
         | memory like this is just asking for trouble. This is a lesson
         | that the functional programming community has already
         | internalized, but the rest of the world sadly has not.
        
         | jjnoakes wrote:
         | > That 16-bit payload length is actually written somewhere
         | else, as bytes 3 and 4 of Buffer.allocUnsafe()'s global pool.
         | These bytes may well be part of another active buffer, and thus
         | we also stand a good chance of corrupting data elsewhere.
         | 
         | No kidding.
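          | 
          | The pooling being quoted is easy to see directly (a sketch
          | assuming Node's default pool size):

```javascript
// Small Buffer.allocUnsafe allocations come from a shared pre-allocated
// pool (Buffer.poolSize, 8 KiB by default), so buf.buffer is the whole
// pool, not a buffer-sized ArrayBuffer.
const a = Buffer.allocUnsafe(8);
const b = Buffer.allocUnsafe(8);

console.log(a.buffer.byteLength);          // pool size, much larger than 8
console.log(a.byteOffset, b.byteOffset);   // typically different slices of one pool

// A DataView built as `new DataView(a.buffer)` therefore starts at the
// pool's byte 0 - which may sit inside some other live buffer entirely.
```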
        
         | JBiserkov wrote:
         | Indeed!
         | 
         | > the use of DataView here was an optimization that, though
         | intended to speed things up, actually slowed things down ...
         | and broke them utterly in the process.
        
       | londons_explore wrote:
       | For the performance footnote...
       | 
       | I think the misunderstanding comes from the fact calling .write()
       | on sockets is pretty expensive. Specifically, if you write a byte
       | or two, then the OS might send a whole packet out onto the
       | network with just those two bytes.
       | 
       | If you did that for every tiny websocket message, then you would
       | be sending double the number of packets necessary, with matching
       | CPU overhead. Double the chance of network losses causing head of
       | line blocking too.
       | 
       | So... Calling .write() should be done for large buffers and as
       | rarely as possible.
       | 
       | However, in this case, .writeUInt16BE() doesn't send anything
       | onto the network, but presumably the original author has been
       | burned by .write() being expensive before...
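        | 
        | A sketch of the distinction, with a mock socket standing in for
        | net.Socket (on a real socket each write() can become a separate
        | packet, especially with setNoDelay):

```javascript
// Mock socket that records each write() call, to count how many chunks
// would be handed to the OS.
function mockSocket() {
  const writes = [];
  return { writes, write: (buf) => writes.push(Buffer.from(buf)) };
}

const header = Buffer.from([0x82, 0x05]); // tiny frame header
const payload = Buffer.from('hello');

// Two tiny writes...
const s1 = mockSocket();
s1.write(header);
s1.write(payload);

// ...versus one batched write of the same bytes. Note that
// Buffer.writeUInt16BE, by contrast, never touches the network at all:
// it only pokes two bytes into an in-memory buffer.
const s2 = mockSocket();
s2.write(Buffer.concat([header, payload]));

console.log(s1.writes.length, s2.writes.length); // 2 1
```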
        
       | jwildeboer wrote:
       | What I _really_ like about this story is that free and unfettered
       | access to the standard /specs, the implementation, the ability to
        | spin up a reproducer, to see _all_ moving parts with X-ray
        | vision, to come to a solution and have said solution accepted
        | upstream, all in one go, at your own pace, AND being able to share
       | the whole story without needing permission - the power of Open.
       | Not working around bugs in whatever proprietary thing.
        | Merveilleux! Excuse my French. #OpenFTW
        
         | nine_k wrote:
         | Absolutely.
         | 
         | Once upon a time I encountered some small issue (not a bug)
         | with Postgres. I was able to get to the very source of it in
         | the C code, understand what's happening, and adjust my code
         | accordingly. It took maybe an hour.
         | 
         | Upon doing that, I shuddered, as I remembered troubleshooting
         | an issue (a real confirmed bug) inside Oracle 9. It was by
         | poking in the dark, trying to trigger a condition that made a
         | server process crash, then trying to help Oracle engineers
         | reproduce that. It took a week.
        
           | JohnFen wrote:
           | > I was able to get to the very source of it in the C code,
           | understand what's happening, and adjust my code accordingly.
           | 
           | The ability to do this has saved my bacon on more than one
           | occasion. It's why (outside of work I do for an employer) I
           | won't develop using libraries/frameworks that I don't have
           | the source code to.
        
       | JdeBP wrote:
       | What interests me is _why_ the developers originally thought that
       | interposing a DataView() was a speed improvement, and why the
       | benchmark results at
       | https://github.com/nodejs/performance/issues/2 are at odds with
       | the author's benchmarks in the headlined article.
        
         | Thorrez wrote:
         | That link seems to be doing a bunch of writes on the same
         | buffer, but in the blog it's just doing a single write on each
         | buffer.
        
       | Izkata wrote:
       | Oh hell I think I encountered almost this same bug a few weeks
       | ago, but in python code. A postgres query suddenly stopped
       | working and while fiddling with it I figured out it was only when
       | over the network (local psql worked), and only when the query was
       | long enough. The "fix" I settled on without figuring out the root
       | cause was to change an "AS foobar" to "AS foo" in the SELECT,
       | reducing the total query size.
        
       | gmac wrote:
       | This is ultimately a story about a WebSockets bug and a classic
       | JS footgun in `new DataView()`. I'm the post author: AMA.
        
         | projektfu wrote:
         | It seems like NodeJS' Buffer.buffer (->ArrayBuffer) contains a
         | footgun as well, in that you can't rely on the ArrayBuffer
         | layout being identical to what the user of Buffer sees.
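          | 
          | Concretely, the safe pattern is to hand DataView the Buffer's
          | own offset and length (a minimal sketch):

```javascript
// The footgun: `buf` from allocUnsafe is usually a slice of a pooled
// ArrayBuffer, so a DataView over `buf.buffer` alone starts at the
// pool's byte 0, not at buf's byte 0.
const buf = Buffer.allocUnsafe(8);

const wrong = new DataView(buf.buffer); // byteOffset defaults to 0: pool start
const right = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);

right.setUint16(2, 0xcafe);                    // big-endian by default
console.log(buf.readUInt16BE(2).toString(16)); // "cafe" - the view hits buf
```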
        
       | fnordpiglet wrote:
       | I originally thought this would be a story about query optimizers
       | and their stupidity in the face of the banal. But it was
       | infinitely better! I've never had the chance to decrypt tls with
       | wireshark so bookmarking for future reference.
       | 
       | I'm a little surprised at the statement that ~10%+ performance on
       | a transport protocol isn't worth it. 1 million frames doesn't
       | seem like that much, and some workloads see these things scaled
       | out as front end servers mostly managing stuff like websockets /
       | JSON / utf nonsense. I'd argue if it's deep in the kernel of
       | execution and well abstracted in the higher level logic a cryptic
        | bit fiddle with a comment, even perhaps with its more common
        | equivalent in the comment, might be worthwhile.
       | 
       | That leads me to the ultimate question: what was the Pokemon
       | table used for?
        
         | gmac wrote:
         | > I'm a little surprised at the statement that ~10%+
         | performance on a transport protocol isn't worth it.
         | 
         | Well, depending on which contrast we're making, the performance
         | difference on that one line is either 10ms (by which bit-
          | twiddling is faster than Uint8Array.writeUInt16BE) or 100ms (by
         | which Uint8Array.writeUInt16BE is faster than
         | DataView.setUint16) per million frames. I don't think that's
         | going to be anything like 10% of the total CPU time per frame.
         | Unless I've misunderstood your point here?
         | 
         | > That leads me to the ultimate question: what was the Pokemon
         | table used for?
         | 
         | You know what, I just assumed it was a test data table, but I
         | never asked!
        
           | fnordpiglet wrote:
           | I guess my time doing low latency work and rust programming
           | makes me see those numbers and want to shave that time off
           | since it's inside a protocol which you expect to be called
           | millions of times in tight succession potentially dominating
            | the machine's performance on a front-end node doing nothing
            | but shuttling data. Definitely a lot of other surrounding
            | stuff will contribute - but a march of a million miles
            | starts with a single step ;-) certainly in a higher
           | level business logic world I could see not squeezing every
           | cpu cycle and optimizing for clarity of reading, but at very
           | low levels of tight loops (at least in my world) time spent
           | here is time not spent on stuff the developer cares about :-)
        
             | norir wrote:
             | Yeah, I don't really understand why they didn't just
             | comment out the high level function call (option 3) and
             | inline option 4 below. Then they could have clarity and
             | performance.
        
               | gmac wrote:
               | Sure, that was also an option. Feel free to submit
               | another PR to the maintainers!
        
       | irrational wrote:
       | This is the first time I've ever seen anyone describe themselves
       | as a "typescript developer". Though, I do recall hearing people
       | call themselves Java developers in the past. I guess I'm more
       | used to terms like senior developer, backend developer, web
       | developer, etc. rather than a particular language developer.
        
         | phpnode wrote:
         | Even worse, I'm a typescript _consultant_
        
         | jjeaff wrote:
         | A web/frontend developer is always a typescript developer if
         | the job you are applying for is looking for someone with
         | typescript experience.
        
       | nikita wrote:
       | Neon CEO here. Happy to answer questions about the serverless
       | driver and beyond.
        
         | ddorian43 wrote:
         | I see you support timescaledb extension.
         | 
         | How does that work? Is a "chunk" in timescaledb split into
         | "pages" and just streamed from S3?
        
           | nikita wrote:
           | Neon works at the storage level:
           | https://neon.tech/docs/introduction/architecture-overview
           | 
           | So all extensions including Timescale work. Postgres does the
           | heavy lifting
        
             | btown wrote:
             | Does this affect ACID guarantees at all? Could Postgres
             | ever expect to read back a page with currently-changing
             | data from the Pageserver before those changes get
             | propagated there? Will the Pageserver cluster block I/O if
             | it's waiting on a page to reach a consistent state?
        
               | nikita wrote:
                | Neon is as consistent as Postgres, because it _is_
                | Postgres. All we did was swap out the storage layer.
        
         | re-thc wrote:
         | AWS released Aurora I/O-Optimized that for a higher storage
         | cost no longer charges for I/O (quoted 40% savings).
         | 
         | Is such a mode being considered for Neon? If not, will there be
         | future adjustments in pricing?
        
         | re-thc wrote:
         | How does the serverless driver and connection pooling impact
         | pricing? Will it keep instances active and hence run for all of
         | the month?
        
           | nikita wrote:
            | No, the compute shuts down after 5 minutes of inactivity,
            | regardless of active connections.
        
       | williamdclt wrote:
        | You can take away learnings about WebSockets, about DataView,
        | about buffers and allocation, but...
        | 
        | To me, the root cause is "optional arguments are evil". I don't
        | blame the undici people, I blame the API design of DataView,
        | which makes it easier for the dev to skip these critical
        | parameters than to think about them.
        | 
        | Optional arguments are much more often than not the wrong API.
        
       | Jupe wrote:
        | Wait, what?
        | 
        |     order by random()
       | 
       | Does that even make sense? Why would anyone ever need this?
       | Better to order by a stored value and randomize the presentation
       | order, IMO.
       | 
       | Seems completely dependent on when random() gets evaluated. If it
       | gets evaluated at every inspection of the row during the sort,
        | I'm a little amazed that it hangs sometimes.
        
         | kevincox wrote:
         | It does make sense. It gets you a non-repeating sample of the
         | table.
         | 
         | It is probably not very efficient, but semantically it is
         | fairly straightforward. In fact I thought that "hang" was going
         | to mean "took so long I thought it was hung" but that turned
          | out not to be the issue here.
        
         | dymk wrote:
         | You'd use order by random() when you want to get a random
         | sample of rows in a table. Without that, you'd end up with rows
         | clustered according to some internal value, probably ordered by
         | the primary key.
         | 
         | Of course, this is a bad idea if you have a table with lots of
         | rows, and you'd be better off rolling an N-sided dice M times,
         | where N is the number of rows in the table.
        
           | acatton wrote:
           | But ORDER BY random() will scan the whole table and assign
           | values.
           | 
            | This is the plan I get:
            | 
            |     Limit  (cost=56.39..56.41 rows=10 width=40)
            |       ->  Sort  (cost=56.39..59.79 rows=1360 width=40)
            |             Sort Key: (random())
            |             ->  Seq Scan on test  (cost=0.00..27.00 rows=1360 width=40)
           | 
           | One should use "FROM table TABLESAMPLE SYSTEM (size)", this
            | is the plan:
            | 
            |     Sample Scan on test  (cost=0.00..5.36 rows=136 width=32)
            |       Sampling: system ('10'::real)
           | 
            | You can also use a different sampling method than "SYSTEM",
           | and you can make it reproducible with a seed.
           | 
           | https://www.postgresql.org/docs/current/sql-
           | select.html#SQL-...
           | 
            | ORDER BY random() is a really unoptimized way to
            | get a random sample on PostgreSQL. If you use another SQL
           | database, the best thing to do is to use UUIDv4 as primary
           | key, and use "ORDER BY pk LIMIT size".
        
             | tedunangst wrote:
             | > If you use another SQL database, the best thing to do is
             | to use UUIDv4 as primary key, and use "ORDER BY pk LIMIT
             | size".
             | 
             | What if I want a different random order tomorrow?
        
               | acatton wrote:
               | > > If you use another SQL database, the best thing to do
               | is to use UUIDv4 as primary key, and use "ORDER BY pk
               | LIMIT size".
               | 
               | > What if I want a different random order tomorrow?
               | 
                | If your data is large and you want different samples,
                | you can always "select * from table where pk >
                | gen_random_uuid() order by pk limit 10"
                | 
                | You have a very small chance to get ffffffff-... and get
                | zero elements.
                | 
                | This is not perfect, I admit, but ORDER BY random() is
                | one of the most wasteful things.
               | 
                | If getting random samples is very important, the best you
                | can do is migrate to postgres and use TABLESAMPLE.
        
               | __alexs wrote:
               | Then you sort descending of course /s
        
             | tmoertel wrote:
             | Note that `FROM table TABLESAMPLE SYSTEM (n/N)` is a safe
              | substitution for `ORDER BY random() LIMIT n` only when you
             | don't need a sample that's guaranteed to be from a uniform
             | distribution of rows. That's because TABLESAMPLE SYSTEM
             | samples blocks, not individual rows:
             | 
             | > The SYSTEM method does block-level sampling with each
             | block having the specified chance of being selected; all
             | rows in each selected block are returned.
        
             | wolf550e wrote:
             | order by stored uuidv4 will get you the same rows the next
             | time you run it. If you sample using where random() < 0.001
             | (or something like that) you'll get different rows each
             | time.
        
             | cryptonector wrote:
             | > But ORDER BY random() will scan the whole table and
             | assign values.
             | 
             | Er, no, it will assign values to the result set, which may
             | not involve a full table scan.
        
               | acatton wrote:
               | Good point, but in the case of OP, without condition, it
               | will do a full table scan.
               | 
               | Also, any case of "WHERE filter ORDER BY random() LIMIT
               | n" means "I want a tiny sample from a broad filter". So
               | it might not be a full table scan, but it will still
               | assign value to a lot of rows.
        
           | Jupe wrote:
            | Agreed. (And thanks; never came across this before... ever,
            | in 20 years.)
           | 
           | I guess some projects need to randomly select a few rows from
           | a table? I was thinking of the wikipedia random topic, but
           | that's only one-at-a-time, not a set of records.
        
         | awestroke wrote:
         | The hanging query has nothing at all to do with the random
         | ordering. RTFA
        
         | acatton wrote:
         | Yeah. It's odd that you got down-voted for this comment. "ORDER
         | BY random() LIMIT 10" is known to be slow on large tables. The
          | correct way to get a random sample is to use "SELECT ...
         | FROM table TABLESAMPLE SYSTEM (10)"
         | https://www.postgresql.org/docs/current/sql-select.html#SQL-...
         | 
         | You can even use a repeatable seed with this method.
        
           | tmoertel wrote:
           | Note that TABLESAMPLE SYSTEM is not guaranteed to give you a
           | uniform sample, as it samples blocks, whereas ORDER BY
           | RANDOM() LIMIT samples uniformly distributed rows. Some
           | database systems support TABLESAMPLE RESERVOIR to guarantee
           | uniform sampling behavior.
        
         | tmoertel wrote:
         | The `ORDER BY RANDOM() LIMIT n` idiom is fairly well
         | established for taking a random sample of uniformly distributed
         | rows. It is also easily augmented to take weighted samples:
         | `WHERE weight > 0 ORDER BY -LN(RANDOM())/weight LIMIT n`.
         | 
          | Queries of this form are easily parallelized. Most systems
         | that support distributed query processing, for example,
         | optimize ORDER/LIMIT n queries such that each worker returns
         | just the top n rows from its partition. These top rows are then
         | merged in a central worker and limited one last time to produce
         | the final result.
         | 
         | Even so, if your system supports the TABLESAMPLE clause and
         | offers a formulation that lets you specify your actual
         | distribution of interest (TABLESAMPLE SYSTEM often does not),
         | you're probably better off using it.
        
           | tmoertel wrote:
           | For anyone who is curious, the weighted sampling logic is the
           | SQL translation of Algorithm A from Efraimidis and Spirakis's
            | 2006 paper [1]:
            | 
            |     Algorithm A
            |     Input: A population V of weighted items
            |     Output: A WRS of size n
            |     1: For each v[i] in V, u[i] = random(0, 1) and k[i] = u[i]^(1/w[i])
            |     2: Select the n items with the largest keys k[i] as a WRS
           | 
            | Instead of the more direct translation of `ORDER BY
            | POW(RANDOM(), (1/weight)) DESC`, we use the convenient fact
            | that logarithm is a monotone transform over the positive
            | reals to take the log of the ORDER BY argument without
            | changing the sort order:
            | 
            |        ORDER BY POW(RANDOM(), (1/weight)) DESC
            |     => ORDER BY LN(POW(RANDOM(), (1/weight))) DESC  -- LN preserves order.
            |     => ORDER BY LN(RANDOM()) / weight DESC          -- LN(x^a) = LN(x) * a.
            |     => ORDER BY -LN(RANDOM()) / weight              -- `x DESC` is same as `-x`.
           | 
           | [1] Pavlos S Efraimidis and Paul G Spirakis. Weighted random
           | sampling with a reservoir. Information Processing Letters,
           | 97(5):181-185, 2006
        
       | revskill wrote:
        | The Neon open-source version is too "unstable" for me, though.
        
         | nikita wrote:
         | How do you run it?
        
           | revskill wrote:
            | I use the docker-compose provided via GitHub.
        
       ___________________________________________________________________
       (page generated 2023-05-31 23:00 UTC)