[HN Gopher] When LIMIT 9 works but LIMIT 10 hangs
___________________________________________________________________
When LIMIT 9 works but LIMIT 10 hangs
Author : gmac
Score : 173 points
Date : 2023-05-31 14:11 UTC (8 hours ago)
(HTM) web link (neon.tech)
(TXT) w3m dump (neon.tech)
| lanstin wrote:
| First thing I do when I join a group that has code that writes
| and reads bytes from the wire is write really big stress tests
| and keep fixing things till it has 100% reliability, short
| messages, long messages, slow messages, pipe lined messageds,
| etc. etc. There's so many bugs where the code fails if len > N,
| len % N == 0, len < small N, len < what is read, etc. Like does
| this mean the driver had never worked for len > 126 or whatever?
| hinkley wrote:
| Don't forget the perennial classic: len > N && len % N == 1
|
| Due to off by one error, the last byte of a message is never
| sent if the message is larger than the buffer or the packet
| size (or not sent until the next message arrives, for stream
| protocols)
| lanstin wrote:
| Yeah, I had a djb net parser my (work) friends wrote that
| would end up dropping the connection if the last byte read
| was the byte before the separating comma.
| https://en.wikipedia.org/wiki/Netstring
|
| Only happened with very large database reads so that the 1440
| byte segments hit that spot. New manager and he started
| mocking me during team meetings on why I was so obsessed with
| these load tests succeeding and then was amazed at the 2 char
| fix (adding +1 to the read till length call).
| gmac wrote:
| Yeah: interestingly, they had a test for the biggest category
| of frame, but not for the two other categories:
| https://github.com/nodejs/undici/blob/main/test/websocket/se...
|
| The test I contributed is very specific to the frame fix I
| made, but I should probably go back and contribute more tests
| in send.js that test other lengths too.
| pacaro wrote:
| This. And profile guided fuzz testing. Ideally every basis path
| that serializes/deserializes should be tested
| charcircuit wrote:
| I'm surprised there was no mention of "query planner" in the
| article. That would be the first thing I would check instead of
| assuming the driver had a bug.
| T3RMINATED wrote:
| [dead]
| rmbyrro wrote:
| Why do we seem to like torturing ourselves?..
| barrkel wrote:
| Rediscovering memory unsafety through performance chasing. Ah
| well.
| munchler wrote:
| Seriously. Any design that encourages aliasing of mutable
| memory like this is just asking for trouble. This is a lesson
| that the functional programming community has already
| internalized, but the rest of the world sadly has not.
| jjnoakes wrote:
| > That 16-bit payload length is actually written somewhere
| else, as bytes 3 and 4 of Buffer.allocUnsafe()'s global pool.
| These bytes may well be part of another active buffer, and thus
| we also stand a good chance of corrupting data elsewhere.
|
| No kidding.
| JBiserkov wrote:
| Indeed!
|
| > the use of DataView here was an optimization that, though
| intended to speed things up, actually slowed things down ...
| and broke them utterly in the process.
| londons_explore wrote:
| For the performance footnote...
|
| I think the misunderstanding comes from the fact calling .write()
| on sockets is pretty expensive. Specifically, if you write a byte
| or two, then the OS might send a whole packet out onto the
| network with just those two bytes.
|
| If you did that for every tiny websocket message, then you would
| be sending double the number of packets necessary, with matching
| CPU overhead. Double the chance of network losses causing head of
| line blocking too.
|
| So... Calling .write() should be done for large buffers and as
| rarely as possible.
|
| However, in this case, .writeUInt16BE() doesn't send anything
| onto the network, but presumably the original author has been
| burned by .write() being expensive before...
| jwildeboer wrote:
| What I _really_ like about this story is that free and unfettered
| access to the standard /specs, the implementation, the ability to
| spin up a reproducer, to see _all_ moving parts with X-ray
| vision, come to a solution and having said solution accepted
| upstream, all in one go, at your own pace AND being able to share
| the whole story without needing permission - the power of Open.
| Not working around bugs in whatever proprietary thing.
| Mervellieux! Excuse my French. #OpenFTW
| nine_k wrote:
| Absolutely.
|
| Once upon a time I encountered some small issue (not a bug)
| with Postgres. I was able to get to the very source of it in
| the C code, understand what's happening, and adjust my code
| accordingly. It took maybe an hour.
|
| Upon doing that, I shuddered, as I remembered troubleshooting
| an issue (a real confirmed bug) inside Oracle 9. It was by
| poking in the dark, trying to trigger a condition that made a
| server process crash, then trying to help Oracle engineers
| reproduce that. It took a week.
| JohnFen wrote:
| > I was able to get to the very source of it in the C code,
| understand what's happening, and adjust my code accordingly.
|
| The ability to do this has saved my bacon on more than one
| occasion. It's why (outside of work I do for an employer) I
| won't develop using libraries/frameworks that I don't have
| the source code to.
| JdeBP wrote:
| What interests me is _why_ the developers originally thought that
| interposing a DataView() was a speed improvement, and why the
| benchmark results at
| https://github.com/nodejs/performance/issues/2 are at odds with
| the author's benchmarks in the headlined article.
| Thorrez wrote:
| That link seems to be doing a bunch of writes on the same
| buffer, but in the blog it's just doing a single write on each
| buffer.
| Izkata wrote:
| Oh hell I think I encountered almost this same bug a few weeks
| ago, but in python code. A postgres query suddenly stopped
| working and while fiddling with it I figured out it was only when
| over the network (local psql worked), and only when the query was
| long enough. The "fix" I settled on without figuring out the root
| cause was to change an "AS foobar" to "AS foo" in the SELECT,
| reducing the total query size.
| gmac wrote:
| This is ultimately a story about a WebSockets bug and a classic
| JS footgun in `new DataView()`. I'm the post author: AMA.
| projektfu wrote:
| It seems like NodeJS' Buffer.buffer (->ArrayBuffer) contains a
| footgun as well, in that you can't rely on the ArrayBuffer
| layout being identical to what the user of Buffer sees.
| fnordpiglet wrote:
| I originally thought this would be a story about query optimizers
| and their stupidity in the face of the banal. But it was
| infinitely better! I've never had the chance to decrypt tls with
| wireshark so bookmarking for future reference.
|
| I'm a little surprised at the statement that ~10%+ performance on
| a transport protocol isn't worth it. 1 million frames doesn't
| seem like that much, and some workloads see these things scaled
| out as front end servers mostly managing stuff like websockets /
| JSON / utf nonsense. I'd argue if it's deep in the kernel of
| execution and well abstracted in the higher level logic a cryptic
| bit fiddle with a comment, even perhaps with it's more common
| equivalent in the comment, might be worth while.
|
| That leads me to the ultimate question: what was the Pokemon
| table used for?
| gmac wrote:
| > I'm a little surprised at the statement that ~10%+
| performance on a transport protocol isn't worth it.
|
| Well, depending on which contrast we're making, the performance
| difference on that one line is either 10ms (by which bit-
| twiddling is faster than Uint8Array.writeUint16BE) or 100ms (by
| which Uint8Array.writeUInt16BE is faster than
| DataView.setUint16) per million frames. I don't think that's
| going to be anything like 10% of the total CPU time per frame.
| Unless I've misunderstood your point here?
|
| > That leads me to the ultimate question: what was the Pokemon
| table used for?
|
| You know what, I just assumed it was a test data table, but I
| never asked!
| fnordpiglet wrote:
| I guess my time doing low latency work and rust programming
| makes me see those numbers and want to shave that time off
| since it's inside a protocol which you expect to be called
| millions of times in tight succession potentially dominating
| the machines performance on a front end node doing nothing
| but shuttling data. Definitely there will be a lot of other
| stuff contributing surrounding - but a march of a million
| miles starts with a single step ;-) certainly in a higher
| level business logic world I could see not squeezing every
| cpu cycle and optimizing for clarity of reading, but at very
| low levels of tight loops (at least in my world) time spent
| here is time not spent on stuff the developer cares about :-)
| norir wrote:
| Yeah, I don't really understand why they didn't just
| comment out the high level function call (option 3) and
| inline option 4 below. Then they could have clarity and
| performance.
| gmac wrote:
| Sure, that was also an option. Feel free to submit
| another PR to the maintainers!
| irrational wrote:
| This is the first time I've ever seen anyone describe themselves
| as a "typescript developer". Though, I do recall hearing people
| call themselves Java developers in the past. I guess I'm more
| used to terms like senior developer, backend developer, web
| developer, etc. rather than a particular language developer.
| phpnode wrote:
| Even worse, I'm a typescript _consultant_
| jjeaff wrote:
| A web/frontend developer is always a typescript developer if
| the job you are applying for is looking for someone with
| typescript experience.
| nikita wrote:
| Neon CEO here. Happy to answer questions about the serverless
| driver and beyond.
| ddorian43 wrote:
| I see you support timescaledb extension.
|
| How does that work? Is a "chunk" in timescaledb split into
| "pages" and just streamed from S3?
| nikita wrote:
| Neon works at the storage level:
| https://neon.tech/docs/introduction/architecture-overview
|
| So all extensions including Timescale work. Postgres does the
| heavy lifting
| btown wrote:
| Does this affect ACID guarantees at all? Could Postgres
| ever expect to read back a page with currently-changing
| data from the Pageserver before those changes get
| propagated there? Will the Pageserver cluster block I/O if
| it's waiting on a page to reach a consistent state?
| nikita wrote:
| Neon is as consistent as Postgres. B/c it's Postgres. All
| we did was swapping storage.
| re-thc wrote:
| AWS released Aurora I/O-Optimized that for a higher storage
| cost no longer charges for I/O (quoted 40% savings).
|
| Is such a mode being considered for Neon? If not, will there be
| future adjustments in pricing?
| re-thc wrote:
| How does the serverless driver and connection pooling impact
| pricing? Will it keep instances active and hence run for all of
| the month?
| nikita wrote:
| No, the compute shuts down on inactivity in 5 minutes
| regardless of active connections.
| williamdclt wrote:
| You can takeaway learnings about web sockets, about DataFrame,
| about buffers and allocation but...
|
| To me, the root cause is "optional arguments are evil". I don't
| blame the undici people, I blame the API design of DataFrame that
| makes it easier for the dev to not think about these critical
| parameters that to do it.
|
| Optional arguments are much more often than not the wrong API.
| Jupe wrote:
| Wait, what? order by random()
|
| Does that even make sense? Why would anyone ever need this?
| Better to order by a stored value and randomize the presentation
| order, IMO.
|
| Seems completely dependent on when random() gets evaluated. If it
| gets evaluated at every inspection of the row during the sort,
| I'm little amazed that it hangs some times.
| kevincox wrote:
| It does make sense. It gets you a non-repeating sample of the
| table.
|
| It is probably not very efficient, but semantically it is
| fairly straightforward. In fact I thought that "hang" was going
| to mean "took so long I thought it was hung" but that turned
| out not be the issue here.
| dymk wrote:
| You'd use order by random() when you want to get a random
| sample of rows in a table. Without that, you'd end up with rows
| clustered according to some internal value, probably ordered by
| the primary key.
|
| Of course, this is a bad idea if you have a table with lots of
| rows, and you'd be better off rolling an N-sided dice M times,
| where N is the number of rows in the table.
| acatton wrote:
| But ORDER BY random() will scan the whole table and assign
| values.
|
| This is the plan I get: Limit
| (cost=56.39..56.41 rows=10 width=40) -> Sort
| (cost=56.39..59.79 rows=1360 width=40) Sort Key:
| (random()) -> Seq Scan on test (cost=0.00..27.00
| rows=1360 width=40)
|
| One should use "FROM table TABLESAMPLE SYSTEM (size)", this
| is the plan: Sample Scan on test
| (cost=0.00..5.36 rows=136 width=32) Sampling: system
| ('10'::real)
|
| You can also use different distribution method than "SYSTEM",
| and you can make it reproducible with a seed.
|
| https://www.postgresql.org/docs/current/sql-
| select.html#SQL-...
|
| ORDER BY random() is a really un-optimized bad practice to
| get a random sample on PostgreSQL. If you use another SQL
| database, the best thing to do is to use UUIDv4 as primary
| key, and use "ORDER BY pk LIMIT size".
| tedunangst wrote:
| > If you use another SQL database, the best thing to do is
| to use UUIDv4 as primary key, and use "ORDER BY pk LIMIT
| size".
|
| What if I want a different random order tomorrow?
| acatton wrote:
| > > If you use another SQL database, the best thing to do
| is to use UUIDv4 as primary key, and use "ORDER BY pk
| LIMIT size".
|
| > What if I want a different random order tomorrow?
|
| If your data is large, and you want different samples.
| You can always always "select * from pk >
| gen_random_uuid() order by pk limit 10"
|
| You have a very small chance to get ffffffff-... and get
| zero elements.
|
| This is not perfect, I admit, but ORDER BY random() is
| one of the most wasteful thing.
|
| If getting random samples is very important, the best you
| can you is migrate to postgres and use TABLESAMPLE.
| __alexs wrote:
| Then you sort descending of course /s
| tmoertel wrote:
| Note that `FROM table TABLESAMPLE SYSTEM (n/N)` is a safe
| substituion for `ORDER BY random() LIMIT n` only when you
| don't need a sample that's guaranteed to be from a uniform
| distribution of rows. That's because TABLESAMPLE SYSTEM
| samples blocks, not individual rows:
|
| > The SYSTEM method does block-level sampling with each
| block having the specified chance of being selected; all
| rows in each selected block are returned.
| wolf550e wrote:
| order by stored uuidv4 will get you the same rows the next
| time you run it. If you sample using where random() < 0.001
| (or something like that) you'll get different rows each
| time.
| cryptonector wrote:
| > But ORDER BY random() will scan the whole table and
| assign values.
|
| Er, no, it will assign values to the result set, which may
| not involve a full table scan.
| acatton wrote:
| Good point, but in the case of OP, without condition, it
| will do a full table scan.
|
| Also, any case of "WHERE filter ORDER BY random() LIMIT
| n" means "I want a tiny sample from a broad filter". So
| it might not be a full table scan, but it will still
| assign value to a lot of rows.
| Jupe wrote:
| Agreed. (And thanks; never came across this before... ever.
| in 20 years)
|
| I guess some projects need to randomly select a few rows from
| a table? I was thinking of the wikipedia random topic, but
| that's only one-at-a-time, not a set of records.
| awestroke wrote:
| The hanging query has nothing at all to do with the random
| ordering. RTFA
| acatton wrote:
| Yeah. It's odd that you got down-voted for this comment. "ORDER
| BY random() LIMIT 10" is known to be slow on large tables. The
| correct way to do get a random sample is to use "SELECT ...
| FROM table TABLESAMPLE SYSTEM (10)"
| https://www.postgresql.org/docs/current/sql-select.html#SQL-...
|
| You can even use a repeatable seed with this method.
| tmoertel wrote:
| Note that TABLESAMPLE SYSTEM is not guaranteed to give you a
| uniform sample, as it samples blocks, whereas ORDER BY
| RANDOM() LIMIT samples uniformly distributed rows. Some
| database systems support TABLESAMPLE RESERVOIR to guarantee
| uniform sampling behavior.
| tmoertel wrote:
| The `ORDER BY RANDOM() LIMIT n` idiom is fairly well
| established for taking a random sample of uniformly distributed
| rows. It is also easily augmented to take weighted samples:
| `WHERE weight > 0 ORDER BY -LN(RANDOM())/weight LIMIT n`.
|
| Queries of this form are easilly parallelized. Most systems
| that support distributed query processing, for example,
| optimize ORDER/LIMIT n queries such that each worker returns
| just the top n rows from its partition. These top rows are then
| merged in a central worker and limited one last time to produce
| the final result.
|
| Even so, if your system supports the TABLESAMPLE clause and
| offers a formulation that lets you specify your actual
| distribution of interest (TABLESAMPLE SYSTEM often does not),
| you're probably better off using it.
| tmoertel wrote:
| For anyone who is curious, the weighted sampling logic is the
| SQL translation of Algorithm A from Efraimidis and Spirakis's
| 2006 paper [1]: Algorithm A Input:
| A population V of weighted items Output: A WRS of
| size n 1: For each v[i] in V, u[i] = random(0, 1) and
| k[i] = u^(1/w[i]) 2: Select the n items with the
| largest keys k[i] as a WRS
|
| Instead of the more direct tanslation of `ORDER BY
| POW(RANDOM(), (1/weight)) DESC`, we use the convenient fact
| that logarithm is a monotone transform over the positive
| reals to take the log of the ORDER BY argument without
| changing the sort order: ORDER BY
| POW(RANDOM(), (1/weight)) DESC => ORDER BY
| LN(POW(RANDOM(), (1/weight))) DESC -- LN preserves order.
| => ORDER BY LN(RANDOM()) / weight DESC -- LN(x^a) =
| LN(x) * a. => ORDER BY -LN(RANDOM()) / weight
| -- Order by `x DESC` is same as by `-x`.
|
| [1] Pavlos S Efraimidis and Paul G Spirakis. Weighted random
| sampling with a reservoir. Information Processing Letters,
| 97(5):181-185, 2006
| revskill wrote:
| Neon open source version is too "unstable" to me though.
| nikita wrote:
| How do you run it?
| revskill wrote:
| I use the docker-compose provided via GIthub.
___________________________________________________________________
(page generated 2023-05-31 23:00 UTC)