[HN Gopher] Windyquery: A non-blocking Python PostgreSQL query b...
___________________________________________________________________
Windyquery: A non-blocking Python PostgreSQL query builder
Author : asiftr
Score : 76 points
Date : 2021-05-10 05:36 UTC (17 hours ago)
(HTM) web link (pypi.org)
(TXT) w3m dump (pypi.org)
| justsomeuser wrote:
| Is there any point in having a query builder API in a dynamic
| language?
|
| I can see the point in a typed language where you can ensure the
| query is correct at compile time.
|
| But in a dynamic language, you never know if the query is correct
| until the interpreter hits that line in the program at runtime.
|
| This is true with both raw SQL (error returned from server) and
| the query builder API (error returned from interpreter).
| aforwardslash wrote:
| > Is there any point in having a query builder API in a dynamic
| language?
|
| Yes, it eases the creation of queries and ensuring both
| escaping of parameters and quoting of columns and fields; It
| also allows the easy creation of dynamic queries based on lists
| of parameters, and facilitates both naming of tables and
| fields.
|
| > I can see the point in a typed language where you can ensure
| the query is correct at compile time.
|
| Given that most query builders allow arbitrary parameters (such
| as table and column names), you can't actually ensure the query
| is correct at compile time, regardless of the type of language.
| justsomeuser wrote:
| > Yes, it eases the creation of queries and ensuring both
| escaping of parameters and quoting of columns and fields; It
| also allows the easy creation of dynamic queries based on
| lists of parameters, and facilitates both naming of tables
| and fields.
|
| But in Python all these are checked at runtime.
|
| Just like it is possible to not quote columns and fields in
| raw SQL, it is possible to mis-use the query builder API and
| only find out at runtime on the same line as you would with
| raw SQL.
|
| Perhaps the dynamic queries may be easier with the query
| builder, but you still find out about issues at the same time
| as raw sql/string templates.
|
| > Given that most query builders allow arbitrary parameters
| (such as table and column names), you can't actually ensure
| the query is correct at compile time, regardless of the type
| of language.
|
| You cannot check that the query is correct for a given schema
| (missing tables/cols etc), but I think many typed languages
| would allow you to ensure the AST is correct at compile time.
| aforwardslash wrote:
| > Perhaps the dynamic queries may be easier with the query
| builder, but you still find out about issues at the same
| time as raw sql/string templates.
|
| As with everything, its a tradeoff and a matter of taste.
| SQL Query builders give you the flexibility of SQL, without
| relying on string concatenation directly, and making sure
| all the values are passed in order. Also, they usually can
| be extended to support multiple databases/adapters (imagine
| specifying values as $1, $2... vs ?,?, quoting identifiers
| using " or `, etc). They also usually fit better than plain
| strings in heavily programmatic contexts (imagine a query
| with different fields and filters depending on the value of
| a given field).
|
| > You cannot check that the query is correct for a given
| schema (missing tables/cols etc), but I think many typed
| languages would allow you to ensure the AST is correct at
| compile time.
|
| There are usually 2 types of query builders - SQL query
| builders, and DSL query builders. DSL ones are usually
| integrated with an ORM or something similar, and those are
| usually abstract enough to guarantee that the generated SQL
| and passed types are syntactically correct, even if invalid
| (ex. mentioning a column or a table that doesn't exist).
| SQL query builders (at least the ones I've seen) don't use
| an AST, as their purpose is to map methods to SQL chunks.
| They basically keep a list of different parts to assemble
| and then just build from those lists when generating the
| actual SQL. Some may provide object mapper capabilities,
| and in that case, some type checking may be performed.
| "May" in the sense that eg. a 14th century date is a valid
| date, but SQL Server will most certainly reject it, so in
| the end there are no "compile time guarantees" in this
| case.
| mattashii wrote:
| > Given that most query builders allow arbitrary parameters
| (such as table and column names), you can't actually ensure
| the query is correct at compile time, regardless of the type
| of language.
|
| JOOQ [0] does a great job of providing a type-safe query
| builder that can guarantee correct queries at compile-time.
| It indeed also supplies arbitrary strings for names and sql
| components which will break that guarantee, but apart from
| that you can extract your databases' schema into typed
| classes and use those to run your queries.
|
| I really like it, but it is definately not a replacement for
| an ORM.
|
| [0] https://github.com/jOOQ/jOOQ
| aforwardslash wrote:
| That is basically the description of an object mapper, with
| all the guarantees of an object mapper :). It seems if you
| actually use the query builder as such, no guarantees
| exist.
|
| I'm pretty picky regarding query builders and ORM's, to the
| extent of having written several of them over the years, in
| different languages (both dynamic and strong typed,
| unfortunately closed-source). I'm a strong advocate of
| schema-first design, and usually a query builder will allow
| you to design your queries explicitly, but having some
| internal behaviors (such as string concatenation,
| identifier quoting and automatic in-order separation of
| parameters and values to be bound) taken care of. As good
| examples of this, I'd mention golang's goqu
| (https://github.com/doug-martin/goqu) and - to some extent
| - C# SqlKata (https://sqlkata.com/). Following my
| frustrations with Python ORMs, I built my own toy project,
| sort-of-in-beta, called rickdb (https://github.com/oddbit-
| project/rick_db).
| nerdponx wrote:
| Also: composability and reusability.
| BiteCode_dev wrote:
| I'm not sure why the example starts with all those verbose calls
| to run_until_complete(). Why not do this? async
| def main(): db = DB() await
| db.connect('db_name', { 'host': 'localhost',
| 'port': '5432', 'database': 'db_name',
| 'username': 'db_user_name', 'password':
| 'db_user_password' }, default=True)
| await db.connect('other_db_name', { 'host':
| 'localhost', 'port': '5432',
| 'database': 'other_db_name', 'username':
| 'db_user_name', 'password': 'db_user_password'
| }, default=False) db.connection('other_db_name')
| db.default = 'other_db_name' await db.stop()
| asyncio.run(main())
|
| Am I missing something ? It's using asyncio.run() in other
| examples, so it assumes 3.7+ anyway.
| uranusjr wrote:
| They differ slightly in semantics; asyncio.run() always creates
| a new event loop to run the task, while run_until_complete()
| don't (unless you call asyncio.create_event_loop() of course).
| I don't know if this is the reason behind the decision, but the
| two are not equivalent (and there are cases asyncio.run()
| wouldn't work).
| BiteCode_dev wrote:
| Yes but await doesn't, which is what I replaced
| run_until_complete() with here.
|
| The asyncio.run() is just to pretend I run the lib snippet in
| a code that already started the loop, which the doc seems to
| do.
| jbverschoor wrote:
| I have no idea why the builder part should be async. Either that,
| or it's named badly, which usually comes from an unclear view of
| the problem domain
| yoavm wrote:
| I was thinking the same. It seems like Windyquery isn't really
| just a query-builder, it also executes the query and parses it.
| whalesalad wrote:
| Because this is more than a builder from the looks of it - it's
| executing and fetching data too. (I thought the same thing)
| wiredfool wrote:
| So it's a database API?
| ghostwriter wrote:
| The amount of code duplication in this language community after
| introduction of asyncio is staggering. Gevent should have been
| adopted into the core language, but instead people are re-
| inventing 'sqlalchemy', network libraries and entire web
| frameworks for the sake of two new keywords. Not to mention that
| a pure query builder should not be concerned of IO mode in the
| first place.
| nickserv wrote:
| It's much more than "2 new keywords", asynchronous programming
| is a different way of thinking about program flow and
| architecture.
|
| The thing is, your program is only as asynchronous as the
| lowest I/O level. It doesn't really make much sense to have
| your business logic take advantage of this, if your program
| will block on lower levels.
|
| So you DB connection, caching, HTTP requests, disk I/O, etc
| must be asynchronous to really reap the benefits. Since Python
| has been synchronous for something like 20 years, there's a lot
| of historical code that simply won't play nicely with the
| asynchronous approach.
|
| Hence the brand-new libraries.
| ghostwriter wrote:
| > The thing is, your program is only as asynchronous as the
| lowest I/O level. It doesn't really make much sense to have
| your business logic take advantage of this, if your program
| will block on lower levels.
|
| it's called non-blocking sockets, precisely the thing that
| 'gevent' brings on the table without breaking conventions of
| other existing interfaces, contrary to what 'asyncio' does -
| https://stackoverflow.com/questions/42448664/async-
| generator...
|
| And no, it's not only the lowest I/O level, since there are
| groups, queues, chains, and other primitives [1] that work on
| higher levels without special keywords too.
|
| [1] https://www.gevent.org/api/index.html
| willvarfar wrote:
| Yes, I remember the discussion before AsyncIO, and how
| Python could have morphed into the Golang -style CSP model.
| I don't think it was to simply adopt gevent, but rather to
| move the core into massive multithreading. Now we seem to
| have the worst of all worlds. The rejection of stacklets
| because Python runs on constrained devices didn't seem
| strong then either.
| orisho wrote:
| Geez, yes, absolutely. Python went from being my favorite
| language to being somewhere down there with Javascript, largely
| thanks to the ecosystem disaster async await brought on the
| language.
| BiteCode_dev wrote:
| > but instead people are re-inventing 'sqlalchemy'
|
| SQLalchemy last version is asyncio compatible now:
| https://docs.sqlalchemy.org/en/14/orm/extensions/asyncio.htm...
|
| > network libraries and entire web frameworks for the sake of
| two new keywords
|
| The trend in the community now is SansIO: https://sans-
| io.readthedocs.io
|
| You create entire libs without the IO parts, dealing only with
| the business logic.
|
| The you let the I/O libs to deal with the rest. This allows
| modern project to provides a sync and async API with the same
| protocol implementation. E.G: https://github.com/encode/httpx
|
| Also, I'm pretty glad gevent has not been adopted:
|
| - it's very hard to know if you are doing something wrong with
| gevent. Or doing something with gevent at all.
|
| - gevent magic patching broke stuff in unpredictable ways, and
| the solution was to disable part of it:
| https://stackoverflow.com/questions/8678307/gevent-monkeypat...
|
| - gevent didn't play well with stdlib threads and
| multiprocessing, (had its own thread pool, subprocess wrapper,
| and no multiprocess story) while asyncio has a very good
| interoperability story
|
| - the asyncio pluggable loop meant we didn't need a perfect
| solution out of the box. Thanks to that design, we have been
| able to integrate the nodejs loop, like with the uvloop
| project, giving birth to stuff like asgi, fastapi and uvicorn.
| Asyncio can now comparable async perfs as Go. (gevent does
| that, but forces you to, and having libuv in the stdlib is not
| something the core devs would want).
|
| - gevent is a 5mo package. It's 1/6 of the ENTIRE python
| installer.
|
| - the coroutine based design means we have a standard api for
| async, no matter the backend or implementation. Which means
| solution like trio were able to emerge, and eventually the core
| dev are now considering including trio-like scoping in the next
| asyncio iteration.
|
| There are plenty of problems with asyncio though:
|
| - create_task is basically a goto. See trio for a solution.
|
| - coro() does not run the coro task. In JS it does, and returns
| a future you can await, or not. This makes it very confusing to
| beginner that get lost in the await/return/create_task dance.
|
| - there is no concurrency limiting primitive, cancellation is
| hard and back pressure mitigation is non existent.
|
| I was very skeptical about asyncio, but after 3.7, it became
| quite nice. asyncio.run() made all the difference,
| ergonomically, and when dealing with errors.
| ghostwriter wrote:
| > - the coroutine based design means we have a standard api
| for async, no matter the backend or implementation. Which
| means solution like trio were able to emerge, and eventually
| the core dev are now considering including trio-like scoping
| in the next asyncio iteration.
|
| The structured API for concurrency that Trio provides, _in
| principle_ , doesn't depend on the said keyword-based
| approach - it's just what the author chose to use for the
| implementation. It's definitely not something that asyncio
| enabled for its users that wasn't possible without it. Here's
| another structured API for concurrency that doesn't depend on
| keywords [1]. It can be implemented with gevent.
|
| [1] https://hackage.haskell.org/package/async-2.2.3/docs/Cont
| rol...
| BiteCode_dev wrote:
| No, but it does mean that primitives in trio can be awaited
| in the same way than any async primitive instead of using a
| callback.
|
| I get that scoping trough nurseries is the trio selling
| point, but you do need to have the ability to run something
| "before" and "after" a lone async call, and I don't want to
| have to write a "with" block to wrap any single thing that
| I wish to await.
|
| By having "await" as a universal construct, we allow trio
| (or anything really) to offer an additional level of
| granularity - scoping a group of async calls - without
| losing the smaller one: scoping a single call.
| oauea wrote:
| > gevent is a 5mo package. It's 1/6 of the ENTIRE python
| installer.
|
| Oh no, that means I'll need 2 more floppies to distribute my
| software!
| BiteCode_dev wrote:
| If core dev would accept such budget for one lib, given
| stdlib is composed of a lot of them, each python install
| would be enormous.
|
| Not to mention python runs on constraints systems such as
| MicroPython, which features coroutines.
|
| HN, the place were people complain about how electron is
| bloated then ask you do copy that.
| ghostwriter wrote:
| > SQLalchemy last version is asyncio compatible now.
|
| it's been compatible for 5+ years already, Core API was
| driver-endependent even in 2015.
|
| > Also, I'm pretty glad gevent has not been adopted:
|
| When I say 'adopted' I mean integrated into the language
| runtime so that the entire stdlib is aware of it. Some of
| your points consider the state of gevent 10 years ago, that's
| not a benchmark for a conclusion on how successful a
| potential integration into the core language might be, and
| none of them point at _inherent_ issues with the
| implementation that make it unusable compared to asyncio.
| BiteCode_dev wrote:
| > , that's not a benchmark for a conclusion on how
| successful a potential integration into the core language
| might be
|
| That's fair.
|
| I'm also biased by the fact I really like to know when my
| code is suddenly going into async territory.
|
| I like the JS model for this: it's explicit enough that
| there is little surprise (promises are now a standard, so
| if something returns a promise, it's async), but implicit
| enough that I don't have to care (no loop to setup, no
| coroutine instantiation between calling and scheduling).
| And it aged well: future can now just be awaited, and you
| can simply keep the callback api in non async function.
|
| gevent is too implicit to my taste I guess. Now you could
| argue that we could make gevent calls explicit, but then it
| would be not that much different from asyncio.
|
| I would really be happy if the community would settle in
| something like: @autorun async
| def coro(): ...
|
| Then: coro() # this calls create_task(),
| and binds it to the upper nursery
|
| I think it would strike the right balance from a usability
| point of vue.
|
| As for the interoperability issue, it's a pretty though
| nuts to crack.
| BugsJustFindMe wrote:
| I keep waiting for the Python community to come to its senses
| and set asyncio on fire and start again...again. Everything
| that people do with asyncio is simpler and easier to follow
| using threads. The Thread/Process pool executors from
| concurrent futures were almost good interfaces, and then
| asyncio took a giant step backwards.
|
| "I want this to happen in the background without blocking"
| should never have turned into this terrible dance of rewriting
| everything using new magic keywords so that you can then wedge
| it into an asyncio event loop. It's almost like developers have
| no sense of UX elegance.
| helge9210 wrote:
| With asyncio and multiple processes you can achieve more than
| just with multiple processes.
|
| > "I want this to happen in the background without blocking"
|
| That is not the problem the asyncio is solving.
| BugsJustFindMe wrote:
| > _That is not the problem the asyncio is solving._
|
| It is, actually! All asynchronous code amounts to having
| some things occur in the background while other things
| occur. asyncio just does it with a clumsy interface that
| breaks everything that already exists.
| helge9210 wrote:
| Async is concurrent, but not parallel. Nothing is
| happening "in the background". In contrast with
| threads/processes where it is parallel.
| BugsJustFindMe wrote:
| > _Nothing is happening "in the background"._
|
| Your IO is (that's why the library is called "async io")!
|
| The entire reason the library exists is because all IO
| blocks at the lowest level, even so-called non-blocking
| IO, because computer science abstractions don't change
| physics, which is why sockets require buffers. At some
| point in the chain your process is asking the OS to do
| some IO, and at that point the process gets a choice
| between twiddling its thumbs or doing something else and
| coming back to check on the result later.
|
| The mechanical difference between using coroutines vs
| threads for IO, under the hood, is the process polling
| sockets itself vs letting the OS do it. The OS allocating
| chunks of time to checking different threads that are
| spinning their wheels waiting for an IO response that may
| or may not have arrived yet on an OS socket buffer is
| fundamentally equivalent to the process allocating chunks
| of time to checking different coroutines that are
| spinning their wheels waiting for an IO response that may
| or may not have arrived yet on an OS socket buffer. The
| only difference is which layer does the spinning, the
| OS's schedule timer or the process's event loop timer
| (which of course itself runs at the whim of the OS's
| schedule timer). Are there performance differences
| between those approaches? Yes because creating OS threads
| has more overhead because they're fundamentally more
| powerful, but that's a technical detail that should be
| hidden, not exposed. Introducing an interface that
| requires new magic keywords and behavior in the
| underlying requests so that everything needs to be
| rewritten is terrible when the actual desire is _always_
| some grammatical variation of "I don't want every IO
| request to prevent using the CPU until it completes".
| lopatin wrote:
| What's the alternative with Gevent? Passing callbacks
| everywhere? Is that somehow better than async/await syntax?
|
| Libraries will still have to be re-written and public
| interfaces will change. Might as well do it right, and not
| abandon the concept of functions having meaningful return
| values.
| sonthonax wrote:
| > What's the alternative with Gevent? Passing callbacks
| everywhere?
|
| You build on callbacks to create futures and queues. That's
| how you do async programming in any framework. AsyncIO works
| on top of this and so does Gevent. The only real difference
| between the two is that asyncio explicitly types functions
| that do IO. The fundamental advantage is little more than a
| typing decorator.
|
| The reason that JavaScript was such a crapshoot of callbacks
| was because the majority of JavaScript developers had no idea
| what a future/promise was.
| willvarfar wrote:
| Kudos, gorgeous. Would be great to have a generic version that
| can also connect to other databases too. Perhaps this is the
| start of an async interface that can be used instead of dbapi?
|
| One thing that stood out was the overloading of using functions
| sometimes and sql snippets other times. For example,
| .select('column').where('id', 1).where('name', 'Tom')
|
| or .select('column AS alias').where('id = ? AND
| name = ?', 1, 'Tom')
|
| This means the framework has to pass through the snippet as-is,
| rather than applying escaping etc.
|
| Although sql injection is usually through values, it can be
| through other parts of the string too, e.g. in many user query
| builder UIs the column names themselves may be tainted.
|
| I'm generally a fan of all or nothing, e.g. either go with
| building your own string, or else use helper functions for
| everything e.g.
| .select('column').as('alias').where('id').equals(1)...
|
| This means the framework can always correctly escape column names
| etc.
|
| I ran into problems like this when doing an InfluxDB connector
| for Presto. The Influx Java client library mixed this up so it
| couldn't sanitise column names. I ended up just writing my own
| query builder.
| aforwardslash wrote:
| Shameless plug - you can have a look at my pet project
| https://github.com/oddbit-project/rick_db; Its explicitly not
| async, but the query builder is detached from the DB querying
| logic, and can be used separately.
| salmo wrote:
| Async ACID DB interfaces are so weird to me. I guess the async
| all the things! is frustrating since I'm brought in when things
| go boom, and folks that couldn't deal with parallelism are all up
| in concurrency and baffled by the result.
|
| I get going for throughput over transactional performance. But I
| don't find that to be a practical goal in most business systems.
|
| In my experience every dev team that goes this route thinks
| "async is faster", then gets confused when their latency shifts
| dramatically under load. Or their CPU usage jumps from 20% to all
| the CPU they can get.
|
| It's especially odd with the (overly) microservices trend. A
| bunch of variable latency calls with a ton of network I/O
| overhead required to service 1 customer request. Shove those into
| tiny containers without any tuning and wooo!
|
| And probably exacerbating this is the trend to tack on
| afterthought async frameworks to languages like Python and Java,
| vs systems designed with it originally like Erlang and even Go.
|
| Folks familiar with Java or Python are suddenly lost
| troubleshooting. Stack traces, thread dumps, etc. become
| confusing. It's not impossible, but the majority of developers
| are procedural themselves. I've only known a handful (out of
| thousands of contractors and employees) that were familiar with
| syscalls or could find useful info in a JFR.
|
| All that said, I'm looking at "big old company" workloads. Python
| for analytics, where it's a friendly wrapper around C/FORTRAN,
| and Java for most everything else. And most async use is what I
| would call "YouTube driven development."
|
| Ok, I've gone full "kids these days."
|
| It's interesting stuff, for sure. I just wish there were giant
| disclaimers around it for every library or framework.
| Troubleshooting async performance issues is my new
| troubleshooting C/C++ memory issues of old.
| [deleted]
| ralusek wrote:
| Avoiding async in order to avoid parallel database calls
| literally only makes sense if you're choosing between
| blocking/non-blocking on a single process on a single server
| instance. The second you scale up to use multiple
| processes/threads and/or multiple servers, you're going to be
| dealing with concurrent db calls regardless of whether or not
| your code is written as blocking or non-blocking.
|
| So not only are you saying that you insist on writing blocking
| code in order to avoid parallelism, but you're also saying that
| you'll limit yourself to a single execution process. Assuming a
| 50ms round trip to the db, you're basically setting a best case
| scenario of ~20 requests per second.
___________________________________________________________________
(page generated 2021-05-10 23:02 UTC)