[HN Gopher] Windyquery: A non-blocking Python PostgreSQL query b...
       ___________________________________________________________________
        
       Windyquery: A non-blocking Python PostgreSQL query builder
        
       Author : asiftr
       Score  : 76 points
       Date   : 2021-05-10 05:36 UTC (17 hours ago)
        
 (HTM) web link (pypi.org)
 (TXT) w3m dump (pypi.org)
        
       | justsomeuser wrote:
       | Is there any point in having a query builder API in a dynamic
       | language?
       | 
       | I can see the point in a typed language where you can ensure the
       | query is correct at compile time.
       | 
       | But in a dynamic language, you never know if the query is correct
       | until the interpreter hits that line in the program at runtime.
       | 
       | This is true with both raw SQL (error returned from server) and
       | the query builder API (error returned from interpreter).
        
         | aforwardslash wrote:
         | > Is there any point in having a query builder API in a dynamic
         | language?
         | 
         | Yes, it eases the creation of queries and ensuring both
         | escaping of parameters and quoting of columns and fields; It
         | also allows the easy creation of dynamic queries based on lists
         | of parameters, and facilitates both naming of tables and
         | fields.
         | 
         | > I can see the point in a typed language where you can ensure
         | the query is correct at compile time.
         | 
         | Given that most query builders allow arbitrary parameters (such
         | as table and column names), you can't actually ensure the query
         | is correct at compile time, regardless of the type of language.
        
           | justsomeuser wrote:
           | > Yes, it eases the creation of queries and ensuring both
           | escaping of parameters and quoting of columns and fields; It
           | also allows the easy creation of dynamic queries based on
           | lists of parameters, and facilitates both naming of tables
           | and fields.
           | 
           | But in Python all these are checked at runtime.
           | 
           | Just like it is possible to not quote columns and fields in
           | raw SQL, it is possible to mis-use the query builder API and
           | only find out at runtime on the same line as you would with
           | raw SQL.
           | 
           | Perhaps the dynamic queries may be easier with the query
           | builder, but you still find out about issues at the same time
           | as raw sql/string templates.
           | 
           | > Given that most query builders allow arbitrary parameters
           | (such as table and column names), you can't actually ensure
           | the query is correct at compile time, regardless of the type
           | of language.
           | 
           | You cannot check that the query is correct for a given schema
           | (missing tables/cols etc), but I think many typed languages
           | would allow you to ensure the AST is correct at compile time.
        
             | aforwardslash wrote:
             | > Perhaps the dynamic queries may be easier with the query
             | builder, but you still find out about issues at the same
             | time as raw sql/string templates.
             | 
             | As with everything, its a tradeoff and a matter of taste.
             | SQL Query builders give you the flexibility of SQL, without
             | relying on string concatenation directly, and making sure
             | all the values are passed in order. Also, they usually can
             | be extended to support multiple databases/adapters (imagine
             | specifying values as $1, $2... vs ?,?, quoting identifiers
             | using " or `, etc). They also usually fit better than plain
             | strings in heavily programmatic contexts (imagine a query
             | with different fields and filters depending on the value of
             | a given field).
             | 
             | > You cannot check that the query is correct for a given
             | schema (missing tables/cols etc), but I think many typed
             | languages would allow you to ensure the AST is correct at
             | compile time.
             | 
             | There are usually 2 types of query builders - SQL query
             | builders, and DSL query builders. DSL ones are usually
             | integrated with an ORM or something similar, and those are
             | usually abstract enough to guarantee that the generated SQL
             | and passed types are syntactically correct, even if invalid
             | (ex. mentioning a column or a table that doesn't exist).
             | SQL query builders (at least the ones I've seen) don't use
             | an AST, as their purpose is to map methods to SQL chunks.
             | They basically keep a list of different parts to assemble
             | and then just build from those lists when generating the
             | actual SQL. Some may provide object mapper capabilities,
             | and in that case, some type checking may be performed.
             | "May" in the sense that eg. a 14th century date is a valid
             | date, but SQL Server will most certainly reject it, so in
             | the end there are no "compile time guarantees" in this
             | case.
        
           | mattashii wrote:
           | > Given that most query builders allow arbitrary parameters
           | (such as table and column names), you can't actually ensure
           | the query is correct at compile time, regardless of the type
           | of language.
           | 
           | JOOQ [0] does a great job of providing a type-safe query
           | builder that can guarantee correct queries at compile-time.
           | It indeed also supplies arbitrary strings for names and sql
           | components which will break that guarantee, but apart from
           | that you can extract your databases' schema into typed
           | classes and use those to run your queries.
           | 
           | I really like it, but it is definately not a replacement for
           | an ORM.
           | 
           | [0] https://github.com/jOOQ/jOOQ
        
             | aforwardslash wrote:
             | That is basically the description of an object mapper, with
             | all the guarantees of an object mapper :). It seems if you
             | actually use the query builder as such, no guarantees
             | exist.
             | 
             | I'm pretty picky regarding query builders and ORM's, to the
             | extent of having written several of them over the years, in
             | different languages (both dynamic and strong typed,
             | unfortunately closed-source). I'm a strong advocate of
             | schema-first design, and usually a query builder will allow
             | you to design your queries explicitly, but having some
             | internal behaviors (such as string concatenation,
             | identifier quoting and automatic in-order separation of
             | parameters and values to be bound) taken care of. As good
             | examples of this, I'd mention golang's goqu
             | (https://github.com/doug-martin/goqu) and - to some extent
             | - C# SqlKata (https://sqlkata.com/). Following my
             | frustrations with Python ORMs, I built my own toy project,
             | sort-of-in-beta, called rickdb (https://github.com/oddbit-
             | project/rick_db).
        
           | nerdponx wrote:
           | Also: composability and reusability.
        
       | BiteCode_dev wrote:
       | I'm not sure why the example starts with all those verbose calls
       | to run_until_complete(). Why not do this?                   async
       | def main():                      db = DB()             await
       | db.connect('db_name', {                 'host': 'localhost',
       | 'port': '5432',                 'database': 'db_name',
       | 'username': 'db_user_name',                 'password':
       | 'db_user_password'             }, default=True)
       | await db.connect('other_db_name', {                 'host':
       | 'localhost',                 'port': '5432',
       | 'database': 'other_db_name',                 'username':
       | 'db_user_name',                 'password': 'db_user_password'
       | }, default=False)                  db.connection('other_db_name')
       | db.default = 'other_db_name'                  await db.stop()
       | asyncio.run(main())
       | 
       | Am I missing something ? It's using asyncio.run() in other
       | examples, so it assumes 3.7+ anyway.
        
         | uranusjr wrote:
         | They differ slightly in semantics; asyncio.run() always creates
         | a new event loop to run the task, while run_until_complete()
         | don't (unless you call asyncio.create_event_loop() of course).
         | I don't know if this is the reason behind the decision, but the
         | two are not equivalent (and there are cases asyncio.run()
         | wouldn't work).
        
           | BiteCode_dev wrote:
           | Yes but await doesn't, which is what I replaced
           | run_until_complete() with here.
           | 
           | The asyncio.run() is just to pretend I run the lib snippet in
           | a code that already started the loop, which the doc seems to
           | do.
        
       | jbverschoor wrote:
       | I have no idea why the builder part should be async. Either that,
       | or it's named badly, which usually comes from an unclear view of
       | the problem domain
        
         | yoavm wrote:
         | I was thinking the same. It seems like Windyquery isn't really
         | just a query-builder, it also executes the query and parses it.
        
         | whalesalad wrote:
         | Because this is more than a builder from the looks of it - it's
         | executing and fetching data too. (I thought the same thing)
        
           | wiredfool wrote:
           | So it's a database API?
        
       | ghostwriter wrote:
       | The amount of code duplication in this language community after
       | introduction of asyncio is staggering. Gevent should have been
       | adopted into the core language, but instead people are re-
       | inventing 'sqlalchemy', network libraries and entire web
       | frameworks for the sake of two new keywords. Not to mention that
       | a pure query builder should not be concerned of IO mode in the
       | first place.
        
         | nickserv wrote:
         | It's much more than "2 new keywords", asynchronous programming
         | is a different way of thinking about program flow and
         | architecture.
         | 
         | The thing is, your program is only as asynchronous as the
         | lowest I/O level. It doesn't really make much sense to have
         | your business logic take advantage of this, if your program
         | will block on lower levels.
         | 
         | So you DB connection, caching, HTTP requests, disk I/O, etc
         | must be asynchronous to really reap the benefits. Since Python
         | has been synchronous for something like 20 years, there's a lot
         | of historical code that simply won't play nicely with the
         | asynchronous approach.
         | 
         | Hence the brand-new libraries.
        
           | ghostwriter wrote:
           | > The thing is, your program is only as asynchronous as the
           | lowest I/O level. It doesn't really make much sense to have
           | your business logic take advantage of this, if your program
           | will block on lower levels.
           | 
           | it's called non-blocking sockets, precisely the thing that
           | 'gevent' brings on the table without breaking conventions of
           | other existing interfaces, contrary to what 'asyncio' does -
           | https://stackoverflow.com/questions/42448664/async-
           | generator...
           | 
           | And no, it's not only the lowest I/O level, since there are
           | groups, queues, chains, and other primitives [1] that work on
           | higher levels without special keywords too.
           | 
           | [1] https://www.gevent.org/api/index.html
        
             | willvarfar wrote:
             | Yes, I remember the discussion before AsyncIO, and how
             | Python could have morphed into the Golang -style CSP model.
             | I don't think it was to simply adopt gevent, but rather to
             | move the core into massive multithreading. Now we seem to
             | have the worst of all worlds. The rejection of stacklets
             | because Python runs on constrained devices didn't seem
             | strong then either.
        
         | orisho wrote:
         | Geez, yes, absolutely. Python went from being my favorite
         | language to being somewhere down there with Javascript, largely
         | thanks to the ecosystem disaster async await brought on the
         | language.
        
         | BiteCode_dev wrote:
         | > but instead people are re-inventing 'sqlalchemy'
         | 
         | SQLalchemy last version is asyncio compatible now:
         | https://docs.sqlalchemy.org/en/14/orm/extensions/asyncio.htm...
         | 
         | > network libraries and entire web frameworks for the sake of
         | two new keywords
         | 
         | The trend in the community now is SansIO: https://sans-
         | io.readthedocs.io
         | 
         | You create entire libs without the IO parts, dealing only with
         | the business logic.
         | 
         | The you let the I/O libs to deal with the rest. This allows
         | modern project to provides a sync and async API with the same
         | protocol implementation. E.G: https://github.com/encode/httpx
         | 
         | Also, I'm pretty glad gevent has not been adopted:
         | 
         | - it's very hard to know if you are doing something wrong with
         | gevent. Or doing something with gevent at all.
         | 
         | - gevent magic patching broke stuff in unpredictable ways, and
         | the solution was to disable part of it:
         | https://stackoverflow.com/questions/8678307/gevent-monkeypat...
         | 
         | - gevent didn't play well with stdlib threads and
         | multiprocessing, (had its own thread pool, subprocess wrapper,
         | and no multiprocess story) while asyncio has a very good
         | interoperability story
         | 
         | - the asyncio pluggable loop meant we didn't need a perfect
         | solution out of the box. Thanks to that design, we have been
         | able to integrate the nodejs loop, like with the uvloop
         | project, giving birth to stuff like asgi, fastapi and uvicorn.
         | Asyncio can now comparable async perfs as Go. (gevent does
         | that, but forces you to, and having libuv in the stdlib is not
         | something the core devs would want).
         | 
         | - gevent is a 5mo package. It's 1/6 of the ENTIRE python
         | installer.
         | 
         | - the coroutine based design means we have a standard api for
         | async, no matter the backend or implementation. Which means
         | solution like trio were able to emerge, and eventually the core
         | dev are now considering including trio-like scoping in the next
         | asyncio iteration.
         | 
         | There are plenty of problems with asyncio though:
         | 
         | - create_task is basically a goto. See trio for a solution.
         | 
         | - coro() does not run the coro task. In JS it does, and returns
         | a future you can await, or not. This makes it very confusing to
         | beginner that get lost in the await/return/create_task dance.
         | 
         | - there is no concurrency limiting primitive, cancellation is
         | hard and back pressure mitigation is non existent.
         | 
         | I was very skeptical about asyncio, but after 3.7, it became
         | quite nice. asyncio.run() made all the difference,
         | ergonomically, and when dealing with errors.
        
           | ghostwriter wrote:
           | > - the coroutine based design means we have a standard api
           | for async, no matter the backend or implementation. Which
           | means solution like trio were able to emerge, and eventually
           | the core dev are now considering including trio-like scoping
           | in the next asyncio iteration.
           | 
           | The structured API for concurrency that Trio provides, _in
           | principle_ , doesn't depend on the said keyword-based
           | approach - it's just what the author chose to use for the
           | implementation. It's definitely not something that asyncio
           | enabled for its users that wasn't possible without it. Here's
           | another structured API for concurrency that doesn't depend on
           | keywords [1]. It can be implemented with gevent.
           | 
           | [1] https://hackage.haskell.org/package/async-2.2.3/docs/Cont
           | rol...
        
             | BiteCode_dev wrote:
             | No, but it does mean that primitives in trio can be awaited
             | in the same way than any async primitive instead of using a
             | callback.
             | 
             | I get that scoping trough nurseries is the trio selling
             | point, but you do need to have the ability to run something
             | "before" and "after" a lone async call, and I don't want to
             | have to write a "with" block to wrap any single thing that
             | I wish to await.
             | 
             | By having "await" as a universal construct, we allow trio
             | (or anything really) to offer an additional level of
             | granularity - scoping a group of async calls - without
             | losing the smaller one: scoping a single call.
        
           | oauea wrote:
           | > gevent is a 5mo package. It's 1/6 of the ENTIRE python
           | installer.
           | 
           | Oh no, that means I'll need 2 more floppies to distribute my
           | software!
        
             | BiteCode_dev wrote:
             | If core dev would accept such budget for one lib, given
             | stdlib is composed of a lot of them, each python install
             | would be enormous.
             | 
             | Not to mention python runs on constraints systems such as
             | MicroPython, which features coroutines.
             | 
             | HN, the place were people complain about how electron is
             | bloated then ask you do copy that.
        
           | ghostwriter wrote:
           | > SQLalchemy last version is asyncio compatible now.
           | 
           | it's been compatible for 5+ years already, Core API was
           | driver-endependent even in 2015.
           | 
           | > Also, I'm pretty glad gevent has not been adopted:
           | 
           | When I say 'adopted' I mean integrated into the language
           | runtime so that the entire stdlib is aware of it. Some of
           | your points consider the state of gevent 10 years ago, that's
           | not a benchmark for a conclusion on how successful a
           | potential integration into the core language might be, and
           | none of them point at _inherent_ issues with the
           | implementation that make it unusable compared to asyncio.
        
             | BiteCode_dev wrote:
             | > , that's not a benchmark for a conclusion on how
             | successful a potential integration into the core language
             | might be
             | 
             | That's fair.
             | 
             | I'm also biased by the fact I really like to know when my
             | code is suddenly going into async territory.
             | 
             | I like the JS model for this: it's explicit enough that
             | there is little surprise (promises are now a standard, so
             | if something returns a promise, it's async), but implicit
             | enough that I don't have to care (no loop to setup, no
             | coroutine instantiation between calling and scheduling).
             | And it aged well: future can now just be awaited, and you
             | can simply keep the callback api in non async function.
             | 
             | gevent is too implicit to my taste I guess. Now you could
             | argue that we could make gevent calls explicit, but then it
             | would be not that much different from asyncio.
             | 
             | I would really be happy if the community would settle in
             | something like:                   @autorun         async
             | def coro(): ...
             | 
             | Then:                   coro() # this calls create_task(),
             | and binds it to the upper nursery
             | 
             | I think it would strike the right balance from a usability
             | point of vue.
             | 
             | As for the interoperability issue, it's a pretty though
             | nuts to crack.
        
         | BugsJustFindMe wrote:
         | I keep waiting for the Python community to come to its senses
         | and set asyncio on fire and start again...again. Everything
         | that people do with asyncio is simpler and easier to follow
         | using threads. The Thread/Process pool executors from
         | concurrent futures were almost good interfaces, and then
         | asyncio took a giant step backwards.
         | 
         | "I want this to happen in the background without blocking"
         | should never have turned into this terrible dance of rewriting
         | everything using new magic keywords so that you can then wedge
         | it into an asyncio event loop. It's almost like developers have
         | no sense of UX elegance.
        
           | helge9210 wrote:
           | With asyncio and multiple processes you can achieve more than
           | just with multiple processes.
           | 
           | > "I want this to happen in the background without blocking"
           | 
           | That is not the problem the asyncio is solving.
        
             | BugsJustFindMe wrote:
             | > _That is not the problem the asyncio is solving._
             | 
             | It is, actually! All asynchronous code amounts to having
             | some things occur in the background while other things
             | occur. asyncio just does it with a clumsy interface that
             | breaks everything that already exists.
        
               | helge9210 wrote:
               | Async is concurrent, but not parallel. Nothing is
               | happening "in the background". In contrast with
               | threads/processes where it is parallel.
        
               | BugsJustFindMe wrote:
               | > _Nothing is happening "in the background"._
               | 
               | Your IO is (that's why the library is called "async io")!
               | 
               | The entire reason the library exists is because all IO
               | blocks at the lowest level, even so-called non-blocking
               | IO, because computer science abstractions don't change
               | physics, which is why sockets require buffers. At some
               | point in the chain your process is asking the OS to do
               | some IO, and at that point the process gets a choice
               | between twiddling its thumbs or doing something else and
               | coming back to check on the result later.
               | 
               | The mechanical difference between using coroutines vs
               | threads for IO, under the hood, is the process polling
               | sockets itself vs letting the OS do it. The OS allocating
               | chunks of time to checking different threads that are
               | spinning their wheels waiting for an IO response that may
               | or may not have arrived yet on an OS socket buffer is
               | fundamentally equivalent to the process allocating chunks
               | of time to checking different coroutines that are
               | spinning their wheels waiting for an IO response that may
               | or may not have arrived yet on an OS socket buffer. The
               | only difference is which layer does the spinning, the
               | OS's schedule timer or the process's event loop timer
               | (which of course itself runs at the whim of the OS's
               | schedule timer). Are there performance differences
               | between those approaches? Yes because creating OS threads
               | has more overhead because they're fundamentally more
               | powerful, but that's a technical detail that should be
               | hidden, not exposed. Introducing an interface that
               | requires new magic keywords and behavior in the
               | underlying requests so that everything needs to be
               | rewritten is terrible when the actual desire is _always_
               | some grammatical variation of "I don't want every IO
               | request to prevent using the CPU until it completes".
        
         | lopatin wrote:
         | What's the alternative with Gevent? Passing callbacks
         | everywhere? Is that somehow better than async/await syntax?
         | 
         | Libraries will still have to be re-written and public
         | interfaces will change. Might as well do it right, and not
         | abandon the concept of functions having meaningful return
         | values.
        
           | sonthonax wrote:
           | > What's the alternative with Gevent? Passing callbacks
           | everywhere?
           | 
           | You build on callbacks to create futures and queues. That's
           | how you do async programming in any framework. AsyncIO works
           | on top of this and so does Gevent. The only real difference
           | between the two is that asyncio explicitly types functions
           | that do IO. The fundamental advantage is little more than a
           | typing decorator.
           | 
           | The reason that JavaScript was such a crapshoot of callbacks
           | was because the majority of JavaScript developers had no idea
           | what a future/promise was.
        
       | willvarfar wrote:
       | Kudos, gorgeous. Would be great to have a generic version that
       | can also connect to other databases too. Perhaps this is the
       | start of an async interface that can be used instead of dbapi?
       | 
       | One thing that stood out was the overloading of using functions
       | sometimes and sql snippets other times. For example,
       | .select('column').where('id', 1).where('name', 'Tom')
       | 
       | or                   .select('column AS alias').where('id = ? AND
       | name = ?', 1, 'Tom')
       | 
       | This means the framework has to pass through the snippet as-is,
       | rather than applying escaping etc.
       | 
       | Although sql injection is usually through values, it can be
       | through other parts of the string too, e.g. in many user query
       | builder UIs the column names themselves may be tainted.
       | 
       | I'm generally a fan of all or nothing, e.g. either go with
       | building your own string, or else use helper functions for
       | everything e.g.
       | .select('column').as('alias').where('id').equals(1)...
       | 
       | This means the framework can always correctly escape column names
       | etc.
       | 
       | I ran into problems like this when doing an InfluxDB connector
       | for Presto. The Influx Java client library mixed this up so it
       | couldn't sanitise column names. I ended up just writing my own
       | query builder.
        
         | aforwardslash wrote:
         | Shameless plug - you can have a look at my pet project
         | https://github.com/oddbit-project/rick_db; Its explicitly not
         | async, but the query builder is detached from the DB querying
         | logic, and can be used separately.
        
       | salmo wrote:
       | Async ACID DB interfaces are so weird to me. I guess the async
       | all the things! is frustrating since I'm brought in when things
       | go boom, and folks that couldn't deal with parallelism are all up
       | in concurrency and baffled by the result.
       | 
       | I get going for throughput over transactional performance. But I
       | don't find that to be a practical goal in most business systems.
       | 
       | In my experience every dev team that goes this route thinks
       | "async is faster", then gets confused when their latency shifts
       | dramatically under load. Or their CPU usage jumps from 20% to all
       | the CPU they can get.
       | 
       | It's especially odd with the (overly) microservices trend. A
       | bunch of variable latency calls with a ton of network I/O
       | overhead required to service 1 customer request. Shove those into
       | tiny containers without any tuning and wooo!
       | 
       | And probably exacerbating this is the trend to tack on
       | afterthought async frameworks to languages like Python and Java,
       | vs systems designed with it originally like Erlang and even Go.
       | 
       | Folks familiar with Java or Python are suddenly lost
       | troubleshooting. Stack traces, thread dumps, etc. become
       | confusing. It's not impossible, but the majority of developers
       | are procedural themselves. I've only known a handful (out of
       | thousands of contractors and employees) that were familiar with
       | syscalls or could find useful info in a JFR.
       | 
       | All that said, I'm looking at "big old company" workloads. Python
       | for analytics, where it's a friendly wrapper around C/FORTRAN,
       | and Java for most everything else. And most async use is what I
       | would call "YouTube driven development."
       | 
       | Ok, I've gone full "kids these days."
       | 
       | It's interesting stuff, for sure. I just wish there were giant
       | disclaimers around it for every library or framework.
       | Troubleshooting async performance issues is my new
       | troubleshooting C/C++ memory issues of old.
        
         | [deleted]
        
         | ralusek wrote:
         | Avoiding async in order to avoid parallel database calls
         | literally only makes sense if you're choosing between
         | blocking/non-blocking on a single process on a single server
         | instance. The second you scale up to use multiple
         | processes/threads and/or multiple servers, you're going to be
         | dealing with concurrent db calls regardless of whether or not
         | your code is written as blocking or non-blocking.
         | 
         | So not only are you saying that you insist on writing blocking
         | code in order to avoid parallelism, but you're also saying that
         | you'll limit yourself to a single execution process. Assuming a
         | 50ms round trip to the db, you're basically setting a best case
         | scenario of ~20 requests per second.
        
       ___________________________________________________________________
       (page generated 2021-05-10 23:02 UTC)