[HN Gopher] Show HN: Sqlbind a Python library to compose raw SQL
       ___________________________________________________________________
        
       Show HN: Sqlbind a Python library to compose raw SQL
        
       Author : bvrmn
       Score  : 61 points
       Date   : 2024-02-26 17:22 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | graemep wrote:
       | I think the point about not know what SQL an ORM generated query
       | is the one that resonates with me, as does the dig at Django ORM.
       | 
       | I love Django, including aspects of the ORM (its very easy to
       | write schema, migrations are pretty easy to use, even queries are
       | very concise and easy to write) but I have no idea what the SQL
       | will look like except in the simplest cases.
       | 
       | I often do not particularly care, and it is easy enough to see
       | them when I do, but I feel a bit icky about the extra layer of
       | stuff I do not really understand.
        
         | westurner wrote:
         | "Is ORM still an anti-pattern?" (2023)
         | 
         | > _In truth the best way to do the data layer is to use stored
         | procs with a generated binding at the application layer. This
         | is absolutely safe from injection, and is wicked fast as well._
         | 
         | [Who must keep stored procedures in sync with migrations,]
         | 
         | https://news.ycombinator.com/item?id=36497613#36503998
        
           | pmontra wrote:
           | I used the stored procedures approach sometimes. The usual
           | criticisms are
           | 
           | 1. Almost nobody in the team knows SQL, stored procedures
           | more so.
           | 
           | 2. It's not possible to deploy and test stored procedures
           | with the usual tooling. Actually I don't know what's the
           | usual way to do it in projects that go with that approach.
        
             | _a_a_a_ wrote:
             | Perhaps I can help, or maybe learn something.
             | 
             | I don't really understand the confusion here, you write a
             | stored procedure, you document it, test it, peer review it,
             | then stuff it into the database. You then write code that
             | uses it. No sarcasm, but what's the problem?
        
               | robertlagrant wrote:
               | > you ... test it
               | 
               | I think they're referring to ongoing automated tests, and
               | not one-off tests.
        
               | pmontra wrote:
               | Yes, what is the normal way to unit test stored
               | procedures? Dev teams used to Node, Python, Ruby or
               | whatever language have tools that work with ORMs to setup
               | a test database, run tests and zero it. I did wrote some
               | stored procedures in a Rails migration many years ago but
               | that's not the place where one would normally write code.
               | Furthermore there will be many migrations like that,
               | almost one for every change to the stored procedure. The
               | problem is in the frameworks, that are not designed for
               | that. So, what's the recommended way to handle them?
        
               | _a_a_a_ wrote:
               | Right, I'm starting to get the picture. I've never worked
               | with an ORM as my SQL is pretty solid. I have to put my
               | hands up here and say I don't know, sorry.
        
         | jiaaro wrote:
         | If you use postgres, psycopg provides `cursor.mogrify(query,
         | params)` which returns the actual query that will be executed .
         | For example:
         | cursor.mogrify(*queryset.query.sql_with_params())
         | 
         | Alternatively, you can set the log level of the
         | `django.db.backends` logger to DEBUG to see all executed
         | queries
        
           | westurner wrote:
           | https://docs.djangoproject.com/en/5.0/topics/db/sql/ :
           | 
           | > _Django gives you two ways of performing raw SQL queries:
           | you can use `Manager.raw()` to perform raw queries and return
           | model instances, or you can avoid the model layer entirely
           | and execute custom SQL directly._
           | 
           | https://stackoverflow.com/questions/1074212/how-can-i-see-
           | th... has :
           | MyModel.objects.all().query.sql_with_params()
           | str(MyModel.objects.all().query)
           | 
           | And:                 from django.db import connection
           | from myapp.models import SomeModel       queryset =
           | SomeModel.objects.filter(foo='bar')       sql_query, params =
           | queryset.query.as_sql(None, connection)              with
           | connection.connection.cursor(cursor_factory=DictCursor) as
           | cursor:           cursor.execute(sql_query, params)
           | data = cursor.fetchall()
           | 
           | But that's still not backend-specific SQL?
           | 
           | There should be an interface method for this. Why does
           | psycopg call it mogrify?
           | 
           | https://django-debug-
           | toolbar.readthedocs.io/en/latest/panels... :
           | 
           | > debug_toolbar.panels.sql.SQLPanel: _SQL queries including
           | time to execute and links to EXPLAIN each query_
           | 
           | But debug toolbars mostly don't work with APIs.
           | 
           | https://github.com/django-query-profiler/django-query-
           | profil... :
           | 
           | > _Django query profiler - one profiler to rule them all.
           | Shows queries, detects N+1 and gives recommendations on how
           | to resolve them_
           | 
           | https://github.com/jazzband/django-silk :
           | 
           | > _Silk is a live profiling and inspection tool for the
           | Django framework. Silk intercepts and stores HTTP requests
           | and database queries before presenting them in a user
           | interface for further inspection_
        
             | jiaaro wrote:
             | `str(queryset.query)` does not give you executable SQL.
             | Query parameters are not escaped or encoded in that
             | representation of the query. That escaping/interpolation of
             | query parameters is performed by mogrify. I agree the name
             | is odd, and I don't know why they use it other than
             | "transmogrify" being an obscure word for "magically
             | transform".
             | 
             | debug_toolbar and silk are both tools that show you what
             | queries were executed by your running application. They're
             | both good tools, but neither quite solves the problem of
             | giving you the executable SQL for a specific queryset
             | (e.g., in a repl).
        
               | westurner wrote:
               | There are a number of solutions listed in that
               | stackoverflow post for logging queries, but AFAIU none
               | call mogrify().
               | 
               | Maybe it should be called dialectize() or
               | to_sql(parametrized=True)
        
         | WesleyJohnson wrote:
         | I really wish Django would let you write raw queries, but wrap
         | them in an object that still allows `.annotate()` and
         | `.filter`, etc. I know you can do Model.objects.raw(), but I'm
         | talking about a more complex query with nested joins, etc.
         | 
         | I understand why it doesn't allow that, and probably never can,
         | but it would be nice when you need author OLAP queries but want
         | to continue to use as much of the ORM as you can.
        
           | specialist wrote:
           | (I'm not a Python dev. Sorry.)
           | 
           | Guessing .annotation() for runtime type info? .filter()
           | avoids re-executing query just to get a subset?
           | 
           | How do the (canonical utility methods) relate to nested
           | JOINs? Because field types of nested sets aren't visible?
           | 
           | TIA.
        
         | agumonkey wrote:
         | there Queryset.query but it's not obvious unless you read every
         | line of the docs
        
           | airstrike wrote:
           | that's a pretty basic thing to spot in the docs and Django
           | docs are actually godtier
        
             | agumonkey wrote:
             | I have 90% allergy to their doc style. The only worse are
             | old javadocs. I like python stdlib model, so I assumed it
             | would be as nice.. but here it's a blend of newb info (oh
             | look Forms!) and partial technical description. To the
             | point that every time you ask something online.. people
             | tell you straight away try ccbv or "read django source",
             | which should be the first things listed on their website.
        
           | jd3 wrote:
           | also manage.py shell_plus --print-sql will print the sql
           | after each command you enter
           | 
           | you can also print qs.query, qs.explain("analyze"), etc etc
        
         | JodieBenitez wrote:
         | > but I have no idea what the SQL will look like
         | 
         | Why don't you print QuerySet.query ?
        
         | airstrike wrote:
         | > but I have no idea what the SQL will look like except in the
         | simplest cases
         | 
         | that's really on you as Django makes it very easy to find the
         | answer to that because Python makes it very easy to inspect
         | objects at runtime
        
           | graemep wrote:
           | that is what i meant by 'easy enough to see them'
        
       | kissgyorgy wrote:
       | The proposed API is really strange and not intuitive, there is a
       | lot of black magic going on. q/email? which just returns a
       | question mark and the value storage happens in q. Usually when
       | designing a library, explicitness and the Principle of Least
       | Surprise are very good principles. Don't write smart code, write
       | straightforward code!
        
         | bvrmn wrote:
         | Unfortunately straightforward code leads to spaghetti mess in
         | my domain: big reporting queries. API solves an issue of
         | binding a value in exact place of raw SQL query.
        
       | raphinou wrote:
       | Interesting to see this as I recently did a show HN without much
       | success about an F# library also handling raw sql.
       | 
       | I enjoy working with SQL directly, though I understand it has
       | pros and cons. If you need specially crafted queries, it is a
       | must though.
       | 
       | My lib is at https://monazita.gitlab.io/monazita/ and show HN is
       | at https://news.ycombinator.com/item?id=39467742
        
       | nicolaslem wrote:
       | Isn't using f-strings to build SQL queries a large footgun? I can
       | imagine a junior developer introducing a SQL injection by not
       | completely grasping the sqlbind API.
       | 
       | Using the quickstart example, visually both lines are very
       | similar and both are functional, but one of them may introduce a
       | SQL injection:                   sql = f'SELECT * FROM table
       | WHERE field1 = {q/value1} AND field2 > {q/value2}'         sql =
       | f'SELECT * FROM table WHERE field1 = {value1} AND field2 >
       | {value2}'
        
         | hombre_fatal wrote:
         | I don't use Python so can't be bothered to read the source but
         | the general solution here is to make regular string
         | interpolation fail.
         | 
         | e.g. You must use the Q mechanism and doesn't work if you
         | forget it.
        
           | dougk wrote:
           | f-strings are part of the language so you need to change the
           | interpreter to make that fail. That will not play nice with
           | the rest of the codebase.
        
         | gpapilion wrote:
         | Any string interpolation would open you up to sql injection
         | without a lot of care. It generally recommended to used
         | prepared statements.
         | 
         | I guess the q part is the query parameter for the prepared
         | statement. I'd still be a bit antsy here given the ease of a
         | mistake.
         | 
         | I just don't get what it's providing over using my db's module.
         | Usually the dialect is close but different enough to require
         | changes to queries.
        
         | Retr0id wrote:
         | Proposed safe(r) API via decorated methods:
         | QParams = sqlbind.Dialect.some_dialect              @QParams
         | def make_my_query(value1: str, value2: int):             return
         | f'SELECT * FROM table WHERE field1 = {value1} AND field2 >
         | {value2}'              data =
         | connection.execute(*make_my_query(foo, bar))
         | 
         | Obviously this wouldn't work as-is, QParams would need to be
         | modified to support this. The decorator would wrap the method,
         | sanitising the args before they're even passed to the wrapped
         | function.
         | 
         | Edit: actually, I might've misunderstood what sqlbind is doing
         | internally, maybe this approach doesn't quite make sense
         | 
         | Edit 2: Like so: https://github.com/baverman/sqlbind/issues/1
        
           | chrisjc wrote:
           | If I understand your code correctly, you are bypassing
           | DBAPI's binding functionality and opening yourself up for SQL
           | injection.                   QParams =
           | sqlbind.Dialect.some_dialect              @QParams
           | def make_my_query(value1: str, value2: int):             #
           | SELECT * FROM table WHERE field1 = ? AND field2 > ?
           | return f'SELECT * FROM table WHERE field1 = {value1} AND
           | field2 > {value2}'              data =
           | connection.execute(*make_my_query(foo, bar), [foo, bar])
        
             | Retr0id wrote:
             | Where's the injection? (you might want to take a look at
             | the example in the issue I opened, it's more complete - I'm
             | not bypassing the DBAPI, I'm working with it)
        
               | chrisjc wrote:
               | Yup. Your code in github explains what you're suggesting
               | better. My bad.
        
         | chrisjc wrote:
         | It's a little difficult to parse through the README, but I
         | believe placeholders and placeholder structure is being
         | rendered in the SQL, not placeholder values.
         | ps = f'SELECT * FROM table WHERE field1 = {q/value1} AND field2
         | > {q/value2}'         # SELECT * FROM table WHERE field1 = ?
         | AND field2 > ?
         | 
         | However, I would imagine that if any external input is passed
         | through to this framework, then there might still be the
         | possibility of SQL injection attacks passing through this
         | framework and ending up in the prepared statement SQL.
        
       | rmbyrro wrote:
       | I failed to see the value.
       | 
       | You don't have to use the modeling part of an ORM. Just the
       | built-in security, session, and connection pool handling is
       | already valuable. And you can already do:
       | 
       | ```
       | 
       | params = [123]
       | 
       | session.query("SELECT * FROM my_table WHERE userID = ?", params)
       | 
       | ```
       | 
       | Why introduce another library just for that?
        
         | bastawhiz wrote:
         | If you have a lot of values to add to a query, they're
         | physically distant from the place you're using them. With the
         | unnamed version like you used, you're relying on your ability
         | to count how many values are used so you get the list correct.
         | With named parameters, you're adding another layer of
         | indirection. Use a value more than once? If you change it or
         | want to remove it, you'd better be sure you do it for each
         | usage of the name in your query!
        
       | ltbarcly3 wrote:
       | https://github.com/justinvanwinkle/Norm
       | 
       | Similar idea, more fleshed out, doesn't require all the weird
       | interpolation stuff and special python functions which map to SQL
       | grammar.
       | 
       | Almost nothing to memorize so you can use the library.
       | 
       | The SQL it outputs is _extremely_ readable and cleanly formatted.
        
       | wodenokoto wrote:
       | I find it really difficult to write parameterized SQL. The
       | problems are many fold:
       | 
       | I want to write a sql query I can deploy against test and prod,
       | so I need to be able to parameterize table names to some extend.
       | Then there are values, as shown here, with all the footguns that
       | entails. But in the end I also want to be able to have IDE
       | niceties while developing. Autocomplete on column names and table
       | names and inline be able to see types and those kind if things.
       | 
       | And I have never seen anything that can give you all those
       | things.
        
         | _a_a_a_ wrote:
         | If you are allowed to create new schema objects then you can
         | create views over the tables. Give the views a consistent name,
         | but create the view to reference the actual table you want.
         | Then update the view.
        
         | sparks1970 wrote:
         | I came across this yesterday for golang: https://sqlc.dev which
         | is somewhat like what you want, maybe.
         | 
         | Not sure it allows you to parameterize table names but the
         | basic idea is codegen from sql queries so you are working with
         | go code (autocompletion etc).
        
       | agubelu wrote:
       | I went through the first example in the README more than a few
       | times, and found it quite difficult to make sense of...
       | 
       | Are the query params stored in `q`? Which is also updated when
       | it's inserted in the query itself? Why does a
       | `sqlbind.Dialect.default()` turn into an `str`?
       | 
       | And... what's the difference between that and normal
       | parameterized queries? It just seems weirder and with a lot of
       | side effect magic going on.
        
         | bastawhiz wrote:
         | It's not a lot of magic, but it's pretty opaque magic. The q
         | object overloads the division operator and returns a ? and
         | stores the "passed" value inside itself. The q object extends
         | either a list or a dictionary to act as the set of values to
         | pass to the parametrized query.
         | 
         | So in effect, the slash behavior is just a spicy way to append
         | or set (depending on which dialect option you choose).
        
           | agubelu wrote:
           | No, I can see that. It's the same as the logical OR operator
           | | computing unions of sets. I just fail to see how this
           | solution is better than a standard parameterized query, it
           | feels much more opaque without any obvious upsides.
        
       | nick238 wrote:
       | > It's really not an easy task to decipher complex SQLAlchemy
       | expression back into SQL
       | 
       | SQLAlchemy to SQL is borderline trivial, you just call
       | `str(statement)` and it will output SQL. [Here's the docpage for
       | if you want to select dialects and optionally inline parameters](
       | https://docs.sqlalchemy.org/en/20/faq/sqlexpressions.html#ho...).
        
         | tda wrote:
         | Yes, but to get the exact query with interpolated parameters
         | that can be copy pasted into e.g. pgadmin for a nice explain
         | analyze visualization is a bit more involved. As I could never
         | remember it I always copy pasted that function in any project.
         | But other than that minor complaint I have always been
         | thoroughly impressed by both SQLAlchemy ORM and the Core query
         | builder, and how they can be combined to have a best of both
         | worlds situation. I have written some very complex queries that
         | remained maintainable thanks to composition with cte's, sub
         | queries and computed properties. Doing that in pure SQL would
         | also have been possible, but using using SQLlachemy allowed me
         | to name and reuse smaller components, avoiding the wall of text
         | SQL statements.
        
       | phs2501 wrote:
       | I had my own take on this concept[1], though with considerably
       | less language magic involved. I imagine there's a lot of these
       | kind of things running around. My criteria were:
       | 
       | a) let me write actual SQL, not a python DSL that generates SQL
       | 
       | b) be placeholder-safe
       | 
       | c) be composable
       | 
       | Though it was somewhat intentionally limited to what I needed to
       | support for my own needs at the time.
       | 
       | [1] https://github.com/bdowning/sql-athame
        
       | maxxk wrote:
       | Interesting approach. But for me the main disadvantage of SQL
       | code embedded in other language is lack of syntax
       | highlighting/checking.
       | 
       | I've had the best experience with embrace-sql which translates
       | sql-files with magic comment-delimited parametrized queries to
       | python-callable modules: https://pypi.org/project/embrace/
        
         | dr_kiszonka wrote:
         | I wish there was a way to get around the syntax highlighting,
         | checking, and completion issue. Pandas has a nice `query`
         | method, which is hard to use for the same reason.
         | 
         | https://pandas.pydata.org/docs/reference/api/pandas.DataFram...
        
       | owlstuffing wrote:
       | Too bad manifold-sql[1] is for java, it does all that and more,
       | particularly the IDE dev productivity stuff.
       | 
       | 1. https://github.com/manifold-
       | systems/manifold/blob/master/man...
        
       ___________________________________________________________________
       (page generated 2024-02-26 23:01 UTC)