[HN Gopher] Show HN: Sqlbind a Python library to compose raw SQL
___________________________________________________________________
Show HN: Sqlbind a Python library to compose raw SQL
Author : bvrmn
Score : 61 points
Date : 2024-02-26 17:22 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| graemep wrote:
| I think the point about not know what SQL an ORM generated query
| is the one that resonates with me, as does the dig at Django ORM.
|
| I love Django, including aspects of the ORM (its very easy to
| write schema, migrations are pretty easy to use, even queries are
| very concise and easy to write) but I have no idea what the SQL
| will look like except in the simplest cases.
|
| I often do not particularly care, and it is easy enough to see
| them when I do, but I feel a bit icky about the extra layer of
| stuff I do not really understand.
| westurner wrote:
| "Is ORM still an anti-pattern?" (2023)
|
| > _In truth the best way to do the data layer is to use stored
| procs with a generated binding at the application layer. This
| is absolutely safe from injection, and is wicked fast as well._
|
| [Who must keep stored procedures in sync with migrations,]
|
| https://news.ycombinator.com/item?id=36497613#36503998
| pmontra wrote:
| I used the stored procedures approach sometimes. The usual
| criticisms are
|
| 1. Almost nobody in the team knows SQL, stored procedures
| more so.
|
| 2. It's not possible to deploy and test stored procedures
| with the usual tooling. Actually I don't know what's the
| usual way to do it in projects that go with that approach.
| _a_a_a_ wrote:
| Perhaps I can help, or maybe learn something.
|
| I don't really understand the confusion here, you write a
| stored procedure, you document it, test it, peer review it,
| then stuff it into the database. You then write code that
| uses it. No sarcasm, but what's the problem?
| robertlagrant wrote:
| > you ... test it
|
| I think they're referring to ongoing automated tests, and
| not one-off tests.
| pmontra wrote:
| Yes, what is the normal way to unit test stored
| procedures? Dev teams used to Node, Python, Ruby or
| whatever language have tools that work with ORMs to setup
| a test database, run tests and zero it. I did wrote some
| stored procedures in a Rails migration many years ago but
| that's not the place where one would normally write code.
| Furthermore there will be many migrations like that,
| almost one for every change to the stored procedure. The
| problem is in the frameworks, that are not designed for
| that. So, what's the recommended way to handle them?
| _a_a_a_ wrote:
| Right, I'm starting to get the picture. I've never worked
| with an ORM as my SQL is pretty solid. I have to put my
| hands up here and say I don't know, sorry.
| jiaaro wrote:
| If you use postgres, psycopg provides `cursor.mogrify(query,
| params)` which returns the actual query that will be executed .
| For example:
| cursor.mogrify(*queryset.query.sql_with_params())
|
| Alternatively, you can set the log level of the
| `django.db.backends` logger to DEBUG to see all executed
| queries
| westurner wrote:
| https://docs.djangoproject.com/en/5.0/topics/db/sql/ :
|
| > _Django gives you two ways of performing raw SQL queries:
| you can use `Manager.raw()` to perform raw queries and return
| model instances, or you can avoid the model layer entirely
| and execute custom SQL directly._
|
| https://stackoverflow.com/questions/1074212/how-can-i-see-
| th... has :
| MyModel.objects.all().query.sql_with_params()
| str(MyModel.objects.all().query)
|
| And: from django.db import connection
| from myapp.models import SomeModel queryset =
| SomeModel.objects.filter(foo='bar') sql_query, params =
| queryset.query.as_sql(None, connection) with
| connection.connection.cursor(cursor_factory=DictCursor) as
| cursor: cursor.execute(sql_query, params)
| data = cursor.fetchall()
|
| But that's still not backend-specific SQL?
|
| There should be an interface method for this. Why does
| psycopg call it mogrify?
|
| https://django-debug-
| toolbar.readthedocs.io/en/latest/panels... :
|
| > debug_toolbar.panels.sql.SQLPanel: _SQL queries including
| time to execute and links to EXPLAIN each query_
|
| But debug toolbars mostly don't work with APIs.
|
| https://github.com/django-query-profiler/django-query-
| profil... :
|
| > _Django query profiler - one profiler to rule them all.
| Shows queries, detects N+1 and gives recommendations on how
| to resolve them_
|
| https://github.com/jazzband/django-silk :
|
| > _Silk is a live profiling and inspection tool for the
| Django framework. Silk intercepts and stores HTTP requests
| and database queries before presenting them in a user
| interface for further inspection_
| jiaaro wrote:
| `str(queryset.query)` does not give you executable SQL.
| Query parameters are not escaped or encoded in that
| representation of the query. That escaping/interpolation of
| query parameters is performed by mogrify. I agree the name
| is odd, and I don't know why they use it other than
| "transmogrify" being an obscure word for "magically
| transform".
|
| debug_toolbar and silk are both tools that show you what
| queries were executed by your running application. They're
| both good tools, but neither quite solves the problem of
| giving you the executable SQL for a specific queryset
| (e.g., in a repl).
| westurner wrote:
| There are a number of solutions listed in that
| stackoverflow post for logging queries, but AFAIU none
| call mogrify().
|
| Maybe it should be called dialectize() or
| to_sql(parametrized=True)
| WesleyJohnson wrote:
| I really wish Django would let you write raw queries, but wrap
| them in an object that still allows `.annotate()` and
| `.filter`, etc. I know you can do Model.objects.raw(), but I'm
| talking about a more complex query with nested joins, etc.
|
| I understand why it doesn't allow that, and probably never can,
| but it would be nice when you need author OLAP queries but want
| to continue to use as much of the ORM as you can.
| specialist wrote:
| (I'm not a Python dev. Sorry.)
|
| Guessing .annotation() for runtime type info? .filter()
| avoids re-executing query just to get a subset?
|
| How do the (canonical utility methods) relate to nested
| JOINs? Because field types of nested sets aren't visible?
|
| TIA.
| agumonkey wrote:
| there Queryset.query but it's not obvious unless you read every
| line of the docs
| airstrike wrote:
| that's a pretty basic thing to spot in the docs and Django
| docs are actually godtier
| agumonkey wrote:
| I have 90% allergy to their doc style. The only worse are
| old javadocs. I like python stdlib model, so I assumed it
| would be as nice.. but here it's a blend of newb info (oh
| look Forms!) and partial technical description. To the
| point that every time you ask something online.. people
| tell you straight away try ccbv or "read django source",
| which should be the first things listed on their website.
| jd3 wrote:
| also manage.py shell_plus --print-sql will print the sql
| after each command you enter
|
| you can also print qs.query, qs.explain("analyze"), etc etc
| JodieBenitez wrote:
| > but I have no idea what the SQL will look like
|
| Why don't you print QuerySet.query ?
| airstrike wrote:
| > but I have no idea what the SQL will look like except in the
| simplest cases
|
| that's really on you as Django makes it very easy to find the
| answer to that because Python makes it very easy to inspect
| objects at runtime
| graemep wrote:
| that is what i meant by 'easy enough to see them'
| kissgyorgy wrote:
| The proposed API is really strange and not intuitive, there is a
| lot of black magic going on. q/email? which just returns a
| question mark and the value storage happens in q. Usually when
| designing a library, explicitness and the Principle of Least
| Surprise are very good principles. Don't write smart code, write
| straightforward code!
| bvrmn wrote:
| Unfortunately straightforward code leads to spaghetti mess in
| my domain: big reporting queries. API solves an issue of
| binding a value in exact place of raw SQL query.
| raphinou wrote:
| Interesting to see this as I recently did a show HN without much
| success about an F# library also handling raw sql.
|
| I enjoy working with SQL directly, though I understand it has
| pros and cons. If you need specially crafted queries, it is a
| must though.
|
| My lib is at https://monazita.gitlab.io/monazita/ and show HN is
| at https://news.ycombinator.com/item?id=39467742
| nicolaslem wrote:
| Isn't using f-strings to build SQL queries a large footgun? I can
| imagine a junior developer introducing a SQL injection by not
| completely grasping the sqlbind API.
|
| Using the quickstart example, visually both lines are very
| similar and both are functional, but one of them may introduce a
| SQL injection: sql = f'SELECT * FROM table
| WHERE field1 = {q/value1} AND field2 > {q/value2}' sql =
| f'SELECT * FROM table WHERE field1 = {value1} AND field2 >
| {value2}'
| hombre_fatal wrote:
| I don't use Python so can't be bothered to read the source but
| the general solution here is to make regular string
| interpolation fail.
|
| e.g. You must use the Q mechanism and doesn't work if you
| forget it.
| dougk wrote:
| f-strings are part of the language so you need to change the
| interpreter to make that fail. That will not play nice with
| the rest of the codebase.
| gpapilion wrote:
| Any string interpolation would open you up to sql injection
| without a lot of care. It generally recommended to used
| prepared statements.
|
| I guess the q part is the query parameter for the prepared
| statement. I'd still be a bit antsy here given the ease of a
| mistake.
|
| I just don't get what it's providing over using my db's module.
| Usually the dialect is close but different enough to require
| changes to queries.
| Retr0id wrote:
| Proposed safe(r) API via decorated methods:
| QParams = sqlbind.Dialect.some_dialect @QParams
| def make_my_query(value1: str, value2: int): return
| f'SELECT * FROM table WHERE field1 = {value1} AND field2 >
| {value2}' data =
| connection.execute(*make_my_query(foo, bar))
|
| Obviously this wouldn't work as-is, QParams would need to be
| modified to support this. The decorator would wrap the method,
| sanitising the args before they're even passed to the wrapped
| function.
|
| Edit: actually, I might've misunderstood what sqlbind is doing
| internally, maybe this approach doesn't quite make sense
|
| Edit 2: Like so: https://github.com/baverman/sqlbind/issues/1
| chrisjc wrote:
| If I understand your code correctly, you are bypassing
| DBAPI's binding functionality and opening yourself up for SQL
| injection. QParams =
| sqlbind.Dialect.some_dialect @QParams
| def make_my_query(value1: str, value2: int): #
| SELECT * FROM table WHERE field1 = ? AND field2 > ?
| return f'SELECT * FROM table WHERE field1 = {value1} AND
| field2 > {value2}' data =
| connection.execute(*make_my_query(foo, bar), [foo, bar])
| Retr0id wrote:
| Where's the injection? (you might want to take a look at
| the example in the issue I opened, it's more complete - I'm
| not bypassing the DBAPI, I'm working with it)
| chrisjc wrote:
| Yup. Your code in github explains what you're suggesting
| better. My bad.
| chrisjc wrote:
| It's a little difficult to parse through the README, but I
| believe placeholders and placeholder structure is being
| rendered in the SQL, not placeholder values.
| ps = f'SELECT * FROM table WHERE field1 = {q/value1} AND field2
| > {q/value2}' # SELECT * FROM table WHERE field1 = ?
| AND field2 > ?
|
| However, I would imagine that if any external input is passed
| through to this framework, then there might still be the
| possibility of SQL injection attacks passing through this
| framework and ending up in the prepared statement SQL.
| rmbyrro wrote:
| I failed to see the value.
|
| You don't have to use the modeling part of an ORM. Just the
| built-in security, session, and connection pool handling is
| already valuable. And you can already do:
|
| ```
|
| params = [123]
|
| session.query("SELECT * FROM my_table WHERE userID = ?", params)
|
| ```
|
| Why introduce another library just for that?
| bastawhiz wrote:
| If you have a lot of values to add to a query, they're
| physically distant from the place you're using them. With the
| unnamed version like you used, you're relying on your ability
| to count how many values are used so you get the list correct.
| With named parameters, you're adding another layer of
| indirection. Use a value more than once? If you change it or
| want to remove it, you'd better be sure you do it for each
| usage of the name in your query!
| ltbarcly3 wrote:
| https://github.com/justinvanwinkle/Norm
|
| Similar idea, more fleshed out, doesn't require all the weird
| interpolation stuff and special python functions which map to SQL
| grammar.
|
| Almost nothing to memorize so you can use the library.
|
| The SQL it outputs is _extremely_ readable and cleanly formatted.
| wodenokoto wrote:
| I find it really difficult to write parameterized SQL. The
| problems are many fold:
|
| I want to write a sql query I can deploy against test and prod,
| so I need to be able to parameterize table names to some extend.
| Then there are values, as shown here, with all the footguns that
| entails. But in the end I also want to be able to have IDE
| niceties while developing. Autocomplete on column names and table
| names and inline be able to see types and those kind if things.
|
| And I have never seen anything that can give you all those
| things.
| _a_a_a_ wrote:
| If you are allowed to create new schema objects then you can
| create views over the tables. Give the views a consistent name,
| but create the view to reference the actual table you want.
| Then update the view.
| sparks1970 wrote:
| I came across this yesterday for golang: https://sqlc.dev which
| is somewhat like what you want, maybe.
|
| Not sure it allows you to parameterize table names but the
| basic idea is codegen from sql queries so you are working with
| go code (autocompletion etc).
| agubelu wrote:
| I went through the first example in the README more than a few
| times, and found it quite difficult to make sense of...
|
| Are the query params stored in `q`? Which is also updated when
| it's inserted in the query itself? Why does a
| `sqlbind.Dialect.default()` turn into an `str`?
|
| And... what's the difference between that and normal
| parameterized queries? It just seems weirder and with a lot of
| side effect magic going on.
| bastawhiz wrote:
| It's not a lot of magic, but it's pretty opaque magic. The q
| object overloads the division operator and returns a ? and
| stores the "passed" value inside itself. The q object extends
| either a list or a dictionary to act as the set of values to
| pass to the parametrized query.
|
| So in effect, the slash behavior is just a spicy way to append
| or set (depending on which dialect option you choose).
| agubelu wrote:
| No, I can see that. It's the same as the logical OR operator
| | computing unions of sets. I just fail to see how this
| solution is better than a standard parameterized query, it
| feels much more opaque without any obvious upsides.
| nick238 wrote:
| > It's really not an easy task to decipher complex SQLAlchemy
| expression back into SQL
|
| SQLAlchemy to SQL is borderline trivial, you just call
| `str(statement)` and it will output SQL. [Here's the docpage for
| if you want to select dialects and optionally inline parameters](
| https://docs.sqlalchemy.org/en/20/faq/sqlexpressions.html#ho...).
| tda wrote:
| Yes, but to get the exact query with interpolated parameters
| that can be copy pasted into e.g. pgadmin for a nice explain
| analyze visualization is a bit more involved. As I could never
| remember it I always copy pasted that function in any project.
| But other than that minor complaint I have always been
| thoroughly impressed by both SQLAlchemy ORM and the Core query
| builder, and how they can be combined to have a best of both
| worlds situation. I have written some very complex queries that
| remained maintainable thanks to composition with cte's, sub
| queries and computed properties. Doing that in pure SQL would
| also have been possible, but using using SQLlachemy allowed me
| to name and reuse smaller components, avoiding the wall of text
| SQL statements.
| phs2501 wrote:
| I had my own take on this concept[1], though with considerably
| less language magic involved. I imagine there's a lot of these
| kind of things running around. My criteria were:
|
| a) let me write actual SQL, not a python DSL that generates SQL
|
| b) be placeholder-safe
|
| c) be composable
|
| Though it was somewhat intentionally limited to what I needed to
| support for my own needs at the time.
|
| [1] https://github.com/bdowning/sql-athame
| maxxk wrote:
| Interesting approach. But for me the main disadvantage of SQL
| code embedded in other language is lack of syntax
| highlighting/checking.
|
| I've had the best experience with embrace-sql which translates
| sql-files with magic comment-delimited parametrized queries to
| python-callable modules: https://pypi.org/project/embrace/
| dr_kiszonka wrote:
| I wish there was a way to get around the syntax highlighting,
| checking, and completion issue. Pandas has a nice `query`
| method, which is hard to use for the same reason.
|
| https://pandas.pydata.org/docs/reference/api/pandas.DataFram...
| owlstuffing wrote:
| Too bad manifold-sql[1] is for java, it does all that and more,
| particularly the IDE dev productivity stuff.
|
| 1. https://github.com/manifold-
| systems/manifold/blob/master/man...
___________________________________________________________________
(page generated 2024-02-26 23:01 UTC)