[HN Gopher] Getting AI to write good SQL: Text-to-SQL techniques...
___________________________________________________________________
Getting AI to write good SQL: Text-to-SQL techniques explained
Author : richards
Score : 68 points
Date : 2025-05-16 21:10 UTC (1 hours ago)
(HTM) web link (cloud.google.com)
(TXT) w3m dump (cloud.google.com)
| mritchie712 wrote:
| the short answer: use a semantic layer.
|
| It's the cleanest way to give the right context and the best
| place to pull a human in the loop.
|
| A human can validate and create all important metrics (e.g. what
| does "monthly active users" really mean) then an LLM can use that
| metric definition whenever asked for MAU.
|
| With a semantic layer, you get the added benefit of writing
| queries in JSON instead of raw SQL. LLM's are much more
| consistent at writing a small JSON vs. hundreds of lines of SQL.
|
| We[0] use cube[1] for this. It's the best open source semantic
| layer, but there's a couple closed source options too.
|
| My last company wrote a post on this in 2021[2]. Looks like the
| acquirer stopped paying for the blog hosting, but the HN post is
| still up.
|
| 0 - https://www.definite.app/
|
| 1 - https://cube.dev/
|
| 2 - https://news.ycombinator.com/item?id=25930190
| ljm wrote:
| > you get the added benefit of writing queries in JSON instead
| of raw SQL.
|
| I'm sorry, I can't. The tail is wagging the dog.
|
| dang, can you delete my account and scrub my history? I'm
| serious.
| dangscientist wrote:
| my friend, dang only responds to naked aggression (at least)
| fhkatari wrote:
| You move all the tools to debug and inspect slow queries, in
| a completely unsupported JSON environment, with prompts not
| to make up column names. And this is progress?
| westurner wrote:
| From "Show HN: We open sourced our entire text-to-SQL product"
| (2024) https://news.ycombinator.com/item?id=40456236 :
|
| > _awesome-Text2SQL:https://github.com/eosphoros-ai/Awesome-
| Text2SQL _
|
| > _Awesome-code-llm > Benchmarks > Text to SQL:
| https://github.com/codefuse-ai/Awesome-Code-LLM#text-to-sql _
| cloudking wrote:
| This is pretty simple in any foundation model, provide a well
| commented schema and ask for the query
| tibbar wrote:
| Step 1: Your schema has thousands of tables and there aren't
| many comments.
|
| Step 2...
| john2x wrote:
| Use AI to generate the comments of course
| cloudking wrote:
| Exactly, add any documentation you have about the app for
| more context too.
| fsndz wrote:
| the smolagents library is also pretty nice to do the
| scaffolding around the model. Text to sql seems simple in
| demos, but to make it work in real life complex cases is very
| hard: https://medium.com/thoughts-on-machine-learning/build-a-
| text...
| quantadev wrote:
| I agree. There's really no magic to it any more. The table
| create DDL commands are very a very precise description of the
| tables, so almost nothing more is ever needed. You can just
| describe in detail what query you need, and any modern SOTA
| model can do it just fine.
| mousetree wrote:
| Out of all the AI tools and models I've tried, the most
| disappointing is the Gemini built into BigQuery. Despite having
| well named columns with good descriptions it consistently gets
| nowhere close to solving the problem.
| quantadev wrote:
| Having proper constraints and foreign keys that are clear is
| generally all that's needed in my experience. Are you sure your
| tables have well defined constraints, so that the AI can be
| absolutely 100% sure how everything links up? SQL is very
| precise, but only if you're utilizing constraints and foreign
| key definitions well.
| flysand7 wrote:
| Having written more SQL than any other programming language by
| now, every time I've tried to use AI to write the query for me,
| I'd spend way more time getting the output right than if I'd
| just written it myself.
|
| As a quick aside there's one thing I wish SQL had that would
| make writing queries so much faster. At work we're using a DSL
| that has one operator that automatically generates joins from
| foreign key columns, just like
| credit.CLIENT->NAME
|
| And you got clients table automatically joined into the query.
| Having to write ten to twenty joins for every query is by far
| the worst thing, everything else about writing SQL is not that
| bad.
| nashashmi wrote:
| AI text to regex solutions would be incredibly handy.
___________________________________________________________________
(page generated 2025-05-16 23:00 UTC)