[HN Gopher] Getting AI to write good SQL: Text-to-SQL techniques...
       ___________________________________________________________________
        
       Getting AI to write good SQL: Text-to-SQL techniques explained
        
       Author : richards
       Score  : 68 points
       Date   : 2025-05-16 21:10 UTC (1 hours ago)
        
 (HTM) web link (cloud.google.com)
 (TXT) w3m dump (cloud.google.com)
        
       | mritchie712 wrote:
       | the short answer: use a semantic layer.
       | 
       | It's the cleanest way to give the right context and the best
       | place to pull a human in the loop.
       | 
       | A human can validate and create all important metrics (e.g. what
       | does "monthly active users" really mean) then an LLM can use that
       | metric definition whenever asked for MAU.
       | 
       | With a semantic layer, you get the added benefit of writing
       | queries in JSON instead of raw SQL. LLM's are much more
       | consistent at writing a small JSON vs. hundreds of lines of SQL.
       | 
       | We[0] use cube[1] for this. It's the best open source semantic
       | layer, but there's a couple closed source options too.
       | 
       | My last company wrote a post on this in 2021[2]. Looks like the
       | acquirer stopped paying for the blog hosting, but the HN post is
       | still up.
       | 
       | 0 - https://www.definite.app/
       | 
       | 1 - https://cube.dev/
       | 
       | 2 - https://news.ycombinator.com/item?id=25930190
        
         | ljm wrote:
         | > you get the added benefit of writing queries in JSON instead
         | of raw SQL.
         | 
         | I'm sorry, I can't. The tail is wagging the dog.
         | 
         | dang, can you delete my account and scrub my history? I'm
         | serious.
        
           | dangscientist wrote:
           | my friend, dang only responds to naked aggression (at least)
        
           | fhkatari wrote:
           | You move all the tools to debug and inspect slow queries, in
           | a completely unsupported JSON environment, with prompts not
           | to make up column names. And this is progress?
        
       | westurner wrote:
       | From "Show HN: We open sourced our entire text-to-SQL product"
       | (2024) https://news.ycombinator.com/item?id=40456236 :
       | 
       | > _awesome-Text2SQL:https://github.com/eosphoros-ai/Awesome-
       | Text2SQL _
       | 
       | > _Awesome-code-llm > Benchmarks > Text to SQL:
       | https://github.com/codefuse-ai/Awesome-Code-LLM#text-to-sql _
        
       | cloudking wrote:
       | This is pretty simple in any foundation model, provide a well
       | commented schema and ask for the query
        
         | tibbar wrote:
         | Step 1: Your schema has thousands of tables and there aren't
         | many comments.
         | 
         | Step 2...
        
           | john2x wrote:
           | Use AI to generate the comments of course
        
             | cloudking wrote:
             | Exactly, add any documentation you have about the app for
             | more context too.
        
         | fsndz wrote:
         | the smolagents library is also pretty nice to do the
         | scaffolding around the model. Text to sql seems simple in
         | demos, but to make it work in real life complex cases is very
         | hard: https://medium.com/thoughts-on-machine-learning/build-a-
         | text...
        
         | quantadev wrote:
         | I agree. There's really no magic to it any more. The table
         | create DDL commands are very a very precise description of the
         | tables, so almost nothing more is ever needed. You can just
         | describe in detail what query you need, and any modern SOTA
         | model can do it just fine.
        
       | mousetree wrote:
       | Out of all the AI tools and models I've tried, the most
       | disappointing is the Gemini built into BigQuery. Despite having
       | well named columns with good descriptions it consistently gets
       | nowhere close to solving the problem.
        
         | quantadev wrote:
         | Having proper constraints and foreign keys that are clear is
         | generally all that's needed in my experience. Are you sure your
         | tables have well defined constraints, so that the AI can be
         | absolutely 100% sure how everything links up? SQL is very
         | precise, but only if you're utilizing constraints and foreign
         | key definitions well.
        
         | flysand7 wrote:
         | Having written more SQL than any other programming language by
         | now, every time I've tried to use AI to write the query for me,
         | I'd spend way more time getting the output right than if I'd
         | just written it myself.
         | 
         | As a quick aside there's one thing I wish SQL had that would
         | make writing queries so much faster. At work we're using a DSL
         | that has one operator that automatically generates joins from
         | foreign key columns, just like
         | credit.CLIENT->NAME
         | 
         | And you got clients table automatically joined into the query.
         | Having to write ten to twenty joins for every query is by far
         | the worst thing, everything else about writing SQL is not that
         | bad.
        
       | nashashmi wrote:
       | AI text to regex solutions would be incredibly handy.
        
       ___________________________________________________________________
       (page generated 2025-05-16 23:00 UTC)