[HN Gopher] Vanna.ai: Chat with your SQL database
       ___________________________________________________________________
        
       Vanna.ai: Chat with your SQL database
        
       Author : ignoramous
       Score  : 217 points
       Date   : 2024-01-14 17:58 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | arter4 wrote:
       | I'm curious about how this performs with more complex queries,
       | like joins across five tables.
       | 
       | Also, does the training phase actually involve writing SELECT
       | queries by hand?
       | 
       | In the age of ORMs and so on, many people have probably forgotten
       | how to write raw SQL queries.
        
         | nkozyra wrote:
          | In my experience, GPT-4 handles joins about as well as
          | queries without them. And that requires no specific,
          | separate SQL training (its training data presumably already
          | contains tens of thousands of examples).
        
         | teaearlgraycold wrote:
         | > In the age of ORMs and so on, many people have probably
         | forgotten how to write raw SQL queries.
         | 
          | I've heard this general sentiment repeated quite a lot -
          | mostly by people who don't use ORMs. In my experience you
          | pretty quickly reach the limits of even the best ORMs and
          | need to write some queries by hand. And these tend to be the
          | relatively complicated queries. You need to know about all
          | of the different join types, coalescing, HAVING clauses,
          | multiple joins to the same table with WHERE filters, etc.
         | 
         | Not that this makes you a SQL expert but you can't get too far
         | if you don't know SQL.
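          | 
          | A minimal sketch (hypothetical `customers`/`addresses`/
          | `orders` tables, SQLAlchemy assumed) of the kind of query
          | that tends to fall outside ORM territory - two joins to the
          | same table with different filters, plus COALESCE and HAVING:
          | 
          |     from sqlalchemy import create_engine, text
          | 
          |     engine = create_engine("postgresql:///shop")
          |     query = text("""
          |         SELECT c.id,
          |                COALESCE(bill.city, ship.city) AS city,
          |                COUNT(o.id) AS order_count
          |         FROM customers c
          |         LEFT JOIN addresses bill
          |           ON bill.customer_id = c.id AND bill.kind = 'billing'
          |         LEFT JOIN addresses ship
          |           ON ship.customer_id = c.id AND ship.kind = 'shipping'
          |         LEFT JOIN orders o ON o.customer_id = c.id
          |         GROUP BY c.id, bill.city, ship.city
          |         HAVING COUNT(o.id) > 10
          |     """)
          |     with engine.connect() as conn:
          |         rows = conn.execute(query).fetchall()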
        
       | account-5 wrote:
       | What I'd really be interested in is being able to describe a
       | problem space and have it generate a schema that models it. I'm
       | actually not that bad at generating my own SQL queries.
        
         | CharlesW wrote:
         | This works pretty well without a dedicated application today,
          | e.g. _"Knowing everything you do about music and music
         | distribution, please define a database schema that supports
         | albums, tracks, and artists"_. If you have additional
         | requirements or knowledge that the response doesn't address,
         | just add it and re-prompt. When you're done, ask for the SQL to
         | set up the schema in your database of choice.
        
           | account-5 wrote:
            | Maybe my prompting needs to improve. I recently tried to
            | get ChatGPT to provide a schema for an SQLite database
            | that implements vCard data in a normalised way. I gave
            | up...
        
             | coder543 wrote:
             | ChatGPT-3.5 or ChatGPT-4? There is a big difference.
             | 
             | For fun, I just asked ChatGPT-4 to generate a normalized
             | database representation of vcard information: https://chat.
             | openai.com/share/1c88813c-0a50-4ec6-ba92-4d6ff8...
             | 
             | It seems like a reasonable start to me.
        
               | account-5 wrote:
                | ChatGPT 3.5. Maybe I should pay for a couple of months
               | access to 4 to see the difference. Is it worth the money?
        
               | coder543 wrote:
               | ChatGPT-3.5 isn't even worth touching as an end-user
               | application. Bard is better (due to having some
               | integrations), but it's still barely useful.
               | 
                | ChatGPT-4 is on another level entirely compared to
                | either 3.5 or Bard. It is actually useful for a lot.
               | 
               | ChatGPT-3.5 can still serve a purpose when you're talking
               | about API automations where you provide all the data in
               | the prompt and have ChatGPT-3.5 help with parsing or
               | transforming it, but not as a complete chat application
               | on its own.
               | 
               | Given the bad experiences ChatGPT-3.5 gives out on a
               | regular basis as a chat application, I don't even know
               | why OpenAI offers it for free. It seems like a net-
               | negative for ChatGPT/OpenAI's reputation.
               | 
               | I think it is worth paying for a month of ChatGPT-4. Some
               | people get more use out of it than others, so it may not
               | be worth it to you to continue, but it's hard for anyone
               | to know just how big of a difference ChatGPT-4 represents
               | when they haven't used it.
               | 
               | I provided a sample of ChatGPT-4's output in my previous
               | response, so you can compare that to your experiences
               | with ChatGPT-3.5.
        
               | account-5 wrote:
                | Your sample completely blows away what I got out of
                | 3.5. I'm now wondering if Bing is 3.5 or 4. But I will
                | likely fork out for a couple of months.
        
           | simonw wrote:
           | Yeah, GPT-4 is really good at schema design. ChatGPT can even
           | go a step further and create those tables in a SQLite
           | database file for you to download.
        
         | burcs wrote:
          | We actually built something that does this at Outerbase
          | (ob1.outerbase.com). It'll generate API endpoints as well,
          | if you need them.
        
       | jedberg wrote:
       | This is awesome. It's a quick turnkey way to get started with RAG
       | using your own existing SQL database. Which to be honest is what
       | most people really want when they say they "want ChatGPT for
       | their business".
       | 
       | They just want a way to ask questions in prose and get an answer
       | back, and this gets them a long way there.
       | 
       | Very cool!
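        | 
        | For anyone curious what "turnkey" looks like, the flow is
        | roughly train-then-ask (a sketch from memory of the README;
        | setup of the `vn` client is omitted and method details may
        | differ):
        | 
        |     # "Training" here is really RAG prep: schema, docs and
        |     # example SQL get embedded and stored, then retrieved
        |     # into the prompt whenever a question is asked.
        |     vn.train(ddl="CREATE TABLE artists (id INT, name TEXT)")
        |     vn.train(documentation="Revenue figures are net of refunds.")
        |     vn.train(sql="SELECT name FROM artists ORDER BY name")
        | 
        |     vn.ask("Which artist has the most albums?")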
        
       | new_user_final wrote:
        | Does it work with Google/Facebook ads data? Can I ask it to
        | show the best-performing ads from BigQuery Facebook/Google ads
        | data loaded by Supermetrics or Improvado?
        
         | sonium wrote:
         | Aren't there already tons of apps answering that specific
         | question? I think the strength of this approach is answering
         | the non-obvious questions.
        
       | osigurdson wrote:
       | I wish we had landed on a better acronym than RAG.
        
         | vinnymac wrote:
         | Every single time I see it, I immediately think of Red Amber
         | Green.
        
         | nightski wrote:
         | It doesn't matter, RAG is very temporary and will not be around
         | long imho.
        
           | sroecker wrote:
           | Care to enlighten us why?
        
             | nkozyra wrote:
             | Most of this stuff is replaced within a calendar year and
             | that will probably accelerate.
        
             | osigurdson wrote:
             | It sounds dumb to me.
        
           | ren_engineer wrote:
            | How else would you get private or recent data into an LLM
            | without some form of RAG? The only aspect that might not
            | be needed is the vector database.
        
           | mediaman wrote:
           | RAG, at its core, is a very human way of doing research,
           | because RAG is essentially just building a search mechanism
           | for a reasoning engine. Much like human research.
           | 
           | Your boss asks you to look into something, and you do it
           | through a combination of structured and semantic research.
           | Perhaps you get some books that look relevant, you use search
           | tools to find information, you use structured databases to
           | find data. Then you synthesize it into a response that's
           | useful to answer the question.
           | 
           | People say RAG is temporary, that it's just a patch until
           | "something else" is achieved.
           | 
           | I don't understand what technically is being proposed.
           | 
           | That the weights will just learn everything it needs to know?
           | That is an awful way of knowing things, because it is
           | difficult to update, difficult to cite, difficult to ground,
           | and difficult to precisely manage weights.
           | 
           | That the context windows will get huge so retrieval will be
           | unnecessary? That's an argument about chunking, not
           | retrieval. Perhaps people could put 30,000 pages of documents
           | into the context for every question. But there will always be
           | tradeoffs between size and quality: you could run a smarter
           | model with smaller contexts for the same money, so why, for a
           | given budget, would you choose to stuff a dumber model with
           | enormous quantities of unnecessary information, when you
           | could get a better answer from a higher intelligence using
           | more reasonably sized retrievals at the same cost?
           | 
            | Likewise, RAG is not just vector DBs but also (as in this
            | case) the use of structured queries to analyze
            | information, and the use of search mechanisms to find
            | information in giant unstructured corpora (e.g., the
            | Internet, corporate intranets, etc.).
           | 
           | Because RAG is relatively similar to the way organic
           | intelligence conducts research, I believe RAG is here for the
           | long haul, but its methods will advance significantly and the
           | way it gets information will change over time. Ultimately,
           | achieving AGI is not about developing a system that "knows
           | everything," but a system that can reason about anything, and
           | dismissing RAG is to confuse the two objectives.
        
         | bdcravens wrote:
         | Rags are used for cleaning, and this gives you a cleaner
         | interface into your data :-)
        
         | arbot360 wrote:
         | REALM (REtrieval Augmented Language Model) is a better acronym.
        
         | spencerchubb wrote:
         | I'm pretty sure whoever coined the term just wanted to sound
         | smart. Retrieval Augmented Generation is a fancy way to say
         | "put data in the prompt"
        
       | teaearlgraycold wrote:
       | I haven't loaded this up so maybe this has been accounted for,
       | but I think a critical feature is tying the original SQL query to
       | all artifacts generated by Vanna.
       | 
       | Vanna would be helpful for someone that knows SQL when they don't
       | know the existing schema and business logic and also just to save
       | time as a co-pilot. But the users that get the most value out of
       | this are the ones _without_ the ability to validate the generated
       | SQL. Issues will occur - people will give incomplete definitions
       | to the AI, the AI will reproduce some rookie mistake it saw
       | 1,000,000 times in its training data (like failing to realize
       | that by default a UNIQUE INDEX will consider NULL != NULL), etc.
       | At least if all distributed assets can tie back to the query
       | people will be able to retroactively verify the query.
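        | 
        | A small illustration of that particular footgun (SQLite shown;
        | Postgres and MySQL behave the same way by default):
        | 
        |     import sqlite3
        | 
        |     con = sqlite3.connect(":memory:")
        |     con.execute("CREATE TABLE users (email TEXT UNIQUE)")
        |     con.execute("INSERT INTO users VALUES (NULL)")
        |     con.execute("INSERT INTO users VALUES (NULL)")  # no error
        |     # NULL != NULL, so the UNIQUE constraint never fires and
        |     # "duplicate" rows slip through.
        |     print(con.execute("SELECT COUNT(*) FROM users").fetchone())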
        
       | sighansen wrote:
        | This looks really helpful! I'm working a lot with graph
        | databases and am wondering if there are similar projects
        | working with, say, Neo4j. I guess because you don't have a
        | schema, the complexity goes up.
        
         | jazzyjackson wrote:
         | neo4j advertises such an integration on their landing page
         | 
         | https://neo4j.com/generativeai/
        
       | altdataseller wrote:
       | What's the origin behind the name Vanna?
        
         | booleandilemma wrote:
         | Vanna White? It's the only Vanna I know.
         | 
         | https://en.wikipedia.org/wiki/Vanna_White
        
       | bob1029 wrote:
       | The most success I had with AI+SQL was when I started feeding
       | errors from the sql provider back to the LLM after each
       | iteration.
       | 
       | I also had a formatted error message wrapper that would strongly
       | suggest querying system tables to discover schema information.
       | 
       | These little tweaks made it scary good at finding queries, even
       | ones requiring 4+ table joins. Even without any examples or fine
       | tuning data.
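        | 
        | A minimal sketch of that loop (OpenAI SDK and a DB-API
        | connection assumed; the error-wrapper wording is illustrative):
        | 
        |     def ask_sql(client, conn, question, max_tries=5):
        |         messages = [{"role": "user", "content":
        |             "Write a SQL query for: " + question +
        |             "\nIf you need schema details, query the system "
        |             "catalogs (information_schema) first."}]
        |         for _ in range(max_tries):
        |             reply = client.chat.completions.create(
        |                 model="gpt-4", messages=messages)
        |             sql = reply.choices[0].message.content
        |             try:
        |                 with conn.cursor() as cur:
        |                     cur.execute(sql)
        |                     return cur.fetchall()
        |             except Exception as err:
        |                 conn.rollback()
        |                 # Feed the provider's error straight back in.
        |                 messages += [
        |                     {"role": "assistant", "content": sql},
        |                     {"role": "user", "content":
        |                      f"That failed with: {err}. Fix the query."}]
        |         raise RuntimeError("no working query found")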
        
         | echelon wrote:
         | Please turn this into a product. There's enormous demand for
         | that.
        
           | teaearlgraycold wrote:
           | Someone get YC on the phone
        
           | bob1029 wrote:
           | I feel like by the time I could turn it into a product,
           | Microsoft & friends will release something that makes it look
           | like a joke. If there is no one on the SQL Server team
           | working on this right now, I don't know what the hell their
           | leadership is thinking.
           | 
           | I am not chasing this rabbit. Someone else will almost
           | certainly catch it first. For now, this is a fun toy I enjoy
           | in my free time. The moment I try to make money with it the
           | fun begins to disappear.
           | 
           | Broadly speaking, I do think this is approximately the only
           | thing that matters once you realize you can put pretty much
           | anything in a big SQL database. What happens when 100% of the
           | domain is in-scope of an LLM that has iteratively optimized
           | itself against the schema?
        
             | andy_ppp wrote:
              | I will be extremely surprised if Microsoft builds this
              | for open source databases. However, someone else will
              | definitely build it if you don't - that is completely
              | true :-)
        
               | JelteF wrote:
               | Disclaimer: I work at Microsoft on Postgres related open
               | source tools (Citus & PgBouncer mostly)
               | 
               | Microsoft is heavily investing in Postgres and its
               | ecosystem, so I wouldn't be extremely surprised if we
               | would do this. We're definitely building things to
               | combine AI with Postgres[1]. Although afaik no-one is
               | working actively on query generation using AI.
               | 
               | But I actually did a very basic POC of "natural language
               | queries" in Postgres myself last year:
               | 
               | Conference talk about it:
               | https://youtu.be/g8lzx0BABf0?si=LM0c6zTt8_P1urYC Repo
               | (unmaintained): https://github.com/JelteF/pg_human
               | 
               | 1: https://techcommunity.microsoft.com/t5/azure-database-
               | for-po...
        
             | dcreater wrote:
             | You can just make a GitHub repo with what you have. It'd
             | still be valuable to the community
        
             | whoiscroberts wrote:
              | If they do release it, they will only release it for
              | Enterprise. Many, many SQL Server installs are SQL
              | Server Standard. There is an entire ecosystem of
              | companies built on selling packages that support SQL
              | Server Standard - see DevArt, RedGate.
        
             | personjerry wrote:
              | Wouldn't it be pretty fast to build it as a custom GPT?
        
           | quickthrower2 wrote:
           | Or open source? You could get 10k stars :-)
        
           | mcapodici wrote:
           | I would be tempted to pivot to that! I am working on similar
           | for CSS (see bio) but if that doesn't work out my plan was to
           | pivot to other languages.
        
           | petters wrote:
           | It sounds like pretty standard constructions with OpenAI's
           | API. I have a couple of such iterative scripts myself for
           | bash commands, SQL etc.
           | 
           | But sure, why not!
        
           | SOLAR_FIELDS wrote:
           | There are already several products out there with varying
           | success.
           | 
            | Some findings after I played with it for a while:
            | 
            | - Langchain already does something like this. A lot of the
            | challenge is not the query itself but efficiently
            | summarizing data to fit in the context window. In other
            | words, if you give me 1-4 tables I can give you a product
            | that will work well pretty easily. But when your data
            | warehouse has tens or hundreds of tables, each with its
            | own columns and metadata, you need to chain together a
            | string of queries to arrive at the answer, and you are
            | basically building a state machine of sorts that has to do
            | fun and creative RAG stuff. The single biggest thing that
            | made a difference in effectiveness was not what op
            | mentioned at all, but instead having a good summary of
            | every column in the db, stored in the db itself (sketch
            | below). This can be AI-generated, but the way Langchain
            | attempts to do it on the fly is slow and rather
            | ineffective (or at least that was the case when I played
            | with it last summer; it might be better now).
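            | 
            | A rough sketch of that column-summary idea (hypothetical
            | `column_summaries` table plus `embed`/`cosine` helpers;
            | not any particular product's implementation):
            | 
            |     rows = conn.execute(
            |         "SELECT table_name, column_name, summary "
            |         "FROM column_summaries").fetchall()
            |     index = [(f"{t}.{c}: {s}", embed(s))
            |              for t, c, s in rows]
            | 
            |     def relevant_columns(question, k=20):
            |         qv = embed(question)
            |         ranked = sorted(index,
            |                         key=lambda x: -cosine(qv, x[1]))
            |         # these snippets go into the prompt instead of
            |         # the whole warehouse schema
            |         return [text for text, _ in ranked[:k]]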
           | 
           | Not affiliated, but after reviewing the products out there
           | the data team I was working with ended up selecting getdot.ai
           | as it had the right mix of price, ease of use, and
           | effectiveness.
        
           | l5870uoo9y wrote:
           | You can check this out https://www.sqlai.ai. It has AI-
           | powered generators for:
           | 
           | - Generate SQL
           | 
           | - Generate optimized SQL
           | 
           | - Fix query
           | 
           | - Optimize query
           | 
           | - Explain query
           | 
           | Disclaimer: I am the solo developer behind it.
        
       | holoduke wrote:
        | It would be fun if you could actually train on your raw SQL
        | data so that the LLM output is the actual answer rather than
        | SQL commands. That way it's just another language layer on
        | top of (or in between) SQL. It would probably hurt efficiency
        | and performance in the long run, though.
        
       | esafak wrote:
       | The nitty gritty: https://vanna.ai/blog/ai-sql-accuracy.html
        
       | pamelafox wrote:
        | I love that this exists, but I worry about how it uses the
        | term "train", even in quotes, as I spend a lot of time
        | explaining how RAG works and trying to emphasize that there is
        | no training/fine-tuning involved - just data preparation,
        | chunking and vectorization as needed.
        
       | jonahx wrote:
       | Is the architecture they use in this diagram currently the best
       | way to train LLMs in general on custom data sets?
       | 
       | https://raw.githubusercontent.com/vanna-ai/vanna/main/img/va...
       | 
       | That is, store your trained custom data in vector db and then use
       | RAG to retrieve relevant content and inject that into the prompt
       | of the LLM the user is querying with?
       | 
       | As opposed to fine tuning or other methods?
        
         | firejake308 wrote:
         | All the podcasts I've been listening to recommend RAG over
         | fine-tuning. My intuition is that having the relevant knowledge
         | in the context rather than the weights brings it closer to the
         | outputs, thereby making it much more likely to provide accurate
         | information and avoid hallucinations/confabulations.
        
           | benjaminwootton wrote:
            | Do you have any podcasts you would recommend with this
            | type of content?
        
           | zmmmmm wrote:
           | > All the podcasts I've been listening to recommend RAG over
           | fine-tuning
           | 
            | I'm always suspicious that this is just because RAG is so
            | much more accessible (both compute-wise and in terms of
            | expertise required). There's far more profit in selling
            | something accessible to the masses than in selling
            | something only a niche group of users can use.
           | 
           | I think most people who do actual fine tuning would still
           | probably then use RAG afterwards ...
        
         | ajhai wrote:
          | We can get a lot done with vector db + RAG before having to
          | fine-tune or build custom models. There are a lot of
          | techniques to improve RAG performance. I captured a few of
          | them a while back at
          | https://llmstack.ai/blog/retrieval-augmented-generation.
        
         | 331c8c71 wrote:
         | Yes from what I gather. And just to emphasize there's no LLM
         | (re)training involved at all.
        
       | metflex wrote:
       | that's it, we are going to lose our jobs
        
       | kleiba wrote:
       | Sorry, maybe I'm just too tired to see it, but how much control
       | do you have over the SQL query that is generated by the AI? Is
       | there a risk that it could access unwanted portions or, worse,
       | delete parts of your data? (the AI equivalent of Bobby Tables, so
       | to speak)
        
         | htk wrote:
         | I guess you could limit that with the correct user permissions.
        
         | thih9 wrote:
         | Why not give it access to relevant parts of the database only?
         | And read only access too?
        
         | bob1029 wrote:
          | In some SQL providers, you can define rules that dynamically
          | mask fields, suppress rows, etc. based upon connection-
          | specific details (e.g. user or tenant ID).
         | 
         | So, you could have all connections from the LLM-enabled systems
         | enforce masking of PII, whereas any back-office connections get
         | to see unmasked data. Doing things at this level makes it very
         | difficult to break out of the intended policy framework.
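          | 
          | In Postgres, for instance, the row-suppression half of that
          | looks roughly like this (hypothetical `customers` table;
          | the tenant is set per connection by the LLM-facing app):
          | 
          |     import psycopg2
          | 
          |     # One-time setup, run as an admin role:
          |     #   ALTER TABLE customers ENABLE ROW LEVEL SECURITY;
          |     #   CREATE POLICY tenant_only ON customers
          |     #     USING (tenant_id =
          |     #            current_setting('app.tenant_id')::int);
          |     conn = psycopg2.connect("dbname=app user=llm_bot")
          |     with conn.cursor() as cur:
          |         cur.execute("SELECT set_config('app.tenant_id', %s,"
          |                     " false)", ("42",))
          |         cur.execute("SELECT * FROM customers")
          |         # Only tenant 42's rows come back, no matter what
          |         # SQL the model generated.
          |         print(cur.fetchall())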
        
         | iuvcaw wrote:
         | Guessing its intended use case is business analytic queries
         | without write permissions --- particularly for non-programmers.
         | I don't think it'd be advisable to use something like this for
         | app logic
        
       | ajhai wrote:
       | We have recently added support to query data from SingleStore to
       | our agent framework, LLMStack
        | (https://github.com/trypromptly/LLMStack). Out-of-the-box
        | performance when prompting with just the table schemas is
        | pretty good with GPT-4.
        | 
        | The more domain-specific knowledge the queries need, the
        | harder it has gotten in general. We've had good success
        | `teaching` the model different concepts in relation to the
        | dataset, and giving it example questions and queries greatly
        | improved performance.
        
       | hrpnk wrote:
       | Prompts are quite straightforward.
       | 
       | - OpenAI: https://github.com/vanna-
       | ai/vanna/blob/a4cdf7593ac0c584f7d74...
       | 
       | - Mistral: https://github.com/vanna-
       | ai/vanna/blob/a4cdf7593ac0c584f7d74...
        
         | peheje wrote:
          | Many of these AI "products" - are they just feeding text
          | into LLMs in a structured manner?
        
           | okwhateverdude wrote:
           | Basically, yeah. It is shockingly trivial to do, and yet like
           | playing with alchemy when it comes to the prompting,
           | especially if doing inference on the cheap like running
            | smaller models. They can get distracted by your
            | formatting, ordering, CAPITALIZATION, etc.
        
       | benjaminwootton wrote:
       | I built a demo of something similar, using LlamaIndex to query
       | data as it streamed into ClickHouse.
       | 
       | I think this has a lot of real world potential, particularly when
       | you move between the query and a GenAI task:
       | 
       | https://youtu.be/F3Eup8yQiQQ?si=pa_JrUbBNyvPXlV0
       | 
       | https://youtu.be/7G-VwZ_fC5M?si=TxDQgi-w5f41xRJL
       | 
       | I generally found this worked quite well. It was good at
       | identifying which fields to query and how to build where clauses
       | and aggregations. It could pull off simple joins but started to
       | break down much past there.
       | 
       | I agree with the peer comment that being able to process and
       | respond to error logs would make it more robust.
        
       | breadwinner wrote:
       | I have seen good results from just describing the schema to
       | ChatGPT-4 and then asking it to translate English to SQL. Does
       | this work significantly better?
        
         | SOLAR_FIELDS wrote:
         | That's mostly what the products and libraries around this like
         | llamaindex or Langchain are doing. If you look at the Langchain
          | SQL agent, all it's doing is chaining together a series of
          | prompts that take the user's initial query, attempt to take
          | in a db and discover its schema on the fly, and then execute
          | queries against it based on that discovered schema, ensuring
          | the result makes sense.
         | 
         | The tough part is doing this at scale as part of a fully
         | automated solution (picture a slack bot hooked up to your data
         | warehouse that just does all of that for you that you converse
          | with). When you have tens or hundreds of tables with
          | relationships and metadata in that schema, and you want your
          | AI to be able to walk all of them unprompted, you're
          | basically doing context-window shenanigans and building
          | complex state machines to walk that schema.
         | 
          | Unfortunately that's kind of what you need if you want to
          | achieve the dream of a db you can ask arbitrary questions of
          | with no other knowledge of SQL or how it works. Otherwise
          | the end user has to have some prior knowledge of the schema
          | and the dbs to get value from the LLM, which somewhat
          | reduces the audience for said chatbot.
        
       | codegeek wrote:
       | I have been keeping track of a few products like these including
       | some that are YC backed. Interesting space as I am looking for a
       | solution myself:
       | 
       | - Minds DB (YC W20) https://github.com/mindsdb/mindsdb
       | 
       | - Buster (YC W24) https://buster.so
       | 
       | - DB Pilot https://dbpilot.io
       | 
       | and now this one
        
         | bredren wrote:
         | Have you written up any results of your experience with each?
         | 
         | I'm interested in a survey of this field so far and would read
         | it.
        
         | pylua wrote:
          | I don't fully understand the business case after reading the
          | documentation. Is it really a time saver?
        
           | EmilStenstrom wrote:
           | Allow people that don't know SQL to query a database.
        
           | MattGaiser wrote:
           | It would be for people who are not that fluent in SQL. Even
           | as a dev, I find ChatGPT to be easier for writing queries
           | than hand coding them as I do it so infrequently.
        
             | pylua wrote:
             | Yeah, same here. Seems like that approach is much simpler
             | than this.
             | 
             | I guess the real benefit here is that you don't need to
             | understand the schemas so the knowledge is not lost when
             | someone leaves a company.
             | 
             | Sort of an abstraction layer for the schemas
        
           | realanswe91 wrote:
           | In the 1970s SQL was developed, with an easy English-like
           | syntax, such that the average white collar worker (IQ ~115)
           | could use it.
           | 
            | But it's 2024 and the average American white collar worker
            | (IQ ~100) needs something that can parse English-like
            | language, tolerate ambiguities, and still return some sort
            | of result (hopefully the right one lol).
           | 
           | This might sound like a cynical take but there is a big and
           | growing market in making sure Americans can continue to use
           | technology.
        
         | refset wrote:
         | It's not a public facing product, but there was a talk from a
         | team at Alibaba a couple of months ago during CMU's "ML=DB
         | Seminar Series" [0] on how they augmented their NL2SQL
         | transformer model with "Semantics Correction [...] a post-
         | processing routine, which checks the initially generated SQL
         | queries by applying rules to identify and correct semantic
         | errors" [1]. It will be interesting to see whether VC-backed
         | teams can keep up with the state of the art coming out of
         | BigCorps.
         | 
         | [0] "Alibaba: Domain Knowledge Augmented AI for Databases (Jian
         | Tan)" -
         | https://www.youtube.com/watch?v=dsgHthzROj4&list=PLSE8ODhjZX...
         | 
         | [1] "CatSQL: Towards Real World Natural Language to SQL
         | Applications" - https://www.vldb.org/pvldb/vol16/p1534-fu.pdf
        
         | kszucs wrote:
          | Please add Ibis Birdbrain
          | (https://ibis-project.github.io/ibis-birdbrain/) to the
          | list. Birdbrain is an AI-powered data bot, built on Ibis and
          | Marvin, supporting more than 18 database backends.
          | 
          | See https://github.com/ibis-project/ibis and
          | https://ibis-project.org for more details.
        
           | codyvoda wrote:
           | note that Ibis Birdbrain is very much work-in-progress, but
           | should provide an open-source solution to do this w/ 20+
           | backends
           | 
           | old demo here: https://gist.github.com/lostmygithubaccount/08
           | ddf29898732101...
           | 
           | planning to finish it...soon...
        
       | jug wrote:
       | I wonder if this supports spatial queries as in PostGIS,
       | SpatiaLite, SQL Server Spatial as per the OGC standard?
       | 
       | I'm interested in integrating a user friendly natural language
       | query tool for our GIS application.
       | 
       | I've looked at LangChain and the SQL chain before but I didn't
       | feel it was robust enough for professional use. You needed to run
       | an expensive GPT-4 backend to begin with and even then, it wasn't
        | perfect. I think a major part of this is that it wasn't
        | actually "trained" on the data the way Vanna apparently is.
        
       | crimbles wrote:
       | I can't wait until it naively does a table scan on one of our
       | several TB tables...
        
       | swimwiththebeat wrote:
       | I'm curious to see if people have tried this out with their
       | datasets and seen success? I've been using similar techniques at
       | work to build a bot that allows employees internally to talk to
       | our structured datasets (a couple MySQL tables). It works kind of
       | ok in practice, but there are a few challenges:
       | 
       | 1. We have many enums and data types specific to our business
       | that will never be in these foundation models. Those have to be
       | manually defined and fed into the prompt as context also (i.e.
       | the equivalent of adding documentation in Vanna.ai).
       | 
        | 2. People can ask many kinds of time-related questions like
        | 'how much demand was there in the past year?'. If you store
        | your data in quarters, how would you prompt-engineer the model
        | to take into account the current time AND recognize that it's
        | the last 4 quarters? This has typically broken for me (see the
        | sketch after this list).
       | 
        | 3. It took a LOT of diverse sample SQL queries in
       | order for it to generate the right SQL queries for a set of
       | plausible user questions (15-20 SQL queries for a single MySQL
       | table). Given that users can ask anything, it has to be extremely
       | robust. Requiring this much context for just a single table means
       | it's difficult to scale to tens or hundreds of tables. I'm
       | wondering if there's a more efficient way of doing this?
       | 
       | 4. I've been using the Llama2 70B Gen model, but curious to know
       | if other models work significantly better than this one in
       | generating SQL queries?
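        | 
        | On point 2, one approach is to compute the anchor date outside
        | the model and state it explicitly in the prompt (a sketch; the
        | table name and wording are illustrative):
        | 
        |     import datetime as dt
        | 
        |     today = dt.date.today()
        |     quarter = (today.month - 1) // 3 + 1
        |     context = (
        |         f"Today is {today.isoformat()} (Q{quarter} "
        |         f"{today.year}). 'The past year' means the 4 most "
        |         "recently completed quarters. Demand is stored per "
        |         "quarter in demand_by_quarter(year, quarter, units)."
        |     )
        |     prompt = (context + "\n\nQuestion: how much demand was "
        |               "there in the past year? Reply with SQL only.")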
        
       | aussieguy1234 wrote:
       | I've already done this with GPT-4.
       | 
       | It goes something like this:
       | 
        | Here's the table structure from MySQL CLI `SHOW CREATE TABLE`
        | statements for the tables I want to query.
       | 
       | Now given those tables, give me a query to show me my cart
       | abandonment rate (or, some other business metric I want to know).
       | 
       | Seems to work pretty well.
        
       | miohtama wrote:
        | How about, instead of making AI wrappers around 50-year-old
        | SQL, we made a database query language that's easier to read
        | and write?
        
         | marginalia_nu wrote:
         | In general, if something has been around for a very long time
         | and nobody apparently seems to have thought to improve it, then
         | odds are the reason is it's pretty good and genuinely hard to
         | improve on.
        
           | aae42 wrote:
           | in other words, SQL is a shark, not a dinosaur
        
         | neodymiumphish wrote:
          | My fear with this approach is that the first implementation
          | would be severely handicapped compared to SQL, and it'd take
          | years to support some one-off need for any organizational
          | user, so it'd never be fully utilized.
        
       | neofrommatrix wrote:
        | I've done this with Neo4j. Pretty simple to hook it up with
        | OpenAI APIs and have a conversational interface.
        
       | Vosporos wrote:
       | I can hire a DBA to tell me that my indexes aren't shit, no need
       | for AI.
        
       | l5870uoo9y wrote:
        | Is there a list of generated SQL to see how it performs? Here
        | is a list of SQL examples using GPT-4 and the DVD Rental
        | sample database.
       | 
       | [1]: https://www.sqlai.ai/sql-examples
       | 
       | [2]: https://www.postgresqltutorial.com/postgresql-getting-
       | starte...
        
       | kulikalov wrote:
       | While I recognize the efforts in developing natural language to
       | SQL translation systems, I remain skeptical. The core of my
       | concern lies in the inherent nature of natural language and these
       | models, which are approximative and lack precision. SQL
       | databases, on the other hand, are built to handle precise,
       | accurate information in most cases. Introducing an approximative
       | layer, such as a language model, into a system that relies on
       | precision could potentially create more problems than it solves,
       | leading me to question the productivity of these endeavors in
       | effectively addressing real-world needs.
        
         | samstave wrote:
         | Reverse idea:
         | 
          | Use this to POPULATE SQL based on captured NLP
          | "surveillance" -- for example, build a DB of things I say as
          | my device listens to me, and categorize the things, topics,
          | places, people, etc. mentioned.
         | 
         | Keep count of experiencing the same things....
         | 
          | When I say I need to "buy thing", build a table of frequency
          | for "buy thing", etc...
         | 
         | Effectively - query anything you've said to Alexa and be able
         | to map behaviors/habits/people/things...
         | 
          | If I say "Bob's phone number is BLAH", it adds Bob + number
          | to my "random people I met today" table with a note of "we
          | met at the dog park".
        
       ___________________________________________________________________
       (page generated 2024-01-14 23:00 UTC)