[HN Gopher] Libgsqlite: A SQLite extension which loads a Google ...
       ___________________________________________________________________
        
       Libgsqlite: A SQLite extension which loads a Google Sheet as a
       virtual table
        
       Author : x2bool
       Score  : 246 points
       Date   : 2023-03-18 11:24 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | charlie0 wrote:
       | I could see this being very useful. I uploaded a 70MB xls file to
       | Sheets and Chrome says that tab is 600MB to 800MB of RAM. I ran a
       | Find operation and that spiked up to 1.2GB of RAM. How 70MB of
       | data turns into over 1GB of RAM is beyond me.
        
       | vaughan wrote:
       | Spreadsheets are rather peculiar. I don't think anyone would come
       | up with the concept of them today.
       | 
       | This "fill formula down/right" feature is so strange. A developer
       | would be like: let's just set a formula for the entire column,
       | and make it refer to another column. And let's name the columns
       | something. Like Airtable.
       | 
       | But its surpringly flexible and hackable.
       | 
       | I wish there was something like GSheets + Airtable.
        
         | anaisconce wrote:
         | Sounds like you're describing Grist! And it's open source.
         | https://github.com/gristlabs/grist-core
        
         | rs999gti wrote:
         | > I wish there was something like GSheets + Airtable
         | 
         | Smartsheet.com ?
        
         | gregwebs wrote:
         | It's called coda.io I use it for small data and I love it.
        
         | dsagal wrote:
         | Grist is it, in fact "GSheets + Airtable" is how people often
         | see it. From https://blog.appsumo.com/airtable-
         | alternatives/#3_Grist: "Grist has been described as if Google
         | Sheets + Microsoft Excel + Airtable had a beautiful baby."
        
         | nonethewiser wrote:
         | > let's just set a formula for the entire column, and make it
         | refer to another column. And let's name the columns something.
         | 
         | I don't understand because you obviously can do this with
         | sheets/excel
        
           | srcreigh wrote:
           | Enter the formula on row 1, copy it, select the column and
           | paste
        
         | aarondia wrote:
         | If when you say GSheets + Airtable, you're looking for a
         | spreadsheet that is designed for data analysis +a let's you
         | name columns and sets formulas to the entire column, then
         | checkout Mito (https://www.trymito.io). Disclosure: I created
         | Mito.
         | 
         | Mito doesn't have real time collaboration though. That is, not
         | until Jupyter real time collaboration becomes more popular
         | (https://jupyterlab.readthedocs.io/en/stable/user/rtc.html)
        
           | vaughan wrote:
           | I think the real test is: can it quickly build an ad-hoc
           | financial model. This is the bread and butter of
           | spreadsheets.
           | 
           | Mito doesn't look like it keeps the flexibility of a
           | spreadsheet in that way.
        
         | tylerchurch wrote:
         | > let's just set a formula for the entire column, and make it
         | refer to another column. And let's name the columns something
         | 
         | If you turn a worksheet section into a proper Table in excel,
         | it does exactly this.
        
         | Jakob wrote:
         | > set a formula for the entire column
         | 
         | Spreadsheet power users on both excel and sheets do exactly
         | that. It's called array formula. Press cmd-shift-enter on the
         | cell with the formula and it's valid for the entire column and
         | can be changed in one central place.
        
         | airstrike wrote:
         | Hah, it's you again! I keep bumping into you on these threads
         | and feeling like I'm reading my own words in your comments.
         | Yes, spreadsheets are beautiful. Most developers are missing
         | the vision (and I think they missed the point in your comment
         | too)
        
         | mpweiher wrote:
         | > let's just set a formula for the entire column, and make it
         | refer to another column
         | 
         | Meet Lotus Improv (1991):
         | https://en.wikipedia.org/wiki/Lotus_Improv
         | 
         | I think the patents have expired now.
        
           | bargle0 wrote:
           | Improv brought me joy. There isn't anything like it.
        
             | recuter wrote:
             | I read this many times on HN. As far as I can tell Excel
             | copied most of it about a decade later with pivot tables.
        
               | geocar wrote:
               | Sort of. Improv lets you work in the pivot though, like
               | that's the main view, and the ramifications of that are
               | huge.
               | 
               | - You don't have to make your sum/avg/analytic rows
               | because Improv does that automatically. Yes, Excel does
               | this on the pivot tables, but because you can't work
               | there, a lot of people make a SUM(B2:B999) or something
               | to make summary data more accessible.
               | 
               | - Your formulas look simple like
               | ConversionRate=Total:Closes/Total:Leads instead of
               | =SUM(A2:A99)/SUM(B2:B99) and sure I know there's a _way_
               | to make Excel do something like that, but nobody does,
               | which is one thing that makes it hard to mix stuff
               | between multiple documents in Excel -- someone moves the
               | lead count on the lead sheet and it blows everything up
               | back at headquarters. In Improv if someone renamed Leads
               | to Pipe or something it 'd break and you'd see that right
               | away with much less chaos.
               | 
               | And so on
        
       | fulafel wrote:
       | In case you don't read the whole README, note this in
       | limitations:
       | 
       | > The extension will load the spreadsheet only once while
       | creating a virtual table. If you want to pick up recent changes,
       | drop the table and create it again.
        
         | password4321 wrote:
         | Thanks I was wondering about this, didn't want to trigger usage
         | limits.
        
         | factormeta wrote:
         | That is along is already great! Could be a way to slowly
         | migrate off google spread sheets.
        
       | dig1 wrote:
       | > INSERT, UPDATE and DELETE statements won't be implemented.
       | Welcome PRs.
       | 
       | Adding support for modifying Google Sheets would be a game
       | changer. I see many (SQL-like) extensions that will query GS, but
       | none can change it.
        
       | nathanwallace wrote:
       | In a similar vein, Steampipe [1] has a Google Sheets plugin [2].
       | It uses Postgres Foreign Data Wrappers instead of SQLite as the
       | virtual table engine. (Disclaimer: I'm a lead on this open source
       | project.)
       | 
       | 1 - https://github.com/turbot/steampipe 2 -
       | https://github.com/turbot/steampipe-plugin-googlesheets
        
       | zX41ZdbW wrote:
       | In ClickHouse, querying Google Sheets works out of the box, no
       | plugins required:                   select * from url('https://do
       | cs.google.com/spreadsheets/d/1XGCy0tYU5YcEouO09_ErZIyqjA-
       | VJ4pidLZmMmJkEdk/gviz/tq?tqx=out:csv&sheet=Sheet1&range=A:C',
       | CSVWithNames)
        
       | jmathai wrote:
       | I've been using google sheets as the backend database storage for
       | a few recent projects.
       | 
       | Using SQLite as the interface instead of the sheets API will be
       | so nice.
       | 
       | Btw - highly recommend sheets as a backend. For my recent
       | projects it has been such a time saver.
        
       | xordux wrote:
       | I get it. Google Sheets costs you Zero money.
       | 
       | BUT you can get a proper SQL managed instance(with much faster
       | query, larger data store, Geo-replication, point-in-time backup
       | etc) for less than $5 a month.
       | 
       | Do your due diligence, identify your needs, compare the price of
       | engineering hours with $5/month and go ahead with whatever suites
       | you the best.
        
         | sokoloff wrote:
         | Sheets has a human-friendly, readily-understood, multi-user
         | capable user interface.
         | 
         | You're likely correct that $5/mo isn't the barrier to
         | switching.
        
           | xordux wrote:
           | 100% agree, user-friendliness is better in Sheets.
        
           | jeffbee wrote:
           | Sheets are also geo-replicated and have infinite version
           | history. I seriously doubt that anyone has ever lost data in
           | Sheets.
        
             | DANmode wrote:
             | But are they always aware when correct version isn't
             | current version?
        
       | quadcore wrote:
       | Someone has had the idea to plug all that into GPT? Like "please
       | take those data and tell me what's the average price of our
       | customers' basquets"?
        
       | victorbjorklund wrote:
       | This is perfect!
        
       | [deleted]
        
       | mixcocam wrote:
       | You can just use curl to get the csv from the gayest and pipe
       | into SQLite using the .import command.
       | 
       | Not sure what this has on top of that.
        
         | cristoperb wrote:
         | I wrote a little cli[1] that is a bit easier than curl for
         | getting data to and from google sheets as csv. I do think this
         | sqlite extension would be even more convenient if it supported
         | writing data with INSERT/UPDATE.
         | 
         | 1: https://github.com/cristoper/gsheet
        
           | samstave wrote:
           | This needs its own post
        
         | fulafel wrote:
         | For one, there's no CSV standard and often csv based data
         | exchange methods have lurking (and/or obvious) correctness
         | problems. Maybe gsheets and sqlite are lucky there, maybe not.
         | 
         | This project is also well documented including the gsheets side
         | (which is quite non-obvious).
        
           | cosmojg wrote:
           | > For one, there's no CSV standard
           | 
           | This is false: https://www.rfc-editor.org/rfc/rfc4180
        
             | fulafel wrote:
             | Well yes, there are actually many many CSV standards, my
             | mistake for imprecise language. What I meant to say that
             | there's no single CSV format that programs agree on.
             | Starting with Excel...
             | 
             | Apparently it's quite common in Gsheet land to base your
             | CSV export around things like this copypasta javascript
             | code and customize as needed (follow forks, in parent
             | direction too, for more of the story) https://gist.github.c
             | om/mrkrndvs/a2c8ff518b16e9188338cb809e0...
        
           | samstave wrote:
           | >>" _(which is quite non-obvious)._ "
           | 
           | ELI5 what this means, please? (unpack this for the ignorant
           | among us, plz)
        
             | fulafel wrote:
             | I mean the gsheets api access setup is convoluted and here
             | it's well documented. It has about two dozen steps.
        
           | jmull wrote:
           | > Maybe gsheets and sqlite are lucky there, maybe not
           | 
           | Seems weird to to pooh-pooh the idea without any reason to
           | think it isn't good. Also, it's not really a question of
           | luck, is it? It's not that mysterious how to validate an
           | integration like this.
        
         | Sirikon wrote:
         | from the _what_
        
           | qolop wrote:
           | From the gayest
        
             | garbagecoder wrote:
             | That sounds like me, but I don't issue csvs.
        
               | samstave wrote:
               | I only issue .STLs
        
             | nonethewiser wrote:
             | Gayest what? Sheet? Account holder?
        
               | samstave wrote:
               | Hardware
        
           | [deleted]
        
           | xnx wrote:
           | typo of gsheet?
        
             | mixcocam wrote:
             | Yes that's correct ! Oops
        
             | masklinn wrote:
             | More likely to be autocorrect doing it's usual.
        
       | x2bool wrote:
       | This project actually started as a fork of
       | https://github.com/x2bool/xlite - SQLite extension for querying
       | .xls and .xlsx spreadsheets.
        
       | simonw wrote:
       | Those setup instructions are so frustrating:
       | https://github.com/0x6b/libgsqlite#setup-google-cloud
       | 
       | Why does this have to be so hard!? I really wish Google Cloud
       | (and likewise AWS and many other providers) didn't force users to
       | jump through so many steps to use tools like this.
        
         | dragonwriter wrote:
         | > Why does this have to be so hard!?
         | 
         | Well, in part because they are using the console rather driving
         | it programmatically; all the cloud platforms are optimized for
         | programmatic/IAC rather than console operations.
         | 
         | And in part because Google Cloud (as opposed to AWS) has a
         | segregated-projects, and activate-options approach to manage
         | the scope of management surface and costs, whereas AWS (unless
         | you create separate _accounts_ ) is more of "lump everything
         | together", and within an account is everything-default-on.
         | 
         | If there was a similar thing for AWS, while there might be some
         | configuration needed, the "create a project" and "activate the
         | API" processes probably wouldn't be part of the basic
         | instructions.
        
       | mafro wrote:
       | Tools such as steampipe, and Python pandas/gspread make it pretty
       | trivial to get data and push it out into Google Sheets as a front
       | end.
       | 
       | eg. I frequently use Google Sheets as the UI for simple and
       | complex projects. The web front end is provided by Google, as is
       | the identity/auth part. You only have to worry about important
       | part - the data.
        
       | JasonFruit wrote:
       | This seems like the least sqlite-like possible thing.
       | Immediately, it makes me ask if sqlite is the right tool for a
       | job that involves Google Sheets. But I'm sure it fills a very
       | particular need for some people.
        
       | satoshiiii wrote:
       | Last project I was involved with deals with Google Sheets. They
       | were using it as their multiuser read/write database. I offered
       | to migrate it all to a proper database for FREE during my own
       | personal time outside of work. Blank stares were all I got. I
       | really hope AGI will soon be able to replace those humans that
       | cannot be reasoned with logic. Millions were involved, hundreds
       | of employees, thousands of sales, all being tracked in a couple
       | of sheets. The faster data retrieval alone was worth it if we
       | migrated it to a proper database.
        
         | msravi wrote:
         | So it probably would have been better to give them an excel-
         | like frontend connected to an sql server?
        
           | password4321 wrote:
           | If you could match the same price, security, availability,
           | mobile app/responsive UI support, audit history, comments,
           | read-only sharing, export features, 3rd party add-ons, etc.
           | 
           | The "we will destroy all your accounts across our entire
           | service catalog including any potential of future earnings if
           | you cross our AI abuse detection thresholds" sword of
           | Damocles is obviously unavailable and thus out of scope.
        
         | gruturo wrote:
         | I hate to break it to you, but an equally likely outcome of the
         | introduction of AGI would be to replace you instead, and give
         | those users exactly what they asked for, even if it's not
         | optimal (or improving it, but preserving the interface they are
         | used to).
        
         | AlecSchueler wrote:
         | But could the hundreds of employees continue to make changes to
         | the DB if it was moved out of sheets?
        
           | satoshiiii wrote:
           | Actually there were only a handful of people updating it
           | daily and it was my job to make all of those somehow give
           | accurate results. The hundred of employees I was talking
           | about were the ones making the sales, no direct access to
           | those sheets, and their performance is going to be evaluated
           | with the data that was recorded within those sheets. Data
           | retrieval is so slow, a month of sales by 300 salesperson is
           | around 30k rows, we have to connect that to their attendance,
           | the day whether it is a weekday, weekend, holiday, rest day.
           | The store they were assigned. The monthly quota they have,
           | etc.
           | 
           | I lasted 3 months only, got tired of the sudden meetings
           | discussing why can't we give fast and accurate results.
        
             | cswhnjidd wrote:
             | The point the person above was trying to make is that the
             | reason you got "blank stares" was that you were proposing
             | replacing a system where the users could read and write the
             | data, to a system where they can't access it. Everyone saw
             | this but you. Im sure AGI would've seen it too.
             | 
             | Also, the fact that you offered to do it for free makes me
             | question your motivation. I think you were trying to make
             | your own job easier at the cost of the users no longer
             | having the thing that worked for them.
        
               | satoshiiii wrote:
               | They were using the wrong tool for the job. The R in
               | RDBMS for the existing sheets, they were using R as in
               | not machines but humans doing the RELATIONAL work. Yes I
               | was trying to make my job easier, that is why mankind
               | invented RDBMS right? To make our work easier and
               | accurate? No?
        
               | zarzavat wrote:
               | The user interface is usually more important than the
               | backend. If the user interface for a spreadsheet is more
               | powerful, then that can make it a better solution than a
               | database that has a better backend but terrible UI.
               | 
               | It can be very difficult to replace spreadsheets because
               | the UI has had 30+ years of iteration.
        
               | nonethewiser wrote:
               | Can't they though? Presumably there would be a form that
               | validates inputs that they can use. Part of the problem
               | that OP described is that they were writing directly to
               | the "database" (and OP had to validate input afterwards).
        
         | pphysch wrote:
         | Spreadsheets have their place, not every sheet needs to be
         | migrated to a database application. But some certainly should.
        
           | aarondia wrote:
           | Somewhere before "I had to procure a new VM just to open this
           | Excel file", you should probably migrate to a database +
           | Python
        
         | NoZebra120vClip wrote:
         | I am not a lawyer, and I do not know your jurisdiction, but
         | where I come from, it would be _illegal_ for me to work off-
         | the-clock for my employer. It opens up many cans of worms
         | regarding liability, accounting, etc. I have to wonder whether
         | those sorts of things were going through the heads which gave
         | you  "blank stares". Did you attempt to compromise by offering
         | to do it on the clock instead?
        
         | cloudking wrote:
         | > I really hope AGI will soon be able to replace those humans
         | that cannot be reasoned with logic.
         | 
         | Great use case
        
         | [deleted]
        
         | [deleted]
        
         | O__________O wrote:
         | Nothing you've described sounds like a valid reason to move "a
         | couple of spreadsheets" to a database system which might
         | require: budget planning, database administrator, system
         | design, server, colocation, query optimization, report system,
         | monitoring, user interface, backup/recovery plan,
         | authentication system, change management logs, user training,
         | business process analysis, etc.
        
           | RadiozRadioz wrote:
           | The two planks of wood across the canyon work fine, there's
           | no valid reason to build a bridge! That would require
           | planning, architects, engineers, safety inspections,
           | construction, etc. Let's just keep ferrying cars across the
           | wood planks, we all know how wood planks work, and we've got
           | plenty of duct tape for when they break!
        
             | jeffbee wrote:
             | This is a deeply flawed analogy. Hosted cloud spreadsheet
             | has effectively perfect availability and durability
             | compared to most database solutions, and never requires
             | maintenance, and costs nothing.
        
               | RadiozRadioz wrote:
               | Google Sheets, the product, may have excellent
               | reliability, but business processes that rely on
               | spreadsheets certainly do not. The point in my analogy
               | was not that the planks of wood were bad products, it's
               | that they were being misused by ignorant people.
               | 
               | Use wood planks as a temporary crossing, don't ferry cars
               | over them. Run your bake sale with spreadsheets, don't
               | run a multimillion business with them.
               | 
               | I don't need to link you to the resources on the
               | astronomical number of problems that are caused by
               | spreadsheet-oriented business processes and shadow IT.
               | Using spreadsheets instead of proper systems shifts the
               | operational burden onto humans, who certainly do not have
               | perfect availability and durability.
        
               | jbandela1 wrote:
               | > but business processes that rely on spreadsheets
               | certainly do not
               | 
               | I wonder if this migration from spreadsheets to databases
               | is what helps ossify business culture in older companies.
               | With spreadsheets, change and iteration are easy with
               | fast cycles. With a database, change is a pain and
               | iteration cycles are very long since everything has to
               | though multiple stages of review, approval, and
               | implementation.
        
               | RadiozRadioz wrote:
               | Interesting idea. I'd say probably yes, to some extent.
               | Bringing back the human element, I'd say the businesses
               | still stuck on spreadsheets are much less open to change
               | than those who are willing to adopt database technology.
               | 
               | While long review cycles for schema changes are annoying,
               | I find it more worrying that most businesses have zero
               | review process at all for spreadsheets.
        
               | jeffbee wrote:
               | That's why your standard IT guy loves databases and hates
               | democratized software like Sheets. You can run a 80% of
               | business in a spreadsheet but IT Guy doesn't want to be
               | disintermediated.
        
               | RadiozRadioz wrote:
               | No, as an actual IT guy myself, our job is to support and
               | improve business technical operations. Most of us want
               | _less_ work. If we wanted to be intermediaries, we'd be
               | middle managers.
               | 
               | I don't know how much time you've spent supporting
               | spreadsheet users, but I work with them daily and I see
               | first-hand the amount of energy being wasted. Stuff that
               | takes them all day, they could do in a single SQL query.
               | Probably wouldn't take them more than a day to learn how.
               | 
               | It's not about democracy, it's about doing things in a
               | sensible way. I'd be happy if they used another
               | "democratic" tool like Microsoft PowerBI and a fully-
               | managed database.
        
               | jeffbee wrote:
               | You're just gatekeeping. You have tautologically defined
               | "sensible" to be the same as your personal preferences.
        
               | RadiozRadioz wrote:
               | Eh, maybe. I'll hazard that my preferences for business
               | technical operations are more informed than those of my
               | users. I seek only to help them.
               | 
               | Side note, I'm quite surprised that somebody in this form
               | is so staunchly in favor of spreadsheets.
        
       | pbmango wrote:
       | This is very cool and could be the basis for a huge set of
       | interesting operations - but any time I read about DB <>
       | Spreadsheet discussions they seem to be started by folks who much
       | better understand DBs.
       | 
       | The principle use of the spreadsheet for both power users and
       | casual users is a lot around justifying answers by being able to
       | see the data. The goal is not speed or scale.
       | 
       | If you and sending a report to a superior - clear and visible
       | rows, columns, and "math" makes it faster and easier to trust
       | information for non-technical users.
       | 
       | This same logic is why the spreadsheet is best serving
       | customized, one-off, or changing operations without a huge amount
       | of data. This makes up a huge chunk of day-to-day business
       | decisions, especially in finance where the math is much easier.
       | 
       | I don't think the spreadsheet lasts forever, but its successor
       | will likely be something that presents an easy way to get answers
       | from a set of data that is visible or at least able to be quickly
       | understood, regardless of whether its fast, scalable, networked
       | etc.
        
         | lkirk wrote:
         | Yes, I would agree that technical and non-technical usability
         | is one appealing spreadsheet use case. I think that another
         | thing I find myself using spreadsheets for is simple data
         | entry. By using a google sheet to enter data, you've already
         | got yourself an authenticated, multi-user web form for easily
         | entering data. It even integrates with google forms, which is a
         | very handy tool for collecting data. It's certainly a lot
         | easier than building it all yourself and hosting it.
         | 
         | Once the data is entered, there are much better tools for data
         | visualization and aggregation (in my opinion). It's nice to
         | have the "frontend" taken care of so that your time is freed up
         | to analyze and extract meaning from the data. I'm speaking from
         | the perspective of someone who does a lot of rapid prototyping
         | and data analysis. There are likely use cases where this won't
         | scale.
        
       ___________________________________________________________________
       (page generated 2023-03-19 23:03 UTC)