hngopher.com

       [HN Gopher] DuckDB Doesn't Need Data to Be a Database
       ___________________________________________________________________
        
       Author : tosh
       Score  : 115 points
       Date   : 2024-05-29 09:13 UTC
        
 (HTM) web link (www.nikolasgoebel.com)
        
       | jhoechtl wrote:
       | Waiting for DuckDB to be able to read Delta tables natively.
        
         | NortySpock wrote:
         | https://github.com/duckdb/duckdb_delta
         | 
         | There's an extension available (read-only, apparently).
        
           | noone_important wrote:
           | I tried to use the extension, but unfortunately I couldn't
           | get it working. I always run into errors when I try to
           | execute queries on Delta tables.
        
             | aleatorisch wrote:
             | I'm curious: what errors were you running into? Mind posting
             | an issue in the repo, or here? Thanks!
        
       | mbreese wrote:
       | Back in the day (early 2000s), I worked with a DB2 instance that
       | had similar functionality. At the time, they called this feature
       | federated databases. If you had the appropriate wrapper, you
       | could use any data source in a query. Even output from other
       | programs. At the time I used it for including dynamic DNA
       | sequence alignments in queries.
       | 
       | IIRC, SQLite can do similar things with virtual tables (with a
       | more limited set of data file types).
       | 
       | I always liked this way of working, but I also wonder why it
       | never really took off. Data discovery can be an issue, and I can
       | see the lack of indexing as being a problem.
       | 
       | I guess that's a long-winded way to ask: as interesting as this
       | is, what are the use cases where one would really want (or need)
       | to use it?
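As a small, runnable illustration of one query spanning separate data sources: Python's standard sqlite3 module doesn't expose SQLite virtual tables, so this sketch swaps in SQLite's ATTACH DATABASE, a simpler mechanism that lets a single query join tables living in different databases. The table names and rows are toy data, not anything from the thread.

```python
import sqlite3

# The main connection plus a second, separately attached in-memory database
# stand in for two independent data sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS remote")

# Toy tables: sequence names in one database, alignment scores in the other.
conn.execute("CREATE TABLE main.seqs (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE remote.alignments (seq_id INTEGER, score REAL)")
conn.executemany("INSERT INTO main.seqs VALUES (?, ?)", [(1, "seq_a"), (2, "seq_b")])
conn.executemany(
    "INSERT INTO remote.alignments VALUES (?, ?)",
    [(1, 0.93), (2, 0.71), (1, 0.88)],
)

# A single query joins across both databases, qualifying tables by schema name.
rows = conn.execute(
    """
    SELECT s.name, MAX(a.score) AS best
    FROM main.seqs AS s
    JOIN remote.alignments AS a ON a.seq_id = s.id
    GROUP BY s.name
    ORDER BY s.name
    """
).fetchall()
print(rows)  # [('seq_a', 0.93), ('seq_b', 0.71)]
```

Unlike a virtual table or a federated wrapper, ATTACH only reaches other SQLite databases, but it shows the same idea of the query engine, not the application, doing the cross-source join.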
        
         | solidsnack9000 wrote:
         | A related piece of functionality is "SQL/MED" (Management of
         | External Data), the part of the SQL standard that covers
         | federated access to external data. I believe one of its use
         | cases is letting one site query, in a limited way, data hosted
         | at another site that may not be moved from it.
        
         | clscott wrote:
         | Postgres calls these foreign data wrappers (FDW)
         | 
         | https://wiki.postgresql.org/wiki/Foreign_data_wrappers
        
           | refset wrote:
           | Steampipe demonstrates a rather impressive range of scenarios
           | for using FDWs + SQL in place of regular ETL and API
           | integrations: https://steampipe.io/
        
         | abraae wrote:
         | > I always liked this way of working, but I also wonder why it
         | never really took off.
         | 
         | In today's newfangled world, many developers don't use much of
         | the great stuff an RDBMS can provide: stored procedures, SQL
         | constraints, even indexes. The modern mindset
         | seems to be that that stuff belongs in the code layer, above
         | the database. Sometimes it's justified as keeping the database
         | vanilla, so that it can be swapped out.
         | 
         | In the old days you aimed to keep your database consistent as
         | far as possible, no matter what client was using it. So of
         | course you would use SQL constraints; otherwise people could
         | accidentally corrupt the database using SQL tools, or just with
         | badly written application code.
         | 
         | So it's not hard to see why more esoteric functions are not
         | widely used.
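To make the point about constraints concrete, here is a minimal sketch using Python's built-in sqlite3 module (chosen only because it needs no server; the accounts/transfers schema is hypothetical). The CHECK and FOREIGN KEY constraints reject bad writes no matter which client issues them. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical schema: the database itself forbids negative balances
# and transfers that point at a nonexistent account.
conn.execute(
    """
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
    """
)
conn.execute(
    """
    CREATE TABLE transfers (
        id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES accounts(id),
        amount INTEGER NOT NULL
    )
    """
)
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")


def try_sql(sql):
    """Run a statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_sql("UPDATE accounts SET balance = -5 WHERE id = 1"),  # violates CHECK
    try_sql("INSERT INTO transfers (account_id, amount) VALUES (42, 10)"),  # no account 42
    try_sql("INSERT INTO transfers (account_id, amount) VALUES (1, 10)"),  # valid
]
for line in results:
    print(line)
```

The same statements issued from an ad hoc SQL console or a buggy service would be rejected identically, which is the consistency guarantee the comment describes.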
        
       | countvonbalzac wrote:
       | Does DuckDB cache the S3 downloads? Otherwise it could get pretty
       | expensive, no?
        
         | davesque wrote:
         | If the Parquet file includes any row group stats, then I
         | imagine DuckDB might be able to use those to avoid scanning the
         | entire file. It's definitely possible to request specific byte
         | ranges of a blob stored in S3. But I'm not familiar enough
         | with DuckDB to know whether or not it does this.
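The row-group skipping described above is usually called predicate pushdown using min/max statistics. The sketch below is a simplified model, not DuckDB's code: `RowGroup` and its fields are hypothetical stand-ins for metadata a Parquet footer records, and the output is the list of HTTP Range header values a reader would request instead of downloading the whole object. A real reader would first fetch the footer itself, whose length is stored in the file's last 8 bytes along with the closing `PAR1` magic.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    # Hypothetical per-row-group metadata, loosely modeled on a Parquet
    # footer: byte offset/length plus min/max stats for one column.
    offset: int
    length: int
    min_value: int
    max_value: int


def groups_to_read(groups, lo, hi):
    """Keep only row groups whose [min, max] range can overlap [lo, hi]."""
    return [g for g in groups if g.max_value >= lo and g.min_value <= hi]


def range_headers(groups):
    """Build HTTP Range header values (the end byte is inclusive)."""
    return [f"bytes={g.offset}-{g.offset + g.length - 1}" for g in groups]


# Toy footer: three 1000-byte row groups after the 4-byte leading "PAR1" magic.
footer = [
    RowGroup(offset=4, length=1000, min_value=0, max_value=99),
    RowGroup(offset=1004, length=1000, min_value=100, max_value=199),
    RowGroup(offset=2004, length=1000, min_value=200, max_value=299),
]

# WHERE x BETWEEN 150 AND 180: only the middle row group can contain matches.
wanted = groups_to_read(footer, 150, 180)
print(range_headers(wanted))  # ['bytes=1004-2003']
```

Statistics can only rule row groups out; a surviving group still has to be read and filtered row by row.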
        
           | akdor1154 wrote:
           | It does do that. I can't answer OP's question about caching,
           | though.
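Whether and how DuckDB caches remote reads isn't settled in this thread, so the following is only a generic sketch of the idea behind the question: a client-side LRU cache for byte-range reads, keyed by (url, start, end), so a repeated read of the same range doesn't trigger another billed request. `RangeCache`, `fake_fetch`, and the `s3://` URL are hypothetical.

```python
from collections import OrderedDict


class RangeCache:
    """Tiny LRU cache for byte-range reads, keyed by (url, start, end)."""

    def __init__(self, fetch, max_entries=128):
        self._fetch = fetch  # callable(url, start, end) -> bytes (hypothetical)
        self._max = max_entries
        self._entries = OrderedDict()
        self.misses = 0  # number of reads that went to the backing store

    def read(self, url, start, end):
        key = (url, start, end)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        data = self._fetch(url, start, end)
        self._entries[key] = data
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
        return data


# Stand-in for a ranged GET against object storage; a real client would
# send "Range: bytes=start-end" (inclusive end) for each miss.
BLOB = bytes(range(256)) * 16


def fake_fetch(url, start, end):
    return BLOB[start:end + 1]


url = "s3://bucket/data.parquet"  # hypothetical object
cache = RangeCache(fake_fetch, max_entries=2)
cache.read(url, 0, 9)
cache.read(url, 0, 9)  # served from the cache, no second fetch
print(cache.misses)  # 1
```

Keying on exact ranges is the simplest choice; a real cache would more likely store fixed-size blocks so that overlapping reads can share entries.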
        
       | blyry wrote:
       | This is a great feature. We've been able to significantly extend
       | the scope and usefulness of our on-prem SQL Cluster for analytics
       | and reporting with PolyBase by building new transactional systems
       | on cheaper Postgres, doing ETLs of third-party data to Delta
       | tables in Azure Storage, and then federating access to them with
       | PolyBase so that nobody in the business has to change how they
       | actually query the data. I'm sure in another decade we'll be
       | fully migrated to some cloud platform but for now, federating the
       | queries is a huge win.
        
       | brutuscat wrote:
       | Does it work with a format that supports indexes, like Apache
       | CarbonData, rather than Parquet?
       | 
       | https://github.com/apache/carbondata
        
       | dangoodmanUT wrote:
       | Does view creation still list all files? In my experience, even
       | if the view isn't queried, it makes a lot of S3 calls.
        
       | clumsysmurf wrote:
       | DuckDB has Swift bindings, but unfortunately, AFAIK, nothing
       | official for Android. If anyone has gotten it working on Android
       | I'd love to hear about it.
        