[HN Gopher] DuckDB Doesn't Need Data to Be a Database
___________________________________________________________________
DuckDB Doesn't Need Data to Be a Database
Author : tosh
Score : 115 points
Date : 2024-05-29 09:13 UTC (13 hours ago)
(HTM) web link (www.nikolasgoebel.com)
(TXT) w3m dump (www.nikolasgoebel.com)
| jhoechtl wrote:
| Waiting for DuckDB to be able to read Delta tables natively.
| NortySpock wrote:
| https://github.com/duckdb/duckdb_delta
|
| Extension available (read only, apparently)
| noone_important wrote:
| I tried to use the extension, but unfortunately I couldn't
| resolve my problems with it. I always run into errors when I
| try to execute queries on Delta tables.
| aleatorisch wrote:
| I'm curious what errors you were running into? Mind posting
| an issue in the repo, or here? Thanks!
| mbreese wrote:
| Back in the day (early 2000s), I worked with a DB2 instance that
| had similar functionality. At the time, they called this feature
| federated databases. If you had the appropriate wrapper, you
| could use any data source in a query. Even output from other
| programs. At the time I used it for including dynamic DNA
| sequence alignments in queries.
|
| IIRC, SQLite can do similar things with virtual tables (with a
| more limited set of data file types).
|
| I always liked this way of working, but I also wonder why it
| never really took off. Data discovery can be an issue, and I can
| see the lack of indexing as being a problem.
|
| I guess that's a long-winded way to ask: as interesting as this
| is, what are the use cases where one would really want (or need)
| to use it?
| solidsnack9000 wrote:
| A related functionality is "SQL/MED", a SQL specification for
| federated databases that has some kind of relationship to
| medical data historically (I believe one of the use cases is
| letting one site query, in a limited way, data hosted at
| another site that may not be moved from it).
| clscott wrote:
| Postgres calls these foreign data wrappers (FDWs).
|
| https://wiki.postgresql.org/wiki/Foreign_data_wrappers
| refset wrote:
| Steampipe demonstrates a rather impressive range of scenarios
| for using FDWs + SQL in place of regular ETL and API
| integrations: https://steampipe.io/
| abraae wrote:
| > I always liked this way of working, but I also wonder why it
| never really took off.
|
| In today's newfangled world, a lot of developers don't use a
| lot of the great stuff that an RDBMS can provide - stored
| procedures, SQL constraints, even indexes. The modern mindset
| seems to be that that stuff belongs in the code layer, above
| the database. Sometimes it's justified as keeping the database
| vanilla, so that it can be swapped out.
|
| In the old days you aimed to keep your database consistent as
| far as possible, no matter what client was using it. So of
| course you would use SQL constraints, otherwise people could
| accidentally corrupt the database using SQL tools, or just with
| badly written application code.
|
| So it's not hard to see why more esoteric functions are not
| widely used.
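
To make the point about SQL constraints concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, its columns, and the `try_insert` helper are hypothetical names invented for illustration. The only claim is that a constraint declared in the schema rejects a bad write no matter which client or tool issues it.

```python
import sqlite3


def try_insert(conn, owner, balance):
    """Attempt an insert; return True if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO accounts (owner, balance) VALUES (?, ?)",
                (owner, balance),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when a CHECK, UNIQUE, or NOT NULL constraint is violated.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        owner   TEXT NOT NULL UNIQUE,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
    """
)

accepted = try_insert(conn, "alice", 100)           # valid row
rejected_negative = try_insert(conn, "bob", -5)     # violates CHECK
rejected_duplicate = try_insert(conn, "alice", 50)  # violates UNIQUE
row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(accepted, rejected_negative, rejected_duplicate, row_count)
```

Because the rules live in the schema, a badly written application or an ad hoc SQL session hits the same check, which is the consistency argument made in the comment above.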
| countvonbalzac wrote:
| Does DuckDB cache the S3 downloads? Otherwise it could get pretty
| expensive, no?
| davesque wrote:
| If the Parquet file includes any row group stats, then I
| imagine DuckDB might be able to use those to avoid scanning the
| entire file. It's definitely possible to request specific
| sections of a blob stored in S3. But I'm not familiar enough
| with DuckDB to know whether or not it does this.
| akdor1154 wrote:
| It does do that. I can't answer OP's question about caching though.
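
The row-group idea discussed above can be sketched in a few lines. This is not DuckDB's implementation. It is a simplified illustration that assumes each row group records a byte offset, a length, and min/max statistics for the filtered column. The `RowGroup` class and both helper functions are hypothetical. The only real-world detail is the HTTP `Range: bytes=start-end` header format, which S3 accepts on object reads.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for one row group's footer metadata."""
    offset: int     # byte offset of the row group within the file
    length: int     # size of the row group in bytes
    min_value: int  # minimum of the filtered column in this group
    max_value: int  # maximum of the filtered column in this group


def ranges_to_fetch(row_groups, lo, hi):
    """Return inclusive (start, end) byte ranges for groups that may match [lo, hi]."""
    ranges = []
    for rg in row_groups:
        # Skip a group whose [min, max] interval cannot overlap the filter.
        if rg.max_value < lo or rg.min_value > hi:
            continue
        ranges.append((rg.offset, rg.offset + rg.length - 1))
    return ranges


def range_header(start, end):
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


row_groups = [
    RowGroup(offset=4, length=1000, min_value=0, max_value=99),
    RowGroup(offset=1004, length=1000, min_value=100, max_value=199),
    RowGroup(offset=2004, length=1000, min_value=200, max_value=299),
]

# Filter equivalent to: WHERE col BETWEEN 150 AND 180
needed = ranges_to_fetch(row_groups, 150, 180)
headers = [range_header(start, end) for start, end in needed]
print(headers)  # ['bytes=1004-2003']
```

With this filter only one of the three row groups has to be downloaded, so the cost of a selective query depends on the bytes requested rather than on the size of the whole file.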
| blyry wrote:
| This is a great feature. We've been able to significantly extend
| the scope and usefulness of our on-prem SQL Cluster for analytics
| and reporting with PolyBase by building new transactional systems
| with cheaper Postgres, doing ETLs of third-party data to Delta
| tables in Azure storage, and then federating access to them with
| PolyBase so that nobody in the business has to change how they
| actually query the data. I'm sure in another decade we'll be
| fully migrated to some cloud platform but for now, federating the
| queries is a huge win.
| brutuscat wrote:
| Does it work with some format that supports indexes, like Apache
| CarbonData, rather than Parquet?
|
| https://github.com/apache/carbondata
| dangoodmanUT wrote:
| Does view creation still list all files? In my experience, even
| if it was not queried, the view would make a lot of S3 calls.
| clumsysmurf wrote:
| DuckDB has Swift bindings, but unfortunately, afaik, nothing
| official for Android. If anyone has gotten it working on Android
| I'd love to hear about it.
___________________________________________________________________
(page generated 2024-05-29 23:00 UTC)