[HN Gopher] DuckDB - An in-process SQL OLAP database management ...
       ___________________________________________________________________
        
       DuckDB - An in-process SQL OLAP database management system
        
       Author : freilanzer
       Score  : 98 points
       Date   : 2023-02-10 15:48 UTC (7 hours ago)
        
 (HTM) web link (duckdb.org)
 (TXT) w3m dump (duckdb.org)
        
       | samwillis wrote:
       | DuckDB is awesome!
       | 
        | Last year I was working on something using SQLite where users
        | could perform analytical queries that would scan the entire 3GB
        | db and generate aggregates. The queries would take at least 45
        | seconds.
       | 
        | I did a dump of the db and imported it into DuckDB. The same
        | queries now take only 1.5 seconds with exactly the same SQL.
       | 
        | Obviously there is a trade-off: inserts are slower on DuckDB.
        | But for a low-write, analytical-read app it's perfect.
       | 
       | I tried to use the SQLite connector but struggled to get it
       | working. Need to circle back and have another go.
       | 
       | With their SQLite and Postgres connectors, as a Django dev I
       | would love an app that lets you run specific queries on your DB
       | via DuckDB almost transparently. Would be awesome for analytical
       | dashboards.
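        | 
        | A minimal sketch of that migration path in Python (file and
        | table names here are hypothetical; older DuckDB releases
        | install the extension as "sqlite_scanner"):
        | 
        |     import duckdb
        | 
        |     # Read a table straight out of the SQLite file and
        |     # materialize it as a native DuckDB table.
        |     con = duckdb.connect("analytics.duckdb")
        |     con.execute("INSTALL sqlite")
        |     con.execute("LOAD sqlite")
        |     con.execute("""
        |         CREATE TABLE events AS
        |         SELECT * FROM sqlite_scan('app.db', 'events')
        |     """)
        |     print(con.execute("SELECT count(*) FROM events").fetchone())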
        
         | mcdonje wrote:
         | Yeah, I see it as the sqlite of the reporting world. Especially
          | with columnar storage and Parquet support.
        
         | dunefox wrote:
         | I'd need a MySQL/MariaDB connector for it to be really useful
         | for me. But still, it looks great.
        
       | everybodyknows wrote:
       | Anyone tried using the Go driver? It has a few open bugs:
       | 
       | https://github.com/marcboeker/go-duckdb/issues
       | 
        | And it's listed as a third-party effort only:
       | 
       | https://duckdb.org/docs/api/overview
        
         | kagitac wrote:
          | I've found the CGO boundary to be quite slow for large result
          | sets, so I've taken to just running commands that SELECT and
          | COPY to files on the system, and then reading those files.
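          | 
          | A sketch of that pattern (paths and schema hypothetical;
          | shown in Python, but the same SQL applies from the Go
          | driver):
          | 
          |     import duckdb
          | 
          |     # Instead of pulling a large result set through the
          |     # driver, have DuckDB write it to Parquet and read the
          |     # file separately.
          |     con = duckdb.connect("analytics.duckdb")
          |     con.execute("""
          |         COPY (SELECT user_id, count(*) AS n
          |               FROM events GROUP BY user_id)
          |         TO '/tmp/result.parquet' (FORMAT PARQUET)
          |     """)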
        
       | out_of_step wrote:
       | DuckDB has an awesome integration with Apache Arrow and very
       | useful implementations in R and Python. Great for easy larger-
        | than-memory analyses.
       | 
       | https://duckdb.org/2021/12/03/duck-arrow.html
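        | 
        | A small sketch of the zero-copy round trip in Python (the data
        | is made up):
        | 
        |     import duckdb
        |     import pyarrow as pa
        | 
        |     tbl = pa.table({"x": [1, 1, 2], "y": [10.0, 20.0, 30.0]})
        |     con = duckdb.connect()
        |     # DuckDB scans the Arrow table in place by name...
        |     out = con.execute(
        |         "SELECT x, sum(y) AS total FROM tbl GROUP BY x"
        |     ).arrow()
        |     # ...and hands the result back as an Arrow table.
        |     print(out)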
        
       | vgt wrote:
       | DuckDB is such a breath of fresh air!
       | 
        | We at MotherDuck are working very closely with the DuckDB folks
        | to build a DuckDB-based cloud service. I'm talking to various
        | folks in the industry about the details in 1:1s. Feel free to
        | reach out
       | to tino at motherduck.com.
       | 
       | (co-founder and head of Produck at MotherDuck)
        
       | lopkeny12ko wrote:
       | I appreciate the clarity on the explicitly unsupported use cases
       | in "When to not use DuckDB."
       | 
        | There are so many infrastructure products, especially database
        | products for some reason, where the marketing team takes control
        | of the messaging away from engineers and pushes outlandish
        | claims about how their new DB is faster than all the
        | competition, can support any workload, can scale infinitely,
        | etc.
        
       | SnowflakeOnIce wrote:
       | DuckDB is terrific. I'm bullish on its potential for simplifying
       | many big data pipelines. Particularly, it's plausible that DuckDB
       | + Parquet could be used on a large SMP machine (32+ cores and
       | 128GB+ memory) to deal with data munging for 100s of gigabytes to
       | several terabytes, all from SQL, without dealing with Hadoop,
       | Spark, Ray, etc.
       | 
       | I have successfully used DuckDB like above for preparing an ML
       | dataset from about 100GB of input.
       | 
       | DuckDB is undergoing rapid development these days. There have
       | been format-breaking changes and bugs that could lose data. I
       | would not yet trust DuckDB for long-term storage or archival
       | purposes. Parquet is a better choice for that.
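        | 
        | A sketch of that single-machine pattern (paths, columns, and
        | limits are hypothetical):
        | 
        |     import duckdb
        | 
        |     con = duckdb.connect()
        |     con.execute("SET threads TO 32")
        |     con.execute("SET memory_limit = '100GB'")
        |     # Query the Parquet files in place; no cluster needed.
        |     df = con.execute("""
        |         SELECT category, avg(value) AS mean_value
        |         FROM read_parquet('data/*/*.parquet')
        |         GROUP BY category
        |     """).df()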
        
         | nerdponx wrote:
         | As far as I can tell, DuckDB is an alternative to "data frame"
          | libraries like data.table, Polars, Pandas, etc. Is that the
         | case? What makes DuckDB a better choice than, say, Polars?
        
           | corford wrote:
           | This blog post offers a nice summary:
           | https://motherduck.com/blog/six-reasons-duckdb-slaps/
        
             | nerdponx wrote:
             | The blog post doesn't really make a comparison between
             | DuckDB and data frame libraries. It mentions that the
             | DuckDB Python bindings can interoperate with Pandas, but it
             | doesn't really explain why you would use DuckDB _instead
             | of_ Pandas, or Polars (which is both faster and more
             | portable than Pandas).
        
               | simonw wrote:
               | Don't Polars and Pandas both require your entire data to
               | fit in memory?
        
           | syntaxfree wrote:
            | It's a drop-in alternative to SQLite that's column-
            | oriented/OLAP. I've been profiling entire projects in
            | production, switching between SQLite and duckdb (no clear
            | conclusions yet).
        
             | nerdponx wrote:
             | I suppose that leads to a broader question: when should you
             | use an in-memory database, and when should you use a data
             | frame library? The distinction between the two seems to be
             | getting blurry (which maybe is a good thing).
        
               | IanCal wrote:
               | Very blurry. The answer now is just "whichever is easier
               | for the small part of the task right now". Since duckdb
               | happily talks arrow, you can use pandas for part of it,
               | quickly do some SQL where that is easier (with no data
                | copying), then switch back to pandas for something. You
               | don't really have to choose which one to use any more.
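                | 
                | A sketch of that mix-and-match flow (the data is
                | made up):
                | 
                |     import duckdb
                |     import pandas as pd
                | 
                |     df = pd.DataFrame({"g": ["a", "a", "b"],
                |                        "v": [1, 2, 3]})
                |     con = duckdb.connect()
                |     # SQL sees the DataFrame by name, no copy.
                |     out = con.execute(
                |         "SELECT g, sum(v) AS t FROM df GROUP BY g"
                |     ).df()
                |     # ...and back to pandas for the next step.
                |     out["pct"] = out["t"] / out["t"].sum()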
        
           | benjaminwootton wrote:
           | SQL is easier and more natural to work with than Pandas.
        
         | RobinL wrote:
          | I'd love to hear any real-world experiences from anyone who's
          | tried to run jobs that would usually require a Spark cluster
          | on a single machine with loads of cores and memory.
         | 
         | How big can you go, and how does speed compare to Spark? (I'm
         | guessing significantly faster from my experience using Duckdb
         | on smaller machines)
        
           | wenc wrote:
           | I have a single machine EC2 instance with 32 cores and 240GB
           | memory and about 200 GB of partitioned Parquet files. I use
            | DuckDB and Python with complex SQL (window functions,
            | inequality joins, quantile functions, etc.) to extract data
            | from this dataset.
           | 
           | Because it's a single machine (no distributed cluster) DuckDB
           | can heavily parallelize and vectorize. I don't know if I can
           | give you perf numbers but complex analytic queries over the
           | entire dataset regularly finish in 1-2 mins (not scientific
            | since I'm not saying what kinds of queries I'm running).
           | 
            | I've used Spark SQL, and DuckDB overall is just more
            | ergonomic, with less boilerplate, and much faster since it
            | is so lightweight.
           | 
           | Granted DuckDB can only process data on one machine (whereas
           | Spark can scale up indefinitely by adding machines) but most
           | data sets I work with fit on a single beefy machine.
           | 
           | Distributed computing -- most of the time, you ain't gonna
           | need it.
           | 
           | It's like StackOverflow: it serves 2B requests a month but
           | only runs on a few on-prem servers. Most people think this is
           | impossible but you can actually do a lot with very few
           | machines if you're smart about it. Same with data. Big data
           | is overrated.
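            | 
            | A sketch of the kind of query described above (schema and
            | paths are hypothetical): a window function evaluated
            | directly over partitioned Parquet.
            | 
            |     import duckdb
            | 
            |     con = duckdb.connect()
            |     df = con.execute("""
            |         SELECT device_id, ts, value,
            |                avg(value) OVER (
            |                    PARTITION BY device_id ORDER BY ts
            |                    ROWS BETWEEN 99 PRECEDING
            |                         AND CURRENT ROW) AS rolling_avg
            |         FROM read_parquet('pq/*/*.parquet',
            |                           hive_partitioning = true)
            |     """).df()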
        
             | RobinL wrote:
             | Thanks - very interesting
        
           | jakewins wrote:
            | Last week I used it as glue for a monthly job, processing
            | ~1B event records, juggling them a bit and pushing them to
            | S3 as 2M Parquet segment files.
           | 
            | Mixed experience; I would definitely not put it in a system
            | that isn't OK with crashing frequently.
           | 
            | It segfaults on Alpine (argh, C++!), and force-exits the
           | whole NodeJS process when it gets unexpected HTTP responses
           | from S3.
           | 
           | In an archive run of 2M pushes it'll crash the process 4-5
           | times.
           | 
            | Overall I still really, really like it, but I've learned
            | not to trust it.
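            | 
            | A sketch of the write side of such a job (bucket,
            | credentials, and schema are hypothetical):
            | 
            |     import duckdb
            | 
            |     con = duckdb.connect()
            |     con.execute("INSTALL httpfs")
            |     con.execute("LOAD httpfs")
            |     con.execute("SET s3_region = 'us-east-1'")
            |     # (s3_access_key_id / s3_secret_access_key omitted)
            |     con.execute("""
            |         COPY (SELECT * FROM events
            |               WHERE day = DATE '2023-02-01')
            |         TO 's3://my-bucket/seg/2023-02-01.parquet'
            |         (FORMAT PARQUET)
            |     """)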
        
             | tracker1 wrote:
              | Node + AWS for data ingress has been pretty painful in my
              | experience (mostly DynamoDB feeds from large CSVs (a
              | whois database)). In the end, I rewrote it in C# (.NET
              | Core 2) and it was able to complete more reliably. I'm
              | guessing that Go and Rust would also be better. I like
              | Node, really like JS, but I just think that maybe the AWS
              | libraries aren't that great in this space. It would run
              | for 3-5 hours, then just blow up unexpectedly, even with
              | plenty of memory overhead, while not really bandwidth
              | limited, and with appropriate retries and slowdown for
              | Dynamo rejections.
             | 
             | If I never have to write ETL pipelines again, I won't be
             | upset about it.
        
               | Master_Odin wrote:
                | There was a bug, recently fixed in Node 16.17+, that
                | was causing hard Node process crashes when doing stuff
                | with S3. I think it had to do with receiving multiple
                | packets at once or something.
        
         | benjaminwootton wrote:
         | DuckDB is a relational OLAP store. If you want to do
         | transformations on relational data using SQL then I think
         | nowadays you would look at the modern data stack and do it with
         | DBT.
         | 
         | If you have genuinely big and unstructured data then of course
         | you need a cluster and would reach for Spark.
         | 
         | If you have smallish data then maybe DuckDB has a role because
          | working with SQL is nicer than Pandas. But a lot of the time
          | you actually need the complexity of Pandas to do the
          | transformation you need.
         | 
         | DuckDB is neat but I still can't quite convince myself of a
         | killer use case.
        
           | 89vision wrote:
           | > working with SQL is nicer than Pandas
           | 
            | Really? I prefer working with dataframe APIs. You get a
            | nice SQL-like paradigm plus all the control structures of
            | the runtime.
        
           | neeleshs wrote:
            | I am not sure I understand the first comment very well. Are
            | you saying to use the modern data stack instead of DuckDB?
            | Because DBT and DuckDB don't seem to conflict; they can
            | work together. FWIW, I think the only important
            | breakthrough in the "modern data stack" is really dbt. The
            | rest has nothing modern about it.
        
         | krimpenrik wrote:
          | Recently tried the GUI tool for duckdb, forgot what it's
          | called, something like 'Tab', and was quite disappointed. I
          | feel duckdb needs a good tool like sqliteviewer to really
          | take off.
        
           | 0cf8612b2e1e wrote:
           | DBeaver supports duckdb.
        
       | bwanab wrote:
        | There's a nice intro to DuckDB for Julia developers by Bogumil
        | Kaminski, the creator of DataFrames.jl, here:
       | https://juliazoid.com/welcome-to-duckdb-3c4e75f50b97.
       | 
       | Interesting since in some ways, as he points out, it's in direct
       | competition with DataFrames for use cases, but he gives it a very
        | positive treatment and shows how they can work together, using
        | the advantages of standard SQL along with the processing power
        | of DataFrames.
        
         | nickpeterson wrote:
          | When someone gives fair opinions on something that directly
          | competes with their own work, you should take their opinion
         | very seriously. It's an excellent quality in a person, and
         | shows they're more focused on the problem than their ego.
        
           | phillc73 wrote:
           | Quite a while ago, when duckdb was just a duckling, I wrote
           | an R package that supported direct manipulation of R
           | dataframes using SQL.[1] duckdb was the engine for this.
           | 
           | The approach was never as fast as data.table but did approach
           | the speed of dplyr for more complex queries.
           | 
           | Life had other things in store for me and I haven't touched
           | this library for a while now.
           | 
           | At the time there was no Julia connector for duckdb, but now
           | that there is, I'd like to try this approach in that
           | language.
           | 
           | [1] https://github.com/phillc73/duckdf
        
       | dang wrote:
       | Related--others?:
       | 
       |  _Motherduck Raises $47.5M for DuckDB_ -
       | https://news.ycombinator.com/item?id=33610218 - Nov 2022 (2
       | comments)
       | 
       |  _Modern Data Stack in a Box with DuckDB_ -
       | https://news.ycombinator.com/item?id=33191938 - Oct 2022 (5
       | comments)
       | 
       |  _Querying Postgres Tables Directly from DuckDB_ -
       | https://news.ycombinator.com/item?id=33035803 - Sept 2022 (38
       | comments)
       | 
       |  _Notes on the SQLite DuckDB Paper_ -
       | https://news.ycombinator.com/item?id=32684424 - Sept 2022 (28
       | comments)
       | 
       |  _Show HN: CSVFiddle - Query CSV files with DuckDB in the
       | browser_ - https://news.ycombinator.com/item?id=31946039 - July
       | 2022 (13 comments)
       | 
       |  _Show HN: Easily Convert WARC (Web Archive) into Parquet, Then
       | Query with DuckDB_ -
       | https://news.ycombinator.com/item?id=31867179 - June 2022 (15
       | comments)
       | 
       |  _Range joins in DuckDB_ -
       | https://news.ycombinator.com/item?id=31530639 - May 2022 (24
       | comments)
       | 
       |  _Friendlier SQL with DuckDB_ -
       | https://news.ycombinator.com/item?id=31355050 - May 2022 (133
       | comments)
       | 
       |  _Fast analysis with DuckDB and Pyarrow_ -
       | https://news.ycombinator.com/item?id=31217782 - April 2022 (53
       | comments)
       | 
       |  _Directly running DuckDB queries on data stored in SQLite files_
       | - https://news.ycombinator.com/item?id=30801575 - March 2022 (23
       | comments)
       | 
       |  _Parallel Grouped Aggregation in DuckDB_ -
       | https://news.ycombinator.com/item?id=30589250 - March 2022 (10
       | comments)
       | 
       |  _DuckDB quacks Arrow: A zero-copy data integration between Arrow
       | and DuckDB_ - https://news.ycombinator.com/item?id=29433941 - Dec
       | 2021 (13 comments)
       | 
       |  _DuckDB-Wasm: Efficient analytical SQL in the browser_ -
       | https://news.ycombinator.com/item?id=29039235 - Oct 2021 (58
       | comments)
       | 
       |  _Comparing SQLite, DuckDB and Arrow with UN trade data_ -
       | https://news.ycombinator.com/item?id=29010103 - Oct 2021 (79
       | comments)
       | 
        |  _DuckDB is the better SQLite with APIs for Java/Python/R and
        | it's got potential_ -
       | https://news.ycombinator.com/item?id=28692997 - Sept 2021 (2
       | comments)
       | 
        |  _Fastest table sort in the West - Redesigning DuckDB's sort_ -
       | https://news.ycombinator.com/item?id=28328657 - Aug 2021 (27
       | comments)
       | 
       |  _Querying Parquet with Precision Using DuckDB_ -
       | https://news.ycombinator.com/item?id=27634840 - June 2021 (32
       | comments)
       | 
       |  _DuckDB now has a Node.js API_ -
       | https://news.ycombinator.com/item?id=25289574 - Dec 2020 (4
       | comments)
       | 
       |  _DuckDB - An embeddable SQL database like SQLite, but supports
       | Postgres features_ -
       | https://news.ycombinator.com/item?id=24531085 - Sept 2020 (160
       | comments)
       | 
       |  _DuckDB: SQLite for Analytics_ -
       | https://news.ycombinator.com/item?id=23287278 - May 2020 (67
       | comments)
        
       | bluehat974 wrote:
       | DuckCon 2023 - State of the Duck - Mark Raasveldt and Hannes
       | Muhleisen
       | 
       | https://youtu.be/rdnPkLSkoyU
        
       | d_watt wrote:
       | I'm duck-curious.
       | 
        | Looking at how it's deployed as an in-process database, how do
       | people actually use this in production? Trying to figure out
       | where I might actually want to think about replacing current
       | databases or analyses with DuckDB.
       | 
        | E.g. if you deployed new code:
       | 
        | 1. Do you have a stateful machine where you're doing an old-
        | school "kill the old process, start the new process" deploy,
        | with some duckdb file on disk that is maintained?
       | 
        | 2. Or do you back that duckdb file with some sort of shared
        | disk (e.g. EBS), and have a rolling deploy where multiple
        | applications
       | access the same DB at the same time?
       | 
        | 3. Or is DuckDB treated as ephemeral, and you're using it to
       | process data on the fly, so persisted state isn't an issue?
        
         | benrutter wrote:
          | There's a great podcast/interview with the creator of duckdb.
          | He's pretty clear that he thinks of the use case as more or
          | less equivalent to mysql, but for aggregate queries. I think
          | trying to use it in place of something like a fully-fledged
          | postgres server might get weird, less because of any issues
          | with duckdb and more because that isn't what it's designed
          | for.
         | 
         | Podcast/interview link for anyone interested:
         | https://www.dataengineeringpodcast.com/duckdb-in-process-ola...
        
         | bigdubs wrote:
          | We use DuckDB extensively where I work
          | (https://watershed.com). The primary way we're using it is to
          | query Parquet-formatted files stored in GCS, and we have some
          | machinery to make that doable on demand for "online"
          | reporting and analysis queries.
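          | 
          | A sketch of that setup (bucket and credentials are
          | hypothetical): DuckDB's httpfs extension can reach GCS
          | through its S3-compatible endpoint.
          | 
          |     import duckdb
          | 
          |     con = duckdb.connect()
          |     con.execute("INSTALL httpfs")
          |     con.execute("LOAD httpfs")
          |     # GCS interop endpoint; HMAC keys also required.
          |     con.execute("SET s3_endpoint = 'storage.googleapis.com'")
          |     df = con.execute("""
          |         SELECT count(*)
          |         FROM read_parquet('s3://my-bucket/r/*.parquet')
          |     """).df()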
        
           | dstroot wrote:
           | Storing data in Parquet files and querying via DuckDB is fast
           | and kind of magical.
        
             | bigdubs wrote:
              | Shockingly fast, and having the intermediate files be
              | immutable is super nice.
        
               | Tostino wrote:
               | Haven't used it yet, but this aspect seems very
               | appealing.
        
           | mritchie712 wrote:
            | Do you load the Parquet files into duckdb or just query them
           | directly?
        
             | bigdubs wrote:
             | We query them directly in most cases with the httpfs
             | plugin, but for hot paths we fetch them and cache them on
             | disk locally.
        
         | dunefox wrote:
         | Good questions, I'm thinking about evaluating it along with
         | pola.rs as data science tools. Would be interesting to hear
         | from someone using it already.
        
         | romski wrote:
          | Yes, using it in production - stateless and ephemeral. For sure
         | there's a learning curve.
        
           | d_watt wrote:
            | I see. Would it be fair to say you treat it almost like
            | Pandas, except that it has a lower memory footprint since
            | data is written to disk instead of memory? I.e. you use it
            | for on-the-fly analysis of large frames of data, not as a
            | more traditional database/data warehouse?
        
             | chrisjc wrote:
             | I'm not sure they're saying that.
             | 
              | BTW, your questions are exactly those that I've been asking
             | over the last few months, but also with a lot of focus over
             | the last few days. Still learning as much as I can so the
             | following might not be true.
             | 
             | For what it's worth, there's a difference between using
             | duckdb to query a set of files vs loading a bunch of files
              | into a table. But once the data has been loaded into a
             | table it can be backed up as a duckdb db file.
             | 
              | Therefore it might be more performant to preprocess
              | duckdb db files (perhaps a process that works in
              | conjunction with whatever manages your external tables)
              | and load these db files into duckdb as needed (on-the-fly
              | analysis), instead of loading data files into duckdb,
              | transforming, and running CTAS every time.
             | 
             | https://duckdb.org/docs/sql/statements/attach
             | 
              | Of course, all of this might introduce more latency,
              | especially if you're trying to do near-real-time (NRT)
              | analytics.
             | 
             | I assume you could partition your data into multiple db
             | files similar to how you would probably do it with your
             | data files (managing external tables).
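              | 
              | A sketch of that attach pattern (file and table names
              | are hypothetical):
              | 
              |     import duckdb
              | 
              |     con = duckdb.connect()
              |     # Pre-built database file, attached read-only.
              |     con.execute(
              |         "ATTACH 'sales.duckdb' AS s (READ_ONLY)"
              |     )
              |     df = con.execute("""
              |         SELECT region, sum(amount) AS total
              |         FROM s.orders GROUP BY region
              |     """).df()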
        
               | d_watt wrote:
                | Ah, almost like a little bit of a lower-level Trino,
                | where DuckDB is able to push out queries to different
               | data storage endpoints?
        
               | chrisjc wrote:
               | Unfortunately not. At least not without a little
               | intervention. See this blog post for more details about
               | what I mean. They inspect the iceberg table's catalogue
               | to list the related parquet files and then load them into
               | duckdb.
               | 
               | https://www.arecadata.com/getting-started-with-iceberg-
               | using...
               | 
               | You would still need to interact with some kind of
               | catalogue to understand which .db files you need to
               | fetch.
               | 
               | And honestly I don't really know or understand the
               | performance implications of the attach command.
               | 
               | I'm excited to see if the duckdb team will be able to
                | integrate with external tables directly one day. (Not
                | that the data files would be .db files.)
               | 
               | Imagine this:
               | 
                | 1) you have an externally managed external table
                | (Iceberg, Delta, etc., managed by Glue, Databricks,
                | etc.)
               | 
                | 2) register this table in duckdb:
                | 
                |       CREATE OR REPLACE EXTERNAL TABLE my_table ...
                |       TYPE = 'ICEBERG'
                |       CATALOG = 's3://...'
                |       CREDENTIALS = '...'
                |       etc
               | 
               | 3) simply interact with table in duckdb as you would any
               | other table
        
       | argella wrote:
       | Isn't SQLite adding features for analytic queries that should
       | take the wind out of the sails of duckdb?
        
         | tanin wrote:
          | I doubt sqlite will catch up soon in terms of analytics, for
          | a few reasons.
          | 
          | The SQL dialect is lacking so much that it seems intentional.
          | It is meant to be a transactional database, not an analytical
          | one.
          | 
          | Sqlite also takes pride in stability (it's deployed on a
          | billion Android devices). Adding 100+ analytics capabilities
          | (e.g. functions) is not gonna be easy while maintaining that
          | stability.
          | 
          | I want to be wrong though, because my paid app
          | (superintendent.app) uses Sqlite. Not supporting analytics
          | well is the number one complaint.
        
         | simonw wrote:
         | Not really. There was an academic project around this but I've
         | not seen signs that it will be integrated with SQLite core:
         | https://simonwillison.net/2022/Sep/1/sqlite-duckdb-paper/
         | 
         | The key thing to consider here is trade-offs.
         | 
         | Analytical databases tend to be optimized for analytical
         | queries at the expense of fast atomic read-write transactions.
         | 
         | SQLite is mainly used in situations where fast atomic read-
         | write transactions are key - that's why it's used in so many
         | mobile phone applications, for example.
         | 
         | It's not going to grow analytical-query-at-scale capabilities
         | if that means negatively impacting the stuff it's really good
         | at already.
        
       | kiratp wrote:
        | It's disappointing that C++ was chosen to build something that
        | is going to live in-process. Rust would have been so much
        | safer. All the segfaults you get when running DuckDB support
        | this statement.
        
         | mcdonje wrote:
         | I agree about Rust's memory safety advantage over C++, but I
         | disagree that it's disappointing from a project perspective.
         | Some DB experts made a good DB using a performant language
         | they're comfortable with.
         | 
         | You can't make project choices in a vacuum, and you can't
         | assume others can either. People have limited time. The choice
         | they were facing was probably not C++ vs Rust, but C++ vs
         | nothing because they didn't have time to learn a new language
         | before starting their project.
         | 
          | Also, their first release was in 2019, so they had probably
          | heard of Rust, but that was around the beginning of its
          | recent spike in popularity. It's starting to be viewed as a
          | good long-term option, but back then a lot of people were
          | still wondering if it was a fad.
         | 
         | I'm learning Rust, and I'm a big fan, but this is a bad take.
        
       | intrasight wrote:
       | I see no MDX query support. How is this an OLAP database? Or do I
       | misunderstand what it is?
        
         | slt2021 wrote:
          | MDX is Microsoft proprietary tech; OLAP is a more generic
          | term for analytical processing (in contrast to transaction
          | processing, as in CRUD).
          | 
          | The OLAP cube is Microsoft's take on OLAP, built on 1990s
          | tech stacks (Windows Server + SQL Server + SSAS).
          | 
          | DuckDB is a modern take on OLAP.
        
       | epberry wrote:
       | We just switched to DuckDB from Postgres to analyze AWS billing
        | data and wrote up our experience:
       | https://www.vantage.sh/blog/querying-aws-cost-data-duckdb
       | 
       | Arguably Postgres was never the right tool to use for this
       | analysis but nonetheless I was surprised at how much faster
       | DuckDB was.
        
         | thewisenerd wrote:
          | ha! i'm using duckdb for a similar use-case.. plotting
          | 18-month aggregates takes less than 2 seconds.. the vendor
          | only provides data
         | for the last 3 months (through the web portal).
         | 
         | (i'm "only" working with 30 million rows though)
        
       | lichtenberger wrote:
        | I wonder if, in the near future, we'll also see JVM-based
        | in-memory or hybrid OLAP database systems that make use of SIMD
        | instructions and columnar storage layouts via the incubating
        | Vector API.
       | 
        | It would also be interesting to see how we can process semi-
       | structured data in a similar way.
        
       | jengleton wrote:
       | Great to see this posted here! DuckDB is an integral part of an
       | in-browser data analytics tool that I've been working on. It
       | compiles to WASM and runs in a web worker. Queries against WASM
       | DuckDB regularly run 10x faster than the original JavaScript
       | implementation!
        
         | thenaturalist wrote:
          | In case you're interested in sharing more details about the
          | tool or your architecture, I'd be an interested reader.
        
       | wasd wrote:
        | Our database is a Heroku PostgreSQL database. What's the best
        | way to get this working with DuckDB? I see there's a postgresql
        | connector, but I'm not totally following how to deploy it.
        | Would I just spin up a dyno with the Docker image / custom
        | buildpack and connect it to the DB?
        
         | chrisjc wrote:
          | Doesn't postgres have a columnar option? If so, you could
          | probably get better performance for your analytical
          | interactions if you
         | switched some tables to columnar.
         | 
         | Otherwise check out postgres scanner.
         | https://github.com/duckdblabs/postgres_scanner
         | 
         | They have a blog entry about it too:
         | https://motherduck.com/blog/duckdb-ecosystem-newsletter-two/
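          | 
          | A sketch of the scanner (connection strings and table names
          | are hypothetical):
          | 
          |     import duckdb
          | 
          |     con = duckdb.connect()
          |     con.execute("INSTALL postgres_scanner")
          |     con.execute("LOAD postgres_scanner")
          |     # Expose a whole schema as views...
          |     con.execute("CALL postgres_attach('dbname=mydb')")
          |     # ...or scan a single table directly.
          |     df = con.execute("""
          |         SELECT * FROM postgres_scan(
          |             'dbname=mydb', 'public', 'orders')
          |     """).df()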
        
       | ctc24 wrote:
        | We're big fans of DuckDB at https://prequel.co! We use it as
        | part of our own dataframe implementation in Go. The speed is
        | unbeatable and the tool is top-notch. There are a few rough
        | edges (it's not quite 1.0-level stability yet), but the team is
        | super responsive and has fixed bugs we've reported in < 48hrs
        | pretty much every time.
        
       | __mharrison__ wrote:
        | Not trying to troll, but assuming proficiency in Python, when
        | would someone prefer this to, say, Pandas (or Polars)?
        | 
        | I've written a lot of OLAP queries (I wrote a materialization
        | layer for MonetDB and Postgres years ago). I find Pandas so
        | much easier to work with for semi-complicated work.
        
         | cdavid wrote:
         | The obvious one is speed/data size: you can handle much larger
         | data with duckdb compared to pandas. Depending on the data,
         | maybe 10x larger, sometimes even more.
         | 
         | From an ergonomics perspective, I find pandas much harder to
          | use casually than SQL. When I was an IC and was using it a
          | lot, I was proficient in it. But now that I code maybe 5
          | hours / month at work, I can't really do anything non-trivial
          | beyond basic stuff/pivots. OTOH, I never really forget SQL.
        
       | brunoqc wrote:
        | Has anyone tried both duckdb and clickhouse-local?
        
       ___________________________________________________________________
       (page generated 2023-02-10 23:00 UTC)