hngopher.com

       [HN Gopher] DuckDB 1.1.0 Released
       ___________________________________________________________________
        
       DuckDB 1.1.0 Released
        
       Author : craigkerstiens
       Score  : 161 points
       Date   : 2024-09-09 15:33 UTC (7 hours ago)
        
 (HTM) web link (duckdb.org)
 (TXT) w3m dump (duckdb.org)
        
       | simonw wrote:
       | I feel like R-Tree spatial indexes are potentially the most
       | exciting new feature in this release, but they're buried right
       | down at the bottom of the announcement:
       | https://duckdb.org/2024/09/09/announcing-duckdb-110.html#r-t...
        
         | maxxen wrote:
         | Thanks! I've been wanting to add this since I first started out
         | working on DuckDB almost two years ago but I finally managed to
         | accumulate the time (and the skills required!) to finish it up
         | over the summer. It still has a long way to go, support for
         | indexes in extensions are pretty... raw, and we only push down
         | constant filters into index scans (so no spatial index-join
         | acceleration yet). But I think having a proper spatial index is
         | one of those things that are kind of required to really elevate
         | the spatial extension from being just a toy and I'm super
         | stoked to work more on it during the next release cycle and all
         | the new possibilities that it opens up.
        
           | maxxen wrote:
           | I still need to update the docs and there's probably going to
           | be a blog post on it in the future, but for now there are
           | some more details in the PR
           | https://github.com/duckdb/duckdb_spatial/pull/383
        
       | RandomCitizen12 wrote:
       | The namesake: https://en.wikipedia.org/wiki/Eaton%27s_pintail
        
       | log4shell wrote:
       | Congratulations to duckdb team! Can't wait to try some of the
       | newly released features and performance improvements.
       | 
       | I am quite curious about the plans for python dataframe like API
       | for duckdb, and python ecosystem in general.
        
         | exergy wrote:
         | there is Ibis[0] as a fairly mature package. They recently
         | adopted duckdb as the default execution engine and it can give
         | you a nice python dataframe API ontop of duckdb, with hot-
         | swappability towards heavier engines.
         | 
         | With tools like this providing a comprehensive python API and
         | the ability to always fall back to raw SQL, i am not sure
         | DuckDB devs should focus on the python API at all beyond basic
         | (to_table, from_table) features.
         | 
         | Impressive progress and a real chance to shake up the data tool
         | market, but still a way to go: There is is still much to do
         | especially on large table formats (iceberg/delta) and memory
         | management when running on bigger boxes on cloud. Eg the
         | elusive "Failed to allocate ..." bug[1] is an inhibitor to the
         | claim that big data is dead[2]. As it is, we tried and
         | abandoned DuckDB as a cheaper replacement for some databricks
         | batch jobs.
         | 
         | [0] https://github.com/ibis-project/ibis [1]
         | https://github.com/duckdb/duckdb/issues/12667,
         | https://github.com/duckdb/duckdb/issues/9880,
         | https://github.com/duckdb/duckdb/issues/12528 [2]
         | https://motherduck.com/blog/big-data-is-dead/
        
         | cmdlineluser wrote:
         | The last I read, the Spark API was to become the focus point.
         | 
         | https://duckdb.org/docs/api/python/spark_api
         | 
         | Not sure what the current status is.
         | 
         | ref:
         | https://github.com/duckdb/duckdb/issues/2000#issuecomment-18...
        
       | ZeroCool2u wrote:
       | Damn, GeoParquet and R-Tree for spatial indexes is huge!!! ESRI
       | better watch their back!
        
         | DonnyV wrote:
         | Unfortunately ESRI will probably just build an application
         | around it, promote the app like they created something new and
         | sell it to their customers. Same thing they did with GDAL.
        
       | wenc wrote:
       | Lots of automatic performance optimizations on what is already a
       | very fast engine. (I've stopped using Pandas)
       | 
       | I know most software folks feel some type of way about SQL (most
       | don't grok it beyond a simple SELECT) but this is one of the
       | advantages of declarative languages like SQL and a plan-execute
       | programming paradigm where a plan is created before instructions
       | are run, making it amenable to plan optimization.
       | 
       | Maybe the syntax of the language could be improved (e.g. Linq)
       | but conceptually SQL is historically when we've blundered into
       | the right. Data operations are often done in sets rather than
       | loops and it's a worthwhile investment for software engineers to
       | learn to think in this way if they want to work with data
       | correctly at scale.
       | 
       | Stonebraker was right in that people who avoid SQL are doomed to
       | reinvent it poorly.
        
         | theLiminator wrote:
         | Imo dataframe apis can be superior, take a look at polars. I
         | personally much prefer it to SQL.
         | 
         | For duckdb, ibis looks like a pretty nice way of using the
         | duckdb query engine.
        
           | isoprophlex wrote:
           | Dataframe APIs, sure.
           | 
           | Pandas? Not even once.
        
             | neeleshs wrote:
             | Can you please elaborate what's wrong with Pandas? Im
             | looking to use either Polars or Pandas in a project and
             | looking for insights.
        
               | fsndz wrote:
               | I don't find pandas intuitive (API simplicity), then you
               | have the hard to debug issues and perf
        
               | isoprophlex wrote:
               | Yes, agreed. The API is a big inconsistent kludge, has
               | many warts, and generally requires too much typing and
               | memorization. The performance is subpar. There are some
               | very annoying design choices wrt. implicit type coercion
               | that don't jive with my personal preferences, which
               | caused me recurring grief.
               | 
               | And to engage in some light gatekeeping, there sure is a
               | lot of _terrible_ pandas code out there written by people
               | that have no business calling themselves programmers. I
               | fully realize this can happen anywhere, but I 'm never
               | excited anymore to read a line of pandas.
        
               | emmanueloga_ wrote:
               | Related: Ibis (a portable Python dataframe library)
               | dropping the pandas backend in favor of DuckDB for better
               | performance and compatibility. [1]
               | 
               | --
               | 
               | 1: https://news.ycombinator.com/item?id=41389806
        
             | theLiminator wrote:
             | Yeah, I agree that pandas has a horrible api. Polars is by
             | and far the best one I've tried.
             | 
             | It maps to SQL semantics fairly cleanly, but is more
             | expressive and composable.
        
       | ashkankiani wrote:
       | Love the expanded C API support! Also those performance
       | improvements are massive! Pushing through filters and the
       | streaming optimization for fetchone() is great! This makes it
       | more viable to use duckdb in smaller queries from python.
       | 
       | I'm pretty excited for variables too! I really wanted them for
       | when I'm using the CLI. Same with query/query_table! I appreciate
       | the push for features that make people's lives easier while also
       | still improving performance.
       | 
       | Everyone who I've introduced duckdb to (at work or outside of
       | work) eventually is blown away (some still have lingering SQL
       | stigma)
        
       | beingflo wrote:
       | I've been eyeing DuckDB for a metric collection hobby project.
       | Quick benchmark showed promising query performance over SQLite
       | (unsurprising considering DuckDB is column oriented), but quite a
       | bit slower for inserts. Does anyone have experience using it as
       | an "online" backend DB as opposed to a data analytics engine for
       | interactive use? From what I gather they are trying to position
       | themselves more in the latter use case.
        
         | 89vision wrote:
         | Depends on the scale of users you expect for your project.
         | Generally I like to keep oltp and olap tools in their lanes,
         | but if < 100 people are going to be using it probably doesn't
         | matter. I doubt duckdb has any sort of acid guarantees, so
         | thats something to keep in mind.
        
           | sudarshnachakra wrote:
           | DuckDB does have ACID guarantees and transactions but I'd not
           | be surprised if they are rarely used (if at all).
           | 
           | Ref: https://duckdb.org/docs/sql/statements/transactions
           | 
           | In the concurrency documentation they explicitly specify that
           | it's not designed for lots of small transactions
           | 
           | Concurrency: https://duckdb.org/docs/connect/concurrency
        
         | pantsforbirds wrote:
         | You can always use sqlite as your primary data store, and then
         | directly query the sqlite database from duckdb whenever you
         | need analytics.
        
         | voidsnax wrote:
         | Doing row-by-row inserts into DuckDB is really slow.
         | Accumulating rows in an in-memory data structure and
         | periodically batching them into something like an in-memory
         | Arrow table, and then reading the Arrow table into DuckDB, is
         | fast and has been tenable for my own use cases.
        
       | fforflo wrote:
       | The C extensions API is a big big very big thing. As someone who
       | routinely write small PG extensions, I'd love to be able to kinda
       | use the same code for multiple backend DBs. And I guess lots of
       | inspiration has come from all the efforts that embed DuckDb in
       | Postgres as a miniOLAP
        
       | BurnGpuBurn wrote:
       | Today I learned:
       | 
       | SELECT 1 / 0 AS var_name;
       | 
       | yields a double with value infinite. Which is SQL spec. Must be
       | fun times to actually use that :-)
        
       | tlavoie wrote:
       | This episode of Kris Jenkins' Developer Voices podcast talks with
       | a couple authors of a new book on DuckDB, and does a great job of
       | explaining the sorts of things that make it so unusual:
       | https://www.youtube.com/watch?v=_nA3uDx1rlg
        
       | adwf wrote:
       | Big fan of DuckDB!
       | 
       | Has saved us a number of times when having to deploy at a remote
       | client with limited on-prem customisation for security reasons
       | (ie. no to installing a big Postgres or other RDBMS solution).
       | 
       | Powerful tooling, all local to the environment and the data being
       | worked on, SQL - so it's pretty close to a drop-in replacement
       | compared to our old solution. Really great stuff and I was very
       | happy to see the project gain the confidence to hit 1.0 a while
       | back and now 1.1.
       | 
       | Congrats to everyone!
        
       ___________________________________________________________________
       (page generated 2024-09-09 23:01 UTC)