[HN Gopher] DuckDB 1.1.0 Released
___________________________________________________________________
DuckDB 1.1.0 Released
Author : craigkerstiens
Score : 161 points
Date : 2024-09-09 15:33 UTC (7 hours ago)
(HTM) web link (duckdb.org)
(TXT) w3m dump (duckdb.org)
| simonw wrote:
| I feel like R-Tree spatial indexes are potentially the most
| exciting new feature in this release, but they're buried right
| down at the bottom of the announcement:
| https://duckdb.org/2024/09/09/announcing-duckdb-110.html#r-t...
| maxxen wrote:
| Thanks! I've been wanting to add this since I first started out
| working on DuckDB almost two years ago but I finally managed to
| accumulate the time (and the skills required!) to finish it up
| over the summer. It still has a long way to go: support for
| indexes in extensions is pretty... raw, and we only push down
| constant filters into index scans (so no spatial index-join
| acceleration yet). But I think having a proper spatial index is
| one of those things that are kind of required to really elevate
| the spatial extension from being just a toy and I'm super
| stoked to work more on it during the next release cycle and all
| the new possibilities that it opens up.
| maxxen wrote:
| I still need to update the docs and there's probably going to
| be a blog post on it in the future, but for now there are
| some more details in the PR
| https://github.com/duckdb/duckdb_spatial/pull/383
| RandomCitizen12 wrote:
| The namesake: https://en.wikipedia.org/wiki/Eaton%27s_pintail
| log4shell wrote:
| Congratulations to the duckdb team! Can't wait to try some of the
| newly released features and performance improvements.
|
| I am quite curious about the plans for a python dataframe-like
| API for duckdb, and the python ecosystem in general.
| exergy wrote:
| There is Ibis[0], a fairly mature package. They recently
| adopted duckdb as the default execution engine and it can give
| you a nice python dataframe API on top of duckdb, with
| hot-swappability towards heavier engines.
|
| With tools like this providing a comprehensive python API and
| the ability to always fall back to raw SQL, I am not sure
| DuckDB devs should focus on the python API at all beyond basic
| (to_table, from_table) features.
|
| Impressive progress and a real chance to shake up the data tool
| market, but still a way to go: there is still much to do,
| especially on large table formats (iceberg/delta) and memory
| management when running on bigger boxes in the cloud. E.g. the
| elusive "Failed to allocate ..." bug[1] is an inhibitor to the
| claim that big data is dead[2]. As it is, we tried and
| abandoned DuckDB as a cheaper replacement for some databricks
| batch jobs.
|
| [0] https://github.com/ibis-project/ibis [1]
| https://github.com/duckdb/duckdb/issues/12667,
| https://github.com/duckdb/duckdb/issues/9880,
| https://github.com/duckdb/duckdb/issues/12528 [2]
| https://motherduck.com/blog/big-data-is-dead/
| cmdlineluser wrote:
| The last I read, the Spark API was to become the focus point.
|
| https://duckdb.org/docs/api/python/spark_api
|
| Not sure what the current status is.
|
| ref:
| https://github.com/duckdb/duckdb/issues/2000#issuecomment-18...
| ZeroCool2u wrote:
| Damn, GeoParquet and R-Tree for spatial indexes is huge!!! ESRI
| better watch their back!
| DonnyV wrote:
| Unfortunately ESRI will probably just build an application
| around it, promote the app like they created something new and
| sell it to their customers. Same thing they did with GDAL.
| wenc wrote:
| Lots of automatic performance optimizations on what is already a
| very fast engine. (I've stopped using Pandas)
|
| I know most software folks feel some type of way about SQL (most
| don't grok it beyond a simple SELECT) but this is one of the
| advantages of declarative languages like SQL and a plan-execute
| programming paradigm where a plan is created before instructions
| are run, making it amenable to plan optimization.
|
| Maybe the syntax of the language could be improved (e.g. Linq)
| but conceptually SQL is historically one of the times we've
| blundered into the right thing. Data operations are often done
| in sets rather than loops, and it's a worthwhile investment for
| software engineers to
| learn to think in this way if they want to work with data
| correctly at scale.
|
| Stonebraker was right in that people who avoid SQL are doomed to
| reinvent it poorly.
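
The "sets rather than loops" point above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in SQL engine (not DuckDB), and the `readings` table and its values are made up for illustration. The same per-sensor average is computed once with an explicit loop and once with a single declarative query that the engine plans before running.

```python
import sqlite3

# In-memory database with a hypothetical table of sensor readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)],
)

# Loop style: pull every row out and aggregate by hand.
totals = {}
counts = {}
for sensor, value in conn.execute("SELECT sensor, value FROM readings"):
    totals[sensor] = totals.get(sensor, 0.0) + value
    counts[sensor] = counts.get(sensor, 0) + 1
loop_avg = {s: totals[s] / counts[s] for s in totals}

# Set style: describe the result and let the engine decide how to get it.
query = "SELECT sensor, AVG(value) FROM readings GROUP BY sensor"
set_avg = dict(conn.execute(query))

# The engine builds a plan before executing; SQLite exposes it like this.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(loop_avg, set_avg)
```

Both approaches give the same averages here; the difference is that the declarative version leaves the execution strategy to the engine, which is what makes automatic optimizations possible.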
| theLiminator wrote:
| Imo dataframe apis can be superior, take a look at polars. I
| personally much prefer it to SQL.
|
| For duckdb, ibis looks like a pretty nice way of using the
| duckdb query engine.
| isoprophlex wrote:
| Dataframe APIs, sure.
|
| Pandas? Not even once.
| neeleshs wrote:
| Can you please elaborate on what's wrong with Pandas? I'm
| looking to use either Polars or Pandas in a project and
| looking for insights.
| fsndz wrote:
| I don't find pandas intuitive (API simplicity), and then you
| have the hard-to-debug issues and perf.
| isoprophlex wrote:
| Yes, agreed. The API is a big inconsistent kludge, has
| many warts, and generally requires too much typing and
| memorization. The performance is subpar. There are some
| very annoying design choices wrt. implicit type coercion
| that don't jibe with my personal preferences, which
| caused me recurring grief.
|
| And to engage in some light gatekeeping, there sure is a
| lot of _terrible_ pandas code out there written by people
| that have no business calling themselves programmers. I
| fully realize this can happen anywhere, but I'm never
| excited anymore to read a line of pandas.
| emmanueloga_ wrote:
| Related: Ibis (a portable Python dataframe library)
| dropping the pandas backend in favor of DuckDB for better
| performance and compatibility. [1]
|
| --
|
| 1: https://news.ycombinator.com/item?id=41389806
| theLiminator wrote:
| Yeah, I agree that pandas has a horrible api. Polars is by
| far the best one I've tried.
|
| It maps to SQL semantics fairly cleanly, but is more
| expressive and composable.
| ashkankiani wrote:
| Love the expanded C API support! Also those performance
| improvements are massive! Pushing through filters and the
| streaming optimization for fetchone() is great! This makes it
| more viable to use duckdb in smaller queries from python.
|
| I'm pretty excited for variables too! I really wanted them for
| when I'm using the CLI. Same with query/query_table! I appreciate
| the push for features that make people's lives easier while also
| still improving performance.
|
| Everyone who I've introduced duckdb to (at work or outside of
| work) eventually is blown away (some still have lingering SQL
| stigma)
| beingflo wrote:
| I've been eyeing DuckDB for a metric collection hobby project.
| Quick benchmark showed promising query performance over SQLite
| (unsurprising considering DuckDB is column oriented), but quite a
| bit slower for inserts. Does anyone have experience using it as
| an "online" backend DB as opposed to a data analytics engine for
| interactive use? From what I gather they are trying to position
| themselves more in the latter use case.
| 89vision wrote:
| Depends on the scale of users you expect for your project.
| Generally I like to keep OLTP and OLAP tools in their lanes,
| but if < 100 people are going to be using it, it probably
| doesn't matter. I doubt duckdb has any sort of ACID guarantees,
| so that's something to keep in mind.
| sudarshnachakra wrote:
| DuckDB does have ACID guarantees and transactions but I'd not
| be surprised if they are rarely used (if at all).
|
| Ref: https://duckdb.org/docs/sql/statements/transactions
|
| In the concurrency documentation they explicitly specify that
| it's not designed for lots of small transactions.
|
| Concurrency: https://duckdb.org/docs/connect/concurrency
| pantsforbirds wrote:
| You can always use sqlite as your primary data store, and then
| directly query the sqlite database from duckdb whenever you
| need analytics.
| voidsnax wrote:
| Doing row-by-row inserts into DuckDB is really slow.
| Accumulating rows in an in-memory data structure and
| periodically batching them into something like an in-memory
| Arrow table, and then reading the Arrow table into DuckDB, is
| fast and has been tenable for my own use cases.
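
The accumulate-then-batch pattern described above might look roughly like the sketch below. `RowBuffer`, `flush_fn`, and `batch_size` are hypothetical names invented for illustration, and the sink is a plain Python list standing in for the real step of building an Arrow table and loading it into DuckDB, which is deliberately left out here.

```python
from typing import Callable, List, Tuple

Row = Tuple[object, ...]


class RowBuffer:
    """Accumulate rows in memory and hand them off in batches."""

    def __init__(self, flush_fn: Callable[[List[Row]], None], batch_size: int = 1000) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush_fn = flush_fn      # called with each full batch
        self._batch_size = batch_size
        self._rows: List[Row] = []

    def add(self, row: Row) -> None:
        """Buffer one row; flush automatically once the batch is full."""
        self._rows.append(row)
        if len(self._rows) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand any buffered rows to the sink and start a fresh buffer."""
        if not self._rows:
            return
        batch, self._rows = self._rows, []
        self._flush_fn(batch)


# Stand-in sink: in a real setup this is where a batch would be converted
# to a columnar table and bulk-loaded into the database (assumption).
batches: List[List[Row]] = []
buf = RowBuffer(batches.append, batch_size=3)
for i in range(7):
    buf.add((i, f"metric_{i}"))
buf.flush()  # push the final partial batch

print([len(b) for b in batches])
```

The trade-off is that rows still sitting in the buffer are lost if the process dies before a flush, so the batch size is a balance between insert throughput and how much recent data one is willing to lose.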
| fforflo wrote:
| The C extensions API is a big big very big thing. As someone who
| routinely writes small PG extensions, I'd love to be able to kinda
| use the same code for multiple backend DBs. And I guess lots of
| inspiration has come from all the efforts that embed DuckDB in
| Postgres as a mini OLAP.
| BurnGpuBurn wrote:
| Today I learned:
|
| SELECT 1 / 0 AS var_name;
|
| yields a double with value infinity. Which is SQL spec. Must be
| fun times to actually use that :-)
| tlavoie wrote:
| This episode of Kris Jenkins' Developer Voices podcast talks with
| a couple authors of a new book on DuckDB, and does a great job of
| explaining the sorts of things that make it so unusual:
| https://www.youtube.com/watch?v=_nA3uDx1rlg
| adwf wrote:
| Big fan of DuckDB!
|
| Has saved us a number of times when having to deploy at a remote
| client with limited on-prem customisation for security reasons
| (i.e. no to installing a big Postgres or other RDBMS solution).
|
| Powerful tooling, all local to the environment and the data being
| worked on, SQL - so it's pretty close to a drop-in replacement
| compared to our old solution. Really great stuff and I was very
| happy to see the project gain the confidence to hit 1.0 a while
| back and now 1.1.
|
| Congrats to everyone!
___________________________________________________________________
(page generated 2024-09-09 23:01 UTC)