[HN Gopher] Database drivers: Naughty or nice?
___________________________________________________________________
Database drivers: Naughty or nice?
Author : ctc24
Score : 79 points
Date : 2022-12-12 15:51 UTC (7 hours ago)
(HTM) web link (www.prequel.co)
(TXT) w3m dump (www.prequel.co)
| wswope wrote:
| Speaking as a professional data yeeter who has been bit by weird
| driver issues in the past, I'll go against the grain to say that
| I found this really helpful as a checklist to pattern match
| against when troubleshooting, even without calling out specific
| drivers. Thanks for sharing!
| akdor1154 wrote:
| I will presume that at least half of these are issues with the
| Snowflake connectors until told otherwise.. those guys have a
| great core product, but the connectors are pretty dodgy in many
| cases I've seen.
|
| Telling as well that DuckDB fixed your issue in a couple of
| days.. had snowflake support even demonstrated technical
| understanding of your request in that time? I guess it doesn't
| really matter if it takes them six months or more to fix the
| connector bugs anyway.
| omgbear wrote:
| I've had such problems with the official AWS redshift JDBC driver
| -- Mostly around memory usage compared to the PG8.4 driver that
| also works. But also,
|
| - Much slower to cancel queries
|
| - Still reading and buffering most of the result after canceling.
|
| - Weird escaping issues
|
| Glad to be done with that.
| no_wizard wrote:
| I know this isn't a specifically Amazon issue or topic, but I
| am so disappointed by their SDKs. From Amplify to AppSync
| they're just layers of half baked ideas and implementations.
| I've hit _a lot_ of bugs and limitations on their platform.
|
| And they they still have no concept for native dev tooling for
| most of their services. Its always "just setup separate
| environments for testing, CI, development, staging, release"
| and pay them more money.
|
| I hate AWS, so much.
|
| I am not shocked their DB drivers have issues.
| gumboza wrote:
| Completely agree with this. The only products I'd even
| consider by choice from Amazon are the core IaaS products.
| Everything they built in house on top of their own platform
| is a buggy shit show.
|
| One recent problem I have been dealing with, which I can't
| detail because they'll know who I am and I am under NDA,
| involved a major release last year. I found two critical bugs
| in the stack which they had extended from an open standard.
| As we're a massive company with a big spend I managed to get
| the team leads and enterprise support on a call and I found
| the whole fucking thing was clearly looked after by two guys
| who didn't know what they hell they were doing, had never
| spent any time on the stack that they were extending and had
| never even considered a real world use case.
|
| I have considered looking into Azure but I'm not sure that'll
| be any better.
| no_wizard wrote:
| From my experience, having extensively used each cloud at 3
| different jobs, this is _my take_. As always, YMMV, that
| said:
|
| - AWS is the worst of all the clouds, not just (typically)
| in pricing, but definitely it has the worst DX, confusion
| of options, and documentation out of the big 3. They're
| flexible I guess, and have tons of services but no easy and
| tight way to integrate them, and they're very configuration
| heavy. Just didn't care for how they presented their happy
| paths (WTF is up with VTL?!). I think Lambda's are
| overrated and idk why they have cold starts in 2022 when
| all their competitors don't anymore.
|
| - Azure has some good stuff, including local emulators you
| can download for some services, and they have mock packages
| for programmatically emulation some of their services.
| Their biggest downfall for me was the confusion amount of
| options they have and sometimes their pricing is opaque
| what the final cost on a service will be. Both combined
| made it harder to use than I would have liked but it was a
| step up (to me) from AWS. A note of caution though, is that
| Azure doesn't have the per se greatest support for things
| not in their "stack". .NET of course works well, Node,
| Python, and more recently Go have seen good support
| (particularly Node) but it has less flexible ways to do
| more customized deployments in my experience. Doesn't mean
| you can't, but you pay alot more for the privilege. They
| also are more aggressive in the upsell department as well.
| Azure is particularly great if you're migrating older
| Windows Server apps and/or heavily using things like
| ActiveDirectory. If you're willing to pay they do have good
| support. I recall Azure having a pretty decent cloud SDK
| and they supported all the cross-cloud ones decently
| (Terraform, CF etc) as well. One thing I like about Azure
| is they have the least confusing CLIs IME
|
| - GCP is has pretty good DX. I think they're more expensive
| in alot of ways, if you aren't careful, but their "main"
| services are pretty cost competitive. In particular, I feel
| like AppRunner is the underrated alternative to things like
| AWS Lambda or other FaaS offerings. GCP also has a good
| amount of emulators and Firebase is still pretty great. GCP
| is very happy path oriented (more so than the other cloud
| providers) in my experience, but generally they seem to
| have picked good "happy paths", again IME. BigQuery is more
| byzantine in cost than I'd like, same with Spanner. Their
| support is pretty bad IME unless you pay for big contracts,
| though. Even then it wasn't the best, but at least we could
| get a human on the phone. Also, I wouldn't trust their one
| click apps, there is hardly any control GCP exerts over
| them to ensure quality, so unless you know the vendor is
| specifically supporting them for your use case, avoid them.
| I did find their permissions model to be as confusing as
| AWS. Didn't like their cloud SDK as much but it was better
| than AWS.
|
| What would I do if I was able to start over again in 2022
| though? I'd deploy to something like Supabase and/or
| Cloudflare if I wanted pretty much 100% managed services.
| Otherwise if I just want compute and add in my own services
| I'd do a combo of Fly.io with Planetscale (or a similiar DB
| provider) and using Cloudflare services (such as R2 and
| workers) for files / caching or a combo of B2 + Upstash.
| You could proxy / load balance your HTTP / TCP traffic via
| Cloudflare as well (idk that fly.io autoproxies provisioned
| instances for load balancing. I'm pretty sure it doesn't)
| gumboza wrote:
| Don't start me on VTL. I spent two weeks trying to delete
| a VTL which was sitting unused at $5000 a month.
|
| On lambda, the issue I mentioned was actually because one
| of their services was built on top of that and maintained
| state between lambda runs whereas the open source product
| they ripped off did not. That meant their entire product
| was stateful when it should have been stateless. It
| compromised everything.
|
| I couldn't possibly use Google after a GApps fiasco I had
| a couple of years back. No way.
|
| Thanks for the rest of the info. I shall carefully
| decipher it.
|
| For the next platform migration we do it might be back to
| a dedicated DC which is where we came from. We have
| stable constrained, predictable load. The TCO was
| cheaper, the staff were fewer, the performance was better
| and the licensing of some commercial products we leverage
| was much easier to handle and cheaper.
| VWWHFSfQ wrote:
| without actually naming the drivers in question so that people
| can learn from it this is pretty pointless.
| Nzen wrote:
| Presumably, this is a warning for people building a fresh
| system. Even if the authors called out a, b, and c for version
| 1, 2, and 3 (respectively); there is no promise that x, y, and
| z don't exhibit these issues (or that other versions of abc do
| or don't exhibit the behavior, either).
|
| This is a reminder to consider our use cases and vet our
| dependencies for those use cases carefully.
| janci wrote:
| It is the same shitshow in JDBC land.
| [deleted]
| counttheforks wrote:
| This article would be way more useful if it named the offending
| drivers and linked to relevant github issues. As is this is just
| a set of useless curiosities.
|
| I came in wanting to check if any of the drivers I use were
| affected by any of the bugs they found. No idea!
| ctc24 wrote:
| Understand the sentiment. On the flip side, and as noted in the
| post, we don't want it to serve as a "name and shame"
| situation. We have a lot of respect for the folks maintaining
| the drivers we use, they do a pretty hard job, and so we don't
| want to come out swinging and taking shots at them. So we're
| trying to walk that line.
|
| We also think the information provided in the post is valuable
| as is: it's easy enough to check whether a driver you use faces
| any of the issues we mention.
|
| With that said, here's a bit more color:
|
| - the native Snowflake driver for GoLang does not implement
| COPY functionality (or at least it did not when we last tried
| to use it).
|
| - the memory leaks are pretty prevalent across ODBC drivers.
| It's worth watching out for that one if you're using any ODBC
| driver.
|
| - the breaking change on connection string was Databricks'
| GoLang driver.
|
| - the DECIMAL one is pretty prevalent too. BigQuery only allows
| you to go up to DECIMAL(38,9) while most other drivers let you
| go to 18 on scale, and ClickHouse supports precision/scale of
| up to 76. Redshift complains loudly if you try to insert a
| DECIMAL(38,17) into a DECIMAL(38,18) column, for instance.
|
| Hope the added color is helpful!
|
| edit: formatting
| tacker2000 wrote:
| Was wondering the same.
|
| Are they scared of pointing fingers or what?
| bsder wrote:
| Agreed. Without actual references, this is marketing blather,
| not a technical article.
|
| The worst part is that it doesn't have to be name and shame.
| Take the "epoch" discussion, for example. The fact that
| "epochs" differ in implementation is something that isn't even
| a bug--it's just different. That alone is likely surprising to
| a lot of people and would probably be worth an article.
|
| Of course, the real issue is probably that "boring" databases
| just work and "exciting" databases are full of bugs. If you're
| a database SAAS startup, slagging the databases that everybody
| considers cool and hip isn't going to be good for your exit.
| aynyc wrote:
| I used to work with Sybase, and its JDBC driver was pure madness.
| There were times where the same query on same data would return
| different results. Never figured out why. We ended use its ODBC
| driver for reporting instead.
| tlocke wrote:
| Hello, maintainer of the pg8000 driver for PostgreSQL here. I'm
| keen to know of any problems you had if you used pg8000. Thanks!
| https://github.com/tlocke/pg8000
| bhargav wrote:
| This is one of the few Github repos I have seen that are very
| very well documented. Great job; starred.
| ctc24 wrote:
| Appreciate that and will do!
| hermitdev wrote:
| I remember dealing with DataDirect on Linux more than 10 years
| ago. I don't recall the specifics, but under certain error
| conditions, either the ODBC driver we were using or the driver
| manager would simply do `exit(0)` with no errors logged anywhere.
| That was a "fun" one to solve...
| eska wrote:
| I remember something about mysql not treating localhost as local,
| but 127.0.0.1 would. This led to very different performance.
| (Long ago, not sure about details)
|
| Also surprised by libraries not using more efficient protocols
| although they were defined.
|
| Compression also doesn't seem to be a thing.
| georgewfraser wrote:
| The world would be a better place if database drivers were
| completely abandoned as a way for clients to connect to
| databases. A standard API, implemented by multiple vendors, is a
| vastly preferable solution. Arrow Flight is an example of this.
|
| https://arrow.apache.org/blog/2019/10/13/introducing-arrow-f...
| lidavidm wrote:
| Even within the Arrow project, there's still room for drivers
| just because not every vendor is going to implement the same
| wire protocol (at least on a feasible timeline). Hence both
| "ADBC" [1] and Flight SQL [2] (note: NOT a SQL dialect, it is a
| wire protocol) coexist in complementary niches.
|
| [1]: https://arrow.apache.org/docs/format/ADBC.html [2]:
| https://arrow.apache.org/docs/format/FlightSql.html
| kardianos wrote:
| Nice.
|
| This [1] appears to be the SQL layer on top of Arrow Flight
| specifically about SQL. It seems a bit chatty, where two
| network requests are required for each query if I read it
| correctly.
|
| [1] https://arrow.apache.org/docs/format/FlightSql.html
| lidavidm wrote:
| Yup. The chattiness is to account for distributed databases,
| so you can spread the result set across multiple instances.
|
| That said there is a proposal for base Flight RPC to help
| allow embedding small results directly into the first
| response, that mostly needs someone to draft a prototype and
| push it through. (That doesn't help the case of a large-ish
| response from a single backend, though; that may also need
| some work, if we want to get rid of the second request.)
| jeff-davis wrote:
| I generally think that's a good idea, but be aware that the
| protocols are more interesting than you might first imagine,
| and that leads to a lot of the differences between drivers for
| different databases.
|
| For instance, when setting a user's password in Postgres, you
| can do the hashing on the client side, even for non-trivial
| schemes like SCRAM. This means that the password itself never
| needs to move over the network, and that's very desirable.
| Speaking of authentication methods, that also opens up a big
| topic.
|
| There are also important modes. For instance, the client
| encoding controls how strings are transcoded when they get to
| the server. That allows the client to not know/care what the
| encoding of the database is. You could demand that everything
| is UTF-8, and that's one philosophy, but not everyone agrees.
|
| In practice, I think it'll be a while before there is consensus
| on all these points. And even when there is, the standard will
| need to evolve to handle new auth methods, etc.
|
| If we invent a standard protocol, it will probably be more of a
| fallback for simple cases when the language framework doesn't
| offer a driver yet. Still helpful, though.
| EthicalSimilar wrote:
| > For instance, when setting a user's password in Postgres,
| you can do the hashing on the client side, even for non-
| trivial schemes like SCRAM. This means that the password
| itself never needs to move over the network, and that's very
| desirable.
|
| Off-topic, but I'm surprised more online apps don't employ
| something similar.
|
| It would all but eliminate accidental leaks that occur from
| logs being incorrectly stored / misconfigured, not to mention
| worries about MITM attacks (useful for corporate networks, or
| public networks).
|
| Given how many people share usernames, emails, and passwords
| across sites I find it quite important to mitigate those
| issues as much as possible.
| kardianos wrote:
| SQL Database drivers have so much in common, but each represent
| schema and data table(s) differently on the wire. Some handle
| cancellation within the protocol, others require a work around
| (like another DB connection that issues a KILL command). TDS
| (Microsoft Server) is actually one of the better protocols and
| better documented protocols (due in large part to historical
| court orders) out there.
|
| My takeaway is that compact schema streaming data is not a well
| developed field. I think we can do better. Not only that, but
| developing both such a schema, protocol, and associated tooling
| is key to significantly better data-centric applications from end
| to end, not just the database.
| ctc24 wrote:
| Agreed! It's kind of fascinating that a better standard hasn't
| emerged yet, but not surprising given that each one is
| developed in isolation by a different organization.
|
| Definitely a lot of room for improvement. Curious if anyone has
| thoughts about the best route to getting such a standard /
| protocol in place: it seems like a lot of stars would have to
| align, but would be invaluable nonetheless.
| lidavidm wrote:
| As mentioned by others elsewhere in the thread, Apache Arrow
| Flight SQL aims to be a standard wire protocol that multiple
| databases can implement, using a columnar data
| representation:
| https://arrow.apache.org/docs/format/FlightSql.html
|
| However it's early days yet.
| layer8 wrote:
| Such a nonproprietary protocol would reduce vendor lock-in,
| hence vendors have no incentive to develop one.
___________________________________________________________________
(page generated 2022-12-12 23:00 UTC)