https://datastation.multiprocess.io/blog/2022-02-08-the-world-of-postgresql-wire-compatibility.html February 7, 2022: DataStation 0.7.0 is released! Read more DataStation The Open-Source Data IDE Install the App Subscribe Documentation Blog Community Try it out online! Blog The world of PostgreSQL wire compatibility Published on February 8, 2022 by Phil Eaton postgresdatabasespython A wire protocol is the format for interactions between a database server and its clients. It encompasses authentication, sending queries, receiving responses, and so on. It is a description of the exact bytes sent and received by servers and clients. It does NOT encompass the actual query language itself, let alone database semantics. MySQL Query Protocol The MySQL query protocol Every server-client database has a wire protocol. (SQLite is an embedded database, not a server-client database, so it does not have a wire protocol.) Wire protocols are important to both database developers and database driver developers. Proprietary databases like Oracle and IBM Db2 find value in developing their own drivers. They don't tend to publish their wire protocol. Among other reasons, it's just not necessary since they develop their own drivers. Proprietary vendors tend to prefer a monolithic driver like ODBC/JDBC rather than native drivers since it's easier to develop one driver for all languages than one for each language. If someone wants to develop a native driver for a database without a published wire protocol they'll need to reverse-engineer the wire protocol. An example of this is the go-ora Oracle client for Go. Worth noting, among proprietary vendors, SQL Server actually does the opposite. Their wire protocol has been publicly documented for over a decade. And more native drivers exist for SQL Server than for other proprietary vendors with their own wire protocol. In contrast, the wire protocols for open-source databases tend to be documented since it's just another part of an open-source system. Due to their popularity and age, PostgreSQL and MySQL wire protocols are particularly well documented. Native drivers exist for both databases in every major language. For this reason, a large number of newer database vendors just reuse the PostgreSQL or MySQL wire protocol. This means that every existing driver (and SQL GUI like DataStation) will work with them without any modification to driver code. This doesn't mean that any PostgreSQL or MySQL query will work with them since, as mentioned previously, query language and database semantics are independent of the wire protocol. In Practice Let's write a simple program that can handle query text and read data from disk. If it implements the PostgreSQL wire protocol protocol, it would work with any existing PostgreSQL client. That would be neat! Someone already wrote a small library implementing the PostgreSQL server protocol in Python2. I ported it to Python3 for the sake of this tutorial. So we can just grab that Python3 fork's code and edit the query method with some code that handles the query text and reads data from disk, and we'd be done! To reiterate, the wire protocol doesn't specify anything about the query language. So let's support a single made-up query: SELECT FROM x where x can be a valid table name: $ diff original.py pgserver.py 0a1 > import json 189,190c190,202 < fields = [IntField('a'), IntField('b')] < rows = [[1, 2], [3, 4], [5, 6]] --- > sql = sql.decode().upper() > prefix = 'SELECT FROM ' > if not sql.startswith(prefix): > raise Exception('Bad query.') > > table = sql.strip()[len(prefix):-2].strip().lower() > try: > with open('data/'+table+'.json') as f: > d = json.load(f) > fields = [IntField(k) for k in d[0].keys()] > rows = [[row[field.name] for field in fields] for row in d] > except Exception as e: > raise Exception('Invalid table.') from e Now let's create a data directory where all the table JSON files will be stored. And let's create a few tables: $ cat data/users.json [ {"id": 0, "age": 100}, {"id": 1, "age": 45}, {"id": 2, "age": 52} ] $ cat data/users_organization.json [ {"user_id": 0, "organization_id": 0}, {"user_id": 1, "organization_id": 1}, {"user_id": 2, "organization_id": 0} ] Run it: $ python pgserver.py server running, try: $ psql -h localhost -p 55432 And in a separate terminal, open psql as suggested (enter any text when prompted for a password, this code doesn't do anything with it), and run some queries: $ psql -h localhost -p 55432 Password for user phil: psql (13.4, server 0.0.0) Type "help" for help. phil=> select from users; id | age ----+----- 0 | 100 1 | 45 2 | 52 (3 rows) phil=> select from users_organization; user_id | organization_id ---------+----------------- 0 | 0 1 | 1 2 | 0 (3 rows) That is really neat! Incidentally a similar technique can be used to help in testing a la pgmock. Vendors But let's get back to the real world, dig into some of the databases that implement the PostgreSQL wire protocol, and see where else they diverge. And although this table mentions compatibility, don't take that to be a judgement call. Just because a database implements the PostgreSQL wire protocol does not mean it intends to be a drop-in replacement for PostgreSQL. Name Category Source Implementation Parser Language ACID? Scaling language compatibility strategy PostgreSQL (the Relational Open C YACC N/A Yes Limited control group) Enterprise QuestDB Time Open Java Custom Generally No only, series hand-written compatible* strategy unclear Unlikely to fall far YACC grammar behind CockroachDB Relational Available Go adapted from PostgreSQL Yes Distributed PostgreSQL due to using Raft sharing a grammar specification Shared CrateDB Document Open Java Custom ANTLR Generally No nothing, grammar compatible* eventually consistent Unlikely to fall far PostgreSQL's behind YugabyteDB Relational Open C, C++ parser as a PostgreSQL Yes Sharding on library due to using primary key the PostgreSQL parser Timescale is a PostgreSQL Cannot fall Sharding on Timescale Relational Open C extension, behind Yes user-chosen so it uses PostgreSQL column PostgreSQL's parser immudb Document Open Go Custom YACC Somewhat Yes No grammar compatible* Experimental: Materialize Streaming Open Rust Custom Generally No Enterprise hand-written compatible* only, data-parallel Custom Not Distributed RavenDB Document Open C# hand-written compatible Yes using Raft parser Cloud Relational Proprietary ** ** Generally Yes ** Spanner compatible* Aurora Relational Proprietary ** ** Generally Yes ** compatible* Generally Yellowbrick Analytics Proprietary ** ** compatible* ** ** ** * Not reusing PostgreSQL's parser does make it somewhat more likely for there to be differences in complex SQL queries compared to PostgreSQL ** Unknown because I didn't take the time to look into it. Feel free to message me and I'll fill it in. And a special shoutout to ClickHouse that according to its config example can pretend to be PostgreSQL and, separately, MySQL. But since this isn't documented nor can I find the PostgreSQL parser code (but I can find the MySQL parser code) I won't include it in this list. I'm sure I missed some. Message me with other PostgreSQL wire-compatible databases I should include here! Syntax and engine support A parser is only the start of compatibility though. Just because a parser supports some syntax doesn't mean that the engine supports it too. But you can be more sure that if a parser doesn't support some syntax, the engine can't support it either (unless the project keeps dead code around, which isn't common). And an implementation that falls behind on parser support is likely to fall behind in other areas of compatibility. This is not to say that one vendor is better or worse. Excellent engineering can almost always overcome these types of obstacles. But there are always implications to technical design decisions. So just looking at parser support is not enough to talk about PostgreSQL query or semantic compatibility. But it is a starting point. Wire protocol edge cases, another time Lastly, I said this post was about wire protocols but then I ended up focusing on the query language (which as I mentioned is separate from the wire protocol). The wire protocol does more than just wrap queries though. It includes more complex interactions like COPY-ing data or asynchronous requests or cancelling running requests. In this post I did not evaluate the level of completeness of each database's implementation of the wire protocol. I only looked into what they'd do with basic queries. More could be done on evaluating the edge cases of the wire protocol in each implementation, in another post. Let me know about your ideas, questions, corrections! Share There's a wide world of databases implementing the Postgres wire protocol. Databases like @Yugabyte @TimescaleDB @cratedb @QuestDb @CockroachDB . But what does that mean? And where does the wire protcol end? All this and more explored in a new blog posthttps://t.co/ N4VnF2paCl pic.twitter.com/ypvsSXVeP5 -- Multiprocess Labs (@multiprocessio) February 8, 2022 Questions? Feedback? Feel free to reach the author at phil@multiprocess.io. Subscribe for more posts like this! DataStation Documentation Blog Community Twitter Github LinkedIn Contact (c) 2021-2022 Multiprocess Labs LLC