[HN Gopher] Generating JSON Directly from Postgres
___________________________________________________________________
Generating JSON Directly from Postgres
Author : izend
Score : 124 points
Date : 2021-07-15 18:15 UTC (4 hours ago)
(HTM) web link (blog.crunchydata.com)
(TXT) w3m dump (blog.crunchydata.com)
| DoomHotel wrote:
| Postgres (>11) also implements the SQL/JSON path language that
| lets you extract table data from JSON.
| tehlike wrote:
| Once you build V8 into postgres, things get real interesting
| (thin javascript projections, and a bunch of other stuff)..
|
| https://github.com/plv8/plv8
| gigatexal wrote:
| I wish RedShift got these functions. :(
| pjungwir wrote:
| If you are using Rails and ActiveModelSerializers, I wrote a gem
| to take your serializer definition and build the whole JSON
| result in Postgres: [0] It hasn't gotten much use, but I know at
| least one company that has used it to get 100x performance boosts
| on #index endpoints.
|
| It generates jsonapi right now, since that is standard for Ember
| projects, but it'd be pretty easy to add an adapter for some
| other format.
|
| [0] https://github.com/pjungwir/active_model_serializers_pg
| davidwparker wrote:
| How does it compare in terms of performance to
| https://github.com/jsonapi-serializer/jsonapi-serializer (a
| fork of Fast JSON API out of Netflix, which wasn't being
| maintained)?
| jrochkind1 wrote:
| wow, that's really neat, this could solve a problem I have,
| bookmarking!
| blacktriangle wrote:
| Please just no, do not do this. Even the dumbest most simple
| system there is will evolve beyond simple CRUD access to tables.
| The second you put your schema on the wire, that schema can now
| never change. That's just such an insane commitment to make.
| chousuke wrote:
| the database schema is an API like everything else, and you can
| solve API evolution problems the same way as you would with a
| web API, using eg. views, stored procedures and versioning.
|
| There's no reason not to expose parts of your database schema
| directly to users as long as you treat it like you would any
| other API you provide.
| LukeEF wrote:
| this is a really interesting way to think about it and
| something that I've given a fair amount of thought to as we
| embark on a data mesh journey - how does a domain team serve
| up a schema like an API, how does it evolve, and how do we
| think about versions.
| onefuncman wrote:
| that's what a Schema Registry is for... protobuf is a
| little more polished around managing forwards and backwards
| compatibility than avro but they both work.
| chousuke wrote:
| I find that there are very few things related to
| programming that don't constitute an interface in some way,
| so it feels natural to me to think about everything in
| terms of the interface it presents to users, both
| explicitly in what you can express via whatever language
| you're using, and implicitly via conditions that can't be
| enforced but are required for correctness.
|
| And when you have an interface, you really ought to put
| some thought into its design regardless of how "private" it
| is.
|
| An API is an interface used by a program in some way, and
| any user-exposed database objects definitely qualify under
| that definition.
| iammisc wrote:
| > The second you put your schema on the wire, that schema can
| now never change. That's just such an insane commitment to
| make.
|
| If you change your schema, you can just write a sql expression
| to convert between schemas 'at runtime'. Like if you have a
| table with first name and last name and then you decide you
| really just need the name on the frontend, just change your
| SELECT statement to concat the two columns, or vice versa.
| tengbretson wrote:
| The application layer is typically just a view that maps the
| data layer to the expectations of the view layer. If you are
| set on eliminating the application layer surely you could just
| implement that view in the data layer using... a view.
| joevandyk wrote:
| you can use views or functions to easily make changes to the
| schema
| jkingsbery wrote:
| "Your scientists were so preoccupied with whether or not they
| could, they didn't stop to think if they should."
|
| While interesting, the advice offered in this post is generally
| bad, or at least in complete. OK, so, you "cut out the middle
| tier..." what now? Are web clients connecting to PostGRES
| directly? Will PostGRES handle connection authentication and
| request authorization? Logging that a particular user looked at
| particular data? Can the client cache this data? If so, for how
| long? Even taking the author's premise, this is not a good
| pattern - a simple JSON get request still comes with a bunch of
| other stuff that this doesn't bother to address.
|
| But the premise is wrong - few applications just spit out JSON
| like this, they also have to handle updates, and there's business
| logic involved in that. Data needs to be structured in a way
| that's reasonable for clients to consume, which isn't always a
| row of data (maybe it should be collapsing all values in a
| column, so that in the JSON the client just gets a JSON array of
| the values).
| simonw wrote:
| I worked on a Django project recently where we were generating
| GeoJSON from data in a PostgreSQL table.
|
| We investigated using tricks like this to generate the JSON
| directly from a SQL query - my notes here:
| https://til.simonwillison.net/postgresql/constructing-geojso...
|
| The performance was a big enough improvement that, had we been
| bottlenecked on this particular operation, we would have
| shipped this (I'm afraid I don't have notes on benchmarks).
|
| We only decided not to ship it because of concerns about the
| ongoing maintenance overhead, since we had other Python code
| generating GeoJSON that we would have still used - so we would
| have ended up maintaining multiple implementations.
| nicholasjarnold wrote:
| If one wishes to "cut out the middle tier" they might be better
| served by a investigating a project like PostgREST [0]
|
| [0] https://postgrest.org/en/stable/
| herdrick wrote:
| Or PostGraphile: https://www.graphile.org/postgraphile/
| joshgel wrote:
| I've been trying to decide whether to move a Django-DRF API
| to one of these.
| zbjornson wrote:
| Shameless plug: I made bson-to-json for more or less the same
| purpose with MongoDB: converting the raw BSON response from the
| DB directly to JSON strings. Saves lots of CPU cycles and GC
| pressure.
|
| https://github.com/zbjornson/bson-to-json
| rkwasny wrote:
| Don't do this.
|
| It will create various problems when next engineer adds some
| business logic, another one adds some stored procedures....
| Thaxll wrote:
| It's a bad idea and I'd like to see how do people actually do
| test using this model. Now you have to run PG locally everytime
| you need to unit test something? Also PG returns some JSON but if
| the object is not exactly what you want to send back you create
| another object and merge the two? Just don't do it, it's a
| terrible design. It "works" for simple API but for anything
| serious this is wrong.
| gigatexal wrote:
| Docker run postgres... boom testing solved ;)
| Thaxll wrote:
| For DB that you can't run locally how do you do? or the
| CI/CD, there are a millions of example where people don't run
| the DB locally.
| kristiandupont wrote:
| Why can you not run PG locally?
| kerreks wrote:
| It's pretty common to run db locally in docker for
| integration testing these days
| tofuahdude wrote:
| Did you read the article at all before criticizing it as
| "terrible design"? All its talking about are native postgres
| methods for pre-formatting data.
| limaoscarjuliet wrote:
| It is pretty easy to get any result set straight from PostgreSQL
| as json object:
|
| "WITH _ AS (" real query goes here ") SELECT
| COALESCE(array_to_json(array_agg(_.*)), '[]') AS xyz FROM _";
|
| From there it behaves as standard JSON, wich can be easily loaded
| into any language.
| recursivedoubts wrote:
| _> Too often, web tiers are full of boilerplate that does nothing
| except convert a result set into JSON._
|
| Yeah. This is the problem: we've abandoned the hypermedia
| architecture of the web for a dumb-data, RPC model. I suppose if
| you are going to go that direction and things are simple enough,
| you can jam it in the DB and get smoking perf.
|
| But as things get more complicated, where does the business logic
| live? Maybe in the database as stored procedures? It's less crazy
| than it sounds.
|
| An alternative is to cut out the _other_ middle man: JSON, ditch
| heavy front-end javascript and return HTML instead, going back to
| the hypermedia approach. The big problem with that is a step back
| in UX functionality, but there are tools[1][2][3] for addressing
| that.
|
| [1] - https://htmx.org (my library)
|
| [2] - https://unpoly.com
|
| [3] - https://hotwired.dev
| munk-a wrote:
| This is just my experience but I've found that business logic
| usually ends up living _before_ the query - most pages (whether
| read or update) follow a sort of pattern like...
|
| 1. Receive, route and validate (for form) the request
|
| 2. Validate the request for function (business rule validation
| - can such a user do such on a Tuesday and the like)
|
| 3. Compute business attributes around the form of the response
| or update
|
| 4. Execute the query against the database
|
| 5. Send the results to the user
|
| I strongly agree that step 5 there doesn't really need to
| involve very much stuff happening outside of the DB - in our
| particular platform we have a very smooth rowset -> JSON
| translator in the application framework that includes support
| for mapping functions over the rows as they come out of the DB
| - the result is that we pretty much stop executing application
| code as soon as we send a query to the DB. While we do still
| delay the actual JSON encoding until the application layer it's
| thin, dumb and done in a streamed manner so that we don't have
| to hold all that junk in memory - and it comes with mode
| switches to either redirect the big DB blob to an HTML
| template, echo it as a CSV or even run a PDF template to render
| it into what the user wants to see.
| princesse wrote:
| > But as things get more complicated, where does the business
| logic live? Maybe in the database as stored procedures? It's
| less crazy than it sounds.
|
| I've worked with a bunch of systems like this before. One was
| just a collection of PHP scripts that would trigger SQL
| queries. Another one was all lambdas and Cron jobs on top of
| mssql stored procedures.
|
| If you have a decent team and if you use version control as
| intensely as you would with code, I have no reason to believe
| this cannot work. To be fair, those condition are true
| regarding of tech and architecture.
|
| What you cannot do easily though is pivot, hire, and scale.
| This is what bit those teams I worked with and why those
| specific systems are no longer around.
| _jal wrote:
| > One was just a collection of PHP scripts that would trigger
| SQL queries.
|
| Yes, I've had to build a system like that (Java, but same
| diff.)
|
| This model is popular in particular with banks, because they
| can split sensitive responsibilities in a way that makes
| sense to them.
|
| It works fine if you have good communications between
| developers and DBAs. If you don't... well, I won't have to
| suggest finding another gig, you already want to.
| simonw wrote:
| I used to avoid things like stored procedures like the
| plague, because of my fear that crucial business logic would
| end up outside of version control.
|
| These days I'm comfortable enough driving EVERY database
| change through a version controlled migration system (usually
| Django migrations) that I'm not concerned about this any
| more. It's not hard to enforce version controlled migrations
| for this kind of thing, provided you have a good migration
| system in place.
| MuffinFlavored wrote:
| > Maybe in the database as stored procedures?
|
| I always wondered if I could make something like this work:
| https://github.com/plv8/plv8
|
| Maybe couple it with this: https://postgrest.org/
|
| Just not sure if it was worth the effort upfront to learn
| something other than simple Express (node.js)
| servers/middleware functions/router controllers with database
| client access. That paradigm just feels infinitely more "in
| control" and "extensible" to me.
| inopinatus wrote:
| Once the client is the MVC execution environment with the
| client-server interaction used mostly for data replication,
| plus some extra invokable server-side behaviours written in
| isomorphic code, we can congratulate ourselves on having
| more-or-less reinvented Lotus Notes.
| rhacker wrote:
| This all feels backwards for me. I mean I already did YEARS of
| JSF with facelets/richfaces/icefaces/primefaces/JSF2.0
| sureglymop wrote:
| I am doing this at work, I have an application with all the
| business logic in the database. It is a huge hassle though... I
| am versioning a bunch of scripts with git but it really isn't
| convenient.
|
| The thing is that a database already stores data but all the
| schematics and ddl stuff would be much more conveniently kept
| in descriptive plaintext.
| programmarchy wrote:
| Curious how far you've gone with this. You could have your
| CI/CD sync your stored procedures. But you would probably
| want to go further than that and manage them with a tool like
| db-migrate so you can keep your stored procedures tracking
| with your schema changes rather than just versioning them. At
| the end of the day, can you really separate your stored
| procedures from the data they operate on? I would say no you
| can't.
| formerly_proven wrote:
| One thing I particularly dislike about SQL is DDL and all the
| system-specific extensions (e.g. for access control). I don't
| really get why there is a separate _thing_ like "ALTER TABLE
| foo DROP COLUMN bar;" if you could have "DELETE FROM
| schema.foo WHERE name = 'bar';". 90 % of DDL is just
| duplicating basic DML functionality, Except For That Special
| Kind Of Object Which Totally Smells Like It's Relational But
| It's Not. Especially for access control - _why_ is that just
| not a table? I 'm sure it literally is one.
|
| This makes SQL needlessly complex and tools to work with
| schemas far more complicated than they actually have to be.
|
| In Lisp, most _things_ are a list /pair. In SQL, most things
| should have been a table.
| rhacker wrote:
| That would be pretty nice, insert into
| schema.users (type, name, columns) values ('btree_index',
| 'index_dob', 'dob DESC, name ASC')
| inopinatus wrote:
| In many RDBMS this is in fact the case. However, the system
| schema is necessarily close to the database's architecture
| and capabilities, making it implementation specific. DDL is
| also implementation specific but there's a much higher
| degree of abstracted regularity.
| mistersys wrote:
| Okay, yeah we can serve UI over the wire. It's possible to make
| it work. But the experience will never be the same as a a
| native application that's designed to handle offline
| functionality, interactivity and optimistic updates.
|
| The Hey email client is great example, hotwired.dev was built
| for Hey.
|
| Guess what? It kind of sucks. It's buggy and slow. Randomly it
| stops working. When the internet goes down, random things work,
| random things don't work. If it weren't for Hey's unique
| features like the screener, I would much rather use a native
| app.
|
| There's a ton we can do to make the the developer experience of
| rendering on the client side better, but there's only so much
| we can do to make the user experience of serving UI over the
| wire better. When the wire breaks or slows down, the UI
| renderer stops working.
|
| We built an internal tool for our team we call "restless", and
| it lets us write server side code in our project, and import it
| from the client side like a normal functional call, with full
| typescript inference on both ends. It's a dream. No thinking
| about JSON, just return your data from the function and use the
| fully typed result in your code.
|
| We combine that with some tools using React.Suspense for data
| loading and our code looks like classic synchronous PHP
| rendering on the server, but we're rendering on the client so
| we can build much more interactive UIs by default. We don't
| need to worry about designing APIs, GraphQL, etc.
|
| Of course, we still need to make sure that the data we return
| to the client is safe, so we can't use the active record
| approach of fetching all data for a row and sending that to the
| client. We built a SQL client that generates queries like the
| ones in the OP for us. As a result, our endpoints are
| shockingly snappy because we don't need to make round trip
| requests to the db server (solving the n+1 ORM problem)
|
| We write some code like: select(
| Project, "id", "name",
| "expectedCompletion", "client", "status"
| ) .with((project) => { return {
| client: subselect(project.client, "id", "name"),
| teamAssignments: select(ProjectAssignment)
| .with((assignment) => { return {
| user: subselect(assignment.user, "id", "firstName",
| "lastName"), }; })
| .where({ project: project.id,
| }), }; }) .where({
| id: anyOf(params.ids), })
|
| And our tool takes this code and generates a single query to
| fetch exactly the data we need in one shot. These queries are
| easy to write because it's all fully typed as well.
| JohnTHaller wrote:
| > Maybe in the database as stored procedures?
|
| I did this for the very first external web app built for
| Bankers Trust Company back in the day. SQL Server back end with
| (classic) ASP on the front end. Even if someone had gained
| access to the web server, they wouldn't have had access to
| anything they shouldn't have. I built web apps, an online
| store, and a learning/video website using the same tech around
| the same time and they worked without issue for well over a
| decade. So, there is something to be said for this approach,
| though I wouldn't build something like this today.
| zozbot234 wrote:
| You could return linked data via JSON-LD and get the best of
| both worlds - standardization plus smart UX.
| nsriv wrote:
| This has been the appeal behind Phoenix LiveView for me.
| exdsq wrote:
| I worked on a stored-procedure driven finance engine once.
| There was an internal language to build procedures quicker.
| Swear to god I had errors running them through the SQL IDE
| because they were too large. Once had to debug a 1.6m line SQL
| file. These things can really get out of hand if people are
| left unchecked!
| kerreks wrote:
| Does it scale well? I feel like doing all business logic in
| procedures would require company to use one huge database in
| order to intersect various domains
| jansommer wrote:
| The obligatory comment to these posts is always "check out
| PostgREST". We use it for all frontend api's. Combined with Node
| or any other middleware, we add additional code if everything
| can't be done in plpgsql, but requests are usually just passed
| straight through Node.
| ptrwis wrote:
| I did the approach with nesting jsonb_agg queries, it worked for
| me but was a bit hacky and sometimes complex. I hoped for Graph
| Query Language (the one supposed to be part of SQL standard) to
| make it easier to retrieve structred json from flat tables, but
| it doesn't look ready yet.
| xaduha wrote:
| The more things change, the more they stay the same.
|
| If we lived in XHTML 2.0 universe instead of HTML5 universe, then
| that would be the way you do things. Not with SQL and JSON mind
| you, but with XML everywhere.
|
| I had a glimpse into that world and it looked pretty good at the
| time. XML databases + XQuery can produce whatever you want as
| long as it's XML. And XML could be many things. Many horrible
| things too, made by people with arms growing out of places they
| shouldn't.
| tofuahdude wrote:
| A lot of commenters jumping on the "cut out the middle tier"
| title.
|
| The bulk of the content is simply about formatting JSON in the DB
| instead of manually mapping rows in the application layer.
|
| It doesn't say "eliminate your API layer" or "have no application
| logic between client and db" as most are jumping to.
|
| I find the actual methods described as helpful in that I can
| convert to the data structures I ultimately want in one pass
| instead of two.
|
| Doesn't mean I don't have validation or a traditional API layer.
| Just easier to use.
| felixge wrote:
| As with everything, it depends.
|
| I've implemented a large system managing billions of records
| using exactly this approach of cutting out a lot of boilerplate
| in the application layer.
|
| The most important thing is to be pragmatic. This approach works
| great for CRUD as well as many types of complex analytical
| queries. However, for cases where the the complexity or
| performance of the SQL was unacceptable, we also decoded data
| into application data structures for further processing.
|
| When done well, you can get the best of both worlds as needed.
| maxk42 wrote:
| It is a really, really bad idea to put business logic in your
| database and there are people in this thread downvoting everyone
| who points it out. Please do not do this in production - I've had
| to maintain systems that married business logic and database and
| it was a nightmare to maintain and cost a fortune to fix.
| tshaddox wrote:
| What do you mean? Isn't your database precisely where you ought
| to put your business logic?
| throw1234651234 wrote:
| Is this a joke? SPs and SQL Functions are far less
| maintainable than .NET/Java/Node/Python.
| mdavidn wrote:
| That's perhaps true of stored procedures, but SQL views,
| constraints, and triggers written in a declarative manner
| are far more maintainable and far less buggy than
| imperative application code, in my experience.
| ako wrote:
| Why are they less maintainable? It's just code you can
| write in a code editor, version control in a versioning
| system, have automatic tests for, have automatic deployment
| scripts for, etc. In the end the database is just an
| application server executing your code close to the data.
| Only thing you have to ask yourself is how much your code
| needs to scale. But it can be much more efficient to run
| business logic in the database as stored procedures.
| Thaxll wrote:
| "But it can be much more efficient to run business logic
| in the database as stored procedures."
|
| But this is not code, this SQL or equivalent, if SQL was
| known to be better than code to do busines logic everyone
| would do that.
| mousepilot wrote:
| There's all sorts of programming styles in use and
| calling stored procedures from dot net programs for
| instance is at least a thing.
|
| Plus, lots of those ORM things get plenty of hate from
| DBAs, I mean I like them but then again I'm not a very
| good programmer.
| ako wrote:
| How is this not code: https://oracle.su/docs/11g/appdev.1
| 12/e10472/packages.htm#in... ?
|
| What is your definition of code?
| mdavidn wrote:
| Using a migration utility like ActiveRecord, Flyway, or
| go-pg, SQL definitions can also live in version control,
| have automatic tests, and deploy automatically. Who has a
| database in production without these things?
| munk-a wrote:
| I personally have found it's much more difficult to
| partially mock out concepts when writing logic in the DB
| layer though my tooling may have been the weakpoint. In a
| nice application language I can easily isolate specific
| units of code for direct testing - while every test I
| write over complex database logic feels like composing a
| symphony where I need to make sure the horns are coming
| in on the right pitch and the wind section isn't too
| early to properly simulate a relatively easy error
| scenario.
|
| Since the application layer is much more naturally
| aligned with small well isolated chunks of logic it is
| easier to minimize the volume of code and logic in scope
| when testing a particular attribute - when that goes into
| the DB I've always seen things get more complex rapidly.
| [deleted]
| pbreit wrote:
| Wut? You put data in a database. Logic goes in the app.
| tyingq wrote:
| In some cases, things like constraints and the model itself
| represent business logic.
| jayd16 wrote:
| Like everything, it depends. Some business logic like "this
| field should never be negative" is probably fine.
|
| More complex logic is a lot harder to maintain. You don't
| want to stress your DB's CPU if you can avoid it. You still
| need some kind of connection pooler anyway. There's a lot of
| reasons it doesn't make sense to put all your business logic
| in the DB. SQL is not the easiest language to maintain (not
| enough abstraction).
|
| IMO the line to draw is just enough to keep data sanity.
| jasonpeacock wrote:
| Business logic goes into the code, data goes into the
| database.
|
| Sometimes business logic is implemented as rules, in which
| case the rules (configuration) can go into either
| configuration files or a database. But that doesn't make it
| data...
| fiddlerwoaroof wrote:
| Data is just crystallized business logic. Anemic databases
| aren't a big deal when the database is just a persistence
| layer for a single app but, when you're rewriting the
| application code or allowing BI queries directly against
| the database, designing a database as a service whose
| interface is stored procedures starts making data integrity
| much easier.
| ahallock wrote:
| I think this is slightly reductive. The majority of your
| business logic, I agree, should reside in application code,
| but lightweight business logic in the form of database
| constraints are powerful at disallowing bad data. For
| example, you may have a rental booking application that
| shouldn't have overlapping date ranges. The database can
| give you this guarantee out of the box and will work for
| ALL clients and applications.
| oauea wrote:
| Thanks for sharing now wisdom, now if you could also explain
| _why_?
| mistersys wrote:
| This isn't really business logic... unless you're in the
| business of simply converting data formats.
|
| As a data fetching approach and especially as tooling primitive
| to eliminate the n+1 ORM problem this is a terrific tool to be
| aware of.
| jrochkind1 wrote:
| how old were the systems? And have you ever had to maintain
| systems that were that many years old that _weren 't_ a
| nightmare to maintain and cost a fortune to fix?
| tofuahdude wrote:
| The posted article doesn't advocate for putting business logic
| in the DB.
|
| It only describes postgres native functions for formatting
| data.
| green7ea wrote:
| I made a side project to experiment with this approach (and rust)
| a few years ago. It worked pretty well but it made simple queries
| more complex[1]. I was a little bit worried that complex queries
| would become a lot more complex so I'm not sure how well this
| approach would scale.
|
| It removed a crazy amount of boilerplate code [2] though and made
| the project much more focused on the actual data which is always
| a good thing.
|
| Ever since, I've been very curious to try edgedb [3] since it
| promises to do similar things without making the queries more
| complex.
|
| [1]https://github.com/green7ea/newsy/blob/master/src/feed_overv..
| .
|
| [2]https://github.com/green7ea/newsy/blob/master/src/main.rs#L4..
| .
|
| [3]https://www.edgedb.com/
| jexp wrote:
| I'm not impressed. Same in Neo4j's
|
| MATCH (e:Employee) RETURN e { .* } as employee;
|
| MATCH (e:Employee) RETURN collect(e { .* }) as employees;
|
| MATCH (e:Employee)-[:IN]->(d:Department) WITH d, collect(e {.*})
| as employees
___________________________________________________________________
(page generated 2021-07-15 23:00 UTC)