[HN Gopher] Cleaning Up Your Postgres Database
___________________________________________________________________
Cleaning Up Your Postgres Database
Author : plaur782
Score : 110 points
Date : 2021-03-06 13:06 UTC (9 hours ago)
(HTM) web link (blog.crunchydata.com)
(TXT) w3m dump (blog.crunchydata.com)
| ahachete wrote:
| Postgres checkup (https://postgres.ai/products/postgres-checkup)
| produces very detailed and complete reports easily. This is my
| recommendation.
| koeng wrote:
| "if you have a table called messages, logs, events in your
| Postgres database there is a good chance it's consuming the bulk
| of your storage, is your largest table, and could benefit from
| being moved outside of your main Postgres database. "
|
| Any recommendations on a good logging database? I like keeping
| full logs of what the application does (important for
| traceability in my case) and love how much effort has gone into
| postgres backups and querying and such.
| scottshamus wrote:
| There are lots of good solutions out there, but if you really
| like using Postgres, you could move your logs to a secondary
| Postgres DB or Postgres server so it doesn't affect your
| application DB's performance. I've scaled this to 10s of
| millions of log messages without much issue.
| OldHand2018 wrote:
| I like this approach and have successfully used both table
| partitioning [1] and (if necessary) foreign data wrappers [2]
| to keep my main database small and fast. There is a pretty
| old blog post that clued me into it [3]. It works really
| nicely, allowing for such things as old data stored on
| spinning disks and newer data on NVMe drives. You can query
| the main table directly with query parameters that limit
| which partitions you hit, or you can query a partition table
| if you know that's where all the data resides. It's pretty
| awesome!
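|
| For illustration, a rough sketch of that tiered setup might
| look like this (table and tablespace names are made up):
|
|     -- tablespaces pointing at different storage tiers
|     CREATE TABLESPACE fast_nvme LOCATION '/mnt/nvme/pgdata';
|     CREATE TABLESPACE slow_hdd  LOCATION '/mnt/hdd/pgdata';
|
|     -- range-partitioned log table
|     CREATE TABLE app_log (
|         logged_at timestamptz NOT NULL,
|         level     text        NOT NULL,
|         message   text        NOT NULL
|     ) PARTITION BY RANGE (logged_at);
|
|     -- recent data on NVMe, older data on spinning disks
|     CREATE TABLE app_log_2021_03 PARTITION OF app_log
|         FOR VALUES FROM ('2021-03-01') TO ('2021-04-01')
|         TABLESPACE fast_nvme;
|     CREATE TABLE app_log_2021_02 PARTITION OF app_log
|         FOR VALUES FROM ('2021-02-01') TO ('2021-03-01')
|         TABLESPACE slow_hdd;
|
|     -- a query that filters on logged_at only touches the
|     -- matching partitions (partition pruning)
|     SELECT count(*) FROM app_log
|     WHERE logged_at >= '2021-03-01'
|       AND logged_at <  '2021-03-02';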
|
| [1] https://www.postgresql.org/docs/current/ddl-
| partitioning.htm...
|
| [2] https://www.postgresql.org/docs/current/ddl-foreign-
| data.htm...
|
| [3] https://pgdash.io/blog/postgres-11-sharding.html
| dzikimarian wrote:
| We're using the ELK stack + Grafana. There's some learning
| curve and it is a bit more taxing on the hardware, but it can
| easily collect logs from multiple applications, search them,
| and aggregate them in various ways.
| lr4444lr wrote:
| Unless you need joins, type constraints, and referential
| integrity, the ELK stack is built for superb query-ability.
| DasIch wrote:
| As you can probably tolerate reads being a bit slow for logs,
| S3 is a good option. It's cheap, trivial to operate and you can
| query it easily with Athena.
| zlynx wrote:
| Postgres has some features that are excellent for logging.
|
| For example, instead of inserting everything into a log table
| and deleting old entries and needing to vacuum things, use a
| partitioned table and just DROP entire partitions when they age
| out. You can summarize them into some kind of summary table
| before that, of course.
|
| The partitions are good for the summaries too. Instead of doing
| index scans over a range of dates across a terabyte of logs,
| you do a full table scan on a partition covering a single day
| or hour.
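|
| A rough sketch of that aging-out step (partition and summary
| table names are hypothetical):
|
|     -- roll the old partition up into a summary table first
|     INSERT INTO app_log_daily_summary (day, level, entries)
|     SELECT date_trunc('day', logged_at), level, count(*)
|     FROM app_log_2021_02_01
|     GROUP BY 1, 2;
|
|     -- then drop the whole partition: no DELETE, no VACUUM
|     ALTER TABLE app_log DETACH PARTITION app_log_2021_02_01;
|     DROP TABLE app_log_2021_02_01;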
|
| You can use a separate database for logging, but the one time
| we did that it got awkward. We ended up writing reporting
| scripts that first copied tables from the active DB to the
| logging DB so we could JOIN them against some log columns. The
| tables weren't huge, but if everything had been in the same DB
| that copying wouldn't have been needed.
|
| All of the scripts and procedures I know of were custom written
| for this. I don't know of any good prepackaged Postgres logging
| configs, but they may be out there.
| stickfigure wrote:
| This is a good use case for BigQuery. It's easy to run fairly
| sophisticated queries across enormous datasets, and log data
| fits BigQuery's model of write-once, time-indexed information.
| gunnarmorling wrote:
| You could keep using Postgres, but in a special way for such
| log tables: insert the log records and delete them right away.
| Then
| use a change data capture tool like Debezium to capture changes
| from the transaction log and stream them to Kafka, or any other
| destination.
|
| Your log table will always be empty (so not taking up disk
| space), but the CDC tool will be able to get all the
| information from the TX logs (which themselves can be discarded
| after all replication slots have advanced past a given offset).
| The
| advantage is that your log entries are transactionally written
| with the actual application data itself.
|
| We use this technique for instance in the Debezium Quarkus
| extension for implementing the outbox pattern [1]. Postgres
| even allows you to insert records directly into the TX log via
| pg_logical_emit_message() (so no table is involved at all), but
| we don't support this in Debezium yet.
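|
| As a rough sketch of that insert-and-delete idea (the customers
| and audit_log tables are made up), in the same transaction as
| the business write:
|
|     BEGIN;
|     UPDATE customers SET email = 'new@example.com'
|         WHERE id = 42;
|     -- the INSERT and DELETE cancel out on disk, but both
|     -- still show up in the WAL for a CDC tool to pick up
|     INSERT INTO audit_log (event, payload)
|         VALUES ('customer_updated', '{"id": 42}');
|     DELETE FROM audit_log;
|     COMMIT;
|
| Or, writing straight to the transaction log with no table at
| all:
|
|     SELECT pg_logical_emit_message(
|         true, 'audit',
|         '{"event": "customer_updated", "id": 42}');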
|
| Disclaimer: I work on Debezium.
|
| [1]
| https://debezium.io/documentation/reference/integrations/out...
| jaytaylor wrote:
| This looks interesting, looking forward to reading up on it.
| Would be really cool if it worked with SQLite ;)
|
| Any chance JDK11 support will come for the Debezium Java core
| [1]?
|
| [1] https://github.com/debezium/debezium
| gunnarmorling wrote:
| Funny you should ask, we were just talking about raising
| the minimum baseline to JDK 11 the other day (currently it
| still is compatible with Java 1.8). That said, you
| definitely can use Debezium with Java 11 already; so
| perhaps you can detail a bit what exactly you mean by
| JDK 11 support here?
|
| Re SQLite, is there any CDC interface which could be used
| to implement a connector?
| jaytaylor wrote:
| Got you, that sounds great!
|
| When the README listed Java 1.8 as a requirement, I
| assumed (perhaps incorrectly) that it implied it
| works with _only_ Java 1.8.
|
| At my work, some critical libraries only work with Java 8
| or 11, so the transition from 8 was challenging on the
| dependency versioning front. It's a gigantic project and
| codebase, though.
|
| Re: SQLite, not that I know of, but I don't have deep
| knowledge on the matter. HN might go crazy (in a good
| way!) if there were a way to pull this off, though!
| craigkerstiens wrote:
| As another commenter mentioned, there are a lot of options out
| there, probably too many to list. Postgres can totally work,
| but the option of using a separate database is a good one. We
| actually got a few questions on the original post about
| auditing changes, so while not directly "logs", it falls at
| least partially in that category. A colleague wrote a follow-on
| post on how you can use Debezium with Postgres for change data
| capture to send all your changes to Kafka and then to some
| other downstream system.
| azophy_2 wrote:
| There was an HN thread a couple of days ago discussing
| ClickHouse. Haven't tried it myself, but it may be useful.
|
| https://news.ycombinator.com/item?id=26316401
| mpitt wrote:
| One of the newest kids on the block is Loki (by Grafana) [1]
|
| We're starting to explore it in production and it's looking
| very promising. Much, much simpler (especially to integrate)
| than ELK.
|
| [1] https://grafana.com/oss/loki/
| cldellow wrote:
| Not sure if you're only looking for self-hosted things, but
| I've been trying out Apex Logs by TJ Holowaychuk:
| https://apex.sh/logs/
|
| So far, I like it. You get a minimalist UI for doing
| interactive queries, and your data is ultimately persisted in
| your own Google BigQuery datasets so you can write arbitrary
| SQL against them.
|
| I have yet to run significant volume through it, so perhaps my
| view will change if it performs worse at scale.
| St-Clock wrote:
| One strategy we are currently implementing is using a second
| Postgres database synchronized with logical replication.
|
| 1. We insert the logs in the main database and periodically
| delete old logs. We can thus write the log entry in the same
| transaction in which a record is created/updated/deleted. Yay
| atomicity!
|
| 2. The second database uses logical replication to receive new
| logs, but it only replicates INSERT operations so old logs are
| never deleted.
|
| 3. The application can show recent logs quickly (who updated
| the customer profile this week?), and if the user wants to dig
| deeper, we can query the slower log database. Users kind of
| expect that pulling the entire history of a record will take a
| bit longer than other operations.
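|
| A minimal sketch of the replication setup in 2. (publication,
| subscription, and table names are made up):
|
|     -- on the main database: publish only INSERTs on the log
|     -- table, so downstream never sees our periodic deletes
|     CREATE PUBLICATION logs_pub FOR TABLE app_log
|         WITH (publish = 'insert');
|
|     -- on the log-archive database (same table must exist):
|     CREATE SUBSCRIPTION logs_sub
|         CONNECTION 'host=main-db dbname=app user=replicator'
|         PUBLICATION logs_pub;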
___________________________________________________________________
(page generated 2021-03-06 23:02 UTC)