It’s been quite a few months since I last wrote about our Open Source PostgreSQL monitoring tool called pgwatch2, but now there’s a reason again – it got a lot better! The changes include multiple new features requested by users, as well as some bugfixes, so here’s a quick overview of what’s new. […]
When setting up a database schema, indexing is always a consideration. While Postgres supports traditional btree indexes for most data types, they can be quite heavy, often requiring a significant percentage of the table's size for index storage.
There are two options for schemas that need to index a large percentage of columns. The first is BRIN indexes, which are small, low-overhead indexes that can be created on many columns. One downside of BRIN indexes is that they must be defined individually on each column that needs one.
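As a minimal sketch (the table and column names here are made up for illustration), a BRIN index is declared per column like this:

```sql
-- Hypothetical table: BRIN works best on columns whose values correlate
-- with the physical row order, e.g. an append-only timestamp.
CREATE TABLE measurements (
    recorded_at timestamptz NOT NULL,
    sensor_id   int NOT NULL,
    reading     numeric
);

-- A BRIN index stores only per-block-range summaries, so it stays tiny
-- even on very large tables; one index must be created per column.
CREATE INDEX measurements_recorded_at_brin
    ON measurements USING brin (recorded_at);
```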
A more generic solution is to place all data in a JSONB column and create a GIN index. The GIN index covers every JSON key and value, and stores each key and value only once in the index. This is a great "index everything" option for administrators who aren't sure of their indexing needs.
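A sketch of the JSONB approach, again with illustrative names: a single GIN index on the document column supports containment queries against any key.

```sql
-- Hypothetical table storing arbitrary attributes as a JSONB document.
CREATE TABLE events (
    id    bigserial PRIMARY KEY,
    attrs jsonb NOT NULL
);

-- One GIN index covers every key and value in the document.
CREATE INDEX events_attrs_gin ON events USING gin (attrs);

-- Containment queries like this one can use the GIN index:
SELECT id FROM events WHERE attrs @> '{"status": "active"}';
```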
A couple of days back a new feature landed in the PostgreSQL world for the development of version 11, in the shape of the following commit:
commit: 6d9fa52645e71711410a66b5349df3be0dd49608
author: Peter Eisentraut <email@example.com>
date: Mon, 11 Sep 2017 16:48:30 -0400
pg_receivewal: Add --endpos option

This is primarily useful for making tests of this utility more
deterministic, to avoid the complexity of starting pg_receivewal as a
deamon in TAP tests.

While this is less useful than the equivalent pg_recvlogical option,
users can as well use it for example to enforce WAL streaming up to a
end-of-backup position, to save only a minimal amount of WAL.

Use this new option to stream WAL data in a deterministic way within a
new set of TAP tests.

Author: Michael Paquier
pg_receivewal is mainly used in production deployments as a standalone daemon that continuously streams WAL from a given Postgres instance, be it a primary or a standby, archiving WAL that is afterwards used to recover backups or for PITR purposes. It is also useful for allowing standbys to catch up with a primary after being disconnected for a long time, when the wanted WAL segments may already have been recycled on the primary after too many checkpoints. Replication slots can help with that, but they show their limits as well if the local pg_wal data gets too bloated, which can lead to a replication slot being forcibly dropped (note that there is a patch proposed for integration with Postgres 11 that allows advancing the oldest WAL position of a physical slot).
This option was first designed for testing purposes: it has increased the code coverage of, for example, pg_receivewal.c from 15% to 55% through a set of dedicated TAP tests, which are much simpler to write because pg_receivewal can now be stopped without signals. But the option is useful for backup purposes as well. When taking a base backup using the low-level APIs pg_start_backup() and pg_stop_backup(), the WAL position (LSN) returned by pg_stop_backup() marks the point in WAL replay up to which the backed-up PostgreSQL instance can be recovered into a consistent state. When archiving is enabled, pg_stop_backup() is careful to wait for the WAL segments needed for recovery to be archived before telling the client that the backup has completed successfully. Note as well that pg_stop_backup() enforces a WAL segment switch for this purpose.
Archiving using archive_mode and archive_command unfortunately complicates failure handling in this area, and can even endanger the PostgreSQL instance itself with a full pg_wal path if the archive command keeps failing for one reason or another. So some users prefer fetching WAL segments the way standbys do, via streaming replication. Failure handling then happens independently of the running primary Postgres instance.
Using pg_receivewal as a daemon, one can be sure that a base backup is safe to use by monitoring the on-disk progress of the WAL segments generated, making sure that the LSN position returned by pg_stop_backup() corresponds to a segment that has been completed (what I am writing here is not completely true either; one could always look at the contents of the current partial segment at record level, but this is way more costly than looking at completed segments). With --endpos, however, it is possible to ask an instance of pg_receivewal to stream exactly up to the position where the backup reaches a consistent state, and no further, saving the fetching of WAL archives that may not be needed.
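A sketch of what such an invocation could look like; the directory, slot name, and LSN value are all illustrative, and in practice the LSN would come from the result of pg_stop_backup():

```shell
# Stream WAL into a local archive and stop exactly at the LSN returned
# by pg_stop_backup(); -n/--no-loop prevents reconnection attempts.
pg_receivewal -D /var/lib/postgresql/wal_archive \
    --slot=backup_slot \
    --endpos=0/5000060 \
    --no-loop
```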
Note that pg_receivewal determines the position it starts streaming from by first looking at the contents of the archive directory it saves data to and taking the last completed segment, and, if it finds nothing, by requesting the current LSN position from PostgreSQL. It also always begins streaming from the beginning of a segment. So be careful when using it; you would likely want to pair it with a replication slot, which in turn requires monitoring of its own, for example with a background worker scanning pg_replication_slots periodically.
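A minimal sketch of the slot setup and the kind of monitoring query meant here (the slot name is an assumption):

```sql
-- Create a physical replication slot for pg_receivewal to use.
SELECT pg_create_physical_replication_slot('backup_slot');

-- Monitoring sketch: how far each slot's restart_lsn lags behind the
-- current WAL insert position, in bytes. A slot that lags too far
-- retains WAL on the primary and can bloat pg_wal.
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
  FROM pg_replication_slots;
```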
The Postgres optimizer has improved dramatically over the years. However, there are still a few areas where it can be improved.
First, it would be nice if users could be informed when an increase of default_statistics_target is wise. Second, while multivariate statistics will be added in Postgres 10, there are still other multivariate optimizations to add.
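To make both points concrete, here is a hedged sketch with made-up table and column names: the first statement raises the per-column statistics target manually (today the user must guess when this is wise), and the second uses the extended statistics added in Postgres 10 to capture a dependency between two columns that the planner would otherwise treat as independent.

```sql
-- Collect finer-grained statistics for a skewed column
-- (the default target is 100).
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

-- Postgres 10 extended statistics: record that city and zip are
-- functionally dependent, improving multi-column selectivity estimates.
CREATE STATISTICS orders_city_zip (dependencies) ON city, zip FROM orders;
ANALYZE orders;
```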
A more sophisticated improvement, which bypasses the problem of inaccurate statistics, would be to have the executor send feedback on selectivity found during query execution to the optimizer to improve the future query plans.