Ozgun Erdogan: Three Approaches to PostgreSQL Replication and Backup

The Citus distributed database scales out PostgreSQL through sharding, replication, and query parallelization. For replication, our database as a service (by default) leverages the streaming replication logic built into Postgres.

When we talk to Citus users, we often hear questions about setting up Postgres high availability (HA) clusters and managing backups. How do you handle replication and machine failures? What challenges do you run into when setting up Postgres HA?

The PostgreSQL database follows a straightforward replication model. In this model, all writes go to a primary node. The primary node then locally applies those changes and propagates them to secondary nodes.

In the context of Postgres, the built-in replication (known as “streaming replication”) comes with several challenges:

  • Postgres replication doesn’t come with built-in monitoring and failover. When the primary node fails, you need to promote a secondary to be the new primary. This promotion needs to happen in a way where clients write to only one primary node, and they don’t observe data inconsistencies.
  • Many Postgres clients (written in different programming languages) talk to a single endpoint. When the primary node fails, these clients will keep retrying the same IP or DNS name. This makes failover visible to the application.
  • Postgres replicates its entire state. When you need to construct a new secondary node, the secondary needs to replay the entire history of state change from the primary node. This process is resource intensive, and makes it expensive to shoot down nodes and bring up new ones.

The first two challenges are well understood. Since the last challenge isn’t as widely recognized, we’ll examine it in this blog post.

Three approaches to replication in PostgreSQL

Most people think that when you have a primary and secondary architecture, there’s only one way to set up replication and backups. In practice, Postgres deployments follow one of three approaches.

  1. PostgreSQL streaming replication to replicate data from primary to secondary node. Back up to S3 / Blob storage.
  2. Volume level replication to replicate at the storage layer from primary to secondary node. Back up to S3 / Blob storage.
  3. Take incremental backups from the primary node to S3. Reconstruct a new secondary node from S3. When secondary is close enough to primary, start streaming from primary.

There’s also an easy way to identify which approach you’re using. Let’s say you added a new secondary node. How do you reconstruct the new secondary node’s state?

Approach 1: Streaming replication in PostgreSQL (with local storage)

Figure 1 - Streaming replication in Postgres, with local storage

This first approach is the most common one. You have a primary node. The primary node has the tables’ data and write-ahead logs (WAL). (When you modify a row in Postgres, the change first gets committed to an append-only redo log. This redo log is known as a write-ahead log, or WAL.) This Postgres WAL log then gets streamed over to a secondary node.

In this first approach, when you build a new secondary node, the new secondary needs to replay the entire state from the primary node—from the beginning of time. The replay operation may then introduce a significant load on the primary node. This load becomes more important if your database’s primary node serves live traffic.

In this approach, you can use local disks or attach persistent volumes to your instances. In the diagram above, we’re using local disks because that’s the more typical setup.
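
For reference, here is a minimal sketch of the primary-side settings this approach involves (the values and the slot name replica1 are illustrative; the secondary is typically seeded with pg_basebackup and then connects to the primary via primary_conninfo):

-- On the primary: allow WAL streaming to standbys.
-- wal_level and max_wal_senders require a server restart to take effect.
ALTER SYSTEM SET wal_level = replica;
ALTER SYSTEM SET max_wal_senders = 5;

-- Optional: a physical replication slot, so the primary retains any WAL
-- the standby has not yet received.
SELECT pg_create_physical_replication_slot('replica1');

-- Once the standby is connected, monitor it from the primary.
SELECT client_addr, state, sent_lsn, replay_lsn
FROM pg_stat_replication;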

Approach 2: Replicated Block Device

Figure 2 - Replicated Block Device

The second approach relies on disk mirroring (sometimes called volume replication). In this approach, changes get written to a persistent volume. This volume then gets synchronously mirrored to another volume. The nice thing about this approach is that it works for all relational databases. You can use it for MySQL, PostgreSQL, or SQL Server.

However, the disk mirroring approach to replication in Postgres also requires that you replicate both table and WAL log data. Further, each write to the database now needs to synchronously go over the network. You can’t miss a single byte because that could leave your database in a corrupt state.

Approach 3: Reconstruct from WAL (and switch to streaming replication)

Figure 3 - Reconstruct from WAL

The third approach turns the replication and disaster recovery process inside out. You write to the primary node. The primary node does a full database backup every day, and incremental backups every 60 seconds.

When you need to construct a new secondary node, the secondary reconstructs its entire state from backups. This way, you don’t introduce any load on the primary database. You can bring up new secondary nodes and reconstruct them from S3 / Blob storage. When the secondary node is close enough to the primary, you can start streaming WAL logs from the primary and catch up with it. In normal state, the secondary node follows the primary node.

In this approach, write-ahead logs are first class citizens. This design lends itself to a more cloud-native architecture. You can bring up or shoot down replicas at will without impacting your relational database’s performance. You can also use synchronous or asynchronous replication depending on your requirements.
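
As a sketch of the archiving side of this approach, assuming a WAL archiving tool such as wal-g or WAL-E is configured for your S3 bucket (the exact commands are illustrative):

-- On the primary: continuously archive completed WAL segments to S3 / Blob storage.
-- archive_mode requires a restart; archive_command can be changed with a reload.
ALTER SYSTEM SET archive_mode = on;
ALTER SYSTEM SET archive_command = 'wal-g wal-push %p';

-- A new secondary is reconstructed from the latest base backup in S3 plus the
-- archived WAL (e.g. restore_command = 'wal-g wal-fetch %f %p'), and only
-- switches to streaming from the primary once it has nearly caught up.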

How do these different approaches to Postgres replication compare?

Here’s a simple table that compares these approaches to each other. For each approach, you can think of its benefits as drawbacks for the other approaches.

Type of Postgres replication                               | Who does this?       | Primary benefits
-----------------------------------------------------------+----------------------+----------------------------------------------------------------------
Simple streaming replication (local disk)                  | On-prem, manual EC2  | Simpler to set up; high I/O performance and large storage
Replicated block device                                    | RDS, Azure Postgres  | Works for MySQL and PostgreSQL; data durability in cloud environments
Reconstruct from WAL (and switch to streaming replication) | Heroku, Citus Cloud  | Node reconstruction in background; enables fork and PITR

Simple streaming replication is the most common approach. Most on-prem deployments follow this approach. It’s easy to set up. Further, when you set it up using local disks, you can store 10s of TBs of data.

Comparatively, the disk mirroring approach abstracts away the storage layer from the database. In this approach, when you lose an instance, you don’t lose your ephemeral disk with it. This approach also works across database technologies, for example with MySQL and Postgres.

In the third method, when you have a new machine, you reconstruct that machine’s state from WAL logs. Since you’re treating your WAL logs as a first class citizen, certain features become trivial. For example, let’s say you wanted to performance test your application against production data, but not against the production database. In the third method, you can “fork” your database from a particular point in time in the WAL logs without impacting production, and test your app against the forked database.
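
As a small illustration of this WAL-first thinking, you can label the current WAL position on production, and a forked instance restored from the archive can recover up to that label without touching the primary (a sketch; the label name is illustrative):

-- On the production primary: mark a named position in the WAL stream.
SELECT pg_create_restore_point('before_perf_test');

-- A separate instance restored from the base backup + archived WAL can then
-- stop at this label (recovery_target_name = 'before_perf_test'), giving you
-- a fork of production as of that point.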

Which PostgreSQL replication method is more “cloud-native”?

PostgreSQL deployments follow one of three common replication approaches. As with so many things, each replication method has its pros and cons.

The third approach reconstructs a new secondary node by replaying write-ahead logs (WAL) from blob storage such as S3. As a result, reconstructing a new replica doesn’t introduce any additional load on the primary node. This enables a high-availability (HA) solution where you can easily bring up or shoot down database nodes—a property that’s beneficial in cloud-native environments.


Simon Riggs: PostgreSQL Maximum Table Size

Various limits on the PostgreSQL database are listed here: https://www.postgresql.org/about/

One of those limits is the Maximum Table Size, listed as 32TB. It’s been that way for many years now.

The only problem is that it has always been slightly wrong. And now it’s got much, much bigger.

The table size is limited by the maximum number of blocks in a table, which is 2^32 blocks. The default block size is 8192 bytes, hence the default limit was 32TB as listed. That was wrong in two ways: PostgreSQL has always had a configurable block size, which allows up to 32768 bytes per block and would give a maximum size of 128TB. The only problem is that changing the block size requires an unload/reload, so the effective limit per table was 32TB as advertised.
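
For instance, a running cluster reports the block size it was compiled with (8192 unless PostgreSQL was built with a non-default --with-blocksize):

postgres=# SHOW block_size;
 block_size
------------
 8192
(1 row)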

PostgreSQL has always supported Table Inheritance, which has been used in some cases to implement something similar to Table Partitioning in other databases. The big blocker there was that this wasn’t handled well in the Optimizer, so it wasn’t easily usable. PostgreSQL’s initial attempt at that was by myself in PostgreSQL 8.1 in 2005, where we introduced constraint_exclusion, though by my own admission that needed more major work. Various tweaks helped, but didn’t change the game significantly. Major work came in the form of two failed attempts to add Partitioning, the first one using Rules and the second one using Triggers, neither of which was very practical. Luckily the next attempt was some years in the planning and started from a good design before it was written, leading to a successful implementation of Declarative Partitioning in PostgreSQL 10. There is still work to be done and I’m pleased to say it looks like many of the issues will be solved in PostgreSQL 11, with much work contributed by a great many folk from the big names in PostgreSQL new feature development: 2ndQuadrant, EnterpriseDB, NTT Data (listed alphabetically).
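
As context, a minimal sketch (table and partition names are illustrative) of what Declarative Partitioning looks like in PostgreSQL 10:

CREATE TABLE measurement (
    logdate  date NOT NULL,
    reading  numeric
) PARTITION BY RANGE (logdate);

-- Each partition is a table in its own right; the planner excludes
-- partitions that cannot match a query's WHERE clause.
CREATE TABLE measurement_y2018 PARTITION OF measurement
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE measurement_y2019 PARTITION OF measurement
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');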

Oh yeah, maximum table size. Partitioning theoretically allows you to have one big table made up of many smaller tables. We store that counter in a 32-bit field, so we ought to be able to store lots of data.

My colleague David Rowley found a bug that has existed for >22 years in PostgreSQL that accidentally limited the number of tables to a 16-bit value, limiting us in PostgreSQL 10 to only 65535 subtables. Fixed in PostgreSQL 11, so now we can go to the full 2^32 subtables, each of size 2^32 * 8192 bytes. Wahoo!

So the maximum size of tables in PostgreSQL is

  1. 32 Terabytes (32TB) in PostgreSQL 9.6 or earlier
  2. 2 Exabytes (2EB) in PostgreSQL 10 – 2048PB (Petabytes) or 2 million TB
  3. 131,072 Exabytes (2^77 bytes) in PostgreSQL 11 – math brain explosion…

Hope that’s big enough for you all?


Aleksandr Parfenov: PostgreSQL Flexible Full Text Search

PostgreSQL is a powerful RDBMS that can not only store data but also manipulate it, for example by searching it or computing analytical information, and one of these techniques is FTS. Full Text Search (or FTS for short) was first introduced in PostgreSQL as the tsearch extension and was added to PostgreSQL 8.3 as a core feature. Unfortunately, FTS has changed very little since PostgreSQL 8.3 and has some issues that were identified over years of working with it. I’m working on a patch to extend the capabilities and flexibility of the FTS configuration mechanism in PostgreSQL and want to describe it in more detail in this post.


Hans-Juergen Schoenig: PostgreSQL performance: How the optimizer handles views

“How does the PostgreSQL optimizer handle views?” or “Are views good or bad?” I assume that every database consultant and every SQL performance expert has already heard this kind of question. Most likely this does not only hold true for PostgreSQL experts but also for Oracle, DB2 and MS SQL consultants. Given the fact that views are a really essential feature of SQL, it makes sense to take a closer look at the topic in general and hopefully help some people write better and faster code.

PostgreSQL: What it does to views

Let us create a simple table containing just 10 rows, which can be used throughout the blog to show how PostgreSQL works and how the optimizer treats things:

test=# CREATE TABLE data AS
	SELECT 	*
	FROM 	generate_series(1, 10) AS id;
SELECT 10

Then I have created a very simple view:

test=# CREATE VIEW v1 AS
	SELECT 	*
	FROM 	data
	WHERE 	id < 4;
CREATE VIEW

The idea here is simply to filter some data and return all the columns.

Inlining and flattening

The key thing is: The optimizer will process the view just like a “preprocessor” directive. It will try to inline the code and flatten it. Here is an example:

test=# explain SELECT * FROM v1;
                      QUERY PLAN                       
-------------------------------------------------------
 Seq Scan on data  (cost=0.00..41.88 rows=850 width=4)
   Filter: (id < 4)
(2 rows)

When we try to read from the view it is just like running the SQL statement directly. The optimizer will perform the following steps:

SELECT	*
FROM	(SELECT 	*
		FROM 	data
		WHERE 	id < 4
	) AS v1;

In the next step the subselect will be flattened out completely, which leaves us with:

SELECT * FROM data WHERE id < 4;

We could have done this transformation ourselves but it is easier to make the optimizer do it for us and enjoy the convenience offered by the view.

Joining views

Basically the same mechanism applies, when you are working with joins. PostgreSQL will again inline, flatten and then create a plan to join those tables together. The output is therefore not surprising:

test=# explain SELECT 	*
	FROM 	v1 AS a, v1 AS b
	WHERE 	a.id = 1
			AND a.id = b.id;
                               QUERY PLAN                               
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..96.71 rows=16 width=8)
   ->  Seq Scan on data  (cost=0.00..48.25 rows=4 width=4)
         Filter: ((id < 4) AND (id = 1))
   ->  Materialize  (cost=0.00..48.27 rows=4 width=4)
         ->  Seq Scan on data data_1  (cost=0.00..48.25 rows=4 width=4)
               Filter: ((id < 4) AND (id = 1))
(6 rows)

Again PostgreSQL provides us with a simple plan.

Destructive behavior

However, views are not only good. Sometimes they can be a little destructive too. Consider the following example:

test=# CREATE VIEW v2 AS
	SELECT 	*
	FROM 	data
	WHERE 	id < 4
	ORDER BY 1;
CREATE VIEW

In this case the view provides us with some ordered data. Let us see what happens in the following query:

test=# explain SELECT * FROM v2 ORDER BY 1 DESC;
                            QUERY PLAN                             
-------------------------------------------------------------------
 Sort  (cost=135.22..137.34 rows=850 width=4)
   Sort Key: data.id DESC
   ->  Sort  (cost=83.23..85.36 rows=850 width=4)
         Sort Key: data.id
         ->  Seq Scan on data  (cost=0.00..41.88 rows=850 width=4)
               Filter: (id < 4)
(6 rows)

As you can see PostgreSQL will sort the data twice, which is definitely not good for performance. PostgreSQL won’t optimize this for you, so you have to take care of it yourself and be very aware of the issue.

In general it is a good idea to make a mental distinction between two types of views:

• Final results
• Intermediate results

If a view already has the “final” results you can do sorting and all kinds of stuff. However, if you are using a view to perform some large calculations on top of it, it might be a good idea to avoid formatting, sorting and the like.
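
As a sketch of this guideline: if you need a different ordering, sort in the final query over the unsorted view (v1 above) rather than layering an ORDER BY on top of v2, so the data is sorted only once:

EXPLAIN SELECT * FROM v1 ORDER BY 1 DESC;
-- The plan should contain a single Sort node on top of the sequential scan,
-- instead of the double sort we saw with v2.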

However, there is more to views. Consider the following example:

test=# CREATE VIEW v3 AS
	SELECT 	*
	FROM 	data
	WHERE 	id < 4
	LIMIT 100000000000;
CREATE VIEW

In this case the view contains a LIMIT clause, which basically says that “all” the data will be returned. However, during those formal transformations the optimizer cannot “rely” on the fact that all data will be included in the final result. Therefore the optimizer cannot silently skip the LIMIT clause, which of course leads to the following plan:

test=# explain SELECT * FROM v3 ORDER BY 1 DESC;
                            QUERY PLAN                             
-------------------------------------------------------------------
 Sort  (cost=91.73..93.86 rows=850 width=4)
   Sort Key: data.id DESC
   ->  Limit  (cost=0.00..41.88 rows=850 width=4)
         ->  Seq Scan on data  (cost=0.00..41.88 rows=850 width=4)
               Filter: (id < 4)
(5 rows)

Conclusion

As always, the world is neither only black nor white. Views are a valuable feature and when used properly there is nothing wrong with them. However, keep in mind what is going on behind the scenes and avoid building views on top of views on top of views and so on.

The post PostgreSQL performance: How the optimizer handles views appeared first on Cybertec.


Jean-Jerome Schmidt: An Overview of Database Indexing in PostgreSQL

Database indexing is the use of special data structures that aim to improve performance by providing direct access to data pages. A database index works like the index section of a printed book: by looking in the index section, it is much faster to identify the page(s) that contain the term we are interested in, locate those pages, and access them directly, instead of scanning the pages of the book sequentially until we find the term we are looking for.

Indexes are an essential tool in the hands of a DBA. Using indexes can provide great performance gains for a variety of data domains. PostgreSQL is known for its great extensibility and the rich collection of both core and 3rd party addons, and indexing is no exception to this rule. PostgreSQL indexes cover a rich spectrum of cases, from the simplest b-tree indexes on scalar types to geospatial GiST indexes to fts or json or array GIN indexes.

Indexes, however, as wonderful as they seem (and actually are!) don’t come for free. There is a certain penalty that goes with writes on an indexed table. So the DBA, before examining her options to create a specific index, should first make sure that the said index makes sense in the first place, meaning that the gains from its creation will outweigh the performance loss on writes.

PostgreSQL basic index terminology

Before describing the types of indexes in PostgreSQL and their use, let’s take a look at some terminology that any DBA will come across sooner or later when reading the docs.

  • Index Access Method (also called the Access Method): the index type (B-tree, GiST, GIN, etc.)
  • Type: the data type of the indexed column
  • Operator: a function between two data types
  • Operator Family: a cross-data-type grouping of operators with similar behaviour
  • Operator Class (also referred to as the index strategy): defines the operators to be used by the index for a column

In PostgreSQL’s system catalog, access methods are stored in pg_am, operator classes in pg_opclass, and operator families in pg_opfamily. The original post shows the dependencies of the above in a diagram (not reproduced here).
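
A quick, illustrative query against these catalogs shows the same relationships, i.e. which operator classes and operator families belong to which access method:

SELECT am.amname   AS access_method,
       opf.opfname AS operator_family,
       opc.opcname AS operator_class,
       format_type(opc.opcintype, NULL) AS indexed_type
FROM pg_opclass opc
JOIN pg_am am        ON am.oid  = opc.opcmethod
JOIN pg_opfamily opf ON opf.oid = opc.opcfamily
ORDER BY am.amname, opf.opfname, opc.opcname;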

Types of Indexes in PostgreSQL

PostgreSQL provides the following Index types:

  • B-tree: the default index, applicable for types that can be sorted
  • Hash: handles equality only
  • GiST: suitable for non-scalar data types (e.g. geometrical shapes, fts, arrays)
  • SP-GiST: space partitioned GIST, an evolution of GiST for handling non-balanced structures (quadtrees, k-d trees, radix trees)
  • GIN: suitable for complex types (e.g. jsonb, fts, arrays )
  • BRIN: a relatively new type of index which supports data that can be sorted by storing min/max values in each block

Below we’ll try to get our hands dirty with some real world examples. All examples given are done with PostgreSQL 10.0 (with both 10 and 9 psql clients) on FreeBSD 11.1.

B-tree Indexes

Let’s suppose we have the following table:

create table part (
id serial primary key, 
partno varchar(20) NOT NULL UNIQUE, 
partname varchar(80) NOT NULL, 
partdescr text,
machine_id int NOT NULL
);
testdb=# \d part
                                  Table "public.part"
   Column       |         Type          |                     Modifiers                     
------------+-----------------------+---------------------------------------------------
 id         | integer                 | not null default nextval('part_id_seq'::regclass)
 partno     | character varying(20)| not null
 partname       | character varying(80)| not null
 partdescr      | text                    |
 machine_id     | integer                 | not null
Indexes:
    "part_pkey" PRIMARY KEY, btree (id)
    "part_partno_key" UNIQUE CONSTRAINT, btree (partno)

When we define this rather common table, PostgreSQL creates two unique B-tree indexes behind the scenes: part_pkey and part_partno_key. So every unique constraint in PostgreSQL is implemented with a unique INDEX. Let’s populate our table with a million rows of data:

testdb=# with populate_qry as (select gs from generate_series(1,1000000) as gs )
insert into part (partno, partname,machine_id) SELECT 'PNo:'||gs, 'Part '||gs,0 from populate_qry;
INSERT 0 1000000

Now let’s try to do some queries on our table. First we tell the psql client to report query times by typing \timing:

testdb=# select * from part where id=100000;
   id   |   partno   |  partname   | partdescr | machine_id
--------+------------+-------------+-----------+------------
 100000 | PNo:100000 | Part 100000 |           |          0
(1 row)

Time: 0,284 ms
testdb=# select * from part where partno='PNo:100000';
   id   |   partno   |  partname   | partdescr | machine_id
--------+------------+-------------+-----------+------------
 100000 | PNo:100000 | Part 100000 |           |          0
(1 row)

Time: 0,319 ms

We observe that it takes only fractions of a millisecond to get our results. We expected this since for both columns used in the above queries, we have already defined the appropriate indexes. Now let’s try to query on column partname, for which no index exists.

testdb=# select * from part where partname='Part 100000';
   id   |   partno   |  partname   | partdescr | machine_id
--------+------------+-------------+-----------+------------
 100000 | PNo:100000 | Part 100000 |           |          0
(1 row)

Time: 89,173 ms

Here we see clearly that for the non-indexed column, the performance drops significantly. Now let’s create an index on that column and repeat the query:

testdb=# create index part_partname_idx ON part(partname);
CREATE INDEX
Time: 15734,829 ms (00:15,735)
testdb=# select * from part where partname='Part 100000';
   id   |   partno   |  partname   | partdescr | machine_id
--------+------------+-------------+-----------+------------
 100000 | PNo:100000 | Part 100000 |           |          0
(1 row)

Time: 0,525 ms

Our new index part_partname_idx is also a B-tree index (the default). First we note that the index creation on the million-row table took a significant amount of time, about 16 seconds. Then we observe that our query speed was boosted from 89 ms down to 0.525 ms. B-tree indexes, besides checking for equality, can also help with queries involving other operators on ordered types, such as <, <=, >=, >. Let’s try with <= and >=:

testdb=# select count(*) from part where partname>='Part 9999900';
 count
-------
     9
(1 row)

Time: 0,359 ms
testdb=# select count(*) from part where partname<='Part 9999900';
 count  
--------
 999991
(1 row)

Time: 355,618 ms

The first query is much faster than the second. By using the EXPLAIN (or EXPLAIN ANALYZE) keywords we can see whether the index is actually used:

testdb=# explain select count(*) from part where partname>='Part 9999900';
                                       QUERY PLAN                                        
-----------------------------------------------------------------------------------------
 Aggregate  (cost=8.45..8.46 rows=1 width=8)
   ->  Index Only Scan using part_partname_idx on part  (cost=0.42..8.44 rows=1 width=0)
         Index Cond: (partname >= 'Part 9999900'::text)
(3 rows)

Time: 0,671 ms
testdb=# explain select count(*) from part where partname<='Part 9999900';
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=14603.22..14603.23 rows=1 width=8)
   ->  Gather  (cost=14603.00..14603.21 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=13603.00..13603.01 rows=1 width=8)
               ->  Parallel Seq Scan on part  (cost=0.00..12561.33 rows=416667 width=0)
                     Filter: ((partname)::text <= 'Part 9999900'::text)
(6 rows)

Time: 0,461 ms

In the first case, the query planner chooses to use the part_partname_idx index. We also observe that this will result in an index-only scan which means no access to the data tables at all. In the second case the planner determines that there is no point in using the index as the returned results are a big portion of the table, in which case a sequential scan is thought to be faster.

Hash Indexes

Use of hash indexes up to and including PgSQL 9.6 was discouraged because hash indexes were not WAL-logged. As of PgSQL 10.0 those issues were fixed, but hash indexes still made little sense to use. There are efforts in PgSQL 11 to make hash indexes a first class index method along with their bigger brothers (B-tree, GiST, GIN). So, with this in mind, let’s actually try a hash index in action.

We will enrich our part table with a new column parttype and populate it with values of equal distribution, and then run a query that tests for parttype equal to ‘Steering’:

testdb=# alter table part add parttype varchar(100) CHECK (parttype in ('Engine','Suspension','Driveline','Brakes','Steering','General')) NOT NULL DEFAULT 'General';
ALTER TABLE
Time: 42690,557 ms (00:42,691)
testdb=# with catqry as  (select id,(random()*6)::int % 6 as cat from part)
update part SET parttype = CASE WHEN cat=1 THEN 'Engine' WHEN cat=2 THEN 'Suspension' WHEN cat=3 THEN 'Driveline' WHEN cat=4 THEN 'Brakes' WHEN cat=5 THEN 'Steering' ELSE 'General' END FROM catqry WHERE part.id=catqry.id;
UPDATE 1000000
Time: 46345,386 ms (00:46,345)
testdb=# select count(*) from part where id % 500 = 0 AND parttype = 'Steering';
 count
-------
   322
(1 row)

Time: 93,361 ms

Now we create a Hash index for this new column, and retry the previous query:

testdb=# create index part_parttype_idx ON part USING hash(parttype);
CREATE INDEX
Time: 95525,395 ms (01:35,525)
testdb=# analyze ;
ANALYZE
Time: 1986,642 ms (00:01,987)
testdb=# select count(*) from part where id % 500 = 0 AND parttype = 'Steering';
 count
-------
   322
(1 row)

Time: 63,634 ms

We note the improvement after using the hash index. Now we will compare the performance of a hash index on integers against the equivalent b-tree index.

testdb=# update part set machine_id = id;
UPDATE 1000000
Time: 392548,917 ms (06:32,549)
testdb=# select * from part where id=500000;
   id   |   partno   |  partname   | partdescr | machine_id |  parttype  
--------+------------+-------------+-----------+------------+------------
 500000 | PNo:500000 | Part 500000 |           |     500000 | Suspension
(1 row)

Time: 0,316 ms
testdb=# select * from part where machine_id=500000;
   id   |   partno   |  partname   | partdescr | machine_id |  parttype  
--------+------------+-------------+-----------+------------+------------
 500000 | PNo:500000 | Part 500000 |           |     500000 | Suspension
(1 row)

Time: 97,037 ms
testdb=# create index part_machine_id_idx ON part USING hash(machine_id);
CREATE INDEX
Time: 4756,249 ms (00:04,756)
testdb=#
testdb=# select * from part where machine_id=500000;
   id   |   partno   |  partname   | partdescr | machine_id |  parttype  
--------+------------+-------------+-----------+------------+------------
 500000 | PNo:500000 | Part 500000 |           |     500000 | Suspension
(1 row)

Time: 0,297 ms

As we see, with the use of hash indexes, the speed of queries that check for equality is very close to the speed of B-tree indexes. Hash indexes are said to be marginally faster for equality than B-trees; in fact, we had to try each query two or three times before the hash index gave a better result than its b-tree equivalent.

GiST Indexes

GiST (Generalized Search Tree) is not a single kind of index, but rather an infrastructure for building many indexing strategies. The default PostgreSQL distribution provides support for geometric data types, tsquery and tsvector. In contrib there are implementations of many other operator classes. By reading the docs and the contrib dir, the reader will observe that there is a rather big overlap between GiST and GIN use cases: int arrays and full text search, to name the main ones. In those cases GIN is faster, and the official documentation explicitly states that. However, GiST provides extensive geometric data type support. Also, as of this writing, GiST (and SP-GiST) is the only meaningful method that can be used with exclusion constraints. We will see an example of this. Let us suppose (staying in the field of mechanical engineering) that we need to define machine type variations for a particular machine type that are valid for a certain period of time, and that for a particular variation, no other variation can exist for the same machine type whose time period overlaps (conflicts) with that variation’s period.

create table machine_type (
	id SERIAL PRIMARY KEY, 
	mtname varchar(50) not null, 
	mtvar varchar(20) not null, 
	start_date date not null, 
	end_date date, 
	CONSTRAINT machine_type_uk UNIQUE (mtname,mtvar)
);

Above we tell PostgreSQL that for every machine type name (mtname) there can be only one variation (mtvar). Start_date denotes the starting date of the period in which this machine type variation is valid, and end_date denotes the ending date of this period. Null end_date means that the machine type variation is currently valid. Now we want to express the non-overlapping requirement with a constraint. The way to do this is with an exclusion constraint:

testdb=# alter table machine_type ADD CONSTRAINT machine_type_per EXCLUDE USING GIST (mtname WITH =,daterange(start_date,end_date) WITH &&);

The EXCLUDE PostgreSQL syntax allows us to specify many columns of different types and with a different operator for each one. && is the overlap operator for date ranges, and = is the common equality operator for varchar. But as soon as we hit enter, PostgreSQL complains with a message:

ERROR:  data type character varying has no default operator class for access method "gist"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.

What is lacking here is GiST opclass support for varchar. Provided we have successfully built and installed the btree_gist extension, we may proceed with creating the extension:

testdb=# create extension btree_gist ;
CREATE EXTENSION

And then re-trying to create the constraint and test it:

testdb=# alter table machine_type ADD CONSTRAINT machine_type_per EXCLUDE USING GIST (mtname WITH =,daterange(start_date,end_date) WITH &&);
ALTER TABLE
testdb=# insert into machine_type (mtname,mtvar,start_date,end_date) VALUES('Subaru EJ20','SH','2008-01-01','2013-01-01');
INSERT 0 1
testdb=# insert into machine_type (mtname,mtvar,start_date,end_date) VALUES('Subaru EJ20','SG','2002-01-01','2009-01-01');
ERROR:  conflicting key value violates exclusion constraint "machine_type_per"
DETAIL:  Key (mtname, daterange(start_date, end_date))=(Subaru EJ20, [2002-01-01,2009-01-01)) conflicts with existing key (mtname, daterange(start_date, end_date))=(Subaru EJ20, [2008-01-01,2013-01-01)).
testdb=# insert into machine_type (mtname,mtvar,start_date,end_date) VALUES('Subaru EJ20','SG','2002-01-01','2008-01-01');
INSERT 0 1
testdb=# insert into machine_type (mtname,mtvar,start_date,end_date) VALUES('Subaru EJ20','SJ','2013-01-01',null);
INSERT 0 1
testdb=# insert into machine_type (mtname,mtvar,start_date,end_date) VALUES('Subaru EJ20','SJ2','2018-01-01',null);
ERROR:  conflicting key value violates exclusion constraint "machine_type_per"
DETAIL:  Key (mtname, daterange(start_date, end_date))=(Subaru EJ20, [2018-01-01,)) conflicts with existing key (mtname, daterange(start_date, end_date))=(Subaru EJ20, [2013-01-01,)).

SP-GiST Indexes

SP-GiST, which stands for space-partitioned GiST, is, like GiST, an infrastructure that enables the development of many different strategies in the domain of non-balanced disk-based data structures. The default PgSQL distribution offers support for two-dimensional points, (any type) ranges, text and inet types. Like GiST, SP-GiST can be used in exclusion constraints, in a similar way to the example shown in the previous section.

GIN Indexes

GIN (Generalized Inverted Index), like GiST and SP-GiST, can provide many indexing strategies. GIN is suited to cases where we want to index columns of composite types. The default PostgreSQL distribution provides support for any array type, jsonb and full text search (tsvector). In contrib there are implementations of many other operator classes. Jsonb, a highly praised feature of PostgreSQL (and a relatively recent (9.4+) development), relies on GIN for index support. Another common use of GIN is indexing for full text search. Full text search in PgSQL deserves an article of its own, so here we’ll cover only the indexing part. First let’s make some preparations to our table, by giving not-null values to the partdescr column and updating a single row with a meaningful value:

testdb=# update part set partdescr ='';
UPDATE 1000000
Time: 383407,114 ms (06:23,407)
testdb=# update part set partdescr = 'thermostat for the cooling system' where id=500000;
UPDATE 1
Time: 2,405 ms

Then we perform a text search on the newly updated column:

testdb=# select * from part where partdescr @@ 'thermostat';
   id   |   partno   |  partname   |             partdescr             | machine_id |  parttype  
--------+------------+-------------+-----------------------------------+------------+------------
 500000 | PNo:500000 | Part 500000 | thermostat for the cooling system |     500000 | Suspension
(1 row)

Time: 2015,690 ms (00:02,016)

This is quite slow, almost 2 seconds to bring our result. Now let’s try to create a GIN index on type tsvector, and repeat the query, using an index-friendly syntax:

testdb=# CREATE INDEX part_partdescr_idx ON part USING gin(to_tsvector('english',partdescr));
CREATE INDEX
Time: 1431,550 ms (00:01,432)
testdb=# select * from part where to_tsvector('english',partdescr) @@ to_tsquery('thermostat');
   id   |   partno   |  partname   |             partdescr             | machine_id |  parttype  
--------+------------+-------------+-----------------------------------+------------+------------
 500000 | PNo:500000 | Part 500000 | thermostat for the cooling system |     500000 | Suspension
(1 row)

Time: 0,952 ms

And we get a 2000-fold speed up. Also we may note the relatively short time that it took the index to be created. You may experiment with using GiST instead of GIN in the above example, and measure the performance of reads, writes and index creation for both access methods.

BRIN Indexes

BRIN (Block Range Index) is the newest addition to PostgreSQL’s set of index types: it was introduced in PostgreSQL 9.5, so it has only been a standard core feature for a few years. BRIN works on very large tables by storing summary information for a set of pages called a “block range”. BRIN indexes are lossy (like GiST), which requires both extra logic in PostgreSQL’s query executor and extra maintenance. Let’s see BRIN in action:

testdb=# select count(*) from part where machine_id BETWEEN 5000 AND 10000;
 count
-------
  5001
(1 row)

Time: 100,376 ms
testdb=# create index part_machine_id_idx_brin ON part USING BRIN(machine_id);
CREATE INDEX
Time: 569,318 ms
testdb=# select count(*) from part where machine_id BETWEEN 5000 AND 10000;
 count
-------
  5001
(1 row)

Time: 5,461 ms

Here we see on average a ~ 18-fold speedup by the use of the BRIN index. However, BRIN’s real home is in the domain of big data, so we hope to test this relatively new technology in real world scenarios in the future.


Umair Shahid: PostgreSQL 11 – Server-side Procedures (Part 2)

Transaction control in PL procedures

A couple of months back, I wrote about how we now have the ability to write Stored Procedures in PostgreSQL. This post follows up on that and talks about the next step that was implemented: transaction control in PL procedures. The feature was committed on 22-Jan-2018.

With this addition, you now have the ability to call COMMIT and ROLLBACK commands in PL/pgSQL from within a procedure. To illustrate:

CREATE TABLE test1 (a int);

CREATE PROCEDURE transaction_test1()
 AS $$
 BEGIN
     FOR i IN 0..9 LOOP
         INSERT INTO test1 (a) VALUES (i);
         IF i % 2 = 0 THEN
             RAISE NOTICE 'i=%, txid=% will be committed', i, txid_current();
             COMMIT;
         ELSE
             RAISE NOTICE 'i=%, txid=% will be rolledback', i, txid_current();
             ROLLBACK;
         END IF;
     END LOOP;
 END
 $$
 LANGUAGE PLPGSQL;

The results are as follows:

test=# CALL transaction_test1();
 NOTICE:  i=0, txid=723 will be committed
 CONTEXT:  PL/pgSQL function transaction_test1() line 6 at RAISE
 NOTICE:  i=1, txid=724 will be rolledback
 CONTEXT:  PL/pgSQL function transaction_test1() line 9 at RAISE
 NOTICE:  i=2, txid=725 will be committed
 CONTEXT:  PL/pgSQL function transaction_test1() line 6 at RAISE
 NOTICE:  i=3, txid=726 will be rolledback
 CONTEXT:  PL/pgSQL function transaction_test1() line 9 at RAISE
 NOTICE:  i=4, txid=727 will be committed
 CONTEXT:  PL/pgSQL function transaction_test1() line 6 at RAISE
 NOTICE:  i=5, txid=728 will be rolledback
 CONTEXT:  PL/pgSQL function transaction_test1() line 9 at RAISE
 NOTICE:  i=6, txid=729 will be committed
 CONTEXT:  PL/pgSQL function transaction_test1() line 6 at RAISE
 NOTICE:  i=7, txid=730 will be rolledback
 CONTEXT:  PL/pgSQL function transaction_test1() line 9 at RAISE
 NOTICE:  i=8, txid=731 will be committed
 CONTEXT:  PL/pgSQL function transaction_test1() line 6 at RAISE
 NOTICE:  i=9, txid=732 will be rolledback
 CONTEXT:  PL/pgSQL function transaction_test1() line 9 at RAISE
 CALL

test=# SELECT xmin,* FROM test1;
  xmin | a 
 ------+---
   723 | 0
   725 | 2
   727 | 4
   729 | 6
   731 | 8
 (5 rows)

Why is this important?

Well, a big driver behind getting the initial infrastructure in for PL procedures was to allow transaction control within the procedure. Functions don’t allow you to do that; they operate within a single transaction. With this functionality, you can operate across transactions. This gives you commit control over your transactions based on your business logic and program flow.

Very cool!

There are a few limitations. For example, you can only use the transaction control features in a procedure called from the top level, and not in one called from another procedure or from a function.
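
For example (a sketch; the exact error text may vary across versions), calling the procedure from inside an explicit transaction block causes the COMMIT inside it to fail, because the procedure no longer controls the top-level transaction:

test=# BEGIN;
BEGIN
test=# CALL transaction_test1();
ERROR:  invalid transaction termination
test=# ROLLBACK;
ROLLBACK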

The feature implements this transaction control (in varying syntax) in each of the supplied procedural languages: PL/pgSQL, PL/Perl, PL/Python, PL/Tcl.
