Markus Winand: One Giant Leap For SQL: MySQL 8.0 Released


“Still using SQL-92?” is the opening question of my “Modern SQL” presentation. When I ask this question, an astonishingly large portion of the audience openly admits to using 25-year-old technology. If I ask who is still using Windows 3.1, which was also released in 1992, only a few raise their hands…but they’re joking, of course.

Clearly this comparison is not entirely fair. It nevertheless demonstrates that know-how about newer SQL standards is pretty lacking. There have actually been five updates since SQL-92—many developers have never heard of them. The latest version is SQL:2016.

As a consequence, many developers don’t know that SQL hasn’t been limited to the relational algebra or the relational model since 1999. SQL:1999 introduced operations that don’t exist in relational algebra (with recursive, lateral) and types (arrays!) that break the traditional interpretation of the first normal form.
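A lateral join, for example, lets a derived table refer to columns of tables to its left, something that has no counterpart in classic relational algebra. A minimal sketch, with made-up table and column names:

```sql
-- Best-paid employee per department using LATERAL (SQL:1999).
-- Assumed schema: departments(id, name), employees(dept_id, name, salary).
SELECT d.name, top_emp.name, top_emp.salary
  FROM departments d
 CROSS JOIN LATERAL (
        SELECT e.name, e.salary
          FROM employees e
         WHERE e.dept_id = d.id
         ORDER BY e.salary DESC
         FETCH FIRST 1 ROW ONLY
      ) AS top_emp;
```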

Since then, for 19 years now, whether or not a SQL feature fits the relational idea hasn’t been what matters. What matters is that a feature has well-defined semantics and solves a real problem. The academic approach has given way to a pragmatic one. Today, the SQL standard has a practical solution for almost every data processing problem. Some of them stay within the relational domain, while others do not.


Don’t say relational database when referring to SQL databases. SQL is really more than just relational.

It’s really too bad that many developers still use SQL the same way it was used 25 years ago. I believe the main reasons are a lack of knowledge and interest among developers, along with poor support for modern SQL in database products.

Let’s have a look at this argument in the context of MySQL. Considering its market share, I think that MySQL’s lack of modern SQL has contributed more than its fair share to this unfortunate situation. I once touched on that argument in my 2013 blog post “MySQL is as Bad for SQL as MongoDB is to NoSQL”. The key message was that “MongoDB is a popular, yet poor representative of its species—just like MySQL is”. Joe Celko has expressed his opinion about MySQL differently: “MySQL is not SQL, it merely borrows the keywords from SQL”.

You can see some examples of MySQL’s questionable interpretation of SQL in the MySQL WAT talk on YouTube. Note that this video is from 2012 and uses MySQL 5.5 (the current GA version at that time). Since then, MySQL 5.6 and 5.7 came out and improved the situation substantially. The default settings on a fresh installation are much better now.

It is particularly nice that they really thought about how to mitigate the effects of changing defaults. When they enabled ONLY_FULL_GROUP_BY by default, for example, they went the extra mile to implement the most complete functional dependency checking among the major SQL databases:

Availability of Functional Dependencies

About the same time MySQL 5.7 was released, I stopped bashing MySQL. Of course I'm kidding. I'm still bashing MySQL occasionally…but it has become a bit harder since then.

By the way, did you know MySQL still doesn’t support check constraints? Just as in previous versions, you can use check constraints in the create table statement but they are silently ignored. Yes—ignored without warning. Even MariaDB fixed that a year ago.

Availability of CHECK constraints
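To see the silent failure described above, consider this sketch (the table name is made up; a standard-conforming database would reject the insert):

```sql
-- MySQL (up to and including 8.0 GA) parses the check clause
-- but discards it silently: no error, no warning.
CREATE TABLE t (
  val INTEGER CHECK (val > 0)
);

INSERT INTO t VALUES (-1);  -- accepted by MySQL despite the constraint
```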

Uhm, I’m bashing again! Sorry—old habits die hard.

Nevertheless, the development philosophy of MySQL has visibly changed over the last few releases. What happened? You know the answer already: MySQL has been under new management since Oracle acquired it along with Sun. I must admit: it might have been the best thing that happened to SQL in the past 10 years, and I really mean SQL, not MySQL.

The reason I think a single database release has a dramatic effect on the entire SQL ecosystem is simple: MySQL is the weakest link in the chain. If you strengthen that link, the entire chain becomes stronger. Let me elaborate.

MySQL is very popular. According to db-engines.com, it’s the second most popular SQL database overall. More importantly: it is, by a huge margin, the most popular free SQL database. This has a big effect on anyone who has to cope with more than one specific SQL database. These are often software vendors that make products like content management systems (CMSs), e-commerce software, or object-relational mappers (ORMs). Due to its immense popularity, such vendors often need to support MySQL. Only a few of them bite the bullet and truly support multiple databases—Java Object Oriented Querying (jOOQ) really stands out in this regard. Many vendors just limit themselves to the commonly supported SQL dialect, i.e. MySQL’s.

Another important group affected by MySQL’s omnipresence is people learning SQL. They can reasonably assume that the most popular free SQL database is a good foundation for learning. What they don’t know is that MySQL limits their SQL-foo to the weakest SQL dialect among those in wide use. Based loosely on Joe Celko’s statement: these people know the keywords, but don’t understand their real meaning. Worse still, they have never heard of modern SQL features.

Last week, that all changed when Oracle finally published a generally available (GA) release of MySQL 8.0. This is a landmark release as MySQL eventually evolved beyond SQL-92 and the purely relational dogma. Among a few other standard SQL features, MySQL now supports window functions (over) and common table expressions (with). Without a doubt, these are the two most important post-SQL-92 features.

The days in which software vendors can claim they cannot use these features because MySQL doesn’t support them are numbered. Window functions and CTEs are now in the documentation of the most popular free SQL database. Let me therefore boldly claim: MySQL 8.0 is one small step for a database, one giant leap for SQL.

It gets even better and the future is bright! As a consequence of Oracle getting its hands on MySQL, some of the original MySQL team (among them the original creator) created the MySQL fork MariaDB. Apparently, their strategy is to add many new features to convince MySQL users to consider their competing product. Personally I think they sacrifice quality—very much like they did before with MySQL—but that’s another story. Here it is more relevant that MariaDB has been validating check constraints for a year now. That raises a question: how much longer can MySQL afford to ignore check constraints? Or to put it another way, how much longer can they endure my bashing ;)

Besides check constraints, MariaDB 10.2 also introduced window functions and common table expressions (CTEs). At that time, MySQL had a beta with CTEs but no window functions. MariaDB is moving faster.

In 10.3, MariaDB is set to release “system versioned tables”. In a nutshell: once activated for a table, system versioning keeps old versions of updated and deleted rows. By default, queries return the current version as usual, but you can use a special syntax (as of) to get older versions. You can read more about this in MariaDB’s announcement.
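A sketch of what this looks like, following the announced MariaDB 10.3 syntax (the table and its columns are made up for the example):

```sql
-- Keep old row versions automatically.
CREATE TABLE accounts (
  id      INTEGER PRIMARY KEY,
  balance DECIMAL(10,2)
) WITH SYSTEM VERSIONING;

UPDATE accounts SET balance = 0 WHERE id = 1;

-- Query the table as it was at an earlier point in time.
SELECT *
  FROM accounts
   FOR SYSTEM_TIME AS OF TIMESTAMP '2018-01-01 00:00:00';
```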

System versioning was introduced into the SQL standard in 2011. As it looks now, MariaDB will be the first free SQL database to support it. I hope this is an incentive for other vendors—and also for users to ask their vendors for more modern SQL features!

Now that the adoption of modern SQL has finally gained some traction, there is only one problem left: the gory details. The features defined by the standard have many subfeatures, and due to their sheer number, it is common practice to support only some of them. That means it is not enough to say that a database supports window functions. Which window functions does it actually support? Which frame units (rows, range, groups)? The answers to these questions make all the difference between a marketing gag and a powerful feature.

In my mission to make modern SQL more accessible to developers, I’m testing these details so I can highlight the differences between products. The results of these tests are shown in matrices like the ones above. The rest of this article will thus briefly go through the new standard SQL features introduced with MySQL 8.0 and discuss some implementation differences. As you will see, MySQL 8.0 is pretty good in this regard. The notable exception is its JSON functionality.

Window Functions

There is SQL before window functions and SQL after window functions. Without exaggeration, window functions are a game changer. Once you understand window functions, you cannot imagine how you ever lived without them. The most common use cases, for example finding the best N rows per group, building running totals or moving averages, and grouping consecutive events, are just the tip of the iceberg. Window functions are one of the most important tools for avoiding self-joins. That alone makes many queries less redundant and much faster. Window functions are so powerful that even newcomers like several Apache SQL implementations (Hive, Impala, Spark), NuoDB and Google BigQuery introduced them years ago. It’s fair to say that MySQL is pretty late to this party.
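A running total, for example, becomes a one-liner instead of a self-join. A minimal sketch with a made-up sales table:

```sql
-- Running total per account, ordered by date: no self-join required.
-- Assumed schema: sales(account_id, sale_date, amount).
SELECT account_id,
       sale_date,
       amount,
       SUM(amount) OVER (PARTITION BY account_id
                             ORDER BY sale_date) AS running_total
  FROM sales;
```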

The following matrix shows the support of the over clause for some major SQL databases. As you can see, MySQL’s implementation actually exceeds the capabilities of “the world’s most advanced open source relational database”, as PostgreSQL claims on its new homepage. However, PostgreSQL 11 is set to recapture the leader position in this area.

Availability of OVER

The actual set of window functions offered by MySQL 8.0 is also pretty close to the state of the art:

Availability of Window-Functions

Common Table Expressions (with [recursive])

The next major enhancement in MySQL 8.0 is common table expressions, or the with [recursive] clause. Important use cases are traversing graphs with a single query, generating an arbitrary number of rows, converting CSV strings to rows (reversed listagg / group_concat), or just literate SQL.
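Generating rows, for instance, takes just a few lines (a sketch; the stop condition is arbitrary):

```sql
-- Generate the numbers 1 through 10 without any table.
WITH RECURSIVE numbers (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM numbers WHERE n < 10
)
SELECT n FROM numbers;
```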

Again, MySQL’s first implementation closes the gap.

Availability of WITH

Other Standard SQL Features

Besides window functions and the with clause, MySQL 8.0 also introduces some other standard SQL features. However, compared to the previous two, these are by no means killer features.

Other new standard SQL features in MySQL 8.0

As you can see, Oracle is pushing standard SQL JSON support. The Oracle database and MySQL are currently the leaders in this area (and both are from the same vendor!). The json_objectagg and json_arrayagg functions were even backported to MySQL 5.7.22. However, it’s also notable that MySQL doesn’t follow the standard syntax for these two functions. Modifiers defined in the standard (e.g. an order by clause) are generally not supported. Json_objectagg neither recognizes the keywords key and value nor accepts the colon (:) to separate attribute names and values. It looks like MySQL parses these as regular function calls, as opposed to the syntax described by the standard.
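The syntax difference is easy to demonstrate (a sketch; props is a made-up table with columns k and v):

```sql
-- Standard SQL syntax (not accepted by MySQL 8.0):
SELECT JSON_OBJECTAGG(KEY k VALUE v) FROM props;
SELECT JSON_OBJECTAGG(k : v)         FROM props;

-- MySQL 8.0 syntax: a regular two-argument function call.
SELECT JSON_OBJECTAGG(k, v) FROM props;
```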

It’s also interesting to see that json_arrayagg handles null values incorrectly, very much like the Oracle database (they don’t default to absent on null). Seeing the same issue in two supposedly unrelated products is always interesting. The fact that both products come from the same vendor adds another twist.

The two last features in the list, grouping function (related to rollup) and column names in the from clause are solutions to pretty specific problems. Their MySQL 8.0 implementation is basically on par with that of other databases.

Furthermore, MySQL 8.0 also introduced standard SQL roles. The reason this is not listed in the matrix above is simple: the matrices are based on actual tests I run against all these databases. My homegrown testing framework does not yet support test cases that require multiple users—currently all tests are run with a default user, so I cannot test access rights yet. However, the time for that will come—stay tuned.

Other Notable Enhancements

I'd like to close this article with MySQL 8.0 fixes and improvements that are not related to the SQL standard.

One of them is about using the desc modifier in index declarations:

CREATE INDEX … ON … (<column> [ASC|DESC], …)

Most—if not all—databases use the same logic for index creation as for the order by clause, i.e. by default the column values are ordered ascending. Sometimes you need some index columns sorted in the opposite direction; that’s when you specify desc in an index. Here’s what the MySQL 5.7 documentation said about this:

An index_col_name specification can end with ASC or DESC. These keywords are permitted for future extensions for specifying ascending or descending index value storage. Currently, they are parsed but ignored; index values are always stored in ascending order.

“They are parsed but ignored”? To be more specific: they are parsed but ignored without warning, very much like the check constraints mentioned above.

However, this has been fixed with MySQL 8.0. Now there is a warning. Just kidding! Desc is honored now.
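A typical use case is an index supporting a mixed-direction order by (a sketch with made-up names):

```sql
-- MySQL 8.0 actually stores ts in descending order now, so this
-- index can back: ORDER BY category ASC, ts DESC.
CREATE INDEX idx_cat_ts ON events (category ASC, ts DESC);
```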

There are many other improvements in MySQL 8.0. Please refer to “What’s New in MySQL 8.0?” for a great overview. How about a small appetizer?

“One Giant Leap For SQL: MySQL 8.0 Released” by Markus Winand was originally published at modern SQL.


Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 11 – Add json(b)_to_tsvector function

On 7th of April 2018, Teodor Sigaev committed patch: Add json(b)_to_tsvector function Jsonb has a complex nature so there isn't best-for-everything way to convert it to tsvector for full text search. Current to_tsvector(json(b)) suggests to convert only string values, but it's possible to index keys, numerics and even booleans value. To solve that json(b)_to_tsvector has […]

Dimitri Fontaine: PostgreSQL Data Types: XML


Continuing our series of PostgreSQL Data Types today we’re going to introduce the PostgreSQL XML type.

The SQL standard includes SQL/XML, which introduces the predefined data type XML together with constructors, several routines, functions, and XML-to-SQL data type mappings to support manipulation and storage of XML in a SQL database, as per the Wikipedia page.


Alexey Lesovsky: Let’s speed things up.


Parallelism settings and how to accelerate your database performance.

A few days ago, while performing a routine audit for one of our clients, we noticed that parallel queries were disabled in their database. That in itself wouldn’t be surprising, but this client has powerful servers that run more than just lightweight OLTP queries. When we brought it to the client’s attention, the first question was "How can we enable it?", reasonably followed by "Will there be any negative consequences if we do?"

This kind of question pops up more often than one thinks, hence this post where we look into parallel queries in detail.

Parallel queries were introduced 2 releases ago, and some significant changes were made in Postgres 10, so the configuration procedure differs slightly between 9.6 and 10. The upcoming Postgres 11 also adds several new features related to parallelism.

The parallel query feature was introduced in Postgres 9.6 and is available for a limited set of operations, such as SeqScan, Aggregate and Join. In Postgres 10 the list of supported operations was broadened to include Bitmap Heap Scan, Index Scan and Index-Only Scan.
Postgres 11 (hopefully!) will be released with support for parallel index creation and for parallel DML commands that create tables, such as CREATE TABLE AS, SELECT INTO and CREATE MATERIALIZED VIEW.

When parallelism was introduced in 9.6, it was disabled by default; in Postgres 10 it is enabled out of the box. Our client’s database runs on 9.6, which explains why it was disabled. Depending on the Postgres version, there are 2 or 3 parameters that enable parallel queries; they are located in postgresql.conf in the "Asynchronous Behavior" section.

The first one is max_worker_processes, which was added in Postgres 9.4 and limits the number of background processes Postgres can run (the default is 8). It covers background workers but not system background processes such as the checkpointer, bgwriter, WAL senders/receivers, etc. Note that changing this parameter requires a Postgres restart.

The second parameter is max_parallel_workers_per_gather, which is 0 by default in Postgres 9.6, meaning parallel queries are disabled. To enable them, set it to a value greater than 0; in Postgres 10 the default was changed to 2. This parameter defines the maximum number of workers allowed for a single parallel query.

The third parameter is max_parallel_workers, added in Postgres 10. It defines the maximum number of workers available for parallel queries only, since max_worker_processes also covers other background workers.

Overall, these three parameters define a general limit of workers and a limit of workers used for parallel queries. What values should be used? That depends on the number of CPU cores and the capacity of the storage system. Parallel queries obviously may consume more resources than non-parallel ones: CPU time, memory, I/O and so on. Imagine Postgres launches a high number of parallel queries and the parallel worker pool is completely exhausted; the system must still have free cores and storage throughput left to run non-parallel queries without a performance slowdown. For example, for a system with 32 CPU cores and SSDs, a good starting point is:
  • max_worker_processes = 12
  • max_parallel_workers_per_gather = 4
  • max_parallel_workers = 12
These settings allow running at least 3 parallel queries concurrently, with a maximum of 4 workers per query, leaving 20 cores for other, non-parallel queries. If you use background workers, max_worker_processes should be increased accordingly.
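In postgresql.conf, the starting point above would look like this (a sketch; adjust the values for your own hardware):

```ini
# "Asynchronous Behavior" section of postgresql.conf
max_worker_processes = 12            # limit for all background processes (restart required)
max_parallel_workers_per_gather = 4  # workers allowed per parallel query
max_parallel_workers = 12            # workers reserved for parallel queries (Postgres 10+)
```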

Those are, of course, not all the parameters to take into consideration when deciding between parallel and non-parallel queries. Additional advanced settings can be found in the "Planner Cost Constants" section of postgresql.conf.

parallel_tuple_cost and parallel_setup_cost define the extra costs added to a query's total cost when parallelism is used. Another one is min_parallel_relation_size, only available in Postgres 9.6, which defines the minimal size of relations that can be scanned in parallel. In Postgres 10, indexes can also be scanned in parallel, so this parameter was split into min_parallel_table_scan_size and min_parallel_index_scan_size, which apply to tables and indexes respectively.

In general, the cost parameters can be left as they are. In rare cases their configuration might need a little adjustment to tune particular queries.
I'd also like to mention that Postgres 11 will introduce parallel index creation together with an additional parameter, max_parallel_maintenance_workers, which defines the maximum number of workers used by the CREATE INDEX command.

Do you use parallelism? Did you have any issues using it? Would be great to hear your thoughts!

Laurenz Albe: What’s in an xmax?

Merry xmax!
Based on an image by Robo Android under the Creative Commons Attribution 2.0 license


xmax is a PostgreSQL system column that is used to implement Multiversion Concurrency Control (MVCC). The documentation is somewhat terse:

The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. It is possible for this column to be nonzero in a visible row version. That usually indicates that the deleting transaction hasn’t committed yet, or that an attempted deletion was rolled back.

While this is true, the presence of “weasel words” like “usually” indicates that there is more to the picture. This is what I want to explore in this article.

The two meanings of xmax

I’ll follow the PostgreSQL convention to use the word “tuple” for “row version” (remember that PostgreSQL implements MVCC by holding several versions of a row in the table).

xmax is actually used for two purposes in PostgreSQL:

  • It stores the transaction ID (“xid”) of the transaction that deleted the tuple, like the documentation says. Remember that UPDATE also deletes a tuple in PostgreSQL!
  • It stores row locks on the tuple.

This is possible, because a tuple cannot be locked and deleted at the same time: normal locks are only held for the duration of the transaction, and a tuple is deleted only after the deleting transaction has committed.

Storing row locks on the tuple itself has one vast advantage: it avoids overflows of the “lock table”. The lock table is a fixed-size area that is allocated in shared memory during server startup and could easily be too small to hold all the row locks from a bigger transaction. To cope with this, you’d need techniques like “lock escalation” that are difficult to implement, impact concurrency and lead to all kinds of nasty problems.

There is also a down side to storing row locks in the tuple: each row lock modifies the table, and the modified blocks have to be written back to persistent storage. This means that row locks lead to increased I/O load.

But a number of questions remain:

  • How can you tell which of the two meanings xmax has in a tuple?
  • How can I tell if xmax is valid or not?
  • How exactly are row locks stored?

We will dive deeper in the rest of this article to answer these questions.

An example

In the following, I’ll use a simple schema for demonstration. I am using PostgreSQL v10, but this hasn’t changed in the last couple of releases.

CREATE TABLE parent (
   p_id integer PRIMARY KEY,
   p_val text
);

CREATE TABLE child (
   c_id integer PRIMARY KEY,
   p_id integer REFERENCES parent(p_id),
   c_val text
);

INSERT INTO parent (p_id, p_val)
   VALUES (42, 'parent');

Now let’s look at the relevant system columns:

session1=# SELECT ctid, xmin, xmax, p_id, p_val FROM parent;

 ctid  | xmin  | xmax | p_id | p_val  
 (0,1) | 53163 |    0 |   42 | parent
(1 row)

This is the simple view we expect to see: ctid is the physical location of the tuple (Block 0, item 1), xmin contains the ID of the inserting transaction, and xmax is zero because the row is alive.

Now let’s start a transaction in session 1 and delete the row:

session1=# BEGIN;
session1=# DELETE FROM parent WHERE p_id = 42;

Then session 2 can see that xmax has changed:

session2=# SELECT ctid, xmin, xmax, p_id, p_val FROM parent;

 ctid  | xmin  | xmax  | p_id | p_val  
 (0,1) | 53163 | 53165 |   42 | parent
(1 row)

But wait, we change our mind in session 1 and undo the change:

session1=# ROLLBACK;

To find out what xmax means in this case, let’s call in the cavalry.

pageinspect comes to the rescue

PostgreSQL comes with a “contrib” module called pageinspect that can be used to examine the actual contents of table blocks. It is installed with

CREATE EXTENSION pageinspect;
We’ll use two of its functions:

  • get_raw_page: reads one 8kB block from the table’s data file
  • heap_page_item_attrs: for each tuple in a data block, this returns the tuple metadata and data

Needless to say, these functions are superuser only.

heap_page_item_attrs returns an integer field named t_infomask that contains several flags, some of which tell us the meaning of xmax. To get the full story, you’ll have to read the code in src/include/access/htup_details.h.

Let’s have a look at table block 0, which contains our tuple:

session2=# SELECT lp, 
       t_ctid AS ctid,
       t_xmin AS xmin,
       t_xmax AS xmax,
       (t_infomask & 128)::boolean AS xmax_is_lock,
       (t_infomask & 1024)::boolean AS xmax_committed,
       (t_infomask & 2048)::boolean AS xmax_rolled_back,
       (t_infomask & 4096)::boolean AS xmax_multixact,
       t_attrs[1] AS p_id,
       t_attrs[2] AS p_val
FROM heap_page_item_attrs(
        get_raw_page('parent', 0),
        'parent'
     );
-[ RECORD 1 ]----+-----------------
lp               | 1
ctid             | (0,1)
xmin             | 53163
xmax             | 53165
xmax_is_lock     | f
xmax_committed   | f
xmax_rolled_back | f
xmax_multixact   | f
p_id             | \x2a000000
p_val            | \x0f706172656e74

The attributes p_id and p_val are displayed in binary form.

The information in the tuple doesn’t tell us whether the transaction that set xmax has been committed or rolled back, so we (and PostgreSQL when it inspects the tuple) still don’t know what to make of xmax. That is because PostgreSQL does not update the tuple when a transaction ends.

To resolve that uncertainty, we’d have to look at the commit log that stores the state of each transaction. The commit log is persisted in the pg_xact subdirectory of the PostgreSQL data directory (pg_clog in older versions).

A SELECT that modifies data

We cannot examine the commit log from SQL, but when any database transaction reads the tuple and looks up the commit log, it will persist the result in the tuple so that the next reader does not have to do it again (this is called “setting the hint bits”).

So all we have to do is to read the tuple:

session2=# SELECT ctid, xmin, xmax, p_id, p_val FROM parent;

 ctid  | xmin  | xmax  | p_id | p_val  
 (0,1) | 53163 | 53165 |   42 | parent
(1 row)

This changes the information stored in the tuple. Let’s have another look with pageinspect:

-[ RECORD 1 ]----+-----------------
lp               | 1
ctid             | (0,1)
xmin             | 53163
xmax             | 53165
xmax_is_lock     | f
xmax_committed   | f
xmax_rolled_back | t
xmax_multixact   | f
p_id             | \x2a000000
p_val            | \x0f706172656e74

The SELECT statement has set the flags on the tuple, and now we can see that xmax is from a transaction that was rolled back and should be ignored.

As an aside, that means that the first reader of a tuple modifies the tuple, causing surprising write I/O. This is annoying, but it is the price we pay for instant COMMIT and ROLLBACK. It is also the reason why it is a good idea to either use COPY … (FREEZE) to bulk load data or to VACUUM the data after loading.

Now we know how to determine if xmax is from a valid transaction or not, but what about row locks?

Row locks and xmax

Rows are locked by data modifying statements, but there is a simple way to lock a row without inserting or deleting tuples:

session1=# BEGIN;
session1=# SELECT * FROM parent WHERE p_id = 42 FOR UPDATE;

 p_id | p_val  
   42 | parent
(1 row)

Now what does pageinspect tell us?

-[ RECORD 1 ]----+-----------------
lp               | 1
ctid             | (0,1)
xmin             | 53163
xmax             | 53166
xmax_is_lock     | t
xmax_committed   | f
xmax_rolled_back | f
xmax_multixact   | f
p_id             | \x2a000000
p_val            | \x0f706172656e74

We see that the row is locked. In this case, it is a FOR UPDATE lock, but the query does not distinguish between the lock modes for simplicity’s sake. You’ll notice that xmax again is neither committed nor rolled back, but we don’t care because we know it is a row lock.

xmax is set to 53166, which is the transaction ID of the locking transaction. Let’s close that transaction to continue:

session1=# COMMIT;

PostgreSQL does not have to set hint bits here — if xmax contains a row lock, the row is active, no matter what the state of the locking transaction is.

If you think you have seen it all, you are in for a surprise.

Multiple locks on a single row

In the previous example we have seen that PostgreSQL stores the transaction ID of the locking transaction in xmax. This works fine as long as only a single transaction holds a lock on that tuple. With exclusive locks like the one that SELECT … FOR UPDATE takes, this is always the case.

But PostgreSQL also knows other row locks, for example the FOR KEY SHARE lock that is taken on the destination of a foreign key constraint to prevent concurrent modification of the keys in that row. Let’s insert some rows in the child table:

session1=# BEGIN;
session1=# INSERT INTO child (c_id, p_id, c_val)
   VALUES (1, 42, 'first');

session2=# BEGIN;
session2=# INSERT INTO child (c_id, p_id, c_val)
   VALUES (2, 42, 'second');

Now let’s look at our parent row again:

-[ RECORD 1 ]----+-----------------
lp               | 1
ctid             | (0,1)
xmin             | 53163
xmax             | 3
xmax_is_lock     | t
xmax_committed   | f
xmax_rolled_back | f
xmax_multixact   | t
p_id             | \x2a000000
p_val            | \x0f706172656e74

That “3” in xmax cannot be a transaction ID (they keep counting up), and the xmax_multixact flag is set.

This is the ID of a “multiple transaction object”, called “multixact” in PostgreSQL jargon for lack of a better word. Such objects are created whenever more than one transaction locks a row, and their IDs are also counted up (you can tell that this database needs few of them). Multixacts are persisted in the pg_multixact subdirectory of the data directory.

You can get information about the members of a multixact with the undocumented pg_get_multixact_members function:

session2=# SELECT * FROM pg_get_multixact_members('3');
  xid  | mode  
 53167 | keysh
 53168 | keysh
(2 rows)

Now you really know what is in an xmax!

The post What’s in an xmax? appeared first on Cybertec.


Shaun M. Thomas: PG Phriday: Securing PgBouncer


We all love PgBouncer. It’s a great way to multiplex tens, hundreds, or even thousands of client connections to a small handful of Postgres sessions. What isn’t necessarily so endearing, is that it can’t pass authentication from itself to Postgres, as each Postgres session may exist before the connection to PgBouncer is established. Or can it? Let’s explore how things have evolved, but the news never really got out.

Tell you what I got in mind

As a proxy, PgBouncer authenticates on a per user/database basis. Since Postgres authenticates sessions before they connect, PgBouncer used to have no way to re-auth its own connections. The old, and probably most prevalent way to circumvent this, was to build a userlist.txt file that contained every user and password hash that should be allowed to connect through PgBouncer.

But is that what we really want? This means there’s a static record of every username and password combination on disk. It also means any time a password in the list changes, we must regenerate that file. In a world of databases, this kind of denormalization isn’t ideal.

Luckily we can fix it, though there are several steps involved. We should also note that the default value of auth_query is a direct query to pg_shadow. This is generally bad practice, and the official documentation includes a more secure example using a callable function. We will be using a derived example for this demonstration.

What I have, I knew was true

To keep things simple, we’ll assume there’s a local pgbouncer running as the postgres OS user. One benefit to this, is that we can lock down Postgres itself to only allow local connections as well. This prevents users from connecting to Postgres directly, even if they had the full connection string to do so.

For a setup like this, we might find this line or something similar in the pgbouncer.ini file, under the [databases] heading:


* = host=localhost auth_user=pgbouncer

This particular line means any PgBouncer session will connect to the Postgres server running on 127.0.0.1 or ::1 only. It also makes use of the new auth_user syntax that makes all of this magic work properly. With that enabled, there’s only one more change we need to make to PgBouncer itself.

Won’t you ever set me free?

In the [pgbouncer] section of the pgbouncer.ini file, we need to specify a query. In the documentation, they use an example that directly interrogates the pg_shadow view. This view, and the underlying pg_authid table where Postgres stores authentication, are only available to superusers by default.

Do we really want PgBouncer to operate as a superuser? Of course not! If we use a set-returning function instead, PgBouncer can obtain credentials for comparison without being a superuser. Here’s how we set it up under the [pgbouncer] section of the config:


auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
auth_query = SELECT * FROM pgbouncer.get_auth($1)

Once we’ve made these changes, we just need to supply two things:

  1. The pgbouncer Postgres user.
  2. The get_auth set-returning function.

But to me there’s no surprise

First, let’s start by creating the pgbouncer Postgres user. This allows PgBouncer to connect as itself and invoke the get_auth function.

CREATE USER pgbouncer WITH PASSWORD 'changeme';

Note how we did nothing but create the user and give it a password. It has no special privileges, and currently no extra grants. Now as the postgres OS user, we can export this information to the userlist.txt file that used to be required for all users.

psql -Atc "SELECT '\"' || usename || '\" \"' || passwd || '\"' FROM pg_shadow WHERE usename = 'pgbouncer'" postgres > /etc/pgbouncer/userlist.txt

cat <<EOF >> ~/.pgpass
localhost:*:*:pgbouncer:changeme
EOF
chmod 600 ~/.pgpass

The second step either creates, or appends our user to, a .pgpass file for the postgres OS user. PgBouncer makes use of libpq, the standard client library for establishing and interacting with Postgres sessions. If such a file exists, libpq automatically supplies the password for any connection that matches a corresponding entry. We chose to allow pgbouncer to supply the password only when connecting locally.
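These entries are colon-separated — hostname:port:database:username:password — and any of the first four fields may be a * wildcard. A quick sketch of how the fields break down, using the hypothetical changeme password from earlier:

```shell
# A hypothetical .pgpass entry: hostname:port:database:username:password
# (any of the first four fields may be the * wildcard).
entry='localhost:*:*:pgbouncer:changeme'
host=$(echo "$entry" | cut -d: -f1)
user=$(echo "$entry" | cut -d: -f4)
echo "$host $user"   # → localhost pgbouncer
```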

We also need to ensure this line is somewhere near the top of our pg_hba.conf file for Postgres itself:

host    all    pgbouncer    127.0.0.1/32    md5
host    all    pgbouncer    ::1/128         md5

This ensures PgBouncer can only connect locally, and only via the password we created.

This waitin’ ’round’s killin’ me

This still leaves the necessary authentication retrieval function. This is the tricky part that can easily go wrong. Let’s start with the function itself:


CREATE OR REPLACE FUNCTION pgbouncer.get_auth(p_usename TEXT)
RETURNS TABLE(username TEXT, password TEXT) AS
$$
BEGIN
    RAISE WARNING 'PgBouncer auth request: %', p_usename;

    RETURN QUERY
    SELECT usename::TEXT, passwd::TEXT FROM pg_catalog.pg_shadow
     WHERE usename = p_usename;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
To keep things organized, we placed the function in its own schema. This will need to exist in any database where PgBouncer will proxy clients. Note that we also specifically prepended the pg_shadow view with the pg_catalog schema. This is necessary to prevent an attacker from supplying a substitute search path and obtaining data from an arbitrary table. Even though the pgbouncer user is not a superuser, it’s good to be thorough.

We even raise a warning any time PgBouncer authenticates instead of Postgres. This will log the attempt to the Postgres logs in most cases for auditing purposes. We use WARNING here because that’s the default setting for log_min_messages. In systems that use a lower value like NOTICE, our function could follow suit.

Things go wrong, they always do

Despite these safeguards, eagle-eyed readers are probably already cringing. Why? Let’s connect as the pgbouncer user and call our function.

SELECT * FROM pgbouncer.get_auth('postgres');

 usename  |               passwd                
----------+-------------------------------------
 postgres | md54aeec1a9950d60e0d3e98a5b136222f0

Isn’t that what we wanted? Yes and no. We created the function as the postgres superuser, and we never granted use of the function to anyone. Yet the pgbouncer user can invoke it. This is due primarily to how Postgres implements function security. While functions can execute SQL, they’re more commonly associated with calculations. Think set theory: f(x) = y. Thus functions are automatically granted access to PUBLIC, an alias for all users.

So the function that circumvents the security of pg_shadow is now callable by every user in the database. Oops!
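As an aside, there’s nothing mysterious about those hashes: legacy md5 authentication stores the literal string md5 followed by md5(password || username). A quick sketch using GNU md5sum and the hypothetical changeme password:

```shell
# Postgres md5 password hashes are 'md5' || md5(password || username).
# Assumes GNU coreutils md5sum is available.
hash="md5$(printf '%s%s' changeme pgbouncer | md5sum | awk '{print $1}')"
echo "$hash"   # 35 characters: 'md5' plus 32 hex digits
```

Appending the username acts as a crude salt: identical passwords produce different hashes for different users, which is also why renaming a role invalidates its md5 password.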

To really lock this down, we need to explicitly revoke permissions, and then grant them only to the pgbouncer user.

REVOKE ALL ON FUNCTION pgbouncer.get_auth(p_usename TEXT) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pgbouncer.get_auth(p_usename TEXT) TO pgbouncer;

Once we’ve done that, this is what we should see if we use any user other than pgbouncer:

SELECT * FROM pgbouncer.get_auth('postgres');

ERROR:  permission denied for function get_auth

That was a close one!

Everything that’s serious lasts

Once we reload Postgres and PgBouncer, all of these changes should activate. Now we never have to update userlist.txt unless we change the pgbouncer password itself. We also log any password auth attempt to Postgres in case the pgbouncer user is compromised and someone invokes a dictionary attack against our function. Can we go even further? Sure:

  • Exclude superusers from being returned in the auth function at all. Elevated privilege users or roles should not connect through the proxy. This prevents accidentally or maliciously exposing superuser password hashes to PgBouncer.
  • Add a WHERE clause so the function can only auth against specific users. This ensures only front-end connections can operate through the proxy and are thus compatible with the auth function.
  • Join against pg_auth_members and only allow users within certain groups. This is the same as the previous point, but more generally useful. Users within a use_proxy group for example, would allow us to control authentication with a simple GRANT statement, rather than modifying the function.

With all of those in mind, a more secure version of our query might look like this:

SELECT u.rolname::TEXT, u.rolpassword::TEXT
  FROM pg_catalog.pg_authid g
  JOIN pg_catalog.pg_auth_members m ON (m.roleid = g.oid)
  JOIN pg_catalog.pg_authid u ON (u.oid = m.member)
 WHERE NOT u.rolsuper
   AND g.rolname = 'use_proxy'
   AND u.rolname = p_usename;
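Folded back into the get_auth function, the hardened version might look like this sketch (use_proxy is the hypothetical group from the list above):

```sql
CREATE OR REPLACE FUNCTION pgbouncer.get_auth(p_usename TEXT)
RETURNS TABLE(username TEXT, password TEXT) AS
$$
BEGIN
    RAISE WARNING 'PgBouncer auth request: %', p_usename;

    RETURN QUERY
    SELECT u.rolname::TEXT, u.rolpassword::TEXT
      FROM pg_catalog.pg_authid g
      JOIN pg_catalog.pg_auth_members m ON (m.roleid = g.oid)
      JOIN pg_catalog.pg_authid u ON (u.oid = m.member)
     WHERE NOT u.rolsuper
       AND g.rolname = 'use_proxy'
       AND u.rolname = p_usename;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
```

The same REVOKE/GRANT treatment as before still applies, since recreating the function does not change its PUBLIC execute grant.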

There really is no limit to how we can secure or otherwise enhance this function. However, if we truly had our preferences, PgBouncer would call a function and pass in both the username and password values as parameters. This would allow the function to return merely a True or False answer indicating whether authentication is allowed, without directly exposing password hashes to yet another layer.

Unfortunately, that by itself would be a security flaw. In many production systems full query logging is enabled, which writes every query and its parameters to the Postgres log. Giving an in-memory binary access to hashes is very different from directly exposing unsalted passwords in a plain-text log. Even if the function accepted a salted and hashed password, having these in the log would still be unnecessarily permissive.

But beyond that, we still have access to a far more versatile approach than before. Hopefully it starts to catch on instead of languishing in relative obscurity. PgBouncer is too good of a tool to allow one of its best features to go unused. If you’re not already using auth_user and auth_query, why not give it a try?
