One Giant Leap For SQL: MySQL 8.0 Released
“Still using SQL-92?” is the opening question of my “Modern SQL” presentation. When I ask this question, an astonishingly large portion of the audience openly admits to using 25 years old technology. If I ask who is still using Windows 3.1, which was also released in 1992, only a few raise their hand…but they’re joking, of course.
Clearly this comparison is not entirely fair. It nevertheless demonstrates that the know-how surrounding newer SQL standards is pretty lacking. There were actually five updates since SQL-92—many developers have never heard of them. The latest version is SQL:2016.
As a consequence, many developers don’t know that SQL hasn’t been limited to the relational algebra or the relational model since 1999. SQL:1999 introduced operations that don't exist in relational algebra (
lateral) and types (
arrays!) that break the traditional interpretation of the first normal form.0
Since then, so for 19 years, whether or not a SQL feature fits the relational idea isn’t important anymore. What is important is that a feature has well-defined semantics and solves a real problem. The academic approach has given way to a pragmatic one. Today, the SQL standard has a practical solution for almost every data processing problem. Some of them stay within the relational domain, while others do not.
Don’t say relational database when referring to SQL databases. SQL is really more than just relational.
It’s really too bad that many developers still use SQL in the same way it was being used 25 years ago. I believe the main reasons are a lack of knowledge and interest1 among developers along with poor support for modern SQL in database products.
Let’s have a look at this argument in the context of MySQL. Considering its market share, I think that MySQL’s lack of modern SQL has contributed more than its fair share to this unfortunate situation. I once touched on that argument in my 2013 blog post “MySQL is as Bad for SQL as MongoDB is to NoSQL”. The key message was that “MongoDB is a popular, yet poor representative of its species—just like MySQL is”. Joe Celko has expressed his opinion about MySQL differently: “MySQL is not SQL, it merely borrows the keywords from SQL”.
You can see some examples of the questionable interpretation of SQL in the MySQL WAT talk on YouTube.2 Note that this video is from 2012 and uses MySQL 5.5 (the current GA version at that time). Since then, MySQL 5.6 and 5.7 came out, which improved the situation substantially. The default settings on a fresh installation are much better now.3
It is particularly nice that they were really thinking about how to mitigate the effects of changing defaults. When they enabled
ONLY_FULL_GROUP_BY by default, for example, they went the extra mile to implement the most complete functional dependencies checking among the major SQL databases:
About the same time MySQL 5.7 was released, I stopped bashing MySQL. Of course I'm kidding. I'm still bashing MySQL occasionally…but it has become a bit harder since then.
By the way, did you know MySQL still doesn’t support
check constraints? Just as in previous versions, you can use
check constraints in the
create table statement but they are silently ignored. Yes—ignored without warning. Even MariaDB fixed that a year ago.
Uhm, I’m bashing again! Sorry—old habits die hard.
Nevertheless, the development philosophy of MySQL has visibly changed over the last few releases. What happened? You know the answer already: MySQL is under new management since Oracle bought it through Sun. I must admit: it might have been the best thing that happened to SQL in the past 10 years, and I really mean SQL—not MySQL.
The reason I think a single database release has a dramatic effect on the entire SQL ecosystem is simple: MySQL is the weakest link in the chain. If you strengthen that link, the entire chain becomes stronger. Let me elaborate.
MySQL is very popular. According to db-engines.com, it’s the second most popular SQL database overall. More importantly: it is, by a huge margin, the most popular free SQL database. This has a big effect on anyone who has to cope with more than one specific SQL database. These are often software vendors that make products like content management systems (CRMs), e-commerce software, or object-relational mappers (ORMs). Due to its immense popularity, such vendors often need to support MySQL. Only a few of them bite the bullet and truly support multiple database—Java Object Oriented Querying (jOOQ) really stands out in this regard. Many vendors just limit themselves to the commonly supported SQL dialect, i.e. MySQL.
Another important group affected by MySQL’s omnipresence are people learning SQL. They can reasonably assume that the most popular free SQL database is a good foundation for learning. What they don't know is that MySQL limits their SQL-foo to the weakest SQL dialect among those being widely used. Based loosely on Joe Celko’s statement: these people know the keywords, but don’t understand their real meaning. Worse still, they have not heard anything about modern SQL features.
Last week, that all changed when Oracle finally published a generally available (GA) release of MySQL 8.0. This is a landmark release as MySQL eventually evolved beyond SQL-92 and the purely relational dogma. Among a few other standard SQL features, MySQL now supports window functions (
over) and common table expressions (
with). Without a doubt, these are the two most important post-SQL-92 features.
The days are numbered in which software vendors claim they cannot use these features because MySQL doesn't support them. Window functions and CTEs are now in the documentation of the most popular free SQL database. Let me therefore boldly claim: MySQL 8.0 is one small step for a database, one giant leap for SQL.4
It gets even better and the future is bright! As a consequence of Oracle getting its hands on MySQL, some of the original MySQL team (among them the original creator) created the MySQL fork MariaDB. Apparently, their strategy is to add many new features to convince MySQL users to consider their competing product. Personally I think they sacrifice quality—very much like they did before with MySQL—but that’s another story. Here it is more relevant that MariaDB has been validating
check constraints for a year now. That raises a question: how much longer can MySQL afford to ignore
check constraints? Or to put it another way, how much longer can they endure my bashing ;)
check constraints, MariaDB 10.2 also introduced window functions and common table expressions (CTEs). At that time, MySQL had a beta with CTEs but no window functions. MariaDB is moving faster.5
In 10.3, MariaDB is set to release “system versioned tables”. In a nutshell: once activated for a table, system versioning keeps old versions for updated and deleted rows. By default, queries return the current version as usual, but you can use a special syntax (
as of) to get older versions. Your can read more about this in MariaDBs announcement.
System versioning was introduced into the SQL standard in 2011. As it looks now, MariaDB will be the first free SQL database supporting it. I hope this an incentive for other vendors—and also for users asking their vendors to support more modern SQL features!
Now that the adoption of modern SQL has finally gained some traction, there is only one problem left: the gory details. The features defined by the standard have many subfeatures, and due to their sheer number, it is common practice to support only some of them. That means it is not enough to say that a database supports window functions. Which window functions does it actually support? Which frame units (
groups)? The answers to these questions make all the difference between a marketing gag and a powerful feature.
In my mission to make modern SQL more accessible to developers, I’m testing these details so I can highlight the differences between products. The results of these tests are shown in matrices like the ones above. The rest of this article will thus briefly go through the new standard SQL features introduced with MySQL 8.0 and discuss some implementation differences. As you will see, MySQL 8.0 is pretty good in this regard. The notable exception is its JSON functionality.
There is SQL before window functions and SQL after window functions. Without exaggeration, window functions are a game changer. Once you understood window functions, you cannot imagine how you could ever have lived without them. The most common use cases, for example finding the best N rows per group, building running totals or moving averages, and grouping consecutive events, are just the tip of the iceberg. Window functions are one of the most important tools to avoid self-joins. That alone makes many queries less redundant and much faster. Window functions are so powerful that even newcomers like several Apache SQL implementations (Hive, Impala, Spark), NuoDB and Google BigQuery introduced them years ago. It’s really fair to say that MySQL is pretty late to this party.
The following matrix shows the support of the
over clause for some major SQL databases. As you can see, MySQL’s implementation actually exceeds the capabilities of “the world’s most advanced open source relational database”, as PostgreSQL claims on its new homepage. However, PostgreSQL 11 is set to recapture the leader position in this area.
The actual set of window functions offered by MySQL 8.0 is also pretty close to the state of the art:
Common Table Expressions (
The next major enhancement for MySQL 8.0 are common table expressions or the
with [recursive] clause. Important use cases are traversing graphs with a single query, generating an arbitrary number of rows, converting CSV strings to rows (reversed
group_concat) or just literate SQL.
Again, MySQL’s first implementation closes the gap.
Other Standard SQL Features
Besides window functions and the
with clause, MySQL 8.0 also introduces some other standard SQL features. However compared to the previous two, these are by no means killer features.
As you can see, Oracle pushes standard SQL JSON support. The Oracle database and MySQL are currently the leaders in this area (and both are from the same vendor!). The
json_arrayagg functions were even backported to MySQL 5.7.22. However, it’s also notable that MySQL doesn’t follow the standard syntax for these two functions. Modifiers defined in the standard (e.g. an
order by clause) are generally not supported.
Json_objectagg neither recognizes the keywords
value nor accepts the colon (
:) to separate attribute names and values. It looks like MySQL parses these as regular functions calls—as opposed to syntax described by the standard.
It’s also interesting to see that
null values incorrectly, very much like the Oracle database (they don’t default to
absent on null6). Seeing the same issue in two supposedly unrelated products is always interesting. Adding the fact that both products come from the same vendor adds another twist.
The two last features in the list,
grouping function (related to
rollup) and column names in the
from clause are solutions to pretty specific problems. Their MySQL 8.0 implementation is basically on par with that of other databases.
Furthermore, MySQL 8.0 also introduced standard SQL roles. The reason this is not listed in the matrix above is simple: the matrices are based on actual tests I run against all these databases. My homegrown testing framework does not yet support test cases that require multiple users—currently all test are run with a default user, so I cannot test access rights yet. However, the time for that will come—stay tuned.
Other Notable Enhancements
I'd like to close this article with MySQL 8.0 fixes and improvements that are not related to the SQL standard.
One of them is about using the
desc modifier in index declarations:
CREATE INDEX … ON … (<column> [ASC|DESC], …)
Most—if not all—databases use the same logic in the index creation as for the
order by clause, i.e. by default, the order of column values is ascending. Sometimes it is needed to sort some index columns in the opposite direction. That’s when you specify
desc in an index. Here’s what the MySQL 5.7 documentation said about this:
index_col_namespecification can end with
DESC. These keywords are permitted for future extensions for specifying ascending or descending index value storage. Currently, they are parsed but ignored; index values are always stored in ascending order.
“They are parsed but ignored”? To be more specific: they are parsed but ignored without warning very much like
check constraints mentioned above.
However, this has been fixed with MySQL 8.0. Now there is a warning. Just kidding!
Desc is honored now.
There are many other improvements in MySQL 8.0. Please refer to “What’s New in MySQL 8.0?” for a great overview. How about a small appetizer:
Histograms (optimizer statistics)
Better defaults (no more