Skip to content

Comparing Oracle MySQL Group Replication and Galera Cluster through a probability perpective seems quite interesting.

At commit time both use a group certification process that requires network round trips. The required time for these network roundtrips is what will mainly determined the cost of a transaction. Let us try to compute an estimate of the certification process cost. The duration of these network roundtrips duration can be model by random variable with an associated probability distribution law.
Keep on reading!

At FOSDEM 2014 ProxySQL and MaxScale were both presented. Both are proxy that can help build sophisticated MariaDB/MySQL architectures.
But currently what is the most used proxy with MySQL? It is HAproxy. HAproxy is a level 4 proxy that has no knowledge of the MySQL protocol. Being low level makes it very fast but it cannot accomplish any advanced proxy task. In many case this is enough for pure load balancing. But it cannot handle filtering, routing, query rewriting. This requires to understand the MySQL protocol and to look at the query passing through the proxy.
Keep on reading!

Like TokuDB, InfiniDB is now a fully open source server product. In the past infiniDB was "almost open source". The open source version was an old release with no access to the advance functions like MPP multi-server execution. This is no more the case. With InfiniDB 4 the open source version is the latest release giving access to all the advanced functionalities.

This is a really great move for the MariaDB / MySQL ecosystem. InfiniDB and TokuDB were two unique pieces of technology in the MySQL ecosystem. Having them both open source will trigger a broader adoption that will benefit to their enterprise releases.

Having in the MySQL ecosystem a column oriented database specifically designed for big data analytics is filling a real customer need. Column stores overcome the query limitations that exist in traditional RDBMS. InfiniDB is extremely good at using multicore server and massively parallel processing with multi-servers. InfiniDB can scale up on multi-cores server and scale out on a distributed architecture.

So thanks a lot to the InfiniDB team for their move. This will greatly benefit to them and to the dynamism of the MySQL / MariaDB ecosystem.

InfiniDB community web site
InfiniDB 4 sources
InfiniDB Enterprise web site

MySQL Connect 2013 has been a great edition. There was of course a lot of nice announcements of improvements in the the core MySQL server technology. One of the major announcement that received a lot of buzz was MySQL Fabric. MySQL Fabric is an infrastructure component aimed at simplifying construction of a highly available, sharded, MySQL server based architecture.

Horizontal scale out for MySQL is a hard problem. The MySQL sharding requirement has up till now only be addressed in a non general solution way. Using MySQL at Scale remains a big challenge.

All the big actors of the web have been faced with this scale out issue. They all have developed their own tools/framework to address this need. The new cloud providers have also been faced with this requirement when trying to offer database as a service solutions around MySQL with transparent elasticity.

We can mention the following Sharding solutions davelopped by the key players of the web :

  • Google/Youtube has built Vitess (in Go) and open sourced it
  • Twitter has built Gizzard (in Scala) and open sourced it
  • Tumblr has built Jetpants (in Ruby) and open sourced it
  • Facebook has also it own sharding framework which is a range based model. The HA is addressed with MHA. they have presented their architecture at various events

Theses frameworks cover the needs:

  • Master promotions,
  • Cloning slaves,
  • Supports a range-based sharding scheme for MySQL
  • Rebalancing shards,
  • Split a range-based shard into N new shards

High Availability is also addressed by most of these frameworks. MySQL Fabric also handles HA. All the improvements made around replication have helped handle correctly the HA part of the architecture in a more resilient way.

So now with the introduction of MySQL Fabric there is new sharding framework available. So what ? Interesting ?
Yes it is very interesting.

First, MySQL fabric has been written in python which in my opinion is a good point as devops love python.
Second, MySQL fabric has been released under the GPL license which is also a very good point.
Last MySQL Fabric is aimed as a general purpose sharding framework.

 

MySQL fabric is quite simple :

  • A Fabric Server holding and serving the sharding metadata.
  • A set of commands to do various action (split shard ,…)
  • Specific connectors : Java, Python to request shard location to the Fabric server through XMLRPC call.

Two questions for me:

  • What is the status of the PHP and C drivers regarding MySQL Fabric ?
  • Is it possible to fully abstract the client code from the fact that the data is sharded. Some connector config could specify the fabric server. The client code would then not be changed at all.

======

A Brief Introduction to MySQL Fabric  2013-09-21 Mats Kindahl
MySQL Fabric - Sharding - Introduction  2013-09-21 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Simple Example  2013-09-22 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Shard Maintenance  2013-09-27 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Migrating From an Unsharded to a Sharded Setup  2013-09-22 VN (Narayanan Venkateswaran)
Installing MySQL Fabric on Windows  2013-10-03 Todd Farmer
MySQL 5.7 Fabric: any good?  2013-09-23 Ulf Wendel
---
Writing a Fault-tolerant Database Application using MySQL Fabric  2013-09-21 Alfranio Junior
MySQL Fabric support in Connector/Python  2013-09-22 Geert Vanderkelen
MySQL Connector/J with Fabric Support  2013-09-21 Jess Balint

InnoDB plugin offers an adaptative compression that is very interesting for performance.
This compression works at a table level. Like most optimization techniques it is not black or white.
You have to decide depending on IO patterns / CPU usage / Memory usage / disks constraints what tables are good candidates for compression.
I was trying to define a methodology to help decide which tables to compress to reach performance gains.
I thought I could use MySQL "show status" command or information_schema tables. Unfortunately MySQL does not offer any per tables IO statistics.

InnoDB plugin offers an adaptative compression that is very interesting for performance.
This compression works at a table level. Like most optimization techniques this is not black or white.
You have to decide depending on IO patterns / CPU usage / Memory usage / disks constraints what tables are good candidates for compression.

I was trying to define a methodology to help decide which tables to compress to reach performance gains.

I thought I could use MySQL "show status" command or information_schema tables. Unfortunately MySQL does not offer any per tables IO statistics.
I thought maybe I could use some linux command. Unfortunately on linux you can only get statistics at a block device level with iostat command.
As a last ressort I thought maybe I could use MySQL Enterprise Monitor / Query Analyzer.
The MySQL Query Analyzer gives you nice aggregated statistics per queries : nb execs / mean time ...
All of these statistics are at a query level. Most queries are joins, and many tables are accessed through multiple different queries. So you get no information at a table level.
So none of the above techniques solve the issue.

Mark Callaghan had packaged a set of patch for MySQL 5.0.
This set of patches includes IO statistics per InnoDB tablespace. In conjunction with the use of the innodb_file_per_table option this gives InnoDB per tables IO statistics.

A good reason to use inodb_file_per_table -- per-table IO statistics

More changes for performance monitoring

The SHOW INNODB FILE STATUS command reports IO statistics per InnoDB tablespace and thus per table When innodb_file_per_table is used.

The InnoDB plugin that offers compression is part of MySQL 5.1. and this patch was for MySQL 5.0 only.

But a very good news came last friday :
Marc Callaghan has just released the Facebook patch for MySQL 5.1 .
This gives us a efficient tool to decide which tables are good candidate for compression.

If anyone has time to write a script that would based on these statistics and on a server workload pinpoint the tables best canditates for compression.
This would be nice. I agree this would be easier if these statistics were part of the information schema. So if someone could do that too this would be definitely a great
contribution.

By default InnoDB uses REPEATABLE READ as its isolation level. So this is the isolation level used with innoDB by almost all MySQL users.

The default isolation level of Oracle is READ COMMITTED. READ COMMITTED is the mode widely used by Oracle users. This mode incurs less penalty on the server scalability by allowing to support more concurrency.

The reason why InnoDB use REPEATABLE READ as its default is historical. This is a related to the way MySQL replication was developed . MySQL replication until 5.1 functioned with a statement based replication mechanism. This means that statements that occurs on the master server are replayed on the slave server. The statement base replication mode does not permit to use the READ COMMITTED isolation level. In that case replication will not guaranty consistency between the slave and the master.

The MySQL widely used statement based replication has some major drawbacks. The first one is that to guaranty the same effect of statements on slave as on master innoDB need to be careful when generating id through the autoincrement mechanism. The second impact is that InnoDB also needs to avoid phantoms to guaranty the same effect of a statement on the master and slave.

To generate consecutive autoincrement id during a multi rows insert a table level lock is necessary.

To handle phantom issue InnoDB has implemented a mechanism that consist in locking gaps.

UPDATE ... WHERE or a DELETE ... FROM ... WHERE requires InnoDB to set an exclusive next-key lock on every record the search encounters. This can dramatically impact concurrency on the server by forbidding inserts into the gaps.
INSERT ... SELECT will put a shared lock on all the selected rows. This basically prevent any update on the selected rows. This also seriously impact concurrency. This case is very common when reports are run on an OLTP server.
SELECT ... FROM ... FOR UPDATE sets exclusive next-key locks on all index records the search encounters and also on the corresponding clustered index records if a secondary index is used in the search.
SELECT ... FROM ... LOCK IN SHARE MODE sets shared next-key locks on all index records the search encounters

Oracle does not behave that way. The Oracle replication mechanism is a row based mechanism. None of the blocking locks mentioned above with InnoDB will happen with Oracle.

Can we have with MySQL the same behavior as Oracle users ?

Yes we can ! Since version 5.1 MySQL offers a row based replication mode closed to what ORACLE is doing. This basically allow to run MySQL in READ COMMITTED isolation level. This can really benefit to MySQL performance and scalability. To achieve that it is mandatory to use Row based replication. This suppress most restrictions related to MySQL statement based replication. The READ COMMITTED isolation level relax most of the locking issues impaired by the REPEATABLE READ MODE. This seems anyway to be used rarely by MySQL 5.1 users !

I have no figures to prove performance improvement impacts of READ COMMITED isolation level with InnoDB,
so you should give it a try ! 🙂