Comparing Oracle MySQL Group Replication and Galera Cluster through a probability perpective seems quite interesting.
At commit time both use a group certification process that requires network round trips. The required time for these network roundtrips is what will mainly determined the cost of a transaction. Let us try to compute an estimate of the certification process cost. The duration of these network roundtrips duration can be model by random variable with an associated probability distribution law.
For Galera all nodes take part in the process so the time taken is dimensioned by the worst roundtrip duration.
For group replication it is a majority voting and the time taken is dimensioned by the worst case of the majority.
Said in another way : if we have N+1 nodes (N being an even number to be sure to have a good majority voting process) you randomly get N non correlated durations for each round trip. We exclude one node(the local node were the transaction is taking place) as no network roundtrip is required even if logically it takes part in the certification process. This local cost is anyway much smaller that the others roundtrip cots.
For Galera Cluster the longest of the N durations is what will dimension the duration of the certification process.
For Group Replication the maximum of the N/2 smallest duration will be the duration of the certification process.
For example for a 9 nodes cluster we need to compare the slowest of 8 vs the slowest of the 4 fastest among 8 😉
These two metrics have a very different statistical behaviour ! One is converging to the mean while the other one is increasing with the number of nodes and unfortunately depending on the statistical standard deviation ! The bigger the number of nodes and the bigger the standard deviation the bigger it will diverge from the mean value 🙁
What is very important to notice is the more you increase the total number of nodes in the cluster the bigger the random variable expressed by the maximum of the N random numbers grows. The speed of the growth depends on the standard deviation and the number of nodes. On the other hand the biggest of the N/2 smallest durations converges to the mean and will show less variance with more nodes.
So the more nodes we have in a cluster the biggest is the performance gap between Group replication and Galera Cluster. In the same way with a high standard deviation Galera Cluster will have a natural tendency to go far from the mean . It is important to notice that the standard deviation will naturally increase with a loaded system. For a loaded system the network roundtrip cost can statistically have outliers values far from the mean value. These outliers do not mater for Group replication but they do for Galera Cluster as the cost of certification is based on the worst roundtrip duration.
This very simple statistical fact certainly explains the difference in benchmarks comparing Group replication and Galera Cluster performance.
For those interested by the mathematics behind this behaviour the expected value for the maximum of N normal random variables is describe here :
This document basically tells that if you have N independent random variables distributed according to a normal law with mean M and standard deviation S then the expected value of the maximum of all these random variables is :
M + k*S
in our case k is a coefficient that depends on the number of nodes :
.56418958354775628695 for 2 nodes
.84628437532163443042 for 3 nodes
1.0293753730039641321 for 4 nodes
1.1629644736405196128 for 5 nodes
1.2672063606114712976 for 6 nodes
So to summarise if we have 4 nodes the Galera cluster certification process will cost on average M + 1.02*S
This can be big as it depends on the standard deviation and unfortunately when you benchmark you are close to the limit and this can introduce very high variance in the network roundtrip cost.
The distribution law was supposed to be a normal law but it seems for network roundtrip a better law might be an Erlang_distribution. I think that might just make the result even worse. There might be some folks in the readers of this blog post that have better knowledge than me on system/network modelling : . any objection is welcome 😉