
Azure Database for MySQL is a PaaS offering. It has a specific architecture that relies on a gateway, and this brings a huge advantage in the way it handles high availability. If a server fails it is automatically restarted, and the highly resilient storage holding the database is reconnected to the new server. You get HA out of the box without having to care about replicas and failover handling.

If we look at a connection to an Azure Database for MySQL server, it is different from a usual MySQL connection.

mysql -h mysqlpaasmaster.mysql.database.azure.com \
  -u sbtest@mysqlpaasmaster -p \
  --ssl-mode=REQUIRED

We notice:
hostname: mysqlpaasmaster.mysql.database.azure.com
username: sbtest@mysqlpaasmaster

Why do we have the instance name in the username?
Let us look at what the hostname really is, using the Unix host command (dig would also do the trick).

$ host mysqlpaasmaster.mysql.database.azure.com
mysqlpaasmaster.mysql.database.azure.com is an alias for cr5.northeurope1-a.control.database.windows.net.
cr5.northeurope1-a.control.database.windows.net has address 52.138.224.6

The hostname is just an alias to a gateway server (a CNAME, not an A record in the DNS). So the host you connect to is specific to the database's region but carries no information about the MySQL instance you want to reach. This explains why you need to embed the server name in the username when you connect: it is the only way for the gateway to know which instance you want to connect to.

Does this fit with ProxySQL? We might think not, but in fact it works perfectly. ProxySQL, which knows the backend hostnames, is able to inject the hostname into the MySQL protocol when talking to the Azure gateway. This is possible because ProxySQL uses MariaDB Connector/C to communicate with the gateway. With MariaDB Connector/C it is possible to add the '_server_host' connection attribute, which carries the hostname information. This way the gateway knows which instance is referenced without having it in the username.
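For the application, this means nothing special is needed: it connects to ProxySQL with the plain username and lets ProxySQL deal with the gateway. A minimal sketch, assuming ProxySQL runs on a host named proxysql.example.com and listens on its default MySQL port 6033:

mysql -h proxysql.example.com -P 6033 \
  -u sbtest -p

No '@mysqlpaasmaster' suffix is needed here; ProxySQL passes the backend hostname to the Azure gateway itself.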

ProxySQL is a fantastic technology widely used in MySQL / MariaDB architectures, on premises or in the cloud. It has a nice design built around the concepts of host groups and query rules, which are used to route queries to the desired backend server (based on port or regex).

To achieve this routing, ProxySQL uses a set of users that can potentially connect to multiple backends, depending on the status of these backends and the routing query rules. The same applies to the monitoring user, which is common to all the backends.

So, to test how this works with Azure Database for MySQL, I have set up a master with two replicas (which can be geo-replicas in another region if you wish). I have created a single user 'sbtest' in ProxySQL. On this setup I run a simple sysbench to inject traffic, using the oltp_read_write.lua script to generate INSERT, UPDATE, DELETE and SELECT statements and validate that read/write splitting works correctly. And it works like a charm 🙂
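For reference, a sysbench invocation along these lines can be used to generate that traffic (a sketch only; host, credentials and table sizes are illustrative, and the traffic is sent to ProxySQL, not directly to the Azure endpoints):

sysbench oltp_read_write \
  --mysql-host=proxysql.example.com --mysql-port=6033 \
  --mysql-user=sbtest --mysql-password=xxxx \
  --tables=5 --table-size=10000 \
  --threads=4 --time=300 \
  prepare

Running the same command with 'run' instead of 'prepare' starts the mixed read/write workload.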

Here are the host groups: 10 for writes and 20 for reads. Hostgroup 20 contains the two replicas plus the master, which can also be used for reads (if you want the master to focus on writes, you can give it a low weight in hostgroup 20). Hostgroup 10 contains only the master:

MySQL > SELECT hostgroup_id, hostname, status, comment, use_ssl FROM mysql_servers;
+--------------+--------------------------------------------+--------+-------------+---------+
| hostgroup_id | hostname                                   | status | comment     | use_ssl |
+--------------+--------------------------------------------+--------+-------------+---------+
| 10           | mysqlpaasmaster.mysql.database.azure.com   | ONLINE | WRITE GROUP | 1       |
| 20           | mysqlpaasreplica1.mysql.database.azure.com | ONLINE | READ GROUP  | 1       |
| 20           | mysqlpaasreplica2.mysql.database.azure.com | ONLINE | READ GROUP  | 1       |
| 20           | mysqlpaasmaster.mysql.database.azure.com   | ONLINE | WRITE GROUP | 1       |
+--------------+--------------------------------------------+--------+-------------+---------+
4 rows in set (0.00 sec)
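The post does not show how these rows were created; a typical way to define such a layout through the ProxySQL admin interface would be something like this (note use_ssl=1, since the Azure servers are reached over TLS):

INSERT INTO mysql_servers (hostgroup_id, hostname, port, use_ssl, comment)
  VALUES (10, 'mysqlpaasmaster.mysql.database.azure.com', 3306, 1, 'WRITE GROUP');
INSERT INTO mysql_servers (hostgroup_id, hostname, port, use_ssl, comment)
  VALUES (20, 'mysqlpaasreplica1.mysql.database.azure.com', 3306, 1, 'READ GROUP');
INSERT INTO mysql_servers (hostgroup_id, hostname, port, use_ssl, comment)
  VALUES (20, 'mysqlpaasreplica2.mysql.database.azure.com', 3306, 1, 'READ GROUP');
INSERT INTO mysql_servers (hostgroup_id, hostname, port, use_ssl, comment)
  VALUES (20, 'mysqlpaasmaster.mysql.database.azure.com', 3306, 1, 'WRITE GROUP');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;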

Here is the single user used for all the backends.

MySQL > SELECT username,password,active,use_ssl,default_hostgroup FROM mysql_users;
+----------+----------+--------+---------+-------------------+
| username | password | active | use_ssl | default_hostgroup |
+----------+----------+--------+---------+-------------------+
| sbtest   | password | 1      | 0       | 10                | 
+----------+----------+--------+---------+-------------------+
1 row in set (0.00 sec)
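Again as a sketch, this user could have been declared like this in the ProxySQL admin interface:

INSERT INTO mysql_users (username, password, active, use_ssl, default_hostgroup)
  VALUES ('sbtest', 'password', 1, 0, 10);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;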

And here are the query rules to route the queries to the right backend.

MySQL >  SELECT rule_id,match_digest,destination_hostgroup FROM mysql_query_rules;
+---------+-----------------------+-----------------------+
| rule_id | match_digest          | destination_hostgroup |
+---------+-----------------------+-----------------------+
| 1       | ^SELECT .* FOR UPDATE | 10                    |
| 2       | ^SELECT .*            | 20                    |
+---------+-----------------------+-----------------------+
2 rows in set (0.00 sec)
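These two rules could have been created with statements along these lines (a sketch; the rule_id values match the table above):

INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
  VALUES (1, 1, '^SELECT .* FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
  VALUES (2, 1, '^SELECT .*', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;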

Metrics have also been collected in the stats schema. We can see that the master and the replicas have each received their share of the sysbench queries.

MySQL > SELECT hostgroup, srv_host,Queries FROM stats_mysql_connection_pool;
+-----------+--------------------------------------------+---------+
| hostgroup | srv_host                                   | Queries |
+-----------+--------------------------------------------+---------+
| 10        | mysqlpaasmaster.mysql.database.azure.com   | 472     |
| 20        | mysqlpaasreplica1.mysql.database.azure.com | 415     |
| 20        | mysqlpaasmaster.mysql.database.azure.com   | 402     |
| 20        | mysqlpaasreplica2.mysql.database.azure.com | 422     |
+-----------+--------------------------------------------+---------+
4 rows in set (0.00 sec)

Through the stats we also get the digest of all the queries that were run and the hostgroup they ran on. We can see here that all INSERT, UPDATE and DELETE statements were sent to the write hostgroup and the SELECTs to the read hostgroup.

MySQL > SELECT hostgroup, username, digest_text FROM stats_mysql_query_digest;
+-----------+----------+-------------------------------------------------------------+
| hostgroup | username | digest_text                                                 |
+-----------+----------+-------------------------------------------------------------+
| 10        | sbtest   | INSERT INTO sbtest5 (id, k, c, pad) VALUES (?, ?, ?, ?)     |
| 10        | sbtest   | DELETE FROM sbtest2 WHERE id=?                              |
| 10        | sbtest   | UPDATE sbtest2 SET c=? WHERE id=?                           |
| 20        | sbtest   | SELECT c FROM sbtest5 WHERE id BETWEEN ? AND ? ORDER BY c   |
| 20        | sbtest   | SELECT SUM(k) FROM sbtest4 WHERE id BETWEEN ? AND ?         |
...

The monitor schema contains the data collected by the 'monitoruser': log data for connect, ping, read_only checks, and so on. Here, for example, is the ping data used to check the availability of the backends:

MySQL > SELECT hostname, port, time_start_us, ping_success_time_us FROM mysql_server_ping_log;
+--------------------------------------------+------+------------------+----------------------+
| hostname                                   | port | time_start_us    | ping_success_time_us |
+--------------------------------------------+------+------------------+----------------------+
| mysqlpaasreplica1.mysql.database.azure.com | 3306 | 1591785759257052 | 20088                |
| mysqlpaasreplica2.mysql.database.azure.com | 3306 | 1591785759269801 | 19948                |
| mysqlpaasmaster.mysql.database.azure.com   | 3306 | 1591785759282430 | 19831                |
+--------------------------------------------+------+------------------+----------------------+

I hope this helped you understand how ProxySQL works with Azure Database for MySQL/MariaDB.
Using ProxySQL on Azure with Azure Database for MySQL/MariaDB definitely brings a lot of value.

In a previous post I used Terraform to provision a managed version of MariaDB (AWS RDS for MariaDB). Various managed versions of MariaDB exist on the major cloud providers: AWS, Azure, Alibaba Cloud. All of them make it simple to rapidly deploy and operate MariaDB: you benefit from an easy setup, including high availability and backup policies. This is nice, but what if you want to fully control your MariaDB setup, configuration and operating procedures?

A pragmatic approach is to use a Docker containerized version of MariaDB and run it on a Kubernetes cluster. Using Kubernetes as an orchestrator and MariaDB Docker images as building blocks is a good choice to remain cloud agnostic and to be prepared for multi-cloud architectures.

But Kubernetes can be a complex beast. Installing Kubernetes, keeping it highly available and keeping it up to date can consume a lot of time and resources.

It therefore makes a lot of sense to use a managed version of Kubernetes like Azure Kubernetes Service (AKS). It is also economically interesting, as with AKS we do not pay for the Kubernetes control plane. Kubernetes itself is a standard building block that offers the same features on all cloud providers. Kubernetes is a graduated project of the Cloud Native Computing Foundation, which is a strong guarantee of interoperability across cloud providers.

First let us log in to Azure and set SUBSCRIPTION_ID to the subscription we want to use to create the service principal.

$ az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code XXX to authenticate.
[
  {
    "cloudName": "AzureCloud",
    "id": "07ea84f6-0ca4-4f66-ad4c-f145fdb9fc39",
    "isDefault": true,
    "name": "Visual Studio Ultimate",
    "state": "Enabled",
    "tenantId": "e1be9ce1-c3be-436c-af1d-46ffecf854cd",
    "user": {
      "name": "serge@gmail.com",
      "type": "user"
    }
  }
]
$ az account show --query "{subscriptionId:id, tenantId:tenantId}"
{
  "subscriptionId": "07ea84f6-0ca4-4f66-ad4c-f145fdb9fc39",
  "tenantId": "e1be9ce1-c3be-436c-af1d-46feecf854cd"
}
$ export SUBSCRIPTION_ID="07ea84f6-0ca4-4f66-ad4c-f145fdb9fc39"

We first need to create an Azure AD service principal. This is the service account that will be used by Terraform to create resources in Azure.

$ az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/${SUBSCRIPTION_ID}"
{
  "appId": "7b6c7430-1d17-49b8-80be-31b087a3b75c",
  "displayName": "azure-cli-2018-11-09-12-15-59",
  "name": "http://azure-cli-2018-11-09-12-15-59",
  "password": "7957f6ca-d33f-4fa3-993a-8fb018c10fe2",
  "tenant": "e1be9ce1-c3be-436c-af1d-46feecf854cd"
}

We also create an Azure storage container to store the Terraform state file. This is required so that multiple DevOps engineers can work on the same infrastructure through a shared state configuration.

$ az storage container create -n tfstate --account-name='csb07ea84f60ca4x4f66xad4' --account-key='Wo27SNUV3Bkanod424uHStXjlch97lQUMssKpIENytbYsUjU3jscz9SrVogO1gClUZd3qvZs/UwBPqX0gJOU8Q=='
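This container is what the (commented-out) backend "azurerm" block in main.tf below would point at. A sketch of how the remote state backend could then be initialized, with the storage account key shown as a placeholder:

terraform init \
  -backend-config="storage_account_name=csb07ea84f60ca4x4f66xad4" \
  -backend-config="container_name=tfstate" \
  -backend-config="key=terraform.tfstate" \
  -backend-config="access_key=<storage account key>"

With the backend block left commented out, terraform init simply keeps the state in a local terraform.tfstate file.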

We now define a set of variables that will be used by terraform to provision our resources.

# variables.tf
 
variable "client_id" {}
variable "client_secret" {}
variable "agent_count" {
    default = 1
}
variable "ssh_public_key" {
    default = "~/.ssh/id_rsa.pub"
}
variable "dns_prefix" {
    default = "k8stest"
}
variable cluster_name {
    default = "k8stest"
}
variable resource_group_name {
    default = "azure-k8stest"
}
variable location {
    default = "Central US"
}

We can now define the terraform main.tf file.

# main.tf
 
provider "azurerm" {
    version = "~>1.5"
}
 
terraform {
#    backend "azurerm" {}
}
 
provider "kubernetes" {
  host                   = "${azurerm_kubernetes_cluster.k8s.kube_config.0.host}"
  client_certificate     = "${base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.client_certificate)}"
  client_key             = "${base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.client_key)}"
  cluster_ca_certificate = "${base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.cluster_ca_certificate)}"
}

We can now describe the managed Kubernetes cluster we want to create.

# k8s.tf
 
resource "azurerm_resource_group" "k8s" {
    name     = "${var.resource_group_name}"
    location = "${var.location}"
}
 
resource "azurerm_kubernetes_cluster" "k8s" {
    name                = "${var.cluster_name}"
    location            = "${azurerm_resource_group.k8s.location}"
    resource_group_name = "${azurerm_resource_group.k8s.name}"
    dns_prefix          = "${var.dns_prefix}"
 
    linux_profile {
        admin_username = "ubuntu"
 
        ssh_key {
        key_data = "${file("${var.ssh_public_key}")}"
        }
    }
 
    agent_pool_profile {
        name            = "default"
        count           = "${var.agent_count}"
        vm_size         = "Standard_DS2_v2"
        os_type         = "Linux"
        os_disk_size_gb = 30
    }
 
    service_principal {
        client_id     = "${var.client_id}"
        client_secret = "${var.client_secret}"
    }
 
    tags {
        Environment = "Development"
    }
}

We also describe the variables that will be output by the execution of our terraform script.

# output.tf
 
output "client_key" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.client_key}"
}
output "client_certificate" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.client_certificate}"
}
output "cluster_ca_certificate" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.cluster_ca_certificate}"
}
output "cluster_username" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.username}"
}
output "cluster_password" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.password}"
}
output "kube_config" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config_raw}"
}
output "host" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.host}"
}

We first initialize the Terraform state. We then check the plan before it gets executed. Finally we ask Terraform to create all the resources described in the .tf files, including the AKS Kubernetes cluster.

$ terraform init
$ terraform plan
$ terraform apply

We now have our managed Azure Kubernetes cluster (AKS) up and running. We define a shell script containing the environment variables required by Terraform and kubectl (the Kubernetes client).

more setvars.sh 
#!/bin/sh
echo "Setting environment variables for Terraform"
export ARM_SUBSCRIPTION_ID="07ea84f6-0ca4-4f66-ad4c-f145fdb9fc37"
export ARM_CLIENT_ID="7b6c7430-1d17-49b8-80be-31b087a3c75c"
export ARM_CLIENT_SECRET="7957f6ca-d33f-4fa3-993a-8fb018c11fe1"
export ARM_TENANT_ID="e1be9ce1-c3be-436c-af1d-46feecf854cd"
export TF_VAR_client_id="7b6c7430-1d17-49b8-80be-31b087a3b76c"
export TF_VAR_client_secret="7957f6ca-d33f-4fa3-993a-8fb019c10fe1"
export KUBECONFIG=./azurek8s
$ . ./setvars.sh 
$ echo "$(terraform output kube_config)" > ./azurek8s

This basically exports the Kubernetes configuration file required to use the kubectl command. We can now start using the kubectl Kubernetes client. Let us create a Kubernetes secret containing the MariaDB password.

$ kubectl create secret generic mariadb-password --from-literal=password=password1
 
$ az aks get-credentials --resource-group azure-k8stest --name k8stest
$ az aks browse --resource-group azure-k8stest --name k8stest

We now create a persistent volume claim. This is very important for MariaDB: MariaDB is a stateful application and we need to define the storage used for persisting the database. We then deploy the MariaDB Docker container, which will use this persistent volume for the database storage, and finally we create a service to expose the database (a sketch of these manifests follows the commands below).

$ kubectl apply -f persistent-volume-claim.yaml 
$ kubectl apply -f mariadb.yaml 
$ kubectl apply -f mariadb-service.yaml
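The content of these three manifests is not reproduced here. A minimal sketch of what they could look like, assuming the service is named mariadbservice (the name used by the client commands further down) and reusing the mariadb-password secret created above:

# persistent-volume-claim.yaml (sketch, size is illustrative)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb-pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
# mariadb.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mariadb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mariadb
  template:
    metadata:
      labels:
        app: mariadb
    spec:
      containers:
      - name: mariadb
        image: mariadb
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mariadb-password
              key: password
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mariadb-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mariadb-storage
        persistentVolumeClaim:
          claimName: mariadb-pv-claim
---
# mariadb-service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: mariadbservice
spec:
  selector:
    app: mariadb
  ports:
  - port: 3306
    targetPort: 3306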

Once the service is created, service discovery can be used. Kubernetes offers an internal DNS and a ClusterIP associated with the service. This model makes a lot of sense as it abstracts away the pod IP address. Once you have created a Kubernetes MariaDB service it can be accessed from another container running inside the same cluster, in this case a mysql client.

$ kubectl expose deployment mariadb --type=NodePort --name=mariadb-service
$ kubectl run -it --rm --image=mariadb --restart=Never mysql-client --  mysql -h mariadbservice -ppassword1

It is also possible to expose the MariaDB service through a public IP. This is of course something usually not done, for security reasons. Kubernetes asks Azure to create a load balancer and a public IP address that can then be used to access the MariaDB server.

$ kubectl expose service mariadbservice --type=LoadBalancer --name=mariadb-service1 --port=3306 --target-port=3306
$ mysql -h 104.43.217.67 -u root -ppassword1

In the scenario we presented, the infrastructure is described through code (Terraform's HCL language). We then use the Kubernetes client and YAML Kubernetes configuration files to declaratively describe the configuration of the software objects.
It makes a lot of sense to use a managed version of Kubernetes, as Kubernetes itself is a standard building block that offers the same features on all cloud providers. Kubernetes being standardized, this approach keeps us free to run the same open source software stack (here MariaDB) across different cloud providers.

How to rapidly provision MariaDB in the cloud? Various options are available.
A very effective approach is to provision MariaDB with Terraform. Terraform is a powerful tool to deploy infrastructure as code. Terraform is developed by HashiCorp, which started out with the very successful Vagrant deployment tool. Terraform lets you describe all the components of your infrastructure through the HCL language. It aims to make it easier to work with multiple cloud providers by abstracting the description of resources.
Keep on reading!

Oracle has done great technical work with MySQL. Specifically, a nice job has been done around security. There is one useful feature that exists in Oracle MySQL and that currently does not exist in MariaDB.
Oracle MySQL offers the possibility to generate asymmetric key pairs from within the server. It is then possible to use them from within the MySQL server to encrypt, decrypt or sign data without leaving the server. This is a great feature. It is implemented as a set of UDFs (User Defined Functions: CREATE FUNCTION asymmetric_decrypt, asymmetric_encrypt, asymmetric_pub_key … SONAME 'openssl_udf.so';).
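To make the idea concrete, here is roughly what using these functions looks like once the UDFs are installed (an illustrative snippet based on the documented MySQL Enterprise functions, not something available in MariaDB):

SET @priv = create_asymmetric_priv_key('RSA', 2048);
SET @pub  = create_asymmetric_pub_key('RSA', @priv);
SET @ct   = asymmetric_encrypt('RSA', 'sensitive data', @pub);
SELECT asymmetric_decrypt('RSA', @ct, @priv);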
Keep on reading!

Oracle MySQL 8.0 has been declared GA but a critical piece is missing …
MySQL 8 is a fantastic release embedding the work of brilliant Oracle engineering.
I will not detail all the great features of MySQL 8 as there are a lot of great presentations around it. https://mysqlserverteam.com/whats-new-in-mysql-8-0-generally-available/
Keep on reading!

Comparing Oracle MySQL Group Replication and Galera Cluster from a probability perspective seems quite interesting.

At commit time both use a group certification process that requires network round trips. The time taken by these round trips is what mainly determines the cost of a transaction. Let us try to compute an estimate of the cost of the certification process. The duration of these network round trips can be modeled by a random variable with an associated probability distribution.
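As a sketch of that modeling (my notation, not taken from the full post): if certification has to wait for acknowledgements from the n-1 other nodes, and the round-trip times X_i are assumed independent and identically distributed with cumulative distribution function F, then

T_{cert} = \max(X_1, \dots, X_{n-1}), \qquad \Pr(T_{cert} \le t) = F(t)^{\,n-1}

so the expected certification cost grows with the group size and is dominated by the slowest link.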
Keep on reading!

MariaDB 10.1 introduced data-at-rest encryption. By default we provide a file_key_management plugin. This is a basic plugin that stores keys in a file, which can itself be encrypted. This file can come from a USB stick that is removed once the keys have been brought into memory. But this remains a basic solution, not suitable for strict security compliance rules.
Keep on reading!

A question raised by my previous post is: what about MariaDB and native JSON support? In my previous post I mentioned the possibility of using the MariaDB CONNECT storage engine to store and access JSON content in a normal text field. Of course, having a native JSON datatype brings more value: it introduces JSON validation and more efficient access to attributes through an optimized storage format.
Keep on reading!

It is not new that we can store JSON content in a normal table text field; this has always been possible. But two key features were missing: filtering based on JSON content attributes and indexing of the JSON content. With the MariaDB 10.1 CONNECT storage engine we offer support for external content, including JSON files. The MariaDB CONNECT storage engine also comes with a set of JSON-related UDFs. This allows us to do things like the following (sketched below):
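A sketch of what this can look like (table name, file name and columns are illustrative, for a flat JSON file, and the CONNECT engine must be installed first):

-- expose a flat JSON file as a table through the CONNECT storage engine
CREATE TABLE biblio (
  author CHAR(64),
  title  CHAR(128),
  year   INT
) ENGINE=CONNECT TABLE_TYPE=JSON FILE_NAME='biblio.json';

-- filtering on attributes of the JSON content
SELECT author, title FROM biblio WHERE year > 2010;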
Keep on reading!