Skip to content

In previous post I used Terraform to provision a managed version of MariaDB (AWS RDS for MariaDB). There exist various managed version of MariaDB on the major cloud providers : AWS, Azure, Alibaba Cloud. All of these versions offer a simplification to rapidly deploy and operate MariaDB. You benefit from easy setup including High availability and backup policies. This is nice But what if you want to fully control your MariaDB setup, configuration, and operation procedures.

A pragmatic approach is to use a Docker containerized version of MariaDB and to run it on a kubernetes cluster. Using Kubernetes as an orchestrator and MariaDB Docker images as building blocks is a good choice to remain cloud agnostic and to be prepared for multicloud architectures.

But Kubernetes can be a complex beast. Installing kubernetes, keeping it highly available and maintaining it up to date can consume a lot of time/ressources.

It makes a lot of sense to use a managed version of kubernetes like Azure Kubernetes Service(AKS). It is economically interesting as with AKS we do not pay for the kubernetes control plane. Kubernetes itself is a standard building block that offers the same features on all cloud providers. Kubernetes is a graduated project of the Cloud Native computing foundation and this is a strong guaranty of interoperability across different cloud providers.

First let us login to Azure and set the SUBSCRIPTION_ID to the subscription we want to use to create the principal.

$ az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code XXX to authenticate.
[
  {
    "cloudName": "AzureCloud",
    "id": "07ea84f6-0ca4-4f66-ad4c-f145fdb9fc39",
    "isDefault": true,
    "name": "Visual Studio Ultimate",
    "state": "Enabled",
    "tenantId": "e1be9ce1-c3be-436c-af1d-46ffecf854cd",
    "user": {
      "name": "serge@gmail.com",
      "type": "user"
    }
  }
]
$ az account show --query "{subscriptionId:id, tenantId:tenantId}"
{
  "subscriptionId": "07ea84f6-0ca4-4f66-ad4c-f145fdb9fc39",
  "tenantId": "e1be9ce1-c3be-436c-af1d-46feecf854cd"
}
$ export SUBSCRIPTION_ID="07ea84f6-0ca4-4f66-ad4c-f145fdb9fc39"

We first need to create an Azure AD service principal. This is the service account that will be used by Terraform to create resources in Azure.

$ az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/${SUBSCRIPTION_ID}"
{
  "appId": "7b6c7430-1d17-49b8-80be-31b087a3b75c",
  "displayName": "azure-cli-2018-11-09-12-15-59",
  "name": "http://azure-cli-2018-11-09-12-15-59",
  "password": "7957f6ca-d33f-4fa3-993a-8fb018c10fe2",
  "tenant": "e1be9ce1-c3be-436c-af1d-46feecf854cd"
}

We also create an azure storage container to store the Terraform state file. This is required to have multiple devops working on the same infrastructure through a shared storage configuration.

az storage container create -n tfstate --account-name=‘csb07ea84f60ca4x4f66xad4’ --account-key=‘Wo27SNUV3Bkanod424uHStXjlch97lQUMssKpIENytbYsUjU3jscz9SrVogO1gClUZd3qvZs/UwBPqX0gJOU8Q==‘

We now define a set of variables that will be used by terraform to provision our resources.

# variables.tf
 
variable "client_id" {}
variable "client_secret" {}
variable "agent_count" {
    default = 1
}
variable "ssh_public_key" {
    default = "~/.ssh/id_rsa.pub"
}
variable "dns_prefix" {
    default = "k8stest"
}
variable cluster_name {
    default = "k8stest"
}
variable resource_group_name {
    default = "azure-k8stest"
}
variable location {
    default = "Central US"
}

We can now define the terraform main.tf file.

# main.tf
 
provider "azurerm" {
    version = "~>1.5"
}
 
terraform {
#    backend "azurerm" {}
}
 
provider "kubernetes" {
  host                   = "${azurerm_kubernetes_cluster.k8s.kube_config.0.host}"
  client_certificate     = "${base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.client_certificate)}"
  client_key             = "${base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.client_key)}"
  cluster_ca_certificate = "${base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.cluster_ca_certificate)}"
}

we can now describe the managed kubernetes cluster we want to create

# k8s.tf
 
resource "azurerm_resource_group" "k8s" {
    name     = "${var.resource_group_name}"
    location = "${var.location}"
}
 
resource "azurerm_kubernetes_cluster" "k8s" {
    name                = "${var.cluster_name}"
    location            = "${azurerm_resource_group.k8s.location}"
    resource_group_name = "${azurerm_resource_group.k8s.name}"
    dns_prefix          = "${var.dns_prefix}"
 
    linux_profile {
        admin_username = "ubuntu"
 
        ssh_key {
        key_data = "${file("${var.ssh_public_key}")}"
        }
    }
 
    agent_pool_profile {
        name            = "default"
        count           = "${var.agent_count}"
        vm_size         = "Standard_DS2_v2"
        os_type         = "Linux"
        os_disk_size_gb = 30
    }
 
    service_principal {
        client_id     = "${var.client_id}"
        client_secret = "${var.client_secret}"
    }
 
    tags {
        Environment = "Development"
    }
}

We also describe the variables that will be output by the execution of our terraform script.

# output.tf
 
output "client_key" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.client_key}"
}
output "client_certificate" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.client_certificate}"
}
output "cluster_ca_certificate" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.cluster_ca_certificate}"
}
output "cluster_username" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.username}"
}
output "cluster_password" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.password}"
}
output "kube_config" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config_raw}"
}
output "host" {
    value = "${azurerm_kubernetes_cluster.k8s.kube_config.0.host}"
}

We first initialize the terraform state.We then check the plan before it get executed.We now ask terraform to create all the resources described in the .tf files including the AKS Kubernetes cluster.

$ terraform init
$ terraform plan
$ terraform apply

We know have our running managed Azure Kubernetes cluster(AKS). We define a shell script file containing the set of variables required to run kubectl (the kubernetes client)

more setvars.sh 
#!/bin/sh
echo "Setting environment variables for Terraform"
export ARM_SUBSCRIPTION_ID="07ea84f6-0ca4-4f66-ad4c-f145fdb9fc37"
export ARM_CLIENT_ID="7b6c7430-1d17-49b8-80be-31b087a3c75c"
export ARM_CLIENT_SECRET="7957f6ca-d33f-4fa3-993a-8fb018c11fe1"
export ARM_TENANT_ID="e1be9ce1-c3be-436c-af1d-46feecf854cd"
export TF_VAR_client_id="7b6c7430-1d17-49b8-80be-31b087a3b76c"
export TF_VAR_client_secret="7957f6ca-d33f-4fa3-993a-8fb019c10fe1"
export KUBECONFIG=./azurek8s
$ . ./setvars.sh 
$ echo "$(terraform output kube_config)" > ./azurek8s

This will basically export the kubernetes configuration file required to be able to use the kubectl command. We can now start to use the kubectl kubernetes client. Let us create a kubernetes secret containing the mariadb password.

$ kubectl create secret generic mariadb-password --from-literal=password=password1
 
$ az aks get-credentials --resource-group azure-k8stest --name k8stest
$ az aks browse --resource-group azure-k8stest --name k8stest

We now create a persistent volume claim. This is very important for MariaDB as MariaDB is a stateful application and we need to define the storage used for persisting the database. We then deploy the mariadb docker container which will use this physical volume for the database storage. we then create a service to expose the database.

$ kubectl apply -f persistent-volume-claim.yaml 
$ kubectl apply -f mariadb.yaml 
$ kubectl apply -f mariadb-service.yaml

Once the service is created the service discovery can be used. Kubernetes offers an internal dns and a ClusterIP associated with the the service. This model make a lot of sense as it abstract us from the pod IP address. Once you have created a kubernetes MariaDB service it can be accessed from another container running inside this same cluster. In that case a mysql client.

$ kubectl expose deployment mariadb --type=NodePort --name=mariadb-service
$ kubectl run -it --rm --image=mariadb --restart=Never mysql-client --  mysql -h mariadbservice -ppassword1

It is also possible to expose the mariadb service through a public IP. This is of course something usually not done for security reasons. kubernetes asks Azure to create a load balancer and a public IP address that can then be used to access the mariaDB server.

$ kubectl expose service mariadbservice --type=LoadBalancer --name=mariadb-service1 --port=3306 --target-port=3306
$ mysql -h 104.43.217.67 -u root -ppassword1

In the scenario we presented the infrastructure is described through code (terraform hcl language). We then use the kubernetes client and yaml kubernetes configuration file to declaratively describe software objects configuration.
It makes a lot of sense to use a managed version of kubernetes as Kubernetes itself is a standard building block that offers the same features on all cloud providers. Kubernetes being standardize with this approach we keep the freedom to run the same open source software stack (here MariaDB) across different cloud providers .

Engine condition pushdown is a MySQL internal mechanism that is intended to avoid to send non matching rows from the storage engine to the SQL layer. This is very important when the storage engine involves traffic over the network.
This mechanism was initially created to optimize MySQL Cluster (NDB) behavior.
For the NDB storage engine it avoids a lot of round trips through the network. Part of the filtering is done on the remote data node. Without engine condition pushed down the filtering is done by the sql layer after rows have been sent by the storage engine.

The MariaDB CONNECT Storage Engine implement various table type. Some of these table type correspond to remote access. This is the case of the ODBC and MySQL table type. These types can be considered as a more powerful evolution of the federated storage Engine.
In both case the engine condition pushdown gives a great benefit by avoiding network roundtrips.

The conditions are trapped by the storage engine handler through this API implementation:
const COND *ha_connect::cond_push(const COND *cond)

How to use engine condition pushdown with the MariaDB Connect Storage Engine.

The "engine condition pushdown" mechanism should not be confused with the "index condition pushdown" optimization introduced in MariaDB 5.5 and MySQL 5.6. The index condition pushdown was introduced to use as much as possible the index to filter rows.

In the recent years a lot of emphasis has been put on the InnoDB storage engine. This is a great thing. This has turned InnoDB from a very limited transactional storage engine to a very powerful piece of the MySQL architecture. InnoDB is now scalable to many cores. It is also very resilient. It supports advanced features(compression, buffer pool saving ... long list ...). No doubt this was the direction to go. There is still a lot of improvements to come and this is definitely needed for enterprise class applications.

But there are some other part of the server that need some attention and improvement. In the early days of the MySQL plugin architecture some nice concepts have been implemented as storage engines : csv storage engine, federated storage engine, merge storage engine, archive ...
In my personal opinion these implementations are very useful but have not been much improved since their first implementation. They can be considered as excellent proof of concept and should have been pushed much further.

If we look at them :

The Merge storage engine implemented the concept of a table composed of multiple underlying table. The first constrain is that all the table have to be in the myISAM format only. With all the improvements innoDB has gone through regarding performance, including read only and complex query execution this is frustrating. The second big limitation is that all the table need to be strictly identical regarding their structure. Third constraint is that there is no parallel execution of the query. This is not a big surprise as there is currently no parallelism at all in MySQL except in marginal areas (slaves apply, innodb full text indexing). One other big limitation of the merge storage engine is that it cannot handle remotely distributed tables.

The Federated storage engine implemented the concept of a MySQL table located on a remote server. The local and remote table must have an identical table structure. The federated storage engine was reengineered into the federatex storage engine which is included in MariaDB. This new design was done to allow in the future to implement a heterogeneous database access like through an ODBC implementation. This never happened. The federated(x) storage engine keeps a lot of limitation. When you issue a select through a federated table a select * is done to access the remote data. There is no condition push down implementation. This works only with MySQL remote database. The limit clause of the sql statement is not pushed to the underlying system. The condition push down mechanism is not used with this engine. These limitations make the federated storage engine of limited use. The Oracle MySQL version of federated is old and does not implement federatedx improvements.

The CSV storage engine implemented the concept of table representing a csv file external from the database. This was a great idea as the csv format is a very common file format. Unfortunately the csv storage engine is very basic and does not implement the limit clause in select. There is no index associated to the file. Condition pushdown is neither implemented.