Serge

MariaDB CONNECT Storage Engine as an ETL (or ELT) ?

SergeJanuary 28, 2014November 22, 2018Leave a comment

The MariaDB CONNECT Storage Engine allows to access heterogeneous data sources. In my previous post I show you how to use the MariaDB CONNECT Storage Engine to access an Oracle database. This is quite easy through the CONNECT Storage Engine ODBC table type.
Keep on reading!

MariaDB CONNECT Storage Engine and non MySQL syntax selects

SergeDecember 30, 2013Leave a comment

When you pass a SQL command through MySQL it has to be compliant with MySQL syntax as it will have to go through the MySQL parser. Can we pass non MySQL syntax compatible selects to a target ODBC database ? MariaDB CONNECT Storage Engine allows through ODBC to access heterogeneous data sources like Oracle or Microsoft SQL Server.
In my previous post I showed you how to use the CONNECT Storage Engine ODBC table type to access an Oracle database. It was possible to run select commands against these remote ODBC tables. In many cases It would be very interesting to run the sql command directly on the target database.
The first reason would be for efficiency as this would avoid a lot of network roundtrips between the Oracle server and the MariaDB Server. The second reason would be to avoid the restriction on MySQL syntax compliance.

With the MySQL CONNECT storage Engine it is possible to define a remote ODBC table and to associate a SQL request to this table definition. This SQL request will be executed on the remote server. For example for a remote Oracle database it is possible to pass a request using proprietary syntax. In the case of Oracle suppose we want to use the Oracle proprietary "CONNECT BY" syntax that does not exists in MySQL, we will do it that way:

create table emp_hierarchy  
ENGINE=CONNECT 
TABLE_TYPE=ODBC tabname='EMP' CONNECTION='DSN=orcl;UID=scott;PWD=manager1' 
srcdef='SELECT empno, ename, mgr, LEVEL FROM emp CONNECT BY PRIOR empno = mgr;';

The table definition includes a srcdef argument that defines the SQL request that will be executed on the Oracle target database. When we request this table on MariaDB everything is in fact done on the Oracle server.

MariaDB [test]> select * from emp_hierarchy;
+-------+--------+------+-------+
| EMPNO | ENAME  | MGR  | LEVEL |
+-------+--------+------+-------+
|  7788 | SCOTT  | 7566 |     1 |
|  7876 | ADAMS  | 7788 |     2 |
…
|  7839 | KING   | NULL |     1 |
…
|  7521 | WARD   | 7698 |     3 |
+-------+--------+------+-------+
43 rows in set (2.54 sec)
MariaDB [test]>

Beside this basic example this open the door to the ability to execute any select on the target database. This allows for example to take benefit of Oracle's In-Database SQL Analytics which includes a lot of powerful functions(Ranking, Windowing, Reporting Aggregates, LAG/LEAD, FIRST/LAST, Inverse Percentile, Hypothetical Rank and Distribution, Pattern Matching, Modeling, Advanced aggregations..). The same idea applies to powerful spatial or full text capabilities.
And do not forget that with the MariaDB CONNECT Storage Engine you can aggregate many of these table located on different server to be seen as a single table. For that you use the CONNECT TBL table type. This also gives the benefit of parallel execution on these remote servers.

What would be great would be to have someone write a MySQL procedure wrapper that transparently create the CONNECT table to run the query against(something like "callRemoteSQL") ;-).

MariaDB CONNECT Storage Engine access to Oracle 11GR2

SergeDecember 9, 2013Leave a comment

MariaDB CONNECT Storage Engine allows to access heterogeneous datasourses. This includes various file formats. But this also includes ODBC accessible datasources
The CONNECT Storage Engine ODBC table type allows to access SQLite, Excel, SQL Server or Oracle databases. Some nice features of the ODBC CONNECT table type are:
- Auto discovery of table structure. This means that you do not need to specify the columns of the target table.
- Condition push down. This is an optimization that push filtering conditions to the target database. This can avoid a lot of network roundtrips.

Let us have a look and test the Oracle access through the CONNECT Storage Engine. I downloaded, unzipped and installed Oracle 11gR2 in a virtual box.
Let us start Oracle 11gR2

[oracle@centos1 ~]$ sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 29 15:15:59 2013
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
SQL> connect sys/manager1 as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area  617975808 bytes
Fixed Size		    2215864 bytes
Variable Size		  377487432 bytes
Database Buffers	  230686720 bytes
Redo Buffers		    7585792 bytes
Database mounted.
Database opened.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Now let us start the oracle listener (this concept doest not exist in MySQL).

[oracle@centos1 ~]$ lsnrctl start
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 29-NOV-2013 20:08:20
Copyright (c) 1991, 2009, Oracle.  All rights reserved.
Starting /u01/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.1.0 - Production
System parameter file is /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/diag/tnslsnr/centos1/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=centos1)(PORT=1521)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                29-NOV-2013 20:08:20
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/centos1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=centos1)(PORT=1521)))
Services Summary...
Service "orcl" has 1 instance(s).
  Instance "orcl", status UNKNOWN, has 1 handler(s) for this service...
Service "plsextproc" has 1 instance(s).
  Instance "plsextproc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
[oracle@centos1 ~]$

On the machine with the MariaDB CONNECT Storage Engine we need to install the Oracle ODBC Driver. We obtain it through the installation of the Oracle instant client. Let us download and unzip the instant client. Installation is just unzip of the instant client zip files.

unzip instantclient-basic-linux.x64-12.1.0.1.0.zip
unzip instantclient-odbc-linux.x64-12.1.0.1.0.zip
unzip instantclient-sdk-linux.x64-12.1.0.1.0.zip

We need to add the path of the ODBC driver shared library. Let us 'vi .bash_profile' and add the required LD_LIBRARY_PATH.

CLIENT_HOME=/home/oraclec/instantclient_12_1 ; export CLIENT_HOME
LD_LIBRARY_PATH=$CLIENT_HOME:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH

On linux we also need to install the unixODBC open source ODBC driver manager :

yum install unixODBC
yum install unixODBC-devel

The configuration of the unixODBC driver manager is simple and just requires 2 files updates:

/etc/odbcinst.ini wich points the driver to the installed shared library

[Oracle 12c ODBC driver]
Description     = Oracle ODBC driver for Oracle 12c
Driver          = /home/oraclec/instantclient_12_1/libsqora.so.12.1

/etc/odbc.ini wich defines here the DSN that will be used for the ODBC connection

[orcl]
Driver       = Oracle 12c ODBC driver
ServerName   = //centos1.localdomain:1521/oracle
DSN          = orcl
UserName = scott
Password = manager1

How to test that the unixODBC + oracle ODBC driver is correctly installed ?
The unixODBC includes an ODBC client tool 'isql'. Let us use it to test our installation and the DSN we just created.

root@centos1 ~]# isql orcl scott manager1
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select * from emp;
+-------+-----------+----------+-------+--------------------+----------
| EMPNO | ENAME     | JOB      | MGR   | HIREDATE           | SAL
+-------+-----------+----------+-------+--------------------+----------
| 7369  | SMITH     | CLERK    | 7902  | 1980-12-17 00:00:00| 800
…
+-------+-----------+----------+-------+--------------------+----------
SQLRowCount returns -1
15 rows fetched
SQL>

Now let us create a CONNECT table of ODBC type. we will make this MariaDB ODBC table point to the Oracle 11g 'scott.emp' table.

[root@centos1 ~]# mysql -u root -pmanager1 -S /usr/local/mariadb10data/mariadb10.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.0.6-MariaDB-log MariaDB Server
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
MariaDB [test]> create table emp  ENGINE=CONNECT TABLE_TYPE=ODBC tabname='emp' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';
ERROR 1105 (HY000): Unsupported table type ODBC

Come on ! what happened ? I just installed the latest 10.0.6 MariaDB and the CONNECT storage Engine does not support the ODBC table type. That is true and that is an issue caused by old machine being used to build the binary the unnixODBC. This problem is not present in the MariaDB 10.0.6 rpm packages. But I have multiple tar based MariaDB version installed on my Centos box(in fact a virtual box vm) and I do not want any rpm based installation. Hopefully a storage engine is just a shared library.

Let us download MariaDB-10.0.6-centos6-x86_64-connect-engine.rpm and unpack the rpm without installing it :

[root@centos1 ~]# rpm2cpio ../MariaDB-10.0.6-centos6-x86_64-connect-engine.rpm | cpio --extract  --make-directories
8216 blocs
[root@centos1 plugin]# cp ./usr/lib64/mysql/plugin/ha_connect.so ha_connect.so /usr/local/mariadb10/lib/plugin/

We now have the CONNECT storage with the support of the ODBC table type. Our second try :

[root@centos1 local]# mysql -u root -pmanager1 -S /usr/local/mariadb10data/mariadb10.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.0.6-MariaDB-log MariaDB Server
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use test
Database changed
MariaDB [test]> create table toto  ENGINE=CONNECT TABLE_TYPE=ODBC tabname='emp' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';
ERROR 1105 (HY000): [unixODBC][Driver Manager]Can't open lib '/home/oraclec/instantclient_12_1/libsqora.so.12.1' : file not found

?
I had forgotten that the mysql service is run under the mysql linux user. This user also needs to have access to the Oracle ODBC driver (libsqora.so.12.1 shared library provided by the Oracle instant client).

CLIENT_HOME=/home/oraclec/instantclient_12_1 ; export CLIENT_HOME
LD_LIBRARY_PATH=$CLIENT_HOME:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH

Once done everything should goes better:

MariaDB [test]> create table toto  ENGINE=CONNECT TABLE_TYPE=ODBC tabname='EMP' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';Query OK, 0 rows affected (0.79 sec)
MariaDB [test]>
MariaDB [test]> select * from toto;
+-------+--------+-----------+------+------------+---------+---------+--------+
| EMPNO | ENAME  | JOB       | MGR  | HIREDATE   | SAL     | COMM    | DEPTNO |
+-------+--------+-----------+------+------------+---------+---------+--------+
|  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |    NULL |     20 |
…
|  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |    NULL |     10 |
+-------+--------+-----------+------+------------+---------+---------+--------+
14 rows in set (0.09 sec)
 
MariaDB [test]> insert into toto values ( 8000,'serge','peintre',7902,'1980-12-17',123,234,20);
Query OK, 1 row affected (0.15 sec)
 
MariaDB [test]> select * from toto;+-------+--------+-----------+------+------------+---------+---------+--------+
| EMPNO | ENAME  | JOB       | MGR  | HIREDATE   | SAL     | COMM    | DEPTNO |
+-------+--------+-----------+------+------------+---------+---------+--------+
|  8000 | serge  | peintre   | 7902 | 1980-12-17 |  123.00 |  234.00 |     20 |
|  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |    NULL |     20 |
…
|  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |    NULL |     10 |
+-------+--------+-----------+------+------------+---------+---------+--------+
15 rows in set (0.03 sec)

That looks nice. I have created an ODBC table pointing to a remote Oracle 11g table. That was quite simple as the auto discovery mechanism has automatically discovered the structure of the Oracle table and converted Oracle data types to compatible mysql column type.
Let us try to update Oracle data through MySQL :

MariaDB [test]> update toto set sal=999 where empno=8000;
ERROR 1296 (HY000): Got error 174 'No DELETE/UPDATE of ODBC tablesd' from CONNECT

This is not supported ! That is right. This ODBC table update feature has been developed and is currently only available in the launchpad repository of the CONNECT Storage Engine.
CONNECT Storage Engine Sources on launchpad

MariaDB CONNECT Storage Engine replay & slides available

SergeNovember 8, 2013Leave a comment

The slides and replay of yesterday's webinar on the MariaDB CONNECT storage engine have just been posted. First I want to thank the numerous attendees.
You have shown great interest on the parallel execution of query on distributed MySQL Servers. I agree this is cool.
The ODBC capabilities seems also to generate interest. This make it simple to access an ODBC datasource (SQLServer, Oracle ...) from plain MySQL syntax.

Here to view the replay
Here to get the slides

Let us know about your test. You just need to download MariaDB 10.0.5 which includes the CONNECT storage engine. Give us your feedback : positive, negative, improvement requests, bugs, doc pb, ...

Here to download the latest MariaDB 10.0.5 Beta release
Here to get the CONNECT Storage Engine Documentation

MariaDB CONNECT Storage Engine and parallelism

SergeNovember 5, 2013Leave a comment

The CONNECT Storage engine implement the concept of a table made of multiple tables. These underlying tables can be distributed remotely. For example the underlying remote tables can be of ODBC or MySQL table type. this allows to execute distributed queries. What is nice is that we can execute this distributed query with parallelism.

How does it work ?

To explain it let us suppose we have 4 nodes : Node0 and Node1 Node2 Node3
Node0 has a MariaDB 10.0.4 installed with the CONNECT storage engine activated.

MariaDB [dbt3]> install plugin connect  soname 'ha_connect.so';

On the 3 other nodes we have MariaDB or plain MySQL installed.
the 'lineitem' table of the dbt3 benchmark has been created and loaded with one chunk of the 'lineitem' table on each node. This is a sharded table. For example to generate the second chunk of 3 chunks 'lineitem' table:

./dbgen -T L -fF -q -b dists.dss -s 1 -C 3 -S 2

Now on node0 we can create a CONNECT table. This table definition embed a SRCDEF parameter that force the computation of an aggregate on the remote server :

create Table lineitem1   ENGINE=CONNECT TABLE_TYPE=MYSQL
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
connection='mysql://proxy:manager1@node1:3306/dbt3/lineitem3';

if we do a select on this table on node0 we get the result data that has been aggregated on node1. We access only one shard of the data.

MariaDB [dbt3]> select * from lineitem1;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
10000 rows in set (26.82 sec)

Let us now do the same on for node2 and node3 :

create Table lineitem2   ENGINE=CONNECT TABLE_TYPE=MYSQL
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
connection='mysql://proxy:manager1@node2:3306/dbt3/lineitem3';
 
create Table lineitem3   ENGINE=CONNECT TABLE_TYPE=MYSQL
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
connection='mysql://proxy:manager1@node3:3306/dbt3/lineitem3';

Now we can use the ability of the the CONNECT storage engine to build a table that allows to see the 3 shards as a single table :

create Table alllineitem   (`l_suppkey` INT(4) NOT NULL,`qt` DOUBLE ) ENGINE=CONNECT
TABLE_TYPE=TBL
table_list='lineitem1,lineitem2,lineitem3';

We can now query this table. All the aggregation computation will be done on the remote nodes and not on node0:

MariaDB [dbt3]> select * from alllineitem;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9998 | 2395 |
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
30000 rows in set (1 min 19.23 sec)

This basically takes 3 times longer than querying a single shard. This is because the request is executed sequentially on node 1 2 and 3.

We can request the CONNECT storage engine to do the job in parallel by adding the extra option : option_list='thread=1';

create Table alllineitem2   (`l_suppkey` INT(4) NOT NULL,`qt` DOUBLE ) ENGINE=CONNECT
TABLE_TYPE=TBL
table_list='lineitem1,lineitem2,lineitem3' option_list='thread=1';

If we rerun the query we get

MariaDB [dbt3]> select * from alllineitem2;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
30000 rows in set (26.69 sec)

We go back to the same execution time as with a single shard ! Is not that cool. I am sure you have a lot of objections, ideas about this approach. I do. But it is nice anyways.

If you want to know more do not forget to attend to the MariaDB CONNECT Storage Engine webinar :
November 7, 2013 - 5 pm CET / 4 PM UTC / 8 AM PST Register

and if you want to give it a try you are welcome and the doc to help is here:
MariaDB Connect Storage Engine documentation
Your help is welcome : bug reports, documentation fixes, usage feedback, suggestions.
Be indulgent it is still Alpha software 😉

MariaDB CONNECT storage engine webinar / 7 Nov

SergeNovember 4, 2013Leave a comment

This week I will have the opportunity to deliver a webinar on the MariaDB CONNECT Storage Engine.
The MariaDB CONNECT Storage Engine allows to access various file formats (CSV, XML, Excel, etc). It give access to any ODBC data sources (Oracle, DB2, SQLServer, SQLite etc). It also allows to access remote MySQL tables. A CONNECT table itself can be a set of remote MySQL tables. This opens the door to interesting distributed architectures that can help to address big data.
This webinar is a technical overview of the MariaDB CONNECT Storage Engine and it will show you typical use cases to help you get benefits from your existing data sources.
We will see how to use the MariaDB CONNECT Storage Engine.

Register for this webinar to learn what benefits you can get from the MariaDB CONNECT Storage Engine :
November 7, 2013 - 5 pm CET / 4 PM UTC / 8 AM PST Register

MySQL Fabric with MariaDB Galera Cluster ?

SergeOctober 31, 2013Leave a comment

MySQL Fabric is a very promising sharding framework. If I take Ulf Wendel definition of MySQL Fabric :

MySQL Fabric is an administration tool to build large “farms” of MySQL servers. In its most basic form, a farm is a collection of MySQL Replication clusters. In its most advanced form, a farm is a collection of MySQL Replication clusters with sharding on top.

So MySQL Fabric takes care of two very orthogonal features :

High availability of servers
Sharding of data

Let us forget about sharding and look at the High availability infrastructure.

Servers are included in groups, called "High Availability Groups" when we talk about HA.
Each Server has an associated Status (or Role): primary secondary, spare
Each Server has also a mode : Offline, Read-only, and Read-Write.
The implementation has been made to allow various HA implementation patterns.
The most common HA pattern is the Master/Slave HA group ( in that case we should call it a "replica set" which is the terminology used in MongoDB or Facebook MySQL Pool Scanner (MPS).

Mats Kindahl in his blog post on MySQL Fabric High Availability Groups mentioned that other HA solutions are possible for an availability group :

Shared Storage with SAN or NAS
Replicated storage like DRBD
MySQL Cluster shared nothing cluster

In the case of a HA group based on MySQL Cluster the group is self-managing regarding HA and MySQL Fabric does not handles the failover. With the "Shared Storage" and "Replicated storage" availability groups the secondary servers will be offline.

So one of my ideas that I hope is feasible would be to use MariaDB Galera Cluster as another HA solution with MySQL Fabric. The main advantage of this solution relates to the characteristics of MariaDB Galera Cluster. MariaDB Galera Cluster is an Active-active multi-master topology with synchronous replication. MariaDB Galera Cluster being innoDB based does not carry all the usage limitations associated with MySQL Cluster (main one being limited join capabilities).

Regarding to MySQL fabric the behavior of an availability group based on MariaDB Galera Cluster is identical to MySQL Cluster. It is a self-managing availability group.

MariaDB Galera Cluster

MariaDB Galera Cluster
Getting Started with MariaDB Galera Cluster

MySQL Fabric

MySQL Fabric: A new kid in the MySQL sharding world  2013-10-09 Serge Frezefond
MySQL Fabric: High Availability Groups  2013-10-21 Mats Kindahl
A Brief Introduction to MySQL Fabric  2013-09-21 Mats Kindahl
MySQL Fabric - Sharding - Introduction  2013-09-21 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Simple Example  2013-09-22 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Shard Maintenance  2013-09-27 VN
MySQL Fabric - Sharding - Migrating From an Unsharded to a Sharded Setup  2013-09-22 VN
Installing MySQL Fabric on Windows  2013-10-03 Todd Farmer
MySQL 5.7 Fabric: any good?  2013-09-23 Ulf Wendel

Writing a Fault-tolerant Database Application using MySQL Fabric  2013-09-21 Alfranio Junior
Sharding PHP with MySQL Fabric  2013-10-09 Johannes Schlüter
MySQL Fabric support in Connector/Python  2013-09-22 Geert Vanderkelen
MySQL Connector/J with Fabric Support  2013-09-21 Jess Balint

InfiniDB column store moves to open source ! Congrats !

SergeOctober 15, 2013Leave a comment

Like TokuDB, InfiniDB is now a fully open source server product. In the past infiniDB was "almost open source". The open source version was an old release with no access to the advance functions like MPP multi-server execution. This is no more the case. With InfiniDB 4 the open source version is the latest release giving access to all the advanced functionalities.

This is a really great move for the MariaDB / MySQL ecosystem. InfiniDB and TokuDB were two unique pieces of technology in the MySQL ecosystem. Having them both open source will trigger a broader adoption that will benefit to their enterprise releases.

Having in the MySQL ecosystem a column oriented database specifically designed for big data analytics is filling a real customer need. Column stores overcome the query limitations that exist in traditional RDBMS. InfiniDB is extremely good at using multicore server and massively parallel processing with multi-servers. InfiniDB can scale up on multi-cores server and scale out on a distributed architecture.

So thanks a lot to the InfiniDB team for their move. This will greatly benefit to them and to the dynamism of the MySQL / MariaDB ecosystem.

InfiniDB community web site
InfiniDB 4 sources
InfiniDB Enterprise web site

A new kid in the MySQL sharding world

SergeOctober 9, 2013Leave a comment

MySQL Connect 2013 has been a great edition. There was of course a lot of nice announcements of improvements in the the core MySQL server technology. One of the major announcement that received a lot of buzz was MySQL Fabric. MySQL Fabric is an infrastructure component aimed at simplifying construction of a highly available, sharded, MySQL server based architecture.

Horizontal scale out for MySQL is a hard problem. The MySQL sharding requirement has up till now only be addressed in a non general solution way. Using MySQL at Scale remains a big challenge.

All the big actors of the web have been faced with this scale out issue. They all have developed their own tools/framework to address this need. The new cloud providers have also been faced with this requirement when trying to offer database as a service solutions around MySQL with transparent elasticity.

We can mention the following Sharding solutions davelopped by the key players of the web :

Google/Youtube has built Vitess (in Go) and open sourced it
Twitter has built Gizzard (in Scala) and open sourced it
Tumblr has built Jetpants (in Ruby) and open sourced it
Facebook has also it own sharding framework which is a range based model. The HA is addressed with MHA. they have presented their architecture at various events

Theses frameworks cover the needs:

Master promotions,
Cloning slaves,
Supports a range-based sharding scheme for MySQL
Rebalancing shards,
Split a range-based shard into N new shards

High Availability is also addressed by most of these frameworks. MySQL Fabric also handles HA. All the improvements made around replication have helped handle correctly the HA part of the architecture in a more resilient way.

So now with the introduction of MySQL Fabric there is new sharding framework available. So what ? Interesting ?
Yes it is very interesting.

First, MySQL fabric has been written in python which in my opinion is a good point as devops love python.
Second, MySQL fabric has been released under the GPL license which is also a very good point.
Last MySQL Fabric is aimed as a general purpose sharding framework.

MySQL fabric is quite simple :

A Fabric Server holding and serving the sharding metadata.
A set of commands to do various action (split shard ,…)
Specific connectors : Java, Python to request shard location to the Fabric server through XMLRPC call.

Two questions for me:

What is the status of the PHP and C drivers regarding MySQL Fabric ?
Is it possible to fully abstract the client code from the fact that the data is sharded. Some connector config could specify the fabric server. The client code would then not be changed at all.

======

A Brief Introduction to MySQL Fabric  2013-09-21 Mats Kindahl
MySQL Fabric - Sharding - Introduction  2013-09-21 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Simple Example  2013-09-22 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Shard Maintenance  2013-09-27 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Migrating From an Unsharded to a Sharded Setup  2013-09-22 VN (Narayanan Venkateswaran)
Installing MySQL Fabric on Windows  2013-10-03 Todd Farmer
MySQL 5.7 Fabric: any good?  2013-09-23 Ulf Wendel
---
Writing a Fault-tolerant Database Application using MySQL Fabric  2013-09-21 Alfranio Junior
MySQL Fabric support in Connector/Python  2013-09-22 Geert Vanderkelen
MySQL Connector/J with Fabric Support  2013-09-21 Jess Balint

Filtering MariaDB Audit Plugin data

SergeOctober 5, 2013Leave a comment

When full auditing is activated with the the MariaDB Audit Plugin a large volume of audit data is generated and it can put an extra burden on the server. The MariaDB audit plugin offers the two following setup variables that allow to restrict what data will be logged.

MariaDB [test]> show variables like '%audit%users';
+-----------------------------+-----------------------+
| Variable_name               | Value                 |
+-----------------------------+-----------------------+
| server_audit_excl_users     |                       |
| server_audit_incl_users     |                       |
+-----------------------------+-----------------------+
14 rows in set (0.00 sec)

These variables allow to restrict the data that is going to be pushed to the audit log.
You can say: I only want to log data from these users

MariaDB [test]> set global server_audit_syslog_incl_users='proxy';

Only activities coming from the user 'proxy' will be logged.

Or you can say : I want to log data except for these users.

MariaDB [test]> set global server_audit_excl_users='user2';

No activities coming from the user 'user2' will be logged.

What happens if a user is both included and excluded ? If a user is both included and excluded database activities for that user will be logged. Include has priority over exclude. It is important to notice that the audit plugin logs data only based on the username. This username is different from the MariaDB and MySQL user definition. For them a user is combination of the username and hostname ('user'@'host').

This capability to filter audit data is crucial to avoid too much burden on the audited server and to avoid generating too much volume of audit data.