MaxScale now has its own public IRC channel

MaxScale is a proxy for the MySQL protocol built with a modular architecture.
The underlying concept of modules makes it possible to extend the MaxScale proxy services. The current version implements read/write splitting and connection load balancing. Internally, MySQL queries go through a SQL parsing phase, which gives MaxScale great capabilities for query routing.

So if you are interested in MaxScale, do not hesitate to join the #maxscale chat room on irc.freenode.net: its developers will be there. Bring your ideas and problems. You are welcome.

The MaxScale source code is on GitHub

There is also a MaxScale Google group available

And a good starting point on how to install and configure MaxScale is Ivan Zoratti’s blog post

MaxScale, ProxySQL and MySQL Proxy

At FOSDEM 2014 ProxySQL and MaxScale were both presented. Both are proxies that can help build sophisticated MariaDB/MySQL architectures.
But what is currently the most widely used proxy with MySQL? It is HAProxy. HAProxy is a layer 4 proxy that has no knowledge of the MySQL protocol. Being low level makes it very fast, but it cannot accomplish any advanced proxy task. In many cases this is enough for pure load balancing, but it cannot handle filtering, routing or query rewriting. Those require understanding the MySQL protocol and looking at the queries passing through the proxy.

ProxySQL and MaxScale, unlike HAProxy, are layer 7 proxies. This means that they both understand the MySQL network protocol and have access to the query payload.

At FOSDEM ProxySQL was presented by its developer René Cannaò. ProxySQL has interesting features like a very efficient query cache, query rewriting capabilities and a request queuing mechanism for when the backend is not available. ProxySQL uses regular expressions for query filtering and rewriting; it does not provide full query parsing.

MaxScale was presented by two of its developers, Massimiliano Pinto and Vilho Raatikka. MaxScale is a more ambitious project built around a modular architecture. The underlying concept of modules makes it possible to extend the MaxScale proxy services. The current version implements read/write splitting and load balancing. Internally, MySQL queries go through a SQL parsing phase, which gives MaxScale great capabilities for query routing.

Talking about proxy technology would not be complete without mentioning MySQL Proxy, and it was indeed mentioned during the presentations. When MySQL Proxy was born it attracted a lot of users, as it was a very promising technology: its extensibility through the Lua scripting language makes it very flexible. MySQL Proxy does not embed a SQL parser, and basic tokenization was done in Lua. Unfortunately MySQL Proxy has stayed forever in alpha status, with no further improvement and no visibility on its future.

So if you are interested in proxy technology and want to know more about MaxScale, do not miss tomorrow's webinar "MariaDB Enterprise and MaxScale – why all the buzz?" delivered by Ivan Zoratti.

To get the MaxScale sources
To file bugs

MariaDB CONNECT Storage Engine as an ETL (or ELT)?

The MariaDB CONNECT Storage Engine gives access to heterogeneous data sources. In my previous post I showed you how to use it to access an Oracle database; this is quite easy through the CONNECT Storage Engine ODBC table type.

In most architectures where heterogeneous databases are involved, an ETL (Extract-Transform-Load) tool is used to move data across the databases. Most ETL tools, like Informatica, DataStage, Talend, GoldenGate or Tungsten, perform complex data transformations using middle-tier ETL engines. They require describing the transformation process in a proprietary manner. This approach requires an extra product with extra expertise, and data is generally duplicated when it is moved around.

In many cases this extra product and the associated expertise can be avoided. Data can be moved directly from database to database, and transformations can be applied directly inside the source or target database through standard SQL commands.
This concept was successfully brought to market by products like Sunopsis (bought by Oracle). The approach is known as ELT (Extract-Load-Transform): data is moved directly from database to database, and the database itself is used to transform the data through standard SQL commands. This makes sense, as some databases have very strong data transformation capabilities through powerful SQL extensions (analytical SQL) and parallel execution.

With the MariaDB CONNECT Storage Engine, linking the two databases directly through an ODBC connection makes it possible to implement this ELT pattern. Doing data transformation at the database level is very beneficial: all the advanced features of the source or target databases can be used.
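To give a rough idea, an ELT-style load with CONNECT could look like the following sketch (the names ora_orders, orders_summary, customer_id and amount are purely illustrative, and the 'orcl' DSN is the one used later in this post):

create table ora_orders ENGINE=CONNECT TABLE_TYPE=ODBC
tabname='ORDERS' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';

-- Extract and load go through the ODBC table; the transformation (here a simple
-- aggregation) is plain SQL executed by MariaDB while loading a local table.
create table orders_summary ENGINE=InnoDB
as select customer_id, sum(amount) total_amount
from ora_orders
group by customer_id;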

In our case, transformations will be applied both on the source and on the target database. Let us consider the following ODBC table created with the MariaDB CONNECT Storage Engine:

create Table lineitem2   ENGINE=CONNECT TABLE_TYPE=ODBC
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
CONNECTION='DSN=orcl;UID=scott;PWD=manager1';

We can now query this table. All the aggregation specified in the SRCDEF argument will be done on the remote Oracle RDBMS node and not on the MySQL node. Some extra computation can be done on the local MySQL table. Here we compute the sum of the 10 biggest aggregated quantities.

MariaDB [dbt3]> select sum(qt) from (select qt  from lineitem2  order by qt DESC limit 10) as derivedT;
+-----------+
| sum(qt)   |
+-----------+
|     31444 |
+-----------+

This example is very simplistic, but it gives you an idea of what can be achieved with this pattern. So if you want to move data from an Oracle (or any ODBC-accessible) database to MariaDB, you should give the MariaDB CONNECT storage engine a try.

We already have a few customers using the MariaDB CONNECT Storage Engine to simplify their data integration processes (currently with Oracle and Sybase databases).

If you come to the FOSDEM 2014 MySQL devroom on Sunday, Feb 2nd, I will present and will be available to answer your questions. If you are there, do not hesitate to register for the MySQL Community dinner on Saturday, Feb 1st.

Slides on the MariaDB CONNECT Storage Engine

Let us know about your tests. You just need to download MariaDB 10.0.7, which includes the CONNECT storage engine. Give us your feedback: positive, negative, improvement requests, bugs, documentation problems, …

Download the latest MariaDB 10.0.7 Beta release
CONNECT Storage Engine Documentation

MariaDB CONNECT Storage Engine and non-MySQL syntax selects

When you pass a SQL command through MySQL it has to be compliant with MySQL syntax, as it will have to go through the MySQL parser. Can we pass selects that are not MySQL-syntax compatible to a target ODBC database? The MariaDB CONNECT Storage Engine gives access, through ODBC, to heterogeneous data sources like Oracle or Microsoft SQL Server.
In my previous post I showed you how to use the CONNECT Storage Engine ODBC table type to access an Oracle database. It was possible to run select commands against these remote ODBC tables. In many cases it would be very interesting to run the SQL command directly on the target database.
The first reason is efficiency, as this avoids a lot of network round trips between the Oracle server and the MariaDB server. The second reason is to avoid the restriction of MySQL syntax compliance.

With the MariaDB CONNECT Storage Engine it is possible to define a remote ODBC table and to associate a SQL request with this table definition. This SQL request will be executed on the remote server. For example, for a remote Oracle database it is possible to pass a request using proprietary syntax. Suppose we want to use Oracle's proprietary "CONNECT BY" syntax, which does not exist in MySQL; we can do it this way:

create table emp_hierarchy  
ENGINE=CONNECT 
TABLE_TYPE=ODBC tabname='EMP' CONNECTION='DSN=orcl;UID=scott;PWD=manager1' 
srcdef='SELECT empno, ename, mgr, LEVEL FROM emp CONNECT BY PRIOR empno = mgr;';

The table definition includes a srcdef argument that defines the SQL request that will be executed on the Oracle target database. When we query this table from MariaDB, everything is in fact done on the Oracle server.

MariaDB [test]> select * from emp_hierarchy;
+-------+--------+------+-------+
| EMPNO | ENAME  | MGR  | LEVEL |
+-------+--------+------+-------+
|  7788 | SCOTT  | 7566 |     1 |
|  7876 | ADAMS  | 7788 |     2 |
…
|  7839 | KING   | NULL |     1 |
…
|  7521 | WARD   | 7698 |     3 |
+-------+--------+------+-------+
43 rows in set (2.54 sec)
MariaDB [test]>

Beyond this basic example, this opens the door to executing any select on the target database. It allows you, for example, to benefit from Oracle's in-database SQL analytics, which include a lot of powerful functions (ranking, windowing, reporting aggregates, LAG/LEAD, FIRST/LAST, inverse percentile, hypothetical rank and distribution, pattern matching, modeling, advanced aggregations, …). The same idea applies to powerful spatial or full-text capabilities.
And do not forget that with the MariaDB CONNECT Storage Engine you can aggregate many of these tables located on different servers so that they are seen as a single table. For that you use the CONNECT TBL table type, which also gives the benefit of parallel execution on the remote servers.
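For instance, a CONNECT table whose SRCDEF uses an Oracle analytic function could be sketched like this (the emp_ranked name and the ranking query are illustrative, reusing the 'orcl' DSN and the scott.emp table from above):

create table emp_ranked ENGINE=CONNECT TABLE_TYPE=ODBC
CONNECTION='DSN=orcl;UID=scott;PWD=manager1'
SRCDEF='SELECT deptno, ename, sal, RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) sal_rank FROM emp';

-- The ranking is computed by Oracle; MariaDB only fetches the result rows.
select * from emp_ranked where sal_rank <= 3;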

What would be great would be for someone to write a MySQL stored procedure wrapper that transparently creates the CONNECT table to run the query against (something like "callRemoteSQL") ;-).
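Such a wrapper does not exist today, but a naive sketch of what it could look like is given below (callRemoteSQL and tmp_remote are hypothetical names, and no quoting or error handling of the inputs is done):

DELIMITER //
CREATE PROCEDURE callRemoteSQL(IN remote_query TEXT, IN dsn VARCHAR(255))
BEGIN
  -- Build a throw-away CONNECT table around the remote query, then read it.
  DROP TABLE IF EXISTS tmp_remote;
  SET @ddl = CONCAT('CREATE TABLE tmp_remote ENGINE=CONNECT TABLE_TYPE=ODBC ',
                    'SRCDEF=''', remote_query, ''' CONNECTION=''', dsn, '''');
  PREPARE stmt FROM @ddl;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
  SELECT * FROM tmp_remote;
END //
DELIMITER ;

CALL callRemoteSQL('SELECT empno, ename FROM emp CONNECT BY PRIOR empno = mgr',
                   'DSN=orcl;UID=scott;PWD=manager1');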

MariaDB CONNECT Storage Engine access to Oracle 11gR2

The MariaDB CONNECT Storage Engine gives access to heterogeneous data sources. This includes various file formats, but also ODBC-accessible data sources.
The CONNECT Storage Engine ODBC table type gives access to SQLite, Excel, SQL Server or Oracle databases. Some nice features of the ODBC CONNECT table type are:
- Auto discovery of the table structure. This means that you do not need to specify the columns of the target table.
- Condition push down. This optimization pushes filtering conditions to the target database, which can avoid a lot of network round trips (a small sketch of both features follows this list).
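To make these two features concrete, here is a minimal sketch; it assumes the 'orcl' DSN and the scott.emp table that are configured in the rest of this post:

-- No column list is given: CONNECT discovers the structure of EMP by itself.
create table emp_odbc ENGINE=CONNECT TABLE_TYPE=ODBC
tabname='EMP' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';

-- The filter on deptno can be pushed down to Oracle instead of being applied locally.
select ename, sal from emp_odbc where deptno = 10;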

Let us have a look and test the Oracle access through the CONNECT Storage Engine. I downloaded, unzipped and installed Oracle 11gR2 in a VirtualBox VM.
Let us start Oracle 11gR2:

[oracle@centos1 ~]$ sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 29 15:15:59 2013
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
SQL> connect sys/manager1 as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area  617975808 bytes
Fixed Size		    2215864 bytes
Variable Size		  377487432 bytes
Database Buffers	  230686720 bytes
Redo Buffers		    7585792 bytes
Database mounted.
Database opened.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Now let us start the Oracle listener (this concept does not exist in MySQL).

[oracle@centos1 ~]$ lsnrctl start
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 29-NOV-2013 20:08:20
Copyright (c) 1991, 2009, Oracle.  All rights reserved.
Starting /u01/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.1.0 - Production
System parameter file is /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/diag/tnslsnr/centos1/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=centos1)(PORT=1521)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                29-NOV-2013 20:08:20
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/centos1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=centos1)(PORT=1521)))
Services Summary...
Service "orcl" has 1 instance(s).
  Instance "orcl", status UNKNOWN, has 1 handler(s) for this service...
Service "plsextproc" has 1 instance(s).
  Instance "plsextproc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
[oracle@centos1 ~]$

On the machine with the MariaDB CONNECT Storage Engine we need to install the Oracle ODBC driver. We obtain it by installing the Oracle Instant Client. Installation just consists of unzipping the Instant Client zip files.

unzip instantclient-basic-linux.x64-12.1.0.1.0.zip
unzip instantclient-odbc-linux.x64-12.1.0.1.0.zip
unzip instantclient-sdk-linux.x64-12.1.0.1.0.zip

We need to add the path of the ODBC driver shared library. Let us ‘vi .bash_profile’ and add the required LD_LIBRARY_PATH.

CLIENT_HOME=/home/oraclec/instantclient_12_1 ; export CLIENT_HOME
LD_LIBRARY_PATH=$CLIENT_HOME:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH

On Linux we also need to install unixODBC, the open source ODBC driver manager:

yum install unixODBC
yum install unixODBC-devel

The configuration of the unixODBC driver manager is simple and just requires updating two files:

/etc/odbcinst.ini, which points the driver to the installed shared library:

[Oracle 12c ODBC driver]
Description     = Oracle ODBC driver for Oracle 12c
Driver          = /home/oraclec/instantclient_12_1/libsqora.so.12.1

/etc/odbc.ini, which defines the DSN that will be used for the ODBC connection:

[orcl]
Driver       = Oracle 12c ODBC driver
ServerName   = //centos1.localdomain:1521/oracle
DSN          = orcl
UserName = scott
Password = manager1

How can we test that unixODBC and the Oracle ODBC driver are correctly installed?
unixODBC includes an ODBC client tool, 'isql'. Let us use it to test our installation and the DSN we just created.

[root@centos1 ~]# isql orcl scott manager1
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select * from emp;
+-------+-----------+----------+-------+--------------------+----------
| EMPNO | ENAME     | JOB      | MGR   | HIREDATE           | SAL
+-------+-----------+----------+-------+--------------------+----------
| 7369  | SMITH     | CLERK    | 7902  | 1980-12-17 00:00:00| 800
…
+-------+-----------+----------+-------+--------------------+----------
SQLRowCount returns -1
15 rows fetched
SQL>

Now let us create a CONNECT table of ODBC type. We will make this MariaDB ODBC table point to the Oracle 11g 'scott.emp' table.

[root@centos1 ~]# mysql -u root -pmanager1 -S /usr/local/mariadb10data/mariadb10.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.0.6-MariaDB-log MariaDB Server
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
MariaDB [test]> create table emp  ENGINE=CONNECT TABLE_TYPE=ODBC tabname='emp' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';
ERROR 1105 (HY000): Unsupported table type ODBC

Come on! What happened? I had just installed the latest MariaDB 10.0.6 and the CONNECT Storage Engine did not support the ODBC table type. That is true, and it is an issue caused by the old machine used to build the tar binaries, which lacked unixODBC. This problem is not present in the MariaDB 10.0.6 RPM packages. But I have multiple tar-based MariaDB versions installed on my CentOS box (in fact a VirtualBox VM) and I do not want any RPM-based installation. Fortunately a storage engine is just a shared library.

Let us download MariaDB-10.0.6-centos6-x86_64-connect-engine.rpm and unpack the RPM without installing it:

[root@centos1 ~]# rpm2cpio ../MariaDB-10.0.6-centos6-x86_64-connect-engine.rpm | cpio --extract  --make-directories
8216 blocs
[root@centos1 plugin]# cp ./usr/lib64/mysql/plugin/ha_connect.so /usr/local/mariadb10/lib/plugin/

We now have the CONNECT storage engine with support for the ODBC table type. Our second try:

[root@centos1 local]# mysql -u root -pmanager1 -S /usr/local/mariadb10data/mariadb10.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.0.6-MariaDB-log MariaDB Server
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use test
Database changed
MariaDB [test]> create table toto  ENGINE=CONNECT TABLE_TYPE=ODBC tabname='emp' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';
ERROR 1105 (HY000): [unixODBC][Driver Manager]Can't open lib '/home/oraclec/instantclient_12_1/libsqora.so.12.1' : file not found

What happened here? I had forgotten that the mysql service runs under the mysql Linux user. This user also needs access to the Oracle ODBC driver (the libsqora.so.12.1 shared library provided by the Oracle Instant Client), so the same environment settings are needed for it:

CLIENT_HOME=/home/oraclec/instantclient_12_1 ; export CLIENT_HOME
LD_LIBRARY_PATH=$CLIENT_HOME:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH

Once this is done everything goes better:

MariaDB [test]> create table toto  ENGINE=CONNECT TABLE_TYPE=ODBC tabname='EMP' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';
Query OK, 0 rows affected (0.79 sec)
MariaDB [test]>
MariaDB [test]> select * from toto;
+-------+--------+-----------+------+------------+---------+---------+--------+
| EMPNO | ENAME  | JOB       | MGR  | HIREDATE   | SAL     | COMM    | DEPTNO |
+-------+--------+-----------+------+------------+---------+---------+--------+
|  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |    NULL |     20 |
…
|  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |    NULL |     10 |
+-------+--------+-----------+------+------------+---------+---------+--------+
14 rows in set (0.09 sec)
 
MariaDB [test]> insert into toto values ( 8000,'serge','peintre',7902,'1980-12-17',123,234,20);
Query OK, 1 row affected (0.15 sec)
 
MariaDB [test]> select * from toto;
+-------+--------+-----------+------+------------+---------+---------+--------+
| EMPNO | ENAME  | JOB       | MGR  | HIREDATE   | SAL     | COMM    | DEPTNO |
+-------+--------+-----------+------+------------+---------+---------+--------+
|  8000 | serge  | peintre   | 7902 | 1980-12-17 |  123.00 |  234.00 |     20 |
|  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |    NULL |     20 |
…
|  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |    NULL |     10 |
+-------+--------+-----------+------+------------+---------+---------+--------+
15 rows in set (0.03 sec)

That looks nice. I have created an ODBC table pointing to a remote Oracle 11g table. That was quite simple, as the auto discovery mechanism discovered the structure of the Oracle table and converted the Oracle data types to compatible MySQL column types.
Let us try to update Oracle data through MySQL:

MariaDB [test]> update toto set sal=999 where empno=8000;
ERROR 1296 (HY000): Got error 174 'No DELETE/UPDATE of ODBC tablesd' from CONNECT

This is not supported! That is right: this ODBC table update feature has been developed and is currently only available in the Launchpad repository of the CONNECT Storage Engine.
CONNECT Storage Engine sources on Launchpad

MariaDB CONNECT Storage Engine replay & slides available

The slides and replay of yesterday's webinar on the MariaDB CONNECT storage engine have just been posted. First I want to thank the numerous attendees.
You have shown great interest in the parallel execution of queries on distributed MySQL servers. I agree this is cool.
The ODBC capabilities also seem to generate interest. They make it simple to access an ODBC data source (SQL Server, Oracle, …) from plain MySQL syntax.

Here to view the replay
Here to get the slides

Let us know about your tests. You just need to download MariaDB 10.0.5, which includes the CONNECT storage engine. Give us your feedback: positive, negative, improvement requests, bugs, documentation problems, …

Here to download the latest MariaDB 10.0.5 Beta release
Here to get the CONNECT Storage Engine Documentation

MariaDB CONNECT Storage Engine and parallelism

The CONNECT Storage Engine implements the concept of a table made of multiple tables. These underlying tables can be distributed on remote servers; for example, they can be of ODBC or MYSQL table type. This allows distributed queries to be executed, and what is nice is that we can execute these distributed queries with parallelism.

How does it work?

To explain it, let us suppose we have 4 nodes: node0, node1, node2 and node3.
Node0 has MariaDB 10.0.4 installed with the CONNECT storage engine activated.

MariaDB [dbt3]> install plugin connect  soname 'ha_connect.so';

On the 3 other nodes we have MariaDB or plain MySQL installed.
The 'lineitem' table of the dbt3 benchmark has been created on each node and loaded with one chunk of the 'lineitem' data: this is a sharded table. For example, to generate the second of the 3 chunks of the 'lineitem' table:

./dbgen -T L -fF -q -b dists.dss -s 1 -C 3 -S 2

Now on node0 we can create a CONNECT table. This table definition embeds a SRCDEF parameter that forces the computation of an aggregate on the remote server:

create Table lineitem1   ENGINE=CONNECT TABLE_TYPE=MYSQL
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
connection='mysql://proxy:manager1@node1:3306/dbt3/lineitem3';

If we do a select on this table on node0, we get the result data that has been aggregated on node1. We access only one shard of the data.

MariaDB [dbt3]> select * from lineitem1;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
10000 rows in set (26.82 sec)

Let us now do the same for node2 and node3:

create Table lineitem2   ENGINE=CONNECT TABLE_TYPE=MYSQL
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
connection='mysql://proxy:manager1@node2:3306/dbt3/lineitem3';
 
create Table lineitem3   ENGINE=CONNECT TABLE_TYPE=MYSQL
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
connection='mysql://proxy:manager1@node3:3306/dbt3/lineitem3';

Now we can use the ability of the CONNECT storage engine to build a table that lets us see the 3 shards as a single table:

create Table alllineitem   (`l_suppkey` INT(4) NOT NULL,`qt` DOUBLE ) ENGINE=CONNECT
TABLE_TYPE=TBL
table_list='lineitem1,lineitem2,lineitem3';

We can now query this table. All the aggregation computation will be done on the remote nodes and not on node0:

MariaDB [dbt3]> select * from alllineitem;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9998 | 2395 |
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
30000 rows in set (1 min 19.23 sec)

This basically takes 3 times longer than querying a single shard, because the request is executed sequentially on nodes 1, 2 and 3.

We can ask the CONNECT storage engine to do the job in parallel by adding the extra option option_list='thread=1':

create Table alllineitem2   (`l_suppkey` INT(4) NOT NULL,`qt` DOUBLE ) ENGINE=CONNECT
TABLE_TYPE=TBL
table_list='lineitem1,lineitem2,lineitem3' option_list='thread=1';

If we rerun the query we get

MariaDB [dbt3]> select * from alllineitem2;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
30000 rows in set (26.69 sec)

We go back to the same execution time as with a single shard! Isn't that cool? I am sure you have a lot of objections and ideas about this approach. I do. But it is nice anyway.

If you want to know more, do not forget to attend the MariaDB CONNECT Storage Engine webinar:
November 7, 2013 – 5 pm CET / 4 PM UTC / 8 AM PST Register

And if you want to give it a try, you are welcome; the documentation to help you is here:
MariaDB CONNECT Storage Engine documentation
Your help is welcome: bug reports, documentation fixes, usage feedback, suggestions.
Be indulgent, it is still alpha software ;-)

MariaDB CONNECT storage engine webinar / 7 Nov

This week I will have the opportunity to deliver a webinar on the MariaDB CONNECT Storage Engine.
The MariaDB CONNECT Storage Engine gives access to various file formats (CSV, XML, Excel, etc.). It gives access to any ODBC data source (Oracle, DB2, SQL Server, SQLite, etc.). It also gives access to remote MySQL tables, and a CONNECT table itself can be a set of remote MySQL tables. This opens the door to interesting distributed architectures that can help address big data.
This webinar is a technical overview of the MariaDB CONNECT Storage Engine, and it will show you typical use cases to help you get benefits from your existing data sources.
We will see how to use the MariaDB CONNECT Storage Engine.
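As a small teaser, here is a minimal sketch of a CONNECT table mapped onto a CSV file (the file name and columns are purely illustrative):

create table people (
  name varchar(32),
  city varchar(32),
  age int
)
ENGINE=CONNECT TABLE_TYPE=CSV
file_name='/tmp/people.csv' header=1 sep_char=',';

-- The CSV file can then be queried like any other table.
select city, count(*) from people group by city;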

Register for this webinar to learn what benefits you can get from the MariaDB CONNECT Storage Engine:
November 7, 2013 – 5 pm CET / 4 PM UTC / 8 AM PST Register

MySQL Fabric with MariaDB Galera Cluster?

MySQL Fabric is a very promising sharding framework. If I take Ulf Wendel's definition of MySQL Fabric:

MySQL Fabric is an administration tool to build large “farms” of MySQL servers. In its most basic form, a farm is a collection of MySQL Replication clusters. In its most advanced form, a farm is a collection of MySQL Replication clusters with sharding on top.

So MySQL Fabric takes care of two very orthogonal features:

  • High availability of servers
  • Sharding of data

Let us forget about sharding and look at the High availability infrastructure.

Servers are included in groups, called "High Availability Groups" when we talk about HA.
Each server has an associated status (or role): primary, secondary, spare.
Each server also has a mode: Offline, Read-only, or Read-Write.
The implementation has been made to allow various HA patterns.
The most common HA pattern is the master/slave HA group (in that case we could call it a "replica set", which is the terminology used in MongoDB or in the Facebook MySQL Pool Scanner (MPS)).

Mats Kindahl, in his blog post on MySQL Fabric High Availability Groups, mentioned that other HA solutions are possible for an availability group:

  • Shared Storage with SAN or NAS
  • Replicated storage like DRBD
  • MySQL Cluster shared nothing cluster

In the case of an HA group based on MySQL Cluster, the group is self-managing regarding HA and MySQL Fabric does not handle the failover. With the "Shared Storage" and "Replicated Storage" availability groups, the secondary servers will be offline.

So one of my ideas, which I hope is feasible, would be to use MariaDB Galera Cluster as another HA solution with MySQL Fabric. The main advantage of this solution relates to the characteristics of MariaDB Galera Cluster: it is an active-active multi-master topology with synchronous replication. Being InnoDB based, MariaDB Galera Cluster does not carry all the usage limitations associated with MySQL Cluster (the main one being limited join capabilities).

From the MySQL Fabric point of view, the behavior of an availability group based on MariaDB Galera Cluster is identical to one based on MySQL Cluster: it is a self-managing availability group.

MariaDB Galera Cluster

MariaDB Galera Cluster
Getting Started with MariaDB Galera Cluster

MySQL Fabric

MySQL Fabric: A new kid in the MySQL sharding world  2013-10-09 Serge Frezefond
MySQL Fabric: High Availability Groups  2013-10-21 Mats Kindahl
A Brief Introduction to MySQL Fabric  2013-09-21 Mats Kindahl
MySQL Fabric – Sharding – Introduction  2013-09-21 VN (Narayanan Venkateswaran)
MySQL Fabric – Sharding – Simple Example  2013-09-22 VN (Narayanan Venkateswaran)
MySQL Fabric – Sharding – Shard Maintenance  2013-09-27 VN
MySQL Fabric – Sharding – Migrating From an Unsharded to a Sharded Setup  2013-09-22 VN
Installing MySQL Fabric on Windows  2013-10-03 Todd Farmer
MySQL 5.7 Fabric: any good?  2013-09-23 Ulf Wendel

Writing a Fault-tolerant Database Application using MySQL Fabric  2013-09-21 Alfranio Junior
Sharding PHP with MySQL Fabric  2013-10-09 Johannes Schlüter
MySQL Fabric support in Connector/Python  2013-09-22 Geert Vanderkelen
MySQL Connector/J with Fabric Support  2013-09-21 Jess Balint

InfiniDB column store moves to open source! Congrats!

Like TokuDB, InfiniDB is now a fully open source server product. In the past InfiniDB was "almost open source": the open source version was an old release with no access to the advanced functions like MPP multi-server execution. This is no longer the case. With InfiniDB 4 the open source version is the latest release, giving access to all the advanced functionality.

This is a really great move for the MariaDB / MySQL ecosystem. InfiniDB and TokuDB were two unique pieces of technology in the MySQL ecosystem. Having them both open source will trigger broader adoption that will benefit their enterprise releases.

Having in the MySQL ecosystem a column-oriented database specifically designed for big data analytics fills a real customer need. Column stores overcome the query limitations that exist in traditional row-oriented RDBMSs. InfiniDB is extremely good at using multicore servers and massively parallel processing across multiple servers: it can scale up on a multi-core server and scale out on a distributed architecture.

So thanks a lot to the InfiniDB team for this move. It will greatly benefit them and the dynamism of the MySQL / MariaDB ecosystem.

InfiniDB community web site
InfiniDB 4 sources
InfiniDB Enterprise web site