The MariaDB CONNECT Storage Engine gives access to heterogeneous data sources. In my previous post I showed you how to use the MariaDB CONNECT Storage Engine to access an Oracle database. This is quite easy through the CONNECT Storage Engine ODBC table type.
Author: Serge
MariaDB CONNECT Storage Engine and non MySQL syntax selects
When you pass a SQL command through MySQL, it has to comply with MySQL syntax because it has to go through the MySQL parser. Can we pass selects that are not compatible with MySQL syntax to a target ODBC database? Through ODBC, the MariaDB CONNECT Storage Engine gives access to heterogeneous data sources like Oracle or Microsoft SQL Server.
In my previous post I showed you how to use the CONNECT Storage Engine ODBC table type to access an Oracle database. It was possible to run select commands against these remote ODBC tables. In many cases it would be very interesting to run the SQL command directly on the target database.
The first reason would be for efficiency as this would avoid a lot of network roundtrips between the Oracle server and the MariaDB Server. The second reason would be to avoid the restriction on MySQL syntax compliance.
With the MariaDB CONNECT Storage Engine it is possible to define a remote ODBC table and to associate a SQL request with this table definition. This SQL request will be executed on the remote server. For example, for a remote Oracle database it is possible to pass a request using proprietary syntax. Suppose we want to use the Oracle proprietary "CONNECT BY" syntax, which does not exist in MySQL. We do it this way:
create table emp_hierarchy
  ENGINE=CONNECT TABLE_TYPE=ODBC tabname='EMP'
  CONNECTION='DSN=orcl;UID=scott;PWD=manager1'
  srcdef='SELECT empno, ename, mgr, LEVEL FROM emp CONNECT BY PRIOR empno = mgr;';
The table definition includes a srcdef argument that defines the SQL request that will be executed on the Oracle target database. When we request this table on MariaDB everything is in fact done on the Oracle server.
MariaDB [test]> select * from emp_hierarchy;
+-------+--------+------+-------+
| EMPNO | ENAME  | MGR  | LEVEL |
+-------+--------+------+-------+
|  7788 | SCOTT  | 7566 |     1 |
|  7876 | ADAMS  | 7788 |     2 |
…
|  7839 | KING   | NULL |     1 |
…
|  7521 | WARD   | 7698 |     3 |
+-------+--------+------+-------+
43 rows in set (2.54 sec)
MariaDB [test]>
Beyond this basic example, this opens the door to executing any select on the target database. For example, it lets you take advantage of Oracle's In-Database SQL Analytics, which includes a lot of powerful functions (Ranking, Windowing, Reporting Aggregates, LAG/LEAD, FIRST/LAST, Inverse Percentile, Hypothetical Rank and Distribution, Pattern Matching, Modeling, Advanced aggregations...). The same idea applies to powerful spatial or full text capabilities.
And do not forget that with the MariaDB CONNECT Storage Engine you can aggregate many of these tables, located on different servers, so that they are seen as a single table. For that you use the CONNECT TBL table type. This also gives the benefit of parallel execution on these remote servers.
It would be great to have someone write a MySQL procedure wrapper that transparently creates the CONNECT table to run the query against (something like "callRemoteSQL") ;-).
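As a rough illustration of what such a wrapper could start from, here is a minimal Python sketch that only builds the CREATE TABLE statement for a given remote query. The function name build_remote_sql_table and its parameters are hypothetical and are not part of MariaDB or CONNECT. The options mirror the emp_hierarchy example above, except that tabname is left out; whether it is still needed alongside srcdef is not checked here.

```python
def build_remote_sql_table(table_name, dsn, user, password, remote_sql):
    """Build a CREATE TABLE statement for a CONNECT ODBC table wrapping remote_sql.

    Hypothetical helper: it only produces the SQL text, it does not run it.
    """
    def quote(value):
        # Escape backslashes and single quotes so the value can sit
        # inside a single-quoted MySQL/MariaDB string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

    # Backticks inside an identifier are escaped by doubling them.
    identifier = "`" + table_name.replace("`", "``") + "`"
    connection = "DSN={};UID={};PWD={}".format(dsn, user, password)
    return (
        "CREATE TABLE {} ENGINE=CONNECT TABLE_TYPE=ODBC "
        "CONNECTION={} SRCDEF={}".format(identifier, quote(connection), quote(remote_sql))
    )
```

Calling it with the DSN, the credentials and the CONNECT BY query returns a statement that could then be sent to MariaDB with any client library, followed by a plain select on the new table.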
MariaDB CONNECT Storage Engine access to Oracle 11GR2
The MariaDB CONNECT Storage Engine gives access to heterogeneous data sources. This includes various file formats, but it also includes data sources accessible through ODBC.
The CONNECT Storage Engine ODBC table type gives access to SQLite, Excel, SQL Server or Oracle databases. Some nice features of the ODBC CONNECT table type are:
- Auto discovery of table structure. This means that you do not need to specify the columns of the target table.
- Condition push down. This is an optimization that pushes filtering conditions to the target database. This can avoid a lot of network roundtrips.
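To make the benefit of condition push down concrete, here is a small Python simulation. It does not use CONNECT or ODBC at all: the list REMOTE_EMP stands in for a remote table (a toy subset of the emp rows used in this post), and the two functions are hypothetical. They only count how many rows would cross the network in each case.

```python
# Toy stand-in for a remote table; not real ODBC access.
REMOTE_EMP = [
    {"EMPNO": 7369, "ENAME": "SMITH", "DEPTNO": 20},
    {"EMPNO": 7934, "ENAME": "MILLER", "DEPTNO": 10},
    {"EMPNO": 8000, "ENAME": "serge", "DEPTNO": 20},
]

def fetch_all_then_filter(remote_rows, predicate):
    """No push down: every row is transferred, then filtered locally."""
    transferred = list(remote_rows)
    result = [row for row in transferred if predicate(row)]
    return result, len(transferred)

def fetch_with_pushdown(remote_rows, predicate):
    """Push down: the remote side applies the filter before sending rows."""
    transferred = [row for row in remote_rows if predicate(row)]
    return transferred, len(transferred)

def in_dept_10(row):
    # Plays the role of "WHERE DEPTNO = 10".
    return row["DEPTNO"] == 10
```

Both functions return the same rows for in_dept_10, but the first one transfers 3 rows and the second only 1. On a real table the gap grows with the table size and the selectivity of the filter.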
Let us have a look and test the Oracle access through the CONNECT Storage Engine. I downloaded, unzipped and installed Oracle 11gR2 in a virtual box.
Let us start Oracle 11gR2
[oracle@centos1 ~]$ sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 29 15:15:59 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
SQL> connect sys/manager1 as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 617975808 bytes
Fixed Size 2215864 bytes
Variable Size 377487432 bytes
Database Buffers 230686720 bytes
Redo Buffers 7585792 bytes
Database mounted.
Database opened.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Now let us start the Oracle listener (this concept does not exist in MySQL).
[oracle@centos1 ~]$ lsnrctl start
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 29-NOV-2013 20:08:20
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Starting /u01/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.1.0 - Production
System parameter file is /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/diag/tnslsnr/centos1/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=centos1)(PORT=1521)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 29-NOV-2013 20:08:20
Uptime 0 days 0 hr. 0 min. 0 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/centos1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=centos1)(PORT=1521)))
Services Summary...
Service "orcl" has 1 instance(s).
Instance "orcl", status UNKNOWN, has 1 handler(s) for this service...
Service "plsextproc" has 1 instance(s).
Instance "plsextproc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
[oracle@centos1 ~]$
On the machine with the MariaDB CONNECT Storage Engine we need to install the Oracle ODBC driver. We obtain it by installing the Oracle instant client. Let us download and unzip the instant client. Installation consists only of unzipping the instant client zip files.
unzip instantclient-basic-linux.x64-12.1.0.1.0.zip
unzip instantclient-odbc-linux.x64-12.1.0.1.0.zip
unzip instantclient-sdk-linux.x64-12.1.0.1.0.zip
We need to add the path of the ODBC driver shared library. Let us 'vi .bash_profile' and add the required LD_LIBRARY_PATH.
CLIENT_HOME=/home/oraclec/instantclient_12_1 ; export CLIENT_HOME
LD_LIBRARY_PATH=$CLIENT_HOME:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
On Linux we also need to install the unixODBC open source ODBC driver manager:
yum install unixODBC
yum install unixODBC-devel
The configuration of the unixODBC driver manager is simple and just requires updating 2 files:
/etc/odbcinst.ini, which points the driver to the installed shared library
[Oracle 12c ODBC driver]
Description = Oracle ODBC driver for Oracle 12c
Driver = /home/oraclec/instantclient_12_1/libsqora.so.12.1
/etc/odbc.ini, which defines the DSN that will be used for the ODBC connection
[orcl]
Driver = Oracle 12c ODBC driver
ServerName = //centos1.localdomain:1521/oracle
DSN = orcl
UserName = scott
Password = manager1
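The two files are linked by the driver name: the Driver entry of the DSN in odbc.ini has to match a section name in odbcinst.ini. The following Python sketch checks that link with the standard configparser module. The function resolve_driver_library is hypothetical, and the file contents are embedded as strings so the example runs without touching /etc. It assumes the DSN refers to a driver by name, as in the files above.

```python
import configparser

# Contents of the two files shown above, embedded for the example.
ODBCINST_INI = """
[Oracle 12c ODBC driver]
Description = Oracle ODBC driver for Oracle 12c
Driver = /home/oraclec/instantclient_12_1/libsqora.so.12.1
"""

ODBC_INI = """
[orcl]
Driver = Oracle 12c ODBC driver
ServerName = //centos1.localdomain:1521/oracle
DSN = orcl
UserName = scott
Password = manager1
"""

def resolve_driver_library(dsn, odbc_ini_text, odbcinst_ini_text):
    """Return the shared library path that a DSN resolves to."""
    dsns = configparser.ConfigParser()
    dsns.read_string(odbc_ini_text)
    drivers = configparser.ConfigParser()
    drivers.read_string(odbcinst_ini_text)

    if dsn not in dsns:
        raise KeyError("DSN not defined in odbc.ini: " + dsn)
    driver_name = dsns[dsn]["Driver"]
    # The driver name must match a section of odbcinst.ini exactly.
    if driver_name not in drivers:
        raise KeyError("driver not defined in odbcinst.ini: " + driver_name)
    return drivers[driver_name]["Driver"]
```

For the orcl DSN this returns the path of libsqora.so.12.1. On a real machine you would also check that this file exists and is readable by the user running the database server.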
How do we test that unixODBC and the Oracle ODBC driver are correctly installed?
unixODBC includes an ODBC client tool, 'isql'. Let us use it to test our installation and the DSN we just created.
[root@centos1 ~]# isql orcl scott manager1
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select * from emp;
+-------+-----------+----------+-------+--------------------+----------
| EMPNO | ENAME     | JOB      | MGR   | HIREDATE           | SAL
+-------+-----------+----------+-------+--------------------+----------
| 7369  | SMITH     | CLERK    | 7902  | 1980-12-17 00:00:00| 800
…
+-------+-----------+----------+-------+--------------------+----------
SQLRowCount returns -1
15 rows fetched
SQL>
Now let us create a CONNECT table of ODBC type. We will make this MariaDB ODBC table point to the Oracle 11g 'scott.emp' table.
[root@centos1 ~]# mysql -u root -pmanager1 -S /usr/local/mariadb10data/mariadb10.sock
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.0.6-MariaDB-log MariaDB Server
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
MariaDB [test]> create table emp ENGINE=CONNECT TABLE_TYPE=ODBC tabname='emp' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';
ERROR 1105 (HY000): Unsupported table type ODBC
Come on! What happened? I just installed the latest MariaDB 10.0.6 and the CONNECT storage engine does not support the ODBC table type. That is true: it is an issue related to unixODBC on the old machine used to build the binary tarball. This problem is not present in the MariaDB 10.0.6 rpm packages. But I have multiple tar based MariaDB versions installed on my CentOS box (in fact a VirtualBox VM) and I do not want any rpm based installation. Fortunately a storage engine is just a shared library.
Let us download MariaDB-10.0.6-centos6-x86_64-connect-engine.rpm and unpack the rpm without installing it:
[root@centos1 ~]# rpm2cpio ../MariaDB-10.0.6-centos6-x86_64-connect-engine.rpm | cpio --extract --make-directories
8216 blocs
[root@centos1 plugin]# cp ./usr/lib64/mysql/plugin/ha_connect.so ha_connect.so /usr/local/mariadb10/lib/plugin/
We now have a CONNECT storage engine that supports the ODBC table type. Our second try:
[root@centos1 local]# mysql -u root -pmanager1 -S /usr/local/mariadb10data/mariadb10.sock
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.0.6-MariaDB-log MariaDB Server
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use test
Database changed
MariaDB [test]> create table toto ENGINE=CONNECT TABLE_TYPE=ODBC tabname='emp' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';
ERROR 1105 (HY000): [unixODBC][Driver Manager]Can't open lib '/home/oraclec/instantclient_12_1/libsqora.so.12.1' : file not found
?
I had forgotten that the mysql service is run under the mysql linux user. This user also needs to have access to the Oracle ODBC driver (libsqora.so.12.1 shared library provided by the Oracle instant client).
CLIENT_HOME=/home/oraclec/instantclient_12_1 ; export CLIENT_HOME
LD_LIBRARY_PATH=$CLIENT_HOME:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
Once this is done everything should go better:
MariaDB [test]> create table toto ENGINE=CONNECT TABLE_TYPE=ODBC tabname='EMP' CONNECTION='DSN=orcl;UID=scott;PWD=manager1';
Query OK, 0 rows affected (0.79 sec)
MariaDB [test]>
MariaDB [test]> select * from toto;
+-------+--------+-----------+------+------------+---------+---------+--------+
| EMPNO | ENAME  | JOB       | MGR  | HIREDATE   | SAL     | COMM    | DEPTNO |
+-------+--------+-----------+------+------------+---------+---------+--------+
|  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |    NULL |     20 |
…
|  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |    NULL |     10 |
+-------+--------+-----------+------+------------+---------+---------+--------+
14 rows in set (0.09 sec)
MariaDB [test]> insert into toto values ( 8000,'serge','peintre',7902,'1980-12-17',123,234,20);
Query OK, 1 row affected (0.15 sec)
MariaDB [test]> select * from toto;
+-------+--------+-----------+------+------------+---------+---------+--------+
| EMPNO | ENAME  | JOB       | MGR  | HIREDATE   | SAL     | COMM    | DEPTNO |
+-------+--------+-----------+------+------------+---------+---------+--------+
|  8000 | serge  | peintre   | 7902 | 1980-12-17 |  123.00 |  234.00 |     20 |
|  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |    NULL |     20 |
…
|  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |    NULL |     10 |
+-------+--------+-----------+------+------------+---------+---------+--------+
15 rows in set (0.03 sec)
That looks nice. I have created an ODBC table pointing to a remote Oracle 11g table. That was quite simple, as the auto discovery mechanism automatically discovered the structure of the Oracle table and converted the Oracle data types to compatible MySQL column types.
Let us try to update Oracle data through MySQL :
MariaDB [test]> update toto set sal=999 where empno=8000;
ERROR 1296 (HY000): Got error 174 'No DELETE/UPDATE of ODBC tablesd' from CONNECT
This is not supported ! That is right. This ODBC table update feature has been developed and is currently only available in the launchpad repository of the CONNECT Storage Engine.
CONNECT Storage Engine Sources on launchpad
MariaDB CONNECT Storage Engine replay & slides available
The slides and replay of yesterday's webinar on the MariaDB CONNECT storage engine have just been posted. First I want to thank the numerous attendees.
You have shown great interest in the parallel execution of queries on distributed MySQL servers. I agree this is cool.
The ODBC capabilities also seem to generate interest. They make it simple to access an ODBC data source (SQL Server, Oracle ...) with plain MySQL syntax.
Here to view the replay
Here to get the slides
Let us know about your tests. You just need to download MariaDB 10.0.5, which includes the CONNECT storage engine. Give us your feedback: positive, negative, improvement requests, bugs, documentation problems, ...
Here to download the latest MariaDB 10.0.5 Beta release
Here to get the CONNECT Storage Engine Documentation
MariaDB CONNECT Storage Engine and parallelism
The CONNECT storage engine implements the concept of a table made of multiple tables. These underlying tables can be distributed remotely. For example, the underlying remote tables can be of the ODBC or MySQL table type. This makes it possible to execute distributed queries. What is nice is that we can execute such a distributed query with parallelism.
How does it work ?
To explain it, let us suppose we have 4 nodes: Node0, Node1, Node2 and Node3.
Node0 has a MariaDB 10.0.4 installed with the CONNECT storage engine activated.
MariaDB [dbt3]> install plugin connect soname 'ha_connect.so';
On the 3 other nodes we have MariaDB or plain MySQL installed.
The 'lineitem' table of the dbt3 benchmark has been created and loaded with one chunk of the 'lineitem' table on each node. This is a sharded table. For example, to generate the second chunk of a 'lineitem' table split into 3 chunks:
./dbgen -T L -fF -q -b dists.dss -s 1 -C 3 -S 2
Now on node0 we can create a CONNECT table. This table definition embeds a SRCDEF parameter that forces the computation of an aggregate on the remote server:
create Table lineitem1 ENGINE=CONNECT TABLE_TYPE=MYSQL
  SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
  connection='mysql://proxy:manager1@node1:3306/dbt3/lineitem3';
If we do a select on this table on node0, we get the result data that has been aggregated on node1. We access only one shard of the data.
MariaDB [dbt3]> select * from lineitem1;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
10000 rows in set (26.82 sec)
Let us now do the same for node2 and node3:
create Table lineitem2 ENGINE=CONNECT TABLE_TYPE=MYSQL
  SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
  connection='mysql://proxy:manager1@node2:3306/dbt3/lineitem3';
create Table lineitem3 ENGINE=CONNECT TABLE_TYPE=MYSQL
  SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
  connection='mysql://proxy:manager1@node3:3306/dbt3/lineitem3';
Now we can use the ability of the CONNECT storage engine to build a table that presents the 3 shards as a single table:
create Table alllineitem (`l_suppkey` INT(4) NOT NULL,`qt` DOUBLE )
  ENGINE=CONNECT TABLE_TYPE=TBL
  table_list='lineitem1,lineitem2,lineitem3';
We can now query this table. All the aggregation computation will be done on the remote nodes and not on node0:
MariaDB [dbt3]> select * from alllineitem;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9998 | 2395 |
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
30000 rows in set (1 min 19.23 sec)
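Note that the 30000 rows are the three per-shard results placed one after the other, so each l_suppkey can appear once per shard. To get one total per supplier, the partial sums still have to be added up, for instance with a "select l_suppkey, sum(qt) ... group by l_suppkey" on alllineitem. The Python sketch below shows that merge step on its own. The function name merge_partial_sums is hypothetical, and the sample values in the usage note are made up apart from the first two rows shown above.

```python
from collections import defaultdict

def merge_partial_sums(*shard_results):
    """Combine per-shard (l_suppkey, qt) partial sums into global totals.

    Each argument is the result of one shard: an iterable of
    (l_suppkey, qt) pairs already aggregated on that shard.
    """
    totals = defaultdict(float)
    for rows in shard_results:
        for suppkey, qt in rows:
            # SUM is decomposable: the sum of partial sums is the global sum.
            totals[suppkey] += qt
    return dict(totals)
```

For example, merging [(1, 2302.0), (2, 1960.0)] from one shard with [(1, 100.0), (2, 50.0)] and [(1, 10.0)] from two others gives {1: 2412.0, 2: 2010.0}. This only works as-is for aggregates such as SUM or COUNT; an average would need the sum and the count from each shard.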
This query takes basically 3 times longer than querying a single shard, because the request is executed sequentially on nodes 1, 2 and 3.
We can request the CONNECT storage engine to do the job in parallel by adding the extra option : option_list='thread=1';
create Table alllineitem2 (`l_suppkey` INT(4) NOT NULL,`qt` DOUBLE )
  ENGINE=CONNECT TABLE_TYPE=TBL
  table_list='lineitem1,lineitem2,lineitem3'
  option_list='thread=1';
If we rerun the query we get
MariaDB [dbt3]> select * from alllineitem2;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
30000 rows in set (26.69 sec)
We are back to the same execution time as with a single shard! Isn't that cool? I am sure you have a lot of objections and ideas about this approach. I do. But it is nice anyway.
If you want to know more, do not forget to attend the MariaDB CONNECT Storage Engine webinar:
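The timing pattern above (three times one shard when sequential, about one shard when parallel) can be reproduced with a small Python simulation. This is only an illustration of the idea and says nothing about how CONNECT implements thread=1 internally. The function query_shard is a hypothetical stand-in that sleeps instead of running a remote aggregation, and the node names are just labels.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SHARDS = ["node1", "node2", "node3"]

def query_shard(name, delay=0.2):
    """Stand-in for a remote aggregation: wait, then return one row."""
    time.sleep(delay)
    return [(name, 1)]

def run_sequential(shards):
    """Query the shards one after the other."""
    rows = []
    for shard in shards:
        rows.extend(query_shard(shard))
    return rows

def run_parallel(shards):
    """Query all shards at the same time, one thread per shard."""
    rows = []
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # map() returns the results in the order of the input shards.
        for part in pool.map(query_shard, shards):
            rows.extend(part)
    return rows

def timed(run, shards):
    """Return the rows and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    rows = run(shards)
    return rows, time.perf_counter() - start
```

Both variants return the same rows. The sequential run takes roughly the sum of the three delays, while the parallel run takes roughly the longest single delay, which is the same effect as in the two queries above.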
November 7, 2013 - 5 pm CET / 4 PM UTC / 8 AM PST Register
and if you want to give it a try you are welcome and the doc to help is here:
MariaDB Connect Storage Engine documentation
Your help is welcome : bug reports, documentation fixes, usage feedback, suggestions.
Be indulgent it is still Alpha software 😉
MariaDB CONNECT storage engine webinar / 7 Nov
This week I will have the opportunity to deliver a webinar on the MariaDB CONNECT Storage Engine.
The MariaDB CONNECT Storage Engine gives access to various file formats (CSV, XML, Excel, etc). It gives access to any ODBC data source (Oracle, DB2, SQL Server, SQLite etc). It also gives access to remote MySQL tables. A CONNECT table itself can be a set of remote MySQL tables. This opens the door to interesting distributed architectures that can help to address big data.
This webinar is a technical overview of the MariaDB CONNECT Storage Engine and it will show you typical use cases to help you get benefits from your existing data sources.
We will see how to use the MariaDB CONNECT Storage Engine.
Register for this webinar to learn what benefits you can get from the MariaDB CONNECT Storage Engine :
November 7, 2013 - 5 pm CET / 4 PM UTC / 8 AM PST Register
MySQL Fabric with MariaDB Galera Cluster ?
MySQL Fabric is a very promising sharding framework. Let me take Ulf Wendel's definition of MySQL Fabric:
MySQL Fabric is an administration tool to build large “farms” of MySQL servers. In its most basic form, a farm is a collection of MySQL Replication clusters. In its most advanced form, a farm is a collection of MySQL Replication clusters with sharding on top.
So MySQL Fabric takes care of two very orthogonal features :
- High availability of servers
- Sharding of data
Let us forget about sharding and look at the High availability infrastructure.
Servers are included in groups, called "High Availability Groups" when we talk about HA.
Each server has an associated status (or role): primary, secondary or spare.
Each server also has a mode: Offline, Read-only or Read-Write.
The implementation has been made to allow various HA implementation patterns.
The most common HA pattern is the Master/Slave HA group (in that case we should call it a "replica set", which is the terminology used in MongoDB or Facebook MySQL Pool Scanner (MPS)).
Mats Kindahl in his blog post on MySQL Fabric High Availability Groups mentioned that other HA solutions are possible for an availability group :
- Shared Storage with SAN or NAS
- Replicated storage like DRBD
- MySQL Cluster shared nothing cluster
In the case of an HA group based on MySQL Cluster, the group is self-managing regarding HA and MySQL Fabric does not handle the failover. With the "Shared Storage" and "Replicated storage" availability groups the secondary servers will be offline.
So one of my ideas, which I hope is feasible, would be to use MariaDB Galera Cluster as another HA solution with MySQL Fabric. The main advantage of this solution relates to the characteristics of MariaDB Galera Cluster. MariaDB Galera Cluster is an active-active multi-master topology with synchronous replication. Being InnoDB based, MariaDB Galera Cluster does not carry all the usage limitations associated with MySQL Cluster (the main one being limited join capabilities).
From the MySQL Fabric point of view, the behavior of an availability group based on MariaDB Galera Cluster is identical to one based on MySQL Cluster. It is a self-managing availability group.
MariaDB Galera Cluster
MariaDB Galera Cluster
Getting Started with MariaDB Galera Cluster
MySQL Fabric
MySQL Fabric: A new kid in the MySQL sharding world 2013-10-09 Serge Frezefond
MySQL Fabric: High Availability Groups 2013-10-21 Mats Kindahl
A Brief Introduction to MySQL Fabric 2013-09-21 Mats Kindahl
MySQL Fabric - Sharding - Introduction 2013-09-21 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Simple Example 2013-09-22 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Shard Maintenance 2013-09-27 VN
MySQL Fabric - Sharding - Migrating From an Unsharded to a Sharded Setup 2013-09-22 VN
Installing MySQL Fabric on Windows 2013-10-03 Todd Farmer
MySQL 5.7 Fabric: any good? 2013-09-23 Ulf Wendel
Writing a Fault-tolerant Database Application using MySQL Fabric 2013-09-21 Alfranio Junior
Sharding PHP with MySQL Fabric 2013-10-09 Johannes Schlüter
MySQL Fabric support in Connector/Python 2013-09-22 Geert Vanderkelen
MySQL Connector/J with Fabric Support 2013-09-21 Jess Balint
InfiniDB column store moves to open source ! Congrats !
Like TokuDB, InfiniDB is now a fully open source server product. In the past InfiniDB was "almost open source". The open source version was an old release with no access to advanced functions like MPP multi-server execution. This is no longer the case. With InfiniDB 4 the open source version is the latest release, giving access to all the advanced functionalities.
This is a really great move for the MariaDB / MySQL ecosystem. InfiniDB and TokuDB were two unique pieces of technology in the MySQL ecosystem. Having them both open source will trigger a broader adoption that will benefit their enterprise releases.
Having in the MySQL ecosystem a column oriented database specifically designed for big data analytics fills a real customer need. Column stores overcome the query limitations that exist in traditional RDBMS. InfiniDB is extremely good at using multicore servers and massively parallel processing across multiple servers. InfiniDB can scale up on multicore servers and scale out on a distributed architecture.
So thanks a lot to the InfiniDB team for their move. This will greatly benefit them and the dynamism of the MySQL / MariaDB ecosystem.
InfiniDB community web site
InfiniDB 4 sources
InfiniDB Enterprise web site
A new kid in the MySQL sharding world
MySQL Connect 2013 has been a great edition. There were of course a lot of nice announcements of improvements in the core MySQL server technology. One of the major announcements that received a lot of buzz was MySQL Fabric. MySQL Fabric is an infrastructure component aimed at simplifying the construction of a highly available, sharded, MySQL server based architecture.
Horizontal scale out for MySQL is a hard problem. Until now the MySQL sharding requirement has only been addressed by non-general solutions. Using MySQL at scale remains a big challenge.
All the big actors of the web have been faced with this scale out issue. They all have developed their own tools and frameworks to address this need. The new cloud providers have also been faced with this requirement when trying to offer database as a service solutions around MySQL with transparent elasticity.
We can mention the following sharding solutions developed by the key players of the web:
- Google/Youtube has built Vitess (in Go) and open sourced it
- Twitter has built Gizzard (in Scala) and open sourced it
- Tumblr has built Jetpants (in Ruby) and open sourced it
- Facebook also has its own sharding framework, which is a range based model. The HA is addressed with MHA. They have presented their architecture at various events
These frameworks cover the following needs:
- Master promotions,
- Cloning slaves,
- Supports a range-based sharding scheme for MySQL
- Rebalancing shards,
- Split a range-based shard into N new shards
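To make the range-based scheme and the shard split more concrete, here is a minimal Python sketch. It is not taken from any of the frameworks listed above: the class RangeShardMap, its methods and the shard names are hypothetical. It maps a key to a shard through sorted lower bounds and splits one shard into two (a split into N shards would repeat the same step).

```python
import bisect

class RangeShardMap:
    """Toy range-based shard map: shards[i] holds keys >= lower_bounds[i]
    and below the next lower bound (the last shard is open-ended)."""

    def __init__(self, lower_bounds, shards):
        if list(lower_bounds) != sorted(lower_bounds):
            raise ValueError("lower bounds must be sorted")
        if len(lower_bounds) != len(shards):
            raise ValueError("one lower bound per shard is required")
        self.lower_bounds = list(lower_bounds)
        self.shards = list(shards)

    def lookup(self, key):
        """Return the shard that stores the given key."""
        idx = bisect.bisect_right(self.lower_bounds, key) - 1
        if idx < 0:
            raise KeyError("key below the first range: {!r}".format(key))
        return self.shards[idx]

    def split(self, shard, split_key, new_shard):
        """Split a shard: keys >= split_key move to new_shard."""
        idx = self.shards.index(shard)
        lower = self.lower_bounds[idx]
        has_next = idx + 1 < len(self.lower_bounds)
        if split_key <= lower or (has_next and split_key >= self.lower_bounds[idx + 1]):
            raise ValueError("split key must fall strictly inside the shard range")
        # Only the metadata changes here; moving the rows is a separate job.
        self.lower_bounds.insert(idx + 1, split_key)
        self.shards.insert(idx + 1, new_shard)
```

With RangeShardMap([1, 5000], ["shard_a", "shard_b"]), key 4999 resolves to shard_a and key 5000 to shard_b. After split("shard_b", 7500, "shard_c"), key 7500 resolves to shard_c. A real framework also has to copy the data and switch the metadata safely, which this sketch leaves out.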
High Availability is also addressed by most of these frameworks. MySQL Fabric also handles HA. All the improvements made around replication have helped handle correctly the HA part of the architecture in a more resilient way.
So now with the introduction of MySQL Fabric there is a new sharding framework available. So what? Is it interesting?
Yes it is very interesting.
First, MySQL Fabric has been written in Python, which in my opinion is a good point as devops love Python.
Second, MySQL Fabric has been released under the GPL license, which is also a very good point.
Last, MySQL Fabric is intended as a general purpose sharding framework.
MySQL Fabric is quite simple:
- A Fabric Server holding and serving the sharding metadata.
- A set of commands to perform various actions (split shard, …)
- Specific connectors (Java, Python) that request the shard location from the Fabric server through XML-RPC calls.
Two questions for me:
- What is the status of the PHP and C drivers regarding MySQL Fabric ?
- Is it possible to fully abstract the client code from the fact that the data is sharded? Some connector configuration could specify the Fabric server. The client code would then not need to change at all.
======
A Brief Introduction to MySQL Fabric 2013-09-21 Mats Kindahl
MySQL Fabric - Sharding - Introduction 2013-09-21 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Simple Example 2013-09-22 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Shard Maintenance 2013-09-27 VN (Narayanan Venkateswaran)
MySQL Fabric - Sharding - Migrating From an Unsharded to a Sharded Setup 2013-09-22 VN (Narayanan Venkateswaran)
Installing MySQL Fabric on Windows 2013-10-03 Todd Farmer
MySQL 5.7 Fabric: any good? 2013-09-23 Ulf Wendel
---
Writing a Fault-tolerant Database Application using MySQL Fabric 2013-09-21 Alfranio Junior
MySQL Fabric support in Connector/Python 2013-09-22 Geert Vanderkelen
MySQL Connector/J with Fabric Support 2013-09-21 Jess Balint
Filtering MariaDB Audit Plugin data
When full auditing is activated with the MariaDB Audit Plugin, a large volume of audit data is generated and it can put an extra burden on the server. The MariaDB Audit Plugin offers the following two setup variables, which restrict what data will be logged.
MariaDB [test]> show variables like '%audit%users';
+-----------------------------+-----------------------+
| Variable_name               | Value                 |
+-----------------------------+-----------------------+
| server_audit_excl_users     |                       |
| server_audit_incl_users     |                       |
+-----------------------------+-----------------------+
14 rows in set (0.00 sec)
These variables restrict the data that is going to be pushed to the audit log.
You can say: I only want to log data from these users
MariaDB [test]> set global server_audit_incl_users='proxy';
Only activities coming from the user 'proxy' will be logged.
Or you can say : I want to log data except for these users.
MariaDB [test]> set global server_audit_excl_users='user2';
No activities coming from the user 'user2' will be logged.
What happens if a user is both included and excluded? In that case database activities for that user will be logged: include has priority over exclude. It is important to note that the audit plugin filters only on the username. This differs from the MariaDB and MySQL user definition, where a user is the combination of a username and a hostname ('user'@'host').
This capability to filter audit data is crucial to avoid putting too much burden on the audited server and to avoid generating too large a volume of audit data.
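The filtering rules described above can be summarized in a few lines of Python. This is only my reading of the rules as stated in this post, not the plugin's source code. The function names are hypothetical, and the comma-separated list format for the two variables is an assumption.

```python
def parse_user_list(value):
    """Turn a comma-separated list such as 'proxy,user2' into a set of names."""
    return {name.strip() for name in value.split(",") if name.strip()}

def username_of(account):
    """Keep only the username part of an account like 'user'@'host'."""
    return account.split("@", 1)[0].strip("'`\"")

def should_log(account, incl_users="", excl_users=""):
    """Decide whether activity of this account goes to the audit log."""
    user = username_of(account)
    incl = parse_user_list(incl_users)
    excl = parse_user_list(excl_users)
    if incl:
        # An include list is set: only these users are logged, and being
        # on the exclude list too does not matter (include has priority).
        return user in incl
    # No include list: everybody is logged except the excluded users.
    return user not in excl
```

With incl_users='proxy', should_log("'proxy'@'10.0.0.1'", ...) is true whatever the host part and even if 'proxy' is also excluded, while any other user is not logged. With only excl_users='user2', everybody except user2 is logged.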