Category: analytics

MariaDB and Native JSON support ?

SergeJanuary 20, 2016November 22, 2018Leave a comment

A question raised by my previous post is : What about MariaDB and native JSON support ? In my previous post I mentioned the possibility to use the MariaDB CONNECT storage Engine to store and access JSON content in normal text field. Of course having a native JSON datatype brings more value. It introduces JSON validation, a more efficient access to attribute through an optimized storage format.
Keep on reading!

Using JSON’s Arrays for MariaDB Dynamic Columns

SergeDecember 4, 2015November 22, 2018Leave a comment

The JSON format includes the concept of array. A JSON object cant contain an attribute of array type. We have seen that we can use the MariaDB CONNECT Storage Engine provided UDFs (user defined functions) to implement dynamic columns.
Keep on reading!

MariaDB CONNECT Storage Engine replay & slides available

SergeNovember 8, 2013Leave a comment

The slides and replay of yesterday's webinar on the MariaDB CONNECT storage engine have just been posted. First I want to thank the numerous attendees.
You have shown great interest on the parallel execution of query on distributed MySQL Servers. I agree this is cool.
The ODBC capabilities seems also to generate interest. This make it simple to access an ODBC datasource (SQLServer, Oracle ...) from plain MySQL syntax.

Here to view the replay
Here to get the slides

Let us know about your test. You just need to download MariaDB 10.0.5 which includes the CONNECT storage engine. Give us your feedback : positive, negative, improvement requests, bugs, doc pb, ...

Here to download the latest MariaDB 10.0.5 Beta release
Here to get the CONNECT Storage Engine Documentation

MariaDB CONNECT Storage Engine and parallelism

SergeNovember 5, 2013Leave a comment

The CONNECT Storage engine implement the concept of a table made of multiple tables. These underlying tables can be distributed remotely. For example the underlying remote tables can be of ODBC or MySQL table type. this allows to execute distributed queries. What is nice is that we can execute this distributed query with parallelism.

How does it work ?

To explain it let us suppose we have 4 nodes : Node0 and Node1 Node2 Node3
Node0 has a MariaDB 10.0.4 installed with the CONNECT storage engine activated.

MariaDB [dbt3]> install plugin connect  soname 'ha_connect.so';

On the 3 other nodes we have MariaDB or plain MySQL installed.
the 'lineitem' table of the dbt3 benchmark has been created and loaded with one chunk of the 'lineitem' table on each node. This is a sharded table. For example to generate the second chunk of 3 chunks 'lineitem' table:

./dbgen -T L -fF -q -b dists.dss -s 1 -C 3 -S 2

Now on node0 we can create a CONNECT table. This table definition embed a SRCDEF parameter that force the computation of an aggregate on the remote server :

create Table lineitem1   ENGINE=CONNECT TABLE_TYPE=MYSQL
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
connection='mysql://proxy:manager1@node1:3306/dbt3/lineitem3';

if we do a select on this table on node0 we get the result data that has been aggregated on node1. We access only one shard of the data.

MariaDB [dbt3]> select * from lineitem1;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
10000 rows in set (26.82 sec)

Let us now do the same on for node2 and node3 :

create Table lineitem2   ENGINE=CONNECT TABLE_TYPE=MYSQL
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
connection='mysql://proxy:manager1@node2:3306/dbt3/lineitem3';
 
create Table lineitem3   ENGINE=CONNECT TABLE_TYPE=MYSQL
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
connection='mysql://proxy:manager1@node3:3306/dbt3/lineitem3';

Now we can use the ability of the the CONNECT storage engine to build a table that allows to see the 3 shards as a single table :

create Table alllineitem   (`l_suppkey` INT(4) NOT NULL,`qt` DOUBLE ) ENGINE=CONNECT
TABLE_TYPE=TBL
table_list='lineitem1,lineitem2,lineitem3';

We can now query this table. All the aggregation computation will be done on the remote nodes and not on node0:

MariaDB [dbt3]> select * from alllineitem;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9998 | 2395 |
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
30000 rows in set (1 min 19.23 sec)

This basically takes 3 times longer than querying a single shard. This is because the request is executed sequentially on node 1 2 and 3.

We can request the CONNECT storage engine to do the job in parallel by adding the extra option : option_list='thread=1';

create Table alllineitem2   (`l_suppkey` INT(4) NOT NULL,`qt` DOUBLE ) ENGINE=CONNECT
TABLE_TYPE=TBL
table_list='lineitem1,lineitem2,lineitem3' option_list='thread=1';

If we rerun the query we get

MariaDB [dbt3]> select * from alllineitem2;
+-----------+------+
| l_suppkey | qt   |
+-----------+------+
|         1 | 2302 |
|         2 | 1960 |
...
|      9999 | 1908 |
|      1000 | 1756 |
+-----------+------+
30000 rows in set (26.69 sec)

We go back to the same execution time as with a single shard ! Is not that cool. I am sure you have a lot of objections, ideas about this approach. I do. But it is nice anyways.

If you want to know more do not forget to attend to the MariaDB CONNECT Storage Engine webinar :
November 7, 2013 - 5 pm CET / 4 PM UTC / 8 AM PST Register

and if you want to give it a try you are welcome and the doc to help is here:
MariaDB Connect Storage Engine documentation
Your help is welcome : bug reports, documentation fixes, usage feedback, suggestions.
Be indulgent it is still Alpha software 😉

InfiniDB column store moves to open source ! Congrats !

SergeOctober 15, 2013Leave a comment

Like TokuDB, InfiniDB is now a fully open source server product. In the past infiniDB was "almost open source". The open source version was an old release with no access to the advance functions like MPP multi-server execution. This is no more the case. With InfiniDB 4 the open source version is the latest release giving access to all the advanced functionalities.

This is a really great move for the MariaDB / MySQL ecosystem. InfiniDB and TokuDB were two unique pieces of technology in the MySQL ecosystem. Having them both open source will trigger a broader adoption that will benefit to their enterprise releases.

Having in the MySQL ecosystem a column oriented database specifically designed for big data analytics is filling a real customer need. Column stores overcome the query limitations that exist in traditional RDBMS. InfiniDB is extremely good at using multicore server and massively parallel processing with multi-servers. InfiniDB can scale up on multi-cores server and scale out on a distributed architecture.

So thanks a lot to the InfiniDB team for their move. This will greatly benefit to them and to the dynamism of the MySQL / MariaDB ecosystem.

InfiniDB community web site
InfiniDB 4 sources
InfiniDB Enterprise web site