Skip to content

MariaDB CONNECT Storage Engine as an ETL (or ELT) ?

The MariaDB CONNECT Storage Engine allows to access heterogeneous data sources. In my previous post I show you how to use the MariaDB CONNECT Storage Engine to access an Oracle database. This is quite easy through the CONNECT Storage Engine ODBC table type.

For most architectures where heterogeneous databases are involved an ETL (Extract-Transform-Load) is used to move data across the databases. Most ETL like Informatica, DataStage, Talend, Goldengate or Tungsten are tools that perform complex data transformations using middle-tier ETL engines. It require to describe the transformation process in a proprietary manner. This approach requires an extra product with extra expertise. Data is generally duplicated when it is moved around.

In many cases this extra product and competence requirement can be avoided. Data can be moved directly from database to database. Transformation can be directly apply inside the source or target database. This can be done through standard SQL commands.
This concept was successfully brought to the market by products like Sunopsis (bought by Oracle). This approach is known as ELT (Extract-Load-Transform) where data is moved directly from database to database. The database itself is used to transform data through standard SQL commands. This makes sense as some databases have very strong data transformation capabilities through powerful SQL extensions (analytical SQL)and parallel execution capabilities.

With the MariaDB CONNECT Storage Engine the approach of linking directly the two databases through an ODBC connection allows to implement this ELT pattern. Doing data transformation at the database level is very beneficial. All the advanced features of the source or target databases can be used.

In our case transformation will be applied both at the source and target database. Let us consider the following ODBC table created with the MariaDB CONNECT Storage Engine

create Table lineitem2   ENGINE=CONNECT TABLE_TYPE=ODBC
SRCDEF='select l_suppkey, sum(l_quantity) qt from dbt3.lineitem3 group by l_suppkey'
CONNECTION='DSN=orcl;UID=scott;PWD=manager1';

We can now query this table. All the aggregation specified in the SRCDEF argument will be done on the remote Oracle RDBMS node and not on the MySQL node. Some extra computation can be done on the local MySQL table. Here we compute the sum of the 10 biggest aggregated quantities.

MariaDB [dbt3]> select sum(qt) from (select qt  from lineitem2  order by qt DESC limit 10) as derivedT;
+-----------+
| sum(qt)   |
+-----------+
|     31444 |
+-----------+

This example is very simplistic but is here to give you an idea of whan can be achieved whith this pattern. So if you want to move data from an Oracle (or any ODBC accessible) database to MariaDB you should give the MariaDB CONNECT storage engine a try.

We already have a few customers using the MariaDB CONNECT Storage Engine to simplify their data integration processes (currently with Oracle and Sybase databases).

If you come to FOSDEM 2014 MySQL devroom on Sunday Feb 2nd. I wil present and will be available to answer your questions. if you are here do not hesitae to register to the MySQL Community diner Saturday Feb 1rst.

Slides on the MariaDB CONNECT Storage Engine

Let us know about your test. You just need to download MariaDB 10.0.7 which includes the CONNECT storage engine. Give us your feedback : positive, negative, improvement requests, bugs, doc pb, ...

Download the latest MariaDB 10.0.7 Beta release
CONNECT Storage Engine Documentation

Leave a Reply

Your email address will not be published. Required fields are marked *