Category: mysql

The new CONNECT Storage Engine with MariaDB 10.0.2

SergeApril 24, 2013Leave a comment

SkySQL and MariaDB are very pleased to announce a new baby in the storage Engine family : The "CONNECT Storage Engine"
I have had the pleasure to work with Olivier Bertrand creator of the CONNECT storage Engine. Olivier has over 50 years of experience in database development as an ex-IBM database researcher. Olivier was very interested in adding the result of his experience around BI data access to MariaDB. He wanted a more versatile way to access external data sources in various formats. The CONNECT storage engine enables MariaDB to directly use external data as if it were tables in the server without actually being obliged to load them into the physical database.
Olivier has been welcomed by the MariaDB team and has been working with them to integrate his work, build test cases and fix remaining bugs. The CONNECT storage engine is 100% open source and released under the GPL license.

The integration of CONNECT to MariaDB is yet another example of the innovation going on around the MariaDB server. The "MariaDB CONNECT Storage Engine" adds one more innovative differentiation to the MariaDB ecosystem (Cassandra, LevelDB, TokuDB, Spider being other nice examples). All these extra features are available under the GPL and they will come with the standard MariaDB release.

So yes the storage engine arena is very lively. Stewart was worried in his post Where are they now: MySQL Storage Engines . I think that storage engines is where innovation is taking place to turn MariaDB into a database platform.

The CONNECT storage engine give you access to various external data sources either local or remote. This includes files in various formats such as DBase, CSV, DOS, FMT and XML.
It also includes access to remote MySQL tables and almost any other data-source through an ODBC supported target.

You can also build a table based on a list of tables (equivalent of the MERGE storage engine). These table list (TBL) can be built on any type of table including any CONNECT table. This means that you can build a table whose data is distributed across multiple remote servers.

You can also define a table with per column storage (VEC type). This can greatly benefit tables with a lot of columns where only a limited subset is queried.
The CONNECT storage engine offers a major enhancement of what was previously offered by a combination of the CSV, MERGE and FEDERATED storage engines.

This storage engine is smart :
it can auto discover the structure of your data based on the metadata contained within the file.
it can index your data, optimizing for read only access.
it can work on compressed data.

It is a storage engine. It is built into MariaDB 10 which means that it can be used once the server is up and running. Just start creating tables with the "engine=connect" option and specify the files or the remote tables you want to access. This is documented with various examples.
As with any storage engine it is built using the MySQL Plugin API. It is a smart storage engine that uses condition push down to filter at the target data source. This can greatly increase performance by limiting data transfer. It also implements the limit clause of select. This storage engine has been designed with Business Intelligence (BI) queries in mind . It is not designed to be used with Online Transaction Processing (OLTP) activity. This choice allows for the optimization of code by leaving out a lot of costly internal mechanisms needed only for OLTP. As with any storage engine your external data source data can be joined with your local data.

When should you use the CONNECT storage engine ? When you have data in various file formats or remotely available or both and you would like to query this data. The main benefit of this approach to BI is that there is no need to go through a lengthy ETL (Extract, Transform, Load) process. The data can be directly accessed in place.

If you want to talk about it you are welcomed at the free SkySQL/MariaDB event immediately following Percona Live 2013 : "MySQL & Cloud Database Solutions Day"
To try it
CONNECT storage Engine documentation in MariaDB knowledge base
SkySQL support the CONNECT storage Engine
Get the white paper

The forgotten MySQL Storage Engines

SergeApril 15, 2013Leave a comment

In the recent years a lot of emphasis has been put on the InnoDB storage engine. This is a great thing. This has turned InnoDB from a very limited transactional storage engine to a very powerful piece of the MySQL architecture. InnoDB is now scalable to many cores. It is also very resilient. It supports advanced features(compression, buffer pool saving ... long list ...). No doubt this was the direction to go. There is still a lot of improvements to come and this is definitely needed for enterprise class applications.

But there are some other part of the server that need some attention and improvement. In the early days of the MySQL plugin architecture some nice concepts have been implemented as storage engines : csv storage engine, federated storage engine, merge storage engine, archive ...
In my personal opinion these implementations are very useful but have not been much improved since their first implementation. They can be considered as excellent proof of concept and should have been pushed much further.

If we look at them :

The Merge storage engine implemented the concept of a table composed of multiple underlying table. The first constrain is that all the table have to be in the myISAM format only. With all the improvements innoDB has gone through regarding performance, including read only and complex query execution this is frustrating. The second big limitation is that all the table need to be strictly identical regarding their structure. Third constraint is that there is no parallel execution of the query. This is not a big surprise as there is currently no parallelism at all in MySQL except in marginal areas (slaves apply, innodb full text indexing). One other big limitation of the merge storage engine is that it cannot handle remotely distributed tables.

The Federated storage engine implemented the concept of a MySQL table located on a remote server. The local and remote table must have an identical table structure. The federated storage engine was reengineered into the federatex storage engine which is included in MariaDB. This new design was done to allow in the future to implement a heterogeneous database access like through an ODBC implementation. This never happened. The federated(x) storage engine keeps a lot of limitation. When you issue a select through a federated table a select * is done to access the remote data. There is no condition push down implementation. This works only with MySQL remote database. The limit clause of the sql statement is not pushed to the underlying system. The condition push down mechanism is not used with this engine. These limitations make the federated storage engine of limited use. The Oracle MySQL version of federated is old and does not implement federatedx improvements.

The CSV storage engine implemented the concept of table representing a csv file external from the database. This was a great idea as the csv format is a very common file format. Unfortunately the csv storage engine is very basic and does not implement the limit clause in select. There is no index associated to the file. Condition pushdown is neither implemented.

How can we audit a MySQL server ?

SergeApril 8, 2013Leave a comment

Auditing of a MySQL server activity is a request raised more and more often as compliance rules are more strict for companies. MySQL is used in more and more critical areas.

What are the various loging technologies available on a MySQL server that could be use for auditing :

The error log
The slow query log
The binary log
Custom made triggers
Using MySQL Proxy
The general log

Using the error log :

The error log contains information indicating when mysqld was started and stopped and also any critical errors.
The log_warnings system variable can be used to control warning logging to the error log. If enabled aborted connections are written to the error log, and access-denied errors for new connection attempts are written.

mysql> set global log_warnings=2;

if we generate a connection with wrong password we get in the error log

$ mysql --protocol=TCP -hlocalhost -u root -P3309 -pxxx2
 
$ tail -f serge.err
...
130403 15:24:19 [Warning] Access denied for user 'root'@'localhost' (using password: YES)

Unfortunately the error log does not contain any information about the queries run against the database so it cannot be used as the basis of an auditing solution.

Using the he slow query log :

Can we use the MySQL Slow Query log as an audit solution ? It give some info but unfortunately this is not enough to be considered as an audit solution.
The slow query log consists of SQL statements that took more than long_query_time seconds to execute. The long _query_time can be set to 0 to log everything. Catching all queries through the slow query log can be costly as timing information is also collected and written.

mysql> set global slow_query_log=on;
mysql> set global long_query_time=0;
mysql> set global log_queries_not_using_indexes=on;

For a givent query we got in the log :

Time Id Command Argument
# Time: 130403 15:51:43
# User@Host: root[root] @ localhost [127.0.0.1] Id: 1
# Query_time: 0.591516 Lock_time: 0.043317 Rows_sent: 100 Rows_examined: 29214
use test;
SET timestamp=1364997103;
select * from (SELECT title, post_url,date(post_date) dt, author, MATCH(title,extract) AGAINST (' backup restore' in boolean mode) as Relevance FROM post where post_date &gt; '2010-01-01'ORDER BY Relevance DESC limit 100) xx order by dt desc;

All statement are logged (select, insert update, delete). Connections are missing. We do not get trace of failed requests. There is no idea of filtering on users / schemas / tables. It is an everything or nothing mechanism. The impact on performance can be big.

Using binary log :

The binary log contains only queries that have modified the data through commited transactions. So if a suspicious select is generated it will not appear here. This can however (if no other audit information is available) be used to discover when a modification was made. The binary log can be read an filtered to discover the modification.

Using triggers :

Another sometimes used solution would be to use triggers on critical table. This solution is painful to maintain. Like with binary logs this only deals with data modification tracking. There is no trigger on connection or on select. Moreover as MySQL allow only one trigger per table this can become complicated for application requiring triggers for other purposes.

Using the MySQL Proxy :

http://dev.mysql.com/doc/refman/5.6/en/mysql-proxy.html MySQL Proxy when it firs appeared has raised a lot of expectations. It can effectively be used to implement auditing, policy enforcement. Unfortunately the MySQL Proxy product has remained in the alpha status since many years. Because of this alpha status and total lack of visibility it cannot be part of a production long term solution.

Using the general log :

The general log is a logging capability of the MySQL server allowing to log all queries recieved by the server. As it logs all the queries it gives more info than the precedent techniques.

Let us try it

[sfrezefo@serge ~]$ mysql --protocol=TCP -hlocalhost -u root -P3309 -pmanager1
...
mysql> use test;
mysql> show tables;
mysql> create table titi(col1 int);
mysql> insert into titi values(3);
mysql> commit;
mysql> exit;

let do a failing connection :

[sfrezefo@serge ~]$ mysql --protocol=TCP -hlocalhost -u root -P3309 -pmanager2
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)

This is what we get int the general log

Time Id Command Argument
130327 10:51:34 2 Connect root@localhost on
                2 Query select @@version_comment limit 1
130327 10:55:03 2 Quit
130327 10:55:07 3 Connect root@localhost on
                3 Query select @@version_comment limit 1
130327 10:55:15 3 Query SELECT DATABASE()
                3 Init DB test
                3 Query show databases
                3 Query show tables
                3 Field List planetpost
130327 10:55:26 3 Query show tables
130327 10:55:50 3 Query create table titi(col1 int)
130327 10:56:08 3 Query insert into titi values(3)
130327 10:56:13 3 Query commit
130327 13:33:41 3 Quit
130327 13:33:47 4 Connect root@localhost on
                4 Connect Access denied for user 'root'@'localhost' (using password: YES)

What is missing ? We get the successful or failed connection. Subsequent queries are linked to the opened sessions We get trace of failed request except when syntactically invalid. There is no idea of filtering on users/schemas/tables. It is an everything or nothing mechanism. The impact on performance can be big.

Conclusion :

None of the technique presented carry enough information or flexibility to be considered as auditing solutions.

MariaDB Galera Cluster is a unique technology worth testing

SergeMarch 26, 2013Leave a comment

As alway with new technology there is always different approaches regarding the adoption.
You can try to use the bleeding edge features or start with a very standard configuration.
My personal advise to new users is to start with the most basic configuration.

This allow you to get familiar with the fundamentals :
- how to install
- how to operate
- how to monitor

For MariaDB Galera Cluster he most basic configuration is a 3 nodes cluster. You can chose to use it :
as an HA solution
Galera Cluster is currently the easiest way to solve the HA problem.
when you think HA think Galera Cluster it is so much simpler. Failover is totally transparent and you have nothing to do like you would have with standard replication. if you have a lodbalancer in front you just have to push out from the configuration THE FAILED NODE and that is done.

as a scale out solution
A usual MySQL scale out architecture is based on master/slaves architecture. This solution incurs to the application the choice of where to send the read (Master or most up to date slave…). Scale out of read with Galera cluster synchronous replication is much simpler. Nothing need to be done at the application layer.
You have a synchronous data up to data available on all nodes . You do not have the risk to read stale data when the replication lags. Nothing need to be done at the application level like taking care of reading were you write to have correct data.

Contrary to usual HA solutions or compare to MySQL cluster Galera Cluster is very simple to setup and operate.
Getting Started with MariaDB Galera Cluster

Then of course next step is to push Galera usage a little bit further. One main area is of course write scalability.
A few benchmark have been produced but we do not have yet much experience and you have to be more careful. With write/write configurations you have to be careful about hot spots in the database. This can lead to deadlock and the behavior has to be correctly understood.

A good distribution for download to do your testing is : MariaDB Galera Cluster 5.5.29 Stable

Some useful pointers to understand various behaviours of Galera Cluster.

------- upcoming events :

Tomorrow Seppo Jaakola, Codership, will present Galera Cluster for MySQL & MariaDB at the Meetup SkySQL & MariaDB - Paris

Henrik Ingo, Max Mether and Colin Charles will present "MariaDB Galera Cluster Overview" at the free MySQL & Cloud Solutions Day taking place in Santa Clara the 26th of April. You can register for free.

MySQL 5.6 innodb_undo_tablespaces very usefull only if …

SergeMarch 23, 2013Leave a comment

Who has not experienced a fast growing system table space. This is related to innoDB undos (Rollback segments).

In MySQL 5.5 and in 5.6 by default undos are located in the innoDB system tablespace. Beside the space consomption issue, the second penalty is that the access to the undos generates a lot of random IOs on the system tablespace.

Once the system table space has grow too much there is no way to reclaim the space. The only solution is to recreate the database. This is very problematic for online 24/7 databases. This happens most of the time because of overgrowing undos. Under normal circumstances undos are cleaned up by an innoDB purge mechanism. In MySQL 5.6 the multi threaded purge mechanism has greatly improved this purge undo cleanup process. But even with this purge mechanism it can still happen that the undos growth gets out of control. The main reason is long running transactions or very slow queries. Both these circumstances obliges innoDB to keep undos for consistent read.
A new feature has been introduced in 5.6 that allow to have the undos out of the system tablespace. The undo tablespace can be split into multiple files located in a user defined directory.

One of the advantage of this separate directory for undos is the ability to move the random UNDOs related IOs to to a different place. This allows for example to place undos on SSD for better random IOs.

This feature is available through these 3 new configuration variables :
innodb_undo_dir – where on disk to place undos
innodb_undo_tablespaces – number of UNDO tabespaces. this number must be set at database creation
innodb_undo_logs - number of rollback segments. The number of rollback segments physically present in the system never decreases.

If you have a running MySQL 5.6 instance , you shut it down and try to change innodb_undo_tablespaces to 4 like I did you will get :

130322 22:33:57 InnoDB: Expected to open 4 undo tablespaces but was able
130322 22:33:57 InnoDB: to find only 0 undo tablespaces.
130322 22:33:57 InnoDB: Set the innodb_undo_tablespaces parameter to the
130322 22:33:57 InnoDB: correct value and retry. Suggested value is 0
130322 22:33:57 [ERROR] Plugin 'InnoDB' init function returned error.
130322 22:33:57 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

So yes MySQL 5.6 innodb_undo_tablespaces is very useful but only if this choice has been made at the database creation. The second points that must be noted is that this feature unfortunately does not help to reclaim space. So yes this features is great but it is the first step in a correct management of undos space.

innodb_undo_dir, innodb_undo_tablespaces are part of the numerous MySQL 5.6 server configuration variables. MySQL 5.6 is a major release and there are a lot of obsoleted parameters, new parameters, parameters with different default values. It is very important for users moving to MySQL 5.6 to understand all these database setting changes.
Sheeri Cabral has tracked depreciated variables. Peter Zaitsev has done a complete comparison of the default values differences between variables in MySQL 5.5 and 5.6.

-------------------- Useful Pointers -------------------------------

Reasons-for-run-away-main-innodb-tablespace June 10, 2010 Peter Zaitsev
Purge-thread-spiral-of-death June 10, 2010 Peter Zaitsev
Being-highly-irresponsible-or-howto-dos-nearly-all-rdbmss  07/02/2013 Stewart Smith
Finding-and-killing-long-running-innodb-transactions-with-events  May 31, 2011 Mark Leith
Tracking-long-running-transactions APRIL 6, 2011 Yoshinori Matsunobu
Kill-mysql-procs 2012-12-27  rugmonster.org
How-to-debug-long-running-transactions-in-mysql March 8, 2011 by Baron Schwartz
Killing-idle-transactions-to-increase-database-uptime August 13th, 2012 by Xaprb

Questions worth asking Oracle at Percona Live …

SergeMarch 13, 2013Leave a comment

Oracle will be at Percona Live. This is a great news. Oracle was totally absent at last year edition. Not taking part to business event organized by competitors is a frequent Oracle Corporation behavior. Percona Live is a business event own by Percona (one of Oracle competitor) and some other Oracle competitors like SkySQL/MariaDB will also take part to the event. This is really nice that Oracle make this exception this year. MySQL is an open source ecosystem and having everyone at a common even is a very positive thing (even if the event is own by one of the parties and not yet a truly community even).

Gathering so many MySQL talents at the same place is exceptional. After FOSDEM 2013 event where all components of MySQL were gathered in the same room I realized how great it was to have MariaDB / Percona / Oracle peoples at the same Place. This triggers nice discussions as we all know each others for a long time and we share some common interests.

Last year Oracle was totally absent of Percona live. Oracle preferred to send all their new announcement during the conference through blogging. The community was expecting the same process this year 😉 Pushing key technical informations without interacting directly with the community was not what an open source community expect. This is great to have people live to interact with the MySQL community. MySQL is a turbulent community were discussion is a fundamental process to move things in the right direction. We are all fans of technology and having everyone at the same place will for sure trigger nice discussions.

Oracle is sending very nice ambassadors :

Tomas Ullin managing MySQL engineering at Oracle has successfully be turning MySQL into an "Enterprise Class Database". This is great and we can definitely thank him for that. I personally was not expecting that Oracle would push MySQL in area like scalability on 64 cores, crash safe replication, enhanced security features … which turn MySQL into a product that in many cases competes very efficiently against Oracle flagship product Oracle 11g (with a much lower cost of ownership 🙂 ). Oracle has not yet acknowledge this fact and its marketing still position MySQL as the database for the web. For many users MySQL now is definitely much more. Most of the new support paying customers are in the enterprise market. This is a fantastic achievement.

Dimitry Kravtchuk whose benchmarking expertise has given many people an invaluable insight into the MySQL Server behavior. I consider him as the Sherlock Holmes of MySQL with all the astounding investigation he conducted on the server behavior. Most of these investigations have shed light on very peculiar behaviors and lead to the correct interpretation and trigger big progress in the code.

Luis Soares takes care of MySQL replication. Replication is with is with scalability the second most important area of MySQL. This is an area of great innovation.

All these people have a long history of commitment with MySQL as an open source product before it belonged to SUN or Oracle. They really know what an open source ecosystem is. We will have here a real opportunity to talk technical subjects and confront approach to various problems.

But beside being a product where technical excellence matter MySQL is an ecosystem with a lively community. For me leading the community does not just mean leading by having technical excellence. This is necessary but far from enough. In an open source community there are fundamental questions that cannot be avoided. Beside being a product where technical excellence matter governance and fairness is key. We all know that open source is build on values and principles based on sharing and fairness. Like most of you I prefer technology to politics but sometime politics is essential no to be lead by small nice steps to a place where you were not initially planing to go.

Thank to Oracle for the technical excellence and the investment made and to come.

But I think of a few non technical questions that really matter for the MySQL community (Support providers, Storage engine developers, OEM customers …) and that need to be raised. These questions are fundamental for many peoples to have a clear idea of what kind of governance Oracle envision for MySQL future.

Do Oracle believe that the current MySQL development process is adapted to an open source product with a vibrant community and many active contributors ?
Oracle approach is based on no communication / no sharing with the community except the end result. This give the community no insight on the design choices which are not open to discussion. This can have a very negative impact on the ability of the MySQL community to improve the product and keep up with the code. In the past most of the key improvements, innovations, ideas have come from community members (individual, big users like Google, Facebook, Twitter …). MariaDB in this respect has kept the principle of 100% transparency. This event will be a good opportunity for the MariaDB team to show the long term benefit you get from this 100% transparent / 100% open source / open to fair competition approach.

What is the official process to guaranty that in the future the code will be released on launchpad in a timely and fair way ?
There has been multiple issues in the past regarding releasing the code late or regarding the revision history. Stewart smith(Percona) has been a good advocate on this subject. Percona depend heavily on the good will of Oracle to be able to produce their own version of the server. Any problem can greatly impact them and many others. MariaDB that fully own its own codebase as of MariaDB 10 is more immune to these issues. But MariaDB also depend on Oracle to back port features and guaranty that the MariaDB product is truly a MySQL dropin.

Do you plan to go further into providing new features as closed source extensions. Does the community has any guaranty that MySQL will under Oracle governance remain an open source product in the long term ?
The first closed source extension for MySQL(Thread pool, authentication plugin) appeared under Oracle governance in 2011. MySQL 5.6 has brought a new closed source features with the audit plugin. Adding closed source features lead to lock-in situation. Most of the closed source feature have been redeveloped by MariaDB as open source features. Redeveloping these features is an unnecessary burden for the community. Some people even consider that in this respect you do not comply with the promise made to the European commission to keep MySQL open source.

Do you plan to have a process to keep the bugs database open and to release test cases ?
There has been recently a lots of complains by the community regarding this subject. Keeping an open bug database is vital to guaranty that third party or linux distribution can validate non regression of their version. Valeri Kravtchuk (Perconna) has with a deep knowledge of the bug process put some light on all these issues. Do we have statistics on the amount of bug entered in the public database that are turn into private ? It seems that the proportion is increasing over time. It would be great to clarify on this point and to have public numbers and justifications. Closing the bug database even partially is a long term threat to the MySQL product. I am not speaking of the second bug database used by paying customer. I imagine that in the end both the community and people running the Enterprise should hit the same bug that should end up in both bug databases. I am of course not speaking neither of bugs related to closed source features which are the sole responsibility of Oracle and where the community cannot help 😉

Will oracle extend the period regarding the commitments they made to European Commission regarding MySQL to get clearance when buying SUN ( MySQL included ) ?
In my personal opinion this is necessary to keep MySQL community trust in Oracle governance of MySQL. Oracle commitment made to EC for 5 years will reach an end at the end of 2014 and this is raising worries for many members of the MySQL community (customers, OEM vendors, third parties developing storage engines…). To be honest rereading these commitment and putting a negative operator in front of each of them is totally and absolutely frightening for MySQL future :
http://www.oracle.com/us/corporate/press/042364
Oracle should give guaranties to the community. They have a big responsibility in this respect.

So having Oracle at Percona Live is a really nice great news. Oracle people will be really welcome by the MySQL community. This will be a good opportunity to talk about technology and confront our vision of the future.
But it would also be nice if Oracle could bring some answers regarding their fair governance of MySQL.
Oracle has a huge responsibility in this respect and the MySQL community is very attentive on this subject.
Technical excellence of one of the actor of the MySQL ecosystem is one element among others regarding the successful future of MySQL as an open source product.

---------------------

Based on Matt Keep valuable twit on : "For those who engage in swearing & general profanity at work, a handy guide of suggested alternative phrases" I have tried to expunge my initial prose from any bad words. 🙂

Planet MySQL has a long history …

SergeMarch 1, 2013Leave a comment

MySQL has a long history since its birth 18 years ago when Monty Widenius and David Axmark started it.
Planet MySQL started in 2004 and all its history is archived. Let us dig in it.
It is always good to look back at the past to get some lessons for the future. Here are the 20 most positively voted blog posts since Planet MySQL birth. By positive blog post I mean the one with the biggest positive number of votes versus negative ones. So inside this chart you have false positive which are also very controversial subject ( lots of + and lots of - with a majority of +).

Planet MySQL is like military archives that you can open after some time when passions have cooled down (it might not be true for everyone or for every subject ;-))

First column is the excess of plus vote versus negative ones.
61 / 2009-12-21 What do MySQL staff think of the acquisition? - Ronald Bradford
38 / 2011-03-08    The MySQL Council addresses the public bug database issue   - Giuseppe Maxia
30 / 2009-12-30 Save MySQL save the world   - Mark Callaghan
27 / 2010-10-12   Welcome SkySQL! - Gary Pendergast
25 / 2009-12-29 Save MySQL by letting Oracle keep it GPL   - Sheeri K. Cabral
24 / 2011-08-09 Santa Clara MySQL Conference 2012: Unity or division?   - Kaj Arno
22 / 2011-08-10 Call for disclosure on MySQL Conference 2012   - Giuseppe Maxia
22 / 2010-10-19 Using MySQL as a NoSQL - A story for exceeding 750000 qps on a commodity server   - Yoshinori Matsunobu
20 / 2011-01-13   Temporary files binlog_cache_size and row-based binary logging  - Chris Calender
19 / 2010-01-03   Tales of the Trade #2: The Oracle-Sun deal - Shlomi Noach
18 / 2010-12-15 MySQL 5.5 is GA! - Oracle MySQL Group
17 / 2011-02-21 Where have the bugs gone?   - Mark Callaghan
17 / 2010-09-29 The MySQL swap insanity problem and the effects of the NUMA architecture   - Jeremy Cole
16 / 2009-08-04 XtraDB has been commited to MariaDB - MySQL Performance Blog
16 / 2010-07-09 Using EXPLAIN EXTENDED / SHOW WARNINGS to Help Troubleshoot Inefficient Queries   - Chris Calender
16 / 2010-09-28 I need a new keyboard   - Domas Mituzas
16 / 2011-08-10 What is happening with the MySQL conference?   - Michael
16 / 2012-08-17 (less) open source   - Mark Callaghan
16 / 2010-08-28 MySQL 5.1 Plugins Development Published   - Andrew Hutchings
15 / 2011-02-22 Oracle introduces new levels of sucking to new versions of old software   - Monty Taylor

First kudos to Shlomi who wrote 2009-12-21 : "Ronald: you will remain #1 of all times for long time to come!" This was a good prediction as this blog post is still number one. I love Tales of the Trade #2: The Oracle-Sun deal where this prediction happened.

Second lesson learned is not to trust the internet. For example MySQL 5.1 Plugins Development Published does not point any more to where it should. This is unfortunate that there is nothing like UUID URI. Internet memories are not garantied : all MySQL DBA knows that : do backups 🙂 And "MySQL 5.1 Plugins Development" remains an excellent book even if the pointer is gone.

This also illustrates the ability of a crowd to make important choices : "I need a new keyboard" got a very good ranking. I really hope Domas that you got your new keyboard. Is it wireless ?

Some question should be hard as it seem they remain unanswered : Where have the bugs gone? was raised 2011-02-21.

Some titles can be misinterpreted : Oracle introduces new levels of sucking to new versions of old software . That is a good lesson if you make book reviews. You should not just read the title . What sucked for Monty Taylor at that time is not what sucks for us now.

Some question do have an answer : What is happening with the MySQL conference? Well, It is taking place in April in Santa Clara but it is now called "Percona Live". It could have had a different answer and in the future it might have another different one. Who knows ? Shlomi you who made a good prediction : Do you have any prediction, hint, advices ?

Enjoy 🙂

MySQL 5.6 introduced GoogleTest C++ Testing Framework

SergeFebruary 20, 2013Leave a comment

MySQL 5.6 has introduced a new unit test framework beside the existing ones.
The googletest test framework is now part of the MySQL test framework for 5.6.
GoogleTest is a C++ Testing Framework that helps write better C++ tests.

I realized that it was used when i tried to buid a MySQL release from sources on a fedora platform. I got the following error :

-- Googletest was not found. gtest-based unit tests will be disabled. 
You can run cmake . -DENABLE_DOWNLOADS=1 to automatically download and build required components from source.

I tried tu install the package through yum :

$ sudo yum install gtest

But it did not solve the issue. I had to run cmake to allow cmake to download the googletest sources so that it can be compiled within the MySQL distribution.

cmake .  -DENABLE_DOWNLOADS=1
...
-- Successfully downloaded http://googletest.googlecode.com/files/gtest-1.5.0.tar.gz
 to /home/sfrezefo/Downloads/mysql/mysql-5.6.8-rc/source_downloads
...

and then as usual :

make
sudo make install

Tests are a very fondamental element of MySQL to validate the compiled software behave correctly.
It is also very fondamental for developpers, contributor or people building MySQL on a specific platform or OS.
This is the only way to validate that there is no regressions on a particular platform.
If you fix a bug or if you develop a new functionality it is fondamental that all the tests go well.
Of course for that to work nicely all test cases have to be fairly released when bugs are fixed. 🙂

For a full description of all the components of The MySQL Test Framework :
http://dev.mysql.com/doc/mysqltest/2.0/en/

MySQL 5.6 credentials securely stored ?

SergeFebruary 19, 2013Leave a comment

MySQL 5.6 has introduced a set of new features related to security (default randomised root password at install, password Validation Plugin ...). MySQL now also provides a method for storing authentication credentials securely in an option file named .mylogin.cnf.
This new feature has been introduced to avoid exposing password when running MySQL scripts. Its behaviour and limitations have been well described here :
http://mysqlblog.fivefarmers.com/2012/08/16/understanding-mysql_config_editors-security-aspects/

In all previous version and also in 5.6 if you run a MySQL script without interactively asking for password the password is exposed on the commands line and it can be accessed though a "ps" command or through history files.
With MySQL 5.6 it is now possible to transform a command like this :

mysql -hlocalhost -uroot -pmypass

into :

mysql --login-path=safelogin

To create this login-path containing the credential

mysql_config_editor set --login-path=safelogin --host=localhost --user=root --password

The login file must be readable and writeable to the current user, and inaccessible to other users. Otherwise, mysql_config_editor ignores it, and the file is not used by client programs, either.
-rw-------. 1 sfrezefo sfrezefo 152 15 févr. 21:25 .mylogin.cnf

To list the content of this login file you can run mysql_config_editor and through this command you will not see the password :

[sfrezefo@serge ~]$ mysql_config_editor  print --all
[local]
user = localuser
password = *****
host = localhost
[safelogin]
user = root
password = *****
host = localhost

So one of the principal benefits of mysql_config_editor is ease-of-use. But users must realize that even though it uses AES for encrypting the credentials security is not the main point. In fact the key is left on the door you just have to turn it to enter.
I have developed a very basic program "mysql_showconfigpwd" that expose the credentials. This could be useful if you have forgotten your encrypted credentials.

[sfrezefo@serge ~]$ mysql_showconfigpwd
File exists.
file_buff [local]
user = localuser
password = manager1
host = localhost
[safelogin]
user = root
password = manager1
host = localhost

The way mysql_config_editor works is quite simple : All the information associated with a login path is stored encrypted through AES. A random key is generated which is used to encrypt all the login path information (password included).
All these information are stored in a file. But the key used to encrypt the login path is store in clear.
The OS protection is thus the only thing that protect the content of the file. This is the same issue you get with ssh keys except in that case you have the idea of key pair.

This is even more critical because beside the operating system the database itself could get access to the login file.
A database user with the FILE privilege can do a LOAD DATA INFILE and then get access to all the encrypted password. This problem is not an easy one and we always fall back to the egg and hen problem 🙂 .All mechanism based on private key encryption meet these issue of where to sore the key. How does it work with other databases. Oracle 11G has the same concept "Secure External Password Store". The only difference is that being closed source it create a false feeling of security 🙂

Here is the code of "mysql_showconfigpwd.cc" :

#include "my_config.h"
#include "my_aes.h"
#include "client_priv.h"
#include "my_default.h"
#include "my_default_priv.h"
 
#define MY_LINE_MAX 4096
#define MY_LOGIN_HEADER_LEN (4 + LOGIN_KEY_LEN)
static int g_fd;
static size_t file_size;
static char my_login_file[FN_REFLEN];
static char my_key[LOGIN_KEY_LEN];
 
int main(int argc, char *argv[]) {
 
DYNAMIC_STRING file_buf, path_buf;
char cipher[MY_LINE_MAX], plain[MY_LINE_MAX];
uchar len_buf[MAX_CIPHER_STORE_LEN];
int cipher_len = 0, dec_len = 0, total_len = 0;
MY_STAT stat_info;
const int access_flag = (O_RDWR | O_BINARY);
 
init_dynamic_string(&amp;path_buf, "", MY_LINE_MAX, MY_LINE_MAX);
init_dynamic_string(&amp;file_buf, "", file_size, 3 * MY_LINE_MAX);
 
if (!my_default_get_login_file(my_login_file, sizeof(my_login_file))) {
  fprintf(stderr, "Error! Failed to set login file name.\n");
  goto error;
}
 
if (my_stat(my_login_file, &amp;stat_info, MYF(0))) {
  fprintf(stderr, "File exists.\n");
  file_size = stat_info.st_size;
  if ((g_fd = my_open(my_login_file, access_flag, MYF(MY_WME))) == -1) {
    fprintf(stderr, "Error! Couldn't open the file.\n");
    goto error;
  }
}
 
if (my_seek(g_fd, 4, SEEK_SET, MYF(MY_WME)) != 4) {
  fprintf(stderr, "Error! Couldn't seek 4.\n");
  goto error;
}
 
if (my_read(g_fd, (uchar *) my_key, LOGIN_KEY_LEN, MYF(MY_WME)) != LOGIN_KEY_LEN) {
  fprintf(stderr, "Failed to read login key.\n");
  goto error;
}
 
if (file_size) {
  while (my_read(g_fd, len_buf, MAX_CIPHER_STORE_LEN, MYF(MY_WME)) == MAX_CIPHER_STORE_LEN) {
    cipher_len = sint4korr(len_buf);
 
    if (cipher_len &gt; MY_LINE_MAX) {
      fprintf(stderr, "Error!cipher_len &gt; MY_LINE_MAX.\n");
      goto error;
    }
    if ((int) my_read(g_fd, (uchar *) cipher, cipher_len, MYF(MY_WME)) == cipher_len) {
      if ((dec_len = my_aes_decrypt(cipher, cipher_len,(char *) plain, my_key, LOGIN_KEY_LEN)) &lt; 0) {
        fprintf(stderr, "Error! failed to decrypt the file.\n");
        goto error;
      }
      total_len += dec_len;
      plain[dec_len] = 0;
      dynstr_append(&amp;file_buf, plain);
    }
  }
  fprintf(stderr, "file_buff %s \n", file_buf.str);
}
 
error:
dynstr_trunc(&amp;file_buf, 0);
dynstr_trunc(&amp;path_buf, 0);
my_close(g_fd, MYF(MY_WME));
dynstr_free(&amp;file_buf);
dynstr_free(&amp;path_buf);
 
exit(0);
}

Tu use it install this file in the MySQL source tree (mysql-5.6.8-rc/client in my case).

Add these 2 lines to the CMakeLists.txt in this folder.

MYSQL_ADD_EXECUTABLE(mysql_showconfigpwd mysql_showconfigpwd.cc)
TARGET_LINK_LIBRARIES(mysql_showconfigpwd mysqlclient)

Go to the root of your MySQL source tree and do :

rm CMakeCache.txt
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/mysql56
make -j 8
sudo make install

Parallel databases vs MySQL

SergeFebruary 8, 2013Leave a comment

Following my discussion with Mikael Ronstrom regarding Parallel databases through comments on a old blog post ( mysql-for-a-massively-parallel-database/#comments ) I want to clarify why i like the Shard-Query tool.

Shard-Query is a open source distributed parallel query engine for MySQL. It offers a way to answer the question of parallel query execution across shards http://code.google.com/p/shard-query/.

But first what about MySQL Cluster as a parallel database.
MySQL Cluster has effectively a very efficient MPP architecture. It is a remarkable product for massive write and primary key lookups. Its fast failover, thanks to it true shared nothing architecture is quite unique. With respect to these two points it is much better that other architectures. Oracle RAC with its shared disk architecture, distributed cache and distributed Lock manager is much less efficient on both points.

Bust as usual you have the default associated with your qualities. Regarding complex read queries MySQL cluster has always been weak.I agree that some great progress have been with condition pushdown and Adaptive Query localization that allow to push joins and filtering to the data nodes. Both mechanism alleviate the sql node from doing work (filtering, joins) and also reduce network bandwidth consumption.

But this is far from what do parallel databases like Greenplum or Microsoft Parallel Datawarehouse Solution. To have a truly parallel database you really need to analyze the query and split it in pieces. There is a lot of intelligence to be put at the SQL node to split the query plan into subtasks. You sometime even have to implement a multi path approach(a map reduce pattern) to fully execute the request. In that case the sql node need to implement a lot of intelligence regarding parallel query execution. The MySQL Cluster sql node does not currently do it. If this work was ever to happen in the SQL layer it would be great to have it work with various storage engines. it should be abstracted from any particular storage engine like NDB.

Shard-Query presented by Justin Swanhart at FOSDEM 2013 (slides) does it at the client layer. It is using php and Gearman to spread the work in parallel to various MySQL shards. This is not perfect but this is a nice approach that can greatly help in many cases. It can parallelize across shards or using MySQL partitioning. Another interesting point about Shard-Query is that it can also works with MySQL compliant column store.

I would prefer to have this intelligence implemented in the MySQL server with full guaranty of correctly supporting the client/server protocol but this is a much harder approach.