
MySQL 5.6 has introduced a set of new security-related features (a randomised root password at install, a password validation plugin ...). MySQL now also provides a method for storing authentication credentials securely in an option file named .mylogin.cnf.
This new feature has been introduced to avoid exposing passwords when running MySQL scripts. Its behaviour and limitations are well described here:
http://mysqlblog.fivefarmers.com/2012/08/16/understanding-mysql_config_editors-security-aspects/

In all previous versions, and also in 5.6, if you run a MySQL script without being prompted interactively for the password, the password is exposed on the command line and can be retrieved through a "ps" command or through history files.
With MySQL 5.6 it is now possible to transform a command like this :

mysql -hlocalhost -uroot -pmypass

into :

mysql --login-path=safelogin

To create this login path containing the credentials:

mysql_config_editor set --login-path=safelogin --host=localhost --user=root --password

The login file must be readable and writeable to the current user, and inaccessible to other users. Otherwise, mysql_config_editor ignores it, and the file is not used by client programs, either.
-rw-------.  1 sfrezefo sfrezefo        152 15 févr. 21:25 .mylogin.cnf

To list the content of this login file you can run mysql_config_editor; with this command you will not see the password:

[sfrezefo@serge ~]$ mysql_config_editor  print --all
[local]
user = localuser
password = *****
host = localhost
[safelogin]
user = root
password = *****
host = localhost

So one of the principal benefits of mysql_config_editor is ease of use. But users must realize that even though it uses AES to encrypt the credentials, security is not the main point: the key is left in the door, you just have to turn it to get in.
I have developed a very basic program, "mysql_showconfigpwd", that exposes the credentials. This can be useful if you have forgotten your encrypted credentials.

[sfrezefo@serge ~]$ mysql_showconfigpwd
File exists.
file_buff [local]
user = localuser
password = manager1
host = localhost
[safelogin]
user = root
password = manager1
host = localhost

The way mysql_config_editor works is quite simple: all the information associated with a login path is stored encrypted with AES. A random key is generated and used to encrypt all the login path information (password included).
All this information is stored in a file, but the key used to encrypt the login paths is stored in that same file, in the clear.
OS file permissions are thus the only thing protecting the content of the file. This is the same issue you get with ssh keys, except that in that case you have the notion of a key pair.

This is even more critical because, beside the operating system, the database itself can get access to the login file.
A database user with the FILE privilege can do a LOAD DATA INFILE and get access to all the encrypted passwords. This problem is not an easy one and we always fall back on the chicken-and-egg problem 🙂 . All mechanisms based on encrypting with a secret key run into the issue of where to store that key. How does it work with other databases? Oracle 11g has the same concept, the "Secure External Password Store". The only difference is that, being closed source, it creates a false feeling of security 🙂
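As a minimal sketch of that attack path (the path shown is illustrative; it assumes the server process can read the file and that secure_file_priv does not block the location), a user with the FILE privilege only needs:

SELECT LOAD_FILE('/home/sfrezefo/.mylogin.cnf');

This returns the raw login file, including the clear-text key stored in its header, so the "encrypted" credentials can then be decrypted offline.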

Here is the code of "mysql_showconfigpwd.cc" :

#include "my_config.h"
#include "my_aes.h"
#include "client_priv.h"
#include "my_default.h"
#include "my_default_priv.h"
 
#define MY_LINE_MAX 4096
#define MY_LOGIN_HEADER_LEN (4 + LOGIN_KEY_LEN)
static int g_fd;
static size_t file_size;
static char my_login_file[FN_REFLEN];
static char my_key[LOGIN_KEY_LEN];
 
int main(int argc, char *argv[])
{
  DYNAMIC_STRING file_buf, path_buf;
  char cipher[MY_LINE_MAX], plain[MY_LINE_MAX];
  uchar len_buf[MAX_CIPHER_STORE_LEN];
  int cipher_len = 0, dec_len = 0, total_len = 0;
  MY_STAT stat_info;
  const int access_flag = (O_RDWR | O_BINARY);

  init_dynamic_string(&path_buf, "", MY_LINE_MAX, MY_LINE_MAX);
  init_dynamic_string(&file_buf, "", file_size, 3 * MY_LINE_MAX);

  /* Locate the login file (~/.mylogin.cnf, or MYSQL_TEST_LOGIN_FILE if set). */
  if (!my_default_get_login_file(my_login_file, sizeof(my_login_file)))
  {
    fprintf(stderr, "Error! Failed to set login file name.\n");
    goto error;
  }

  if (!my_stat(my_login_file, &stat_info, MYF(0)))
  {
    fprintf(stderr, "Error! Login file does not exist.\n");
    goto error;
  }
  fprintf(stderr, "File exists.\n");
  file_size = stat_info.st_size;

  if ((g_fd = my_open(my_login_file, access_flag, MYF(MY_WME))) == -1)
  {
    fprintf(stderr, "Error! Couldn't open the file.\n");
    goto error;
  }

  /* Skip the 4 unused header bytes, then read the AES key stored in clear text. */
  if (my_seek(g_fd, 4, SEEK_SET, MYF(MY_WME)) != 4)
  {
    fprintf(stderr, "Error! Couldn't seek past the file header.\n");
    goto error;
  }

  if (my_read(g_fd, (uchar *) my_key, LOGIN_KEY_LEN, MYF(MY_WME)) != LOGIN_KEY_LEN)
  {
    fprintf(stderr, "Failed to read login key.\n");
    goto error;
  }

  if (file_size)
  {
    /* Each option line is stored as a 4-byte length followed by the AES-encrypted text. */
    while (my_read(g_fd, len_buf, MAX_CIPHER_STORE_LEN, MYF(MY_WME)) == MAX_CIPHER_STORE_LEN)
    {
      cipher_len = sint4korr(len_buf);

      if (cipher_len > MY_LINE_MAX)
      {
        fprintf(stderr, "Error! cipher_len > MY_LINE_MAX.\n");
        goto error;
      }
      if ((int) my_read(g_fd, (uchar *) cipher, cipher_len, MYF(MY_WME)) == cipher_len)
      {
        /* Decrypt the chunk with the key found in the file itself. */
        if ((dec_len = my_aes_decrypt(cipher, cipher_len, (char *) plain,
                                      my_key, LOGIN_KEY_LEN)) < 0)
        {
          fprintf(stderr, "Error! Failed to decrypt the file.\n");
          goto error;
        }
        total_len += dec_len;
        plain[dec_len] = 0;
        dynstr_append(&file_buf, plain);
      }
    }
    /* Print the decrypted option file, passwords included. */
    fprintf(stderr, "file_buff %s \n", file_buf.str);
  }

error:
  dynstr_trunc(&file_buf, 0);
  dynstr_trunc(&path_buf, 0);
  if (g_fd > 0)
    my_close(g_fd, MYF(MY_WME));
  dynstr_free(&file_buf);
  dynstr_free(&path_buf);

  exit(0);
}

To use it, install this file in the MySQL source tree (mysql-5.6.8-rc/client in my case).

Add these 2 lines to the CMakeLists.txt in this folder.

MYSQL_ADD_EXECUTABLE(mysql_showconfigpwd mysql_showconfigpwd.cc)
TARGET_LINK_LIBRARIES(mysql_showconfigpwd mysqlclient)

Go to the root of your MySQL source tree and do :

rm CMakeCache.txt
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/mysql56 .
make -j 8
sudo make install

Following my discussion with Mikael Ronstrom about parallel databases in the comments of an old blog post ( mysql-for-a-massively-parallel-database/#comments ), I want to clarify why I like the Shard-Query tool.

Shard-Query is an open source distributed parallel query engine for MySQL. It offers an answer to the question of parallel query execution across shards: http://code.google.com/p/shard-query/

But first, what about MySQL Cluster as a parallel database?
MySQL Cluster effectively has a very efficient MPP architecture. It is a remarkable product for massive writes and primary key lookups. Its fast failover, thanks to its true shared nothing architecture, is quite unique. With respect to these two points it is much better than other architectures: Oracle RAC, with its shared disk architecture, distributed cache and distributed lock manager, is much less efficient on both points.

But as usual, qualities come with their matching flaws. Regarding complex read queries, MySQL Cluster has always been weak. I agree that great progress has been made with condition pushdown and Adaptive Query Localization, which push joins and filtering down to the data nodes. Both mechanisms relieve the SQL node of work (filtering, joins) and also reduce network bandwidth consumption.

But this is far from what parallel databases like Greenplum or the Microsoft Parallel Data Warehouse solution do. To have a truly parallel database you really need to analyze the query and split it into pieces. A lot of intelligence has to be put into the SQL node to split the query plan into subtasks. You sometimes even have to implement a multi-pass approach (a map/reduce pattern) to fully execute the request, as in the sketch below. In that case the SQL node needs to implement a lot of intelligence about parallel query execution. The MySQL Cluster SQL node does not currently do this. If this work were ever to happen in the SQL layer, it would be great to have it work with various storage engines; it should be abstracted from any particular storage engine like NDB.
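To make the idea concrete, here is a minimal two-pass sketch (table and column names are illustrative, and this is not the exact rewrite any particular tool produces) of how a single aggregate query can be split into per-shard subtasks plus a combine step:

-- Original query, as issued by the application
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;

-- Pass 1: sent to every shard in parallel
SELECT customer_id, SUM(amount) AS partial_total
FROM orders
GROUP BY customer_id;

-- Pass 2: run by the coordinator over the collected partial results
SELECT customer_id, SUM(partial_total) AS total
FROM collected_partials
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;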

Shard-Query, presented by Justin Swanhart at FOSDEM 2013 (slides), does it at the client layer. It uses PHP and Gearman to spread the work in parallel across the MySQL shards. This is not perfect, but it is a nice approach that can greatly help in many cases. It can parallelize across shards or across MySQL partitions. Another interesting point about Shard-Query is that it also works with MySQL-compatible column stores.

I would prefer to have this intelligence implemented in the MySQL server, with a full guarantee of correctly supporting the client/server protocol, but that is a much harder approach.

It was nice to see the whole MySQL ecosystem in the same very crowded room:
Oracle, Percona, MariaDB, SkySQL. We definitely need this kind of open event. MySQL Connect (owned by Oracle) and Percona Live (owned by Percona) are two great events, but they do not gather the same variety of speakers and attendees. Thanks to the FOSDEM MySQL dev room review committee, the agenda reflected this openness.

The "MySQL and friends dev room" was crowded which was for me a surprise as I expected most people to be at NoSQL or Cloud topics rooms ;-). This reflect the strong attraction of MySQL with its constant innovation momentum. The challenge for presenters was the short 30 mns format. This has a big advantage that speakers have to get rid of marketing bullshit and concentrate on key points.

Colin Charles gave a presentation on the MHA solution. It is nice to hear that it is a very active project. It is a proven technology used at Facebook. It allows very versatile usage, from manual switchover to fully automatic failover, and it can be combined with clusterware like Pacemaker. This is a technology we fully support at SkySQL and that we have deployed successfully at many customers.

Peter Boros gave a very interesting talk on an ongoing project at Groupon. The solution addresses the automatic warmup of a slave server during binlog apply. It is a mix of a tool that reads and streams the slow query log with the Percona Playback tool to warm up the slave. I would definitely like to know more; I really like it when people address an old problem with a new, creative approach. The part that streams the slow query log should be open sourced soon.

Lars Thalmann presented a full picture of Oracle's innovation around MySQL. There was a nice question about why Oracle, which has released a migration toolkit, does not address migration from Oracle. Lars was surprised too 🙂 But I think it is fair to leave that part to be developed by the community ;-). Lars also presented all the progress made on the Windows platform, which is impressive. This is a good point, as Windows remains important as an entry point for adoption, development and testing.

Luis Soares, who manages replication at Oracle, described in detail the new replication features of 5.6. He gave a workaround on how to implement multi-master replication.
MariaDB will offer this feature natively (thanks to a Taobao contribution).

Stephane Combaudon had the task of choosing the 5 tools he considered the most useful in the Percona Toolkit. The nominees were:
pt-query-digest, pt-online-schema-change, pt-table-checksum, pt-stalk, pt-archiver
Stephane insisted on the fact that it is really important to understand how the tools work to avoid damaging mistakes. This is particularly true for pt-archiver, which is a nice tool to purge old data in MySQL.

Maciej Dobrzanski gave a talk on MySQL and security. This subject is a hot one. The key takeaway is "MySQL is not secure out of the box! Many users just leave it at that." He gave a lot of tips and valuable examples of security challenges faced with MySQL.

When I read the schedule I saw there was no lunch break for the MySQL dev room! Unacceptable for a French guy 😉 ! When I saw the title of Justin Swanhart's presentation, "Divide and conquer in the cloud", I thought I might use that slot to step out, as I expected classic marketing about how the cloud is going to save our lives. But I bought my sandwich and ran back to the MySQL dev room, as I wanted to get my money's worth (kidding: FOSDEM is free with no registration needed 😉 and the beer is cheap).

Justin Swanhart was in fact presenting Shard-Query, a framework that adds parallelism and improves query performance based on SQL language constructs. This is a great tool. Some nice ideas were mentioned, like using the Shard-Key-Mapper in conjunction with mysqlnd plugins (the PHP connector). Justin gave an example of query speedup for a SUM aggregate. He mentioned that MEAN and STDDEV were not candidates for parallelization. I think they can be parallelized with a multi-pass approach, as sketched below. I will blog about it later.
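For instance, the population standard deviation can be computed in two passes by shipping only counts, sums and sums of squares back from the shards. A minimal sketch (table and column names are illustrative):

-- Pass 1: executed on each shard
SELECT COUNT(x) AS n, SUM(x) AS s1, SUM(x*x) AS s2 FROM t;

-- Pass 2: executed by the coordinator over the collected per-shard rows
SELECT SUM(s1) / SUM(n) AS mean,
       SQRT(SUM(s2) / SUM(n) - POW(SUM(s1) / SUM(n), 2)) AS stddev_pop
FROM shard_partials;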

There were other interesting presentations by Andrew Morgan, Sveta Smirnova, Oystein Grovlen, Giuseppe Maxia, Henrik Ingo and Raghavendra Prabhu.

I am sure that next year the FOSDEM MySQL dev room will attract even more people, as this is a really great event. MySQL remains a hot topic with a lot of innovation to come.
FOSDEM is definitely the right place for the whole MySQL community to gather!

MySQL developers will cover upcoming features. The MySQL engineering team is driving MySQL forward and user feedback is welcome. For those who have tested the Development Milestone releases, this is a good opportunity to share your feedback and ideas.

The agenda includes :

  • InnoDB online operations
  • Replication :
    Global transaction IDs for replication and failover
    Multi-Threaded Slaves and Group Commit
  • NoSQL interfaces to InnoDB and NDB
  • Performance :
    PERFORMANCE_SCHEMA improvements
    Optimizer enhancements
  • Best Practices for Using MySQL in the Cloud

It is a physical event in Redwood Shores on Tuesday, June 05, 2012, 8:00 AM – 5:30 PM.
But there is also a live webcast option! Register
I will definitely follow it.

Last week in Paris I went to the Microsoft Days show. It was a big event with Steve Ballmer, mainly focused on cloud strategy, the Windows Phone 7 launch and other stuff. But beside the marketing part there were some interesting technical sessions. I attended one on the Microsoft Parallel Data Warehouse solution (the result of the Madison project, based on the DATAllegro acquisition).
This is a shared nothing architecture. Large tables are distributed across the nodes, while smaller tables are replicated on every node to avoid network traffic during joins. A query is first issued on a coordinating node, which then pushes pieces of it to the other nodes.
This is the same architecture that Postgres has with the Greenplum solution.
For MySQL I do not know of a similar architecture.
...continue reading "MySQL for a Massively Parallel Database ?"

Tracing per-file IOs is a mandatory task for anyone analyzing performance issues on a MySQL database. It can help find candidates for InnoDB table compression, decide how to move data around on disks to optimize IOs, decide what needs caching ...
There is no direct command for it on Linux: iostat gives you IOs per device or per logical volume. (I once thought of mounting a loopback device on my InnoDB datafile, but it does not work.)

ioprofile is a smart shell script developed by Baron Schwartz with the following principle :
...continue reading "Using ioprofile with MySQL is cool !"

This is a real question asked by Ivan Zoratti in his blog post.
Another frequently heard question is: should I still use MySQL now that my business is turning into a very serious one, or should I switch to Oracle 11g?

To illustrate this question, let me talk about the MySQL users I met last week.
Last week I went to a big e-commerce exhibition in Paris (3 days, 27,500 visitors).

The e-commerce / m-commerce business is a booming market segment. It involves various activities: digital marketing, price comparison, web analytics, social commerce ...
But it also includes very critical areas like logistics, payment, monetization ... This business now represents more than 100 billion both in the US and in the EU.

So I questioned many companies in various areas. I must confess that all of the ones I met relied on MySQL, except one that was using Microsoft + SAS (the statistics tool, for market segmentation). None of them had really chosen MySQL versus Oracle: they chose MySQL when they were small start-up companies and could not afford Oracle.
In fact, many of them still use MySQL Community (the GPL product) for free.

But now they realize that their business is becoming a serious one, generating nice turnover with profits and excellent growth prospects.
As volume, traffic and criticality increase, many are worried about the ability of MySQL to keep up with their growing business.
Will MySQL be technically able to handle the volume, the high transaction rate, the high availability requirements? Will it allow them to maintain a 100% online business?

They have been fully reassured by Oracle's commitment to invest in MySQL and make it a better database.
The latest announcements about MySQL 5.5 improvements in performance and scalability made them confident they made the right choice.
The improvements in replication reliability, the availability of online backup for MySQL, the improvements in partitioning: all this made them feel Oracle is truly making MySQL better.

They have also been totally convinced by the very lively and innovative MySQL community: the Spider storage engine, Flexviews materialized views, the OQGRAPH storage engine, column-based storage engines, smart patches and improvements developed by key users. They got the feeling that this is where things are happening and innovation is taking place openly.

The capability to replicate from Oracle to MySQL through GoldenGate made them confident they could integrate with more classic IT if necessary.

So for me, the main question Oracle answered for these customers was: should I still rely on MySQL now that I am doing big business?
And the answer is obviously: YES, you can rely on MySQL.

Last week I followed a very interesting Oracle webinar delivered by Chris Mason: The State of the BTRFS File System for Linux

BTRFS was initiated by Chris Mason, who used to be responsible for ReiserFS at SUSE and now works for Oracle. The first release came in 2007, and BTRFS was merged into the Linux kernel in 2009. There are now developers from Red Hat, Intel, SUSE, IBM, HP ... and storage vendors. The project is very active, and Ubuntu is considering using it soon as its default filesystem. BTRFS is licensed under the GPL. A short, interesting read on the life of BTRFS: A short history of BTRFS
...continue reading "Nice BTRFS webinar by Oracle"

Last week the Solutions Linux / Open Source event was held in Paris.

Kuassi Mensah (Head of Product Management, Database Technologies, Oracle Corporation) presented Oracle's open source strategy: Linux, MySQL, virtualization, GlassFish, Eclipse, dynamic scripting languages, etc. It was well received by the audience. Knowing that the MySQL organization will be kept safe inside Oracle is perceived as a nice move.

Florian Haas (LINBIT) gave a tutorial on DRBD and did some demos with NFS and video streaming. And of course he reminded people that, since Linux 2.6.33, DRBD is officially integrated into the Linux kernel source. DRBD making it into the mainline Linux kernel is going to make HA easier.

As is normal now, some subjects are getting stronger:
The cloud was a hot topic, with a good keynote by Ubuntu. Virtualization related to cloud architecture was also presented by hosting providers.
Extreme scale-out with Hadoop (Yahoo's implementation of Google MapReduce) got a lot of interest.
NoSQL database solutions were also a hot subject. I must confess that I was skeptical about broad usage of NoSQL beyond the top web actors.
But this time I met my first customers talking seriously about it and doing serious testing with Cassandra and Voldemort.
So I decided to follow a tutorial on using Hadoop. Very interesting.

Microsoft was a gold sponsor of the event. Microsoft now presents itself as a good, friendly contributor to the Linux kernel. Beside that, they try to optimize all open source solutions running on Windows Server. And of course they want to play a key role in the new cloud economy: in their Azure cloud framework they can handle Linux, PHP and MySQL. They seem to sincerely want to be active and friendly in the open source segment.

There was a full set of tutorials on database technology: PostgreSQL, NoSQL technologies, Ingres and PostGIS were there.
I presented a tutorial on MySQL. Our MySQL users and customers are as passionate as ever. They are eager to use our next production releases. Many of them have started using the InnoDB plugin and its advanced capabilities (compression, fast index rebuild ...). Some have started testing MySQL 5.5 and are very impressed by all the performance gains plus the extra functionality. MySQL is really on its way to becoming "the fastest open source commodity database on the market". I of course got some questions about when we will release the GA version. A GA release of MySQL Proxy was also mentioned and asked for.

The InnoDB plugin offers adaptive compression that is very interesting for performance.
This compression works at the table level. Like most optimization techniques it is not black or white.
You have to decide, depending on IO patterns, CPU usage, memory usage and disk constraints, which tables are good candidates for compression.

I was trying to define a methodology to help decide which tables to compress to reach performance gains.

I thought I could use the MySQL "show status" command or the information_schema tables. Unfortunately, MySQL does not offer any per-table IO statistics.
I thought maybe I could use some Linux command. Unfortunately, on Linux you can only get statistics at the block device level, with the iostat command.
As a last resort I thought maybe I could use MySQL Enterprise Monitor / Query Analyzer.
The MySQL Query Analyzer gives you nice aggregated statistics per query: number of executions, mean time ...
All of these statistics are at the query level. Most queries are joins, and many tables are accessed through multiple different queries, so you get no information at the table level.
So none of the above techniques solve the issue.

Mark Callaghan had packaged a set of patches for MySQL 5.0.
This set of patches includes IO statistics per InnoDB tablespace. In conjunction with the innodb_file_per_table option, this gives per-table InnoDB IO statistics.

A good reason to use inodb_file_per_table -- per-table IO statistics

More changes for performance monitoring

The SHOW INNODB FILE STATUS command reports IO statistics per InnoDB tablespace, and thus per table when innodb_file_per_table is used.
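A minimal usage sketch, assuming a server built with the patch (innodb_file_per_table must be enabled so that each table lives in its own tablespace; the exact output columns depend on the patch):

SHOW INNODB FILE STATUS;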

The InnoDB plugin that offers compression is part of MySQL 5.1, and this patch was for MySQL 5.0 only.

But very good news came last Friday:
Mark Callaghan has just released the Facebook patch for MySQL 5.1.
This gives us an efficient tool to decide which tables are good candidates for compression.

If anyone has time to write a script that, based on these statistics and on a server workload, pinpoints the tables that are the best candidates for compression, that would be nice. I agree this would be easier if these statistics were part of the information schema, so if someone could do that too, it would definitely be a great contribution.