FOSDEM 2013 : All the MySQL ecosystem together

This was nice to see all the MySQL ecosystem in the same very crowded room :
Oracle, Percona, MariaDB, SkySQL. We definitely need this kind of open event. MySQL Connect (owned by Oracle) or Percona Live (owned by Percona) are 2 great events but they do not gather the same variety of speakers and attendees. Thanks to the FOSDEM MySQL Dev room review committee the agenda reflected this openness.

The “MySQL and friends dev room” was crowded which was for me a surprise as I expected most people to be at NoSQL or Cloud topics rooms ;-) . This reflect the strong attraction of MySQL with its constant innovation momentum. The challenge for presenters was the short 30 mns format. This has a big advantage that speakers have to get rid of marketing bullshit and concentrate on key points.

Collin Charles gave a pres on MHA solution. It nice to hear that it is a very active project. It is a proven technology used at Facebook. It allows very versatile usage from a manual switch-over to fully automatic failover. It can be combine with a clusterware like pacemaker. This is a technology we fully support at SkySQL and that we have deployed successfully at many customers.

Peter Boros gave a very interesting talk on an ongoing project at the Groupon company. The solution address the automatic warmup of a slave server during binlog apply. it is mix of a tool that read and stream the slow query log with the percona playback tool to warmup slave. I definitely would like to know more. I really like when people address an old problem with a new creative approach. The part streaming the slow query log should be open sourced soon.

Lars Thalmann presented a full picture of Oracle innovation around MySQL. There was a nice question about why Oracle which has released a migration toolkit does not address Oracle migration. Lars was surprised too :-) But I think it is fair to leave this part be developed by the community ;-) . Lars presented all the progress made around windows platform. They are impressive. This is a good point as Windows remain important as the entry point for adoption, development and testing.

Luis Soares who manage the replication at Oracle described in detailed the new features of 5.6. He gave a workaround on how to implement multi master replication.
MariaDB will offer natively this feature (Thanks to TAOBAO contribution).

Stephane Combaudon had the task to chose the 5 tools he considered the most useful in the percona toolkit. The nominees were :
pt-query-digest, pt-online-schema-change, pt-table-checksum, pt-stalk, pt-archiver
Stephan insisted on the fact that it is really important to understand how the tools work to avoid damaging mistakes. This is in particular true for pt-archiver which is a nice tool to purge old data in MySQL.

Maciej Dobzranski gave a talk on MySQL and security. Tis subject is a hot one. The key take away is “MySQL is not secure out of the box! Many users just leave it at that.”. A lot of tips and valuable examples of security challenges faced with MySQL.

When I read the schedule I saw there was no pause for lunch for the MySQL Dev Room ! Unacceptable for a french guy ;-) ! I saw Justin Swanhart pres title “Divide and conquer in the cloud” I thought I might use this slot to go as I expected classic marketing about how the cloud is going to save our life. I bought my sandwich and run back to the MySQL Dev Room as I wanted to have for my money (kidding : FOSDEM is free with no need for registration ;-) and the beer is cheap).

Justin Swanhart was presenting “Shard-Query” which is a framework that add parallelism and improve query performance based on SQL language constructs. This is a great tool. Some nice ideas mentioned like using Shard-Key-Mapper in conjunction with mysqlnd plugins (PHP connector) . Justin gave example of query speedup for a SUM aggregate. He mention that MEAN and STDDEV were not candidate to parallelization. I think it can be parallelized in a multi pass approach. I will blog on it later.

Other interesting pres by Andrew Morgan, Sveta Smirnova, Oystein Grovlen, Giuseppe Maxia, Henrik Ingo, Raghavendra Prabhu.

I am sure that next year FOSDEM MySQL Dev Room will have more people as this is a really great event. MySQL remain a hot topic with a lot of innovation to come.
FOSDEM is definitely the right place for all the MySQL community to gather !

6 comments to FOSDEM 2013 : All the MySQL ecosystem together

  • MEAN is distributed by Shard-Query automatically as sum(x)/count(x). I just didn’t realize mathematically that a stddev can be decomposed into a calculation which involves only mean. This is because I’m a SQL guy not a math major. I need to look at the mathematical definition of stddev and variance. I should have used GROUP_CONCAT as my example instead of STDDEV :)

    You can fully distribute any aggregate function that can be decomposed into some combination of the following operations:
    SUM(x)
    MIN(x)
    MAX(x)
    COUNT(x)
    COUNT(distinct x)
    AVG(x)

    Shard-Query does support custom aggregate functions too, like PERCENTILE, which currently use a tagging approach for computational parallelism. This functionality is going to be used to add window functions to Shard-Query.

    • Serge

      Yes Justin I agree MEAN and STDDEV can be parallelized too :

      For MEAN if we suppose you have n shards. let us call MEAN(i) the MEAN computed on shard number i and CARD(i) the cardinality of shard i.
      then MEAN(ALL) = (M(1)*CARD(1) + …/MEAN(n)*CARD(n))/CARD(ALL)

      For standard deviation parallelization is also possible
      STDDEV = SQRT(1/n (SUM(POW(x,2)) – POW(MEAN(ALL),2))
      The MEAN(ALL) is computed in parallel using the previous method.
      you also need to compute the sum of square which can also be parallelized.
      Having both you can compute the STDDEV.

      So yes Shard-query is a powerful tool in many cases.
      I like too the addition of window functions to Shard-Query.

  • I agree about seeing all the players in one room. A while back the IOUG MySQL Council approached all the key players (oracle, skysql, percona, monty program, tokutek, etc) and wanted to set up a community-run conference like FOSDEM, Confoo or even Linux Conf AU. But Percona said they wanted to run a business conference (see http://palominodb.com/blog/2011/08/10/disclosure-truth-about-mysql-2012-conference-planning) and we didn’t want to run a competing conference with Percona, so the big community conference idea died.

    We still do smaller events, like the Open Database Camps, but I still miss what could have been. FOSDEM is great but having our *own* MySQL conference, 4-5 days, would be even better.

  • It has to work through a single SQL rewrite:
    mysql> select stddev(id), pow((sum(pow(id – l(sum(id)/count(id)),2)) / count(*)),.5) from t1 where id between 1 and 10;
    ERROR 1111 (HY000): Invalid use of group function

    That could be rewritten over many shards, but it is invalid because you can’t nest aggregate functions that way, which is why MySQL has explicit stddev() function because it can’t be implemented with regular aggregate functions.

  • It might be more efficient to do the query in two steps:
    1) get the average for each group
    2) run the rest of the query using the mean values collected

    Thanks for the good idea. I’ll implement it in the next version of Shard-Query. It is currently feature frozen for version 2.0 and I’m not adding new features, just bug fixes.

    The result from the mean calculation will have to be correlated via a hash function over the group by attributes to connect the mean back to the detail row. This is similar to how custom aggregate functions tag rows, so I already know how to make it work.

    Thanks for the idea.

  • FWIW:
    VAR_SAMP = (SUM(pow(X,2)) – ((pow(SUM(X),2)) / (COUNT(X)))) / (COUNT(X) – 1)
    STDDEV_SAMP = POW(VAR_SAMP(X), .5)
    VARIANCE = SUM(POW(expr,2))/COUNT(expr) – POW(SUM(expr)/COUNT(expr),2)
    STDDEV = POW(SUM(POW(expr,2))/COUNT(expr) – POW(SUM(expr)/COUNT(expr),2),.5)

    All can thus be fully distributed. Thanks to you and Anthony Curtis for pointing me in the right direction.

Leave a Reply

  

  

  


8 × = eight

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">