A new kid in the MySQL sharding world

MySQL Connect 2013 has been a great edition. There was of course a lot of nice announcements of improvements in the the core MySQL server technology. One of the major announcement that received a lot of buzz was MySQL Fabric. MySQL Fabric is an infrastructure component aimed at simplifying construction of a highly available, sharded, MySQL server based architecture.

Horizontal scale out for MySQL is a hard problem. The MySQL sharding requirement has up till now only be addressed in a non general solution way. Using MySQL at Scale remains a big challenge.

All the big actors of the web have been faced with this scale out issue. They all have developed their own tools/framework to address this need. The new cloud providers have also been faced with this requirement when trying to offer database as a service solutions around MySQL with transparent elasticity.

We can mention the following Sharding solutions davelopped by the key players of the web :

  • Google/Youtube has built Vitess (in Go) and open sourced it
  • Twitter has built Gizzard (in Scala) and open sourced it
  • Tumblr has built Jetpants (in Ruby) and open sourced it
  • Facebook has also it own sharding framework which is a range based model. The HA is addressed with MHA. they have presented their architecture at various events

Theses frameworks cover the needs:

  • Master promotions,
  • Cloning slaves,
  • Supports a range-based sharding scheme for MySQL
  • Rebalancing shards,
  • Split a range-based shard into N new shards

High Availability is also addressed by most of these frameworks. MySQL Fabric also handles HA. All the improvements made around replication have helped handle correctly the HA part of the architecture in a more resilient way.

So now with the introduction of MySQL Fabric there is new sharding framework available. So what ? Interesting ?
Yes it is very interesting.

First, MySQL fabric has been written in python which in my opinion is a good point as devops love python.
Second, MySQL fabric has been released under the GPL license which is also a very good point.
Last MySQL Fabric is aimed as a general purpose sharding framework.

 

MySQL fabric is quite simple :

  • A Fabric Server holding and serving the sharding metadata.
  • A set of commands to do various action (split shard ,…)
  • Specific connectors : Java, Python to request shard location to the Fabric server through XMLRPC call.

Two questions for me:

  • What is the status of the PHP and C drivers regarding MySQL Fabric ?
  • Is it possible to fully abstract the client code from the fact that the data is sharded. Some connector config could specify the fabric server. The client code would then not be changed at all.

======

A Brief Introduction to MySQL Fabric  2013-09-21 Mats Kindahl
MySQL Fabric – Sharding – Introduction  2013-09-21 VN (Narayanan Venkateswaran)
MySQL Fabric – Sharding – Simple Example  2013-09-22 VN (Narayanan Venkateswaran)
MySQL Fabric – Sharding – Shard Maintenance  2013-09-27 VN (Narayanan Venkateswaran)
MySQL Fabric – Sharding – Migrating From an Unsharded to a Sharded Setup  2013-09-22 VN (Narayanan Venkateswaran)
Installing MySQL Fabric on Windows  2013-10-03 Todd Farmer
MySQL 5.7 Fabric: any good?  2013-09-23 Ulf Wendel

Writing a Fault-tolerant Database Application using MySQL Fabric  2013-09-21 Alfranio Junior
MySQL Fabric support in Connector/Python  2013-09-22 Geert Vanderkelen
MySQL Connector/J with Fabric Support  2013-09-21 Jess Balint

9 comments to A new kid in the MySQL sharding world

  • You forgot Shard-Query and Spider dude, the two most popular sharding tools for MySQL that actually work and are transparent, unlike the other tools.

    I’m disappointed in the lack of respect in the community for the effort I have put into this generic tool for MySQL sharding.

    • Serge

      No. In fact I had a thougt to Shard Query when writing the post ;-)

      This is another variety of sharding mainly aimed at pushing the query to the data in a parallel way.
      We could mention in this respect the SPIDER storage engine with his remote partition approach.
      We could also mention the new MariaDB CONNECT engine with his TBL type made of a collection of remote MySQL tables. The 3 are more there to speed up big data queries.

      They can in fact be used on the shards produced by MySQL Fabric.
      They are complementary.

      What I like about Shard Query is that the query being parsed allows to do smart things. But this has a cost so not OLTP targeted. No perfect simple world.
      What I also like about Shard Query is how it was born ;-)

      • Justin Swanhart

        Well, you can use Fabric with Shard-Query, or Jetpants with Shard-Query. Shard-Query has pluggable metadata layer. So you get the best of both worlds.

        And Spider can do OLTP. The heavy lifting is in the SE layer. It isn’t really any more expensive than normal SQL parsing. And it can do cross-shard queries and scans in parallel when you do. And now it aggregates in parallel for simple operations like SUM/MIN/MAX/COUNT without a group by.

        What is nice about Shard-Query is that it can work with NDB or Spider on 5.6 or MariaDB 10 because they support the PARTITION hint. This can allow a SQL node to no longer be single threaded on a count(*), though a lot of network data might still need to be transited.

  • Hi Serge,

    Thanks for noting the release of MySQL Fabric! A couple quick answers to your questions:

    1. This Labs release is meant to solicit early feedback. PHP (in particular) and C driver support for Fabric will help obtain that feedback from an even wider audience, so you’ll likely see something here in the not-distant future.

    2. This initial release has limited support for truly “transparent” sharding – the application must supply the shard key before operations. In certain deployments – most notably where ORMs are in use – this context can be known and injected without changes to the application layer. That’s why, in addition to the Connector/J support which allows applications to specify shard keys, support was added for Hibernate Shards. You’ll also see something similar for Doctrine.

  • Serge

    This sounds great. PHP is definitely needed and will attract a wider testing comunity.
    DOCTRINE php ORM support would be great too.

    Point 2 sounds interesting.
    Being able to automatically inject the shard id based on the oject would be great for ORM. But the current hibernate shard multitenant example in the Fabric doc seems to explicitely set the tenant id (equivalent to the shard id). maybe I have not read enough and missed something ;-)

    How does it work now for queries like select needing to access multiple shards ?
    Any parallel execution ? or Fabric is not targeted for this kind of access pattern and you need to know the shard key before any action.

  • Serge,

    nope, cross-shard queries are not part of the deal in the initial version. Fabric is just an admin tool to setup large clusters of MySQL.

    You hit the nail on its head. The initial vision is on “bigggggg, biggg dataset that require divide-and-conquer”. The data model is not made for cross-shard queries. You are supposed to keep your queries on one machine. You got some fraction of a sharded table on one node, plus all the smaller (so-called “global”) tables that you frequently join. All the big boy systems apply divide-and-conquer and tell you to limit your most common, most performance demanding queries on one node. They just “cut” the data differently into parts (values/documents, columnar, entity groups, …).

    Sorry, no server-side support for cross-shard queries.

    (Yes, Justin, Shard-Query for the client part…, yes Oli/Erkan, Spider for the server part…).

    As for ORM/my_favorite_framework, yeah, nice to have this stuff delivered by us. Though, it is really no voodoo for a user to add it. And, yes, there’s something for Doctrine hidden in the labs utility download. However, instead of keeping resources busy with this sugar on top, it would be better to have solid drivers first.

    PHP is quite well prepared for sharding environments. There’re a non-blocking query() call since 2008, there’s a proof-of-concept for multiplexing queries. However, things have not been put together yet.

    • Serge

      It would be great to have the “scatter/gather” pattern that we find in MongoDB.
      The difference is that mongoDB is a proxy based approach. their mongos proxy server can do parallel queries to the shard and partial results agregation/sorting.

      With MySQL Fabric there is no proxy so the connector would have to take care.
      It could be helfull anyway to have a connector that can do this job and do a sort/merge of partial result from shards.

  • Hi Serge,

    I checked the Hibernate example, and you’re entirely right. At the risk of polluting your blog, here’s the relevant code:

    Session session = sf.withOptions()
    	.tenantIdentifier(""+j) // choose a db server
    	.openSession();
     
    // vanilla hibernate code
    session.beginTransaction();
    Employee e = new Employee();
    e.setId(j);

    Most frequently, these type of operations would be encapsulated in some application method, so passing the identifier isn’t likely to be a burden. But it still doesn’t adequately demonstrate what I would like to see from a “transparent” solution – rather, it establishes a platform on which we should build. Ideally, the s.setId(j) method would understand that this is the shard key, and route appropriately. And the concept of cross-shard or aggregation of parallel queries and such are obviously tougher problems to be solved down the road.

    Regarding PHP suport, you’ll want to check out this new post from Johannes.

  • Serge, you may also want to check out ParElastic (full disclosure, I work there). It does all of the things you describe + fully transparent parallel queries and elastic scale without re-sharding. You can read more at http://www.parelastic.com/blog</a?; drop me a note if you want to take it for a test drive. It works fine with MariaDB, Percona and MySQL.

    Thx,

    -amrith

Leave a Reply

  

  

  


four − 1 =

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">