
MySQL for a Massively Parallel Database?

Last week in Paris I went to the Microsoft Days show. It was a big event with Steve Ballmer, mainly focused on cloud strategy, the Windows Phone 7 launch and other things. But besides the marketing part there were some interesting technical sessions. I attended one on the Microsoft Parallel Data Warehouse solution (the result of Project Madison, based on the DATAllegro acquisition).
This is a shared-nothing architecture. The data of big tables is distributed across nodes, while smaller tables are replicated on every node to avoid network traffic during joins. A query is first issued on a coordinating node, which then pushes pieces of it to the other nodes.
This is the same architecture that Greenplum offers on top of Postgres.
For MySQL I do not know of a similar architecture.
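To make the shared-nothing idea above concrete, here is a minimal sketch in Python. It is only an illustration, not how Parallel Data Warehouse or Greenplum are implemented: each "node" is a private in-memory SQLite database, the table names (`sales`, `region`), the node count and the sample rows are all made up, and a simple modulo on the key stands in for a real hash distribution.

```python
import sqlite3
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def make_node():
    """One 'node' is one private in-memory database: nothing is shared."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (sale_id INTEGER, region_id INTEGER, amount INTEGER)")
    db.execute("CREATE TABLE region (region_id INTEGER, name TEXT)")
    return db


def load(nodes, sales, regions):
    # Small table: replicated in full on every node, so joins stay local.
    for db in nodes:
        db.executemany("INSERT INTO region VALUES (?, ?)", regions)
    # Big table: each row goes to exactly one node, chosen from its key.
    for row in sales:
        node = nodes[row[0] % len(nodes)]
        node.execute("INSERT INTO sales VALUES (?, ?, ?)", row)


def coordinator_total_by_region(nodes):
    """Push the same local query to every node, then merge the partial sums."""
    local_sql = (
        "SELECT r.name, SUM(s.amount) "
        "FROM sales s JOIN region r ON r.region_id = s.region_id "
        "GROUP BY r.name"
    )
    totals = defaultdict(int)
    for db in nodes:
        for name, partial_sum in db.execute(local_sql):
            totals[name] += partial_sum
    return dict(totals)


nodes = [make_node() for _ in range(NUM_NODES)]
regions = [(1, "EMEA"), (2, "APAC")]
sales = [(1, 1, 100), (2, 2, 50), (3, 1, 25), (4, 2, 10), (5, 1, 5)]
load(nodes, sales, regions)
result = coordinator_total_by_region(nodes)
print(result)
```

The join never crosses the network because every node holds a full copy of `region`; only the small per-node aggregates travel back to the coordinator. A real system would run the node queries in parallel and would have to redistribute rows when two large tables are joined on a column other than the distribution key.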

MySQL Cluster, which is a shared-nothing architecture, is not suited to data warehousing because most of the data needs to be held in memory.
Storage engines like Spider make it possible to benefit from partitioning across multiple hosts. This is more a sharding technique than a data warehouse solution.
We also have column-based storage engine solutions like Infobright or InfiniDB. InfiniDB offers MPP capabilities across multiple nodes.

I do not know whether there is any technical reason that makes it difficult to build the same MPP architecture with MySQL.
Is it the lack of hash joins, or the difficulty of manipulating the query plan?
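For readers unfamiliar with the term, a hash join can be sketched in a few lines of Python. This is a generic textbook illustration with made-up rows and a hypothetical `hash_join` helper, not code from any database engine: the smaller input is loaded into a hash table once, and the larger input is then streamed past it.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Join two lists of tuples on build_rows[build_key] == probe_rows[probe_key]."""
    # Build phase: index the smaller input by its join key.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)

    # Probe phase: a single pass over the larger input, one lookup per row.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append(match + row)
    return joined


regions = [(1, "EMEA"), (2, "APAC")]          # small side (build)
sales = [(1, 1, 100), (2, 2, 50), (3, 3, 7)]  # large side (probe)
joined_rows = hash_join(regions, sales, build_key=0, probe_key=1)
print(joined_rows)
```

Each input is read only once, which is why this strategy matters for large warehouse-style joins where no useful index exists on the join column.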
