Why do we need InnoDB per table IO statistics ?

InnoDB plugin offers an adaptative compression that is very interesting for performance.
This compression works at a table level. Like most optimization techniques this is not black or white.
You have to decide depending on IO patterns / CPU usage / Memory usage / disks constraints what tables are good candidates for compression.

I was trying to define a methodology to help decide which tables to compress to reach performance gains.

I thought I could use MySQL "show status" command or information_schema tables. Unfortunately MySQL does not offer any per tables IO statistics.
I thought maybe I could use some linux command. Unfortunately on linux you can only get statistics at a block device level with iostat command.
As a last ressort I thought maybe I could use MySQL Enterprise Monitor / Query Analyzer.
The MySQL Query Analyzer gives you nice aggregated statistics per queries : nb execs / mean time ...
All of these statistics are at a query level. Most queries are joins, and many tables are accessed through multiple different queries. So you get no information at a table level.
So none of the above techniques solve the issue.

Mark Callaghan had packaged a set of patch for MySQL 5.0.
This set of patches includes IO statistics per InnoDB tablespace. In conjunction with the use of the innodb_file_per_table option this gives InnoDB per tables IO statistics.

A good reason to use inodb_file_per_table -- per-table IO statistics

More changes for performance monitoring

The SHOW INNODB FILE STATUS command reports IO statistics per InnoDB tablespace and thus per table When innodb_file_per_table is used.

The InnoDB plugin that offers compression is part of MySQL 5.1. and this patch was for MySQL 5.0 only.

But a very good news came last friday :
Marc Callaghan has just released the Facebook patch for MySQL 5.1 .
This gives us a efficient tool to decide which tables are good candidate for compression.

If anyone has time to write a script that would based on these statistics and on a server workload pinpoint the tables best canditates for compression.
This would be nice. I agree this would be easier if these statistics were part of the information schema. So if someone could do that too this would be definitely a great
contribution.

Leave a Reply Cancel reply