InnoDB : Making Buffer Cache Scan Resistant

What more frustrating than full table scan wiping out the buffer pool. You have a nicely tuned busy OLTP server with a warm buffer pool containing the current working set. Then someone submit a report needing to access a table through a full table scan. The normal and current MySQL behavior is to wipe out the content of the cache. if the table is never reused this is pure loss. The server will have to go through a full new warm up phase that can last quite long with a big buffer pool. This issue is now solved with MySQL 5.1.41.

The latest innodb plugin 1.0.5 included in MySQL 5.1.41 introduced the settable global variables innodb_old_blocks_pct and innodb_old_blocks_time. These two variables allow to control the buffer pool eviction policy. This makes it possible to tune the buffer pool LRU eviction policy to be more resistant against full scans. This 1.0.5 innoDB plugin is the latest version of the innodb plugin. It is included in MySQL 5.1.41 and more up to date than the plugin downloadable from innobase.
To test this new functionality we first have modified our .cnf file. This is necessary to enable the InnoDB plugin and the InnoDB Information Schema tables implemented in Currently this is not done by default in MySQL 5.1.41 which by default use an old embedded version of innoDB not including all the fancy functionalities of the plugin. We have to wait for MySQL 5.4 to have the embedded innoDB use the new innoDB plugin code.

my.cnf :


Let us consider a MySQL instance with a rather small buffer pool :

innodb_buffer_pool_size = 20M
innodb_flush_method = O_DIRECT # this is to avoid filesystem cache

First we issue a request for which all data fit in the buffer pool(dbt2 9M new_order table) :

mysql> select sum(no_w_id) from new_order;
| sum(no_w_id) |
| 54000 |
1 row in set (0.18 sec)

We reissue it and now benefit from the buffer pool :

mysql> select sum(no_w_id) from new_order;

1 row in set (0.02 sec)

We then issue a query that do a full table scan on a quite big table (dbt2 125M stock table) that does not fit in the buffer pool. To allow a full scan to fill the buffer pool we set :

set global innodb_old_blocks_time=0;
set global innodb_old_blocks_pct=90;

mysql> select sum(s_quantity) from stock where s_data like ‘a%’;
| sum(s_quantity) |
|769958 |
1 row in set (4.58 sec)

We then reissue the initial query which gets slow as the data has been wiped out of the buffer pool :

mysql> select sum(no_w_id) from new_order;

1 row in set (0.20 sec) <=========== back to slow time

The second run of course is faster as the data is back in the cache :

mysql> select sum(no_w_id) from new_order;

1 row in set (0.03 sec) <=========== back fast time

If we want to keep data in the buffer pool and avoid full scan to wipe out the buffer pool we set :

set global innodb_old_blocks_time=5;
set global innodb_old_blocks_pct=5;

When we issue the small query it is fast :

mysql> select sum(no_w_id) from new_order;

1 row in set (0.03 sec) <========== fast

We then issue the request that does a full table scan and that previously wiped out the small table out of the buffer pool :

mysql> select sum(s_quantity) from stock where s_data like ‘a%’;

1 row in set (4.34 sec)

We then reissue the initial small query and it remains fast. The data has not been wiped out of the buffer pool :

mysql> select sum(no_w_id) from new_order;

1 row in set (0.02 sec) <=========== REMAINS FAST / DATA NOT WIPED OUT OF INNODB BUFFER POOL:-)

How does it now works ?

InnoDB Plugin 1.0.5 does now provides two system variables that enable LRU algorithm tuning:
The buffer pool is divided into new blocks and old blocks.
innodb_old_blocks_pct : Specifies the approximate percentage of the buffer pool used for the old block sublist. The range of values is 5 to 95. The default value is 37 (that is, 3/8 of the pool).
innodb_old_blocks_time : Specifies how long in milliseconds (ms) a block inserted into the old sublist must stay there after its first access before it can be moved to the new sublist. The default value is 0: A block inserted into the old sublist moves immediately to the new sublist the first time it is accessed, no matter how soon after insertion the access occurs. If the value is greater than 0, blocks remain in the old sublist until an access occurs at least that many ms after the first access. For example, a value of 1000 causes blocks to stay in the old sublist for 1 second after the first access before they become eligible to move to the new sublist.

Beware that giving a too high value could also be inefficient as the block will certainly be wiped out of the old sublist before having a chance to enter the new sublist. There is an equilibrium to find between a cache that get wiped out and one that never change ! This needs some monitoring.

How to monitor ?

To monitor and get a better insight of what happens in the buffer pool the show innodb status command will provide some extra info :

Database pages 1279
Old database pages 62

Pages made young 39508, not young 185363
121.26 youngs/s, 2942.48 non-youngs/s
Pages read 64769, created 0, written 1
467.03 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 889 / 1000, young-making rate 29 / 1000 not 704 / 1000

Pages made young is the total number of old pages that have been made young.
youngs/s is the rate at which page accesses to the old pages have resulted in making such pages young since the last invocation of the command.
young-making rate provides the same rate but in terms of overall buffer cache accesses instead of accesses just to the old pages.
These information give an idea at what rhythm the bufferpool is renewed and help tune innodb_old_blocks_pct and innodb_old_blocks_time to reach a target goal.

As noticed by Mark Callaghan this functionality is a nice feature to have when you occasionally use mysqldump to copy a table from a busy OLTP server.

InnoDB plugin definitely gets better and better. It is a good reason to upgrade to MySQL Server 5.1.41 and activate the InnoDDB plugin.

Leave a Reply




four − 2 =

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">