Slow long-term increase in dirty pages

From: Paul Evans
Date: Wed Mar 18 2009 - 11:11:49 EST


We have a server whose dirty page count keeps increasing all the time,
to the point where 'sync' takes ages to flush the pages:

root@freehand:~# time sync

real 1m15.570s
user 0m0.000s
sys 0m0.052s

We have some graphs of the dirty page count, as captured
from /proc/vmstat's "nr_dirty" entry:

http://opensource.mxtelecom.com/tmp/freehand-dirty-day.png
http://opensource.mxtelecom.com/tmp/freehand-dirty-week.png
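
For anyone wanting to collect the same numbers, the sampling can be
sketched roughly as below. This is a minimal illustration, not the
script that produced the graphs: the function names vmstat_field and
sample_dirty are made up for this example, and the file argument exists
only so the functions can be pointed at a saved copy of /proc/vmstat.

```shell
#!/bin/sh
# Print the value of one counter from a vmstat-style file
# ("name value" per line). The file defaults to /proc/vmstat.
vmstat_field() {
    awk -v key="$1" '$1 == key { print $2 }' "${2:-/proc/vmstat}"
}

# Emit "epoch-seconds nr_dirty nr_writeback" COUNT times,
# INTERVAL seconds apart. `date +%s` is assumed to be available
# (it is on GNU and BSD date).
sample_dirty() {
    interval=$1
    count=$2
    file=${3:-/proc/vmstat}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s %s\n' "$(date +%s)" \
            "$(vmstat_field nr_dirty "$file")" \
            "$(vmstat_field nr_writeback "$file")"
        i=$((i + 1))
        # Sleep only between samples, not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

Something like "sample_dirty 60 1440 >> dirty.log" would then record one
sample a minute for a day. Both counters are in pages, not bytes.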

I have tuned the dirty page flushing sysctls to the following:

root@freehand:~# for F in /proc/sys/vm/dirty_*; do echo -n "$F: "; cat $F; done
/proc/sys/vm/dirty_background_ratio: 1
/proc/sys/vm/dirty_expire_centisecs: 3000
/proc/sys/vm/dirty_ratio: 3
/proc/sys/vm/dirty_writeback_centisecs: 500
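
To make the units explicit, here is a small sketch of the arithmetic.
The helper names are made up for illustration. As I understand it, the
kernel applies the two ratios to the memory it considers dirtyable
rather than to total RAM, so the kB figure computed this way is only an
upper bound.

```shell
#!/bin/sh
# Convert a *_centisecs sysctl value to whole seconds.
centisecs_to_secs() {
    echo $(( $1 / 100 ))
}

# Upper bound, in kB, for a ratio threshold if it applied to all of RAM.
# Args: ratio in percent, total memory in kB (e.g. MemTotal from
# /proc/meminfo).
ratio_to_kb() {
    echo $(( $2 * $1 / 100 ))
}
```

With the values above, "centisecs_to_secs 3000" prints 30 and
"centisecs_to_secs 500" prints 5. For a hypothetical machine with
4194304 kB of RAM (not necessarily this one), "ratio_to_kb 3 4194304"
prints 125829.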

The machine's role is to handle a large amount of kernel iptables
routing/firewalling traffic, and to run a set of Apache servers as
HTTP<->Tomcat gateways.

root@freehand:~# uname -r
2.6.27-fes

(This is a build of the stock 2.6.27 source with some extra iptables
patches; there shouldn't be anything mm-related in them.)

By my understanding of the dirty page flush algorithm, we shouldn't be
accumulating these pages all the time; any page that has been dirty for
more than 30 seconds (dirty_expire_centisecs = 3000) ought to be written
out, yes?

If we manually 'sync', as above, then the count drops to zero, but then
slowly starts ramping up again as observed.

As a temporary workaround I've put 'sync' in cron every 10 minutes, but
is there more tuning I can do, or at least some probing to see where
these pages are accumulating from?
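
One way the cron job could double as a probe is to log the count on
either side of the flush. This is only a sketch and not what is
currently installed: the function name, the SYNC_CMD override and the
default log path are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical cron wrapper: record nr_dirty before and after a sync.
# Args: vmstat-style file (default /proc/vmstat), log file
# (default /var/log/dirty-sync.log, an illustrative path).
# SYNC_CMD may be set to replace the real sync, e.g. for a dry run.
sync_and_log() {
    vmstat_file=${1:-/proc/vmstat}
    log_file=${2:-/var/log/dirty-sync.log}
    before=$(awk '$1 == "nr_dirty" { print $2 }' "$vmstat_file")
    ${SYNC_CMD:-sync}
    after=$(awk '$1 == "nr_dirty" { print $2 }' "$vmstat_file")
    printf '%s before=%s after=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$before" "$after" >> "$log_file"
}
```

Saved as a script that ends by calling sync_and_log, this could replace
the bare 'sync' in the crontab entry, so that each run leaves one line
showing how many pages had built up since the previous flush.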

--
Paul Evans <paul@xxxxxxxxxxxxx>
Tel: +44 (0) 845 666 7778
Fax: +44 (0) 870 163 4694
http://www.mxtelecom.com
