Re: [RFC][PATCH 0/3] VM throttling: avoid blocking occasional writers
From: Leroy van Logchem
Date: Thu Mar 01 2007 - 07:50:41 EST
Tomoki Sekiyama <tomoki.sekiyama.qu <at> hitachi.com> writes:
> thanks for your comments.
The default dirty_ratio on most 2.6 kernels tend to be too large imo.
If you are going to do sustained writes multiple times the size of
the memory you have at least two problems.
1) The precious dentry and inodecache will be dropped leaving you with
a *very* unresponsive system
2) The amount of dirty_pages which need to be flushed to disk is huge,
taking all VM while the i/o channel takes uninterruptable time to flush it.
What we really need is a 'cfq' for all processes -especially misbehaving
ones like dd if=/dev/zero of=/location/large bs=1M count=10000-.
If you want to DoS the 2.6 kernel, start a ever running dd write and
you know what I mean. Huge latencies due the fact that all name_to_inode
caches are lost and have to be fetched from disk again only to be quickly
flushed again and again. I already explained this disaster scenario with
Linus, Andrew and Jens; hoping for a auto-tuning solution which takes
diskspeed per partition into account.
At the moment we cope with this feature by preserving imported caches with
sysctl vm.vfs_cache_pressure = 1, vm.dirty_ratio = 2 combined with
vm.dirty_background_ratio = 1. Some benchmarks may get worse but you have
a more resiliant server.
I hope the VM subsystem will cope with applications which do not advise
what to do with the cached pages. For now we use posix_fadvice DONT_NEED
as patch to Samba 3 in order to at least be able to write larger then
memory files without discarding the important slab caches.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/