How to get a sense of VM pressure
From: Jeremy Fitzhardinge
Date: Fri Jul 25 2008 - 13:55:56 EST
I'm thinking about ways to improve the Xen balloon driver. This is the
driver which allows the guest domain to expand or contract by either
asking for more memory from the hypervisor, or giving unneeded memory
back. From the kernel's perspective, it simply looks like a driver
which allocates and frees pages; when it allocates memory it gives the
underlying physical page back to the hypervisor. And conversely, when
it gets a page from the hypervisor, it glues it under a given pfn and
releases that page back to the kernel for reuse.
At the moment it's very dumb, and is pure mechanism. It's told how much
memory to target, and it either allocates or frees memory until the
target is reached. Unfortunately, that means if it's asked to shrink to
an unreasonably small size, it will do so without question, killing the
domain in a thrash-storm in the process.
There are at least two problems:
1. it doesn't know what a reasonable lower limit is, and
2. it doesn't moderate the rate of shrinkage to give the rest of the
VM time to adjust to having less memory (by paging out, dropping
inactive pages, etc.)
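
A minimal sketch of the two missing policies above, written as a toy model rather than driver code. It assumes a lower limit and a per-interval step size are somehow available; `floor_pages`, `max_step_pages` and both function names are hypothetical and do not exist in the current balloon driver.

```python
def next_balloon_target(current_pages, requested_pages, floor_pages, max_step_pages):
    """Return the page count to aim for during the next interval.

    floor_pages and max_step_pages are hypothetical policy inputs; the
    existing driver only knows a single target.
    """
    # Problem 1: never honour a request below the lower limit.
    goal = max(requested_pages, floor_pages)
    if goal >= current_pages:
        # Growing (or staying put) is not rate-limited in this sketch.
        return goal
    # Problem 2: shrink by at most max_step_pages per interval, so the
    # rest of the VM has time to page out and drop inactive pages.
    return max(goal, current_pages - max_step_pages)


def shrink_schedule(current_pages, requested_pages, floor_pages, max_step_pages):
    """List the intermediate targets visited until the clamped goal is reached."""
    steps = []
    while True:
        nxt = next_balloon_target(current_pages, requested_pages,
                                  floor_pages, max_step_pages)
        if nxt == current_pages:
            return steps
        steps.append(nxt)
        current_pages = nxt
```

With these made-up numbers, a request to shrink from 1000 pages to 100 with a floor of 400 and a step of 250 stops at 400 after three intervals instead of going straight to 100. How to pick a sensible floor and step is exactly the open question.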
A possible third problem is that the only mechanism it has for
applying memory pressure to the system is by allocating memory. It
allocates with (GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY |
__GFP_NOMEMALLOC), trying not to steal memory away from things that
really need it. But in practice, it can still easily drive the machine
into a massive unrecoverable swap storm.
So I guess what I need is some measurement of "memory use" which is
perhaps akin to a system-wide RSS; a measure of the number of pages
being actively used, that if non-resident would cause a large amount of
paging. If you shrink the domain down to that number of pages + some
padding (x%?), then the system will run happily in a stable state. If
that number increases, then the system will need new memory soon, to
stop it from thrashing. And if that number goes way below the domain's
actual memory allocation, then it has "too much" memory.
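
The arithmetic in that paragraph can be sketched as follows, assuming some estimate of the actively used page count already exists. The function names, the 20% padding and the 50% slack that stands in for "way below" are all assumptions chosen for illustration, not measured values.

```python
def suggested_target(active_pages, padding_percent):
    """Actively used pages plus x% padding (integer page counts)."""
    return active_pages + active_pages * padding_percent // 100


def classify(allocated_pages, active_pages, padding_percent=20, slack_percent=50):
    """Compare the domain's allocation against the padded estimate.

    padding_percent and slack_percent are hypothetical tuning knobs.
    """
    target = suggested_target(active_pages, padding_percent)
    if allocated_pages < target:
        # The estimate has outgrown the allocation: thrashing is likely soon.
        return "needs-more"
    if allocated_pages > target + target * slack_percent // 100:
        # The estimate is far below the allocation: memory could be returned.
        return "too-much"
    return "stable"
```

For example, with 1000 active pages and the defaults above, the padded target is 1200 pages; an allocation of 1100 would be classed as needing more memory, 1500 as stable, and 2000 as having too much.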
Is this what "Active" accounts for? Is Active just active
usermode/pagecache pages, or does it also include kernel allocations?
Presumably Inactive Clean memory can be freed very easily with little
impact on the system, and Inactive Dirty memory isn't needed but needs
IO to free. Is there some way to measure how big each class of memory is?
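
From userspace, one rough way to look at these counters is to read /proc/meminfo, which reports fields such as Active, Inactive and Dirty. The sketch below only parses that file; `parse_meminfo` and `read_meminfo` are hypothetical helper names. Note the assumption being made: Dirty there is a system-wide count of dirty pages, not a clean/dirty split of the inactive list, so it answers the question above only approximately.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: integer}.

    Most fields are reported in kB; a few (the HugePages_* counters, for
    instance) are plain counts. The unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if not parts:
            continue  # no value present
        fields[name.strip()] = int(parts[0])
    return fields


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

A caller could then compare, say, `info["Active"]` against `info["MemTotal"]`, but whether Active is the right proxy for the working set is still the question being asked here.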
If you wanted to apply gentle memory pressure on the system to attempt
to accelerate freeing memory, how would you go about doing that? Would
simply allocating memory at a controlled rate achieve it?
I guess it also gets more complex when you bring nodes and zones into
the picture. Does it mean that this computation would need to be done
per node+zone rather than system-wide?
Or is there some better way to implement all this?
Thanks,
J