Re: [RFC] respect the referenced bit of KVM guest pages?

From: Andrea Arcangeli
Date: Thu Aug 06 2009 - 06:09:40 EST


On Wed, Aug 05, 2009 at 05:58:05PM +0200, Andrea Arcangeli wrote:
> On Wed, Aug 05, 2009 at 10:40:58AM +0800, Wu Fengguang wrote:
> >          */
> > -       if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
> > +       if ((vm_flags & VM_EXEC) || PageAnon(page)) {
> >                 list_add(&page->lru, &l_active);
> >                 continue;
> >         }
> >
>
> Please nuke the whole check and do an unconditional list_add;
> continue; there.

After some conversation, it seems that reactivating pages on large
systems causes trouble for the VM: the young bits have so much time to
be set again between scans that it becomes hard to shrink the active
list. I see that, so the check should still be nuked, but with an
unconditional deactivation instead. Otherwise it's trivial to bring
the VM to its knees and DoS it with a simple mmap of a file with
PROT_EXEC in the protection argument. My whole point is that the
decision to activate or deactivate pages can't be a function of
VM_EXEC. It clearly helps on desktops, but that is probably a signal
that the VM isn't good enough by itself at identifying the important
working set from young bits and the like on desktop systems. And if
there's a good reason not to activate, we shouldn't activate the
VM_EXEC pages either, as anything and anybody can generate a file
mapping with VM_EXEC set...
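
To make that last point concrete, here is a minimal userspace sketch
(not from the patch) of how an unprivileged process gets VM_EXEC on a
file mapping: it only has to pass PROT_EXEC in the prot argument of
mmap. The helper names pages_for() and touch_exec_mapping() are made up
for illustration, and the mmap can fail, for example if the file lives
on a noexec mount.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: number of pages needed to cover len bytes. */
static size_t pages_for(size_t len, size_t page_size)
{
	return (len + page_size - 1) / page_size;
}

/*
 * Hypothetical helper: map a regular file read+exec and read one byte
 * from every page, so each page is faulted in through a VM_EXEC vma.
 * Returns the number of pages touched, or -1 on any failure.
 */
static long touch_exec_mapping(const char *path)
{
	struct stat st;
	long touched = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	if (fstat(fd, &st) == 0 && st.st_size > 0) {
		size_t len = (size_t)st.st_size;
		size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
		/* PROT_EXEC is all it takes to get VM_EXEC on the vma. */
		unsigned char *map = mmap(NULL, len, PROT_READ | PROT_EXEC,
					  MAP_PRIVATE, fd, 0);

		if (map != MAP_FAILED) {
			volatile unsigned char sink = 0;
			size_t off;

			/* One read per page is enough to fault it in. */
			for (off = 0; off < len; off += page_size)
				sink ^= map[off];
			(void)sink;

			touched = (long)pages_for(len, page_size);
			munmap(map, len);
		}
	}

	close(fd);
	return touched;
}
```

Calling touch_exec_mapping() in a loop on a file larger than RAM is the
kind of workload I have in mind: nothing in it requires privileges or a
file that is actually executable code.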

Likely we need a cut-off point: if we detect that it takes more than X
seconds to scan the whole active list, we start ignoring young bits,
as they don't provide any meaningful information at that point and
they just hang the VM, preventing it from shrinking the active list
and making it loop endlessly over a list with millions of pages in it.
But on small systems, if the inactive list is short, the time may be
too short if we just clear the young bit and only give it the time
spent on the inactive list to be set again. That may be the source of
the problem. Actually I'm speculating here, because I barely
understood that this is about swapin... I'm not sure exactly what this
regression is, but testing the posted patch is a good idea, and it
will tell us whether we just need to dynamically differentiate the
algorithm between large and small systems and start ignoring young
bits only at some point.
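
A rough userspace model of that cut-off idea, just to show the shape of
the policy. This is not kernel code and not a proposed patch: struct
toy_page, honor_young_bits() and scan_active() are hypothetical names,
and the cut-off value is an input because I have no number for X.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy page: only the state this sketch needs. */
struct toy_page {
	bool young;	/* models the young/referenced bit */
	bool active;	/* true: on the active list, false: deactivated */
};

/*
 * Hypothetical policy: young bits are only trusted while a full pass
 * over the active list completes within cutoff_secs.
 */
static bool honor_young_bits(double last_full_scan_secs, double cutoff_secs)
{
	return last_full_scan_secs <= cutoff_secs;
}

/*
 * One pass over the toy active list. Returns how many pages were
 * deactivated in this pass.
 */
static size_t scan_active(struct toy_page *pages, size_t n,
			  double last_full_scan_secs, double cutoff_secs)
{
	bool honor = honor_young_bits(last_full_scan_secs, cutoff_secs);
	size_t deactivated = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!pages[i].active)
			continue;

		if (honor && pages[i].young) {
			/* Small/fast case: clear the bit, keep it active. */
			pages[i].young = false;
			continue;
		}

		/*
		 * Either the page was not young, or the scan takes so
		 * long that the young bit is not consulted at all.
		 */
		pages[i].active = false;
		deactivated++;
	}

	return deactivated;
}
```

With a short scan time, young pages survive one more round; once the
scan time exceeds the cut-off, every page on the list is deactivated
regardless of its young bit, which is the behaviour large systems seem
to need.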