Re: 2.4.23aa2 (bugfixes and important VM improvements for the highend)

From: Rik van Riel
Date: Thu Feb 26 2004 - 23:45:09 EST


On Fri, 27 Feb 2004, Andrea Arcangeli wrote:

> becomes freeable/swappable. Some lib function in the patch is taken
> from the objrmap patch for 2.6 in the mbligh tree implemented by IBM
> (thanks to Martin and IBM for maintaining that patch up to date for 2.6,
> that is a must-have starting point for the 2.6 VM too). The original
> idea of using objrmap for the vm unmapping procedure is from David
> Miller (objrmap itself has always existed in every linux kernel out

Good to hear that you're finally convinced that some form
of reverse mapping is needed.

I agree with you that object-based rmap may well be better
for 2.6. If you want to look into that, I wouldn't mind at
all. Especially if we can keep akpm's and Nick's nice VM
balancing intact ...

> The rest is infinite swapping and machine total hung, since those 4.8G
> of swapcache are now freeable and dirty, and the kernel will not
> notice the freeable and clean swapcache generated by this 64M
> swapout, since it's being queued at the opposite side of the lru

An obvious solution for this is the O(1) VM stuff that Arjan
wrote and I integrated into rmap 15. Should be worth looking
into this for the 2.6 kernel ...

Basically it keeps the just-written pages near the end of the
LRU, so they're easily found and freed, before the kernel even
starts thinking about submitting the other gigabytes of dirty
data for writeout.
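
To make the idea concrete, here is a minimal toy sketch in C. It is
not the rmap 15 or O(1) VM code; every name in it (toy_page, toy_lru,
toy_end_writeback, toy_reclaim_one) is made up for illustration. It
only shows the ordering trick: a page whose writeout has finished is
requeued at the cold end of the list, so reclaim finds it without
walking past the remaining dirty pages.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; these types and helpers are hypothetical, not kernel APIs. */
struct toy_page {
    struct toy_page *prev, *next;
    int dirty;                  /* 1 = must be written before it can be freed */
};

struct toy_lru {
    struct toy_page head;       /* sentinel: head.next = hot end, head.prev = cold end */
};

static void lru_init(struct toy_lru *lru)
{
    lru->head.prev = lru->head.next = &lru->head;
}

static void lru_unlink(struct toy_page *p)
{
    p->prev->next = p->next;
    p->next->prev = p->prev;
}

/* New pages enter at the hot end, far from reclaim. */
static void lru_add_hot(struct toy_lru *lru, struct toy_page *p)
{
    p->next = lru->head.next;
    p->prev = &lru->head;
    lru->head.next->prev = p;
    lru->head.next = p;
}

static void lru_add_cold(struct toy_lru *lru, struct toy_page *p)
{
    p->prev = lru->head.prev;
    p->next = &lru->head;
    lru->head.prev->next = p;
    lru->head.prev = p;
}

/* Writeout completed: mark the page clean and move it to the cold end. */
static void toy_end_writeback(struct toy_lru *lru, struct toy_page *p)
{
    assert(p != &lru->head);
    p->dirty = 0;
    lru_unlink(p);
    lru_add_cold(lru, p);
}

/* Reclaim looks only at the cold end; returns the freed page, or NULL
 * if the list is empty or the coldest page is still dirty. */
static struct toy_page *toy_reclaim_one(struct toy_lru *lru)
{
    struct toy_page *p = lru->head.prev;

    if (p == &lru->head || p->dirty)
        return NULL;
    lru_unlink(p);
    return p;
}
```

In this model reclaim never has to search: the just-cleaned page is
always the first candidate, which is the property described above.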

> efficient swapping because it was taking a long time before the
> vm could notice the clean swapcache, after it started the I/O on
> it.

... Arjan's O(1) VM stuff ;)

> It was pretty clear after that, that we've to prioritize and to
> prefer discarding memory that is zerocost to collect, than to do
> extremely expensive things to release free memory instead.

I'm not convinced. If we need to free up 10MB of memory, we just
shouldn't do much more than 10MB of IO. Doing just that should be
cheap enough, after all.

The problem is when you do two orders of magnitude more writes than
the amount of memory you need to free. Trying to do zero IO probably
isn't quite needed ...
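
As a rough illustration of that budget, here is a toy sketch in C. It
is an assumption-laden model, not code from any kernel tree: the page
states, toy_result and toy_shrink are invented names, and the list is
just an array scanned from the cold end. The policy it shows is "free
clean pages as you find them, but never queue more writes than the
number of pages you were asked to free".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states for a toy model; not kernel definitions. */
enum toy_state { TOY_CLEAN, TOY_DIRTY, TOY_WRITEBACK, TOY_FREED };

struct toy_result {
    size_t freed;   /* clean pages released in this pass */
    size_t writes;  /* dirty pages submitted for writeout */
};

/* Scan pages[0..n) from the cold end. Stop once nr_to_free pages are
 * freed, and cap the number of writes started at nr_to_free as well. */
static struct toy_result toy_shrink(enum toy_state *pages, size_t n,
                                    size_t nr_to_free)
{
    struct toy_result r = { 0, 0 };

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n && r.freed < nr_to_free; i++) {
        if (pages[i] == TOY_CLEAN) {
            pages[i] = TOY_FREED;
            r.freed++;
        } else if (pages[i] == TOY_DIRTY && r.writes < nr_to_free) {
            pages[i] = TOY_WRITEBACK;
            r.writes++;
        }
        /* Dirty pages beyond the write budget are skipped untouched. */
    }
    return r;
}
```

With a budget like this, freeing 10MB costs at most about 10MB of
writes per pass, no matter how many gigabytes of dirty pages sit on
the list.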

> vm is an order of magnitude worse in the high end. So the fix I
> implemented is to run an inactive_list/vm_cache_scan_ratio pass
> on the clean immediately freeable cache in the inactive list

Should work ok for a while, until you completely run out of
clean pages and then you might run into a wall ... unless you
implement smarter cleaning & freeing like Arjan's stuff does.

Then again, your stuff will also find pages the moment they're
cleaned, just at the cost of a (little?) bit more CPU time.
Shouldn't be too critical, unless you've got more than maybe
a hundred GB of memory, which should be a year off.
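
For reference, the clean-only pass under discussion can be modelled
roughly like this. This is a toy sketch in C under my own assumptions,
not the 2.4.23aa2 code: toy_clean_pass and the integer page encoding
are hypothetical, and only the "scan inactive_list/vm_cache_scan_ratio
entries, free what is clean, start no IO" shape is taken from the
description quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy encoding, purely illustrative: 1 = dirty, 0 = clean, -1 = freed. */

/* Scan at most nr_inactive / scan_ratio entries from the cold end of
 * the inactive list, freeing clean pages only. No writeout is started.
 * Returns the number of pages freed. */
static size_t toy_clean_pass(int *inactive, size_t nr_inactive,
                             size_t scan_ratio, size_t nr_to_free)
{
    assert(scan_ratio > 0);

    size_t max_scan = nr_inactive / scan_ratio;
    size_t freed = 0;

    for (size_t i = 0; i < max_scan && freed < nr_to_free; i++) {
        if (inactive[i] == 0) {
            inactive[i] = -1;
            freed++;
        }
    }
    return freed;
}
```

The CPU cost is the scan itself: it grows with the size of the
inactive list divided by the ratio, which is why it only starts to
matter on very large machines.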

> A better fix would be to have an anchor in the lru (can be a per-lru
> page_t with a PG_anchor set) and to avoid the clean-cache search to
> alter the point where we keep swapping with writepage, but it
> shouldn't matter that much and 2.4 being obsolete isn't very
> worthwhile to make it even better.

Hey, that's Arjan's stuff ;) Want to help get that into 2.6 ? ;)

cheers,

Rik
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
