Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

From: Christoph Lameter
Date: Wed Aug 15 2007 - 16:30:24 EST


On Wed, 15 Aug 2007, Peter Zijlstra wrote:

> Christoph's suggestion to set min_free_kbytes to 20% is ridiculous - nor
> does it solve all deadlocks :-(

Only if min_free_kbytes is really the minimum number of free pages and not
the minimum number of clean pages as I suggested.

All deadlocks? There are numerous ones that can come about for different
reasons. Which ones are we talking about?

> RX
> - we basically need infinite memory to receive the network reply
> to complete writeout. Consider the following scenario:

There is no infinite memory. At some point you need to bound the amount
of memory that the network allocates.

> - so we need a threshold of some sort to start tossing non-critical
> network packets away. (because the consumer of these packets may be
> the one swapping and is therefore frozen)

Right.
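
A minimal sketch of such a bound plus drop threshold, in plain userspace C
rather than kernel code. The struct, its fields and rx_accept() are
hypothetical names for illustration, not anything in the network stack:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting for memory held by the receive path. */
struct rx_budget {
	size_t limit;      /* hard bound on bytes the RX path may hold */
	size_t threshold;  /* above this, only critical packets are kept */
	size_t used;       /* bytes currently held; invariant: used <= limit */
};

/*
 * Decide whether a packet of len bytes may be queued.
 * Nothing is accepted beyond the hard limit. Between threshold and
 * limit, only packets marked critical (e.g. replies needed to complete
 * writeout) are accepted; everything else is dropped.
 */
static bool rx_accept(struct rx_budget *b, size_t len, bool critical)
{
	if (len > b->limit - b->used)
		return false;	/* would exceed the bound */
	if (!critical && b->used + len > b->threshold)
		return false;	/* non-critical traffic is tossed */
	b->used += len;
	return true;
}
```

How a packet gets classified as critical is the hard part and is not
addressed by this sketch.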

> <> What Christoph is proposing is doing recursive reclaim and not
> initiating writeout. This will only work _IFF_ there are clean pages
> about. Which in the general case need not be true (memory might be
> packed with anonymous pages - consider an MPI cluster doing computation
> stuff). So this gets us a workload dependent solution - which IMHO is
> bad!

In the general case this is true even for an MPI job because the MPI job
needs to have executable code and libraries in memory. At minimum these
are reclaimable.

> Also his suggestion to crank up min_free_kbytes to 20% of machine memory
> is not workable (again imagine this MPI cluster losing 20% of its
> collective memory, very much out of the question).

It is workable. If you crank the min_clean_pages (this is essentially
what it is) up to 20% then you basically reserve 20% of your memory for
executable pages and page cache pages. And in an emergency these can be
reclaimed to resolve any OOM issues. Note that my patch only accesses
these reserves when we would otherwise OOM. This is rare.
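
Roughly, the idea can be modelled like this. It is a userspace sketch
under simplifying assumptions, not the patch itself: struct mem_state,
min_clean_pages() and emergency_reclaim() are hypothetical names, and
real reclaim works per zone and per page rather than on counters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of memory state; not a kernel structure. */
struct mem_state {
	size_t total_pages;  /* all pages in the machine */
	size_t free_pages;   /* pages currently free */
	size_t clean_pages;  /* clean page cache and executable pages */
};

/* Size of the clean page reserve as a percentage of total memory. */
static size_t min_clean_pages(const struct mem_state *m, unsigned int percent)
{
	/* Split the division to avoid overflow on large page counts. */
	return m->total_pages / 100 * percent +
	       m->total_pages % 100 * percent / 100;
}

/*
 * Try to make request pages available. The clean pages are touched only
 * when free memory alone cannot satisfy the request, i.e. when we would
 * otherwise OOM. Clean pages can be dropped without any writeout.
 */
static bool emergency_reclaim(struct mem_state *m, size_t request)
{
	size_t needed;

	if (m->free_pages >= request)
		return true;		/* normal path: reserve untouched */

	needed = request - m->free_pages;
	if (needed > m->clean_pages)
		return false;		/* not enough clean pages: still OOM */

	m->clean_pages -= needed;	/* discard clean pages, no I/O */
	m->free_pages += needed;
	return true;
}
```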

> Nor does that solve the TCP deadlock, you need some additional condition
> to break that.

But that is an issue that is better handled in the network stack.