Re: [RFC][PATCH 0/8] Critical Page Pool

From: Matthew Dobson
Date: Mon Nov 21 2005 - 00:38:18 EST

Next message: Matthew Dobson: "Re: [RFC][PATCH 0/8] Critical Page Pool"
Previous message: David S. Miller: "Re: [2.6 patch] mark virt_to_bus/bus_to_virt as __deprecated oni386"
In reply to: Paul Jackson: "Re: [RFC][PATCH 0/8] Critical Page Pool"
Next in thread: Matthew Dobson: "[RFC][PATCH 7/8] __cache_grow()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Appologies for the delay in responding to comments, but I have been en
route to the East Coast of the US to visit family.

Avi Kivity wrote:
> Matthew Dobson wrote:
>
>> Avi Kivity wrote:
>>
>>
>>> 1. If you have two subsystems which allocate critical pages, how do you
>>> protect against the condition where one subsystem allocates all the
>>> critical memory, causing the second to oom?
>>>
>>
>>
>> You don't. You make sure that you size the critical pool
>> appropriately for
>> your workload.
>>
>>
>>
> This may not be possible. What if subsystem A depends on subsystem B to
> do its work, both are critical, and subsystem A allocated all the memory
> reserve?
> If A and B have different allocation thresholds, the deadlock is avoided.
>
> At the very least you need a critical pool per subsystem.

As Paul suggested in his follow up to your mail, to even attempt a
"guarantee" that you won't still run out of memory, your subsystem does
need an upper bound on how much memory it could possibly need. If there is
NO upper limit, then the possibility of exhausting your critical pool is
very real.

>>> 2. There already exists a critical pool: ordinary allocations fail if
>>> free memory is below some limit, but special processes (kswapd) can
>>> allocate that memory by setting PF_MEMALLOC. Perhaps this should be
>>> extended, possibly with a per-process threshold.
>>>
>>
>>
>> The exception for threads with PF_MEMALLOC set is there because those
>> threads are essentially promising that if the kernel gives them memory,
>> they will use that memory to free up MORE memory. If we ignore that
>> promise, and (ab)use the PF_MEMALLOC flag to simply bypass the
>> zone_watermarks, we'll simply OOM faster, and potentially in situations
>> that could be avoided (ie: we steal memory that kswapd could have used to
>> free up more memory).
>>
>>
> Sure, but that's just an example of a critical subsystem.
>
> If we introduce yet another mechanism for critical memory allocation,
> we'll have a hard time making different subsystems, which use different
> critical allocation mechanisms, play well together.
>
> I propose that instead of a single watermark, there should be a
> watermark per critical subsystem. The watermarks would be arranged
> according to the dependency graph, with the depended-on services allowed
> to go the deepest into the reserves.
>
> (instead of PF_MEMALLOC have a tsk->memory_allocation_threshold, or
> similar. set it to 0 for kswapd, and for other systems according to taste)

Your idea is certainly an interesting approach to solving the problem. I'm
not sure it quite does what I'm looking for, but I'll have to think about
your idea some more to be sure. One problem is that networking doesn't
have a specific "task" associated with it, where we could set a
memory_allocation_threshold.

Thanks!

-Matt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Matthew Dobson: "Re: [RFC][PATCH 0/8] Critical Page Pool"
Previous message: David S. Miller: "Re: [2.6 patch] mark virt_to_bus/bus_to_virt as __deprecated oni386"
In reply to: Paul Jackson: "Re: [RFC][PATCH 0/8] Critical Page Pool"
Next in thread: Matthew Dobson: "[RFC][PATCH 7/8] __cache_grow()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]