Re: [PATCH]: Clean up of __alloc_pages

From: Martin J. Bligh
Date: Tue Oct 04 2005 - 11:11:09 EST




--Ray Bryant <raybry@xxxxxxxxxxxxxxxxx> wrote (on Tuesday, October 04, 2005 11:26:52 -0500):

> On Tuesday 04 October 2005 08:27, Andi Kleen wrote:
>> Rohit Seth <rohit.seth@xxxxxxxxx> writes:
>> > I think conceptually this ask for a new flag __GFP_NODEONLY that
>> > indicate allocations to come from current node only.
>> >
>> > This definitely though means I will need to separate out the allocation
>> > from pcp patch (as Nick suggested earlier).
>>
>> This reminds me - the current logic is currently a bit suboptimal on
>> many NUMA systems. Often it would be better to be a bit more
>> aggressive at freeing memory (maybe do a very low overhead light try to
>> free pages) in the first node before falling back to other nodes. What
>> right now happens is that when you have even minor memory pressure
>> because e.g. you node is filled up with disk cache the local memory
>> affinity doesn't work too well anymore.
>>
>> -Andi
>>
> That's exactly what Martin Hick's additions to __alloc_pages() were trying to
> achieve. However, we've never figured out how to make the "very low
> overhead light try to free pages" thing work with low enough overhead that it
> can be left on all of the time. As soon as we make this the least bit more
> expensive, then this hurts those workloads (file servers being one example)
> who don't care about local, but who need the fastest possible allocations.
>
> This problem is often a showstopper on larger NUMA systems, at least for HPC
> type applications, where the inability to guarantee local storage allocation
> when it is requested can make the application run significantly slower.

Can we not do some migration / more targeted pressure balancing in kswapd?
Ie if we had a constant measure of per-node pressure, we could notice an
imbalance, and start migrating the least recently used pages from the node
under most pressure to the node under least ...

M.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/