Re: [PATCH]: Clean up of __alloc_pages

From: Ray Bryant
Date: Tue Oct 04 2005 - 11:05:41 EST


On Tuesday 04 October 2005 08:27, Andi Kleen wrote:
> Rohit Seth <rohit.seth@xxxxxxxxx> writes:
> > I think conceptually this asks for a new flag, __GFP_NODEONLY, that
> > indicates allocations must come from the current node only.
> >
> > This definitely means, though, that I will need to separate out the
> > allocation from the pcp patch (as Nick suggested earlier).
>
> This reminds me - the current logic is a bit suboptimal on many NUMA
> systems. Often it would be better to be a bit more aggressive at
> freeing memory (maybe do a very low overhead, light try to free pages)
> in the first node before falling back to other nodes. What happens
> right now is that when you have even minor memory pressure, because
> e.g. your node is filled up with disk cache, the local memory affinity
> doesn't work too well anymore.
>
> -Andi
>
That's exactly what Martin Hicks's additions to __alloc_pages() were trying to
achieve. However, we've never figured out how to make the "very low
overhead light try to free pages" thing work with low enough overhead that it
can be left on all of the time. As soon as we make this the least bit more
expensive, it hurts those workloads (file servers being one example)
that don't care about locality but need the fastest possible allocations.

This problem is often a showstopper on larger NUMA systems, at least for
HPC-type applications, where the inability to guarantee local storage
allocation when it is requested can make the application run significantly
slower.
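
To make the trade-off concrete, here is a rough userspace sketch of the policy
being discussed: try the local node, optionally do a bounded "light" reclaim
of clean cache there, and only then fall back to other nodes. This is not the
kernel's __alloc_pages(). The struct, the toy_* functions, the TOY_NODEONLY
flag (modelled on the proposed __GFP_NODEONLY) and the reclaim_budget
parameter are all hypothetical names invented for illustration.

```c
/* Toy model only: hypothetical types and names, not kernel code. */
struct toy_node {
    int free_pages;   /* pages that can be handed out immediately */
    int cache_pages;  /* clean page cache that is cheap to drop */
};

/* Hypothetical flag modelled on the proposed __GFP_NODEONLY. */
#define TOY_NODEONLY 0x1u

/* Drop at most `budget` clean cache pages on one node; returns pages freed. */
static int toy_light_reclaim(struct toy_node *n, int budget)
{
    int freed = n->cache_pages < budget ? n->cache_pages : budget;

    n->cache_pages -= freed;
    n->free_pages += freed;
    return freed;
}

/*
 * Allocate one page, preferring node `local`.
 * Returns the id of the node that satisfied the request, or -1 on failure.
 * A reclaim_budget of 0 skips the light reclaim step entirely.
 */
static int toy_alloc_page(struct toy_node nodes[], int nr_nodes, int local,
                          unsigned int flags, int reclaim_budget)
{
    int i;

    /* The extra cost is paid only when the local node is already empty. */
    if (nodes[local].free_pages == 0 && reclaim_budget > 0)
        toy_light_reclaim(&nodes[local], reclaim_budget);

    if (nodes[local].free_pages > 0) {
        nodes[local].free_pages--;
        return local;
    }

    /* The caller insisted on local memory: fail rather than go off-node. */
    if (flags & TOY_NODEONLY)
        return -1;

    /* Otherwise fall back to the first other node with free pages. */
    for (i = 0; i < nr_nodes; i++) {
        if (i != local && nodes[i].free_pages > 0) {
            nodes[i].free_pages--;
            return i;
        }
    }
    return -1;
}
```

With a budget of 0 this behaves like a plain fallback allocator, which is what
the file-server case wants. A nonzero budget keeps the allocation local at
the price of dropping cache, which is the cost that is hard to bound in the
real allocator.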
--
Ray Bryant
AMD Performance Labs Austin, Tx
512-602-0038 (o) 512-507-7807 (c)
