Re: [PATCH 02/15] mm: sl[au]b: Add knowledge of PFMEMALLOC reservepages

From: Mel Gorman
Date: Wed Feb 08 2012 - 16:23:27 EST


On Wed, Feb 08, 2012 at 01:49:05PM -0600, Christoph Lameter wrote:
> On Wed, 8 Feb 2012, Mel Gorman wrote:
>
> > Ok, I looked into what is necessary to replace these with checking a page
> > flag and the cost shifts quite a bit and ends up being more expensive.
>
> That is only true if you go the slab route.

Well, yes but both slab and slub have to be supported. I see no reason
why I would choose to make this a slab-only or slub-only feature. Slob is
not supported because it's not expected that a platform using slob is also
going to use network-based swap.

> Slab suffers from not having
> the page struct pointer readily available. The changes are likely already
> impacting slab performance without the virt_to_page patch.
>

The performance impact only comes into play when swap is on a network
device and pfmemalloc reserves are in use. The rest of the time the check
on ac avoids all the cost and there is a micro-optimisation later to avoid
calling a function (patch 12).

> > In slub, it's sufficient to check kmem_cache_cpu to know whether the
> > objects in the list are pfmemalloc or not.
>
> We try to minimize the size of kmem_cache_cpu. The page pointer is readily
> available. We just removed the node field from kmem_cache_cpu because it
> was less expensive to get the node number from the struct page field.
>
> The same is certainly true for a PFMEMALLOC flag.
>

Ok, are you asking that I use the page flag for slub and leave kmem_cache_cpu
alone in the slub case? I can certainly check it out if that's what you
are asking for.

> > Yeah, you're right on the button there. I did my checking assuming that
> > PG_active+PG_slab were safe to use. The following is an untested patch that
> > I probably got details wrong in but it illustrates where virt_to_page()
> > starts cropping up.
>
> Yes you need to come up with a way to not use virt_to_page otherwise slab
> performance is significantly impacted.

I did come up with a way: the necessary information is in ac and slabp
on slab :/ . There are not exactly many ways that the information can
be recorded.

> On NUMA we are already doing a page struct lookup on free in slab.
> If you would save the page struct pointer
> there and reuse it then you would not have an issue at least on free.
>

That information is only available on NUMA and only when there is more than
one node. Having cache_free_alien return the page for passing to ac_put_obj()
would also be ugly. The biggest downfall by far is that single-node machines
incur the cost of virt_to_page() where they did not have to before. This
is not a solution and it is not better than the current simply check on
a struct field.

> You still would need to determine which "struct slab" pointer is in use
> which will also require similar lookups in varous places.
>
> Transfer of the pfmemalloc flags (guess you must have a pfmemalloc
> field in struct slab then) in slab is best be done when allocating and
> freeing a slab page from the page allocator.
>

The page->pfmemalloc is already been transferred to the slab in
cache_grow.

> I think its rather trivial to add the support you want in a non intrusive
> way to slub. Slab would require some more thought and discussion.
>

I'm slightly confused by this sentence. Support for slub is already in the
patch and as you say, it's fairly straight-forward. Supporting a page flag
and leaving kmem_cache_cpu alone may also be easier as kmem_cache_cpu->page
can be used instead of a kmem_cache_cpu->pfmemalloc field.

--
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/