Re: [RFC PATCH 00/19] Cleanup and optimise the page allocator V2

From: Nick Piggin
Date: Mon Mar 02 2009 - 06:39:53 EST


On Mon, Mar 02, 2009 at 11:21:22AM +0000, Mel Gorman wrote:
> (Added Ingo as a second scheduler guy as there are queries on tg_shares_up)
>
> On Fri, Feb 27, 2009 at 04:44:43PM +0800, Lin Ming wrote:
> > On Thu, 2009-02-26 at 19:22 +0800, Mel Gorman wrote:
> > > In that case, Lin, could I also get the profiles for UDP-U-4K please so I
> > > can see how time is being spent and why it might have gotten worse?
> >
> > I have done the profiling (oltp and UDP-U-4K) with and without your v2
> > patches applied to 2.6.29-rc6.
> > I also enabled CONFIG_DEBUG_INFO so you can translate addresses to source
> > lines with addr2line.
> >
> > You can download the oprofile data and vmlinux from below link,
> > http://www.filefactory.com/file/af2330b/
> >
>
> Perfect, thanks a lot for profiling this. It is a big help in figuring out
> how the allocator is actually being used for your workloads.
>
> The OLTP results had the following things to say about the page allocator.

Is this OLTP, or UDP-U-4K?


> Samples in the free path
>   vanilla: 6207
>   mg-v2:   4911
> Samples in the allocation path
>   vanilla: 19948
>   mg-v2:   14238
>
> This is based on glancing at the following graphs and not counting the VM
> counters as it can't be determined which samples are due to the allocator
> and which are due to the rest of the VM accounting.
>
> http://www.csn.ul.ie/~mel/postings/lin-20090228/free_pages-vanilla-oltp.png
> http://www.csn.ul.ie/~mel/postings/lin-20090228/free_pages-mgv2-oltp.png
>
> So the path costs are reduced in both cases. Whatever caused the regression
> there doesn't appear to be time spent in the allocator, but something else
> I haven't thought of yet. Other oddities:
>
> o According to the profile, something like 45% of the time is spent
> entering the __alloc_pages_nodemask() function. Function entry has a
> cost, but not that much. Another significant part appears to be spent
> checking a simple mask. That doesn't make much sense to me, so I don't
> know what to do with that information yet.
>
> o In get_page_from_freelist(), 9% of the time is spent deleting a page
> from the freelist.
>
> Neither of these makes sense; we're not spending time where I would expect
> at all. One of two things is happening. Either something like cache misses
> or cacheline bounces is dominating for some reason specific to this machine
> (cache misses are one possibility that I'll check out), or the sample rate
> is too low and the profile counts are therefore misleading.
>
> Question 1: Would it be possible to increase the sample rate and track cache
> misses as well please?

If the events are consistently biased, I don't think a higher sample rate
will help. I don't know exactly how the internals of the profiling counters
work, but I would expect cache misses, and stalls on any number of
different resources, to put results in funny places.

Intel's OLTP workload is very sensitive to the cacheline footprint of the
kernel, and if you touch some extra cachelines at point A, it can just
result in profile hits getting distributed all over the place. Profiling
cache misses might help, but you will probably see a similar phenomenon there.

I can't remember: does your latest patchset include any patches that change
the possible order in which pages move around? Or is it made up only of
straight-line performance improvements to the existing implementation?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/