Re: [RFC PATCH 0/3] Add some trace events for the page allocator

From: Frederic Weisbecker
Date: Tue Jul 28 2009 - 18:48:29 EST


On Tue, Jul 28, 2009 at 11:23:36PM +0100, Mel Gorman wrote:
> The following three patches add some trace events for the page allocator under
> the heading of kmem (should there be a pagealloc heading instead?). Testing
> under qemu seems to show reasonable results but this is a prototype for
> comment that hasn't been very heavily tested. I was able to find at least
> one anomaly looking at the output in relation to anti-fragmentation which
> I'm still thinking about, so minimally it was useful for that, but I've made
> an attempt to justify each of the events added.
>
> The patches are as follows
>
> Patch 1 adds events for plain old allocate and freeing of pages
> Patch 2 gives information useful for analysing fragmentation avoidance
> Patch 3 tracks pages going to and from the buddy lists as an indirect
> indication of zone lock hotness
>
> The first one could be used as an indicator of whether the workload was
> heavily dependent on the page allocator or not. You can make a guess based
> on vmstat but you can't get a per-process breakdown. I did have trouble with
> the call-site portion of the allocation. Depending on the path, you might
> just get the address of __get_free_pages() instead of a useful callsite. I
> didn't see a nice way to always report a "useful" call_site.
>
> The second patch would mainly be useful for users of hugepages, and
> particularly dynamic hugepage pool resizing, as it could be used to tune
> min_free_kbytes to a level where fragmentation was rarely a problem. My
> main concern is that maybe I'm trying to jam too much into the TP_printk
> that could be extrapolated after the fact if you were familiar with the
> implementation. I couldn't decide whether it was best to hold the
> administrator's hand even at the extra cost of figuring it out.
>
> The last patch is trickier to draw conclusions from but high activity on
> those events could explain why there were a large number of cache misses
> on a page-allocator-intensive workload. The coalescing and splitting of
> buddies involves a lot of writing of page metadata and cache line bounces
> not to mention the acquisition of an interrupt-safe lock necessary to enter
> this path. One problem is that one of the traced functions is likely to
> change its name in the future. When that happens, the trace event will be
> replaced with something similar, but not identical. I've been told this is probably
> ok but there has been whinging in the past about whether debugfs represents
> an ABI or not.
>
> This is the first time I've looked at adding trace events so apologies
> for any obvious mistakes made as I haven't been keeping a close eye on all
> the tracing discussions describing How Things Should Be Done. checkpatch
> throws major wobblies about this patchset, but it's consistent with the
> style of other events so I ignored it. The "To:" list is taken from
> another tracepoint mail; if there is a specific list I should have used,
> feel free to slap me with the clue stick. All comments indicating whether this is
> generally useful and how it might be improved are welcome.
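For anyone wanting to try the events, they would presumably be toggled through the usual debugfs tracing interface along these lines. This is a sketch: the exact event names under the kmem heading (mm_page_alloc, mm_page_free here) are assumptions, not confirmed by the cover letter, and the debugfs mount point may differ on your system.

```shell
# Mount debugfs if it is not already mounted (path may vary by distro).
mount -t debugfs nodev /sys/kernel/debug

# Enable the page allocator events under the proposed kmem heading
# (event names are assumed for illustration).
echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_free/enable

# Read back the trace; each record carries the PID of the caller,
# which is what gives the per-process breakdown vmstat cannot.
cat /sys/kernel/debug/tracing/trace
```

Since this only writes kernel configuration knobs under debugfs, it needs root and a kernel with these events compiled in; it is not runnable as-is elsewhere.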


(Adding some other tracing + slab allocator/kmemtrace people in Cc)
