[RFC PATCH 0/3] Add some trace events for the page allocator

From: Mel Gorman
Date: Tue Jul 28 2009 - 18:24:13 EST


The following three patches add some trace events for the page allocator under
the heading of kmem (should there be a pagealloc heading instead?). Testing
under qemu shows reasonable results, but this is a prototype for comment
that hasn't been very heavily tested. Looking at the output, I was able to
find at least one anomaly in relation to anti-fragmentation which I'm still
thinking about, so at a minimum the events were useful for that. I've also
made an attempt to justify each of the events added.

The patches are as follows

Patch 1 adds events for plain old allocation and freeing of pages
Patch 2 gives information useful for analysing fragmentation avoidance
Patch 3 tracks pages going to and from the buddy lists as an indirect
indication of zone lock hotness

The first one could be used as an indicator as to whether the workload was
heavily dependent on the page allocator or not. You can make a guess based
on vmstat but you can't get a per-process breakdown. I did have trouble with
the call-site portion of the allocation. Depending on the path, you might
just get the address of __get_free_pages() instead of a useful callsite. I
didn't see a nice way to always report a "useful" call_site.
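
As a rough illustration of the per-process breakdown mentioned above, the
sketch below counts allocation events per task from ftrace-style text output.
The event name mm_page_alloc, the "order=0" field and the sample lines are
assumptions made up for the example, not output captured from these patches;
only the general "task-pid [cpu] timestamp: event:" line layout is relied on.

```python
import re
from collections import Counter

# Assumed ftrace-style layout: "<task>-<pid> [<cpu>] <timestamp>: <event>: <fields>"
LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):"
)


def allocs_per_task(lines, event="mm_page_alloc"):
    """Count occurrences of `event` per (task name, pid).

    The default event name is a placeholder; substitute whatever name
    the allocation event actually has on the running kernel.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("event") == event:
            counts[(m.group("task"), int(m.group("pid")))] += 1
    return counts


# Hypothetical sample lines, invented for illustration only.
SAMPLE = [
    "            bash-1234  [000]   100.000001: mm_page_alloc: order=0",
    "            bash-1234  [000]   100.000105: mm_page_alloc: order=0",
    "            make-99    [001]   100.000230: mm_page_alloc: order=0",
    "            bash-1234  [000]   100.000301: mm_page_free_direct: order=0",
    "# comment and header lines are skipped because they do not match",
]

if __name__ == "__main__":
    for (task, pid), n in allocs_per_task(SAMPLE).most_common():
        print(f"{task}-{pid}: {n} allocations")
```

Feeding it the contents of the trace file instead of SAMPLE gives a quick
ranking of which tasks lean hardest on the page allocator, which is the part
vmstat cannot provide.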

The second patch would mainly be useful for users of hugepages, and
particularly dynamic hugepage pool resizing, as it could be used to tune
min_free_kbytes to a level where fragmentation is rarely a problem. My
main concern is that maybe I'm trying to jam too much into the TP_printk
that could be extrapolated after the fact if you were familiar with the
implementation. I couldn't decide whether it is better to hold the hand of
the administrator even if it costs more to work the values out.

The last patch is trickier to draw conclusions from but high activity on
those events could explain why there were a large number of cache misses
on a page-allocator-intensive workload. The coalescing and splitting of
buddies involves a lot of writing of page metadata and cache line bouncing,
not to mention the acquisition of an interrupt-safe lock necessary to enter
this path. One problem is that one function traced is likely to change its
name in the future. When that happens, the trace event will be replaced
with something similar, but not identical. I've been told this is probably
ok but there has been whinging in the past about whether debugfs represents
an ABI or not.

This is the first time I've looked at adding trace events so apologies
for any obvious mistakes made as I haven't been keeping a close eye on all
the tracing discussions describing How Things Should Be Done. checkpatch
throws major wobblies about this patchset, but it's consistent with the
style of other events so I ignored it. The "To:" list is taken from
another tracepoint mail; if there is a specific list I should have used,
feel free to slap me with a clue stick. All comments indicating whether this is
generally useful and how it might be improved are welcome.