Re: [PATCH 0/9] Per-cgroup /proc/stat

From: Paul Turner
Date: Mon Sep 19 2011 - 19:07:39 EST


On 09/15/11 01:56, Peter Zijlstra wrote:
> On Wed, 2011-09-14 at 13:23 -0700, Andi Kleen wrote:
>> Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> writes:
>>
>>> Guys, we should seriously trim back a lot of that code, not grow ever
>>> more and more. The sad fact is that if you build a kernel with
>>> cpu-cgroup support, the context switch cost is more than double that of
>>> a kernel without, and then you haven't even started creating cgroups
>>> yet.
>>
>> That sounds quite bad indeed. Is it known why it is so costly?
>
> Mostly because all the data structures grow and all the code paths grow,
> some by quite a bit; it's spread all over the place, lots of little cuts,
> etc.
>
> pjt and I tried trimming some of the code paths with static_branch() but
> didn't really get anywhere; need to get back to looking at this stuff
> sometime soon.

When I get some time I think I'm just going to post a patch[*] that merges the useful _fields_ (usage, usage_percpu) from cpuacct into cpu; since we are *already* doing the accounting at the entity level, this addition is free.

At that point we could make !CONFIG_CGROUP_CPUACCT the default and deprecate the beast without breaking ABI for those who really need it (either because their applications have hard-coded paths or because they really like cgroup user/sys time -- which we COULD duplicate into cpu, but I'm inclined not to).
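Concretely, the duplication is just wiring the cpuacct read handlers into the
cpu controller's file table; a sketch, where cpu_usage_read() and
cpu_usage_percpu_read() are hypothetical handlers summing the per-entity
runtimes we already track:

static struct cftype cpu_extra_files[] = {
        {
                .name           = "usage",
                .read_u64       = cpu_usage_read,               /* hypothetical */
        },
        {
                .name           = "usage_percpu",
                .read_seq_string = cpu_usage_percpu_read,       /* hypothetical */
        },
};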

[*]: the only real caveat is how loudly people scream about the code duplication; I think it's worth it if it lets us kill cpuacct in the long run.

Another, unrelated optimization on this path that I have sitting around in patches/ to push at some point is keeping the left-most entity out of the tree: the worst case is that an entity with a lower vruntime comes along and we have to insert the previous left-most, and the best case is that we get to pick it without futzing with the rb-tree at all. I think this was good for a percent or two when I hacked it together before; a rough sketch of the idea follows.
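Roughly (cfs_rq->cached_leftmost is a hypothetical field; the real patch also
has to keep min_vruntime and the enqueue/dequeue accounting straight):

/* Sketch: cache the left-most entity outside the rb-tree so the
 * common pick doesn't touch the tree at all.
 */
static void enqueue_entity_cached(struct cfs_rq *cfs_rq,
                                  struct sched_entity *se)
{
        struct sched_entity *left = cfs_rq->cached_leftmost;

        if (!left) {
                cfs_rq->cached_leftmost = se;
                return;
        }

        if (entity_before(se, left)) {
                /* Worst case: demote the old left-most into the tree. */
                __enqueue_entity(cfs_rq, left);
                cfs_rq->cached_leftmost = se;
        } else {
                __enqueue_entity(cfs_rq, se);
        }
}

static struct sched_entity *pick_next_cached(struct cfs_rq *cfs_rq)
{
        /* Best case: no rb-tree manipulation at all. */
        struct sched_entity *se = cfs_rq->cached_leftmost;

        cfs_rq->cached_leftmost = NULL;    /* refilled from the tree on demand */
        return se;
}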

Another idea I have kicking around for this path is the introduction of a link_entity which bridges over nr_running=1 chains (breaking the link opportunistically when an element in the chain goes to nr_running=2). This one requires some pretty careful accounting around the breaking of a chain, though, so I'm not touching it until I get the new load tracking code out. (Incidentally, when I benchmarked it before LPC it worked out to be a little more efficient than the current math, good for ~2-3% on pipe_test.) A very rough sketch of the pick side is below.
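The pick side collapses to following the link instead of descending level by
level (se->chain_link is a hypothetical field maintained at enqueue time; all
the subtlety is in keeping the accounting correct when a chain breaks):

static struct task_struct *pick_next_task_fair_linked(struct rq *rq)
{
        struct cfs_rq *cfs_rq = &rq->cfs;
        struct sched_entity *se;

        do {
                se = pick_next_entity(cfs_rq);
                if (se->chain_link) {
                        /* Whole sub-hierarchy is a nr_running == 1 chain:
                         * jump straight to the bottom task's entity.
                         */
                        se = se->chain_link;
                        break;
                }
                cfs_rq = group_cfs_rq(se);
        } while (cfs_rq);

        return task_of(se);
}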

- Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/