Re: regression caused by cgroups optimization in 3.17-rc2

From: Johannes Weiner
Date: Tue Sep 02 2014 - 18:18:40 EST


Hi Dave,

On Tue, Sep 02, 2014 at 12:05:41PM -0700, Dave Hansen wrote:
> I'm seeing a pretty large regression in 3.17-rc2 vs 3.16 coming from the
> memory cgroups code. This is on a kernel with cgroups enabled at
> compile time, but not _used_ for anything. See the green lines in the
> graph:
>
> https://www.sr71.net/~dave/intel/regression-from-05b843012.png
>
> The workload is a little parallel microbenchmark doing page faults:

Ouch.

> https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault2.c
>
> The hardware is an 8-socket Westmere box with 160 hardware threads. For
> some reason, this does not affect the version of the microbenchmark
> which is doing completely anonymous page faults.
>
> I bisected it down to this commit:
>
> > commit 05b8430123359886ef6a4146fba384e30d771b3f
> > Author: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Date: Wed Aug 6 16:05:59 2014 -0700
> >
> > mm: memcontrol: use root_mem_cgroup res_counter
> >
> > Due to an old optimization to keep expensive res_counter changes at a
> > minimum, the root_mem_cgroup res_counter is never charged; there is no
> > limit at that level anyway, and any statistics can be generated on
> > demand by summing up the counters of all other cgroups.
> >
> > However, with per-cpu charge caches, res_counter operations do not even
> > show up in profiles anymore, so this optimization is no longer
> > necessary.
> >
> > Remove it to simplify the code.

Accounting new pages is buffered through per-cpu charge caches, but
uncharging them on free is not, so I'm guessing that above a certain
allocation rate the cost of locking and updating the shared counters
dominates. Is there a chance you could profile this to see if locks
and res_counter-related operations show up?

I can't reproduce this complete breakdown on my smaller test gear, but
I do see an improvement with the following patch:

---