Re: regression caused by cgroups optimization in 3.17-rc2

From: Johannes Weiner
Date: Thu Sep 04 2014 - 11:09:09 EST


On Tue, Sep 02, 2014 at 05:30:38PM -0700, Dave Hansen wrote:
> On 09/02/2014 05:10 PM, Johannes Weiner wrote:
> > On Tue, Sep 02, 2014 at 03:36:29PM -0700, Dave Hansen wrote:
> >> On 09/02/2014 03:18 PM, Johannes Weiner wrote:
> >>> Accounting new pages is buffered through per-cpu caches, but taking
> >>> them off the counters on free is not, so I'm guessing that above a
> >>> certain allocation rate the cost of locking and changing the counters
> >>> takes over. Is there a chance you could profile this to see if locks
> >>> and res_counter-related operations show up?
> >>
> >> It looks pretty much the same, although it might have equalized the
> >> charge and uncharge sides a bit. Full 'perf top' output attached.
> >
> > That looks like a partial profile. Where did the page allocator, page
> > zeroing, etc. go? Because the distribution among these listed symbols
> > doesn't seem all that crazy:
>
> Perf was only outputting the top 20 functions. Believe it or not, page
> zeroing and the rest of the allocator path weren't even among the top
> 20 functions because there is so much lock contention.
>
> Here's a longer run of 'perf top' along with the top 100 functions:
>
> http://www.sr71.net/~dave/intel/perf-top-1409702817.txt.gz
>
> You can at least see copy_page_rep in there.
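
For readers following the thread, the asymmetry described in the quoted
text above (charges buffered through per-cpu caches, uncharges not) can
be pictured with a small single-threaded toy model in C. This is only a
sketch under stated assumptions: the names toy_counter, toy_stock,
toy_charge and toy_uncharge are hypothetical, BATCH is an illustrative
batch size, and the lock_acquisitions field merely counts how often a
real implementation would have to take the shared counter's lock. It is
not the kernel's res_counter or memcg code.

```c
#include <assert.h>

#define BATCH 32UL  /* assumed batch size, for illustration only */

/* Toy shared counter; lock_acquisitions stands in for taking its lock. */
struct toy_counter {
	unsigned long usage;
	unsigned long lock_acquisitions;
};

/* Toy per-cpu stock: pages already charged to the shared counter. */
struct toy_stock {
	unsigned long cached;
};

static void counter_add(struct toy_counter *c, unsigned long nr)
{
	c->lock_acquisitions++;  /* a real counter would lock here */
	c->usage += nr;
}

static void counter_sub(struct toy_counter *c, unsigned long nr)
{
	assert(c->usage >= nr);
	c->lock_acquisitions++;  /* a real counter would lock here */
	c->usage -= nr;
}

/* Charge one page: use the local stock, refill it in one batch if empty. */
static void toy_charge(struct toy_counter *c, struct toy_stock *s)
{
	if (s->cached == 0) {
		counter_add(c, BATCH);
		s->cached = BATCH;
	}
	s->cached--;
}

/* Uncharge one page: unbuffered, touches the shared counter every time. */
static void toy_uncharge(struct toy_counter *c)
{
	counter_sub(c, 1);
}

/* Shared-counter accesses needed to charge nr_pages from one cpu. */
static unsigned long count_charge_locks(unsigned long nr_pages)
{
	struct toy_counter c = { 0, 0 };
	struct toy_stock s = { 0 };
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		toy_charge(&c, &s);
	return c.lock_acquisitions;
}

/* Shared-counter accesses needed to uncharge nr_pages one at a time. */
static unsigned long count_uncharge_locks(unsigned long nr_pages)
{
	struct toy_counter c = { nr_pages, 0 };
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		toy_uncharge(&c);
	return c.lock_acquisitions;
}
```

Under these assumptions, charging 1024 pages touches the shared counter
32 times, while uncharging the same 1024 pages touches it 1024 times.
That is the kind of imbalance that would show up as lock contention once
the free rate gets high enough.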

Thanks for the clarification. That is truly horrible. Does the
following revert restore performance in your case?

---