Re: regression caused by cgroups optimization in 3.17-rc2

From: Michal Hocko
Date: Wed Sep 10 2014 - 12:29:47 EST


On Fri 05-09-14 11:25:37, Michal Hocko wrote:
> On Thu 04-09-14 13:27:26, Dave Hansen wrote:
> > On 09/04/2014 07:27 AM, Michal Hocko wrote:
> > > Ouch. free_pages_and_swap_cache completely kills the uncharge batching
> > > because it reduces it to PAGEVEC_SIZE batches.
> > >
> > > I think we really do not need PAGEVEC_SIZE batching anymore. We are
> > > already batching on tlb_gather layer. That one is limited so I think
> > > the below should be safe but I have to think about this some more. There
> > > is a risk of prolonged lru_lock wait times but the number of pages is
> > > limited to 10k and the heavy work is done outside of the lock. If this
> > > is really a problem then we can tear LRU part and the actual
> > > freeing/uncharging into a separate functions in this path.
> > >
> > > Could you test with this half baked patch, please? I didn't get to test
> > > it myself unfortunately.
> >
> > 3.16 settled out at about 11.5M faults/sec before the regression. This
> > patch gets it back up to about 10.5M, which is good.
>
> Dave, would you be willing to test the following patch as well? I do not
> have a huge machine at hand right now. It would be great if you could

I was playing with 48CPU with 32G of RAM machine but the res_counter
lock didn't show up in the traces much (this was with 96 processes doing
mmap (256M private file, faul, unmap in parallel):
|--0.75%-- __res_counter_charge
| res_counter_charge
| try_charge
| mem_cgroup_try_charge
| |
| |--81.56%-- do_cow_fault
| | handle_mm_fault
| | __do_page_fault
| | do_page_fault
| | page_fault
[...]
| |
| --18.44%-- __add_to_page_cache_locked
| add_to_page_cache_lru
| mpage_readpages
| ext4_readpages
| __do_page_cache_readahead
| ondemand_readahead
| page_cache_async_readahead
| filemap_fault
| __do_fault
| do_cow_fault
| handle_mm_fault
| __do_page_fault
| do_page_fault
| page_fault

Nothing really changed in that regards when I reduced mmap size to 128M
and run with 4*CPUs.

I do not have a bigger machine to play with unfortunately. I think the
patch makes sense on its own. I would really appreciate if you could
give it a try on your machine with !root memcg case to see how much it
helped. I would expect similar results to your previous testing without
the revert and Johannes' patch.

[...]
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/