[patch 0/9] mm: memcontrol: naturalize charge lifetime

From: Johannes Weiner
Date: Wed Apr 30 2014 - 16:26:00 EST


Hi,

these patches rework memcg charge lifetime to integrate more naturally
with the lifetime of user pages. This drastically simplifies the code
and reduces charging and uncharging overhead. The most expensive part
of charging and uncharging is the page_cgroup bit spinlock, which is
removed entirely after this series.

Here are the top-10 profile entries of a stress test that reads a 128G
sparse file on a freshly booted box, without a dedicated cgroup
(i.e. executing in the root memcg). Before:

15.36% cat [kernel.kallsyms] [k] copy_user_generic_string
13.31% cat [kernel.kallsyms] [k] memset
11.48% cat [kernel.kallsyms] [k] do_mpage_readpage
4.23% cat [kernel.kallsyms] [k] get_page_from_freelist
2.38% cat [kernel.kallsyms] [k] put_page
2.32% cat [kernel.kallsyms] [k] __mem_cgroup_commit_charge
2.18% kswapd0 [kernel.kallsyms] [k] __mem_cgroup_uncharge_common
1.92% kswapd0 [kernel.kallsyms] [k] shrink_page_list
1.86% cat [kernel.kallsyms] [k] __radix_tree_lookup
1.62% cat [kernel.kallsyms] [k] __pagevec_lru_add_fn

And after:

15.67% cat [kernel.kallsyms] [k] copy_user_generic_string
13.48% cat [kernel.kallsyms] [k] memset
11.42% cat [kernel.kallsyms] [k] do_mpage_readpage
3.98% cat [kernel.kallsyms] [k] get_page_from_freelist
2.46% cat [kernel.kallsyms] [k] put_page
2.13% kswapd0 [kernel.kallsyms] [k] shrink_page_list
1.88% cat [kernel.kallsyms] [k] __radix_tree_lookup
1.67% cat [kernel.kallsyms] [k] __pagevec_lru_add_fn
1.39% kswapd0 [kernel.kallsyms] [k] free_pcppages_bulk
1.30% cat [kernel.kallsyms] [k] kfree

The code also survived some prolonged stress testing with a swapping
workload being moved continuously between memcgs.

My apologies in advance for the reviewability. I tried to split out
the rewrite into more steps, but had to declare the current code as
unsalvagaeble after it took me more than a day to convince myself how
the swap accounting works. It's probably easiest to read this as
newly written code.

Documentation/cgroups/memcg_test.txt | 160 +--
include/linux/memcontrol.h | 94 +-
include/linux/page_cgroup.h | 43 +-
include/linux/swap.h | 15 +-
kernel/events/uprobes.c | 1 +
mm/filemap.c | 13 +-
mm/huge_memory.c | 51 +-
mm/memcontrol.c | 1724 ++++++++++++--------------------
mm/memory.c | 41 +-
mm/migrate.c | 46 +-
mm/rmap.c | 6 -
mm/shmem.c | 28 +-
mm/swap.c | 22 +
mm/swap_state.c | 8 +-
mm/swapfile.c | 21 +-
mm/truncate.c | 1 -
mm/vmscan.c | 9 +-
mm/zswap.c | 2 +-
18 files changed, 833 insertions(+), 1452 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/