Re: + memcg-oom-provide-more-info-while-memcg-oom-happening.patchadded to -mm tree

From: Michal Hocko
Date: Mon Jul 30 2012 - 09:49:14 EST


Hi and sorry for the late reply.

On Tue 24-07-12 13:33:03, Andrew Morton wrote:
>
> The patch titled
> Subject: memcg, oom: provide more info while memcg oom happening
> has been added to the -mm tree. Its filename is
> memcg-oom-provide-more-info-while-memcg-oom-happening.patch
>
> Before you just go and hit "reply", please:
> a) Consider who else should be cc'ed
> b) Prefer to cc a suitable mailing list as well
> c) Ideally: find the original patch on the mailing list and do a
> reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>
> The -mm tree is included into linux-next and is updated
> there every 3-4 working days
>
> ------------------------------------------------------
> From: Sha Zhengju <handai.szj@xxxxxxxxxx>
> Subject: memcg, oom: provide more info while memcg oom happening
>
> When an memcg oom is happening the current memcg related dump information
> is limited for debugging. Provide more detailed memcg page statistics
> together with the total one while hierarchy is enabled.

I do agree that we need to print something more useful for memcg-oom
(who cares about the global state when the problem is per-memcg) but
this doesn't seem to be the best way to address it because it just adds
more to the OOM output and it doesn't prevent the global state being
printed out.
Please note that memcg oom can happen much more often than the global
oom so we should print only the important information.
Some more questions/notes below.

In short, I think the patch should be dropped.

> Signed-off-by: Sha Zhengju <handai.szj@xxxxxxxxxx>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxx>
> Cc: Greg Thelen <gthelen@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
> mm/memcontrol.c | 71 ++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 63 insertions(+), 8 deletions(-)
>
> diff -puN mm/memcontrol.c~memcg-oom-provide-more-info-while-memcg-oom-happening mm/memcontrol.c
> --- a/mm/memcontrol.c~memcg-oom-provide-more-info-while-memcg-oom-happening
> +++ a/mm/memcontrol.c
> @@ -113,6 +113,14 @@ static const char * const mem_cgroup_eve
> "pgmajfault",
> };
>
> +static const char * const mem_cgroup_lru_names[] = {
> + "inactive_anon",
> + "active_anon",
> + "inactive_file",
> + "active_file",
> + "unevictable",
> +};
> +
> /*
> * Per memcg event counter is incremented at every pagein/pageout. With THP,
> * it will be incremated by the number of pages. This counter is used for
> @@ -1372,6 +1380,59 @@ static void move_unlock_mem_cgroup(struc
> spin_unlock_irqrestore(&memcg->move_lock, *flags);
> }
>
> +#define K(x) ((x) << (PAGE_SHIFT-10))
> +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
> +{
> + int i;
> + struct mem_cgroup *mi;

We tend to call this iter in the context you are using it. The
declaration can be moved to the appropriate for loop as well.

> +
> + printk(KERN_INFO "Memory cgroup stat:\n");
> + for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> + if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> + continue;
> + printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],
> + K(mem_cgroup_read_stat(memcg, i)));
> + }
> +
> + for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> + printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
> + mem_cgroup_read_events(memcg, i));
> +
> + for (i = 0; i < NR_LRU_LISTS; i++)
> + printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
> + K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));

Does it really make sense to print this separately? Which additional
information does this give to you?
OOM acts on a hierarchy in general (with a single tree node if
use_hierarchy==0) so showing the root statistics doesn't sound like very
much useful to me.

> +
> + /* Dump the total statistics if hierarchy is enabled. */
> + for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> + long long val = 0;
> +
> + if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> + continue;
> + for_each_mem_cgroup_tree(mi, memcg)
> + val += mem_cgroup_read_stat(mi, i);
> + printk(KERN_CONT "total_%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
> + }
> +
> + for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
> + unsigned long long val = 0;
> +
> + for_each_mem_cgroup_tree(mi, memcg)
> + val += mem_cgroup_read_events(mi, i);
> + printk(KERN_CONT "total_%s:%llu ", mem_cgroup_events_names[i], val);
> + }
> +
> + for (i = 0; i < NR_LRU_LISTS; i++) {
> + unsigned long long val = 0;
> +
> + for_each_mem_cgroup_tree(mi, memcg)
> + val += mem_cgroup_nr_lru_pages(mi, BIT(i));
> + printk(KERN_CONT "total_%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
> + }
> +
> + printk(KERN_CONT "\n");
> +
> +}
> +
> /**
> * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode.
> * @memcg: The memory cgroup that went over limit
> @@ -1436,6 +1497,8 @@ done:
> res_counter_read_u64(&memcg->memsw, RES_USAGE) >> 10,
> res_counter_read_u64(&memcg->memsw, RES_LIMIT) >> 10,
> res_counter_read_u64(&memcg->memsw, RES_FAILCNT));
> +
> + mem_cgroup_print_oom_stat(memcg);
> }
>
> /*
> @@ -4129,14 +4192,6 @@ static int memcg_numa_stat_show(struct c
> }
> #endif /* CONFIG_NUMA */
>
> -static const char * const mem_cgroup_lru_names[] = {
> - "inactive_anon",
> - "active_anon",
> - "inactive_file",
> - "active_file",
> - "unevictable",
> -};
> -
> static inline void mem_cgroup_lru_names_not_uptodate(void)
> {
> BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
> _
> Subject: Subject: memcg, oom: provide more info while memcg oom happening
>
> Patches currently in -mm which might be from handai.szj@xxxxxxxxxx are
>
> mm-oom-introduce-helper-function-to-process-threads-during-scan.patch
> mm-memcg-introduce-own-oom-handler-to-iterate-only-over-its-own-threads.patch
> memcg-oom-provide-more-info-while-memcg-oom-happening.patch
> memcg-oom-clarify-some-oom-dump-messages.patch
>

--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/