RE: [PATCH 1/1] mm: vmstat: Add OOM victims count in vmstat counter

From: PINTU KUMAR
Date: Mon Oct 12 2015 - 10:44:28 EST


Hi,

Sorry, I forgot to mention the V2 update.
I will highlight the V2 changes and RESEND.

> -----Original Message-----
> From: Pintu Kumar [mailto:pintu.k@xxxxxxxxxxx]
> Sent: Monday, October 12, 2015 7:03 PM
> To: akpm@xxxxxxxxxxxxxxxxxxxx; minchan@xxxxxxxxxx; dave@xxxxxxxxxxxx;
> pintu.k@xxxxxxxxxxx; mhocko@xxxxxxx; koct9i@xxxxxxxxx;
> rientjes@xxxxxxxxxx; hannes@xxxxxxxxxxx;
> penguin-kernel@i-love.sakura.ne.jp; bywxiaobai@xxxxxxx; mgorman@xxxxxxx; vbabka@xxxxxxx;
> js1304@xxxxxxxxx; kirill.shutemov@xxxxxxxxxxxxxxx;
> alexander.h.duyck@xxxxxxxxxx; sasha.levin@xxxxxxxxxx; cl@xxxxxxxxx;
> fengguang.wu@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx
> Cc: cpgs@xxxxxxxxxxx; pintu_agarwal@xxxxxxxxx; pintu.ping@xxxxxxxxx;
> vishnu.ps@xxxxxxxxxxx; rohit.kr@xxxxxxxxxxx; c.rajkumar@xxxxxxxxxxx;
> sreenathd@xxxxxxxxxxx
> Subject: [PATCH 1/1] mm: vmstat: Add OOM victims count in vmstat counter
>
> This patch adds a count of OOM kill victims to /proc/vmstat.
> Currently, we depend on the kernel logs to find out when a kernel OOM occurs.
> But a kernel OOM can go unnoticed by the developer, as it can silently
> kill some background applications/services.
> On some small embedded systems, an OOM may be captured in the logs but
> later overwritten, because the log is a ring buffer.
> This interface lets the user quickly check whether any OOM kill has
> happened in the past, i.e. whether the system has ever entered the
> OOM kill stage.
>
> It can be beneficial in the following cases:
> 1. The user can monitor kernel OOM kills without looking into the
> kernel logs.
> 2. It can help in tuning the watermark levels in the system.
> 3. It can help in tuning the low memory killer behavior in user space.
> 4. It can be helpful on a logless system, or if klogd logging
> (/var/log/messages) is disabled.
>
> A snapshot of the results of a 3-day overnight test is shown below:
> System: ARM Cortex A7, 1GB RAM, 8GB EMMC
> Linux: 3.10.xx
> Category: reference smart phone device
> Loglevel: 7
> Conditions: Fully loaded, BT/WiFi/GPS ON
> Tests: automatic launching of 30+ apps using test scripts, in a loop
> for 3 days.
> At the end of the tests, check:
> $ cat /proc/vmstat
> nr_oom_victims 6
>
> The counter shows 6 OOM kill victims over the test run.
>
> An OOM is bad for any system, so this counter can help in quickly tuning
> the OOM behavior of the system without depending on the logs.
>
> Signed-off-by: Pintu Kumar <pintu.k@xxxxxxxxxxx>
> ---
> include/linux/vm_event_item.h | 1 +
> mm/oom_kill.c | 2 ++
> mm/page_alloc.c | 1 -
> mm/vmstat.c | 1 +
> 4 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 2b1cef8..dd2600d 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -57,6 +57,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> #ifdef CONFIG_HUGETLB_PAGE
> HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
> #endif
> + NR_OOM_VICTIMS,
> UNEVICTABLE_PGCULLED, /* culled to noreclaim list */
> UNEVICTABLE_PGSCANNED, /* scanned for reclaimability */
> UNEVICTABLE_PGRESCUED, /* rescued from noreclaim list */
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 03b612b..802b8a1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -570,6 +570,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> * space under its control.
> */
> do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
> + count_vm_event(NR_OOM_VICTIMS);
> mark_oom_victim(victim);
> pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
> task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
> @@ -600,6 +601,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> task_pid_nr(p), p->comm);
> task_unlock(p);
> do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
> + count_vm_event(NR_OOM_VICTIMS);
> }
> rcu_read_unlock();
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9bcfd70..fafb09d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2761,7 +2761,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> schedule_timeout_uninterruptible(1);
> return NULL;
> }
> -
> /*
> * Go through the zonelist yet one more time, keep very high watermark
> * here, this is only to catch a parallel oom killing, we must fail if
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 1fd0886..8503a2e 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -808,6 +808,7 @@ const char * const vmstat_text[] = {
> "htlb_buddy_alloc_success",
> "htlb_buddy_alloc_fail",
> #endif
> + "nr_oom_victims",
> "unevictable_pgs_culled",
> "unevictable_pgs_scanned",
> "unevictable_pgs_rescued",
> --
> 1.7.9.5
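
For illustration, a user-space monitor could read the new counter roughly
as follows. This is only a minimal sketch, not part of the patch: the helper
names read_vmstat_counter() and print_oom_victims() are hypothetical, and it
assumes a kernel with this patch applied so that "nr_oom_victims" shows up
in /proc/vmstat.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one "<name> <value>" line in a vmstat-style stream.
 * Returns 0 and stores the value in *out on success, or -1 if the
 * counter is not present (e.g. on a kernel without this patch).
 * Hypothetical helper, for illustration only.
 */
int read_vmstat_counter(FILE *fp, const char *name, unsigned long long *out)
{
	char key[64];
	unsigned long long val;

	assert(fp && name && out);

	/* Each line of /proc/vmstat has the form "<name> <value>". */
	while (fscanf(fp, "%63s %llu", key, &val) == 2) {
		if (strcmp(key, name) == 0) {
			*out = val;
			return 0;
		}
	}
	return -1;
}

/* Open /proc/vmstat and print the OOM victim count, if available. */
int print_oom_victims(void)
{
	unsigned long long victims;
	FILE *fp = fopen("/proc/vmstat", "r");
	int ret;

	if (!fp) {
		perror("/proc/vmstat");
		return -1;
	}
	ret = read_vmstat_counter(fp, "nr_oom_victims", &victims);
	fclose(fp);

	if (ret == 0)
		printf("nr_oom_victims %llu\n", victims);
	else
		fprintf(stderr, "nr_oom_victims not found in /proc/vmstat\n");
	return ret;
}
```

A monitoring daemon could call print_oom_victims() periodically and compare
the value against the previous reading. Any increase means that at least one
OOM kill happened in between, even if the kernel log has already wrapped.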

Regards,
Pintu
