Re: [PATCH -mm] Kill existing current task quickly

From: Minchan Kim
Date: Wed Feb 17 2010 - 01:26:58 EST


On Wed, Feb 17, 2010 at 7:03 AM, David Rientjes <rientjes@xxxxxxxxxx> wrote:
> On Wed, 17 Feb 2010, Minchan Kim wrote:
>
>> If we found current task is existing but didn't set TIF_MEMDIE
>> during OOM victim selection, let's stop unnecessary looping for
>> getting high badness score task and go ahead for killing current.
>>
>> This patch would make side effect skip OOM_DISABLE test.
>> But It's okay since the task is existing and oom_kill_process
>> doesn't show any killing message since __oom_kill_task will
>> interrupt it in oom_kill_process.
>>
>> Signed-off-by: Minchan Kim <minchan.kim@xxxxxxxxx>
>> Cc: Nick Piggin <npiggin@xxxxxxx>
>> ---
>> Âmm/oom_kill.c | Â Â1 +
>> Â1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 3618be3..5c21398 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -295,6 +295,7 @@ static struct task_struct
>> *select_bad_process(unsigned long *ppoints,
>>
>> Â Â Â Â Â Â Â Â Â Â Â chosen = p;
>> Â Â Â Â Â Â Â Â Â Â Â *ppoints = ULONG_MAX;
>> + Â Â Â Â Â Â Â Â Â Â break;
>> Â Â Â Â Â Â Â }
>>
>> Â Â Â Â Â Â Â if (p->signal->oom_adj == OOM_DISABLE)
>
> No, we don't want to break because there may be other candidate tasks that
> have TIF_MEMDIE set that will be detected if we keep scanning. ÂReturning
> ERR_PTR(-1UL) from select_bad_process() has a special meaning: it means we
> return to the page allocator without doing anything. ÂWe don't want more
> than one candidate task to ever have TIF_MEMDIE at a time, otherwise they
> can deplete all memory reserves and not make any forward progress. ÂSo we
> always have to iterate the entire tasklist unless we find an already oom
> killed task with access to memory reserves (to prevent needlessly killing
> additional tasks before the first had a chance to exit and free its
> memory) or a different candidate task is exiting so we'll be freeing
> memory shortly (or it will be invoking the oom killer itself as current
> and then get chosen as the victim).
>

Thanks you very much for the kind explanation, David.

How about this?

I can't use smtp port. so at a loss, I have to send patch by attachment
in webmail which would mangle my tab space.
Pz, forgive my wrong behavior.
If God help me, below inlined patch isn't mangled. :)

I think exit_mm's new test about TIF_MEMDIE isn't big overhead.
That's because is is on cache line due to exit_signal's pending check.
And expand task_lock in __oom_kill_task isn't big, I think. That's because
__oom_kill_task isn't called frequently.

-- CUT_HERE --

Date: Wed, 17 Feb 2010 23:59:40 +0900
Subject: [PATCH] [PATCH -mm] Kill existing current task quickly

If we found current task is existing but didn't set TIF_MEMDIE
during OOM victim selection, let's stop unnecessary
looping for getting high badness score task and go ahead
for killing current.

For forkbomb scenarion, there are many processes on system
so tasklist scanning time might be much big. Sometime we used to
use oom_kill_allocating_task to avoid expensive
task list scanning time.

For it, we have to make sure there are not TIF_MEMDIE's
another tasks. This patch introduces nr_memdie
for counting TIF_MEMDIE's tasks.

This patch would make side effect skip OOM_DISABLE test.
But It's okay since the task is existing and oom_kill_process
doesn't show any killing message since __oom_kill_task will
interrupt it in oom_kill_process.

Signed-off-by: Minchan Kim <minchan.kim@xxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Nick Piggin <npiggin@xxxxxxx>
---
include/linux/oom.h | 2 ++
kernel/exit.c | 3 +++
mm/oom_kill.c | 12 +++++++++---
3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 505aa9e..9babcce 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -48,5 +48,7 @@ extern int sysctl_panic_on_oom;
extern int sysctl_oom_kill_allocating_task;
extern int sysctl_oom_dump_tasks;

+extern unsigned int nr_memdie;
+
#endif /* __KERNEL__*/
#endif /* _INCLUDE_LINUX_OOM_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index c5305fc..932c67b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -50,6 +50,7 @@
#include <linux/perf_event.h>
#include <trace/events/sched.h>
#include <linux/hw_breakpoint.h>
+#include <linux/oom.h>

#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -685,6 +686,8 @@ static void exit_mm(struct task_struct * tsk)
/* more a memory barrier than a real lock */
task_lock(tsk);
tsk->mm = NULL;
+ if (test_thread_flag(TIF_MEMDIE))
+ nr_memdie--;
up_read(&mm->mmap_sem);
enter_lazy_tlb(mm, current);
/* We don't want this task to be frozen prematurely */
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3618be3..d5e3d70 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -32,6 +32,8 @@ int sysctl_panic_on_oom;
int sysctl_oom_kill_allocating_task;
int sysctl_oom_dump_tasks;
static DEFINE_SPINLOCK(zone_scan_lock);
+
+unsigned int nr_memdie; /* count of TIF_MEMDIE processes */
/* #define DEBUG */

/*
@@ -295,6 +297,8 @@ static struct task_struct
*select_bad_process(unsigned long *ppoints,

chosen = p;
*ppoints = ULONG_MAX;
+ if (nr_memdie == 0)
+ break;
}

if (p->signal->oom_adj == OOM_DISABLE)
@@ -403,8 +407,6 @@ static void __oom_kill_task(struct task_struct *p,
int verbose)
K(p->mm->total_vm),
K(get_mm_counter(p->mm, MM_ANONPAGES)),
K(get_mm_counter(p->mm, MM_FILEPAGES)));
- task_unlock(p);
-
/*
* We give our sacrificial lamb high priority and access to
* all the memory it needs. That way it should be able to
@@ -412,7 +414,11 @@ static void __oom_kill_task(struct task_struct
*p, int verbose)
*/
p->rt.time_slice = HZ;
set_tsk_thread_flag(p, TIF_MEMDIE);
-
+ /*
+ * nr_memdie is protected by task_lock.
+ */
+ nr_memdie++;
+ task_unlock(p);
force_sig(SIGKILL, p);
}

--
1.6.0.4



--
Kind regards,
Minchan Kim
Date: Wed, 17 Feb 2010 23:59:40 +0900
Subject: [PATCH] [PATCH -mm] Kill existing current task quickly

If we found current task is existing but didn't set TIF_MEMDIE
during OOM victim selection, let's stop unnecessary
looping for getting high badness score task and go ahead
for killing current.

For forkbomb scenarion, there are many processes on system
so tasklist scanning time might be much big. Sometime we used to
use oom_kill_allocating_task to avoid expensive
task list scanning time.

For it, we have to make sure there are not TIF_MEMDIE's
another tasks. This patch introduces nr_memdie
for counting TIF_MEMDIE's tasks.

This patch would make side effect skip OOM_DISABLE test.
But It's okay since the task is existing and oom_kill_process
doesn't show any killing message since __oom_kill_task will
interrupt it in oom_kill_process.

Signed-off-by: Minchan Kim <minchan.kim@xxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Nick Piggin <npiggin@xxxxxxx>
---
include/linux/oom.h | 2 ++
kernel/exit.c | 3 +++
mm/oom_kill.c | 12 +++++++++---
3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 505aa9e..9babcce 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -48,5 +48,7 @@ extern int sysctl_panic_on_oom;
extern int sysctl_oom_kill_allocating_task;
extern int sysctl_oom_dump_tasks;

+extern unsigned int nr_memdie;
+
#endif /* __KERNEL__*/
#endif /* _INCLUDE_LINUX_OOM_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index c5305fc..932c67b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -50,6 +50,7 @@
#include <linux/perf_event.h>
#include <trace/events/sched.h>
#include <linux/hw_breakpoint.h>
+#include <linux/oom.h>

#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -685,6 +686,8 @@ static void exit_mm(struct task_struct * tsk)
/* more a memory barrier than a real lock */
task_lock(tsk);
tsk->mm = NULL;
+ if (test_thread_flag(TIF_MEMDIE))
+ nr_memdie--;
up_read(&mm->mmap_sem);
enter_lazy_tlb(mm, current);
/* We don't want this task to be frozen prematurely */
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3618be3..d5e3d70 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -32,6 +32,8 @@ int sysctl_panic_on_oom;
int sysctl_oom_kill_allocating_task;
int sysctl_oom_dump_tasks;
static DEFINE_SPINLOCK(zone_scan_lock);
+
+unsigned int nr_memdie; /* count of TIF_MEMDIE processes */
/* #define DEBUG */

/*
@@ -295,6 +297,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,

chosen = p;
*ppoints = ULONG_MAX;
+ if (nr_memdie == 0)
+ break;
}

if (p->signal->oom_adj == OOM_DISABLE)
@@ -403,8 +407,6 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
K(p->mm->total_vm),
K(get_mm_counter(p->mm, MM_ANONPAGES)),
K(get_mm_counter(p->mm, MM_FILEPAGES)));
- task_unlock(p);
-
/*
* We give our sacrificial lamb high priority and access to
* all the memory it needs. That way it should be able to
@@ -412,7 +414,11 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
*/
p->rt.time_slice = HZ;
set_tsk_thread_flag(p, TIF_MEMDIE);
-
+ /*
+ * nr_memdie is protected by task_lock.
+ */
+ nr_memdie++;
+ task_unlock(p);
force_sig(SIGKILL, p);
}

--
1.6.0.4