Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1val:-59

From: Konstantin Khlebnikov
Date: Wed May 23 2012 - 02:07:51 EST


Andrew Morton wrote:
On Wed, 23 May 2012 00:41:28 +0200
Martin Mokrejs<mmokrejs@xxxxxxxxxxxxxxxxxx> wrote:

Hi Andrew,
while shutting down my laptop (Dell Vostro 3550 with 16GB RAM, core i7) with 3.4-rc7 I got:

May 23 00:07:54 vostro kernel: [352687.968267] BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
May 23 00:07:54 vostro kernel: [352687.968312] BUG: Bad rss-counter state mm:ffff88040b56f800 idx:2 val:59
May 23 00:07:55 vostro acpid: exiting
May 23 00:07:55 vostro syslog-ng[2838]: syslog-ng shutting down; version='3.3.4'

I found by Google the below thread and thought that maybe it is related?
http://comments.gmane.org/gmane.linux.kernel.mm/76459

...



Well hopefully the below will fix this?

I notice that I don't have this tagged for -stable backporting. That
seems wrong. Konstantin, do we know for how long this bug has been in
there?

It there for years, by itself it is mostly harmless.
This warning was added in c3f0327f8e9d7a503f0d64573c311eddd61f197d
so only v3.4 has this, I thought this fix will be there before release.




From: Konstantin Khlebnikov<khlebnikov@xxxxxxxxxx>
Subject: mm: correctly synchronize rss-counters at exit/exec

mm->rss_stat counters have per-task delta: task->rss_stat. Before
changing task->mm pointer the kernel must flush this delta with
sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task->signal->maxrss, taskstats, audit
and other stuff. Unfortunately the kernel does this before calling
mm_release(), which can call put_user() for processing
task->clear_child_tid. So at this point we can trigger page-faults and
task->rss_stat becomes non-zero again. As a result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier. After mm_release() there should be
no pagefaults.

[akpm@xxxxxxxxxxxxxxxxxxxx: tweak comment]
Signed-off-by: Konstantin Khlebnikov<khlebnikov@xxxxxxxxxx>
Reported-by: Markus Trippelsdorf<markus@xxxxxxxxxxxxxxx>
Cc: Hugh Dickins<hughd@xxxxxxxxxx>
Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@xxxxxxxxxxxxxx>
Cc: Oleg Nesterov<oleg@xxxxxxxxxx>
Signed-off-by: Andrew Morton<akpm@xxxxxxxxxxxxxxxxxxxx>
---

fs/exec.c | 1 -
kernel/exit.c | 13 ++++++++-----
kernel/fork.c | 8 ++++++++
3 files changed, 16 insertions(+), 6 deletions(-)

diff -puN fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec fs/exec.c
--- a/fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/fs/exec.c
@@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *m
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
- sync_mm_rss(old_mm);
mm_release(tsk, old_mm);

if (old_mm) {
diff -puN kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/exit.c
--- a/kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/exit.c
@@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
* user space pages. We don't need them, and if we didn't close them
* they would be locked into memory.
*/
+ mm_release(current, current->mm);
exit_mm(current);
/*
* We don't want to get frozen, in case system-wide hibernation
@@ -640,7 +641,6 @@ static void exit_mm(struct task_struct *
struct mm_struct *mm = tsk->mm;
struct core_state *core_state;

- mm_release(tsk, mm);
if (!mm)
return;
/*
@@ -959,9 +959,13 @@ void do_exit(long code)
preempt_count());

acct_update_integrals(tsk);
- /* sync mm's RSS info before statistics gathering */
- if (tsk->mm)
- sync_mm_rss(tsk->mm);
+
+ /* Set exit_code before complete_vfork_done() in mm_release() */
+ tsk->exit_code = code;
+
+ /* Release mm and sync mm's RSS info before statistics gathering */
+ mm_release(tsk, tsk->mm);
+
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
@@ -974,7 +978,6 @@ void do_exit(long code)
tty_audit_exit();
audit_free(tsk);

- tsk->exit_code = code;
taskstats_exit(tsk, group_dead);

exit_mm(tsk);
diff -puN kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/fork.c
--- a/kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/fork.c
@@ -809,6 +809,14 @@ void mm_release(struct task_struct *tsk,
}
tsk->clear_child_tid = NULL;
}
+
+ /*
+ * Final rss-counter synchronization. After this point there must be
+ * no pagefaults into this mm from the current context. Otherwise
+ * mm->rss_stat will be inconsistent.
+ */
+ if (mm)
+ sync_mm_rss(mm);
}

/*
_


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/