Re: Processes hang in an unkillable state

From: Linus Torvalds
Date: Tue Apr 12 2011 - 17:47:18 EST


On Tue, Apr 12, 2011 at 1:56 PM, Robert Święcki <robert@xxxxxxxxxxx> wrote:
>
> Ok, just to update you with what I'm currently doing:
>
> I'm testing now with 2.6.39-rc3 - according to
> http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.39-rc3
> it has vma_to_resize patch included
> (982134ba62618c2d69fbbbd166d0a11ee3b7e3d8) - I applied the latest
> Linus' patch for sys_mlock (the one patching memory.c and mlock.c),
> disabled the sys_madvise in the fuzzer, and now I got the following
> (full kdb dump attached)

Ok, that's different from the apparent livelock.

Except it once again is one of the BUG_ON's in vma_prio_tree_add() -
and again, your kgdb thing has corrupted the bug information.

Can you make a bug-report to the kgdb people? It's annoying as hell
that all the *critical* bug information that the kernel prints out
apparently gets totally lost when you attach with the debugger. It's
not an Oops, it should have that nice BUG: together with filename and
line number.

> <d>Pid: 18598, comm: iknowthis Not tainted 2.6.39-rc3 #1<c> Dell Inc.
> Precision WorkStation 390 <c>/0GH911<c>
> <d>RIP: 0010:[<ffffffff8116c842>] [<ffffffff8116c842>] vma_prio_tree_add+0xc2/0xd0

Code disassembly shows:

   0:   58              pop    %rax
   1:   48 89 7e 68     mov    %rdi,0x68(%rsi)
   5:   c9              leaveq
   6:   c3              retq
   7:   66 90           xchg   %ax,%ax
   9:   48 8b 56 50     mov    0x50(%rsi),%rdx
   d:   48 8d 47 50     lea    0x50(%rdi),%rax
  11:   48 89 42 08     mov    %rax,0x8(%rdx)
  15:   48 89 57 50     mov    %rdx,0x50(%rdi)
  19:   48 8d 56 50     lea    0x50(%rsi),%rdx
  1d:   48 89 57 58     mov    %rdx,0x58(%rdi)
  21:   48 89 46 50     mov    %rax,0x50(%rsi)
  25:   c9              leaveq
  26:   c3              retq
  27:*  0f 0b           ud2        <-- trapping instruction
  29:   eb fe           jmp    0x29
  2b:*  0f 0b           ud2        <-- trapping instruction
  2d:   eb fe           jmp    0x2d
  2f:   eb 08           jmp    0x39

and scripts/decodecode is wrong: it's the _second_ of the two ud2's
that traps, as shown by the Code: line.
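
For illustration, here is a minimal user-space sketch of how the
trapping byte can be read off a Code: line, where the kernel puts the
byte at RIP in angle brackets. The function name marked_byte_offset()
and the sample_code string are hypothetical: the string is rebuilt from
the disassembly above with the second ud2 marked, it is not copied from
the dump.

```c
#include <stddef.h>

/*
 * Hypothetical sample: the bytes of the disassembly above, with the
 * byte at offset 0x2b marked the way a "Code:" line marks RIP.
 * Reconstructed for illustration, not taken from the actual dump.
 */
const char sample_code[] =
	"58 48 89 7e 68 c9 c3 66 90 48 8b 56 50 48 8d 47 50 "
	"48 89 42 08 48 89 57 50 48 8d 56 50 48 89 57 58 "
	"48 89 46 50 c9 c3 0f 0b eb fe <0f> 0b eb fe eb 08";

/*
 * Return the byte offset of the first "<xx>" token in a space-separated
 * hex dump, or -1 if no byte is marked.
 */
long marked_byte_offset(const char *code)
{
	long offset = 0;
	const char *p = code;

	while (*p != '\0') {
		while (*p == ' ')		/* skip separators */
			p++;
		if (*p == '\0')
			break;
		if (*p == '<')			/* this token is the byte at RIP */
			return offset;
		while (*p != '\0' && *p != ' ')	/* skip one unmarked byte token */
			p++;
		offset++;
	}
	return -1;
}
```

Calling marked_byte_offset(sample_code) gives the offset to compare
against the left-hand column of the disassembly.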

But whether that is the first or the second in the source code, who
knows? Gcc may have re-ordered things completely, and kdb has thrown
away the information that the kernel should have printed out.

Anyway, it looks _very_ much like the old mremap() issue. But
if you are running -rc3, then you already have commit 42933bac11e8 in
your tree, so maybe there is some other way to trigger a vm_pgoff
overflow.
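
As a sketch of what "vm_pgoff overflow" means here: a file-backed
mapping covers page offsets starting at vm_pgoff, and if the offset plus
the number of pages wraps around, the end of the range lands below its
start. The helper name pgoff_range_wraps() is hypothetical and this is
plain user-space C, not the check any particular kernel commit uses.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not a kernel function: report whether a mapping
 * of nr_pages pages starting at page offset pgoff wraps around.
 */
bool pgoff_range_wraps(unsigned long pgoff, unsigned long nr_pages)
{
	/*
	 * Unsigned addition wraps modulo 2^N, so a sum that is smaller
	 * than one of its operands can only come from an overflow.
	 */
	return pgoff + nr_pages < pgoff;
}
```

Any path that grows a mapping would need a test of this shape before
the new size is committed.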

You've lost Hugh's patch that did the vma dump instead of having the
BUG_ON(). Can you try that one? And once more, if you had
CONFIG_CC_OPTIMIZE_FOR_SIZE on, I think gcc wouldn't re-order the basic
blocks, and the BUG_ON() info would be easier to track.

> Call Trace:
>  [<ffffffff8116c9a1>] vma_prio_tree_insert+0x41/0x60
>  [<ffffffff8117cb8c>] __vma_link_file+0x4c/0x90
>  [<ffffffff8117d568>] vma_adjust+0xe8/0x570
>  [<ffffffff8117db31>] __split_vma+0x141/0x280
>  [<ffffffff8117dc95>] split_vma+0x25/0x30
>  [<ffffffff8117c1a1>] mlock_fixup+0x171/0x1c0
>  [<ffffffff8117c529>] do_mlock+0xc9/0x100
>  [<ffffffff8117c6d7>] sys_mlock+0xe7/0x130
>  [<ffffffff82284e03>] ia32_do_call+0x13/0x13

Hmm. mlock() itself should not be causing any pgoff expansion.

I wonder if this is related to that whole stack expansion thing (you
clearly are hitting the stack vma judging by the other bug you found),
and we have a pgoff underflow when expanding the stack?
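
To make the suspected underflow concrete, here is a user-space sketch.
struct toy_vma, toy_expand_downwards() and unchecked_pgoff_after_grow()
are hypothetical stand-ins, and the 4 KiB page size is an assumption;
none of this is the kernel code itself. It shows the unsigned
subtraction wrapping when the growth exceeds the page offset, and a
guard that refuses to grow in that case.

```c
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the vma fields that matter here. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;	/* offset in pages */
};

/* What an unguarded "vm_pgoff -= grow" produces; wraps if grow > pgoff. */
unsigned long unchecked_pgoff_after_grow(unsigned long pgoff,
					 unsigned long grow)
{
	return pgoff - grow;
}

/*
 * Grow the mapping downwards to 'address' (page aligned, below
 * vm_start). Returns false and leaves the vma untouched if the page
 * offset would go below zero.
 */
bool toy_expand_downwards(struct toy_vma *vma, unsigned long address)
{
	unsigned long grow = (vma->vm_start - address) >> TOY_PAGE_SHIFT;

	if (grow > vma->vm_pgoff)
		return false;	/* would underflow vm_pgoff */

	vma->vm_start = address;
	vma->vm_pgoff -= grow;
	return true;
}
```

With vm_pgoff at 1, growing by one page succeeds and leaves vm_pgoff at
0; growing by one more page is refused instead of wrapping to a huge
value.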

Attached patch for your enjoyment. COMPLETELY UNTESTED, as usual.

Guys, can you think of any other thing that might expand a mapping?
Rather than find them one-by-one as Robert plays with his fuzzer?

Linus

 mm/mmap.c | 13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 2ec8eb5a9cdd..8c05e5b43b69 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1814,11 +1814,14 @@ static int expand_downwards(struct vm_area_struct *vma,
 		size = vma->vm_end - address;
 		grow = (vma->vm_start - address) >> PAGE_SHIFT;

-		error = acct_stack_growth(vma, size, grow);
-		if (!error) {
-			vma->vm_start = address;
-			vma->vm_pgoff -= grow;
-			perf_event_mmap(vma);
+		error = -ENOMEM;
+		if (grow <= vma->vm_pgoff) {
+			error = acct_stack_growth(vma, size, grow);
+			if (!error) {
+				vma->vm_start = address;
+				vma->vm_pgoff -= grow;
+				perf_event_mmap(vma);
+			}
 		}
 	}
 	vma_unlock_anon_vma(vma);