Re: Lockless/Get_User_Pages_Fast causes Xorg 1.4.99.* to lock

From: Hugh Dickins
Date: Fri Jul 04 2008 - 16:28:58 EST


On Fri, 4 Jul 2008, Zan Lynx wrote:
> Ryan Hope wrote:
> > I have tested this with 2.6.26-rc5-mm3 and with 2.6.26-rc8 w/ the
> > get_user_pages_fast patches from -rc5-mm3... Xorg 1.4.99.* will start
> > to load but hangs at a black screen. At this point, I can not switch
> > to another tty. When I try pressing ctrl+alt+del the kernel oopses
> > and the caps lock led starts to blink. This happens using the nv,
> > radeon and radeonhd drivers (the nv was tested on another box
> > obviously). I have also tried to unselect HAVE_GET_USER_PAGES_FAST in
> > my kernel config but this does not help. I can not figure out where or
> > what the bug is. I can provide any other info you guys need to figure
> > this out. Let me know what I can do.
>
> I think that I've seen this too on 2.6.26-rc8 on my laptop. It's 64-bit

Your attachment tells us it's actually 2.6.26-rc8-mm1, less of a worry.

> AMD-64 single core (although I run a SMP-alternatives kernel on it) and I was
> using the nv driver.
>
> The symptoms are the same. Everything was working great until I started X. I
> SSH'd in and got the dmesg after X locked up. The full dmesg and config are
> attached because Thunderbird is a very stupid program and won't let me paste
> without wrapping.
>
> Here is a bit of what I got though.
>
> [ 269.224276] BUG: unable to handle kernel paging request at ffffe20003480000
> [ 269.224291] IP: [<ffffffff80297e30>] copy_page_range+0x520/0x760
> [ 269.224306] PGD 1102067 PUD 1103067 PMD 0
> [ 269.224315] Oops: 0000 [1] SMP
>
> [ 269.224473] Call Trace:
> [ 269.224495] [<ffffffff80239a9b>] dup_mm+0x26b/0x3c0
> [ 269.224507] [<ffffffff8023a84c>] copy_process+0xc2c/0x1210
> [ 269.224518] [<ffffffff8023aea3>] do_fork+0x73/0x310
> [ 269.224526] [<ffffffff8024719e>] sys_rt_sigaction+0x8e/0xd0
> [ 269.224536] [<ffffffff8020c2db>] system_call_after_swapgs+0x7b/0x80
> [ 269.224542] [<ffffffff8020c5d7>] ptregscall_common+0x67/0xb0

Useful info, thank you; even more useful was the Code line in your attachment

> [ 269.224557] Code: 00 00 48 b9 00 00 00 00 00 e2 ff ff 48 21 d8 48 c1 e8 0c 48 8d 14 c5 00 00 00 00 48 c1 e0 06 48 29 d0 48 01 c1 0f 84 10 ff ff ff <48> 8b 01 48 89 ca f6 c4 40 74 04 48 8b 51 10 90 ff 42 08 90 ff

which is enough to identify the oops as in copy_one_pte's get_page(page).
Here's a patch I think we need, which I'm hoping will fix both your
crashes - please let us know - thanks a lot.
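
As a rough cross-check of that reading, the faulting address can be turned back into a page frame number. The sketch below is only an illustration and rests on two assumptions taken from the Code: bytes rather than from the kernel source: that the 48 b9 ... e2 ff ff immediate (0xffffe20000000000) is the vmemmap base, and that the shift-by-6 minus times-8 sequence means sizeof(struct page) is 56 on this build. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: vmemmap base, read from the movabs immediate in the Code: bytes. */
#define VMEMMAP_BASE     0xffffe20000000000ULL
/* Assumption: sizeof(struct page), inferred from (pfn << 6) - (pfn * 8). */
#define STRUCT_PAGE_SIZE 56ULL
/* 4 KiB pages on x86_64. */
#define PAGE_SHIFT_4K    12

/* Hypothetical helper: map a struct page address in vmemmap back to its pfn. */
static uint64_t page_addr_to_pfn(uint64_t page_addr)
{
    return (page_addr - VMEMMAP_BASE) / STRUCT_PAGE_SIZE;
}

/* Hypothetical helper: physical address of the first byte of a page frame. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << PAGE_SHIFT_4K;
}
```

Under those assumptions, page_addr_to_pfn(0xffffe20003480000) gives pfn 0xf0000, i.e. physical address 0xf0000000. That would fit a device mapping set up by X rather than ordinary RAM, and the "PMD 0" in the oops suggests there is no struct page behind it at all, so get_page() had nothing valid to touch.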

Stop mprotect's pte_modify from wiping out the x86 pte_special bit, which
caused an oops thereafter when vm_normal_page thought X's abnormal page was normal.

Signed-off-by: Hugh Dickins <hugh@xxxxxxxxxxx>
---
Fix to 2.6.26-rc8-mm1 x86-implement-pte_special.patch
Perhaps something similar needed for powerpc? Nick will know.
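
For anyone following along, the effect of the mask can be sketched in userspace. This is a simplified model, not the kernel's pte_modify: the bit positions mirror x86 as far as I recall them (in particular, the special bit at position 9 is an assumption), and the function and macro names are hypothetical.

```c
#include <stdint.h>

/* Illustrative pte bits; positions are assumptions modelled on x86. */
#define SK_PAGE_PRESENT  (1ULL << 0)
#define SK_PAGE_RW       (1ULL << 1)
#define SK_PAGE_USER     (1ULL << 2)
#define SK_PAGE_PWT      (1ULL << 3)
#define SK_PAGE_PCD      (1ULL << 4)
#define SK_PAGE_ACCESSED (1ULL << 5)
#define SK_PAGE_DIRTY    (1ULL << 6)
#define SK_PAGE_SPECIAL  (1ULL << 9)   /* software bit; position assumed */
#define SK_PFN_MASK      0x000ffffffffff000ULL

/* Bits that survive a protection change, before and after the patch. */
#define SK_CHG_MASK_OLD (SK_PFN_MASK | SK_PAGE_PCD | SK_PAGE_PWT | \
                         SK_PAGE_ACCESSED | SK_PAGE_DIRTY)
#define SK_CHG_MASK_NEW (SK_CHG_MASK_OLD | SK_PAGE_SPECIAL)

/* Simplified model: keep the masked bits of the old pte, take the rest from newprot. */
static uint64_t pte_modify_sketch(uint64_t pte, uint64_t newprot, uint64_t chg_mask)
{
    return (pte & chg_mask) | (newprot & ~chg_mask);
}

/* Nonzero if the pte is marked special, i.e. has no struct page to refcount. */
static int pte_special_sketch(uint64_t pte)
{
    return (pte & SK_PAGE_SPECIAL) != 0;
}
```

With the old mask, a special pte that goes through an mprotect-style change comes out with the special bit cleared, so a later fork would treat it as a normal page and try to take a reference on a struct page that does not exist. With the new mask the bit is carried across.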

include/asm-x86/pgtable.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- 2.6.26-rc8-mm1/include/asm-x86/pgtable.h 2008-07-03 11:34:55.000000000 +0100
+++ linux/include/asm-x86/pgtable.h 2008-07-04 20:58:36.000000000 +0100
@@ -57,7 +57,7 @@

 /* Set of bits not changed in pte_modify */
 #define _PAGE_CHG_MASK (PTE_MASK | _PAGE_PCD | _PAGE_PWT | \
-                        _PAGE_ACCESSED | _PAGE_DIRTY)
+                        _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY)
 
 #define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT)
 #define _PAGE_CACHE_WB (0)