Re: [PATCH] x86: endless page faults in mount_block_root for Linux2.6 - v2

From: Henry Nestler
Date: Wed May 07 2008 - 16:52:17 EST


Page faults in kernel address space between PAGE_OFFSET up to
VMALLOC_START should not try to access pte/pgd inside function
spurious_fault.

To fix, move vmalloc address range checks from vmalloc_fault to
do_page_fault for 32 and 64bit.

Signed-off-by: Henry Nestler <henry.nestler@xxxxxxxxx>
---
32bit example, where adresss hole was faulting endless again (after the
patch from 2008-04-23):
=======
Linux version 2.6.25 (hn@hn-dt) (gcc version 4.2.1 (SUSE Linux)) #48
PREEMPT ...
64MB LOWMEM available.
[...]
entry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory: 61108k/65536k available (1482k kernel code, 0k reserved, 455k
data, 136k init, 0k highmem)
virtual kernel memory layout:
fixmap : 0xffffa000 - 0xfffff000 ( 20 kB)
vmalloc : 0xc4800000 - 0xffff8000 ( 951 MB)
lowmem : 0xc0000000 - 0xc4000000 ( 64 MB)
.init : 0xcCPA: page pool initialized 1 of 1 pages preallocated
[...]
checking if image is initramfs...it isn't (no cpio magic); looks like an
initrd
BUG: unable to handle kernel paging request at c0000e68
IP: [<c010cb84>] __change_page_attr_set_clr+0x104/0x590
*pde = 00000063 BUG: unable to handle kernel paging request at c0000000
IP: [<c010c5c9>] do_page_fault+0x639/0x730
*pde = 00000063 BUG: unable to handle kernel paging request at c0000000
IP: [<c010c5c9>] do_page_fault+0x639/0x730
*pde = 00000063 BUG: unable to handle kernel paging request at c0000000
IP: [<c010c5c9>] do_page_fault+0x639/0x730
===== ... this never ends or with a stack overflow ... ===

Shure, the "out of range address" was from buggy driver development.
But not of adresses should kill the complete system.

"__change_page_attr_set_clr" is some of the macros inside spurious_fault.

_After_ this patch, I got such normal trace back print:
========
checking if image is initramfs...it isn't (no cpio magic); looks like an
initrd
BUG: unable to handle kernel paging request at c0000e68
IP: [<c010cb74>] __change_page_attr_set_clr+0x104/0x590
*pde = 08a96063
Oops: 0000 [#1] PREEMPT
Modules linked in:

Pid: 1, comm: swapper Not tainted (2.6.25 #49)
EIP: 0060:[<c010cb74>] EFLAGS: 00010282 CPU: 0
EIP is at __change_page_attr_set_clr+0x104/0x590
EAX: c0000e68 EBX: 00000002 ECX: c030ac3c EDX: c3819edc
ESI: c4000000 EDI: c3f9a000 EBP: c3819eec ESP: c3819e80
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process swapper (pid: 1, ti=c3818000 task=c38175f0 task.ti=c3818000)
<0>Stack: 00000046 00000000 00000000 c4000000 00000001 c3819efc ...
[...]
<0>Call Trace:
[<c010d05f>] ? change_page_attr_set_clr+0x5f/0x1e0
[<c010d1f7>] ? set_memory_rw+0x17/0x20
[<c010b5b0>] ? free_init_pages+0x20/0xa0
[<c015ed08>] ? fput+0x18/0x20
[<c015bae7>] ? filp_close+0x47/0x70
[<c010b641>] ? free_initrd_mem+0x11/0x20
[<c02edf5c>] ? free_initrd+0x1c/0x40
[<c02ee03b>] ? populate_rootfs+0xbb/0x100
[<c02e8793>] ? kernel_init+0x83/0x260
[...]
========

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd7e179..59f612c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -518,10 +518,6 @@ static int vmalloc_fault(unsigned long address)
pmd_t *pmd, *pmd_ref;
pte_t *pte, *pte_ref;

- /* Make sure we are in vmalloc area */
- if (!(address >= VMALLOC_START && address < VMALLOC_END))
- return -1;
-
/* Copy kernel mappings over when needed. This can also
happen within a race in page table update. In the later
case just flush. */
@@ -620,13 +616,17 @@ void __kprobes do_page_fault(struct pt_regs *regs,
unsigned long error_code)
#else
if (unlikely(address >= TASK_SIZE64)) {
#endif
- if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) &&
- vmalloc_fault(address) >= 0)
- return;
+ /* Make sure we are in vmalloc area */
+ if (address >= VMALLOC_START && address < VMALLOC_END) {

- /* Can handle a stale RO->RW TLB */
- if (spurious_fault(address, error_code))
- return;
+ if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) &&
+ vmalloc_fault(address) >= 0)
+ return;
+
+ /* Can handle a stale RO->RW TLB */
+ if (spurious_fault(address, error_code))
+ return;
+ }

/*
* Don't take the mm semaphore here. If we fixup a prefetch
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/