Re: [PATCH] powerpc/mm: make NULL pointer deferences explicit on bad page faults.

From: Christophe Leroy
Date: Fri Dec 14 2018 - 03:01:56 EST

Hi Michael,

Le 14/12/2018 Ã 01:57, Michael Ellerman a ÃcritÂ:
Hi Christophe,

You know it's the trivial patches that are going to get lots of review
comments :)

I'm so happy to get comments.

Christophe Leroy <christophe.leroy@xxxxxx> writes:
As several other arches including x86, this patch makes it explicit
that a bad page fault is a NULL pointer dereference when the fault
address is lower than PAGE_SIZE

I'm being pedantic, but it's not necessarily a NULL pointer dereference.
It might just be a direct access to a low address, eg:

char *p = 0x100;
*p = 0;

That's not a NULL pointer dereference.

But other arches do print this so I guess it's OK to add, and in most
cases it will be an actual NULL pointer dereference.

I wonder though if we should use 4096 rather than PAGE_SIZE, given
that's the actual value other arches are using. We support 256K pages on
some systems, which is getting quite large.

Those invalid accesses are catched because the first page is marked non present or non accessible in the page table, so I thing using PAGE_SIZE here is valid regardless of the page size.

Looks like the arches have PAGE_SHIFT ranging from 12 to 16 mainly.

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index d51cf5f4e45e..501a1eadb3e9 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -631,13 +631,16 @@ void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
switch (TRAP(regs)) {
case 0x300:
case 0x380:
- printk(KERN_ALERT "Unable to handle kernel paging request for "
- "data at address 0x%08lx\n", regs->dar);
+ pr_alert("Unable to handle kernel %s for data at address 0x%08lx\n",
+ regs->dar < PAGE_SIZE ? "NULL pointer dereference" :
+ "paging request",
+ regs->dar);

This is now too long I think, with printk time you get:

[ 1096.450711] Unable to handle kernel NULL pointer dereference for data at address 0x00000000

Which is 93 columns. It's true on many systems it doesn't really matter
any more, but it would still be good if it was shorter.

I like that on x86 they prefix it with "BUG:", just to avoid any confusion.

What if we had for the NULL pointer case:

BUG: Kernel NULL pointer dereference at 0x00000000

And for the normal case:

BUG: Unable to handle kernel data access at 0x00000000

Note on the very next line we print:
Faulting instruction address: 0xc000000000795cc8

So there should be no confusion about whether "at" refers to the data
address or the instruction address.


case 0x400:
case 0x480:
- printk(KERN_ALERT "Unable to handle kernel paging request for "
- "instruction fetch\n");
+ pr_alert("Unable to handle kernel %s for instruction fetch\n",
+ regs->nip < PAGE_SIZE ? "NULL pointer dereference" :
+ "paging request");

I don't really like using "NULL pointer dereference" here, that
terminology makes me think of a load/store, I think it confuses things
rather than making it clearer.

What about:

BUG: Unable to handle kernel instruction fetch at 0x00000000

I think we still need to make it explicit that we jumped there due to a NULL function pointer, allthought I don't have a good text idea yet for this.

case 0x600:
printk(KERN_ALERT "Unable to handle kernel paging request for "

It would be good to clean up these other cases as well. They seem to be
trying to use the "page request for" terminology which leads to them
being very wordy. I assume that was done to help people grepping kernel
logs for errors, but I think we should not worry about that if we have
the "BUG:" prefix.

So we have:
printk(KERN_ALERT "Unable to handle kernel paging request for "
"unaligned access at address 0x%08lx\n", regs->dar);

What about:

BUG: Unable to handle kernel unaligned access at 0x00000000

printk(KERN_ALERT "Unable to handle kernel paging request for "
"unknown fault\n");

What about:

BUG: Unable to handle unknown paging fault at 0x00000000


Looks like good ideas I'll carry on.