Re: [PATCH] powerpc/mm: make NULL pointer deferences explicit on bad page faults.
From: Michael Ellerman
Date: Mon Dec 17 2018 - 06:36:52 EST
Christophe Leroy <christophe.leroy@xxxxxx> writes:
> Hi Michael,
>
> Le 14/12/2018 Ã 01:57, Michael Ellerman a ÃcritÂ:
>> Hi Christophe,
>>
>> You know it's the trivial patches that are going to get lots of review
>> comments :)
>
> I'm so happy to get comments.
Haha :)
>> Christophe Leroy <christophe.leroy@xxxxxx> writes:
>>> As several other arches including x86, this patch makes it explicit
>>> that a bad page fault is a NULL pointer dereference when the fault
>>> address is lower than PAGE_SIZE
>>
>> I'm being pedantic, but it's not necessarily a NULL pointer dereference.
>> It might just be a direct access to a low address, eg:
>>
>> char *p = 0x100;
>> *p = 0;
>>
>> That's not a NULL pointer dereference.
>>
>> But other arches do print this so I guess it's OK to add, and in most
>> cases it will be an actual NULL pointer dereference.
>>
>> I wonder though if we should use 4096 rather than PAGE_SIZE, given
>> that's the actual value other arches are using. We support 256K pages on
>> some systems, which is getting quite large.
>
> Those invalid accesses are catched because the first page is marked non
> present or non accessible in the page table, so I thing using PAGE_SIZE
> here is valid regardless of the page size.
It's not a question of whether we catch the fault it's what we print
when we catch it. Most of the time on 64-bit the first few GB of the
page tables will be empty, so those will all fault, but we don't call
them NULL pointer deferences.
So I'm just saying that this is a heuristic, ie. an access close to zero
is probably an access at a small offset from a NULL pointer, but it may
not be. And so it's kind of arbitrary where we decide to make the cut
off point between printing that it's a NULL pointer vs a regularly bad
access.
Anyway I'm happy to use PAGE_SHIFT for now, if anyone complains we can
always change it.
>> What about:
>>
>> BUG: Unable to handle kernel instruction fetch at 0x00000000
>
> I think we still need to make it explicit that we jumped there due to a
> NULL function pointer, allthought I don't have a good text idea yet for
> this.
Being pedantic again, we don't know that it was a NULL function pointer.
You might have done a bad setcontext and set your NIP to zero.
But it's probably fine to print it as a hint, and it's probably right
most of the time.
cheers