Re: [PATCH v2 12/32] mm/vmalloc: vmalloc_to_page() use pte_offset_kernel()

From: Hugh Dickins
Date: Tue Jul 11 2023 - 00:35:07 EST


On Mon, 10 Jul 2023, Mark Brown wrote:
> On Mon, Jul 10, 2023 at 06:18:27PM +0100, Lorenzo Stoakes wrote:
> > On Mon, Jul 10, 2023 at 03:42:31PM +0100, Mark Brown wrote:
>
> > > We end up seeing NULL or otherwise bad pointer dereferences, the
> > > specific error does vary a bit though it mostly appears to be in the
> > > pinctrl code. A bisect (full log below) identified this patch as
> > > introducing the failure, nothing is jumping out at me about the patch
> > > and it's not affecting everything so I'd not be surprised if it's just
> > > unconvering some bug in the platform support but I'm not super familiar
> > > with the code.
>
> > Yeah seems likely. Do you have a .config you can share for this board? For
> > a 64-bit device you'd expect that this change would probably be a nop.
>
> It's definitely happening with arm64 defconfig, possibly with other
> configs but that's the main one.

I'm sorry for dropping you in it, Mark, but I'm totally baffled.
I've spent most of the day trying to come up with ideas, but failed.
I've no doubt that you're seeing what you're seeing, but how it comes
about is a mystery.

Lorenzo is right that the change should be a no-op - compared with 6.4.
But it's not quite a no-op in this series, because 04/32 0d940a9b270b
("mm/pgtable: allow pte_offset_map[_lock]() to fail") diverts the old
pte_offset_map() macro off to a new function in mm/pgtable-generic.c;
then this commit restores it back to being the pte_offset_kernel() macro.

So the asm in vmalloc_to_page() is expected to change in this commit,
but change back to what it would have been in 6.4.

This feels like one of those bugs which depends on the code size in
some way (a bit like those bugs we used to have, where a function was
mistakenly marked __init, then in some configs its code landed on a
page which got freed at startup - I'm not saying this is that at all,
just saying it feels weird in that way).

Yet your bisection converges convincingly, which I wouldn't expect
in that case.

I suppose I should ask you to try reverting this 0d1c81edc61e alone
from 6.5-rc1: the consistency of your bisection implies that it will
"fix" the issues, and it is a commit which we could drop. It makes
me a little nervous, applying userspace-pagetable validation to kernel
pagetables, so I don't want to drop it; and it would really be cargo-
culting to drop it without understanding. But we could drop it.

I guess it would be interesting to know whether vmalloc_to_page() is
ever even called in your kernel, before it crashes on the pinctrl stuff.
But putting in a printk to report on that may change everything.

And I guess it would be interesting to know (from a DEBUG_INFO build
of the crashing kernel) which line of dt_remember_or_free_map() it
oopses on i.e. which pointer is NULL when it shouldn't be - or maybe
you already worked that out.

And what device (which ->dt_node_to_map) is involved. If one of the
many dt_node_to_map's fails to initialize *map to NULL when it should,
and has relied on it happening to be a NULL on the stack already...
that might explain it.

Another thing to try, would be the kernel at 0d940a9b270b^, just before
pte_offset_map() grew a function call: there's a faint possibility that
the bug came in before this series, that 0d940a9b270b somehow masked it
(I don't see how: vmalloc_to_page() does sensible validation itself),
and then 0d1c81edc61e unmasked it again - so that the bisection skipped
over, and converged on the wrong point.

But I'm thrashing about: I have no confidence that any of this info will
help us. Sorry for wasting your time.

Thanks,
Hugh