Re: Crash on armv7-a using KASAN

From: Ard Biesheuvel
Date: Tue Oct 15 2024 - 10:45:16 EST


On Tue, 15 Oct 2024 at 16:35, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> On Tue, Oct 15, 2024 at 04:22:20PM +0200, Ard Biesheuvel wrote:
> > On Tue, 15 Oct 2024 at 16:00, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > >
> > > On Tue, Oct 15, 2024 at 03:51:02PM +0200, Linus Walleij wrote:
> > > > On Tue, Oct 15, 2024 at 12:28 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > > > On Mon, Oct 14, 2024 at 03:19:49PM +0200, Clement LE GOFFIC wrote:
> > > >
> > > > > I think what's happening here is that when switching from prev to next
> > > > > in the scheduler, we switch to next's mm before we actually switch to
> > > > > next's register state, and there's a transient window where prev is
> > > > > executed using next's mm. AFAICT we don't map prev's KASAN stack shadow
> > > > > into next's mm anywhere, and so inlined KASAN_STACK checks recursively
> > > > > fault on this until we switch to the overflow stack.
>
> [...]
>
> > > > Yeah it looks like a spot-on identification of the problem, I can try to
> > > > think about how we could fix this if I can reproduce it, I keep trying
> > > > to provoke the crash :/
> > >
> > > It's a bit grotty -- AFAICT you'd either need to prefault in the
> > > specific part of the vmalloc space when switching tasks, or we'd need to
> > > preallocate all the shared vmalloc tables at the start of time so that
> > > they're always up-to-date.
> > >
> > > While we could disable KASAN_STACK, that's only going to mask the
> > > problem until this happens for any other vmalloc shadow...
> >
> > Is the other vmalloc shadow not covered by the ordinary on-demand faulting?
>
> It depends on what the vmalloc memory is used for; if it's anything else
> used in the fault handling path, that'll fault recursively, and it's
> possible that'll happen indirectly via other instrumentation.
>
> > When I implemented VMAP_STACK for ARM, I added an explicit load from
> > the new stack while still running from the old one (in __switch_to) so
> > that the ordinary faulting code can deal with it. Couldn't we do the
> > same for the vmalloc shadow of the new stack?
>
> We could do something similar, but note that it's backwards: we need to
> ensure that the old/current stack shadow will be mapped in the new mm.
>
> So the usual fault handling can't handle that as-is, because you need to
> fault-in pages for an mm which isn't yet in use. That logic could be
> factored out and shared, though.
>

Not sure I follow you here. The crash is in the kernel, no?

So there is only a single vmalloc space where all the mappings should
reside, but each process has its own copy of the top level page table,
which needs to be synced up when it goes stale.