Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated
From: Giancarlo Ferrari
Date: Mon Feb 01 2021 - 14:11:02 EST
Hi,
On Mon, Feb 01, 2021 at 03:30:12PM +0000, Mark Rutland wrote:
> Hi,
>
> On Mon, Feb 01, 2021 at 02:39:46PM +0000, Giancarlo Ferrari wrote:
> > On Mon, Feb 01, 2021 at 12:47:20PM +0000, Mark Rutland wrote:
> > > On Mon, Feb 01, 2021 at 12:44:56AM +0000, Giancarlo Ferrari wrote:
> > > > machine_kexec() need to set rw permission in text and rodata sections
> > > > to assign some variables (e.g. kexec_start_address). To do that at
> > > > the end (after flushing pdm in memory, etc.) it needs to invalidate
> > > > TLB [section] entries.
> > >
> > > It'd be worth noting explicitly that set_kernel_text_rw() alters
> > > current->active_mm...
> > >
> > > > If during the TLB invalidation an interrupt occours, which might cause
> > > > a context switch, there is the risk to inject invalid TLBs, with ro
> > > > permissions.
> > >
> > > ... which is why if there's a context switch things can go wrong, since
> > > active_mm isn't stable, and so it's possible that set_kernel_text_rw()
> > > updates multiple tables, none of which might be the active table at the
> > > point we try to make an access.
> >
> > Maybe the behaviour causing issue is not completely clear to me, and I do
> > apologize for that (moreover I haven't eougth debug capabilities).
>
> I think we're in rough agreement that the issue is likely related to the
> context switch, but our understanding of the specifics differs (and I
> think we're missing a detail here).
>
> > However, current-active_mm is switched among context switches. Correct ?
>
> In some cases current->active_mm is not switched, and can be inherited
> over a context switch. When switching to a user task, we always switch
> to its mm (which becomes the active_mm), but when switching to a kthread
> we retain the previous task's mm as the active_mm as part of the lazy
> context switch.
>
> So while a kthread is preemptible, its active_mm (and active ASID) can
> change under its feet. That could happen anywhere while the task is
> preemptible, e.g. in the middle of set_kernel_text_rw(), or
> mid-modification to the kexec variables.
>
Yes.
In my understanding, even in the case of user process, when current->active_mm is switched,
we can run into trouble. For instance:
- Process A issue kexec (PageTables entry of A has 0x8000_0000-0x8010_0000 with ro
permission and section is global, no NG bit set).
- A context switch happens in the middle of set_kernel_text_rw(), right after the
section 0x8000_0000-0x8010_0000 has been invalidated.
- Process B, in its execution, re-inject its own PageTable entry with ro permission, which
is not shared with Process A (and is not touched by the previous invalidation) in the MMU
cache.
- When Process A, is rescheduled, it carries on with the invalidation, but unfortunately I have
"wrong" entries in the MMU.
> > So, in principle, the invalidation, if stopped, is carried on where it
> > left.
>
> That's true so long as all the processes we switch between share the
> same leaf tables for the region we're altering. If not, then the lazy
> context switch means that those tables can change under our feet.
>
> I believe the tables mapping the kernel text are shared by all threads,
> and if so this _should_ be true. Russell might be able to confirm that
> or correct me if I have that wrong.
>
I am not ready to put my hand on the fire, but I seen the behaviour described above.
> > I thought the issue was that the PageTable entry for the section 0x8010_0000
> > is global, thus not indexed by ASID (Address Space ID). By the fact that each
> > process has its own version of that entry, is the cause of the issue, as the
> > schedule process might bringing a spurious entry (with ro permission) in the
> > MMU cache.
>
> The TLB invalidation performed under set_kernel_text_rw() affects all
> ASIDs on the current CPU, so there shouldn't be any stale RO TLB entries
> to hit unless the kthread is migrated to another CPU.
>
> > If the entry is not global holds the ASID, and the issue cannot happen.
>
> I don't think that's true, since switching to a different active_mm
> would also change ASID, and we'd have no additional guarantee.
>
I agree, my fault.
> Thanks,
> Mark.
Thanks,
GF