Re: [PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with cr3
From: Will Deacon
Date: Wed Aug 16 2017 - 06:07:15 EST

On Wed, Aug 16, 2017 at 10:53:38AM +0100, Mark Rutland wrote:
> On Wed, Aug 16, 2017 at 10:31:12AM +0100, Ard Biesheuvel wrote:
> > (+ Mark, Will)
> >
> > On 15 August 2017 at 22:46, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > > On Tue, Aug 15, 2017 at 12:18 PM, Sai Praneeth Prakhya
> > > <sai.praneeth.prakhya@xxxxxxxxx> wrote:
> > >> +/*
> > >> + * Makes the calling kernel thread switch to/from efi_mm context
> > >> + * Can be used from SetVirtualAddressMap() or during efi runtime calls
> > >> + * (Note: This routine is heavily inspired from use_mm)
> > >> + */
> > >> +void efi_switch_mm(struct mm_struct *mm)
> > >> +{
> > >> +	struct task_struct *tsk = current;
> > >> +
> > >> +	task_lock(tsk);
> > >> +	efi_scratch.prev_mm = tsk->active_mm;
> > >> +	if (efi_scratch.prev_mm != mm) {
> > >> +		mmgrab(mm);
> > >> +		tsk->active_mm = mm;
> > >> +	}
> > >> +	switch_mm(efi_scratch.prev_mm, mm, NULL);
> > >> +	task_unlock(tsk);
> > >> +
> > >> +	if (efi_scratch.prev_mm != mm)
> > >> +		mmdrop(efi_scratch.prev_mm);
> > >
> > > I'm confused. You're mmdropping an mm that you are still keeping a
> > > pointer to. This is also a bit confusing in the case where you do
> > > efi_switch_mm(efi_scratch.prev_mm).
> > >
> > > This whole manipulation seems fairly dangerous to me for another
> > > reason -- you're taking a user thread (I think) and swapping out its
> > > mm to something that the user in question should *not* have access to.
> > > What if a perf interrupt happens while you're in the alternate mm?
> > > What if you segfault and dump core? Should we maybe just have a flag
> > > that says "this cpu is using a funny mm", assert that the flag is
> > > clear when scheduling, and teach perf, coredumps, etc not to touch
> > > user memory when the flag is set?
> >
> > It appears we may have introduced this exact issue on arm64 and ARM by
> > starting to run the UEFI runtime services with interrupts enabled.
> > (perf does not use NMI on ARM, so the issue did not exist beforehand)
> >
> > Mark, Will, any thoughts?
>
> Yup, I can cause perf to take samples from the EFI FW code, so that's
> less than ideal.

But that should only happen if you're profiling EL1, right, which needs
root privileges? (assuming the skid issue is solved -- not sure what
happened to those patches after they broke criu).

> The "funny mm" flag sounds like a good idea to me, though given recent
> pain with sampling in the case of skid, I don't know exactly what we
> should do if/when we take an overflow interrupt while in EFI.

I don't think special-casing perf interrupts is the right thing to do here.
If we're concerned about user-accesses being made off the back of interrupts
taken whilst in EFI, then we should probably either swizzle back in the
user page table on the IRQ path or postpone handling it until we're done
with the firmware. Having a flag feels a bit weird: would the uaccess
routines return -EFAULT if it's set?

Will
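
To make the closing question concrete, here is a minimal user-space sketch of
what the flag-based approach might look like if the uaccess routines did return
-EFAULT while the flag is set. This is not kernel code: cpu_in_efi_mm,
model_efi_enter(), model_efi_exit() and model_copy_from_user() are hypothetical
names invented for illustration. A real implementation would presumably need
per-CPU state and would have to cope with interrupts, both of which this model
ignores.

```c
/* User-space model only: none of these names exist in the kernel. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: true while this "CPU" runs on the EFI page tables. */
static bool cpu_in_efi_mm;

/* Hypothetical stand-ins for switching to and from the EFI mm. */
static void model_efi_enter(void) { cpu_in_efi_mm = true; }
static void model_efi_exit(void)  { cpu_in_efi_mm = false; }

/*
 * Hypothetical stand-in for a uaccess routine. While the flag is set the
 * user page tables are not installed, so the access is refused with
 * -EFAULT instead of being performed against the wrong address space.
 */
static int model_copy_from_user(void *dst, const void *src, size_t len)
{
	if (cpu_in_efi_mm)
		return -EFAULT;

	memcpy(dst, src, len);
	return 0;
}
```

In this model a caller that runs between model_efi_enter() and
model_efi_exit() simply sees the copy fail and has to handle that itself,
which is the behaviour being asked about. The alternatives raised in the mail,
restoring the user page table on the IRQ path or deferring the handler until
the firmware call returns, would avoid surfacing the failure to the caller.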