Re: [PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with cr3

From: Sai Praneeth Prakhya
Date: Tue Aug 15 2017 - 20:28:00 EST


On Tue, 2017-08-15 at 14:46 -0700, Andy Lutomirski wrote:
> On Tue, Aug 15, 2017 at 12:18 PM, Sai Praneeth Prakhya
> <sai.praneeth.prakhya@xxxxxxxxx> wrote:
> > +/*
> > + * Makes the calling kernel thread switch to/from efi_mm context
> > + * Can be used from SetVirtualAddressMap() or during efi runtime calls
> > + * (Note: This routine is heavily inspired from use_mm)
> > + */
> > +void efi_switch_mm(struct mm_struct *mm)
> > +{
> > + struct task_struct *tsk = current;
> > +
> > + task_lock(tsk);
> > + efi_scratch.prev_mm = tsk->active_mm;
> > + if (efi_scratch.prev_mm != mm) {
> > + mmgrab(mm);
> > + tsk->active_mm = mm;
> > + }
> > + switch_mm(efi_scratch.prev_mm, mm, NULL);
> > + task_unlock(tsk);
> > +
> > + if (efi_scratch.prev_mm != mm)
> > + mmdrop(efi_scratch.prev_mm);
>

Thanks for the quick review Andy,

> I'm confused. You're mmdropping an mm that you are still keeping a
> pointer to. This is also a bit confusing in the case where you do
> efi_switch_mm(efi_scratch.prev_mm).
>

This makes sense, I will look into it.

> This whole manipulation seems fairly dangerous to me for another
> reason -- you're taking a user thread (I think) and swapping out its
> mm to something that the user in question should *not* have access to.

We are switching to efi_mm from user mm_struct because
EFI_RUNTIME_SERVICES like efi_set_variable()/efi_get_variable() are
accessible only through efi_pgd. The user thread calls ioctl() which in
turn calls efi_call() and thus efi_switch_mm(). So, I think, the user
still does not have direct access to EFI_RUNTIME_SERVICES memory regions
but accesses them through sys call.

> What if a perf interrupt happens while you're in the alternate mm?

Since we are disabling/enabling interrupts around switching, I think we
are safe. We do these in following functions
phys_efi_set_virtual_address_map()
efi_thunk_set_virtual_address_map()
efi_call_virt_pointer()

> What if you segfault and dump core?

We could seg fault only if firmware touches regions which it shouldn't.
i.e. Firmware touching regions outside EFI_RUNTIME_SERVICES (this is a
UEFI Spec violation). So, in this case of buggy firmware, we panic (this
is an existing problem). We also map EFI_BOOT_TIME_SERVICES into efi_pgd
because we know some buggy firmware touches these regions.

> Should we maybe just have a flag
> that says "this cpu is using a funny mm", assert that the flag is
> clear when scheduling, and teach perf, coredumps, etc not to touch
> user memory when the flag is set?
>
> Admittedly, the latter problem may well have existed even before these patches.

Please let me know if you think otherwise.

Matt,
Please feel free to correct my understanding.

Regards,
Sai