Re: [PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with cr3
From: Sai Praneeth Prakhya
Date: Wed Aug 23 2017 - 18:57:21 EST
On Mon, 2017-08-21 at 08:23 -0700, Andy Lutomirski wrote:
>
> > On Aug 21, 2017, at 7:08 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> >> On Mon, Aug 21, 2017 at 06:56:01AM -0700, Andy Lutomirski wrote:
> >>
> >>
> >>> On Aug 21, 2017, at 3:33 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> >>>>
> >>>> Using a kernel thread solves the problem for real. Anything that
> >>>> blindly accesses user memory in kernel thread context is terminally
> >>>> broken no matter what.
> >>>
> >>> So perf-callchain doesn't do it 'blindly', it wants either:
> >>>
> >>> - user_mode(regs) true, or
> >>> - task_pt_regs() set.
> >>>
> >>> However I'm thinking that if the kernel thread has ->mm == &efi_mm, the
> >>> EFI code running could very well have user_mode(regs) being true.
> >>>
> >>> intel_pmu_pebs_fixup() OTOH 'blindly' assumes that the LBR addresses are
> >>> accessible. It bails on error though. So while its careful, it does
> >>> attempt to access the 'user' mapping directly. Which should also trigger
> >>> with the EFI code.
> >>>
> >>> And I'm not seeing anything particularly broken with either. The PEBS
> >>> fixup relies on the CPU having just executed the code, and if it could
> >>> fetch and execute the code, why shouldn't it be able to fetch and read?
> >>
> >> There are two ways this could be a problem. One is that unprivileged
> >> user apps shouldn't be able to read from EFI memory.
> >
> > Ah, but only root can create per-cpu events or attach events to kernel
> > threads (with sensible paranoia levels).
>
> But this may not need to be percpu. If a non root user can trigger, say, an EFI variable read in their own thread context, boom.
>
+ Tony
Hi Andi,
I am trying to reproduce the issue that we are discussing, so I tried
the following experiment:
A user process continuously reads an EFI variable by running
"cat /sys/firmware/efi/efivars/Boot0000-8be4df61-93ca-11d2-aa0d-00e098032b8c"
for a specified time (e.g. 100 seconds), and simultaneously I run
"perf top" as root (which I suppose should trigger NMIs). Everything is
fine: no lockups, no kernel crash, no warnings/errors in dmesg.
I see that perf top reports 50% of the time is spent in an EFI function
(probably efi_get_variable()):

  Overhead  Shared Object  Symbol
  50%       [unknown]      [k] 0xfffffffeea967416

50% is the maximum; on average it's 35%.
I have tested this on two kernels, v4.12 and v3.19. My machine has 8
cores and, to stress test, I also offlined all CPUs except cpu0.
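For anyone who wants to repeat the reader side of this experiment, here is a minimal sketch of the loop, assuming that each open and read of the efivarfs file ends up in the firmware's GetVariable service. The function name read_repeatedly is made up for illustration, and the path is simply the one quoted above; it exists only on EFI systems that have this boot entry.

```python
import time

# Path taken from the experiment described above (assumption: the machine
# booted via EFI and has a Boot0000 entry; adjust for your system).
EFIVAR = "/sys/firmware/efi/efivars/Boot0000-8be4df61-93ca-11d2-aa0d-00e098032b8c"


def read_repeatedly(path, seconds):
    """Re-open and read `path` in a tight loop for roughly `seconds` seconds.

    Hypothetical helper: it only mimics running "cat" in a loop.
    Returns the number of completed reads.
    """
    deadline = time.monotonic() + seconds
    reads = 0
    while time.monotonic() < deadline:
        # Re-open on every iteration so that each pass is a fresh read,
        # as a new "cat" invocation would be.
        with open(path, "rb") as f:
            f.read()
        reads += 1
    return reads
```

Calling read_repeatedly(EFIVAR, 100) in one terminal while "perf top" runs as root in another corresponds to the run described above.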
Could you please let me know a way to reproduce the issue that we are
discussing here?
I think the issue we are concerned about here is this: when the kernel
is in EFI context and an NMI happens, if the NMI handler tries to access
user space, boom! We don't have user space in EFI context. Am I right in
understanding the issue, or is it something else?
Regards,
Sai