Re: [PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with cr3

From: Andy Lutomirski
Date: Fri Aug 25 2017 - 11:13:56 EST


On Wed, Aug 23, 2017 at 3:52 PM, Sai Praneeth Prakhya
<sai.praneeth.prakhya@xxxxxxxxx> wrote:
> On Mon, 2017-08-21 at 08:23 -0700, Andy Lutomirski wrote:
>>
>> > On Aug 21, 2017, at 7:08 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> >
>> >> On Mon, Aug 21, 2017 at 06:56:01AM -0700, Andy Lutomirski wrote:
>> >>
>> >>
>> >>> On Aug 21, 2017, at 3:33 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> >
>> >>>>
>> >>>> Using a kernel thread solves the problem for real. Anything that
>> >>>> blindly accesses user memory in kernel thread context is terminally
>> >>>> broken no matter what.
>> >>>
>> >>> So perf-callchain doesn't do it 'blindly', it wants either:
>> >>>
>> >>> - user_mode(regs) true, or
>> >>> - task_pt_regs() set.
>> >>>
>> >>> However I'm thinking that if the kernel thread has ->mm == &efi_mm, the
>> >>> EFI code running could very well have user_mode(regs) being true.
>> >>>
>> >>> intel_pmu_pebs_fixup() OTOH 'blindly' assumes that the LBR addresses are
>> >>> accessible. It bails on error though. So while it's careful, it does
>> >>> attempt to access the 'user' mapping directly. Which should also trigger
>> >>> with the EFI code.
>> >>>
>> >>> And I'm not seeing anything particularly broken with either. The PEBS
>> >>> fixup relies on the CPU having just executed the code, and if it could
>> >>> fetch and execute the code, why shouldn't it be able to fetch and read?
>> >>
>> >> There are two ways this could be a problem. One is that unprivileged
>> >> user apps shouldn't be able to read from EFI memory.
>> >
>> > Ah, but only root can create per-cpu events or attach events to kernel
>> > threads (with sensible paranoia levels).
>>
>> But this may not need to be percpu. If a non-root user can trigger, say,
>> an EFI variable read in their own thread context, boom.
>>
> + Tony
>
> Hi Andy,
>
> I am trying to reproduce the issue that we are discussing and hence
> tried an experiment like this:
> A user process continuously reads an EFI variable with
>   cat /sys/firmware/efi/efivars/Boot0000-8be4df61-93ca-11d2-aa0d-00e098032b8c
> for a specified time (e.g. 100 seconds), and simultaneously I ran
> "perf top" as root (which I suppose should trigger NMIs). I see that
> everything is fine: no lockups, no kernel crash, no warnings/errors in
> dmesg.
>
> I see that perf top reports 50% of time is spent in efi function
> (probably efi_get_variable()).
>   Overhead  Shared Object  Symbol
>   50%       [unknown]      [k] 0xfffffffeea967416
>
> 50% is max, on avg it's 35%.
>
> I have tested this on two kernels v4.12 and v3.19. My machine has 8
> cores and to stress test, I further offlined all cpus except cpu0.
>
> Could you please let me know a way to reproduce the issue that we are
> discussing here?
> I think the issue we are concerned about here is that, when the kernel
> is in EFI context and an NMI happens, and the NMI handler tries to
> access user space, boom! We don't have user space in EFI context. Am I
> right in understanding the issue, or is it something else?

The boom isn't a crash, though -- it'll be (potentially) sensitive
information that shows up in the perf record.

As long as EFI isn't using low addresses, there may not be an issue.
But EFI should (maybe) use low addresses, and then this will matter
more.
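
For reference, the read loop from the experiment quoted above could be
sketched roughly as below. This is only an illustration, not what was
actually run: the helper name, its signature and the defaults are made
up here, and it assumes that each read of an efivarfs file ends up in
the GetVariable() runtime service.

```python
import time

# Path taken from the quoted experiment; using it as a default is an assumption.
EFI_VAR = ("/sys/firmware/efi/efivars/"
           "Boot0000-8be4df61-93ca-11d2-aa0d-00e098032b8c")


def hammer_efi_variable(path=EFI_VAR, seconds=100.0):
    """Hypothetical helper: re-read `path` until `seconds` have elapsed.

    Returns the number of completed reads. At least one read is always done.
    """
    deadline = time.monotonic() + seconds
    reads = 0
    while True:
        # Reopen the file on every pass so that each iteration is a fresh
        # read of the variable rather than a continued read of one handle.
        with open(path, "rb") as f:
            f.read()
        reads += 1
        if time.monotonic() >= deadline:
            break
    return reads
```

To get at the case discussed earlier in the thread, the sampling would
presumably have to be attached to this process by the same unprivileged
user, with callchains enabled (something like "perf record -g" on it),
rather than "perf top" as root, and the thing to look for is what ends
up in the recorded samples rather than a crash.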