Re: [PATCH V5 0/3] Use efi_rts_wq to invoke EFI Runtime Services
From: Bhupesh Sharma
Date: Fri Jun 01 2018 - 15:01:49 EST
Hi Sai, Ard,
On Tue, May 29, 2018 at 4:39 PM, Ard Biesheuvel
> On 29 May 2018 at 04:21, Sai Praneeth Prakhya
> <sai.praneeth.prakhya@xxxxxxxxx> wrote:
>> From: Sai Praneeth <sai.praneeth.prakhya@xxxxxxxxx>
>> Problem statement:
>> Presently, efi_runtime_services() silently switch %cr3 from swapper_pgd
>> to efi_pgd. As a consequence, kernel code that runs in efi_pgd (e.g.,
>> perf code via an NMI) will have incorrect user space mappings. This
>> could lead to otherwise unexpected access errors and, worse, unauthorized
>> access to firmware code and data.
>> Detailed discussion of problem statement:
>> As this switch is not propagated to other kernel subsystems; they will
>> wrongly assume that swapper_pgd is still in use and it can lead to
>> following issues:
>> 1. If kernel code tries to access user space addresses while in efi_pgd,
>> it could lead to unauthorized accesses to firmware code/data.
>> (e.g: <__>/copy_from_user_nmi()).
>> [This could also be disastrous if the frame pointer happens to point at
>> MMIO in the EFI runtime mappings] - Mark Rutland.
>> An example of a subsystem that could touch user space while in efi_pgd is
>> perf. Assume that we are in efi_pgd, a user could use perf to profile
>> some user data and depending on the address the user is trying to
>> profile, two things could happen.
>> 1. If the mappings are absent, perf fails to profile.
>> 2. If efi_pgd does have mappings for the requested address (these
>> mappings are erroneous), perf profiles firmware code/data. If the
>> address is MMIO'ed, perf could have potentially changed some device state.
>> The culprit in both the cases is, EFI subsystem swapping out pgd and not
>> perf. Because, EFI subsystem has broken the *general assumption* that
>> all other subsystems rely on - "user space might be valid and nobody has
>> switched %cr3".
>> There are two ways to fix this issue:
>> 1. Educate about pgd change to *all* the subsystems that could
>> potentially access user space while in efi_pgd.
>> On x86, AFAIK, it could happen only when some one touches user space
>> from the back of an NMI (a quick audit on <__>/copy_from_user_nmi,
>> showed perf and oprofile). On arm, it could happen from multiple
>> places as arm runs efi_runtime_services() interrupts enabled (ARM folks,
>> please comment on this as I might be wrong); whereas x86 runs
>> efi_runtime_services() interrupts disabled.
>> I think, this solution isn't holistic because
>> a. Any other subsystem might well do the same, if not now, in future.
>> b. Also, this solution looks simpler on x86 but not true if it's the
>> same for arm (ARM folks, please comment on this as I might be wrong).
>> c. This solution looks like a work around rather than addressing the issue.
>> 2. Running efi_runtime_services() in kthread context.
>> This makes sense because efi_pgd doesn't have user space and kthread
>> by definition means that user space is not valid. Any kernel code that
>> tries to touch user space while in kthread is buggy in itself. If so,
>> it should be an easy fix in the other subsystem. This also take us one
>> step closer to long awaiting proposal of Andy - Running EFI at CPL 3.
>> What does this patch set do?
>> Introduce efi_rts_wq (EFI runtime services work queue).
>> When a user process requests the kernel to execute any efi_runtime_service(),
>> kernel queues the work to efi_rts_wq, a kthread comes along, switches to
>> efi_pgd and executes efi_runtime_service() in kthread context. IOW, this
>> patch set adds support to the EFI subsystem to handle all calls to
>> efi_runtime_services() using a work queue (which in turn uses kthread).
>> How running efi_runtime_services() in kthread fixes above discussed issues?
>> If we run efi_runtime_services() in kthread context and if perf
>> checks for it, we could get both the above scenarios correct by perf
>> aborting the profiling. Not only perf, but any subsystem that tries to
>> touch user space should first check for kthread context and if so,
>> should abort.
>> Q. If we still need check for kthread context in other subsystems that
>> access user space, what does this patch set fix?
>> A. This patch set makes sure that EFI subsystem is not at fault.
>> Without this patch set the blame is upon EFI subsystem, because it's the
>> one that changed pgd and hasn't communicated this change to everyone and
>> hence broke the general assumption. Running efi_runtime_services() in
>> kthread means explicitly communicating that user space is invalid, now
>> it's the responsibility of other subsystem to make sure that it's
>> running in right context.
>> Tested using LUV (Linux UEFI Validation) for x86_64, x86_32 and arm64
>> (qemu only). Will appreciate the effort if someone could test the
>> patches on real ARM/ARM64 machines.
I would give the latest v5 a try on my ARM64 qualcomm board as well.
WIll get back with the test results soon.
>> LUV: https://01.org/linux-uefi-validation
>> Thanks to Ricardo, Dan, Miguel, Mark, Peter Z and Ard for reviews and
>> suggestions. Thanks to Boris and Andy for making me think through/help
>> on what I am addressing with this patch set.
>> Please feel free to pour in your comments and concerns.
>> Patches are based on Linus's kernel v4.17-rc7
>>  Backup: Detailing efi_pgd:
>> efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
>> Code/Data) regions. Due to the nature of these mappings, they fall
>> in user space address ranges and they are not the same as swapper.
>> [On arm64, the EFI mappings are in the VA range usually used for user
>> space. The two halves of the address space are managed by separate
>> tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
>> user space or EFI runtime mappings in TTBR0.] - Mark Rutland
>> Changes from V4 to V5:
>> 1. As suggested by Ard, don't use efi_rts_wq for non-blocking variants.
>> Non-blocking variants are supposed to not block and using workqueue
>> exactly does the opposite, hence refrain from using it.
>> 2. Use non-blocking variants in efi_delete_dummy_variable(). Use of
>> blocking variants means that we have to call efi_delete_dummy_variable()
>> after efi_rts_wq has been created.
>> 3. Remove in_atomic() check in set_variable<>() and query_variable_info<>().
>> Any caller wishing to use set_variable() and query_variable_info() in
>> atomic context should use their non-blocking variants.
>> Changes from V3 to V4:
>> 1. As suggested by Peter, use completions instead of flush_work() as the
>> former is cheaper
>> 2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
>> wasn't able to find a better alternative to keep this change local to
>> Changes from V2 to V3:
>> 1. Rewrite the cover letter to clearly state the problem. What we are
>> fixing and what we are not fixing.
>> 2. Make efi_delete_dummy_variable() change local to x86.
>> 3. Avoid using BUG(), instead, print error message and exit gracefully.
>> 4. Move struct efi_runtime_work to runtime-wrappers.c file.
>> 5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
>> 6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.
>> Changes from V1 to V2:
>> 1. Remove unnecessary include of asm/efi.h file - Fixes build error on
>> ia64, reported by 0-day
>> 2. Use enum to identify efi_runtime_services()
>> 3. Use alloc_ordered_workqueue() to create efi_rts_wq as
>> create_workqueue() is scheduled for depreciation.
>> 4. Make efi_call_rts() static, as it has no callers outside
>> 5. Use BUG(), when we are unable to queue work or unable to identify
>> requested efi_runtime_service() - Because these two situations should
>> *never* happen.
>> Sai Praneeth (3):
>> x86/efi: Make efi_delete_dummy_variable() use
>> set_variable_nonblocking() instead of set_variable()
>> efi: Create efi_rts_wq and efi_queue_work() to invoke all
>> efi: Use efi_rts_wq to invoke EFI Runtime Services
> This version looks good to me, and works as expected on SynQuacer (arm64).
> Tested-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> I will give others the opportunity to review this code before queuing it though.
>> arch/x86/platform/efi/quirks.c | 11 +-
>> drivers/firmware/efi/efi.c | 14 ++
>> drivers/firmware/efi/runtime-wrappers.c | 218 +++++++++++++++++++++++++++++---
>> include/linux/efi.h | 3 +
>> 4 files changed, 224 insertions(+), 22 deletions(-)
>> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@xxxxxxxxx>
>> Suggested-by: Andy Lutomirski <luto@xxxxxxxxxx>
>> Cc: Lee Chun-Yi <jlee@xxxxxxxx>
>> Cc: Borislav Petkov <bp@xxxxxxxxx>
>> Cc: Tony Luck <tony.luck@xxxxxxxxx>
>> Cc: Will Deacon <will.deacon@xxxxxxx>
>> Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
>> Cc: Mark Rutland <mark.rutland@xxxxxxx>
>> Cc: Bhupesh Sharma <bhsharma@xxxxxxxxxx>
>> Cc: Naresh Bhat <naresh.bhat@xxxxxxxxxx>
>> Cc: Ricardo Neri <ricardo.neri@xxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Ravi Shankar <ravi.v.shankar@xxxxxxxxx>
>> Cc: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
>> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
>> Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
>> Cc: Miguel Ojeda <miguel.ojeda.sandonis@xxxxxxxxx>