Re: [RFC PATCH 2/3] sdei: enable dbg in '_sdei_handler'
From: James Morse
Date: Wed Apr 24 2019 - 12:21:56 EST
Hi Xiongfeng Wang,
On 12/04/2019 13:04, Xiongfeng Wang wrote:
> When we monitor a sdei_event callback using 'kprobe', the singlestep
> handler can not be called because dbg is masked in sdei_handler.
For a good reason: the debug hardware may be in use single-stepping the kernel, or worse:
in use by a KVM guest.
The DAIF flags are all set because none of these things are safe for an
SDEI-event/NMI-handler. We haven't done KVM's guest exit work yet, so things like the
debug hardware belong to the guest.
Its not safe to take an exception from NMI context, avoid it at all costs!
> This patch enable dbg in '_sdei_handler'.
This is very bad as the debug hardware may have been in use by something else. A malicious
guest can now cause you to take breakpoint/watchpoint exceptions.
> When SDEI events interrupt the userspace, 'vbar_el1' contains
> 'tramp_vectors' if CONFIG_UNMAP_KERNEL_AT_EL0 is enabled. So we need to
> restore 'vbar_el1' with kernel vectors, otherwise we will go to the
> wrong place when brk exception or dbg exception happens.
This is the tip of the iceberg. You may have been partway through KVM's world-switch,
you'd need to temporarily save/restore the host context. This ends up being a
parody-world-switch, which has to be kept in-sync with KVM. We didn't go this way because
its fragile and unmaintainable.
> SDEI events may interrupt 'kernel_entry' before we save 'spsr_el1' and
> 'elr_el1', and dbg exception will corrupts these two registers. So we
> also need to save and restore these two registers.
(or don't do anything that would cause them to get clobbered)
> If SDEI events interrupt an instruction being singlestepped, the
> instruction in '_sdei_handler' will begin to be singlestepped after we
> enable dbg. So we need to disable singlestep in the beginning of
> _sdei_handler if we find out we interrupt a singlestep process.
And now the arch code's do_debug_exception() is re-rentrant, which it doesn't expect.
The kprobes core code can't be kprobed, but it can be interrupted by NMI. If you can
kprobe the NMI, you've made the kprobes core re-entrant too. A quick look shows
raw_spin_lock() that could deadlock, and it passes values around with a per-cpu variable.
You could interrupt any part of the krpobes machinery with an SDEI event, take a kprobe,
then take a critical SDEI event, and a third kprobe.
I don't see how its possible to make this safe without re-writing kprobes to be NMI safe.
> diff --git a/arch/arm64/kernel/sdei.c b/arch/arm64/kernel/sdei.c
> index ea94cf8..9dd9cf6 100644
> --- a/arch/arm64/kernel/sdei.c
> +++ b/arch/arm64/kernel/sdei.c
> - if (elr != read_sysreg(elr_el1)) {
> - /*
> - * We took a synchronous exception from the SDEI handler.
> - * This could deadlock, and if you interrupt KVM it will
> - * hyp-panic instead.
> - */
> - pr_warn("unsafe: exception during handler\n");
> - }
This is here to catch a WARN_ON() firing and not killing the system. Its not safe to take
an exception from the NMI handler.
Why do you need to kprobe an NMI handler?
Thanks,
James