Re: [PATCH 3/3] KVM: x86: Check for injected exceptions before queuing a debug exception
From: Yosry Ahmed
Date: Fri Feb 27 2026 - 12:36:18 EST
On Fri, Feb 27, 2026 at 8:34 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, Feb 27, 2026, Sean Christopherson wrote:
> > So instead of patch 1, I want to try either (a) blocking KVM_SET_VCPU_EVENTS,
> > KVM_X86_SET_MCE, and KVM_SET_GUEST_DEBUG if nested_run_pending=1, *and* follow-up
> > with the below WARN-spree, or (b) add a separate flag, e.g. nested_run_in_progress
> > or so, that is set with nested_run_pending, but cleared on an exit to userspace,
> > and then WARN on _that_, i.e. so that we can detect KVM bugs (the whole point of
> > the WARN) and hopefully stop playing this losing game of whack-a-mole with syzkaller.
I like the idea of the WARN there, although something in the back of
my mind tells me I went through this code before and found an
exception that could be injected with nested_run_pending=1, but I
can't remember which one. Maybe it was injected by userspace, in
which case all is good.
That being said, I hate nested_run_in_progress. It's too close to
nested_run_pending and I am pretty sure they will be mixed up.
exception_from_userspace's name made me think it was something we
could key the WARN off of, but it's actually meant to morph exceptions
queued from userspace into an "exception_vmexit" if needed. The field
name is generic but its functionality isn't; maybe it should have been
called exception_check_vmexit or something. Anyway...
That gave me an idea, though: can we add a field to
kvm_queued_exception to identify the origin of the exception
(userspace vs. KVM)? Then we can key the WARN off of that.
Alternatively, we could avoid adding the field entirely and just plumb
the origin through as an argument to kvm_multiple_exception(), and
WARN there if nested_run_pending is set and the origin is not
userspace.
> >
> > I think I'm leaning toward (b)? Except for KVM_SET_GUEST_DEBUG, where userspace
> > is trying to interpose on the guest, restricting ioctls doesn't really add any
> > value in practice. Yeah, in theory it could _maybe_ prevent userspace from shooting
> > itself in the foot, but practically speaking, if userspace is restoring state into
> > a vCPU with nested_run_pending=1, it's either playing on expert mode or is already
> > completely broken.
> >
> > My only hesitation with (b) is that KVM wouldn't be entirely consistent, since
> > vmx_unhandleable_emulation_required() _does_ explicitly reject a "userspace did
> > something stupid with nested_run_pending=1" case. So from that perspective, part
> > of me wants to get greedy and try for (a).
>
> On second (fifth?) thought, I don't think (a) is a good idea. In addition to
> potentially breaking userspace, it also risks preventing genuinely useful sequences.
> E.g. even if no VMM does so today, it's entirely plausible that a VMM could want
> to asynchronously inject an #MC to mimic a broadcast, and that the injection could
> collide with a pending nested VM-Enter.
>
> I'll send a separate (maybe RFC?) series for (b) using patch 1 as a starting point.
> I want to fiddle around with some ideas, and it'll be faster to sketch things out
> in code versus trying to describe things in text.
So you'll apply patch 3 as-is, drop patch 2, and (potentially) take
patch 1 and build another series on top of it?