Re: CONFIG_KCOV causing crash in svm_vcpu_run()

From: Dmitry Vyukov
Date: Wed Feb 27 2019 - 03:21:56 EST


On Wed, Feb 27, 2019 at 8:14 AM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> Reviving this old thread because it wasn't fully fixed after all...
>
> On Tue, May 15, 2018 at 07:33:48AM +0200, Dmitry Vyukov wrote:
> > On Mon, May 14, 2018 at 7:25 PM, Eric Biggers <ebiggers3@xxxxxxxxx> wrote:
> > > On Mon, May 14, 2018 at 07:14:41AM +0200, Dmitry Vyukov wrote:
> > >> On Mon, May 14, 2018 at 5:02 AM, Eric Biggers <ebiggers3@xxxxxxxxx> wrote:
> > >> > Sorry, messed up address for KVM mailing list. See message below.
> > >> >
> > >> > On Sun, May 13, 2018 at 08:00:07PM -0700, Eric Biggers wrote:
> > >> >> With CONFIG_KCOV=y and an AMD processor, running the following program crashes
> > >> >> the kernel with no output (I'm testing in a VM, so it's using nested
> > >> >> virtualization):
> > >> >>
> > >> >> #include <fcntl.h>
> > >> >> #include <linux/kvm.h>
> > >> >> #include <sys/ioctl.h>
> > >> >>
> > >> >> int main()
> > >> >> {
> > >> >> int dev, vm, cpu;
> > >> >> char page[4096] __attribute__((aligned(4096))) = { 0 };
> > >> >> struct kvm_userspace_memory_region memreg = {
> > >> >> .memory_size = 4096,
> > >> >> .userspace_addr = (unsigned long)page,
> > >> >> };
> > >> >> dev = open("/dev/kvm", O_RDONLY);
> > >> >> vm = ioctl(dev, KVM_CREATE_VM, 0);
> > >> >> cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
> > >> >> ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
> > >> >> ioctl(cpu, KVM_RUN, 0);
> > >> >> }
> > >> >>
> > >> >> It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
> > >> >> MSR_IA32_SPEC_CTRL"). The bug is apparently that due to the new code for
> > >> >> managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
> > >> >> svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
> > >> >> crash somehow. The following patch fixes it, though I don't know that it's the
> > >> >> right solution; maybe KCOV should be disabled in the function instead, or maybe
> > >> >> there's a more fundamental problem. What do people think?
> > >>
> > >>
> > >> If __sanitizer_cov_trace_pc() crashes, I would expect there must be
> > >> few more of them here:
> > >>
> > >> if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
> > >> svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
> > >>
> > >> if (svm->spec_ctrl)
> > >> native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> > >>
> > >> Compiler inserts these callbacks into every basic block/edge.. Aren't there?
> > >>
> > >> Unfortunately we don't have an attribute that disables instrumentation
> > >> of a single function. This is currently possible only on file level.
> > >>
> > >
> > > Yes, due to the code dealing with MSR_IA32_SPEC_CTRL, there were several calls
> > > to __sanitizer_cov_trace_pc() before the write to MSR_GS_BASE. The patch I
> > > tested moves the write to MSR_GS_BASE to before all of them, so it's once again
> > > the first thing after the asm block. Again I'm not sure it's the proper
> > > solution, but it did make it stop crashing.
> >
> > From KCOV perspective:
> > This is definitely the simplest and less intrusive solution.
> > It's somewhat unreliable. But it's hard to tell if/when it will
> > actually break in practice. Compiler can decide to insert the callback
> > after asm block, or a branch can be added to wrmsrl (e.g. under some
> > debug config). More reliable solution would be to restore registers in
> > asm block itself, or move this to a separate file and disable
> > instrumentation of that file (though, will not save from non-inlined
> > wrmsrl). But again, the proposed solution may work well for the next
> > 10 years, so additional complexity may not worth it.
> >
> > Btw, I don't see anything about fs/gs in vmx_vcpu_run. Is it VMLAUNCH
> > that saves/restores them?
>
> So it turns out there *is* a branch in wrmsrl() when CONFIG_PARAVIRT=y &&
> CONFIG_PARAVIRT_DEBUG=y, and that causes the same crash: the compiler inserts a
> call to __sanitizer_cov_trace_pc() prior to the GS_BASE register being restored
> in svm_vcpu_run().
>
> #ifdef CONFIG_PARAVIRT_DEBUG
> #define PVOP_TEST_NULL(op) BUG_ON(pv_ops.op == NULL)
> #else
> #define PVOP_TEST_NULL(op) ((void)pv_ops.op)
> #endif
>
> Dmitry, in the long run maybe this should be solved by adding a function
> attribute to gcc that disables coverage for a function?
>
> But for now maybe CONFIG_KCOV should depend on !CONFIG_PARAVIRT_DEBUG? Or does
> anyone have a better idea? Alternatively as Dmitry suggested, svm_vcpu_run()
> could be moved to a separate file and compiled with different flags...


Re attribute. This is only place this come up so far. No precedents in
user-space and kernel is also quite extensive tested as well. Though
we bulk ignore some files involved in early boot, maybe the attribute
would be useful in some of these files too. E.g.:
https://bugzilla.kernel.org/show_bug.cgi?id=198443
We could add it, but adding an attribute to both compilers is not
something completely trivial and will take time to propagate to users.
I would go with some solution local to SVM for now.

It would be nice to restore segment registers right in the asm block
so that C code always have proper normal context for execution (e.g.
can affect KASAN/KMSAN/KTSAN too). But with paravirt it's not trivial,
right? At least not as trivial as executing WRMSR.