On 8/20/20 10:55 AM, Andy Lutomirski wrote:
On Thu, Aug 20, 2020 at 8:21 AM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
On 8/20/20 10:10 AM, Sean Christopherson wrote:
On Wed, Aug 19, 2020 at 05:21:33PM -0700, Andy Lutomirski wrote:
On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
On 8/19/20 1:07 PM, Tom Lendacky wrote:
It looks like the FSGSBASE support is crashing my second generation EPYC
system. I was able to bisect it to:
b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
The panic only happens when using KVM. Doing kernel builds or stress
on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
guest and do a kernel build within the guest, I get the following:
I should clarify that this panic is on the bare-metal system, not in the
guest. And that specifying nofsgsbase on the bare-metal command line fixes
the issue.
I certainly see some oddities:
We have this code:
static void svm_vcpu_put(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);
	int i;

	avic_vcpu_put(vcpu);

	++vcpu->stat.host_state_reload;
	kvm_load_ldt(svm->host.ldt);
#ifdef CONFIG_X86_64
	loadsegment(fs, svm->host.fs);
	wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
Pretty sure current->thread.gsbase can be stale, i.e. this needs:
current_save_fsgs();
I did try adding current_save_fsgs() in svm_vcpu_load(), saving the
current->thread.gsbase value to a new variable in the svm struct. I then
used that variable in the wrmsrl below, but it still crashed.
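The staleness concern raised above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name prefixed with model_ is hypothetical and not a kernel API, the values are arbitrary, and the "guest clobbers the register" step is a simplification. It shows why restoring from a cached gsbase can write back an old value once FSGSBASE lets user space change the base directly. It does not claim to reproduce or explain the crash reported in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_cpu {
	uint64_t kernel_gs_base;	/* stands in for MSR_KERNEL_GS_BASE */
};

struct model_thread {
	uint64_t gsbase;		/* stands in for current->thread.gsbase */
};

/*
 * With FSGSBASE enabled, user space can change its GS base directly,
 * so the hardware value moves without the cached copy being updated.
 */
static void model_user_wrgsbase(struct model_cpu *cpu, uint64_t base)
{
	cpu->kernel_gs_base = base;
}

/* Models the role of current_save_fsgs(): refresh the cache from hardware. */
static void model_save_fsgs(const struct model_cpu *cpu, struct model_thread *t)
{
	t->gsbase = cpu->kernel_gs_base;
}

/* Models the wrmsrl() in svm_vcpu_put(): restore from the cached copy. */
static void model_vcpu_put(struct model_cpu *cpu, const struct model_thread *t)
{
	cpu->kernel_gs_base = t->gsbase;
}

/* Returns the GS base in place after a guest run, with or without a refresh. */
uint64_t model_restored_gs_base(bool refresh_before_guest)
{
	struct model_cpu cpu = { .kernel_gs_base = 0x1000 };
	struct model_thread thread = { .gsbase = 0x1000 };

	model_user_wrgsbase(&cpu, 0x2000);	/* cached copy is now stale */
	if (refresh_before_guest)
		model_save_fsgs(&cpu, &thread);
	cpu.kernel_gs_base = 0xdead;		/* simplification: guest state replaces it */
	model_vcpu_put(&cpu, &thread);
	return cpu.kernel_gs_base;
}
```

In this model, model_restored_gs_base(false) hands back the old base that user space had already replaced, while model_restored_gs_base(true) hands back the value user space actually set.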
Can you try bisecting all the way back to:
commit dd649bd0b3aa012740059b1ba31ecad28a408f7f
Author: Andy Lutomirski <luto@xxxxxxxxxx>
Date:   Thu May 28 16:13:48 2020 -0400

    x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE

and adding the unsafe_fsgsbase command line option while you bisect.
I'll give that a try.
Also, you're crashing when you run a guest, right? Can you try
running the x86 selftests on a bad kernel without running any guests?
Right, when the guest is running. The guest boots fine and only when I
put some stress on it (kernel build) does it cause the issue. It might be
worth trying to pin all the vCPUs and see if the crash still happens.
I'll give that a try.
Thanks,
Tom
--Andy