FSGSBASE causing panic on 5.9-rc1

From: Tom Lendacky
Date: Wed Aug 19 2020 - 14:07:28 EST


It looks like the FSGSBASE support is crashing my second-generation EPYC
system. I was able to bisect it to:

b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")

The panic only happens when using KVM. Doing kernel builds or running
stress on bare metal appears fine. But if I fire up, in this case, a
64-vCPU guest and do a kernel build within the guest, I get the following:

[ 120.360637] BUG: scheduling while atomic: qemu-system-x86/5485/0x00110000
[ 124.041646] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: x86_pmu_handle_irq+0x163/0x170
[ 124.041647] ------------[ cut here ]------------
[ 124.041649] Hardware name: AMD
[ 124.041649] Workqueue: 0x0 (events)
[ 124.041651] Call Trace:
[ 124.041651] ------------[ cut here ]------------
[ 124.041652] corrupted preempt_count: kworker/22:1/1449/0x110000
[ 124.051267] WARNING: CPU: 22 PID: 1449 at kernel/sched/core.c:3595 finish_task_switch+0x289/0x290
[ 124.051268] Modules linked in: tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc fuse amd64_edac_mod edac_mce_amd wmi_bmof kvm_amd kvm irqbypass sg ipmi_ssif ccp k10temp acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq squashfs loop sch_fq_codel parport_pc ppdev lp parport ip_tables raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ast drm_vram_helper drm_ttm_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect ahci sysimgblt libahci fb_sys_fops libata drm e1000e i2c_piix4 wmi i2c_designware_platform i2c_designware_core pinctrl_amd i2c_core
[ 124.051285] CPU: 22 PID: 1449 Comm: kworker/22:1 Tainted: G W 5.9.0-rc1-sos-linux #1
[ 124.051286] Hardware name: AMD
[ 124.051286] Workqueue: 0x0 (events)
[ 124.051287] RIP: 0010:finish_task_switch+0x289/0x290
[ 124.051288] Code: ff 65 48 8b 04 25 c0 7b 01 00 8b 90 a8 08 00 00 48 8d b0 b0 0a 00 00 48 c7 c7 20 10 10 86 c6 05 be aa 55 01 01 e8 89 03 fd ff <0f> 0b e9 6b ff ff ff 55 48 89 e5 41 55 41 54 49 89 fc 53 48 89 f3
[ 124.051288] RSP: 0018:ffffc9001afe7e10 EFLAGS: 00010082
[ 124.051289] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000023
[ 124.051290] RDX: 0000000000000023 RSI: ffffffff86101044 RDI: ffff88900d798bb0
[ 124.051290] RBP: ffffc9001afe7e38 R08: ffff88900d798ba8 R09: 0000000000000005
[ 124.051290] R10: 000000000000000f R11: ffff88900d798d54 R12: ffff88900d7aacc0
[ 124.051291] R13: ffff889bd2308000 R14: 0000000000000000 R15: ffff88900d7aacc0
[ 124.051291] FS: 0000000000000000(0000) GS:ffff88900d780000(0000) knlGS:0000000000000000
[ 124.051292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 124.051292] CR2: 00007ff607620000 CR3: 0000001bcb0d2000 CR4: 0000000000350ee0
[ 124.051293] Call Trace:
[ 124.051293] __schedule+0x348/0x810
[ 124.051293] ? dbs_work_handler+0x47/0x60
[ 124.051294] schedule+0x4a/0xb0
[ 124.051294] worker_thread+0xcf/0x3b0
[ 124.051294] ? process_one_work+0x370/0x370
[ 124.051294] kthread+0xfe/0x140
[ 124.051295] ? kthread_park+0x90/0x90
[ 124.051295] ret_from_fork+0x22/0x30
[ 124.051295] ---[ end trace 7f77ee8ad05caa89 ]---
[ 124.051296] Kernel Offset: disabled
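
For anyone reading the preempt_count values in the log above (0x00110000
and 0x110000), here is a minimal sketch of how the fields can be pulled
apart. The bit widths are an assumption taken from my reading of
include/linux/preempt.h around v5.9 (8 preempt, 8 softirq, 4 hardirq,
4 NMI) and should be checked against the tree being debugged; the function
name decode_preempt_count is made up for illustration.

```python
# Field layout is an ASSUMPTION based on include/linux/preempt.h (~v5.9).
# Verify the widths against the kernel tree you are actually running.
FIELDS = (
    ("preempt", 8),   # bits 0-7:   preempt_disable() nesting depth
    ("softirq", 8),   # bits 8-15:  softirq count
    ("hardirq", 4),   # bits 16-19: hardirq nesting
    ("nmi", 4),       # bits 20-23: NMI nesting
)


def decode_preempt_count(value):
    """Split a raw preempt_count value into its per-context counters."""
    decoded = {}
    shift = 0
    for name, bits in FIELDS:
        decoded[name] = (value >> shift) & ((1 << bits) - 1)
        shift += bits
    return decoded


# Value reported by both "scheduling while atomic" and
# "corrupted preempt_count" in the log.
print(decode_preempt_count(0x00110000))
```

Under that assumed layout the value decodes to a hardirq count of 1 and an
NMI count of 1, with the preempt and softirq fields at zero.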

Specifying nofsgsbase on the kernel command line avoids the issue. The
problem reproduces reliably, so I can easily test any fixes.
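
When comparing runs with and without the workaround, it can help to
confirm which configuration a given boot actually used. The following is
a rough sketch, not a definitive check: it assumes the "nofsgsbase"
parameter appears as its own word in /proc/cmdline and that the CPU
feature shows up as "fsgsbase" in the flags line of /proc/cpuinfo (which I
would expect to be absent once the feature is cleared). The function names
are hypothetical.

```python
def fsgsbase_status(cmdline_text, cpuinfo_text):
    """Report whether nofsgsbase was passed and whether the flag is listed."""
    # The boot parameter is expected as a standalone word.
    disabled = "nofsgsbase" in cmdline_text.split()

    # Collect the feature flags from the first "flags" line only;
    # this sketch assumes all CPUs report the same set.
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
            break

    return {
        "nofsgsbase_on_cmdline": disabled,
        "fsgsbase_flag": "fsgsbase" in flags,
    }


def read_live_status():
    """Convenience wrapper that reads the running system's procfs files."""
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    with open("/proc/cpuinfo") as f:
        cpuinfo = f.read()
    return fsgsbase_status(cmdline, cpuinfo)
```

Calling read_live_status() on each boot under test and recording the result
next to the test outcome keeps the two configurations from being mixed up.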

Thanks,
Tom