Re: [PATCH v2] x86/sev: Fix host kdump support for SNP
From: Ashish Kalra
Date: Thu Sep 12 2024 - 18:18:39 EST
Hello Sean,
On 9/4/2024 5:23 PM, Sean Christopherson wrote:
>> On Wed, Sep 04, 2024, Ashish Kalra wrote:
>>> On 9/4/2024 2:54 PM, Michael Roth wrote:
>>>> - Sean inquired about making the target kdump kernel more agnostic to
>>>> whether or not SNP_SHUTDOWN was done properly, since that might
>>>> allow for capturing state even for edge cases where we can't go
>>>> through the normal cleanup path. I mentioned we'd tried this to some
>>>> degree but hit issues with the IOMMU, and when working around that
>>>> there was another issue but I don't quite recall the specifics.
>>>> Can you post a quick recap of what the issues are with that approach
>>>> so we can determine whether or not this is still an option?
>>>
>>> Yes, I believe without SNP_SHUTDOWN, early_enable_iommus() configures the
>>> IOMMUs into an IRQ remapping configuration causing the crash in
>>> io_apic.c::check_timer().
>>>
>>> It looks like in this case, we enable IRQ remapping configuration *earlier*
>>> than it needs to be enabled, which causes the panic as indicated:
>>>
>>> EMERGENCY [ 1.376701] Kernel panic - not syncing: timer doesn't work
>>> through Interrupt-remapped IO-APIC
>>
>> I assume the problem is that IOMMU setup fails in the kdump kernel, not that it
>> does the setup earlier. That's the part I want to understand.
>Here is a more detailed analysis of this issue:
>When we do SNP_SHUTDOWN without IOMMU_SNP_SHUTDOWN during panic, the kdump kernel boots with IOMMU SNP enforcement still
>enabled, and the IOMMU completion wait buffers (CWB) are still locked with exclusivity still set up on them. The kdump kernel
>then allocates new IOMMU completion wait buffers and tries to use them, but IOMMU commands time out waiting for completion:
>because the previous kernel's completion wait buffers are locked in, the newly allocated buffers are not used for IOMMU
>command execution and completion indication:
>[ 1.711588] AMD-Vi: early_amd_iommu_init: irq remaping enabled
>[ 1.718972] AMD-Vi: in early_enable_iommus
>[ 1.723543] AMD-Vi: Translation is already enabled - trying to copy translation structures
>[ 1.733333] AMD-Vi: Copied DEV table from previous kernel.
>[ 1.739566] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.11.0-rc6-next-20240903-snp-host-f2a41ff576cc+ #78
>[ 1.750920] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM100AB 10/17/2022
>[ 1.759950] Call Trace:
>[ 1.762677] <TASK>
>[ 1.765018] dump_stack_lvl+0x70/0x90
>[ 1.769109] dump_stack+0x14/0x20
>[ 1.772809] iommu_completion_wait.part.0.isra.0+0x38/0x140
>[ 1.779035] amd_iommu_flush_all_caches+0xa3/0x240
>[ 1.784383] ? memcpy_toio+0x25/0xc0
>[ 1.788372] early_enable_iommus+0x151/0x880
>[ 1.793140] state_next+0xe67/0x22b0
>[ 1.797130] ? __raw_callee_save___native_queued_spin_unlock+0x19/0x30
>[ 1.804421] amd_iommu_enable+0x24/0x60
>[ 1.808702] irq_remapping_enable+0x1f/0x50
>[ 1.813371] enable_IR_x2apic+0x155/0x260
>[ 1.817848] x86_64_probe_apic+0x13/0x70
>[ 1.822226] apic_intr_mode_init+0x39/0xf0
>[ 1.826799] x86_late_time_init+0x28/0x40
>[ 1.831266] start_kernel+0x6ad/0xb50
>[ 1.835436] x86_64_start_reservations+0x1c/0x30
>[ 1.840591] x86_64_start_kernel+0xbf/0x110
>[ 1.845256] ? setup_ghcb+0x12/0x130
>[ 1.849247] common_startup_64+0x13e/0x141
>[ 1.853821] </TASK>
>[ 2.077901] AMD-Vi: Completion-Wait loop timed out
>...
>Because of this, the IOMMU command (in this case, the one that enables IRQ remapping) does not succeed, and the timer
>eventually fails because IRQ remapping support was never enabled.
>Once IOMMU SNP support is enabled, to enforce the RMP the IOMMU completion wait buffers are set up as read-only with
>exclusivity set on them, and the IOMMU registers used to mark exclusivity on the store addresses associated with
>these CWBs are also locked. This SNP enforcement in the IOMMU is only disabled by the IOMMU_SNP_SHUTDOWN parameter of the
>SNP_SHUTDOWN_EX command.
>From the AMD IOMMU specifications:
>2.12.2.2 SEV-SNP COMPLETION_WAIT Store Restrictions
>On systems that are SNP-enabled, the store address associated with any host COMPLETION_WAIT command (s=1) is restricted.
>The Store Address must fall within the address range specified by the Completion Store Base and Completion Store Limit
>registers. When the system is SNP-enabled, the memory within this range will be marked in the RMP using a special immutable
>state by the PSP. This memory region will be readable by the CPU but not writable.
>2.12.2.3 SEV-SNP Exclusion Range Restrictions
>The exclusion range feature is not supported on systems that are SNP-enabled. Additionally, the Exclusion Base and
>Exclusion Range Limit registers are re-purposed to act as the Completion Store Base and Limit registers.
>Therefore, we need to disable IOMMU SNP enforcement with the SNP_SHUTDOWN_EX command before the kdump kernel starts booting:
>we can't set up the IOMMU CWB again in kdump, because the SEV-SNP exclusion base and range limit registers stay locked while
>IOMMU SNP support is enabled.
>I tried to use the previous kernel's CWB (cmd_sem) as below:
>static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>{
>	if (!is_kdump_kernel())
>		iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
>	else {
>		if (check_feature(FEATURE_SNP)) {
>			u64 cwwb_sem_paddr;
>
>			cwwb_sem_paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET);
>			iommu->cmd_sem = iommu_phys_to_virt(cwwb_sem_paddr);
>			return iommu->cmd_sem ? 0 : -ENOMEM;
>		}
>	}
>
>	return iommu->cmd_sem ? 0 : -ENOMEM;
>}
>This fails because, I believe, the kdump kernel does not have the previous kernel's IOMMU CWB pages in its kernel
>direct map:
>[ 1.708959] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.714327] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100805000, cmd_sem_vaddr 0xffff9f5340805000
>[ 1.726309] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.731676] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050051000, cmd_sem_vaddr 0xffff9f6290051000
>[ 1.743742] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.749109] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050052000, cmd_sem_vaddr 0xffff9f6290052000
>[ 1.761177] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.766542] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100808000, cmd_sem_vaddr 0xffff9f5340808000
>[ 1.778509] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.783877] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050053000, cmd_sem_vaddr 0xffff9f6290053000
>[ 1.795942] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.801300] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100809000, cmd_sem_vaddr 0xffff9f5340809000
>[ 1.813268] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.818636] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050054000, cmd_sem_vaddr 0xffff9f6290054000
>[ 1.830701] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.836069] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x10080a000, cmd_sem_vaddr 0xffff9f534080a000
>[ 1.848039] AMD-Vi: early_amd_iommu_init: irq remaping enabled
>[ 1.855431] AMD-Vi: in early_enable_iommus
>[ 1.860032] AMD-Vi: Translation is already enabled - trying to copy translation structures
>[ 1.869812] AMD-Vi: Copied DEV table from previous kernel.
>[ 1.875958] AMD-Vi: in build_completion_wait, paddr = 0x100805000
>[ 1.882766] BUG: unable to handle page fault for address: ffff9f5340805000
>[ 1.890441] #PF: supervisor read access in kernel mode
>[ 1.896177] #PF: error_code(0x0000) - not-present page
>....
>I think that memremap(.., .., MEMREMAP_WB) will also fail for the same reason, as memremap(.., MEMREMAP_WB) of a RAM region will
>again use the kernel direct map.
More follow-up on this:
I fixed crashkernel/kdump boot with IOMMU SNP support still enabled prior to kdump boot by reusing the previous kernel's pages
for the IOMMU completion wait buffers, command buffer and device table, and memremapping them during kdump boot. With this change
in the IOMMU driver, kdump boots and is able to complete saving the core image.
With this IOMMU driver fix, there is no need to do SNP_DECOMMISSION during panic(), and the kdump kernel boot will be more
agnostic to whether or not SNP_SHUTDOWN is done properly (or even done at all), i.e., it works even with active SNP guests.
As mentioned earlier, since SNP is not shut down and IOMMU SNP support is still enabled prior to kdump boot, all the MMIO
registers listed in the AMD IOMMU spec (below) are locked:
2.12.2.1 SEV-SNP Register Locks
The following IOMMU registers become locked and are no longer writeable after the system
becomes SNP-enabled:
- Device Table Base Address Register [MMIO Offset 0000h]
- Command Buffer Base Address Register [MMIO Offset 0008h]
- Event Log Base Address Register [MMIO Offset 0010h]
- IOMMU Control Register [MMIO Offset 0018h] fields:
  - MMIO Offset 0018h[IOMMUEn]
  - MMIO Offset 0018h[DevTblSegEn]
- IOMMU Exclusion Base Register / Completion Store Base Register [MMIO Offset 0020h]
- IOMMU Exclusion Range Limit Register / Completion Store Limit Register [MMIO Offset 0028h]
- PPR Log Base Address Register [MMIO Offset 0038h]
- Guest Virtual APIC Log Base Address Register [MMIO Offset 00E0h]
- Guest Virtual APIC Log Tail Address Register [MMIO Offset 00E8h]
- PPR Log B Base Address Register [MMIO Offset 00F0h]
- Event Log B Base Address Register [MMIO Offset 00F8h]
- Device Table Segment n Base Address Register
As the Device Table Base Address Register, Command Buffer Base Address Register, Completion Store Base Register and Completion
Store Limit Register are locked, writes to them are ignored. They don’t cause any errors, but because the writes are ignored these
registers are not updated with the new allocations for the device table, command buffer and CWB during IOMMU driver init in the
kdump boot, and those are required to initialize the IOMMU and enable IRQ remapping support in the kdump kernel.
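As a rough illustration of the silently ignored writes, here is a small userspace model. It is not driver code: struct
model_reg and the model_* helpers are names made up for this example, and the "register" is just a variable. It only shows
that a write to a locked register returns without error while a read-back still yields the old value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of one IOMMU MMIO base register. */
struct model_reg {
	uint64_t value;	/* current register contents */
	bool locked;	/* true once the system is SNP-enabled */
};

/* A write to a locked register is dropped without reporting an error. */
static void model_writeq(struct model_reg *reg, uint64_t val)
{
	if (reg->locked)
		return;
	reg->value = val;
}

static uint64_t model_readq(const struct model_reg *reg)
{
	return reg->value;
}

/*
 * Try to program a new base address and read it back.
 * Returns true only if the register now holds new_base.
 */
static bool model_program_base(struct model_reg *reg, uint64_t new_base)
{
	model_writeq(reg, new_base);
	return model_readq(reg) == new_base;
}
```

In the kdump case the read-back would still return the previous kernel's address, so a freshly allocated buffer never
becomes the one the hardware uses.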
Therefore, we reuse the previous kernel’s pages for the CWB, command buffer and device table and memremap them during
kdump boot, essentially working with an already enabled SNP configuration and reusing the previous kernel’s data structures.
So this will now be an IOMMU driver change, and we can skip SNP_DECOMMISSION; this should work in all situations,
irrespective of whether SNP_SHUTDOWN was done prior to kdump boot.
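The resulting decision can be sketched in plain C as below. This is only an assumed outline of the logic, not the actual
patch: struct buf_choice and choose_buffer() are hypothetical names, and the physical address is taken as an already
decoded input rather than read from real MMIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: which physical page backs an IOMMU buffer. */
struct buf_choice {
	uint64_t paddr;	/* physical address the driver has to work with */
	bool reused;	/* true: previous kernel's page, to be remapped, not allocated */
};

/*
 * In a kdump boot with the SNP register locks still in place, the only
 * usable page is the one the locked register already points to. In every
 * other case a fresh allocation can be programmed as usual.
 */
static struct buf_choice choose_buffer(bool kdump, bool snp_locked,
				       uint64_t reg_paddr, uint64_t fresh_paddr)
{
	struct buf_choice c;

	if (kdump && snp_locked) {
		c.paddr = reg_paddr;
		c.reused = true;
	} else {
		c.paddr = fresh_paddr;
		c.reused = false;
	}
	return c;
}
```

The same choice would apply to each of the three structures mentioned above (CWB, command buffer, device table), each
keyed off its own locked base register.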
Thanks,
Ashish