Re: [PATCH v2] x86/sev: Fix host kdump support for SNP

From: Ashish Kalra
Date: Mon Sep 09 2024 - 19:34:36 EST


Hello Sean,

On 9/4/2024 5:23 PM, Sean Christopherson wrote:
>> On Wed, Sep 04, 2024, Ashish Kalra wrote:
>>> On 9/4/2024 2:54 PM, Michael Roth wrote:
>>>> - Sean inquired about making the target kdump kernel more agnostic to
>>>> whether or not SNP_SHUTDOWN was done properly, since that might
>>>> allow for capturing state even for edge cases where we can't go
>>>> through the normal cleanup path. I mentioned we'd tried this to some
>>>> degree but hit issues with the IOMMU, and when working around that
>>>> there was another issue but I don't quite recall the specifics.
>>>> Can you post a quick recap of what the issues are with that approach
>>>> so we can determine whether or not this is still an option?
>>>
>>> Yes, i believe without SNP_SHUTDOWN, early_enable_iommus() configure the
>>> IOMMUs into an IRQ remapping configuration causing the crash in
>>> io_apic.c::check_timer().
>>>
>>> It looks like in this case, we enable IRQ remapping configuration *earlier*
>>> than when it needs to be enabled and which causes the panic as indicated:
>>>
>>> EMERGENCY [ 1.376701] Kernel panic - not syncing: timer doesn't work
>>> through Interrupt-remapped IO-APIC
>>
>> I assume the problem is that IOMMU setup fails in the kdump kernel, not that it
>> does the setup earlier. That's that part I want to understand.

>Here is a deeper understanding of this issue:

>It looks like this is what happens: when we do SNP_SHUTDOWN without IOMMU_SNP_SHUTDOWN during panic, the kdump kernel boots with IOMMU SNP
>enforcement still enabled, and the IOMMU completion wait buffers (CWB) are still locked with exclusivity still set up on them. The
>kdump kernel then allocates new IOMMU completion wait buffers and tries to use them, but we get an IOMMU command completion wait timeout:
>because the previous kernel's completion wait buffers are still locked in, the newly allocated buffers are not used for IOMMU
>command execution and completion indication:

>[ 1.711588] AMD-Vi: early_amd_iommu_init: irq remaping enabled
>[ 1.718972] AMD-Vi: in early_enable_iommus
>[ 1.723543] AMD-Vi: Translation is already enabled - trying to copy translation structures
>[ 1.733333] AMD-Vi: Copied DEV table from previous kernel.
>[ 1.739566] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.11.0-rc6-next-20240903-snp-host-f2a41ff576cc+ #78
>[ 1.750920] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM100AB 10/17/2022
>[ 1.759950] Call Trace:
>[ 1.762677] <TASK>
>[ 1.765018] dump_stack_lvl+0x70/0x90
>[ 1.769109] dump_stack+0x14/0x20
>[ 1.772809] iommu_completion_wait.part.0.isra.0+0x38/0x140
>[ 1.779035] amd_iommu_flush_all_caches+0xa3/0x240
>[ 1.784383] ? memcpy_toio+0x25/0xc0
>[ 1.788372] early_enable_iommus+0x151/0x880
>[ 1.793140] state_next+0xe67/0x22b0
>[ 1.797130] ? __raw_callee_save___native_queued_spin_unlock+0x19/0x30
>[ 1.804421] amd_iommu_enable+0x24/0x60
>[ 1.808702] irq_remapping_enable+0x1f/0x50
>[ 1.813371] enable_IR_x2apic+0x155/0x260
>[ 1.817848] x86_64_probe_apic+0x13/0x70
>[ 1.822226] apic_intr_mode_init+0x39/0xf0
>[ 1.826799] x86_late_time_init+0x28/0x40
>[ 1.831266] start_kernel+0x6ad/0xb50
>[ 1.835436] x86_64_start_reservations+0x1c/0x30
>[ 1.840591] x86_64_start_kernel+0xbf/0x110
>[ 1.845256] ? setup_ghcb+0x12/0x130
>[ 1.849247] common_startup_64+0x13e/0x141
>[ 1.853821] </TASK>
>[ 2.077901] AMD-Vi: Completion-Wait loop timed out
>...

>Because of this, the IOMMU command, which in this case is issued while enabling IRQ remapping, does not succeed, and that eventually
>causes the timer to fail, as IRQ remapping support is not enabled.

>Once IOMMU SNP support is enabled, the IOMMU completion wait buffers are set up as read-only with exclusivity set on them in order to
>enforce the RMP restrictions. In addition, the IOMMU registers used to mark exclusivity on the store addresses associated with
>these CWBs are locked. This SNP enforcement in the IOMMU is only disabled by passing the IOMMU_SNP_SHUTDOWN parameter to the
>SNP_SHUTDOWN_EX command.

>From the AMD IOMMU specifications:

>2.12.2.2 SEV-SNP COMPLETION_WAIT Store Restrictions
>On systems that are SNP-enabled, the store address associated with any host
>COMPLETION_WAIT command (s=1) is restricted. The Store Address must fall within the address range specified by the Completion Store
>Base and Completion Store Limit registers. When the system is SNP-enabled, the memory within this range will be marked in the RMP
>using a special immutable state by the PSP. This memory region will be readable by the CPU but not writable.

>2.12.2.3 SEV-SNP Exclusion Range Restrictions
>The exclusion range feature is not supported on systems that are SNP-enabled.
>Additionally, the Exclusion Base and Exclusion Range Limit registers are re-purposed to act as the Completion Store Base and Limit
>registers.

>Therefore, we need to disable IOMMU SNP enforcement with the SNP_SHUTDOWN_EX command before the kdump kernel starts booting: we can't
>set up the IOMMU CWB again in kdump, because the SEV-SNP exclusion base and range limit registers remain locked while IOMMU SNP support
>is still enabled.
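
For readers following along, the store restriction quoted from section 2.12.2.2 can be pictured with a small userspace sketch. This is only a toy model under stated assumptions: `struct cwb_window` and `cwb_store_allowed()` are hypothetical names, not driver or specification identifiers, and treating the limit as inclusive is an assumption of the model.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the quoted rule: on an SNP-enabled system, the store address
 * of a COMPLETION_WAIT command (s=1) must fall within the range given by the
 * Completion Store Base and Completion Store Limit registers.
 * All names here are hypothetical and exist only for illustration.
 */
struct cwb_window {
	uint64_t base;		/* Completion Store Base */
	uint64_t limit;		/* Completion Store Limit (inclusive in this model) */
	bool snp_enabled;	/* restriction applies only when SNP is enabled */
};

/* Return true if a completion store to store_addr would be permitted. */
static bool cwb_store_allowed(const struct cwb_window *w, uint64_t store_addr)
{
	if (!w->snp_enabled)
		return true;	/* no restriction is modeled without SNP */

	return store_addr >= w->base && store_addr <= w->limit;
}
```

In this model, a buffer freshly allocated by the kdump kernel lies outside the window that the previous kernel locked in, so `cwb_store_allowed()` returns false for it until SNP enforcement is torn down.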

>I tried to use the previous kernel's CWB (cmd_sem) as below:

>static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>{
>	if (!is_kdump_kernel())
>		iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
>	else {
>		if (check_feature(FEATURE_SNP)) {
>			u64 cwwb_sem_paddr;
>
>			cwwb_sem_paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET);
>			iommu->cmd_sem = iommu_phys_to_virt(cwwb_sem_paddr);
>			return iommu->cmd_sem ? 0 : -ENOMEM;
>		}
>	}
>
>	return iommu->cmd_sem ? 0 : -ENOMEM;
>}

>This fails, I believe because the kdump kernel does not have the previous kernel's IOMMU CWBs in its kernel
>direct map:

>[ 1.708959] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.714327] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100805000, cmd_sem_vaddr 0xffff9f5340805000
>[ 1.726309] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.731676] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050051000, cmd_sem_vaddr 0xffff9f6290051000
>[ 1.743742] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.749109] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050052000, cmd_sem_vaddr 0xffff9f6290052000
>[ 1.761177] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.766542] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100808000, cmd_sem_vaddr 0xffff9f5340808000
>[ 1.778509] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.783877] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050053000, cmd_sem_vaddr 0xffff9f6290053000
>[ 1.795942] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.801300] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100809000, cmd_sem_vaddr 0xffff9f5340809000
>[ 1.813268] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.818636] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050054000, cmd_sem_vaddr 0xffff9f6290054000
>[ 1.830701] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[ 1.836069] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x10080a000, cmd_sem_vaddr 0xffff9f534080a000
>[ 1.848039] AMD-Vi: early_amd_iommu_init: irq remaping enabled
>[ 1.855431] AMD-Vi: in early_enable_iommus
>[ 1.860032] AMD-Vi: Translation is already enabled - trying to copy translation structures
>[ 1.869812] AMD-Vi: Copied DEV table from previous kernel.
>[ 1.875958] AMD-Vi: in build_completion_wait, paddr = 0x100805000
>[ 1.882766] BUG: unable to handle page fault for address: ffff9f5340805000
>[ 1.890441] #PF: supervisor read access in kernel mode
>[ 1.896177] #PF: error_code(0x0000) - not-present page

>....

>I think that memremap(..,..,MEMREMAP_WB) will also fail for the same reason, as memremap(.., MEMREMAP_WB) on a RAM region will
>again use the kernel directmap.

To follow up on this:

I am able to use memremap() to map the previous kernel's CWBs and try to reuse them in the
kdump kernel. Obviously, memremap() does not return a pointer into the kernel directmap here, as the previous kernel's CWBs
are at RAM addresses that are not mapped in the kdump kernel's directmap.

These memremap() mappings seem to be correct: if I do a memset() to 0 on them, I get an RMP #PF violation because
these buffers are set up as RO in the RMP table, which suggests that memremap() has done the mapping correctly.

I am getting inconsistent IOMMU command completion wait timeouts with these reused CWBs (which are used as
semaphores to indicate IOMMU command completions), and I am still debugging those issues.
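
To illustrate the semaphore role of the CWB mentioned above, here is a toy userspace sketch of a completion wait that polls a semaphore with a bounded loop. It is not the driver code: `struct toy_iommu`, `toy_completion_wait()` and `TOY_LOOP_TIMEOUT` are hypothetical, and the assumption that the device silently drops a store aimed outside the locked buffer is a modeling choice, since the thread does not say what the hardware does with such a store.

```c
#include <stdint.h>

#define TOY_LOOP_TIMEOUT 1000	/* hypothetical poll budget */

/* Hypothetical stand-in for an IOMMU whose completion store is locked. */
struct toy_iommu {
	volatile uint64_t *locked_sem;	/* buffer locked in by the previous kernel */
};

/*
 * Model of the device side of COMPLETION_WAIT: the store only lands if the
 * requested address is the locked buffer. Dropping other stores is an
 * assumption of this sketch, not documented hardware behavior.
 */
static void toy_device_complete(struct toy_iommu *dev,
				volatile uint64_t *store_addr, uint64_t seq)
{
	if (store_addr == dev->locked_sem)
		*store_addr = seq;
}

/* Issue the command, then poll the semaphore until it shows seq or we give up. */
static int toy_completion_wait(struct toy_iommu *dev,
			       volatile uint64_t *sem, uint64_t seq)
{
	int i;

	toy_device_complete(dev, sem, seq);

	for (i = 0; i < TOY_LOOP_TIMEOUT; i++) {
		if (*sem == seq)
			return 0;	/* completion observed */
	}

	return -1;			/* models "Completion-Wait loop timed out" */
}
```

With a newly allocated semaphore the model always times out, while waiting on the locked buffer succeeds. The inconsistent timeouts described above are not reproduced by this sketch, which only shows the polling pattern.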

Thanks,
Ashish