Re: [PATCH v2] x86/sev: Fix host kdump support for SNP

From: Ashish Kalra
Date: Fri Sep 06 2024 - 16:28:57 EST


Hello Sean,

On 9/4/2024 5:23 PM, Sean Christopherson wrote:
> On Wed, Sep 04, 2024, Ashish Kalra wrote:
>> On 9/4/2024 2:54 PM, Michael Roth wrote:
>>> - Sean inquired about making the target kdump kernel more agnostic to
>>> whether or not SNP_SHUTDOWN was done properly, since that might
>>> allow for capturing state even for edge cases where we can't go
>>> through the normal cleanup path. I mentioned we'd tried this to some
>>> degree but hit issues with the IOMMU, and when working around that
>>> there was another issue but I don't quite recall the specifics.
>>> Can you post a quick recap of what the issues are with that approach
>>> so we can determine whether or not this is still an option?
>>
>> Yes, i believe without SNP_SHUTDOWN, early_enable_iommus() configure the
>> IOMMUs into an IRQ remapping configuration causing the crash in
>> io_apic.c::check_timer().
>>
>> It looks like in this case, we enable IRQ remapping configuration *earlier*
>> than when it needs to be enabled and which causes the panic as indicated:
>>
>> EMERGENCY [ 1.376701] Kernel panic - not syncing: timer doesn't work
>> through Interrupt-remapped IO-APIC
>
> I assume the problem is that IOMMU setup fails in the kdump kernel, not that it
> does the setup earlier. That's that part I want to understand.

Here is a more detailed analysis of this issue:

When we do SNP_SHUTDOWN without IOMMU_SNP_SHUTDOWN during panic, the kdump kernel boots with IOMMU SNP enforcement still
enabled, so the previous kernel's IOMMU completion wait buffers (CWB) are still locked and still have exclusivity set on them.
The kdump kernel then allocates new IOMMU completion wait buffers and tries to use them, but because the previous buffers are
still locked in, the newly allocated buffers are not used for IOMMU command execution and completion indication, and we get
an IOMMU command completion-wait timeout:

[ 1.711588] AMD-Vi: early_amd_iommu_init: irq remaping enabled
[ 1.718972] AMD-Vi: in early_enable_iommus
[ 1.723543] AMD-Vi: Translation is already enabled - trying to copy translation structures
[ 1.733333] AMD-Vi: Copied DEV table from previous kernel.
[ 1.739566] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.11.0-rc6-next-20240903-snp-host-f2a41ff576cc+ #78
[ 1.750920] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM100AB 10/17/2022
[ 1.759950] Call Trace:
[ 1.762677] <TASK>
[ 1.765018] dump_stack_lvl+0x70/0x90
[ 1.769109] dump_stack+0x14/0x20
[ 1.772809] iommu_completion_wait.part.0.isra.0+0x38/0x140
[ 1.779035] amd_iommu_flush_all_caches+0xa3/0x240
[ 1.784383] ? memcpy_toio+0x25/0xc0
[ 1.788372] early_enable_iommus+0x151/0x880
[ 1.793140] state_next+0xe67/0x22b0
[ 1.797130] ? __raw_callee_save___native_queued_spin_unlock+0x19/0x30
[ 1.804421] amd_iommu_enable+0x24/0x60
[ 1.808702] irq_remapping_enable+0x1f/0x50
[ 1.813371] enable_IR_x2apic+0x155/0x260
[ 1.817848] x86_64_probe_apic+0x13/0x70
[ 1.822226] apic_intr_mode_init+0x39/0xf0
[ 1.826799] x86_late_time_init+0x28/0x40
[ 1.831266] start_kernel+0x6ad/0xb50
[ 1.835436] x86_64_start_reservations+0x1c/0x30
[ 1.840591] x86_64_start_kernel+0xbf/0x110
[ 1.845256] ? setup_ghcb+0x12/0x130
[ 1.849247] common_startup_64+0x13e/0x141
[ 1.853821] </TASK>
[ 2.077901] AMD-Vi: Completion-Wait loop timed out
...

Because of this, the IOMMU command, which in this case is issued while enabling IRQ remapping, does not succeed, and that
eventually causes the timer to fail because IRQ remapping support is not properly enabled.
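
To make the timeout above concrete, here is a minimal user-space sketch of a completion-wait poll. It is only a model, not
the driver code: model_wait_on_sem() and MODEL_LOOP_TIMEOUT are hypothetical names, and the iteration budget is arbitrary.
The point is that if the hardware never stores to the semaphore the kernel is polling, the loop can only end by timing out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_LOOP_TIMEOUT 1000	/* hypothetical iteration budget */

/*
 * Poll the semaphore until it holds `expected` or the budget runs out.
 * Returns 0 on completion and -1 on timeout.
 */
static int model_wait_on_sem(const volatile uint64_t *sem, uint64_t expected)
{
	int i;

	assert(sem != NULL);

	for (i = 0; i < MODEL_LOOP_TIMEOUT; i++) {
		if (*sem == expected)
			return 0;
	}

	/* Nothing ever wrote `expected` into the buffer being polled. */
	return -1;
}
```

In this model, a semaphore that is never written corresponds to the newly allocated CWB in the kdump kernel, and the -1
return corresponds to the "Completion-Wait loop timed out" message.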

Once IOMMU SNP support is enabled, RMP enforcement requires that the IOMMU completion wait buffers are set up as read-only with
exclusivity set on them. Additionally, the IOMMU registers used to mark exclusivity on the store addresses associated with
these CWBs are locked. This SNP enforcement in the IOMMU is only disabled by passing the IOMMU_SNP_SHUTDOWN parameter to the
SNP_SHUTDOWN_EX command.

From the AMD IOMMU specification:

2.12.2.2 SEV-SNP COMPLETION_WAIT Store Restrictions

On systems that are SNP-enabled, the store address associated with any host COMPLETION_WAIT command (s=1) is restricted. The
Store Address must fall within the address range specified by the Completion Store Base and Completion Store Limit registers.
When the system is SNP-enabled, the memory within this range will be marked in the RMP using a special immutable state by the
PSP. This memory region will be readable by the CPU but not writable.

2.12.2.3 SEV-SNP Exclusion Range Restrictions

The exclusion range feature is not supported on systems that are SNP-enabled. Additionally, the Exclusion Base and Exclusion
Range Limit registers are re-purposed to act as the Completion Store Base and Limit registers.

Therefore, we need to disable IOMMU SNP enforcement with the SNP_SHUTDOWN_EX command before the kdump kernel starts booting:
we can't set up the IOMMU CWB again in the kdump kernel, because the SEV-SNP Exclusion Base and Exclusion Range Limit registers
remain locked while IOMMU SNP support is still enabled.
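
As a rough illustration of the store restriction quoted from 2.12.2.2, the check can be modelled as below. This is a sketch
under assumptions, not hardware or driver behaviour: the struct and function names are hypothetical, and treating the limit
as inclusive is my assumption for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the Completion Store Base/Limit register pair. */
struct model_completion_store_range {
	uint64_t base;	/* Completion Store Base */
	uint64_t limit;	/* Completion Store Limit (inclusive in this model) */
};

/*
 * Return true if a COMPLETION_WAIT store to `store_addr` is permitted.
 * Without SNP enforcement the range restriction does not apply.
 */
static bool model_store_addr_allowed(const struct model_completion_store_range *r,
				     uint64_t store_addr, bool snp_enabled)
{
	assert(r->base <= r->limit);

	if (!snp_enabled)
		return true;

	return store_addr >= r->base && store_addr <= r->limit;
}
```

With the range still locked to the previous kernel's buffer, a freshly allocated CWB in the kdump kernel falls outside
[base, limit] and is rejected by this model, while the same address is accepted once SNP enforcement is off.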

I tried to use the previous kernel's CWB (cmd_sem) as below:

static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
{
	if (!is_kdump_kernel())
		iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
	else {
		if (check_feature(FEATURE_SNP)) {
			u64 cwwb_sem_paddr;

			cwwb_sem_paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET);
			iommu->cmd_sem = iommu_phys_to_virt(cwwb_sem_paddr);
			return iommu->cmd_sem ? 0 : -ENOMEM;
		}
	}

	return iommu->cmd_sem ? 0 : -ENOMEM;
}

I tried this, but it fails. I believe this is because the IOMMU CWBs allocated by the previous kernel are not present in the
kdump kernel's direct map:

[ 1.708959] AMD-Vi: in alloc_cwwb_sem kdump kernel
[ 1.714327] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100805000, cmd_sem_vaddr 0xffff9f5340805000
[ 1.726309] AMD-Vi: in alloc_cwwb_sem kdump kernel
[ 1.731676] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050051000, cmd_sem_vaddr 0xffff9f6290051000
[ 1.743742] AMD-Vi: in alloc_cwwb_sem kdump kernel
[ 1.749109] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050052000, cmd_sem_vaddr 0xffff9f6290052000
[ 1.761177] AMD-Vi: in alloc_cwwb_sem kdump kernel
[ 1.766542] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100808000, cmd_sem_vaddr 0xffff9f5340808000
[ 1.778509] AMD-Vi: in alloc_cwwb_sem kdump kernel
[ 1.783877] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050053000, cmd_sem_vaddr 0xffff9f6290053000
[ 1.795942] AMD-Vi: in alloc_cwwb_sem kdump kernel
[ 1.801300] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100809000, cmd_sem_vaddr 0xffff9f5340809000
[ 1.813268] AMD-Vi: in alloc_cwwb_sem kdump kernel
[ 1.818636] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050054000, cmd_sem_vaddr 0xffff9f6290054000
[ 1.830701] AMD-Vi: in alloc_cwwb_sem kdump kernel
[ 1.836069] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x10080a000, cmd_sem_vaddr 0xffff9f534080a000
[ 1.848039] AMD-Vi: early_amd_iommu_init: irq remaping enabled
[ 1.855431] AMD-Vi: in early_enable_iommus
[ 1.860032] AMD-Vi: Translation is already enabled - trying to copy translation structures
[ 1.869812] AMD-Vi: Copied DEV table from previous kernel.
[ 1.875958] AMD-Vi: in build_completion_wait, paddr = 0x100805000
[ 1.882766] BUG: unable to handle page fault for address: ffff9f5340805000
[ 1.890441] #PF: supervisor read access in kernel mode
[ 1.896177] #PF: error_code(0x0000) - not-present page

....

I think that memremap(..., MEMREMAP_WB) will also fail for the same reason, as memremap() with MEMREMAP_WB on a RAM region
will again use the kernel direct map.
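
A small user-space sketch of why I believe the translated address faults: a direct-map style translation is pure arithmetic
and does not guarantee that a mapping exists. The names below are hypothetical and the mapped range is made up for
illustration; the page offset used in the example is only inferred from the paddr/vaddr pairs in the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical range [start, end) that the model treats as mapped. */
struct model_mapped_range {
	uint64_t start;
	uint64_t end;
};

/* Pure arithmetic, like a direct-map translation: no check that a mapping exists. */
static uint64_t model_phys_to_virt(uint64_t page_offset, uint64_t paddr)
{
	return page_offset + paddr;
}

/* True only if paddr lies inside one of the ranges the model has mapped. */
static bool model_direct_map_covers(const struct model_mapped_range *ranges,
				    size_t nr_ranges, uint64_t paddr)
{
	size_t i;

	assert(ranges != NULL || nr_ranges == 0);

	for (i = 0; i < nr_ranges; i++) {
		if (paddr >= ranges[i].start && paddr < ranges[i].end)
			return true;
	}

	return false;
}
```

With a page offset of 0xffff9f5240000000, model_phys_to_virt() turns 0x100805000 into 0xffff9f5340805000, matching the log,
but if the kdump kernel has only mapped its own reserved memory, model_direct_map_covers() returns false for that paddr and
the access faults as a not-present page.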

So it looks like we need to support IOMMU_SNP_SHUTDOWN with the SNP_SHUTDOWN_EX command before the kdump kernel starts booting.

Thanks,
Ashish