Re: [PATCH] Allocate DMAR fault interrupts locally

From: Thomas Gleixner
Date: Thu Feb 29 2024 - 17:18:54 EST


Dimitri!

On Thu, Feb 29 2024 at 14:07, Dimitri Sivanich wrote:

The subject lacks a subsystem prefix. You're doing this for how many
decades now?

> The Intel IOMMU code currently tries to allocate all DMAR fault interrupt
>
> +#ifdef CONFIG_X86_LOCAL_APIC

I seriously doubt that this code can ever be compiled w/o X86_LOCAL_APIC:

obj-$(CONFIG_DMAR_TABLE) += dmar.o

config DMAR_TABLE
bool

config INTEL_IOMMU
depends on PCI_MSI && ACPI && X86
select DMAR_TABLE

config IRQ_REMAP
depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI
select DMAR_TABLE

config X86_LOCAL_APIC
def_bool y
depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI

What are you trying to achieve here other than #ifdef voodoo?

> +static void __init irq_remap_enable_fault_handling_thr(struct work_struct *work)
> +{
> + irq_remap_enable_fault_handling();

because if INTEL_IOMMU=y and IRQ_REMAP=n then X86_LOCAL_APIC=y and this
muck gets invoked for nothing. 'git grep irq_remap_enable_fault_handling
include/' might give you a hint.

> +}
> +
> +static int __init assign_dmar_vectors(void)
> +{
> + struct work_struct irq_remap_work;
> + int nid;
> +
> + INIT_WORK(&irq_remap_work, irq_remap_enable_fault_handling_thr);
> + cpus_read_lock();
> + for_each_online_node(nid) {
> + /* Boot cpu dmar vectors are assigned before the rest */
> + if (nid == cpu_to_node(get_boot_cpu_id()))
> + continue;
> + schedule_work_on(cpumask_first(cpumask_of_node(nid)),
> + &irq_remap_work);
> + flush_work(&irq_remap_work);
> + }
> + cpus_read_unlock();
> + return 0;
> +}
> +
> +arch_initcall(assign_dmar_vectors);

Stray newline before arch_initcall(), but that's not the problem.

The real problems are:

1) This approach only works when _ALL_ APs have been brought up during
boot. With 'maxcpus=N' on the command line this will fail to enable
fault handling when the APs which have not been brought up initially
are onlined later on.

This might be working in practice because intel_iommu_init() will
enable the interrupts later on via init_dmars() unconditionally, but
that's far from correct because IRQ_REMAP does not depend on
INTEL_IOMMU.

2) It leaves a gap where the reporting is not working between bringing
up the APs during boot and this initcall. Mostly theoretical, but
that does not make it more correct either.

What you really want is a cpu hotplug state in the CPUHP_BP_PREPARE_DYN
space which enables the interrupt for the node _before_ the first AP of
the node is brought up. That will solve the problem nicely w/o any of
the above issues.

Thanks,

tglx