Re: [PATCH] irqchip/gic: prevent buffer overflow in gic_ipi_send_mask()

From: Sergey Shtylyov
Date: Mon Sep 09 2024 - 15:48:45 EST


On 9/8/24 12:37 PM, Marc Zyngier wrote:
[...]

>>>>> ARM GIC arch v2 spec claims support for just 8 CPU interfaces. However,
>>>>> looking at the GIC driver's irq_set_affinity() method, it seems that the
>>>>> passed CPU mask may contain the logical CPU #s beyond 8, and that method
>>>>> filters them out before reading gic_cpu_map[], bailing out with
>>>>> -EINVAL.
>>>>
>>>> The reasoning is correct in theory, but in reality it's a non problem.
>>>>
>>>> Simply because processors which use this GIC version cannot have more
>>>> than 8 cores.
>>>>
>>>> That means num_possible_cpus() <= 8 so the cpumask handed in cannot have
>>>> bits >= 8 set. Ergo for_each_cpu() can't return a bit which is >= 8.

[...]

>>> 33de0aa4bae98, the affinity that the driver gets is narrowed to what
>>> is actually *online*.
>>
>> What I haven't quite understood from my (cursory) look at the GICv2
>> spec (and the GIC driver) is why only one CPU (the one with the lowest #)
>> is selected from *mask_val before writing to GICD_ITARGETSRn, while the
>> spec holds that an IRQ can be forwarded to any set of the 8 CPU
>> interfaces...
>
> Because on all the existing implementations, having more than a single
> target in GICD_ITARGETSRn results in all the targeted CPUs being
> interrupted, with the guarantee that only one will see the actual
> interrupt (the read from GICC_IAR returns a value that is not 0x3ff),
> and everyone else will only see a spurious interrupt (0x3ff). This is
> because the distributor does not track which CPU is actually in a
> position to handle the interrupt.

Ah! Previously I was only familiar with the x86 {I/O,local} APICs,
and my recollection was that they somehow manage to negotiate that
matter over the APIC bus... but my knowledge is pretty dated, as I've
had almost no part in the x86 Linux development. :-(

> While this can be, under limited circumstances, beneficial from an
> interrupt servicing latency perspective, it is always bad from a global
> throughput perspective. You end up thrashing CPU caches, generating odd
> latencies in unsuspecting code, and in general getting disappointing
> performance.
>
> Thankfully, GIC (v1/v2) is a dead horse, and v3 doesn't have this
> particular problem (it replaced it with a bigger one in the form of
> 1:n distribution).

The GICv2 spec does talk about the 1-N and N-N interrupt handling models;
at the same time, I can't find such wording in the GICv3/4 spec. :-)

Thanks a lot for your explanations! Despite having been involved in
ARM development since 2008, I have only limited knowledge of the ARM
low-level details... :-(

> M.

MBR, Sergey