Re: [PATCH 20/24] irqchip/gic-v5: Add GICv5 LPI/IPI support

From: Arnd Bergmann
Date: Wed Apr 09 2025 - 10:27:10 EST


On Wed, Apr 9, 2025, at 15:15, Lorenzo Pieralisi wrote:
> On Wed, Apr 09, 2025 at 12:56:52PM +0200, Arnd Bergmann wrote:
>
> KMALLOC_MAX_SIZE is set according to MAX_PAGE_ORDER, that should
> be fine for most set-ups (well, obviously implementations that
> only support a 1-level IST can't expect a very large number of
> IRQs - we set that to 12 bits worth of IDs deliberately but
> given the current memory allocation limits it can be much higher).
>
> A 2-level IST can easily manage 24-bits worth of IDs split into
> two-level tables with the current kmalloc() limits.
>
> For the ITS DT and ITT the same reasoning goes, so the capping
> is the (rare) exception not the rule and I don't expect this to be a
> problem at all or I am missing something.

Ok, just mention that estimation in the source code. If someone
ever runs into the limit and it does become a problem, they can
then figure out whether they have an unusually small
KMALLOC_MAX_SIZE or an unusually large number of interrupts.
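
As a rough illustration of the kind of estimate such a comment could
carry, here is a minimal userspace sketch. Everything in it is an
assumption made for illustration and not taken from the patch: 4 KiB
pages, a MAX_PAGE_ORDER of 10, the largest kmalloc() being
1 << (MAX_PAGE_ORDER + PAGE_SHIFT) bytes, and the table entry sizes
passed in by the caller. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumed configuration, not taken from the patch. */
#define ASSUMED_PAGE_SHIFT      12      /* 4 KiB pages */
#define ASSUMED_MAX_PAGE_ORDER  10

/* log2 of the largest single allocation, assuming it tracks MAX_PAGE_ORDER. */
static unsigned int max_alloc_shift(void)
{
        return ASSUMED_PAGE_SHIFT + ASSUMED_MAX_PAGE_ORDER;
}

/* Integer log2 of a non-zero power of two. */
static unsigned int log2_pow2(unsigned int v)
{
        unsigned int r = 0;

        assert(v && !(v & (v - 1)));
        while (v >>= 1)
                r++;
        return r;
}

/* ID bits a single linear table can cover if it must fit in one allocation. */
static unsigned int linear_id_bits(unsigned int entry_size)
{
        return max_alloc_shift() - log2_pow2(entry_size);
}

/*
 * ID bits a two-level table can cover when the L1 table and each L2
 * table are each limited to one maximum-size allocation.
 */
static unsigned int two_level_id_bits(unsigned int l1_entry_size,
                                      unsigned int l2_entry_size)
{
        return linear_id_bits(l1_entry_size) + linear_id_bits(l2_entry_size);
}
```

Under these assumptions a linear table of 4-byte entries covers 20 bits
of IDs, and a two-level table with 8-byte L1 entries and 4-byte L2
entries covers far more than 24 bits, so the cap would only matter
with a much smaller allocation limit or much larger entries.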

>> >> Do you expect actual implementation to not be cache-coherent?
>> >
>> > It is allowed by the architecture - I don't have a crystal ball
>> > but if I want to add support for a non-coherent IRS the DMA mapping
>> > like sequence above has to be there - alternatives are welcome.
>>
>> I see that we have a few GICv3 implementations that are marked
>> as non-coherent in DT. I don't understand why they'd do that,
>> but I guess there is not much to be done about it.
>
> You don't understand why the GIC HW is not coherent or why we set it
> up as such in the driver ?

I meant why hardware would be built like that. I would have
assumed that the GIC is designed to be closely tied to the
CPU core and the L2 cache, so it shouldn't be hard to make
it coherent even if the rest of the system is not.

>> The only other idea I have would be to use an uncached allocation
>> for the non-coherent case, the same way that dma_alloc_coherent()
>> or maybe dma_alloc_wc() does. This still has the same problem
>> with bypassing the dma-mapping.h interface because of the lack
>> of a device pointer, but it would at least avoid the cache flushes
>> at runtime. If I read this code right, the data in here is only
>> written by the CPU and read by the GIC, so a WC buffer wouldn't
>> be more expensive, right?
>
> The IST is also written by the GIC, the CPU reads it (explicitly, with a
> memory read rather than through instructions) only if the table is two
> level and we are allocating L2 entries on demand to check whether an
> L2 entry is valid.
>
> I am not sure the CMOs are that bad given that's what we do
> for GICv3 already but it is worth looking into it.

If the reads are common enough, then an uncached mapping would
likely be slower than the flushes. Not sure which way is better
without L2, you'd probably have to measure on real hardware,
though you could perhaps do that on a GICv3 one if the access
patterns are similar enough.

Arnd