Re: [PATCH platform-next v3 1/1] [PATCH platform-next] platform/mellanox: mlxreg-hotplug: Add support for handling interrupt storm

From: Andy Shevchenko
Date: Mon Oct 27 2025 - 08:07:01 EST


On Tue, Sep 23, 2025 at 04:49:54PM +0300, Ciju Rajan K wrote:
> In case of broken hardware, it is possible that broken device will
> flood interrupt handler with false events. For example, if fan or
> power supply has damaged presence pin, it will cause permanent
> generation of plugged in / plugged out events. As a result, interrupt
> handler will consume a lot of CPU resources and will keep raising
> "UDEV" events to the user space.
>
> This patch provides a mechanism to detect device causing interrupt
> flooding and mask interrupt for this specific device, to isolate
> from interrupt handling flow. Use the following criteria: if the
> specific interrupt was generated 'N' times during 'T' seconds,
> such device is to be considered as broken and will be closed for
> getting interrupts. User will be notified through the log error
> and will be instructed to replace broken device.
>
> Add fields for interrupt storm handling.
> Extend structure mlxreg_core_data with the following fields:
> 'wmark_cntr' - interrupt storm counter.
> 'wmark_window' - time window to count interrupts to check for storm.
>
> Extend structure mlxreg_core_item with the following field:
> 'storming_bits' - interrupt storming bits mask.

...

> 	for_each_set_bit(bit, &asserted, 8) {
> 		int pos;
>
> +		/* Skip already marked storming bit. */
> +		if (item->storming_bits & BIT(bit))
> +			continue;

Instead, just mask the storming bits out of "asserted" before the for-loop.
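
To illustrate, here is a minimal user-space sketch of that idea, not the
driver code: BIT() is redefined locally as a stand-in for the kernel macro,
a plain loop replaces for_each_set_bit(), and handle_asserted() is a
hypothetical helper that only reports which bits would still be processed.

```c
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro (assumption for this sketch). */
#define BIT(n) (1UL << (n))

/*
 * Hypothetical helper: drop the storming bits from "asserted" once, ahead
 * of the loop, then walk the remaining bits. Returns the mask of bits that
 * would actually be handled.
 */
static uint8_t handle_asserted(unsigned long asserted,
			       unsigned long storming_bits)
{
	uint8_t handled = 0;
	unsigned int bit;

	/* One mask operation replaces the per-iteration check. */
	asserted &= ~storming_bits;

	/* Plain-C equivalent of for_each_set_bit(bit, &asserted, 8). */
	for (bit = 0; bit < 8; bit++) {
		if (!(asserted & BIT(bit)))
			continue;
		handled |= (uint8_t)BIT(bit);
	}

	return handled;
}
```

With this shape the loop body never sees a storming bit, so the extra
"continue" inside the loop is not needed.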

...

> struct mlxreg_core_data {

> 	u8 regnum;
> 	u8 slot;
> 	u8 secured;
> +	unsigned int wmark_cntr;
> +	unsigned long wmark_window;

Is it okay to use variable-width (arch-dependent) types? The context suggests
that this data structure has fixed-width fields.

> };
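
If fixed-width fields turn out to be preferable, the additions could look
roughly like the user-space sketch below. The widths are assumptions made
only for illustration (u32 for the counter, u64 for the window); the right
choice depends on what the fields actually hold. The u8/u32/u64 typedefs
stand in for the kernel types, and the struct name is hypothetical and
shows only the fields quoted above.

```c
#include <stdint.h>

/* User-space stand-ins for the kernel fixed-width types. */
typedef uint8_t u8;
typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical excerpt: only the fields quoted in this review. */
struct mlxreg_core_data_sketch {
	u8 regnum;
	u8 slot;
	u8 secured;
	u32 wmark_cntr;		/* assumed width: interrupt storm counter */
	u64 wmark_window;	/* assumed width: storm time window */
};
```

With these types the field sizes are the same on 32-bit and 64-bit
architectures, unlike "unsigned long".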

--
With Best Regards,
Andy Shevchenko