Re: [PATCH platform-next v4 2/2] platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection

From: Andy Shevchenko

Date: Thu Jan 15 2026 - 03:35:02 EST


On Thu, Jan 15, 2026 at 09:49:09AM +0200, Ciju Rajan K wrote:
> This patch enables the interrupt storm detection feature and
> also adds the per device counter for tracking the faulty
> devices. It also masks the faulty devices from generating
> any further interrupts.
>
> Add field for interrupt storm handling.
> Extend structure mlxreg_core_data with the following field:
> 'wmark_cntr' - interrupt storm counter.
>
> Extend structure mlxreg_core_item with the following field:
> 'storming_bits' - interrupt storming bits mask.

...

> +static void mlxreg_hotplug_storm_handler(unsigned int irq, unsigned int freq, void *dev_id)
> +{
> + struct mlxreg_hotplug_priv_data *priv = dev_id;
> + struct mlxreg_core_hotplug_platform_data *pdata;
> + struct mlxreg_core_item *item;
> + struct mlxreg_core_data *data;
> + unsigned long asserted;
> + u32 bit;
> +
> + dev_warn(priv->dev,
> + "Interrupt storm detected on IRQ %u (%u interrupts/sec)",
> + irq, freq);

Below you put long line, here it seems wrapped by 80, why so inconsistent?
Please, choose one style and use it everywhere (inside the same file).

> + pdata = dev_get_platdata(&priv->pdev->dev);
> + item = pdata->items;
> + asserted = item->cache;
> +
> + for_each_set_bit(bit, &asserted, 8) {
> + int pos;
> +
> + pos = mlxreg_hotplug_item_label_index_get(item->mask, bit);
> + if (pos < 0)

> + goto out;

Used only once. Just drop the label and move the related code under the branch.

> + data = item->data + pos;
> + /* Check per device interrupt counter */
> + if (data->wmark_cntr >= MLXREG_HOTPLUG_INTR_FREQ_HZ - 1) {
> + dev_err(priv->dev,
> + "Storming bit %d (label: %s) - interrupt masked permanently. Replace broken HW.",
> + bit, data->label);
> + /* Mark bit as storming. */
> + item->storming_bits |= BIT(bit);
> + }
> + data->wmark_cntr = 0;
> + }
> + return;
> + out:
> + dev_err(priv->dev, "Failed to complete interrupt storm handler\n");
> +}

...

> + /* Register with generic interrupt storm detection */
> + if (!irq_register_storm_detection(priv->irq, MLXREG_HOTPLUG_INTR_FREQ_HZ,
> + mlxreg_hotplug_storm_handler, priv)) {
> + dev_warn(&pdev->dev, "Failed to register generic interrupt storm detection\n");
> + } else {
> + dev_info(&pdev->dev, "Registered generic storm detection for IRQ %d\n", priv->irq);
> + }

Invert the conditional, it will be slightly easier to parse.

...

> struct mlxreg_core_data {
> char label[MLXREG_CORE_LABEL_MAX_SIZE];

> u8 regnum;
> u8 slot;
> u8 secured;
> + unsigned int wmark_cntr;
> };

Have you run `pahole`? No issues / room to improve this layout?

...

> struct mlxreg_core_item {
> struct mlxreg_core_data *data;

> u8 ind;
> u8 inversed;
> u8 health;
> + u32 storming_bits;
> };

Ditto.

--
With Best Regards,
Andy Shevchenko