Re: [PATCH v5 38/41] arm_mpam: Add workaround for T241-MPAM-4

From: Ben Horgan

Date: Tue Mar 10 2026 - 07:39:35 EST


Hi Fenghua,

On 3/9/26 17:39, Fenghua Yu wrote:
> Hi, Ben,
>
> On 3/2/26 09:11, Ben Horgan wrote:
>> Hi Fenghua,
>>
>> On 3/1/26 17:28, Fenghua Yu wrote:
>>> Hi, Ben,
>>>
>>> On 2/24/26 09:57, Ben Horgan wrote:
>>>> From: Shanker Donthineni <sdonthineni@xxxxxxxxxx>
>>>>
>>>> In the T241 implementation of memory-bandwidth partitioning, in the
>>>> absence
>>>> of contention for bandwidth, the minimum bandwidth setting can
>>>> affect the
>>>> amount of achieved bandwidth. Specifically, the achieved bandwidth
>>>> in the
>>>> absence of contention can settle to any value between the values of
>>>> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.  Also, if MPAMCFG_MBW_MIN is set
>>>> zero (below 0.78125%), once a core enters a throttled state, it will
>>>> never
>>>> leave that state.
>>>>
>>>> The first issue is not a concern if the MPAM software allows to program
>>>> MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program
>>>> MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed.
>>>>
>>>> In the scenario where the resctrl doesn't support the MBW_MIN
>>>> interface via
>>>> sysfs, to achieve bandwidth closer to MBW_MAX in the absence of
>>>> contention,
>>>> software should configure a relatively narrow gap between MBW_MIN and
>>>> MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem.
>>>>
>>>> Clear the feature MBW_MIN feature from the class to ensure we don't
>>>> accidentally change behaviour when resctrl adds support for a MBW_MIN
>>>> interface.
>>>>
>>>> Tested-by: Gavin Shan <gshan@xxxxxxxxxx>
>>>> Tested-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
>>>> Reviewed-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
>>>> Signed-off-by: Shanker Donthineni <sdonthineni@xxxxxxxxxx>
>>>> Signed-off-by: James Morse <james.morse@xxxxxxx>
>>>> Signed-off-by: Ben Horgan <ben.horgan@xxxxxxx>
>>>
>>> Reviewed-by: Fenghua Yu <fenghuay@xxxxxxxxxx>
>>>
>>> This patch itself is good.
>>>
>>> Please check the following comments.
>>>
>>>> ---
>>>> [ morse: Added as second quirk, adapted to use the new intermediate
>>>> values
>>>> in mpam_extend_config() ]
>>>>
>>>> Changes since rfc:
>>>> MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
>>>> Handling when reset_mbw_min is set
>>>>
>>>> Changes since v3:
>>>> Move the 5% gap policy back here
>>>> Clear mbw_min feature in class
>>>> ---
>>>>    Documentation/arch/arm64/silicon-errata.rst |  2 +
>>>>    drivers/resctrl/mpam_devices.c              | 50 ++++++++++++++++
>>>> +++--
>>>>    drivers/resctrl/mpam_internal.h             |  1 +
>>>>    3 files changed, 50 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/
>>>> Documentation/arch/arm64/silicon-errata.rst
>>>> index a65620f98e3a..a4b246655e37 100644
>>>> --- a/Documentation/arch/arm64/silicon-errata.rst
>>>> +++ b/Documentation/arch/arm64/silicon-errata.rst
>>>> @@ -249,6 +249,8 @@ stable kernels.
>>>>    +----------------+-----------------+-----------------
>>>> +-----------------------------+
>>>>    | NVIDIA         | T241 MPAM       | T241-MPAM-1     | N/
>>>> A                         |
>>>>    +----------------+-----------------+-----------------
>>>> +-----------------------------+
>>>> +| NVIDIA         | T241 MPAM       | T241-MPAM-4     | N/
>>>> A                         |
>>>> ++----------------+-----------------+-----------------
>>>> +-----------------------------+
>>>>    +----------------+-----------------+-----------------
>>>> +-----------------------------+
>>>>    | Freescale/NXP  | LS2080A/LS1043A | A-008585        |
>>>> FSL_ERRATUM_A008585         |
>>>>    +----------------+-----------------+-----------------
>>>> +-----------------------------+
>>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>>>> mpam_devices.c
>>>> index 08cb080592d9..8f44e9dee207 100644
>>>> --- a/drivers/resctrl/mpam_devices.c
>>>> +++ b/drivers/resctrl/mpam_devices.c
>>>> @@ -679,6 +679,12 @@ static const struct mpam_quirk mpam_quirks[] = {
>>>>        .iidr_mask  = MPAM_IIDR_MATCH_ONE,
>>>>        .workaround = T241_SCRUB_SHADOW_REGS,
>>>>        },
>>>> +    {
>>>> +    /* NVIDIA t241 erratum T241-MPAM-4 */
>>>> +    .iidr       = MPAM_IIDR_NVIDIA_T241,
>>>> +    .iidr_mask  = MPAM_IIDR_MATCH_ONE,
>>>> +    .workaround = T241_FORCE_MBW_MIN_TO_ONE,
>>>> +    },
>>>>        { NULL } /* Sentinel */
>>>>    };
>>>>    @@ -1464,6 +1470,31 @@ static void
>>>> mpam_quirk_post_config_change(struct mpam_msc_ris *ris, u16 partid,
>>>>            mpam_apply_t241_erratum(ris, partid);
>>>>    }
>>>>    +static u16 mpam_wa_t241_force_mbw_min_to_one(struct mpam_props
>>>> *props)
>>>> +{
>>>> +    u16 max_hw_value, min_hw_granule, res0_bits;
>>>> +
>>>> +    res0_bits = 16 - props->bwa_wd;
>>>> +    max_hw_value = ((1 << props->bwa_wd) - 1) << res0_bits;
>>>> +    min_hw_granule = ~max_hw_value;
>>>> +
>>>> +    return min_hw_granule + 1;
>>>> +}
>>>> +
>>>> +static u16 mpam_wa_t241_calc_min_from_max(struct mpam_config *cfg)
>>>> +{
>>>> +    u16 val = 0;
>>>> +
>>>> +    if (mpam_has_feature(mpam_feat_mbw_max, cfg)) {
>>>
>>> But the problem is mpam_feat_mbw_max feature is NOT set in cfg.
>>>
>>>> +        u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>>>> +
>>>> +        if (cfg->mbw_max > delta)
>>>> +            val = cfg->mbw_max - delta;
>>>> +    }
>>>> +
>>>> +    return val;
>>>
>>> So 0 is always returned.
>>>
>>> The workaround will set mbw_min as 1% which is too small and will cause
>>> performance degradation, e.g. about 20% degradation on some benchmarks.
>>>
>>> This patch itself doesn't have any issue.
>>>
>>> The issue is the mbw_max feature bit in cfg is not set.
>>
>> This is intended behaviour as the reset is done independently
>> from the value set in the config. The value is there so that
>> resctrl can display the expected values.
>>
>>> This is a legacy issue, not introduced by this patch set.
>>>> Here is a fix patch for the issue:
>>> https://lore.kernel.org/lkml/20260301171829.1357886-1-
>>> fenghuay@xxxxxxxxxx/T/#u
>>
>> I've commented on that patch. I think it's best to fix it in the context
>> of the erratum.
>>
>> Does the below solve your performance problems?
>>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>> mpam_devices.c
>> index 236f78ab9163..60d3d3e2193f 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -1515,16 +1515,20 @@ static u16
>> mpam_wa_t241_force_mbw_min_to_one(struct mpam_props *props)
>>          return min_hw_granule + 1;
>>   }
>>   -static u16 mpam_wa_t241_calc_min_from_max(struct mpam_config *cfg)
>> +static u16 mpam_wa_t241_calc_min_from_max(struct mpam_props *props,
>> +                                         struct mpam_config *cfg)
>>   {
>>          u16 val = 0;
>> +       u16 max;
>> +       u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>>   -       if (mpam_has_feature(mpam_feat_mbw_max, cfg)) {
>> -               u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>> +       if (mpam_has_feature(mpam_feat_mbw_max, cfg))
>> +               max = cfg->mbw_max;
>> +       else
>> +               max = GENMASK(15, 16 - cprops->bwa_wd);
>>   -               if (cfg->mbw_max > delta)
>> -                       val = cfg->mbw_max - delta;
>> -       }
>
> Could you please add some comments on this piece of code? It's worth to
> comment on why there are different values on cfg and props.

Sure, how about this?

} else {
/* Resetting so use the ris specific default. */
max = GENMASK(15, 16 - props->bwa_wd);
}


>> +       if (max > delta)
>> +               val = max - delta;
>>            return val;
>>   }
>> @@ -1577,9 +1581,8 @@ static void mpam_reprogram_ris_partid(struct
>> mpam_msc_ris *ris, u16 partid,
>>                  if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, msc)) {
>>                          u16 min =
>> mpam_wa_t241_force_mbw_min_to_one(rprops);
>>   -                       val = mpam_wa_t241_calc_min_from_max(cfg);
>> -                       if (val < min)
>> -                               val = min;
>> +                       val = mpam_wa_t241_calc_min_from_max(rprops,
>> cfg);
>> +                       val = max(val, min);
>>                  }
>>                    mpam_write_partsel_reg(msc, MBW_MIN, val);
>>
>
> Otherwise, this change looks good to me.

Did you get a chance to confirm if it behaves as expected on your harware?

>
> Thanks.
>
> -Fenghua

Thanks,

Ben