Re: [PATCH v5 38/41] arm_mpam: Add workaround for T241-MPAM-4
From: Ben Horgan
Date: Tue Mar 10 2026 - 07:39:35 EST
Hi Fenghua,
On 3/9/26 17:39, Fenghua Yu wrote:
> Hi, Ben,
>
> On 3/2/26 09:11, Ben Horgan wrote:
>> Hi Fenghua,
>>
>> On 3/1/26 17:28, Fenghua Yu wrote:
>>> Hi, Ben,
>>>
>>> On 2/24/26 09:57, Ben Horgan wrote:
>>>> From: Shanker Donthineni <sdonthineni@xxxxxxxxxx>
>>>>
>>>> In the T241 implementation of memory-bandwidth partitioning, in the
>>>> absence
>>>> of contention for bandwidth, the minimum bandwidth setting can
>>>> affect the
>>>> amount of achieved bandwidth. Specifically, the achieved bandwidth
>>>> in the
>>>> absence of contention can settle to any value between the values of
>>>> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set
>>>> zero (below 0.78125%), once a core enters a throttled state, it will
>>>> never
>>>> leave that state.
>>>>
>>>> The first issue is not a concern if the MPAM software allows to program
>>>> MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program
>>>> MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed.
>>>>
>>>> In the scenario where the resctrl doesn't support the MBW_MIN
>>>> interface via
>>>> sysfs, to achieve bandwidth closer to MBW_MAX in the absence of
>>>> contention,
>>>> software should configure a relatively narrow gap between MBW_MIN and
>>>> MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem.
>>>>
>>>> Clear the feature MBW_MIN feature from the class to ensure we don't
>>>> accidentally change behaviour when resctrl adds support for a MBW_MIN
>>>> interface.
>>>>
>>>> Tested-by: Gavin Shan <gshan@xxxxxxxxxx>
>>>> Tested-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
>>>> Reviewed-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
>>>> Signed-off-by: Shanker Donthineni <sdonthineni@xxxxxxxxxx>
>>>> Signed-off-by: James Morse <james.morse@xxxxxxx>
>>>> Signed-off-by: Ben Horgan <ben.horgan@xxxxxxx>
>>>
>>> Reviewed-by: Fenghua Yu <fenghuay@xxxxxxxxxx>
>>>
>>> This patch itself is good.
>>>
>>> Please check the following comments.
>>>
>>>> ---
>>>> [ morse: Added as second quirk, adapted to use the new intermediate
>>>> values
>>>> in mpam_extend_config() ]
>>>>
>>>> Changes since rfc:
>>>> MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
>>>> Handling when reset_mbw_min is set
>>>>
>>>> Changes since v3:
>>>> Move the 5% gap policy back here
>>>> Clear mbw_min feature in class
>>>> ---
>>>> Documentation/arch/arm64/silicon-errata.rst | 2 +
>>>> drivers/resctrl/mpam_devices.c | 50 ++++++++++++++++
>>>> +++--
>>>> drivers/resctrl/mpam_internal.h | 1 +
>>>> 3 files changed, 50 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/
>>>> Documentation/arch/arm64/silicon-errata.rst
>>>> index a65620f98e3a..a4b246655e37 100644
>>>> --- a/Documentation/arch/arm64/silicon-errata.rst
>>>> +++ b/Documentation/arch/arm64/silicon-errata.rst
>>>> @@ -249,6 +249,8 @@ stable kernels.
>>>> +----------------+-----------------+-----------------
>>>> +-----------------------------+
>>>> | NVIDIA | T241 MPAM | T241-MPAM-1 | N/
>>>> A |
>>>> +----------------+-----------------+-----------------
>>>> +-----------------------------+
>>>> +| NVIDIA | T241 MPAM | T241-MPAM-4 | N/
>>>> A |
>>>> ++----------------+-----------------+-----------------
>>>> +-----------------------------+
>>>> +----------------+-----------------+-----------------
>>>> +-----------------------------+
>>>> | Freescale/NXP | LS2080A/LS1043A | A-008585 |
>>>> FSL_ERRATUM_A008585 |
>>>> +----------------+-----------------+-----------------
>>>> +-----------------------------+
>>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>>>> mpam_devices.c
>>>> index 08cb080592d9..8f44e9dee207 100644
>>>> --- a/drivers/resctrl/mpam_devices.c
>>>> +++ b/drivers/resctrl/mpam_devices.c
>>>> @@ -679,6 +679,12 @@ static const struct mpam_quirk mpam_quirks[] = {
>>>> .iidr_mask = MPAM_IIDR_MATCH_ONE,
>>>> .workaround = T241_SCRUB_SHADOW_REGS,
>>>> },
>>>> + {
>>>> + /* NVIDIA t241 erratum T241-MPAM-4 */
>>>> + .iidr = MPAM_IIDR_NVIDIA_T241,
>>>> + .iidr_mask = MPAM_IIDR_MATCH_ONE,
>>>> + .workaround = T241_FORCE_MBW_MIN_TO_ONE,
>>>> + },
>>>> { NULL } /* Sentinel */
>>>> };
>>>> @@ -1464,6 +1470,31 @@ static void
>>>> mpam_quirk_post_config_change(struct mpam_msc_ris *ris, u16 partid,
>>>> mpam_apply_t241_erratum(ris, partid);
>>>> }
>>>> +static u16 mpam_wa_t241_force_mbw_min_to_one(struct mpam_props
>>>> *props)
>>>> +{
>>>> + u16 max_hw_value, min_hw_granule, res0_bits;
>>>> +
>>>> + res0_bits = 16 - props->bwa_wd;
>>>> + max_hw_value = ((1 << props->bwa_wd) - 1) << res0_bits;
>>>> + min_hw_granule = ~max_hw_value;
>>>> +
>>>> + return min_hw_granule + 1;
>>>> +}
>>>> +
>>>> +static u16 mpam_wa_t241_calc_min_from_max(struct mpam_config *cfg)
>>>> +{
>>>> + u16 val = 0;
>>>> +
>>>> + if (mpam_has_feature(mpam_feat_mbw_max, cfg)) {
>>>
>>> But the problem is mpam_feat_mbw_max feature is NOT set in cfg.
>>>
>>>> + u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>>>> +
>>>> + if (cfg->mbw_max > delta)
>>>> + val = cfg->mbw_max - delta;
>>>> + }
>>>> +
>>>> + return val;
>>>
>>> So 0 is always returned.
>>>
>>> The workaround will set mbw_min as 1% which is too small and will cause
>>> performance degradation, e.g. about 20% degradation on some benchmarks.
>>>
>>> This patch itself doesn't have any issue.
>>>
>>> The issue is the mbw_max feature bit in cfg is not set.
>>
>> This is intended behaviour as the reset is done independently
>> from the value set in the config. The value is there so that
>> resctrl can display the expected values.
>>
>>> This is a legacy issue, not introduced by this patch set.
>>>> Here is a fix patch for the issue:
>>> https://lore.kernel.org/lkml/20260301171829.1357886-1-
>>> fenghuay@xxxxxxxxxx/T/#u
>>
>> I've commented on that patch. I think it's best to fix it in the context
>> of the erratum.
>>
>> Does the below solve your performance problems?
>>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>> mpam_devices.c
>> index 236f78ab9163..60d3d3e2193f 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -1515,16 +1515,20 @@ static u16
>> mpam_wa_t241_force_mbw_min_to_one(struct mpam_props *props)
>> return min_hw_granule + 1;
>> }
>> -static u16 mpam_wa_t241_calc_min_from_max(struct mpam_config *cfg)
>> +static u16 mpam_wa_t241_calc_min_from_max(struct mpam_props *props,
>> + struct mpam_config *cfg)
>> {
>> u16 val = 0;
>> + u16 max;
>> + u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>> - if (mpam_has_feature(mpam_feat_mbw_max, cfg)) {
>> - u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>> + if (mpam_has_feature(mpam_feat_mbw_max, cfg))
>> + max = cfg->mbw_max;
>> + else
>> + max = GENMASK(15, 16 - cprops->bwa_wd);
>> - if (cfg->mbw_max > delta)
>> - val = cfg->mbw_max - delta;
>> - }
>
> Could you please add some comments on this piece of code? It's worth to
> comment on why there are different values on cfg and props.
Sure, how about this?
} else {
/* Resetting so use the ris specific default. */
max = GENMASK(15, 16 - props->bwa_wd);
}
>> + if (max > delta)
>> + val = max - delta;
>> return val;
>> }
>> @@ -1577,9 +1581,8 @@ static void mpam_reprogram_ris_partid(struct
>> mpam_msc_ris *ris, u16 partid,
>> if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, msc)) {
>> u16 min =
>> mpam_wa_t241_force_mbw_min_to_one(rprops);
>> - val = mpam_wa_t241_calc_min_from_max(cfg);
>> - if (val < min)
>> - val = min;
>> + val = mpam_wa_t241_calc_min_from_max(rprops,
>> cfg);
>> + val = max(val, min);
>> }
>> mpam_write_partsel_reg(msc, MBW_MIN, val);
>>
>
> Otherwise, this change looks good to me.
Did you get a chance to confirm if it behaves as expected on your harware?
>
> Thanks.
>
> -Fenghua
Thanks,
Ben