Re: [v3 PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for 32-bit sid size

From: Will Deacon
Date: Tue Oct 08 2024 - 09:35:17 EST

Next message: Paul E Luse: "Re: [RFC V8] md/bitmap: Optimize lock contention."
Previous message: Krzysztof Kozlowski: "Re: [PATCH v2 1/1] dt-bindings: watchdog: fsl-imx-wdt: Add missing 'big-endian' property"
Next in thread: Jason Gunthorpe: "Re: [v3 PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for 32-bit sid size"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi folks,

Sorry I'm late to the party, I went fishing.

On Fri, Oct 04, 2024 at 11:04:05AM -0700, Yang Shi wrote:
> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
> is 32 bit value.
> However some platforms, for example, AmpereOne and the platforms with
> ARM MMU-700, have 32-bit stream id size. This resulted in out-of-bound shift.
> The disassembly of shift is:
>
>     ldr     w2, [x19, 828]  //, smmu_7(D)->sid_bits
>     mov     w20, 1
>     lsl     w20, w20, w2
>
> According to ARM spec, if the registers are 32 bit, the instruction actually
> does:
> dest = src << (shift % 32)
>
> So it actually shifted by zero bit.
>
> The out-of-bound shift is also undefined behavior according to C
> language standard.
>
> This caused v6.12-rc1 to fail to boot on such platforms.
>
> UBSAN also reported:
>
> UBSAN: shift-out-of-bounds in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3628:29
> shift exponent 32 is too large for 32-bit type 'int'
>
> Using 64 bit immediate when doing shift can solve the problem. The
> disassembly after the fix looks like:
>     ldr     w20, [x19, 828] //, smmu_7(D)->sid_bits
>     mov     x0, 1
>     lsl     x0, x0, x20
>
> There are a couple of problematic places, extracted the shift into a helper.
>
> Fixes: ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> Tested-by: James Morse <james.morse@xxxxxxx>
> Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Signed-off-by: Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 +++++++++++-----
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 +++++
> 2 files changed, 16 insertions(+), 5 deletions(-)
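
For illustration, the width issue described in the commit message can be
sketched in standalone C. This is not driver code: num_sids_64() and
lsl_w_emulated() are hypothetical names, and the second function only
emulates the modulo-32 behaviour of the 32-bit LSL quoted above instead of
executing the undefined shift.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of stream IDs for a given SID width,
 * computed from a 64-bit one so that sid_bits == 32 is well defined.
 */
static uint64_t num_sids_64(unsigned int sid_bits)
{
	assert(sid_bits <= 32);	/* sketch only handles SID widths up to 32 */
	return 1ULL << sid_bits;
}

/*
 * Emulates what the 32-bit "lsl w20, w20, w2" does on arm64: the shift
 * amount is taken modulo 32. Written this way to avoid the C-level
 * undefined behaviour of evaluating 1 << 32 on a 32-bit int.
 */
static uint32_t lsl_w_emulated(unsigned int sid_bits)
{
	return (uint32_t)1 << (sid_bits % 32);
}
```

With sid_bits == 32, lsl_w_emulated() yields 1 whereas num_sids_64() yields
2^32; for smaller widths the two agree.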

[...]

> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 737c5b882355..9d4fc91d9258 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>  {
>  	u32 l1size;
>  	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
>  	unsigned int last_sid_idx =
> -		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> +		arm_smmu_strtab_l1_idx(num_sids - 1);
>
>  	/* Calculate the L1 size, capped to the SIDSIZE. */
>  	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
> @@ -3655,20 +3656,25 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>
>  static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>  {
> -	u32 size;
> +	u64 size;
>  	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
> +
> +	size = num_sids * sizeof(struct arm_smmu_ste);
> +	/* The max size for dmam_alloc_coherent() is 32-bit */
> +	if (size > SIZE_MAX)
> +		return -EINVAL;
>
> -	size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
>  	cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
>  						&cfg->linear.ste_dma,
>  						GFP_KERNEL);
>  	if (!cfg->linear.table) {
>  		dev_err(smmu->dev,
> -			"failed to allocate linear stream table (%u bytes)\n",
> +			"failed to allocate linear stream table (%llu bytes)\n",
>  			size);
>  		return -ENOMEM;
>  	}
> -	cfg->linear.num_ents = 1 << smmu->sid_bits;
> +	cfg->linear.num_ents = num_sids;

This all looks a bit messy to me. The architecture guarantees that
2-level stream tables are supported once we hit 7-bit SIDs and, although
the driver relaxes this to > 8-bit SIDs, we'll never run into overflow
problems in the linear table code above.
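
As a rough sanity check of that claim, here is the linear table size worked
out in standalone C. Two things are assumptions rather than anything stated
in this thread: that an STE is 64 bytes, and that the linear format is only
chosen for SIDs of at most 8 bits. The names are hypothetical.

```c
#include <stdint.h>

#define ASSUMED_STE_BYTES	64u	/* assumption: sizeof(struct arm_smmu_ste) */
#define ASSUMED_MAX_LINEAR_BITS	8u	/* assumption: linear only for <= 8-bit SIDs */

/*
 * Size in bytes of a linear stream table for the given SID width.
 * Returns 0 for widths where this sketch assumes the 2-level format
 * would be used instead.
 */
static uint32_t linear_strtab_bytes(unsigned int sid_bits)
{
	if (sid_bits > ASSUMED_MAX_LINEAR_BITS)
		return 0;
	return ((uint32_t)1 << sid_bits) * ASSUMED_STE_BYTES;
}
```

Under those assumptions the largest linear table is 256 * 64 = 16384 bytes,
far below anything that would overflow a u32.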

So I'm inclined to take Daniel's one-liner [1] which just chucks the
'ULL' suffix into the 2-level case. Otherwise, we're in a weird
situation where the size is 64-bit for a short while until it gets
truncated anyway when we assign it to a 32-bit field.
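
To make the 2-level case concrete, here is a minimal sketch of the L1 index
arithmetic with and without a 64-bit one in the shift. The constants are
assumptions about the driver (256 STEs per L2 table, L1 capped at 1 << 17
entries), all names are hypothetical, and the "buggy" variant emulates the
modulo-32 LSL instead of executing an undefined shift.

```c
#include <stdint.h>

#define ASSUMED_L2_STES		256u		/* assumption: STEs per L2 table */
#define ASSUMED_MAX_L1_ENTS	(1u << 17)	/* assumption: cap on L1 entries */

/* L1 index of a stream ID; takes a 32-bit SID. */
static uint32_t l1_idx(uint32_t sid)
{
	return sid / ASSUMED_L2_STES;
}

static uint32_t cap_l1_ents(uint32_t n)
{
	return n < ASSUMED_MAX_L1_ENTS ? n : ASSUMED_MAX_L1_ENTS;
}

/*
 * With a 64-bit one: the shift is done in 64 bits and the "- 1" brings
 * the value back into 32-bit range before it is narrowed to a SID.
 */
static uint32_t num_l1_ents_fixed(unsigned int sid_bits)
{
	uint32_t last_sid = (uint32_t)((1ULL << sid_bits) - 1);

	return cap_l1_ents(l1_idx(last_sid) + 1);
}

/* Same calculation with the shift amount taken modulo 32. */
static uint32_t num_l1_ents_buggy(unsigned int sid_bits)
{
	uint32_t last_sid = ((uint32_t)1 << (sid_bits % 32)) - 1;

	return cap_l1_ents(l1_idx(last_sid) + 1);
}
```

For 32-bit SIDs the emulated 32-bit shift ends up with a single L1 entry,
while the 64-bit shift reaches the cap; for 16-bit SIDs both give 256.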

Any objections?

Will

[1] https://lore.kernel.org/r/20241002015357.1766934-1-danielmentz@xxxxxxxxxx
