Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

From: Jon Hunter
Date: Tue Jun 30 2020 - 11:17:57 EST

Next message: Sven Van Asbroeck: "Re: [EXT] Re: [PATCH v4 2/2] ARM: imx6plus: enable internal routing of clk_enet_ref where possible"
Previous message: Colin King: "[PATCH][next] net/mlx5e: fix memory leak of tls"
In reply to: Robin Murphy: "Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage"
Next in thread: Krishna Reddy: "RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage"

On 30/06/2020 15:53, Robin Murphy wrote:
> On 2020-06-30 09:19, Jon Hunter wrote:
>>
>> On 30/06/2020 01:10, Krishna Reddy wrote:
>>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
>>> IOVA accesses across them.
>>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
>>> string for Tegra194 SoC SMMU topology.
>>
>> There is no description here of the 3rd SMMU that you mention below.
>> I think that we should describe the full picture here.
>>
>>> Signed-off-by: Krishna Reddy <vdumpa@xxxxxxxxxx>

...

>>> +static void nvidia_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
>>> +                                 int sync, int status)
>>> +{
>>> +    unsigned int delay;
>>> +
>>> +    arm_smmu_writel(smmu, page, sync, 0);
>>> +
>>> +    for (delay = 1; delay < TLB_LOOP_TIMEOUT_IN_US; delay *= 2) {
>>
>> So we are doubling the delay every time? Is this better than just using
>> the same on each loop?
>
> This is the same logic as the main driver (see 8513c8930069) - the sync
> is expected to complete relatively quickly, hence why we have the inner
> spin loop to avoid the delay entirely in the typical case, and the
> longer it's taking, the more likely it is that something's wrong and it
> will never complete anyway. Realistically, a heavily loaded SMMU at a
> modest clock rate might take us through a couple of iterations of the
> outer loop, but beyond that we're pretty much just killing time until we
> declare it wedged and give up, and by then there's not much point in
> burning power frantically hammering on the interconnect.

Ah OK. Then maybe we should move the definitions of TLB_LOOP_TIMEOUT
and TLB_SPIN_COUNT into arm-smmu.h so that we can use them directly
in this file instead of redefining them. Then it may be clearer that
these are part of the main driver.
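
For illustration, here is a minimal userspace sketch of the
spin-then-back-off polling pattern Robin describes above. It is not the
driver code: sync_active() and fake_udelay() are hypothetical stand-ins
for the status register read and udelay(), and the two constant values
are assumptions chosen for the example.

```c
#include <stdbool.h>

/* Assumed values for illustration; the real ones live in the driver. */
#define TLB_LOOP_TIMEOUT 1000000 /* microseconds */
#define TLB_SPIN_COUNT   10

/* Hypothetical hardware model: the sync stays active for this many polls. */
static unsigned int polls_until_done;

/* Hypothetical stand-in for reading the sync status register. */
static bool sync_active(void)
{
    if (polls_until_done == 0)
        return false;
    polls_until_done--;
    return true;
}

/* Hypothetical stand-in for udelay(); it only records the requested delay. */
static unsigned long total_delay_us;

static void fake_udelay(unsigned int us)
{
    total_delay_us += us;
}

/* Returns 0 once the sync completes, -1 if it never does. */
static int poll_sync(void)
{
    unsigned int delay, spin;

    /* Outer loop: the delay doubles each pass, so a wedged SMMU costs few polls. */
    for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
        /* Inner loop: spin briefly so the common fast case never sleeps. */
        for (spin = TLB_SPIN_COUNT; spin > 0; spin--) {
            if (!sync_active())
                return 0;
        }
        fake_udelay(delay);
    }
    return -1;
}
```

With these assumed values, a sync that finishes within the first ten
polls incurs no delay at all, while one that never finishes gives up
after 20 passes of the outer loop.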

>>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
>>> +{
>>> +    unsigned int i;
>>> +    struct nvidia_smmu *nvidia_smmu;
>>> +    struct platform_device *pdev = to_platform_device(smmu->dev);
>>> +
>>> +    nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu), GFP_KERNEL);
>>> +    if (!nvidia_smmu)
>>> +        return ERR_PTR(-ENOMEM);
>>> +
>>> +    nvidia_smmu->smmu = *smmu;
>>> +    /* Instance 0 is ioremapped by arm-smmu.c after this function returns */
>>> +    nvidia_smmu->num_inst = 1;
>>> +
>>> +    for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>> +        struct resource *res;
>>> +
>>> +        res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>> +        if (!res)
>>> +            break;
>>> +
>>> +        nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>>> +        if (IS_ERR(nvidia_smmu->bases[i]))
>>> +            return ERR_CAST(nvidia_smmu->bases[i]);
>>> +
>>> +        nvidia_smmu->num_inst++;
>>> +    }
>>> +
>>> +    nvidia_smmu->smmu.impl = &nvidia_smmu_impl;
>>> +    /*
>>> +     * Free the arm_smmu_device struct allocated in arm-smmu.c.
>>> +     * Once this function returns, arm-smmu.c would use arm_smmu_device
>>> +     * allocated as part of nvidia_smmu struct.
>>> +     */
>>> +    devm_kfree(smmu->dev, smmu);
>>
>> Why don't we just store the pointer to the smmu struct passed to this
>> function in the nvidia_smmu struct? Then we do not need to free it
>> here. In other words, make ...
>>
>>   struct nvidia_smmu {
>>     struct arm_smmu_device    *smmu;
>>     unsigned int              num_inst;
>>     void __iomem              *bases[MAX_SMMU_INSTANCES];
>>   };
>>
>> This seems more appropriate than copying the struct and freeing memory
>> allocated elsewhere.
>
> But then how do you get back to struct nvidia_smmu given just a pointer
> to struct arm_smmu_device?

Ah yes, of course, that is what I was missing. I wondered what was going
on here. So I think we should add a comment in the above function
explaining why we copy the struct and cannot simply store the pointer.
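
To make the point concrete, here is a rough userspace sketch of why
embedding the struct by value matters. It assumes the driver would
recover the wrapper with container_of(); the to_nvidia_smmu() helper,
the simplified struct arm_smmu_device, the plain void * bases and the
MAX_SMMU_INSTANCES value are all hypothetical and only serve the
example.

```c
#include <stddef.h>

#define MAX_SMMU_INSTANCES 2 /* assumed value for illustration */

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Heavily simplified stand-in for the real struct arm_smmu_device. */
struct arm_smmu_device {
    int id;
};

struct nvidia_smmu {
    struct arm_smmu_device smmu; /* embedded by value, not a pointer */
    unsigned int num_inst;
    void *bases[MAX_SMMU_INSTANCES];
};

/*
 * Hypothetical helper: given only the embedded arm_smmu_device, step back
 * to the enclosing nvidia_smmu. This works only because the member lives
 * inside the wrapper; with a pointer member there is no way back.
 */
static struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
{
    return container_of(smmu, struct nvidia_smmu, smmu);
}
```

The core driver only ever hands the implementation hooks a
struct arm_smmu_device pointer, so the copy into the wrapper is what
makes the per-instance bases reachable from those hooks.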

Cheers
Jon

--
nvpublic
