Re: [PATCH v9 3/5] iommu/arm-smmu: introduction of ACTLR for custom prefetcher settings

From: Rob Clark
Date: Wed Jun 05 2024 - 18:13:43 EST


On Wed, Jun 5, 2024 at 3:52 AM Bibek Kumar Patro
<quic_bibekkum@xxxxxxxxxxx> wrote:
>
> On 6/5/2024 12:19 AM, Rob Clark wrote:
> > On Thu, May 30, 2024 at 2:22 AM Bibek Kumar Patro
> > <quic_bibekkum@xxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 5/28/2024 9:38 PM, Rob Clark wrote:
> >>> On Tue, May 28, 2024 at 6:06 AM Dmitry Baryshkov
> >>> <dmitry.baryshkov@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Tue, May 28, 2024 at 02:59:51PM +0200, Konrad Dybcio wrote:
> >>>>>
> >>>>>
> >>>>> On 5/15/24 15:59, Bibek Kumar Patro wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 5/10/2024 6:32 PM, Konrad Dybcio wrote:
> >>>>>>> On 10.05.2024 2:52 PM, Bibek Kumar Patro wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 5/1/2024 12:30 AM, Rob Clark wrote:
> >>>>>>>>> On Tue, Jan 23, 2024 at 7:00 AM Bibek Kumar Patro
> >>>>>>>>> <quic_bibekkum@xxxxxxxxxxx> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Currently in Qualcomm SoCs the default prefetch is set to 1 which allows
> >>>>>>>>>> the TLB to fetch just the next page table. MMU-500 features ACTLR
> >>>>>>>>>> register which is implementation defined and is used for Qualcomm SoCs
> >>>>>>>>>> to have a custom prefetch setting enabling TLB to prefetch the next set
> >>>>>>>>>> of page tables accordingly allowing for faster translations.
> >>>>>>>>>>
> >>>>>>>>>> ACTLR value is unique for each SMR (Stream matching register) and stored
> >>>>>>>>>> in a pre-populated table. This value is set to the register during
> >>>>>>>>>> context bank initialisation.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@xxxxxxxxxxx>
> >>>>>>>>>> ---
> >>>>>>>
> >>>>>>> [...]
> >>>>>>>
> >>>>>>>>>> +
> >>>>>>>>>> + for_each_cfg_sme(cfg, fwspec, j, idx) {
> >>>>>>>>>> + smr = &smmu->smrs[idx];
> >>>>>>>>>> + if (smr_is_subset(smr, id, mask)) {
> >>>>>>>>>> + arm_smmu_cb_write(smmu, cbndx, ARM_SMMU_CB_ACTLR,
> >>>>>>>>>> + actlrcfg[i].actlr);
> >>>>>>>>>
> >>>>>>>>> So, this makes ACTLR look like kind of a FIFO. But I'm looking at
> >>>>>>>>> downstream kgsl's PRR thing (which we'll need to implement vulkan
> >>>>>>>>> sparse residency), and it appears to be wanting to set BIT(5) in ACTLR
> >>>>>>>>> to enable PRR.
> >>>>>>>>>
> >>>>>>>>> val = KGSL_IOMMU_GET_CTX_REG(ctx, KGSL_IOMMU_CTX_ACTLR);
> >>>>>>>>> val |= FIELD_PREP(KGSL_IOMMU_ACTLR_PRR_ENABLE, 1);
> >>>>>>>>> KGSL_IOMMU_SET_CTX_REG(ctx, KGSL_IOMMU_CTX_ACTLR, val);
> >>>>>>>>>
> >>>>>>>>> Any idea how this works? And does it need to be done before or after
> >>>>>>>>> the ACTLR programming done in this patch?
> >>>>>>>>>
> >>>>>>>>> BR,
> >>>>>>>>> -R
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi Rob,
> >>>>>>>>
> >>>>>>>> Can you please help provide some more clarification on the FIFO part? By FIFO are you referring to the storing of ACTLR data in the table?
> >>>>>>>>
> >>>>>>>> Thanks for pointing to the downstream implementation of kgsl driver for
> >>>>>>>> the PRR bit. Since kgsl driver is already handling this PRR bit's
> >>>>>>>> setting, this makes setting the PRR BIT(5) by SMMU driver redundant.
> >>>>>>>
> >>>>>>> The kgsl driver is not present upstream.
> >>>>>>>
> >>>>>>
> >>>>>> Right kgsl is not present upstream, it would be better to avoid configuring the PRR bit and can be handled by kgsl directly in downstream.
> >>>>>
> >>>>> No! Upstream is not a dumping ground to reduce your technical debt.
> >>>>>
> >>>>> There is no kgsl driver upstream, so this ought to be handled here, in
> >>>>> the iommu driver (as poking at hardware A from driver B is usually not good
> >>>>> practice).
> >>>>
> >>>> I'd second the request here. If another driver has to control the
> >>>> behaviour of another driver, please add corresponding API for that.
> >>>
> >>> We have adreno_smmu_priv for this purpose ;-)
> >>>
> >>
> >> Thanks Rob for pointing to this private interface structure between smmu
> >> and gpu. I think it's similar to what you're trying to implement here
> >> https://lore.kernel.org/all/CAF6AEGtm-KweFdMFvahH1pWmpOq7dW_p0Xe_13aHGWt0jSbg8w@xxxxxxxxxxxxxx/#t
> >> I can add an api "set_actlr_prr()" with smmu_domain cookie, page pointer
> >> as two parameters. This api then can be used by drm/msm driver to carry
> >> out the prr implementation by simply calling this.
> >> Would this be okay Rob,Konrad,Dmitry?
> >> Let me know if any other suggestions you have in mind as well regarding
> >> parameters and placement.
> >
> > Hey Bibek, quick question.. is ACTLR preserved across a suspend/resume
> > cycle? Or does it need to be reprogrammed on resume? And same
> > question for these two PRR related regs:
> >
> > /* Global SMMU register offsets */
> > #define KGSL_IOMMU_PRR_CFG_LADDR 0x6008
> > #define KGSL_IOMMU_PRR_CFG_UADDR 0x600c
> >
> > (ie. high/low 32b of the PRR page)
> >
>
> Hey Rob, In suspend/resume, the register space power rails are not in
> disabled state, so it won't go back to reset values and should retain
> it's value. Only in hibernation cycle the registers' value would get reset.
>
> So the hi/low address bit register for PRR page would also retain it's
> value along with the ACTLR registers.
>
> > I was starting to type up a patch to add PRR configuration, but
> > depending on whether it interacts with suspend/resume, it might be
> > better form arm-smmu-qcom.c to just always enable and configure PRR
> > (including allocating a page to have an address to program into
> > PRR_CFG_LADDR/UADDR), and instead add an interface to return the PRR
> > page? I think there is no harm in unconditionally configuring PRR for
> > gpu smmu.
>
> Sounds okay though since this would not interact with suspend/resume path.
> But I think, suppose in-case this page would have some other references
> as well before configuring the address to the registers for PRR
> configuration, then GPU would be dependent on arm-smmu-qcom for this page.
> So Instead an endpoint api in arm-smmu-qcom.c can recieve the just the
> page-address, and bit set status from drm/msm driver and can set/reset
> the bit along with any page-address they want ?
> It would mean the interface will be smmu's , but the choice of
> configuration data to the registers' will be still with gpu.
>
> I wrote up a small patch with this implementation, would you like to
> review that?
> Will send it in this v11 series as new patch.

I think if there is no suspend/resume interaction, we should go back
to the original idea of page allocation in drm/msm.

Basically, I think the pros and cons are:

allocate in arm-smmu
pro: easy to sequence programming with suspend/resume
con: there isn't a convenient place to free the page on driver unload

allocate in drm/msm:
pro: easy place to free the page in teardown
con: harder to sequence with s/r

But if ACTLR and PRR_CFG_LADDR/UADDR are retained, then the con isn't
actually an issue ;-)

Anyways, I can type that patch.. the rest of drm/msm and userspace
changes (vm_bind + sparse) to get to the point where I can use PRR are
a somewhat bigger task so it will take me a while to get the point
where I can test any smmu patches.

BR,
-R


> Thanks & regards,
> Bibek
>
> >
> > BR,
> > -R
> >
> >> Thanks & regards,
> >> Bibek
> >>
> >>> BR,
> >>> -R
> >>>
> >>>>>
> >>>>>>
> >>>>>>>> Thanks for bringing up this point.
> >>>>>>>> I will send v10 patch series removing this BIT(5) setting from the ACTLR
> >>>>>>>> table.
> >>>>>>>
> >>>>>>> I think it's generally saner to configure the SMMU from the SMMU driver..
> >>>>>>
> >>>>>> Yes, agree on this. But since PRR bit is not directly related to SMMU
> >>>>>> configuration so I think it would be better to remove this PRR bit
> >>>>>> setting from SMMU driver based on my understanding.
> >>>>>
> >>>>> Why is it not related? We still don't know what it does.
> >>>>>
> >>>>> Konrad
> >>>>
> >>>> --
> >>>> With best wishes
> >>>> Dmitry