Re: [PATCH v6 5/6] iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV

From: Nicolin Chen
Date: Tue Apr 30 2024 - 14:09:41 EST


(Removing chunks that I simply ack)

On Tue, Apr 30, 2024 at 01:35:45PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2024 at 09:43:48PM -0700, Nicolin Chen wrote:

> > +/* MMIO helpers */
> > +#define cmdqv_readl(reg) \
> > + readl(cmdqv->base + TEGRA241_CMDQV_##reg)
> > +#define cmdqv_readl_relaxed(reg) \
> > + readl_relaxed(cmdqv->base + TEGRA241_CMDQV_##reg)
> > +#define cmdqv_writel(val, reg) \
> > + writel((val), cmdqv->base + TEGRA241_CMDQV_##reg)
> > +#define cmdqv_writel_relaxed(val, reg) \
> > + writel_relaxed((val), cmdqv->base + TEGRA241_CMDQV_##reg)
>
> Please don't hide access to a stack variable in a macro, and I'm not
> keen on the ##reg scheme either - it makes it much harder to search
> for things.

I can pass in cmdqv/vintf/vcmdq pointers, if it would be better.

> Really this all seems like alot of overkill to make a little bit of
> shorthand. It is not so wordy just to type it out:
>
> readl(vintf->base + TEGRA241_VINTF_CONFIG)

vintf_readl(vintf, CONFIG) is much shorter. Doing so reduced the
line breaks at quite a lot places, so overall the driver looks a
lot cleaner to me. It also helps a bit, when I want to debug the
HW configuration flow by adding prints to these helpers. It might
be a personal preference, yet I would still like to have these.

> > +/* Logging helpers */
> > +#define cmdqv_warn(fmt, ...) \
> > + dev_warn(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
> > +#define cmdqv_err(fmt, ...) \
> > + dev_err(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
> > +#define cmdqv_info(fmt, ...) \
> > + dev_info(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
> > +#define cmdqv_dbg(fmt, ...) \
> > + dev_dbg(cmdqv->dev, "CMDQV: " fmt, ##__VA_ARGS__)
>
> Really not sure these are necessary, same remark about the stack
> variable.

Same justification. And it simply keeps the same style of prints.
Also, it eases the following vcmdq prints.

I can probably change these logging helpers to inline functions.

> Also cmdqv->dev is the wrong thing to print, this is part of the smmu driver,
> it should print cmdqv->smmu->dev for consistency

Yea. I can drop the dev from the cmdqv structure.

> > +#define vintf_warn(fmt, ...) \
> > + dev_warn(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > +#define vintf_err(fmt, ...) \
> > + dev_err(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > +#define vintf_info(fmt, ...) \
> > + dev_info(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > +#define vintf_dbg(fmt, ...) \
> > + dev_dbg(vintf->cmdqv->dev, "VINTF%u: " fmt, vintf->idx, ##__VA_ARGS__)
> > +
> > +#define vcmdq_warn(fmt, ...) \
> > + ({ \
> > + struct tegra241_vintf *vintf = vcmdq->vintf; \
> > + if (vintf) \
> > + vintf_warn("VCMDQ%u/LVCMDQ%u: " fmt, \
> > + vcmdq->idx, vcmdq->lidx, \
> > + ##__VA_ARGS__); \
> > + else \
> > + dev_warn(vcmdq->cmdqv->dev, "VCMDQ%u: " fmt, \
> > + vcmdq->idx, ##__VA_ARGS__); \
> > + })

> Some of these are barely used, is it worth all these macros??

Only vcmdq_warn isn't called. But I think it would be useful.
I could also find a place to call it, if that's a must.

> > +
> > +/* Configuring and polling helpers */
> > +#define tegra241_cmdqv_write_config(_owner, _OWNER, _regval) \
> > + ({ \
> > + bool _en = (_regval) & _OWNER##_EN; \
> > + u32 _status; \
> > + int _ret; \
> > + writel((_regval), _owner->base + TEGRA241_##_OWNER##_CONFIG); \
> > + _ret = readl_poll_timeout( \
> > + _owner->base + TEGRA241_##_OWNER##_STATUS, _status, \
> > + _en ? (_regval) & _OWNER##_ENABLED : \
> > + !((_regval) & _OWNER##_ENABLED), \
> > + 1, ARM_SMMU_POLL_TIMEOUT_US); \
> > + if (_ret) \
> > + _owner##_err("failed to %sable, STATUS = 0x%08X\n", \
> > + _en ? "en" : "dis", _status); \
> > + atomic_set(&_owner->status, _status); \
> > + _ret; \
> > + })
>
> I feel like this could be an actual inline function without the macro
> wrapper with a little fiddling.

It would be unrolled to three mostly identical inline functions:
tegra241_cmdqv_write_config(cmdqv, regval)
tegra241_vintf_write_config(vintf, regval)
tegra241_vcmdq_write_config(vcmdq, regval)

> > +
> > +#define cmdqv_write_config(_regval) \
> > + tegra241_cmdqv_write_config(cmdqv, CMDQV, _regval)
> > +#define vintf_write_config(_regval) \
> > + tegra241_cmdqv_write_config(vintf, VINTF, _regval)
> > +#define vcmdq_write_config(_regval) \
> > + tegra241_cmdqv_write_config(vcmdq, VCMDQ, _regval)
>
> More hidden access to stack values

Btw, any reason for forbidding this practice? It will break the
build if something goes wrong, which seems to be pretty easy to
catch.

> > +/**
> > + * struct tegra241_cmdqv - CMDQ-V for SMMUv3
> > + * @smmu: SMMUv3 pointer
> > + * @dev: Device pointer
>
> This should probably be clarified as the device pointer to the ACPI
> companion device

I could drop it and use cmdqv->smmu->dev as your previous remark
suggested.

> > +struct arm_smmu_cmdq *tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
> > +{
..
> > + /* Use SMMU CMDQ if vintfs[0] is uninitialized */
> > + if (!FIELD_GET(VINTF_ENABLED, atomic_read(&vintf->status)))
> > + return &smmu->cmdq;
> > +
> > + /* Use SMMU CMDQ if vintfs[0] has error status */
> > + if (FIELD_GET(VINTF_STATUS, atomic_read(&vintf->status)))
> > + return &smmu->cmdq;
>
> Why atomic_read? The unlocked interaction with
> tegra241_cmdqv_handle_vintf0_error() doesn't seem especially sane IMHO

Race between this get_cmdq() and the isr. Any alternative practice?

> > +static void tegra241_vcmdq_hw_deinit(struct tegra241_vcmdq *vcmdq)
> > +{
> > + u32 gerrorn, gerror;
> > +
> > + if (vcmdq_write_config(0)) {
> > + vcmdq_err("GERRORN=0x%X\n", vcmdq_page0_readl_relaxed(GERRORN));
> > + vcmdq_err("GERROR=0x%X\n", vcmdq_page0_readl_relaxed(GERROR));
> > + vcmdq_err("CONS=0x%X\n", vcmdq_page0_readl_relaxed(CONS));
>
> Less prints, include a unique message about why this is being
> printed..

Something must be wrong if disabling VCMDQ fails, so the prints of
error register values would be helpful. And "failed to disable" is
already printed by the vcmdq_write_config() call. I can merge them
into one vcmdq_err call though.

> > + }
> > + vcmdq_page0_writel_relaxed(0, PROD);
> > + vcmdq_page0_writel_relaxed(0, CONS);
> > + vcmdq_page1_writeq_relaxed(0, BASE);
> > + vcmdq_page1_writeq_relaxed(0, CONS_INDX_BASE);
> > +
> > + gerrorn = vcmdq_page0_readl_relaxed(GERRORN);
> > + gerror = vcmdq_page0_readl_relaxed(GERROR);
> > + if (gerror != gerrorn) {
> > + vcmdq_info("Uncleared error detected, resetting\n");
> > + vcmdq_page0_writel(gerror, GERRORN);
> > + }
> > +
> > + vcmdq_dbg("deinited\n");
> > +}
> > +
> > +static int tegra241_vcmdq_hw_init(struct tegra241_vcmdq *vcmdq)
> > +{
> > + int ret;
> > +
> > + /* Configure and enable the vcmdq */
> > + tegra241_vcmdq_hw_deinit(vcmdq);
> > +
> > + vcmdq_page1_writeq_relaxed(vcmdq->cmdq.q.q_base, BASE);
> > +
> > + ret = vcmdq_write_config(VCMDQ_EN);
> > + if (ret) {
> > + vcmdq_err("GERRORN=0x%X\n", vcmdq_page0_readl_relaxed(GERRORN));
> > + vcmdq_err("GERROR=0x%X\n", vcmdq_page0_readl_relaxed(GERROR));
> > + vcmdq_err("CONS=0x%X\n", vcmdq_page0_readl_relaxed(CONS));
> > + return ret;
>
> Same print?

Yea. Here the prints are for a failure when enabling a VCMDQ. Again,
"failed to enable" is already printed by vcmdq_write_config(). I'll
merge three to one call here too.

> > +static void tegra241_vcmdq_free_smmu_cmdq(struct tegra241_vcmdq *vcmdq)
> > +{
> > + struct tegra241_cmdqv *cmdqv = vcmdq->cmdqv;
> > + struct arm_smmu_queue *q = &vcmdq->cmdq.q;
> > + size_t nents = 1 << q->llq.max_n_shift;
> > +
> > + dmam_free_coherent(cmdqv->smmu->dev, (nents * CMDQ_ENT_DWORDS) << 3,
> > + q->base, q->base_dma);
>
> If we are calling dmam_free, do we really need devm at all?

Hmm. This is a part of SMMU's probe/device_reset(). So, all the
devm calls in cmdqv driver are following the style there, since
the arm-smmu-v3 driver could be rmmod-ed? Though the arm-smmu-v3
driver seems to miss a dmam_free_coherent for its own queues..

> > +tegra241_cmdqv_acpi_probe(struct arm_smmu_device *smmu, int id)
>
> id is a u32.

Ack.

> It might be clearer to just pass in the struct
> acpi_iort_node *?

Well, it felt quite similar to me, yet acpi_iort_node probably
fits the name better.

> > +{
> > + struct tegra241_cmdqv *cmdqv;
> > +
> > + cmdqv = tegra241_cmdqv_find_resource(smmu, id);
> > + if (!cmdqv)
> > + return NULL;
> > +
> > + if (tegra241_cmdqv_probe(cmdqv)) {
> > + if (cmdqv->irq > 0)
> > + devm_free_irq(smmu->dev, cmdqv->irq, cmdqv);
> > + devm_iounmap(smmu->dev, cmdqv->base);
> > + devm_kfree(smmu->dev, cmdqv);
> > + return NULL;
>
> Oh. Please don't use devm at all in this code then, it is not attached
> to a probed driver with the proper scope, devm isn't going to work in
> sensible way.

Mind elaborating "it is not"? This function is called by
arm_smmu_device_acpi_probe and arm_smmu_device_probe.

Thanks for the help!
Nicolin