Re: [PATCH v6 2/6] iommufd: Add IOMMU_HWPT_INVALIDATE

From: Jason Gunthorpe
Date: Fri Dec 01 2023 - 15:43:50 EST


On Fri, Dec 01, 2023 at 11:58:07AM -0800, Nicolin Chen wrote:
> > It seems there is not a simple way to realize this error back to
> > userspace since we can't block the global command queue and we proceed
> > to complete commands that the real HW would not have completed.
> >
> > To actually emulate this the gerror handler would have to capture all
> > the necessary registers, return them back to the thread doing
> > invalidate_user and all of that would have to return back to userspace
> > to go into the virtual version of all the same registers.
> >
> > Yes, it can be synchronous it seems, but we don't have any
> > infrastructure in the driver to do this.
> >
> > Given this is pretty niche maybe we just don't support error
> > forwarding and simply ensure it could be added to the uapi later?
>
> If arm_smmu_cmdq_issue_cmdlist in arm_smmu_cache_invalidate_user
> fails with ETIMEOUT, we polls the CONS register to get the error
> code. This can cover CERROR_ABT and CERROR_ATC_INV.

Why is timeout linked to these two? Or rather, it doesn't have to be
linked like that. Any gerror is effectively synchronous because it
halts the queue and allows SW time to inspect which command failed and
record the gerror flags. So each and every command can get an error
indication.

Restarting the queue is done by putting sync in there to effectively
nop the failed command and we hope for the best and let it rip.

> As you remarked that we can't block the global CMDQ, so we have
> to let a real CERROR_ILL go. Yet, we can make sure commands to
> be fully sanitized before being issued, as we should immediately
> reject faulty commands anyway, for errors such as unsupported op
> codes, unzero-ed reserved fields, and unlinked vSIDs. This can
> at least largely reduce the probability of a real CERROR_ILL.

I'm more a little more concerend with ATC_INV as a malfunctioning
device can trigger this..

> So, combining these two, we can still have a basic synchronous
> way by returning an errno to the invalidate ioctl? I see Kevin
> replied something similar too.

It isn't enough information, you don't know which gerror bits to set
and you don't know what cons index to stick to indicate the error
triggering command with just a simple errno.

It does need to return a bunch of data to get it all right.

Though again, there is no driver infrastructure to do all this and it
doesn't seem that important so maybe we can just ensure there is a
future extension possiblity and have userspace understand an errno
means to generate CERROR_ILL on the last command in the batch?

Jason