Re: [PATCH 4/4] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()

From: John Garry
Date: Thu Jul 16 2020 - 06:28:22 EST


On 16/07/2020 11:20, Will Deacon wrote:
On Tue, Jun 23, 2020 at 01:28:40AM +0800, John Garry wrote:
It has been shown that the cmpxchg() for finding space in the cmdq can
be a bottleneck:
- as more CPUs contend for the cmdq, the cmpxchg() will fail more often
- since the software-maintained cons pointer is updated in the same 64-bit
memory region, the chance of cmpxchg() failure increases again

The cmpxchg() is removed as part of 2 related changes:

- Update prod and cmdq owner in a single atomic add operation. For this, we
count the prod and owner in separate regions in prod memory.

As with simple binary counting, once the prod+wrap fields overflow, they
wrap to zero. They should never overflow into the "owner" region, and we
zero the non-owner, prod region for each owner. This maintains the prod
pointer.

As for the "owner", we now count this value, instead of setting a flag.
Similar to before, once the owner has finished gathering, it will clear
a mask. As such, a CPU declares itself as the "owner" when it reads zero
for this region. This zeroing will also clear possible overflow in
wrap+prod region, above.

The owner is now responsible for all cmdq locking to avoid possible
deadlock. The owner will lock the cmdq for all non-owners it has gathered
when they have space in the queue and have written their entries.
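
A minimal sketch of the single-add scheme described above, written with C11
atomics rather than the kernel's atomic API so that it stands alone. The bit
layout, the guard bit between the two regions, and every name here
(PROD_BITS, OWNER_UNIT, cmdq_claim_slots(), cmdq_stop_gathering()) are
assumptions for illustration and are not taken from the patch. Memory
ordering and the cmdq locking are ignored.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout of the shared 32-bit word (not the driver's):
 *   bits  0..15  prod index plus wrap bit
 *   bit   16     guard bit that absorbs a carry out of the prod field
 *   bits 17..31  count of CPUs that have joined the current batch
 */
#define PROD_BITS   16u
#define PROD_MASK   ((UINT32_C(1) << PROD_BITS) - 1u)
#define OWNER_SHIFT (PROD_BITS + 1u)
#define OWNER_UNIT  (UINT32_C(1) << OWNER_SHIFT)

struct cmdq_claim {
	uint32_t prod;   /* first slot (wrap + index) handed to this caller */
	bool     owner;  /* true if the owner region read as zero */
};

/* Reserve n slots and join the batch with one atomic add, no retry loop. */
static struct cmdq_claim cmdq_claim_slots(_Atomic uint32_t *word, uint32_t n)
{
	uint32_t old = atomic_fetch_add_explicit(word, OWNER_UNIT + n,
						 memory_order_relaxed);
	struct cmdq_claim c = {
		.prod  = old & PROD_MASK,
		.owner = (old >> OWNER_SHIFT) == 0,
	};
	return c;
}

/*
 * Owner only: stop gathering by zeroing the owner count and the guard bit.
 * Returns the prod value at which the gathered batch ends.
 */
static uint32_t cmdq_stop_gathering(_Atomic uint32_t *word)
{
	uint32_t old = atomic_fetch_and_explicit(word, PROD_MASK,
						 memory_order_relaxed);
	return old & PROD_MASK;
}
```

In this sketch the first caller after a cmdq_stop_gathering() sees an owner
count of zero and becomes the next owner. A single guard bit only absorbs one
carry per batch, so the sketch relies on the same kind of bound on commands
in flight that the commit message describes.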

- Check for space in the cmdq after the prod pointer has been assigned.

We don't bother checking for space in the cmdq before assigning the prod
pointer, as this would be racy.

Since the prod pointer is updated unconditionally, it would be common
for no space to be available in the cmdq when prod is assigned - that
is, according to the software-maintained prod and cons pointers. So it
must now be ensured that the entries are not written until there is
space.

How the prod pointer is maintained also leads to a strange condition
where the prod pointer can wrap past the cons pointer. We can detect this
condition, and report no space here. However, a prod pointer that has
progressed twice past the cons pointer cannot be detected. But it can be
ensured that this scenario does not occur, as we limit the number of
commands any CPU can issue at any given time, such that we cannot
progress the prod pointer further.
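
The space check after prod has been assigned could look roughly like the
following. This is a sketch under assumptions, not the patch's code: the
queue size (Q_SHIFT) and the helper name cmdq_has_space() are hypothetical,
and prod and cons are taken to be index-plus-wrap-bit values as in the
description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define Q_SHIFT 8u                           /* hypothetical: 256 entries */
#define Q_SIZE  (UINT32_C(1) << Q_SHIFT)
#define Q_MASK  ((Q_SIZE << 1) - 1u)         /* index bits plus one wrap bit */

/*
 * prod is the first slot already assigned to the caller, cons is the
 * software-maintained consumer pointer, n is the number of entries the
 * caller wants to write starting at prod.
 */
static bool cmdq_has_space(uint32_t prod, uint32_t cons, uint32_t n)
{
	/* Entries between cons and the caller's first slot, modulo 2 * size. */
	uint32_t used = (prod - cons) & Q_MASK;

	/*
	 * used > Q_SIZE means prod has wrapped past cons once: report no
	 * space. A second wrap aliases back into the valid range and cannot
	 * be told apart from an almost empty queue.
	 */
	return used <= Q_SIZE && n <= Q_SIZE - used;
}
```

A caller would poll this with a refreshed cons value and only write its
entries once it returns true. The aliasing on a second wrap is why the number
of outstanding commands has to be bounded elsewhere.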

Signed-off-by: John Garry <john.garry@xxxxxxxxxx>
---
drivers/iommu/arm-smmu-v3.c | 101 ++++++++++++++++++++++--------------
1 file changed, 61 insertions(+), 40 deletions(-)

I must admit, you made me smile putting trivial@xxxxxxxxxx on cc for this ;)


Yes, quite ironic :)