Re: [PATCH v3 1/1] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd()

From: Robin Murphy
Date: Wed Dec 08 2021 - 13:40:39 EST


On 2021-12-08 18:17, John Garry wrote:
Did you notice any performance change with this change?

Hi John:
   Thanks for the tip. I wrote a test case today, and I found that the
performance did not go up but down.

I very quickly tested on a DMA mapping benchmark very similar to the kernel DMA benchmark module - I got mixed results. For fewer CPUs (<8), a small improvement, like 0.7%. For more CPUs, a dis-improvement - that's surprising, I did expect just no change as any improvement would get dwarfed from the slower unmap rates for more CPUs. I can check this
more tomorrow.

It's so weird. So I decided not to
change it, because it's also poorly readable. So I plan to make only
the following modifications:
@@ -237,7 +237,7 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
  static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
  {
         memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
-       cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
+       cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);

         switch (ent->opcode) {
         case CMDQ_OP_TLBI_EL2_ALL:

This prevents the compiler from generating the following two inefficient
instructions:
      394:       f9400002        ldr     x2, [x0]    //x2 = cmd[0]
      398:       aa020062        orr     x2, x3, x2    //x3 = FIELD_PREP(CMDQ_0_OP, ent->opcode)

Maybe it's not worth changing because I've only seen a 0.x nanosecond reduction
in performance. But one thing is, it only comes with benefits, no side effects.


I just think that with the original code that cmd[] is on the stack and cached, so if we have write-back attribute (which I think we do) then there would not necessarily a write to external memory per write to cmd[].

So, apart from this approach, I think that if we can just reduce the instructions through other efficiencies in the function then that would be good.

Not sure if it's still true, but FWIW last time the best result actually came from doing the ridiculously counter-intuitive:

https://lore.kernel.org/linux-iommu/141de3c3278e280712d16d9ac9ab305c3b80a810.1534344167.git.robin.murphy@xxxxxxx/

Robin.