Re: [PATCH v2] i2c: designware: Fix corrupted memory seen in the ISR

From: Jan Bottorff
Date: Fri Sep 15 2023 - 21:51:30 EST


On 9/15/2023 8:21 AM, Serge Semin wrote:
...

> Based on the patch log and the comment, smp_wmb() seems to be more
> suitable here since the problem looks like SMP-specific. Most
> importantly the smp_wmb() will get to be just the compiler barrier on
> the UP system, so no cache and pipeline flushes in that case.
> Meanwhile
>
> I am not ARM expert, but based on the problem and the DMB/DSB barriers
> descriptions using DMB should be enough in your case since you only
> need memory syncs.

Hi Serge,

I looked at the definition of smp_wmb, and it looks like on arm64 it uses a DMB barrier not a DSB barrier.

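In arch/arm/include/asm/barrier.h (the 32-bit ARM header; the arm64 barrier.h gives the same split, with __wmb() as dsb(st) and __smp_wmb() as dmb(ishst)):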
...
#define __arm_heavy_mb(x...) dsb(x)
...
#if defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
...
#define wmb() __arm_heavy_mb(st)
...
#define __smp_wmb() dmb(ishst)

And then in /include/asm-generic/barrier.h it says:
#ifdef CONFIG_SMP
...
#ifndef smp_wmb
#define smp_wmb() do { kcsan_wmb(); __smp_wmb(); } while (0)
#endif

So on SMP systems wmb() is a DSB and smp_wmb() is a DMB, which means the two are not equivalent there.
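To double-check that reading of the headers, here is a small userspace model of the macro selection. It is only a sketch: the barrier macros expand to the instruction mnemonic as a string instead of emitting the instruction, CONFIG_SMP is assumed to be set, kcsan_wmb() is left out, and the names arm_heavy_mb, arch_smp_wmb, wmb_insn, smp_wmb_insn, barriers_equivalent and check_model are mine, not the kernel's.

```c
#include <assert.h>
#include <string.h>

/* Model only: each barrier macro expands to the mnemonic it would emit,
 * so the selection logic can be checked without building a kernel. */
#define CONFIG_SMP 1                      /* assumption: SMP build */

#define dsb(opt) "dsb " #opt
#define dmb(opt) "dmb " #opt

/* Leading underscores are dropped from the kernel names to stay out of
 * the identifier space C reserves for the implementation. */
#define arm_heavy_mb(opt) dsb(opt)        /* __arm_heavy_mb() in the kernel */

#if defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define wmb()          arm_heavy_mb(st)
#define arch_smp_wmb() dmb(ishst)         /* __smp_wmb() in the kernel */
#endif

#ifdef CONFIG_SMP
#ifndef smp_wmb
#define smp_wmb() arch_smp_wmb()          /* kcsan_wmb() left out of the model */
#endif
#endif

/* Report which instruction each barrier selects in this configuration. */
const char *wmb_insn(void)     { return wmb(); }
const char *smp_wmb_insn(void) { return smp_wmb(); }

/* Nonzero when both barriers select the same instruction. */
int barriers_equivalent(void)
{
    return strcmp(wmb_insn(), smp_wmb_insn()) == 0;
}

/* Self-check: on an SMP build the two barriers must differ. */
void check_model(void)
{
    assert(!barriers_equivalent());
}
```

With CONFIG_SMP defined as above, wmb_insn() yields "dsb st" and smp_wmb_insn() yields "dmb ishst".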

So let's explore whether DMB or DSB is the correct barrier.

The ARM barrier docs I referred to have a specific example that says this:

"In some message passing systems, it is common for one observer to update memory and then send an interrupt using a mailbox of some sort to a second observer to indicate that memory has been updated and the new
contents have been read. Even though the sending of the interrupt using a mailbox might be initiated using a memory access, a DSB barrier
must be used to ensure the completion of previous memory accesses.

Therefore the following sequence is needed to ensure that P2 sees the updated value.

P1:
STR R5, [R1] ; message stored to shared memory location
DSB [ST]
STR R1, [R4] ; R4 contains the address of a mailbox

P2:
; interrupt service routine
LDR R5, [R1]

Even if R4 is a pointer to Strongly-Ordered memory, the update to R1 might not be visible without the DSB executed by P1.
It should be appreciated that these rules are required in connection to the ARM Generic Interrupt Controller (GIC).
"

I don't fully understand why it needs to be a DSB and not just a DMB, but this example matches what happens in the driver. The ARM docs do some hand waving that DSB is required because of the GIC.
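To show how I map that example onto a driver, here is a minimal userspace sketch of the same three steps in C. Everything in it is a hypothetical stand-in (struct xfer_state, mailbox, store_barrier, p1_send, p2_isr); none of it is taken from the i2c driver. It is also single-threaded, so it cannot reproduce the reordering. It only shows where the barrier sits relative to the two stores. On AArch64 the helper emits "dsb st"; elsewhere it falls back to a C11 fence, which orders the stores but does not wait for them to complete.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins: none of these names come from the i2c driver. */
struct xfer_state {
    uint32_t msg;                   /* shared memory written by P1, read by the ISR */
};

static struct xfer_state shared;
static volatile uint32_t mailbox;   /* stands in for the interrupt-triggering register */

/* Userspace approximation of the store barrier in the ARM example. */
static inline void store_barrier(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dsb st" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* P1: publish the message, then ring the mailbox. */
void p1_send(uint32_t value)
{
    shared.msg = value;             /* STR R5, [R1] */
    store_barrier();                /* DSB [ST]     */
    mailbox = 1;                    /* STR R1, [R4] */
}

/* P2: what the interrupt handler does once the mailbox has fired. */
uint32_t p2_isr(void)
{
    assert(mailbox == 1);           /* only valid after the "interrupt" was raised */
    mailbox = 0;
    return shared.msg;              /* LDR R5, [R1] */
}
```

In the driver the first store corresponds to filling in the transfer state, and the second to the register write that lets the interrupt fire.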

Unless we can come up with a reason why this example in the ARM Barrier docs is not a match for what happens in the i2c driver, then ARM is saying it has to be a DSB not a DMB. If it needs to be a DSB then smp_wmb() is insufficient.

Does anybody else have a different interpretation of this section in the ARM barrier docs? They use the word mailbox, and show a shared memory write, an interrupt triggering write, and a read of shared memory on a different core. Some would describe that as a software mailbox.

I did read someplace (although I don't have a specific reference I can give) that ordering applied to normal memory writes is in a different group than ordering applied between strongly ordered accesses. The excerpt from the ARM barrier document above does say "Even if R4 is a pointer to Strongly-Ordered memory, the update to R1 might not be visible without the DSB executed by P1", which implies a DMB is insufficient to cause ordering between normal memory writes and strongly-ordered device memory writes.

I know that currently on ARM64 Windows, the low-level kernel device MMIO access functions (like WRITE_REGISTER_ULONG) all have a DSB before the MMIO memory access. That seems a little heavy-handed to me, but it may also be that it was required to get all the existing driver code written for AMD/Intel processors to work correctly on ARM64 without adding barriers in the drivers. There are also non-barrier variants that can be used if a driver wants to optimize performance. Defaulting to correct operation with minimal code changes would reduce the risk to delivery schedules.
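As a rough illustration of that design, here is a sketch of a fenced accessor next to a non-fenced one. The names mmio_write32 and mmio_write32_nofence are hypothetical, not the Windows or Linux API, and the "register" in any test is just an ordinary variable. On AArch64 the helper emits "dsb st"; on other targets it falls back to a C11 fence as an approximation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Heavy store barrier: "dsb st" on AArch64, a C11 fence elsewhere. */
static inline void heavy_store_barrier(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dsb st" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Default accessor: earlier stores are fenced before the register write,
 * so the caller does not need its own barrier. */
void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    assert(reg != 0);
    heavy_store_barrier();
    *reg = value;
}

/* Opt-in fast path: no barrier, the caller takes over the ordering. */
void mmio_write32_nofence(volatile uint32_t *reg, uint32_t value)
{
    assert(reg != 0);
    *reg = value;
}
```

The trade-off is the one described above: the default is safe for code that was written without barriers in mind, and the unfenced variant is there for hot paths where the driver author has worked out the ordering.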

Linux doesn't seem to make any attempt to have barriers in the low-level MMIO access functions. If Linux had chosen to do that on ARM64, this patch would not have been required. For a low-speed device like an i2c controller, optimizing barriers likely makes little difference in performance.

Let's look at it from a risk analysis viewpoint. Say a DMB is sufficient and we use the stronger DSB variant: the downside is that a few CPU cycles will be wasted in i2c transfers. Say we use a DMB when a DSB is required for correct operation: the downside is that i2c operations may malfunction. In this case, using a few extra CPU cycles for an operation that does not happen at high frequency is lower risk than failures in i2c transfers. If there is any uncertainty about which barrier type to use, picking DSB over DMB would be better. We determined from the include fragments above that wmb() gives a DSB and smp_wmb() does not.

Based on the above info, I think wmb() is still the correct function, and a change to smp_wmb() would not be correct.

Sorry for the long message, I know some of you will be inspired to think deeply about barriers, and some will be annoyed that I spent this much space to explain how I came to the choice of wmb().

Thanks,
Jan