Re: [PATCH 7/8] membarrier: Remove arm (32) support for SYNC_CORE

From: Catalin Marinas
Date: Wed Jun 16 2021 - 11:46:11 EST

On Wed, Jun 16, 2021 at 04:23:26PM +0100, Russell King wrote:
> On Wed, Jun 16, 2021 at 04:04:56PM +0100, Catalin Marinas wrote:
> > On Wed, Jun 16, 2021 at 02:22:27PM +0100, Russell King wrote:
> > > If it's a problem, then it needs fixing. sys_cacheflush() is used to
> > > implement GCC's __builtin___clear_cache(). I'm not sure who added this
> > > to gcc.
> >
> > I'm surprised that it works. I guess it's just luck that the thread
> > doing the code writing doesn't migrate before the sys_cacheflush() call.
> Maybe the platforms that use ARM MPCore avoid the issue somehow (maybe
> by not using self-modifying code?)

Not sure how widely it is/was used with JITs. In general, I think the
systems at the time were quite tolerant to missing I-cache maintenance
(maybe small caches?). We ran Linux for a while without 826cbdaff297
("[ARM] 5092/1: Fix the I-cache invalidation on ARMv6 and later CPUs").

> > > Likely only in places where we care about I/D coherency - as the data
> > > cache is required to be PIPT on these SMP platforms.
> >
> > We had similar issue with the cache maintenance for DMA. The hack we
> > employed (in cache.S) is relying on the MESI protocol internals and
> > forcing a read/write for ownership before the D-cache maintenance.
> > Luckily ARM11MPCore doesn't do speculative data loads to trigger some
> > migration back.
> That's very similar to the hack that was originally implemented for
> MPCore DMA - see the DMA_CACHE_RWFO configuration option.

Well, yes, that's what I wrote above ;) (I added the hack and config
option IIRC).

> An interesting point here is that cache_ops_need_broadcast() reads
> MMFR3 bits 12..15, which in the MPCore TRM has nothing to with cache
> operation broadcasting - but luckily is documented as containing zero.
> So, cache_ops_need_broadcast() returns correctly (true) here.

That's typical with any new feature. The 12..15 field was added in ARMv7
stating that cache maintenance is broadcast in hardware. Prior to this,
the field was read-as-zero. So it's not luck but we could have avoided
negating the meaning here, i.e. call it cache_ops_are_broadcast().

> > The simpler fix for flush_icache_range() is to disable preemption, read
> > a word in a cacheline to force any dirty lines on another CPU to be
> > evicted and then issue the D-cache maintenance (for those cache lines
> > which are still dirty on the current CPU).
> Is just reading sufficient? If so, why do we do a read-then-write in
> the MPCore DMA cache ops? Don't we need the write to force exclusive
> ownership? If we don't have exclusive ownership of the dirty line,
> how can we be sure to write it out of the caches?

For cleaning (which is the case for I/D coherency), we only need reading
since we are fine with clean lines being left in the D-cache on other
CPUs. For invalidation, we indeed need to force the exclusive ownership,
hence the write.