Re: [PATCH 0/2] jump label: 2.6.38 updates

From: Will Newton
Date: Wed Feb 16 2011 - 05:16:09 EST


On Tue, Feb 15, 2011 at 9:56 PM, Will Simoneau <simoneau@xxxxxxxxxxx> wrote:
> On 13:27 Tue 15 Feb     , David Miller wrote:
>> From: Will Simoneau <simoneau@xxxxxxxxxxx>
>> Date: Tue, 15 Feb 2011 16:11:23 -0500
>>
>> > Note how the cache and cache coherence protocol are fundamental parts of this
>> > operation; if these instructions simply bypassed the cache, they *could not*
>> > work correctly - how do you detect when the underlying memory has been
>> > modified?
>>
>> The issue here isn't L2 cache bypassing, it's local L1 cache bypassing.
>>
>> The chips in question apparently do not consult the local L1 cache on
>> "ll" instructions.
>>
>> Therefore you must only ever access such atomic data using "ll"
>> instructions.
>
> (I should not have said "underlying memory", since it is correct that
> only the L1 caches are the problem here)
>
> That's some really crippled hardware... it does seem like *any* loads
> from *any* address updated by an sc would have to be done with ll as
> well, else they may load stale values. One could work this into
> atomic_read(), but surely there are other places that are problems.

I think it's actually OK: atomics have arch-implemented accessors, as
do spinlocks and atomic bitops. Those are the only places we do sc, so
we can make sure we always ll or invalidate manually.
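
To make that concrete, here is a rough toy model in plain C. It is only a
sketch under my own assumptions, not the real hardware and not kernel code:
one word of memory, one core's L1 copy, "ll" and "sc" that act beyond the
cache, and a manual invalidate. All of the toy_* names are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical): one word as seen beyond the cache, plus one
 * core's private L1 copy of it. */
struct toy_word {
	int mem;        /* value held beyond the cache, where ll/sc operate */
	int l1;         /* this core's cached copy */
	bool l1_valid;  /* cleared by a manual invalidate */
};

/* Plain load: served from L1 when the line is valid, else refilled. */
static int toy_plain_load(struct toy_word *w)
{
	if (!w->l1_valid) {
		w->l1 = w->mem;
		w->l1_valid = true;
	}
	return w->l1;
}

/* "ll": does not consult L1, reads the value beyond the cache. */
static int toy_ll(const struct toy_word *w)
{
	return w->mem;
}

/* "sc": always succeeds in this single-threaded toy. It updates memory
 * only and does not touch or invalidate the L1 copy. */
static void toy_sc(struct toy_word *w, int v)
{
	w->mem = v;
}

/* Manual invalidate of the L1 line. */
static void toy_invalidate(struct toy_word *w)
{
	w->l1_valid = false;
}

/* Accessors in the spirit of arch-implemented atomics: always use ll. */
static int toy_atomic_read(const struct toy_word *w)
{
	return toy_ll(w);
}

static void toy_atomic_inc(struct toy_word *w)
{
	toy_sc(w, toy_ll(w) + 1);
}

/* Walk through the stale-read case and both ways of avoiding it. */
static void toy_demo(void)
{
	struct toy_word w = { .mem = 0, .l1 = 0, .l1_valid = false };

	assert(toy_plain_load(&w) == 0);   /* pulls the line into L1 */
	toy_atomic_inc(&w);                /* sc updates memory only */
	assert(toy_plain_load(&w) == 0);   /* stale, but never garbage */
	assert(toy_atomic_read(&w) == 1);  /* ll sees the new value */
	toy_invalidate(&w);
	assert(toy_plain_load(&w) == 1);   /* plain load is fresh again */
}
```

The point of the sketch is only that a plain load after an sc can return the
old value, while going through the accessor (ll) or invalidating first
returns the new one.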

> It would be OK if the caches on the hardware in question were to
> back-invalidate matching cachelines when the sc is snooped from the bus,
> but it sounds like this doesn't happen?

Yes, it's possible to manually invalidate the line, but it is not
automatic. Manual invalidation is actually quite reasonable in many
cases because you never see a bad value, just a potentially stale one,
so many of the races are harmless in practice.

I think you're correct in your comments re multi-processor cache
coherence and the performance problems associated with not doing ll/sc
in the cache. I believe some of the reasoning behind the current
implementation is to allow different processors in the same SoC to
participate in the atomic store protocol without having a fully
coherent cache (and implementing a full cache coherence protocol).
It's my understanding that the ll/sc is implemented somewhere beyond
the cache in the bus fabric.