Re: [PATCH] x86: eliminate redundant/contradicting cache line size config options

From: Nick Piggin
Date: Thu Nov 19 2009 - 05:01:06 EST

In reply to: Jan Beulich: "Re: [PATCH] x86: eliminate redundant/contradicting cache line size config options"
Next in thread: Arjan van de Ven: "Re: [PATCH] x86: eliminate redundant/contradicting cache line size config options"

On Thu, Nov 19, 2009 at 08:38:14AM +0000, Jan Beulich wrote:
> >>> Nick Piggin <npiggin@xxxxxxx> 19.11.09 09:13 >>>
> >On Wed, Nov 18, 2009 at 08:52:40PM -0800, Arjan van de Ven wrote:
> >Basically what I think we should do is consider L1_CACHE_BYTES to be
> >*the* correct default value to use for 1) avoiding false sharing (which
> >seems to be the most common use), and 2) optimal and repeatable per-object
> >packing into cachelines (which is more of a micro-optimization to be
> >applied carefully to really critical structures).
>
> But then this really shouldn't be called L1_CACHE_... Though I realize
> that the naming seems to already be broken - looking over the cache
> line specifiers for CPUID leaf 2, there's really no L1 with 128 byte lines,
> just two L2s.

Yes, I agree L1_CACHE is not the best name. In what situation would
you *only* care about the L1 cache line size, without knowing any other
line sizes? IMO only when you also know other details of the L1 cache,
such as its size, and are writing a cache-blocking algorithm or
something similar. We don't really do that in the kernel, especially
not in generic code.
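As a rough illustration of the false-sharing use of L1_CACHE_BYTES
quoted above, here is a minimal userspace C11 sketch. It assumes a
64-byte line through a hypothetical ASSUMED_LINE_BYTES constant; in the
kernel the value comes from L1_CACHE_BYTES (on x86, derived from
CONFIG_X86_L1_CACHE_SHIFT), and a structure would typically be annotated
with ____cacheline_aligned_in_smp rather than using alignas directly.
The function names are likewise hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* HYPOTHETICAL stand-in for the kernel's L1_CACHE_BYTES; 64 is assumed. */
#define ASSUMED_LINE_BYTES 64
#define NR_SLOTS 4

/* Each slot is aligned to the assumed line size. Because sizeof() of a
 * struct is always a multiple of its alignment, adjacent array elements
 * start on different lines, so writers to different slots never share one. */
struct padded_counter {
    alignas(ASSUMED_LINE_BYTES) unsigned long value;
};

/* The same data without padding: several slots can land in one line, so
 * a write to one slot steals the line from writers of its neighbours. */
struct packed_counter {
    unsigned long value;
};

static_assert(sizeof(struct padded_counter) % ASSUMED_LINE_BYTES == 0,
              "padded slots must fill whole lines");

static struct padded_counter padded[NR_SLOTS];
static struct packed_counter packed[NR_SLOTS];

/* Byte distance between two neighbouring slots of the padded array. */
ptrdiff_t padded_stride(void)
{
    return (char *)&padded[1] - (char *)&padded[0];
}

/* Byte distance between two neighbouring slots of the packed array. */
ptrdiff_t packed_stride(void)
{
    return (char *)&packed[1] - (char *)&packed[0];
}

/* How many packed slots fit in one assumed line (worst-case sharers). */
size_t packed_slots_per_line(void)
{
    return ASSUMED_LINE_BYTES / sizeof(struct packed_counter);
}
```

With the assumed 64-byte line, padded_stride() returns 64 while
packed_stride() returns sizeof(unsigned long), so on a 64-bit build up
to eight packed counters can share one line. The padded layout trades
memory for isolation, which is the trade-off discussed in this thread.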

> One question however is whether e.g. cache line ping-pong between
> L3s is really costing that much on non-NUMA, as opposed to it
> happening between L1s.

Well, I think we still need to work to minimise intra-chip bouncing,
even though it is far cheaper than inter-chip bouncing. It is still
costly, and it probably costs more power too. And as core counts
continue to increase, I think even intra-chip bouncing costs are going
to become important (if I remember right, the 8-core Nehalem already
lacks a truly unified L3 cache with crossbars to each core; instead it
has 8 L3 caches connected by ring buses).

I don't think it makes much sense to add complexity just to say "oh,
we don't care about bouncing between threads on a core or cores on a
chip", because I haven't seen anywhere that would give us a significant
data size benefit. It often slows down straight-line performance too
(e.g. a per-cpu variable can often be updated non-atomically, but as
soon as you share it even between threads on one core, you have to
start using atomics).
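The per-cpu point can be illustrated with a small userspace sketch that
uses POSIX threads and C11 atomics as stand-ins for kernel per-cpu
variables (the kernel's own DEFINE_PER_CPU / this_cpu_* machinery is not
modelled here). When each thread owns a padded slot, a plain increment
is enough and the total is folded after the threads finish; once every
thread updates one shared counter, each increment must be an atomic
read-modify-write. ASSUMED_LINE_BYTES, count_percpu_style() and
count_shared_atomic() are hypothetical names, and the 64-byte line is an
assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL line size standing in for L1_CACHE_BYTES; 64 is assumed. */
#define ASSUMED_LINE_BYTES 64
#define MAX_THREADS 16

/* "Per-cpu style" slot: only its owning thread writes it, so a plain
 * increment is safe, and the padding keeps owners off each other's lines. */
struct slot {
    alignas(ASSUMED_LINE_BYTES) unsigned long count;
};
static_assert(sizeof(struct slot) == ASSUMED_LINE_BYTES, "one slot per line");

static struct slot slots[MAX_THREADS];
static atomic_ulong shared_count;   /* one counter shared by every thread */
static unsigned long iterations;    /* set before threads start */

static void *percpu_worker(void *p)
{
    size_t id = *(size_t *)p;
    for (unsigned long i = 0; i < iterations; i++)
        slots[id].count++;          /* no atomics: the slot is private */
    return NULL;
}

static void *shared_worker(void *p)
{
    (void)p;
    for (unsigned long i = 0; i < iterations; i++)
        atomic_fetch_add_explicit(&shared_count, 1, memory_order_relaxed);
    return NULL;
}

/* Runs up to nthreads workers and returns the total count they produced. */
static unsigned long run(bool shared, size_t nthreads, unsigned long iters)
{
    pthread_t tid[MAX_THREADS];
    size_t ids[MAX_THREADS];
    size_t started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    iterations = iters;
    atomic_store(&shared_count, 0);
    for (size_t i = 0; i < nthreads; i++) {
        slots[i].count = 0;
        ids[i] = i;
        if (pthread_create(&tid[i], NULL,
                           shared ? shared_worker : percpu_worker,
                           &ids[i]) != 0)
            break;
        started++;
    }
    for (size_t i = 0; i < started; i++)
        pthread_join(tid[i], NULL);  /* join orders slot writes before reads */

    if (shared)
        return atomic_load(&shared_count);
    unsigned long sum = 0;           /* fold the private slots after join */
    for (size_t i = 0; i < started; i++)
        sum += slots[i].count;
    return sum;
}

unsigned long count_percpu_style(size_t nthreads, unsigned long iters)
{
    return run(false, nthreads, iters);
}

unsigned long count_shared_atomic(size_t nthreads, unsigned long iters)
{
    return run(true, nthreads, iters);
}
```

Built with -pthread, count_percpu_style(4, 100000) and
count_shared_atomic(4, 100000) should both return 400000. The difference
is that only the shared variant pays for an atomic read-modify-write on
every iteration (a locked instruction on x86), and its single line keeps
bouncing between whichever cores or hardware threads run the workers.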
