RE: touch_cache() only touches two thirds

From: Bela Lubkin
Date: Fri Nov 10 2006 - 19:15:21 EST


Andi Kleen wrote:

> "Bela Lubkin" <blubkin@xxxxxxxxxx> writes:
> >
> > /*
> > * Dirty a big buffer in a hard-to-predict (for the L2 cache) way. This
> > * is the operation that is timed, so we try to generate unpredictable
> > * cachemisses that still end up filling the L2 cache:
> > */
>
> The comment is misleading anyways. AFAIK several of the modern
> CPUs (at least K8, later P4s, Core2, POWER4+, PPC970) have prefetch
> predictors advanced enough to follow several streams forward and backwards
> in parallel.
>
> I hit this while doing NUMA benchmarking for example.
>
> Most likely to be really unpredictable you need to use a
> true RND and somehow make sure still the full cache range
> is covered.

The corrected code in <http://bugzilla.kernel.org/show_bug.cgi?id=7476#c4>
covers the full cache range. Granted that modern CPUs may be able to track
multiple simultaneous cache access streams: how many such streams are they
likely to be able to follow at once? It seems like going from 1 to 2 would
be a big win, 2 to 3 a small win, beyond that it wouldn't likely make much
incremental difference. So what do the actual implementations in the field
support?

The code (original and corrected) uses 6 simultaneous streams.

I have a modified version that takes a `ways' parameter to use an arbitrary
number of streams. I'll post that onto bugzilla.kernel.org.

>Bela<
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/