Re: [PATCH v3] crc32c: Implement CRC32c with slicing-by-8 algorithm

From: Joakim Tjernlund
Date: Mon Oct 03 2011 - 16:13:31 EST

"Darrick J. Wong" <djwong@xxxxxxxxxx> wrote on 2011/10/03 18:00:36:
> On Sat, Oct 01, 2011 at 03:52:00PM +0200, Joakim Tjernlund wrote:
> > "Darrick J. Wong" <djwong@xxxxxxxxxx> wrote on 2011/09/30 18:12:23:
> > >
> > > [putting mailing lists on cc]
> > >
> > > <shrug> I suppose I could make CRC32C_BITS configurable. What is the hardware
> > > profile of your ppc32 processor? How much L1D/L2 cache? slice-by-8 does have
> > > a big cache footprint. On the other hand it's faster than the slice-by-4
> > > (crc32) and Sarwate (crc32c) code in the kernel, even on old slow 32-bit x86
> > > processors (PII, PIII, P4).
> >
> > It is a low end embedded 333 MHz CPU with only L1 cache. How much faster
> > is slice by 8 than slice by 4 on these old x86 machines?
> How much L1 cache? Or, if you'd rather not give away specifics, has the CPU
> more than 8KB L1 cache? I'm willing to concede that with little cache the
> added memory pressure could be painful.
> As for the old x86 machines, please have a look at:
> ~15% faster on a 2GHz Via C7
> ~20% faster on a 2.7GHz P4
> ~25% faster on a 500MHz P3
> I vaguely recall it was ~20% faster on a 400MHz P2, but all the
> wikis are still down. :(
> So I suspect the key factor here is memory hierarchy, since all of those systems
> have at least 16K of L1 cache. Slice by 8 might actually suck on a Pentium
> Pro or earlier. Unfortunately I don't have anything older than a PII...

This CPU has 16KB of L1 cache. I don't know why slice-by-8 was so much slower
here. It could be a gcc thing, as gcc does a fairly lame job of optimizing crc32.
I still think making this configurable is a good thing, at least until the
verdict is in from other CPUs.
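
For reference, the cache footprint discussed above follows from the table
sizes: a Sarwate (byte-at-a-time) CRC needs one 256-entry table of 32-bit
words (1KB), slice-by-4 needs four (4KB), and slice-by-8 needs eight (8KB),
which is half of a 16KB L1 data cache. Below is a minimal userspace sketch of
the two extremes, not the code from the patch. The names crc32c_tab,
crc32c_init, crc32c_sarwate, crc32c_slice8 and crc32c_selfcheck are
illustrative assumptions, and the functions invert the CRC on entry and exit
(the common standalone CRC-32C convention), which may differ from how the
kernel helpers handle the seed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_LE 0x82F63B78u  /* Castagnoli polynomial, bit-reflected */

/* 8 tables x 256 entries x 4 bytes = 8KB. Table 0 alone (1KB) is all
 * that the Sarwate variant touches. */
static uint32_t crc32c_tab[8][256];

static void crc32c_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? CRC32C_POLY_LE : 0);
        crc32c_tab[0][i] = crc;
    }
    /* Table t holds the CRC of byte i followed by t zero bytes. */
    for (uint32_t i = 0; i < 256; i++) {
        for (int t = 1; t < 8; t++) {
            uint32_t prev = crc32c_tab[t - 1][i];
            crc32c_tab[t][i] = (prev >> 8) ^ crc32c_tab[0][prev & 0xff];
        }
    }
}

/* Sarwate: one input byte and one table lookup per iteration. */
static uint32_t crc32c_sarwate(uint32_t crc, const unsigned char *p, size_t len)
{
    crc = ~crc;
    while (len--)
        crc = (crc >> 8) ^ crc32c_tab[0][(crc ^ *p++) & 0xff];
    return ~crc;
}

/* Slice-by-8: eight input bytes and eight independent lookups per
 * iteration. Bytes are assembled explicitly so the sketch does not
 * depend on host endianness or on unaligned loads. */
static uint32_t crc32c_slice8(uint32_t crc, const unsigned char *p, size_t len)
{
    crc = ~crc;
    while (len >= 8) {
        uint32_t lo = crc ^ ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                             (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
        uint32_t hi = (uint32_t)p[4] | (uint32_t)p[5] << 8 |
                      (uint32_t)p[6] << 16 | (uint32_t)p[7] << 24;

        crc = crc32c_tab[7][lo & 0xff] ^ crc32c_tab[6][(lo >> 8) & 0xff] ^
              crc32c_tab[5][(lo >> 16) & 0xff] ^ crc32c_tab[4][lo >> 24] ^
              crc32c_tab[3][hi & 0xff] ^ crc32c_tab[2][(hi >> 8) & 0xff] ^
              crc32c_tab[1][(hi >> 16) & 0xff] ^ crc32c_tab[0][hi >> 24];
        p += 8;
        len -= 8;
    }
    /* Tail: fall back to the byte-at-a-time loop. */
    while (len--)
        crc = (crc >> 8) ^ crc32c_tab[0][(crc ^ *p++) & 0xff];
    return ~crc;
}

/* Both variants must reproduce the standard CRC-32C check value for
 * the ASCII string "123456789". */
static void crc32c_selfcheck(void)
{
    static const unsigned char msg[] = "123456789";

    crc32c_init();
    assert(crc32c_sarwate(0, msg, 9) == 0xE3069283u);
    assert(crc32c_slice8(0, msg, 9) == 0xE3069283u);
}
```

The eight lookups per iteration have no data dependency on each other, which
is where the speedup comes from on CPUs that can overlap them. The cost is
that all eight tables compete with the data being checksummed for L1 space,
so a compile-time switch between table counts (as suggested for CRC32C_BITS)
would let small-cache targets pick the 1KB or 4KB variant.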

