Re: [PATCH v3] crc32c: Implement CRC32c with slicing-by-8 algorithm

From: Joakim Tjernlund
Date: Mon Oct 03 2011 - 16:13:31 EST

In reply to: Darrick J. Wong: "Re: [PATCH v3] crc32c: Implement CRC32c with slicing-by-8 algorithm"

"Darrick J. Wong" <djwong@xxxxxxxxxx> wrote on 2011/10/03 18:00:36:
>
> On Sat, Oct 01, 2011 at 03:52:00PM +0200, Joakim Tjernlund wrote:
> > "Darrick J. Wong" <djwong@xxxxxxxxxx> wrote on 2011/09/30 18:12:23:
> > >
> > > [putting mailing lists on cc]

[SNIP]

> > >
> > > <shrug> I suppose I could make CRC32C_BITS configurable. What is the hardware
> > > profile of your ppc32 processor? How much L1D/L2 cache? slice-by-8 does have
> > > a big cache footprint. On the other hand it's faster than the slice-by-4
> > > (crc32) and Sarwate (crc32c) code in the kernel, even on old slow 32-bit x86
> > > processors (PII, PIII, P4).
> >
> > It is a low end embedded 333 MHz CPU with only L1 cache. How much faster
> > is slice by 8 than slice by 4 on these old x86 machines?
>
> How much L1 cache? Or, if you'd rather not give away specifics, has the CPU
> more than 8KB L1 cache? I'm willing to concede that with little cache the
> added memory pressure could be painful.
>
> As for the old x86 machines, please have a look at:
> http://djwong.org/docs/ext4_metadata_checksums.html#Benchmarking
>
> ~15% faster on a 2GHz Via C7
> ~20% faster on a 2.7GHz P4
> ~25% faster on a 500MHz P3
>
> I vaguely recall it was ~20% faster on a 400MHz P2, but all the kernel.org
> wikis are still down. :(
>
> So I suspect the key factor here is memory hierarchy, since all of those systems
> have at least 16K of L1 cache. Slice by 8 might actually suck on a Pentium
> Pro or earlier. Unfortunately I don't have anything older than a PII...

It is 16KB cache on this CPU. I don't know why it was so much slower. Could be a
gcc thing, as gcc does a fairly lame job of optimizing crc32. I still think making this
configurable is a good thing, at least until the verdict is in from other CPUs.
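
To make the cache trade-off concrete, here is a rough sketch of the two ends of
the range being discussed. It is not the code from the patch: the function and
macro names are made up for illustration, and it only assumes the usual
reflected CRC32c polynomial (0x82F63B78). Each 256-entry table of 32-bit words
is 1 KiB, so bytewise (Sarwate) touches 1 KiB, slice-by-4 would touch 4 KiB and
slice-by-8 touches 8 KiB, which is half of a 16KB cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_LE 0x82F63B78u   /* reflected Castagnoli polynomial */
#define MAX_SLICES     8             /* illustrative name, not from the patch */

static uint32_t crc32c_tab[MAX_SLICES][256];

/* Table bytes touched by an N-slice variant: N tables of 256 32-bit words. */
static size_t crc32c_table_bytes(unsigned slices)
{
    return (size_t)slices * 256 * sizeof(uint32_t);
}

/* Build table 0 bit by bit, then derive tables 1..7 from the previous one. */
static void crc32c_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? CRC32C_POLY_LE : 0);
        crc32c_tab[0][i] = crc;
    }
    for (uint32_t i = 0; i < 256; i++) {
        for (int t = 1; t < MAX_SLICES; t++) {
            uint32_t prev = crc32c_tab[t - 1][i];
            crc32c_tab[t][i] = (prev >> 8) ^ crc32c_tab[0][prev & 0xff];
        }
    }
}

/* Sarwate: one table lookup per input byte, uses table 0 only (1 KiB). */
static uint32_t crc32c_bytewise(uint32_t crc, const uint8_t *p, size_t len)
{
    crc = ~crc;
    while (len--)
        crc = (crc >> 8) ^ crc32c_tab[0][(crc ^ *p++) & 0xff];
    return ~crc;
}

/* Load four bytes as a little-endian word, independent of host endianness. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Slice-by-8: eight lookups per 8 input bytes, spread over all 8 tables. */
static uint32_t crc32c_slice8(uint32_t crc, const uint8_t *p, size_t len)
{
    crc = ~crc;
    while (len >= 8) {
        uint32_t lo = crc ^ load_le32(p);
        uint32_t hi = load_le32(p + 4);
        crc = crc32c_tab[7][lo & 0xff] ^ crc32c_tab[6][(lo >> 8) & 0xff] ^
              crc32c_tab[5][(lo >> 16) & 0xff] ^ crc32c_tab[4][lo >> 24] ^
              crc32c_tab[3][hi & 0xff] ^ crc32c_tab[2][(hi >> 8) & 0xff] ^
              crc32c_tab[1][(hi >> 16) & 0xff] ^ crc32c_tab[0][hi >> 24];
        p += 8;
        len -= 8;
    }
    while (len--)   /* leftover tail bytes fall back to the bytewise step */
        crc = (crc >> 8) ^ crc32c_tab[0][(crc ^ *p++) & 0xff];
    return ~crc;
}

/* Both variants must agree; aborts via assert() if they do not. */
static void crc32c_selfcheck(const uint8_t *buf, size_t len)
{
    assert(crc32c_bytewise(0, buf, len) == crc32c_slice8(0, buf, len));
}
```

Call crc32c_init() once before either function. A config option along the lines
suggested above would just pick which of the two loops gets built, and with it
how many of the tables ever get pulled into cache.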

Jocke

