Re: [PATCH v4] crc32c: Implement CRC32c with slicing-by-8 algorithm
From: Joakim Tjernlund
Date: Sat Oct 01 2011 - 10:02:18 EST
"Darrick J. Wong" <djwong@xxxxxxxxxx> wrote on 2011/09/30 21:29:56:
> The existing CRC32c implementation uses Sarwate's algorithm to calculate the
> code one byte at a time. Using a slicing-by-8 algorithm adapted from Bob
> Pearson, we can process buffers 8 bytes at a time, for a substantial increase
> in performance.
> The motivation for this patchset is that I am working on adding full metadata
> checksumming to ext4 and jbd2. As far as performance impact of adding
> checksumming goes, I see nearly no change with a standard mail server ffsb
> simulation. On a test that involves only metadata operations (file creation
> and deletion, and fallocate/truncate), I see a drop of about 50 percent with
> the current kernel crc32c implementation; this improves to a drop of about 20
> percent with the enclosed crc32c code.
> For most workloads, this new implementation doesn't help much because
> metadata is usually a small fraction of total IO.
> However, when we are doing IO that is almost all metadata (such as rm -rf'ing a
> tree), then this patch speeds up the operation substantially.
> Given that iscsi, sctp, and btrfs also use crc32c, this patchset should improve
> their speed as well. I have some preliminary results that show the
> difference in various crc algorithms that I've come across: the "crc32c-by8-le"
> column is the new algorithm in the patch; the "crc32c" column is the current
> crc32c kernel implementation; and the "crc32-kern-le" column is the current
> crc32 kernel implementation, whose results are similar to what one gets with
> CONFIG_CRC32C_SLICEBY4=y. As you can see, the new implementation runs at
> nearly 4x the speed of the current implementation; even the slimmer slice-by-4
> implementation is generally 2-3x faster.
> However, the implementation allows the kernel builder to select from a variety
> of space-speed tradeoffs, should my results not hold true on a particular
> class of system.
> v2: Use the crypto testmgr api for self-test.
> v3: Get rid of the -be version, which had no users.
> v4: Allow kernel builder a choice of speed vs. space optimization.
> (cached copy of the ext4 wiki)
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
This is based on an old version of Bob's slice-by-8 that has lots of duplication
and is hard to maintain.
Start from Bob's latest patches and add crc32c to lib/crc32.c.
Also, for crc32c I think you only need slice-by-4 and slice-by-8.
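For readers following along, here is a rough user-space sketch of the two
approaches being compared: Sarwate's one-lookup-per-byte loop and a
slicing-by-8 loop that consumes 8 bytes per iteration. This is not the code
from the patch or from Bob's patches. All names (crc32c_tab, crc32c_bytewise,
crc32c_slice8, crc32c_selftest) are made up for illustration, and the bytes
are assembled by hand so the sketch does not depend on host endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_LE 0x82F63B78u  /* Castagnoli polynomial, bit-reflected */

static uint32_t crc32c_tab[8][256]; /* hypothetical name; tab[0] is the Sarwate table */

/* Build all eight tables; table t advances table t-1 by one more zero byte. */
static void crc32c_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? CRC32C_POLY_LE : 0);
        crc32c_tab[0][i] = crc;
    }
    for (uint32_t i = 0; i < 256; i++)
        for (int t = 1; t < 8; t++) {
            uint32_t prev = crc32c_tab[t - 1][i];
            crc32c_tab[t][i] = (prev >> 8) ^ crc32c_tab[0][prev & 0xff];
        }
}

/* Sarwate: one table lookup per input byte. */
static uint32_t crc32c_bytewise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        crc = (crc >> 8) ^ crc32c_tab[0][(crc ^ *p++) & 0xff];
    return crc;
}

/* Read 4 bytes as a little-endian word, independent of host byte order. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Slicing-by-8: eight independent lookups per 8 input bytes. */
static uint32_t crc32c_slice8(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len >= 8) {
        uint32_t lo = crc ^ load_le32(p);  /* first 4 bytes absorb the CRC */
        uint32_t hi = load_le32(p + 4);    /* next 4 bytes are used as-is */

        crc = crc32c_tab[7][lo & 0xff] ^ crc32c_tab[6][(lo >> 8) & 0xff] ^
              crc32c_tab[5][(lo >> 16) & 0xff] ^ crc32c_tab[4][lo >> 24] ^
              crc32c_tab[3][hi & 0xff] ^ crc32c_tab[2][(hi >> 8) & 0xff] ^
              crc32c_tab[1][(hi >> 16) & 0xff] ^ crc32c_tab[0][hi >> 24];
        p += 8;
        len -= 8;
    }
    return crc32c_bytewise(crc, p, len);   /* tail of fewer than 8 bytes */
}

/* Both variants must give the standard CRC-32C check value for "123456789". */
static void crc32c_selftest(void)
{
    static const unsigned char msg[] = "123456789";

    crc32c_init();
    assert((crc32c_bytewise(~0u, msg, 9) ^ ~0u) == 0xE3069283u);
    assert((crc32c_slice8(~0u, msg, 9) ^ ~0u) == 0xE3069283u);
}
```

A slice-by-4 variant would presumably keep only tables 0..3 and consume one
32-bit word per iteration, which is the space vs. speed tradeoff in question:
8 tables cost 8 KiB, 4 tables cost 4 KiB, and the plain Sarwate table 1 KiB.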