Re: [PATCH] Performance Improvement in CRC16 Calculations.
From: Martin K. Petersen
Date: Tue Aug 21 2018 - 21:40:54 EST
> These days we obviously use the hardware-accelerated CRC calculation
> so the software table approach mostly serves as a reference
> implementation.
I was puzzled as to why WDC's tests did not seem to use the hardware-
accelerated CRC calculation whereas tests on my end worked fine. Turns
out this is due to an unfortunate side effect of how the crypto
subsystem works.
When crc-t10dif is initialized, the crypto infrastructure will pick the
highest-priority algorithm registered at that moment. Both the block
layer and SCSI cause crc-t10dif to be compiled as a built-in, so this
selection happens very early.
If crct10dif-pclmul is compiled as a module it will not be available at
the time the T10 CRC library is initialized. And thus the block layer
integrity code will be stuck with the sluggish table CRC. The workaround
is to build with CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y.
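As a rough illustration, a .config fragment for the workaround might look
like the one below. Only the last line is taken from the text above; the
first two symbols are an assumption about what a kernel with block
integrity or SCSI enabled would typically already have set.

```
# Assumed to be set already by block integrity / SCSI
CONFIG_CRC_T10DIF=y
CONFIG_CRYPTO_CRCT10DIF=y
# The workaround: build the accelerated driver in, not as a module
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y
```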
However, it seems like a bit of a deficiency in crypto that there is no
way to upgrade existing transformations if higher priority algorithms
become available. btrfs and a few others work around this issue by not
using the generic lib/ CRC functions (which defeats the purpose of
having these in the first place). Instead they register their own
transformation at a later time, when any accelerator modules are more
likely to be loaded.
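The ordering problem can be modelled in a few lines of userspace C. This
is only a sketch, not the kernel crypto API: register_impl(),
lookup_best() and simulate_boot() are hypothetical helpers, and the
priorities 100 and 200 are illustrative values. It shows that a lookup
done at library init stays bound to whatever was registered then, while
a lookup done later sees the accelerated driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of this is the real crypto API. */
struct crc_impl {
    const char *alg;     /* algorithm name being requested */
    const char *driver;  /* concrete implementation name */
    int priority;        /* higher wins at lookup time (illustrative) */
};

#define MAX_IMPLS 8
static const struct crc_impl *registry[MAX_IMPLS];
static size_t nr_impls;

/* Model of a driver registering: early if built-in, at load if a module. */
static void register_impl(const struct crc_impl *impl)
{
    assert(nr_impls < MAX_IMPLS);
    registry[nr_impls++] = impl;
}

/* Return the highest-priority implementation registered right now. */
static const struct crc_impl *lookup_best(const char *alg)
{
    const struct crc_impl *best = NULL;

    for (size_t i = 0; i < nr_impls; i++) {
        if (strcmp(registry[i]->alg, alg) != 0)
            continue;
        if (!best || registry[i]->priority > best->priority)
            best = registry[i];
    }
    return best;
}

static const struct crc_impl generic = { "crct10dif", "crct10dif-generic", 100 };
static const struct crc_impl pclmul  = { "crct10dif", "crct10dif-pclmul", 200 };

struct boot_result {
    const char *early;  /* bound once at library init, never revisited */
    const char *late;   /* bound by a user that looks up again later */
};

/* Replay the boot order: accel_builtin != 0 models =y, 0 models =m. */
static struct boot_result simulate_boot(int accel_builtin)
{
    struct boot_result r;

    nr_impls = 0;
    register_impl(&generic);
    if (accel_builtin)
        register_impl(&pclmul);   /* present before the early lookup */

    r.early = lookup_best("crct10dif")->driver;  /* library init */

    if (!accel_builtin)
        register_impl(&pclmul);   /* module loads after library init */

    r.late = lookup_best("crct10dif")->driver;   /* later allocation */
    return r;
}
```

In this model simulate_boot(0) leaves the early binding on the table
implementation even though a later lookup finds the accelerated one,
while simulate_boot(1) binds the accelerated driver from the start.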
Anyway, just a heads-up for anyone wondering why the table algorithm is
being exercised despite their hardware supporting CRC acceleration.
--
Martin K. Petersen Oracle Linux Engineering