Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

From: tonywwang-oc
Date: Sun Dec 20 2020 - 21:47:26 EST


On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>
>> On 15/12/2020 04:41, Eric Biggers wrote:
>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>> >>>> The driver crc32c-intel match CPUs supporting
>X86_FEATURE_XMM4_2.
>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, When
>> >>>> crc32c-intel and crc32c-generic are both registered, system will
>> >>>> use crc32c-intel because its .cra_priority is greater than
>> >>>> crc32c-generic. This case expect to use crc32c-generic driver
>for
>> >>>> some Zhaoxin CPUs to get performance gain, So remove these
>Zhaoxin
>> >>>> CPUs support from crc32c-intel.
>> >>>>
>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@xxxxxxxxxxx>
>> >>>
>> >>> Does this mean that the performance of the crc32c instruction on
>those CPUs is
>> >>> actually slower than a regular C implementation? That's very
>weird.
>> >>>
>> >>
>> >> From the lmbench3 Create and Delete file test on those chips, I
>think yes.
>> >>
>> >
>> > Did you try measuring the performance of the hashing itself, and
>not some
>> > higher-level filesystem operations?
>> >
>>
>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the
>same
>> input value the generic C implementation takes fewer time than the
>> crc32c instruction implementation.
>>
>
>And that is really "working as intended"?

These CPU's crc32c instruction is not working as intended.

Why do these CPUs even
>declare that
>they support the crc32c instruction, when it is so slow?
>

The presence of crc32c and some other instructions supports are enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok except the crc32c instruction.

>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>etc.) that
>these CPUs similarly declare support for but they are uselessly slow?

No.

Sincerely
Tonyw

>
>- Eric