Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
From: hpa
Date: Mon Dec 21 2020 - 23:55:57 EST
On December 21, 2020 7:01:39 PM PST, tonywwang-oc@xxxxxxxxxxx wrote:
>On December 22, 2020 3:27:33 AM GMT+08:00, hpa@xxxxxxxxx wrote:
>>On December 20, 2020 6:46:25 PM PST, tonywwang-oc@xxxxxxxxxxx wrote:
>>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>>><ebiggers@xxxxxxxxxx> wrote:
>>>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>>>>
>>>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>>>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>>>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>>> >>>> The crc32c-intel driver matches CPUs supporting X86_FEATURE_XMM4_2.
>>>>> >>>> On platforms with Zhaoxin CPUs supporting this x86 feature, when
>>>>> >>>> crc32c-intel and crc32c-generic are both registered, the system
>>>>> >>>> uses crc32c-intel because its .cra_priority is greater than that of
>>>>> >>>> crc32c-generic. For some Zhaoxin CPUs, crc32c-generic is expected
>>>>> >>>> to give better performance, so remove support for these Zhaoxin
>>>>> >>>> CPUs from crc32c-intel.
>>>>> >>>>
>>>>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@xxxxxxxxxxx>
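The selection rule the patch description relies on (among registered implementations, the highest `.cra_priority` wins) can be illustrated with a small userspace C model. This is a sketch, not kernel code: `struct impl`, `select_impl()`, `intel_driver_matches()` and the priority values 100 and 200 are hypothetical stand-ins for the crypto API's bookkeeping and for the driver's CPU match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one registered crc32c implementation. */
struct impl {
    const char *driver; /* driver name, e.g. "crc32c-intel" */
    int priority;       /* stands in for .cra_priority (illustrative values) */
    int registered;     /* 1 if the driver loaded and registered itself */
};

/* Hypothetical stand-in for the crc32c-intel CPU match: before the patch
 * only SSE4.2 is checked; the patch also rejects some Zhaoxin CPUs. */
static int intel_driver_matches(int has_xmm4_2, int is_excluded_zhaoxin)
{
    return has_xmm4_2 && !is_excluded_zhaoxin;
}

/* Pick the registered implementation with the highest priority, or NULL. */
static const struct impl *select_impl(const struct impl *impls, size_t n)
{
    const struct impl *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!impls[i].registered)
            continue;
        if (best == NULL || impls[i].priority > best->priority)
            best = &impls[i];
    }
    return best;
}
```

With both drivers registered, the higher-priority crc32c-intel is selected; once the match rejects an affected CPU, crc32c-generic is the only candidate left and is chosen instead.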
>>>>> >>>
>>>>> >>> Does this mean that the performance of the crc32c instruction on those CPUs is
>>>>> >>> actually slower than a regular C implementation? That's very weird.
>>>>> >>>
>>>>> >>
>>>>> >> From the lmbench3 Create and Delete file test on those chips, I think yes.
>>>>> >>
>>>>> >
>>>>> > Did you try measuring the performance of the hashing itself, and not some
>>>>> > higher-level filesystem operations?
>>>>> >
>>>>>
>>>>> Yes. Testing on these Zhaoxin CPUs with the same input value, the
>>>>> generic C implementation takes less time than the crc32c instruction
>>>>> implementation.
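Eric's question is about timing the hash in isolation rather than through lmbench3 file operations. A minimal benchmark sketch in C, assuming a byte-at-a-time table-driven CRC32C as the software path (simpler than the kernel's crc32c-generic) and, on x86-64 with GCC or Clang, the SSE4.2 `_mm_crc32_u8` intrinsic as the instruction path. The function names are hypothetical; `0xE3069283` is the standard CRC32C check value for the ASCII string "123456789".

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define CRC32C_POLY 0x82F63B78u /* reflected Castagnoli polynomial */

static uint32_t crc_table[256];

/* Build the 256-entry lookup table; call once before crc32c_sw(). */
static void crc32c_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ CRC32C_POLY : c >> 1;
        crc_table[i] = c;
    }
}

/* Software CRC32C, one table lookup per byte. */
static uint32_t crc32c_sw(const uint8_t *p, size_t len)
{
    uint32_t crc = ~0u;
    while (len--)
        crc = crc_table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    return ~crc;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>
/* CRC32C via the SSE4.2 crc32 instruction; only call if the CPU has SSE4.2. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_hw(const uint8_t *p, size_t len)
{
    uint32_t crc = ~0u;
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return ~crc;
}
#endif

/* Run fn over buf `iters` times and return the elapsed wall time in ns. */
static double time_ns(uint32_t (*fn)(const uint8_t *, size_t),
                      const uint8_t *buf, size_t len, int iters, uint32_t *sink)
{
    struct timespec a, b;
    uint32_t acc = 0;

    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < iters; i++)
        acc ^= fn(buf, len);
    clock_gettime(CLOCK_MONOTONIC, &b);
    *sink = acc; /* keep the results live so the loop is not optimized away */
    return (double)(b.tv_sec - a.tv_sec) * 1e9 + (double)(b.tv_nsec - a.tv_nsec);
}
```

Calling `time_ns()` for both paths over the same buffer (for example a few KiB, many iterations) compares the two implementations without filesystem effects. A byte-at-a-time instruction loop is slower than processing 8 bytes per `crc32` instruction, so treat the numbers from this sketch as rough.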
>>>>>
>>>>
>>>>And that is really "working as intended"?
>>>
>>>These CPUs' crc32c instruction is not working as intended.
>>>
>>>>Why do these CPUs even declare that
>>>>they support the crc32c instruction, when it is so slow?
>>>>
>>>
>>>Support for crc32c and several other instructions is enumerated together
>>>by CPUID.01:ECX[SSE4.2] = 1. The other instructions are fine; only the
>>>crc32c instruction is slow.
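Tony's point is that a single enumeration bit covers a group of instructions. A small C sketch of testing that bit: CPUID leaf 1 reports SSE4.2 in ECX bit 20, which covers `crc32` along with `PCMPGTQ` and the `PCMPxSTRx` string instructions. `ecx_has_sse4_2()` and `read_cpuid1_ecx()` are hypothetical helpers; the live read assumes GCC or Clang on x86 for `<cpuid.h>`.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 20 enumerates SSE4.2 as a whole; there is no
 * separate bit for the crc32 instruction alone. */
#define CPUID1_ECX_SSE4_2 (1u << 20)

static int ecx_has_sse4_2(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_SSE4_2) != 0;
}

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
/* Read CPUID.01H:ECX on the running CPU; returns 0 if leaf 1 is unavailable. */
static uint32_t read_cpuid1_ecx(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return ecx;
}
#endif
```

Because the bit is shared, any code that hides it also hides the other SSE4.2 instructions, which is the trade-off discussed in the rest of the thread.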
>>>
>>>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, etc.) that
>>>>these CPUs similarly declare support for but they are uselessly slow?
>>>
>>>No.
>>>
>>>Sincerely
>>>Tonyw
>>>
>>>>
>>>>- Eric
>>
>>Then the right thing to do is to disable the CPUID bit in the
>>vendor-specific startup code.
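The suggestion above amounts to masking the feature bit during vendor-specific CPU setup (in the kernel this kind of masking is typically done with `clear_cpu_cap()`). A toy userspace model, with hypothetical names throughout, shows the consequence Tony objects to in his reply: every consumer gated on SSE4.2 loses it, not just the CRC32C driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the kernel's cached copy of CPUID.01H:ECX. */
#define CAP_XMM4_2 (1u << 20) /* SSE4.2 */

struct cpu_model {
    int affected; /* assumption: vendor code identified a model with slow crc32 */
    uint32_t ecx; /* cached CPUID.01H:ECX */
};

/* Vendor-specific startup hook: hide SSE4.2 entirely on affected models. */
static void vendor_init(struct cpu_model *c)
{
    if (c->affected)
        c->ecx &= ~CAP_XMM4_2;
}

static int has_xmm4_2(const struct cpu_model *c)
{
    return (c->ecx & CAP_XMM4_2) != 0;
}

/* Two consumers gated on the same bit: the CRC32C driver, and any other
 * user of SSE4.2 instructions (e.g. string code using PCMPISTRI). */
static int crc32c_driver_usable(const struct cpu_model *c) { return has_xmm4_2(c); }
static int string_code_usable(const struct cpu_model *c) { return has_xmm4_2(c); }
```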
>
>This approach would make these CPUs lose support for all the instructions
>enumerated by CPUID.01:ECX[SSE4.2], even though only the crc32c instruction
>is slow. We just want the crc32c-intel driver not to match these CPUs.
>
>Sincerely
>Tonyw
Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We *do not* bury this information in drivers; it becomes a recipe for the same problems over and over.
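The alternative above keeps the quirk in core CPU-feature code while leaving SSE4.2 itself visible. A toy userspace model, assuming hypothetical names (`FEAT_CRC32C`, `BUG_CRC32C_SLOW` and the helpers are not existing kernel symbols), that combines the two options: vendor setup records a bug flag, common setup derives a synthetic CRC32C flag from SSE4.2 minus that bug, and the driver checks only the synthetic flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; names mirror kernel conventions but are made up. */
enum {
    FEAT_XMM4_2     = 1 << 0, /* from CPUID.01H:ECX[20] */
    FEAT_CRC32C     = 1 << 1, /* synthetic: crc32 instruction present and worth using */
    BUG_CRC32C_SLOW = 1 << 2, /* bug flag: crc32 present but slower than software */
};

struct cpu_caps {
    uint32_t bits;
};

/* Vendor-specific setup records the problem centrally; SSE4.2 stays set. */
static void vendor_init(struct cpu_caps *c, int slow_crc32)
{
    if (slow_crc32)
        c->bits |= BUG_CRC32C_SLOW;
}

/* Common setup derives the synthetic flag once, for every driver to use. */
static void derive_synthetic(struct cpu_caps *c)
{
    if ((c->bits & FEAT_XMM4_2) && !(c->bits & BUG_CRC32C_SLOW))
        c->bits |= FEAT_CRC32C;
}

/* The driver needs no vendor or model knowledge. */
static int crc32c_driver_matches(const struct cpu_caps *c)
{
    return (c->bits & FEAT_CRC32C) != 0;
}
```

Either mechanism alone would be enough; combining them here only shows that the driver's match reduces to one flag check while other SSE4.2 users are unaffected.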