Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

From: George Spelvin
Date: Wed May 28 2014 - 19:01:55 EST

Next message: Tim Chen: "Re: [PATCH v2] crypto: crc32c-pclmul - Shrink K_table to 32-bit words"
Previous message: Joe Perches: "Re: [PATCH 11/16] byteorder: provide a linux/byteorder.h with {be, le}_to_cpu() and cpu_to_{be, le}() macros"
In reply to: Tim Chen: "Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table"
Next in thread: Tim Chen: "Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table"

Thanks for the reply!

> Changing from the aligned move (movdqa) to unaligned move and zeroing
> (pmovzxdq), is going to make things slower. If the table is aligned
> on 8 byte boundary, some of the table can span 2 cache lines, which
> can slow things further.
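
For readers comparing the two loads, here is a portable C model of what each one does. It is only a sketch, not the kernel's assembly. The names `struct xmm_model`, `load_movdqa`, `load_pmovzxdq` and `loads_agree` are made up for illustration. It assumes, as the patch title implies, that every table constant fits in 32 bits, so the upper half of each 64-bit lane is zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an XMM register: lane[0] is the low quadword. */
struct xmm_model {
	uint64_t lane[2];
};

/* Models "movdqa (mem), %xmm": a 16-byte load that requires 16-byte alignment. */
static struct xmm_model load_movdqa(const void *mem)
{
	struct xmm_model r;

	assert(((uintptr_t)mem & 15) == 0);	/* the real instruction faults here */
	memcpy(r.lane, mem, sizeof(r.lane));
	return r;
}

/*
 * Models "pmovzxdq (mem), %xmm": an 8-byte load of two dwords, each
 * zero-extended to a qword. There is no alignment requirement.
 */
static struct xmm_model load_pmovzxdq(const void *mem)
{
	uint32_t w[2];
	struct xmm_model r;

	memcpy(w, mem, sizeof(w));
	r.lane[0] = w[0];
	r.lane[1] = w[1];
	return r;
}

/* Returns 1 if a 16-byte entry and its shrunk 8-byte form load to the same value. */
static int loads_agree(uint32_t lo, uint32_t hi)
{
	_Alignas(16) uint64_t wide[2] = { lo, hi };
	uint32_t narrow[2] = { lo, hi };
	struct xmm_model a = load_movdqa(wide);
	struct xmm_model b = load_pmovzxdq(narrow);

	return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}
```

Calling `loads_agree()` with any pair of 32-bit constants returns 1, which is why the table can be halved without changing what lands in the register. The model says nothing about the relative cost of the two instructions.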

Um, two notes:
1) This load is performed once per 3072-byte block, which
is a minimum of 128 cycles just for the crc32q instructions,
never mind all the pclmulqdq folderol.

Is it really more than 2 cycles? Heck, is it *any* overall
time given that it's preceded by a stretch of 384 instructions
that it's not data-dependent on?

I'll do some benchmarking to find out.

2) The shrunk table entries are 8 bytes long, and so can't
span a cache line. Is there any benefit to using a
larger alignment, other than the very small issue of the
full table needing 1 more cache line to be fully cached?
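
The alignment argument in note 2 can be checked with a few lines of C. This is a minimal sketch under stated assumptions: a 64-byte cache line (`CACHE_LINE`), and hypothetical helper names `spans_cache_line`, `count_straddlers` and `lines_touched`. The table size used in the usage note below (128 entries) is chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64u	/* assumed cache line size in bytes */

/* Returns 1 if the object at [offset, offset + size) touches two cache lines. */
static int spans_cache_line(size_t offset, size_t size)
{
	assert(size > 0);
	return offset / CACHE_LINE != (offset + size - 1) / CACHE_LINE;
}

/* Counts how many of n consecutive entries starting at base straddle a line. */
static unsigned count_straddlers(size_t base, size_t entry_size, unsigned n)
{
	unsigned i, hits = 0;

	for (i = 0; i < n; i++)
		hits += spans_cache_line(base + i * entry_size, entry_size);
	return hits;
}

/* Number of cache lines a table of the given size occupies when placed at base. */
static size_t lines_touched(size_t base, size_t bytes)
{
	assert(bytes > 0);
	return (base + bytes - 1) / CACHE_LINE - base / CACHE_LINE + 1;
}
```

With 128 entries of 8 bytes starting at offset 8, `count_straddlers(8, 8, 128)` is 0: no single entry crosses a line. The only cost of the smaller alignment is in the whole-table footprint, where `lines_touched(8, 1024)` is 17 against 16 for `lines_touched(0, 1024)`.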

> We are trading speed for only 4096 bytes of memory save,
> which is likely not a good trade for most systems except for
> those really constrained of memory. For this kind of non-performance
> critical system, it may as well use the generic crc32c algorithm and
> compile out this module.

I hadn't intended to cause any speed penalty at all.
Do you really think there will be one?