Re: [PATCH 0/5] crypto: add IV generation templates

From: Ard Biesheuvel
Date: Wed Jul 18 2018 - 11:35:03 EST


On 18 July 2018 at 19:59, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Wed, Jul 18, 2018 at 9:30 AM, Xiongfeng Wang
> <wangxiongfeng2@xxxxxxxxxx> wrote:
>>
>> I tested the performance of software-implemented ciphers before and after
>> applying this patchset. The performance didn't change much, except for a
>> slight regression when writing. The details are as follows.
>>
>> The command I used:
>> cryptsetup -y -c aes-xts-plain -s 256 --hash sha256 luksFormat /dev/sdd1
>> cryptsetup -y -c aes-cbc-essiv:sha256 -s 256 --hash sha256 luksFormat /dev/sdd1
>> cryptsetup -y -c aes-cbc-benbi -s 256 --hash sha256 luksFormat /dev/sdd1
>>
>> cryptsetup luksOpen /dev/sdd1 crypt_fun
>> time dd if=/dev/mapper/crypt_fun of=/dev/null bs=1M count=500 iflag=direct
>> time dd if=/dev/zero of=/dev/mapper/crypt_fun bs=1M count=500 oflag=direct
>>
>> Performance comparison:
>> --------------------------------------------------------
>>               | before applying  | after applying
>> algorithm     | read    | write  | read    | write
>> --------------------------------------------------------
>> aes-xts-plain | 145.34  | 145.09 | 145.89  | 144.2
>> aes-cbc-essiv | 146.87  | 144.62 | 146.74  | 143.41
>> aes-cbc-benbi | 146.03  | 144.74 | 146.77  | 144.46
>> --------------------------------------------------------
>
> Do you have any estimate of the expected gains for hardware
> implementations?
>
> Would it make sense to try out implementing aes-cbc-essiv
> on the ARMv8 crypto extensions? I see that Ard has done
> some prior work on aes-ccm in arch/arm64/crypto/aes-ce-ccm-*
> that (AFAICT) has a similar goal of avoiding overhead by
> combining the usual operations, so maybe the same can
> be done here.
>

I am having trouble understanding what exactly this series aims to achieve.

Calling into the crypto layer fewer times is a nice goal, but a disk
sector seems like a reasonable granularity for the dm layer to operate
on. I don't think any hardware exists that operates on multi-sector
sequences, which is the only case where it would pay off to amortize
the latency of invoking the hardware over an entire bio.
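
To put rough numbers on the granularity question, here is a
back-of-the-envelope sketch of how many crypto-layer invocations the
dd test quoted above would need at each granularity. The 512-byte
sector size and the assumption that each 1 MiB dd request reaches the
crypto layer as a single unit are mine, for illustration only; real bio
sizes depend on the block layer, and crypto_calls() is a hypothetical
helper, not a kernel function.

```python
SECTOR_SIZE = 512          # assumed sector size in bytes
MIB = 1024 * 1024          # bytes per MiB


def crypto_calls(total_bytes, unit_bytes):
    """Number of invocations needed if each call handles unit_bytes."""
    # Ceiling division, so a trailing partial unit still costs one call.
    return -(-total_bytes // unit_bytes)


total = 500 * MIB                              # dd bs=1M count=500
per_sector = crypto_calls(total, SECTOR_SIZE)  # one call per sector
per_request = crypto_calls(total, MIB)         # one call per 1 MiB request

print("calls at sector granularity: ", per_sector)
print("calls at request granularity:", per_request)
print("reduction factor:            ", per_sector // per_request)
```

This only counts calls. It says nothing about whether the per-call
overhead is large enough for the reduction to show up in throughput,
which is the part that needs to be demonstrated.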

So in summary, you need to explain to us why we need this. It is
really very easy to convince people if your changes make things go
faster.