Re: [PATCH net-next v6 07/23] zinc: ChaCha20 ARM and ARM64 implementations

From: Andy Lutomirski
Date: Thu Sep 27 2018 - 12:27:08 EST




> On Sep 27, 2018, at 8:19 AM, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> Hey again Thomas,
>
>> On Thu, Sep 27, 2018 at 3:26 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>>
>> Hi Thomas,
>>
>> I'm trying to optimize this for crypto performance while still taking
>> into account preemption concerns. I'm having a bit of trouble figuring
>> out a way to determine numerically what the upper bounds for this
>> stuff look like. I'm sure I could pick a pretty sane number that's
>> arguably okay -- and way under the limit -- but I still am interested
>> in determining what that limit actually is. I was hoping there'd be a
>> debugging option called, "warn if preemption is disabled for too
>> long", or something, but I couldn't find anything like that. I'm also
>> not quite sure what the latency limits are, to just compute this with
>> a formula. Essentially what I'm trying to determine is:
>>
>> preempt_disable();
>> asm volatile(".fill N, 1, 0x90;");
>> preempt_enable();
>>
>> What is the maximum value of N for which the above is okay? What
>> technique would you generally use in measuring this?
>>
>> Thanks,
>> Jason
>
> From talking to Peter (now CC'd) on IRC, it sounds like what you're
> mostly interested in is clocktime latency on reasonable hardware, with
> a goal of around ~20µs as a maximum upper bound? I don't expect to get
> anywhere near this value at all, but if you can confirm that's a
> decent ballpark, it would make for some interesting calculations.
>
>

I would add another consideration: if you can get better latency with negligible overhead (0.1%? 0.05%?), then that might make sense too. For example, it seems plausible that checking need_resched() every few blocks adds basically no overhead, and the SIMD helpers could do this check themselves, or perhaps only ever process one block at a time.

need_resched() costs a cacheline access, but it's usually a hot cacheline, and the actual check is just whether a certain bit in memory is set.
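
To make the "check every few blocks" idea concrete, here is a rough userspace sketch, not kernel code and not what the patch does. Every name in it is a hypothetical stand-in: need_resched_stub() plays the role of need_resched(), simd_begin_stub()/simd_end_stub() stand for whatever opens and closes the preemption-disabled SIMD region, and check_every is an arbitrary tuning knob.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel facilities (assumptions, not real APIs). */
static bool fake_resched_flag;      /* models the "reschedule wanted" bit */
static unsigned int simd_sections;  /* how many times the SIMD region was entered */

static bool need_resched_stub(void)
{
	return fake_resched_flag;       /* a single flag read, as in the real check */
}

static void simd_begin_stub(void)
{
	simd_sections++;                /* would disable preemption, save SIMD state */
}

static void simd_end_stub(void)
{
	/* Would re-enable preemption; model the scheduler running and
	 * clearing the pending request at that point. */
	fake_resched_flag = false;
}

static void process_block_stub(size_t i)
{
	(void)i;                        /* placeholder for one ChaCha20 block */
}

/*
 * Process nblocks blocks, polling the reschedule flag once every
 * check_every blocks. If a reschedule is pending, briefly leave and
 * re-enter the SIMD region. Returns how many times that happened.
 */
static unsigned int process_blocks(size_t nblocks, size_t check_every)
{
	unsigned int yields = 0;

	assert(check_every > 0);

	simd_begin_stub();
	for (size_t i = 0; i < nblocks; i++) {
		process_block_stub(i);

		/* Poll only at batch boundaries, and not after the last block. */
		if ((i + 1) % check_every == 0 && i + 1 < nblocks &&
		    need_resched_stub()) {
			simd_end_stub();
			simd_begin_stub();
			yields++;
		}
	}
	simd_end_stub();

	return yields;
}
```

With check_every set to 1 this degenerates to the "only ever do a block at a time" variant; larger values trade a longer worst-case preemption-off window for fewer flag reads.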