Re: [PATCH net-next v6 01/23] asm: simd context helper API

From: Ard Biesheuvel
Date: Fri Sep 28 2018 - 04:49:51 EST


(+ Joe)

On 28 September 2018 at 10:28, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> On 25 September 2018 at 16:56, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>> Sometimes it's useful to amortize calls to XSAVE/XRSTOR and the related
>> FPU/SIMD functions over a number of calls, because FPU restoration is
>> quite expensive. This adds a simple header for carrying out this pattern:
>>
>> simd_context_t simd_context;
>>
>> simd_get(&simd_context);
>> while ((item = get_item_from_queue()) != NULL) {
>> encrypt_item(item, simd_context);
>> simd_relax(&simd_context);
>> }
>> simd_put(&simd_context);
>>
>> The relaxation step ensures that we don't trample over preemption, and
>> the get/put API should be a familiar paradigm in the kernel.
>>
>> On the other end, code that actually wants to use SIMD instructions can
>> accept this as a parameter and check it via:
>>
>> void encrypt_item(struct item *item, simd_context_t *simd_context)
>> {
>> if (item->len > LARGE_FOR_SIMD && simd_use(simd_context))
>> wild_simd_code(item);
>> else
>> boring_scalar_code(item);
>> }
>>
>> The actual XSAVE happens during simd_use (and only on the first time),
>> so that if the context is never actually used, no performance penalty is
>> hit.
>>
>> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
>> Cc: Samuel Neves <sneves@xxxxxxxxx>
>> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
>> Cc: linux-arch@xxxxxxxxxxxxxxx
>> ---
>> arch/alpha/include/asm/Kbuild | 5 ++-
>> arch/arc/include/asm/Kbuild | 1 +
>> arch/arm/include/asm/simd.h | 63 ++++++++++++++++++++++++++++++
>> arch/arm64/include/asm/simd.h | 51 +++++++++++++++++++++---
>> arch/c6x/include/asm/Kbuild | 3 +-
>> arch/h8300/include/asm/Kbuild | 3 +-
>> arch/hexagon/include/asm/Kbuild | 1 +
>> arch/ia64/include/asm/Kbuild | 1 +
>> arch/m68k/include/asm/Kbuild | 1 +
>> arch/microblaze/include/asm/Kbuild | 1 +
>> arch/mips/include/asm/Kbuild | 1 +
>> arch/nds32/include/asm/Kbuild | 7 ++--
>> arch/nios2/include/asm/Kbuild | 1 +
>> arch/openrisc/include/asm/Kbuild | 7 ++--
>> arch/parisc/include/asm/Kbuild | 1 +
>> arch/powerpc/include/asm/Kbuild | 3 +-
>> arch/riscv/include/asm/Kbuild | 3 +-
>> arch/s390/include/asm/Kbuild | 3 +-
>> arch/sh/include/asm/Kbuild | 1 +
>> arch/sparc/include/asm/Kbuild | 1 +
>> arch/um/include/asm/Kbuild | 3 +-
>> arch/unicore32/include/asm/Kbuild | 1 +
>> arch/x86/include/asm/simd.h | 44 ++++++++++++++++++++-
>> arch/xtensa/include/asm/Kbuild | 1 +
>> include/asm-generic/simd.h | 20 ++++++++++
>> include/linux/simd.h | 28 +++++++++++++
>> 26 files changed, 234 insertions(+), 21 deletions(-)
>> create mode 100644 arch/arm/include/asm/simd.h
>> create mode 100644 include/linux/simd.h
>>
>> diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
>> index 0580cb8c84b2..07b2c1025d34 100644
>> --- a/arch/alpha/include/asm/Kbuild
>> +++ b/arch/alpha/include/asm/Kbuild
>> @@ -2,14 +2,15 @@
>>
>>
>> generic-y += compat.h
>> +generic-y += current.h
>> generic-y += exec.h
>> generic-y += export.h
>> generic-y += fb.h
>> generic-y += irq_work.h
>> +generic-y += kprobes.h
>> generic-y += mcs_spinlock.h
>> generic-y += mm-arch-hooks.h
>> generic-y += preempt.h
>> generic-y += sections.h
>> +generic-y += simd.h
>> generic-y += trace_clock.h
>> -generic-y += current.h
>> -generic-y += kprobes.h
>
> Given that this patch applies to all architectures at once, it is
> probably better to drop the unrelated reordering hunks to avoid
> conflicts.
>
>> diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
>> index feed50ce89fa..a7f4255f1649 100644
>> --- a/arch/arc/include/asm/Kbuild
>> +++ b/arch/arc/include/asm/Kbuild
>> @@ -22,6 +22,7 @@ generic-y += parport.h
>> generic-y += pci.h
>> generic-y += percpu.h
>> generic-y += preempt.h
>> +generic-y += simd.h
>> generic-y += topology.h
>> generic-y += trace_clock.h
>> generic-y += user.h
>> diff --git a/arch/arm/include/asm/simd.h b/arch/arm/include/asm/simd.h
>> new file mode 100644
>> index 000000000000..263950dd69cb
>> --- /dev/null
>> +++ b/arch/arm/include/asm/simd.h
>> @@ -0,0 +1,63 @@
>> +/* SPDX-License-Identifier: GPL-2.0
>> + *
>> + * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@xxxxxxxxx>. All Rights Reserved.
>> + */
>> +
>> +#include <linux/simd.h>
>> +#ifndef _ASM_SIMD_H
>> +#define _ASM_SIMD_H
>> +
>> +#ifdef CONFIG_KERNEL_MODE_NEON
>> +#include <asm/neon.h>
>> +
>> +static __must_check inline bool may_use_simd(void)
>> +{
>> + return !in_interrupt();
>> +}
>> +
>
> Remember this guy?
>
> https://marc.info/?l=linux-arch&m=149631094625176&w=2
>
> That was never merged, so let's get it right this time.
>
...
>> diff --git a/include/linux/simd.h b/include/linux/simd.h
>> new file mode 100644
>> index 000000000000..33bba21012ff
>> --- /dev/null
>> +++ b/include/linux/simd.h
>> @@ -0,0 +1,28 @@
>> +/* SPDX-License-Identifier: GPL-2.0
>> + *
>> + * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@xxxxxxxxx>. All Rights Reserved.
>> + */
>> +
>> +#ifndef _SIMD_H
>> +#define _SIMD_H
>> +
>> +typedef enum {
>> + HAVE_NO_SIMD = 1 << 0,
>> + HAVE_FULL_SIMD = 1 << 1,
>> + HAVE_SIMD_IN_USE = 1 << 31
>> +} simd_context_t;
>> +

Oh, and another thing (and I'm surprised checkpatch.pl didn't complain
about it): the use of typedef in new code is strongly discouraged.
This policy predates my involvement, so perhaps Joe can elaborate on
the rationale?


>> +#include <linux/sched.h>
>> +#include <asm/simd.h>
>> +
>> +static inline void simd_relax(simd_context_t *ctx)
>> +{
>> +#ifdef CONFIG_PREEMPT
>> + if ((*ctx & HAVE_SIMD_IN_USE) && need_resched()) {
>> + simd_put(ctx);
>> + simd_get(ctx);
>> + }
>> +#endif
>
> Could we return a bool here indicating whether we rescheduled or not?
> In some cases, we could pass that into the asm code as a 'reload'
> param, allowing repeated loads of key schedules, round constant tables
> or S-boxes to be elided.
>
>> +}
>> +
>> +#endif /* _SIMD_H */
>> --
>> 2.19.0
>>