Re: [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c

From: Mark Rutland
Date: Tue Nov 21 2017 - 06:48:10 EST

On Mon, Nov 20, 2017 at 01:01:43PM -0800, Sami Tolvanen wrote:
> On Mon, Nov 20, 2017 at 03:25:31PM +0000, Ard Biesheuvel wrote:
> > However, under LTO this all changes, and it is no longer guaranteed
> > that the NEON registers are only touched between the kernel mode
> > neon begin/end calls.

Just to check, I take it that the fear is that LTO can merge the
begin/asm/end, reordering bits of the begin/end relative to the asm?

AFAICT, assuming that LTO respects our compiler barriers:

* the preempt_disable() in kernel_neon_begin() should prevent the asm
block from being moved earlier, but it looks like it could be moved
somewhere in the middle of local_bh_enable().

* the __this_cpu_xchg() in kernel_neon_end() *isn't* ordered w.r.t. the
asm, as it doesn't have a full memory clobber, and could be
re-ordered before the asm block.

We *could* solve this case with a barrier() at the end of
kernel_neon_begin() and the start of kernel_neon_end(), but it is a
whack-a-mole solution. :/
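
For illustration only, a userspace sketch of where those two barrier()
calls would sit. The sketch_* functions and the neon_busy flag are
made-up stand-ins, not the real arm64 code; only the barrier()
definition is the one the kernel uses for GCC/clang:

```c
#include <assert.h>
#include <stdbool.h>

/* Compiler-only barrier: the "memory" clobber stops the compiler from
 * moving memory accesses across this point. It emits no instructions. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-in for the per-cpu kernel_neon_busy flag. */
static bool neon_busy;

/* Stand-in for kernel_neon_begin(): set the flag, then fence the compiler. */
static void sketch_neon_begin(void)
{
	neon_busy = true;
	/* preempt_disable() / local_bh_enable() would sit here */
	barrier();	/* proposed barrier at the end of begin */
}

/* Stand-in for kernel_neon_end(): fence the compiler, then clear the flag. */
static bool sketch_neon_end(void)
{
	bool busy;

	barrier();	/* proposed barrier at the start of end */
	busy = neon_busy;	/* models __this_cpu_xchg(kernel_neon_busy, false) */
	neon_busy = false;
	return busy;
}

/* Stand-in for the NEON asm block (no "memory" clobber of its own). */
static unsigned int sketch_neon_work(unsigned int x)
{
	__asm__ __volatile__("" : "+r"(x));
	return x ^ 0xffu;
}

/* Caller pattern: begin, asm, end. */
static unsigned int sketch_encrypt(unsigned int x)
{
	bool busy;

	sketch_neon_begin();
	x = sketch_neon_work(x);
	busy = sketch_neon_end();
	assert(busy);	/* models WARN_ON(!busy) */
	return x;
}
```

Note this only constrains how the compiler orders memory accesses
around the two calls; every other user of the same pattern would need
the same treatment, hence whack-a-mole.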

... this also raises the question as to how the {__,}this_cpu*() ops are
expected to be ordered w.r.t. other local operations, as that's not
clear to me even in the absence of LTO.

> LTO operates on LLVM IR, so disabling LTO for this file should make
> sure there won't be any unsafe optimizations. Are there other places
> in the kernel that might have this issue?

I suspect that, as above, there are a number of places that implicitly
rely on compilation-unit boundaries enforcing (local) ordering w.r.t.
asynchronous events, as the compiler would otherwise be able to reorder
code such as cpu-local flag manipulation.

I think we have a much bigger problem here.