Re: [PATCH -fixes] dma-mapping: add default implementation to arch_dma_{set|clear}_uncached

From: Robin Murphy
Date: Tue Jul 09 2024 - 11:53:00 EST


On 09/07/2024 1:22 pm, Yangyu Chen wrote:


On Jul 9, 2024, at 19:46, Christoph Hellwig <hch@xxxxxx> wrote:

On Tue, Jul 09, 2024 at 07:39:29PM +0800, Yangyu Chen wrote:
The reason is that some optimizations failed to apply after adding
some passes. I will fix the compiler later. Whatever, we should not
rely on this optimization to get the code being successfully compiled.

The Linux kernel relies on constant propagation and dead code
eliminiation a lot to make code simpler and more readable.

Actually, the compiler is patched LLVM with -O2 optimization. I didn’t
turn off the optimization.

Well, sorry, you did do that, by patching the compiler in a way which makes it no longer happen as before. If LLVM is otherwise able to make this optimisation as expected then to me that sounds like your patch effectively causes a codegen regression in LLVM, thus it should hardly be the concern of Linux, nor every other project which may get worse codegen because of it, to work around it.

Regardless of whether people think it's reasonable to *depend* on compiler optimisation or not, emitting dead code which was not previously emitted at the same optimisation level seems like a clear step backwards.

You can see what we did for the compiler here[1] and compile the
kernel with `-march=rv64imac_zicond_zicldst` added to KBUILD_CFLAGS.
I added conditional load/store pass as Intel did for the x86 APX
extension, which appeared last year (called hoist load/store in
LLVM if you want to search the PR), and then LLVM failed to optimize
this.

The only failed symbol on the kernel with `ARCH=riscv defconfig`
is `arch_dma_set_uncached` since the compiler requires all possible
values to be known. I think a pattern like in kernel/dma/direct.c:349
for symbol `arch_dma_clear_uncached`, which uses `if
(IS_ENABLE(CONFIG_ARCH_HAS_xxx)) xxx` is acceptable. But for the
symbol `arch_dma_set_uncached`, a complex analysis is needed for a
value set in a block of branches. I think we should not rely on such
compiler optimization in such a complex pattern.

I'm not a compiler guy, but is it really that complex when the variable is only ever written with the same compile-time-constant value that it's already initialised with? If the optimisation pass is so focused on being able to use a conditional store instruction that it would rather emit one which has no effect either way than elide it entirely, that doesn't strike me as a particularly good optimisation :/

Thanks,
Robin.


In addition, patching this way can also make this symbol safer to use.

[1] https://github.com/cyyself/llvm-project/tree/zicldst-support-bugless-v3

Thanks,
Yangyu Chen