Re: [PATCH bpf] arm32, bpf: Fix sign-extension mov instruction

From: Puranjay Mohan
Date: Fri Apr 19 2024 - 14:52:43 EST

Next message: Shresth Prasad: "Re: [PATCH][next] drivers: video: Simplify device_node cleanup using __free"
Previous message: Lukasz Luba: "Re: [PATCH v1 16/16] thermal: core: Relocate critical and hot trip handling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

"Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> writes:

> On Tue, Apr 09, 2024 at 09:50:38AM +0000, Puranjay Mohan wrote:
>> The current implementation of the mov instruction with sign extension
>> clobbers the source register because it sign extends the source and then
>> moves it to the destination.
>>
>> Fix this by moving the src to a temporary register before doing the sign
>> extension only if src is not an emulated register (on the scratch stack).
>>
>> Also fix the emit_a32_movsx_r64() to put the register back on scratch
>> stack if that register is emulated on stack.
>
> It would be good to include in the commit message an example or two of
> the resulting assembly code so that it's clear what the expected
> generation is. Instead, I'm going to have to work it out myself, but
> I'm quite sure this is information you already have.
>
>> Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction")
>> Reported-by: syzbot+186522670e6722692d86@xxxxxxxxxxxxxxxxxxxxxxxxx
>> Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@xxxxxxxxxx/
>> Signed-off-by: Puranjay Mohan <puranjay@xxxxxxxxxx>
>> ---
>> arch/arm/net/bpf_jit_32.c | 11 +++++++++--
>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>> index 1d672457d02f..8fde6ab66cb4 100644
>> --- a/arch/arm/net/bpf_jit_32.c
>> +++ b/arch/arm/net/bpf_jit_32.c
>> @@ -878,6 +878,13 @@ static inline void emit_a32_mov_r(const s8 dst, const s8 src, const u8 off,
>>
>> rt = arm_bpf_get_reg32(src, tmp[0], ctx);
>> if (off && off != 32) {
>> + /* If rt is not a stacked register, move it to tmp, so it doesn't get clobbered by
>> + * the shift operations.
>> + */
>> + if (rt == src) {
>> + emit(ARM_MOV_R(tmp[0], rt), ctx);
>> + rt = tmp[0];
>> + }
>
> This change is adding inefficiency, don't we want to have the JIT
> creating as efficient code as possible within the bounds of
> reasonableness?
>
>> emit(ARM_LSL_I(rt, rt, 32 - off), ctx);
>> emit(ARM_ASR_I(rt, rt, 32 - off), ctx);
>
> LSL and ASR can very easily take a different source register to the
> destination register. All this needs to be is:
>
> emit(ARM_LSL_I(tmp[0], rt, 32 - off), ctx);
> emit(ARM_ASR_I(tmp[0], tmp[0], 32 - off), ctx);
> rt = tmp[0];
>
> This will generate:
>
> lsl tmp[0], src, #32-off
> asr tmp[0], tmp[0], #32-off
>
> and then the store to the output register will occur.
>
> What about the high-32 bits of the register pair - should that be
> taking any value?
>
>> }
>
> I notice in passing that the comments are out of sync with the
> code - please update the comments along with code changes.
>
>> @@ -919,15 +926,15 @@ static inline void emit_a32_movsx_r64(const bool is64, const u8 off, const s8 ds
>> const s8 *tmp = bpf2a32[TMP_REG_1];
>> const s8 *rt;
>>
>> - rt = arm_bpf_get_reg64(dst, tmp, ctx);
>> -
>> emit_a32_mov_r(dst_lo, src_lo, off, ctx);
>> if (!is64) {
>> if (!ctx->prog->aux->verifier_zext)
>> /* Zero out high 4 bytes */
>> emit_a32_mov_i(dst_hi, 0, ctx);
>> } else {
>> + rt = arm_bpf_get_reg64(dst, tmp, ctx);
>> emit(ARM_ASR_I(rt[0], rt[1], 31), ctx);
>> + arm_bpf_put_reg64(dst, rt, ctx);
>> }
>> }
>
> Why oh why oh why are we emitting code to read the source register
> (which may be a load), then write it to the destination (which may
> be a store) to only then immediately reload from the destination
> to then do the sign extension? This is madness.
>
> Please... apply some thought to the code generation from the JIT...
> or I will remove you from being a maintainer of this code. I spent
> time crafting some parts of the JIT to generate efficient code and
> I'm seeing that a lot of that work is now being thrown away by
> someone who seemingly doesn't care about generating "good" code.
>

Sorry for this, I also like to make sure the JITs are as efficient as
possible. I was too focused on fixing this as fast as possible and
didn't pay attention that day.

I have reimplemented the whole thing again to make sure all bugs are
fixed. The commit message has the generated assembly for all cases:

https://lore.kernel.org/all/20240419182832.27707-1-puranjay@xxxxxxxxxx/

Thanks,
Puranjay

Next message: Shresth Prasad: "Re: [PATCH][next] drivers: video: Simplify device_node cleanup using __free"
Previous message: Lukasz Luba: "Re: [PATCH v1 16/16] thermal: core: Relocate critical and hot trip handling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]