Re: [PATCH v3 3/5] LoongArch: BPF: Support 8 and 16 bit read-modify-write instructions
From: WANG Rui
Date: Mon Apr 13 2026 - 04:20:18 EST
On Mon, Apr 13, 2026 at 12:49 PM <bot+bpf-ci@xxxxxxxxxx> wrote:
>
> > diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
> > index fefda4050a20..c9a32f124f5e 100644
> > --- a/arch/loongarch/net/bpf_jit.c
> > +++ b/arch/loongarch/net/bpf_jit.c
>
> [ ... ]
>
> > @@ -388,11 +408,32 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
> > /* src = atomic_fetch_<op>(dst + off, src) */
> > case BPF_ADD | BPF_FETCH:
> > + switch (BPF_SIZE(insn->code)) {
> > + case BPF_B:
> > + if (cpu_has_lam_bh) {
> > + emit_insn(ctx, amaddb, src, t1, t3);
> > + emit_zext_32(ctx, src, true);
>
> Is emit_zext_32() sufficient for zero-extending an 8-bit value here?
>
> emit_zext_32() uses lu32id to clear bits [63:32], but for a byte-width
> atomic fetch the returned value needs bits [63:8] zeroed. If amadd.b
> follows the LoongArch convention of sign-extending sub-word results
> (as amadd.w does for 32-bit values), then a byte value like 0x80 would
> produce 0x00000000ffffff80 after emit_zext_32() instead of the expected
> 0x0000000000000080.
>
> Elsewhere in this file, 16-bit zero-extension already uses bstrpickd:
>
> emit_insn(ctx, bstrpickd, dst, dst, 15, 0);
>
> Should the BPF_B and BPF_H cases here use bstrpickd with the
> appropriate bit range (7,0 for bytes and 15,0 for halfwords) instead
> of emit_zext_32()?
I don't think truncating byte and halfword results to only the low 8
or 16 bits is the right model here. The current behavior intentionally
keeps the low 32 bits of the computed value in the register, because
every use site explicitly sign-extends from 32 to 64 bits before
feeding instructions that require 64-bit operands. The same reasoning
applies to byte and halfword types as well.
For example:

Def:
    int8 a = amadd.b
    int8 b = amadd.b

Use:
    t1 = sext32 a
    t2 = sext32 b
    blt t1, t2    // a < b

Take a = -1 and b = 0. In the 64-bit registers:

If we keep the low 32 bits:

    a:  0x0000_0000_ffff_ffff
    b:  0x0000_0000_0000_0000

then:

    t1: 0xffff_ffff_ffff_ffff
    t2: 0x0000_0000_0000_0000

so a < b evaluates to true, as expected.

But if we keep only the low 8 bits:

    a:  0x0000_0000_0000_00ff
    b:  0x0000_0000_0000_0000

then:

    t1: 0x0000_0000_0000_00ff
    t2: 0x0000_0000_0000_0000

and a < b now evaluates to false, which is clearly not the intended
result.
In other words, narrowing the result to 8 or 16 bits here would break
the existing 32-bit canonicalization rule for sub-word values.
More generally, the "always explicitly extend" pattern doesn't map
particularly well to the LoongArch ISA. In the long run, it would be
preferable if we could eliminate these explicit extensions altogether
rather than relying on them as part of the semantic model.
Thanks,
Rui
>
> > + case BPF_H:
> > + if (cpu_has_lam_bh) {
> > + emit_insn(ctx, amaddh, src, t1, t3);
> > + emit_zext_32(ctx, src, true);
>
> Same concern for the halfword case -- emit_zext_32() would leave
> bits [31:16] with sign-extended values if amadd.h sign-extends.
>
> [ ... ]
>
> > @@ -421,11 +462,32 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
> > /* src = atomic_xchg(dst + off, src); */
> > case BPF_XCHG:
> > + switch (BPF_SIZE(insn->code)) {
> > + case BPF_B:
> > + if (cpu_has_lam_bh) {
> > + emit_insn(ctx, amswapb, src, t1, t3);
> > + emit_zext_32(ctx, src, true);
>
> Same question applies to the exchange path -- amswap.b and amswap.h
> return the old value and would need proper sub-word zero-extension
> rather than the 32-bit emit_zext_32().
>
> > + case BPF_H:
> > + if (cpu_has_lam_bh) {
> > + emit_insn(ctx, amswaph, src, t1, t3);
> > + emit_zext_32(ctx, src, true);
>
> Same here for amswap.h.
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24325384786