Re: [PATCH bpf-next v2 1/2] bpf, arm64: Remove redundant bpf_flush_icache() after pack allocator finalize
From: Xu Kuohai
Date: Tue Apr 14 2026 - 07:17:42 EST
On 4/14/2026 5:38 PM, Puranjay Mohan wrote:
> On Tue, Apr 14, 2026 at 2:56 AM Xu Kuohai <xukuohai@xxxxxxxxxxxxxxx> wrote:
>> On 4/14/2026 3:11 AM, Puranjay Mohan wrote:
>>> bpf_flush_icache() calls flush_icache_range() to clean the data cache
>>> and invalidate the instruction cache for the JITed code region. However,
>>> since commit 1dad391daef1 ("bpf, arm64: use bpf_prog_pack for memory
>>> management"), this flush is redundant.
>>>
>>> bpf_jit_binary_pack_finalize() copies the JITed instructions to the ROX
>>> region via bpf_arch_text_copy() -> aarch64_insn_copy() -> __text_poke(),
>>> and __text_poke() already calls flush_icache_range() on the written
>>> range. The subsequent bpf_flush_icache() repeats the same cache
>>> maintenance on an overlapping range, including an unnecessary second
>>> synchronous IPI to all CPUs via kick_all_cpus_sync().
>>
>> So icache is flushed twice: once per instruction and again after all
>> instructions are copied. I think it's better to remove the per-instruction
>> flush and retain the single final flush to avoid repeating flush overhead
>> for each instruction.
>
> No, bpf_jit_binary_pack_finalize() is called at the end after the
> whole program is jited, and it calls: bpf_arch_text_copy(ro_header,
> rw_header, rw_header->size); which does aarch64_insn_copy(dst, src,
> len), this calls __text_poke() which copies the whole program and then
> does flush_icache_range((uintptr_t)addr, (uintptr_t)addr + len); once.
> This is correct, after this we don't need to call flush_icache_range()
> on the same range again.
>
> If we had been calling flush_icache_range() for each instruction, the
> system would hang due to the storm of IPIs.

Right, thanks for the explanation!
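
For the archive, the difference between the two flushing strategies discussed above can be sketched as a small userspace model. This is only an illustration and not the kernel code: model_flush_icache_range() and the two copy helpers are hypothetical names that merely count calls, and nothing here touches real caches or sends IPIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */

/* Number of modelled flush_icache_range() calls so far. */
static unsigned int flush_count;

/* Stand-in for flush_icache_range(): it only counts invocations. */
static void model_flush_icache_range(const void *start, const void *end)
{
	(void)start;
	(void)end;
	flush_count++;
}

/*
 * Models the path described in the thread: copy the whole JITed image,
 * then flush the written range exactly once.
 */
static void model_copy_then_flush(uint32_t *dst, const uint32_t *src, size_t len)
{
	assert(len % sizeof(uint32_t) == 0);
	memcpy(dst, src, len);
	model_flush_icache_range(dst, (const uint8_t *)dst + len);
}

/*
 * Models the variant that does NOT happen on this path: one flush
 * (and, on real hardware, one round of IPIs) per 4-byte instruction.
 */
static void model_copy_flush_each(uint32_t *dst, const uint32_t *src, size_t len)
{
	size_t i, n = len / sizeof(uint32_t);

	assert(len % sizeof(uint32_t) == 0);
	for (i = 0; i < n; i++) {
		dst[i] = src[i];
		model_flush_icache_range(&dst[i], &dst[i + 1]);
	}
}
```

In this model, copying an N-instruction image costs one flush in the first helper and N flushes in the second, which is why a per-instruction flush would be so expensive if it existed.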
LGTM.