Re: [PATCH bpf-next] Detect jumping to reserved code during check_cfg()

From: Andrii Nakryiko
Date: Tue Oct 10 2023 - 22:43:03 EST


On Tue, Oct 10, 2023 at 1:33 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
> On 10/10/23 9:02 AM, John Fastabend wrote:
> > Hao Sun wrote:
> >> Currently, we don't check if the branch-taken of a jump is reserved code of
> >> ld_imm64. Instead, such a issue is captured in check_ld_imm(). The verifier
> >> gives the following log in such case:
> >>
> >> func#0 @0
> >> 0: R1=ctx(off=0,imm=0) R10=fp0
> >> 0: (18) r4 = 0xffff888103436000 ; R4_w=map_ptr(off=0,ks=4,vs=128,imm=0)
> >> 2: (18) r1 = 0x1d ; R1_w=29
> >> 4: (55) if r4 != 0x0 goto pc+4 ; R4_w=map_ptr(off=0,ks=4,vs=128,imm=0)
> >> 5: (1c) w1 -= w1 ; R1_w=0
> >> 6: (18) r5 = 0x32 ; R5_w=50
> >> 8: (56) if w5 != 0xfffffff4 goto pc-2
> >> mark_precise: frame0: last_idx 8 first_idx 0 subseq_idx -1
> >> mark_precise: frame0: regs=r5 stack= before 6: (18) r5 = 0x32
> >> 7: R5_w=50
> >> 7: BUG_ld_00
> >> invalid BPF_LD_IMM insn
> >>
> >> Here the verifier rejects the program because it thinks insn at 7 is an
> >> invalid BPF_LD_IMM, but such a error log is not accurate since the issue
> >> is jumping to reserved code not because the program contains invalid insn.
> >> Therefore, make the verifier check the jump target during check_cfg(). For
> >> the same program, the verifier reports the following log:
> >
> > I think we at least would want a test case for this. Also how did you create
> > this case? Is it just something you did manually and noticed a strange error?
>
> Curious as well.
>
> We do have test cases which try to jump into the middle of a double insn as can
> be seen that this patch breaks BPF CI with regards to log mismatch below (which
> still needs to be adapted, too). Either way, it probably doesn't hurt to also add
> the above snippet as a test.
>
> Hao, as I understand, the patch here is an usability improvement (not a fix per se)
> where we reject such cases earlier during cfg check rather than at a later point
> where we validate ld_imm instruction. Or are there cases you found which were not
> yet captured via current check_ld_imm()?
>
> test_verifier failure log :
>
> #458/u test1 ld_imm64 FAIL
> Unexpected verifier log!
> EXP: R1 pointer comparison
> RES:
> FAIL
> Unexpected error message!
> EXP: R1 pointer comparison
> RES: jump to reserved code from insn 0 to 2
> verification time 22 usec
> stack depth 0
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
> jump to reserved code from insn 0 to 2
> verification time 22 usec
> stack depth 0
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> #458/p test1 ld_imm64 FAIL
> Unexpected verifier log!
> EXP: invalid BPF_LD_IMM insn
> RES:
> FAIL
> Unexpected error message!
> EXP: invalid BPF_LD_IMM insn
> RES: jump to reserved code from insn 0 to 2
> verification time 9 usec
> stack depth 0
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
> jump to reserved code from insn 0 to 2
> verification time 9 usec
> stack depth 0
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> #459/u test2 ld_imm64 FAIL
> Unexpected verifier log!
> EXP: R1 pointer comparison
> RES:
> FAIL
> Unexpected error message!
> EXP: R1 pointer comparison
> RES: jump to reserved code from insn 0 to 2
> verification time 11 usec
> stack depth 0
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
> jump to reserved code from insn 0 to 2
> verification time 11 usec
> stack depth 0
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> #459/p test2 ld_imm64 FAIL
> Unexpected verifier log!
> EXP: invalid BPF_LD_IMM insn
> RES:
> FAIL
> Unexpected error message!
> EXP: invalid BPF_LD_IMM insn
> RES: jump to reserved code from insn 0 to 2
> verification time 8 usec
> stack depth 0
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
> jump to reserved code from insn 0 to 2
> verification time 8 usec
> stack depth 0
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> #460/u test3 ld_imm64 OK
>
> >> func#0 @0
> >> jump to reserved code from insn 8 to 7
> >>
> >> ---
> >>
> >>
> >> Signed-off-by: Hao Sun <sunhao.th@xxxxxxxxx>
>
> nit: This needs to be before the "---" line.
>
> >> ---
> >> kernel/bpf/verifier.c | 7 +++++++
> >> 1 file changed, 7 insertions(+)
> >>
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index eed7350e15f4..725ac0b464cf 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -14980,6 +14980,7 @@ static int push_insn(int t, int w, int e, struct bpf_verifier_env *env,
> >> {
> >> int *insn_stack = env->cfg.insn_stack;
> >> int *insn_state = env->cfg.insn_state;
> >> + struct bpf_insn *insns = env->prog->insnsi;
> >>
> >> if (e == FALLTHROUGH && insn_state[t] >= (DISCOVERED | FALLTHROUGH))
> >> return DONE_EXPLORING;
> >> @@ -14993,6 +14994,12 @@ static int push_insn(int t, int w, int e, struct bpf_verifier_env *env,
> >> return -EINVAL;
> >> }
> >>
> >> + if (e == BRANCH && insns[w].code == 0) {
> >> + verbose_linfo(env, t, "%d", t);
> >> + verbose(env, "jump to reserved code from insn %d to %d\n", t, w);
> >> + return -EINVAL;
> >> + }
>
> Other than that, lgtm.

We do rely quite a lot on verifier not complaining eagerly about some
potentially invalid instructions if it's provable that some portion of
the code won't ever be reached (think using .rodata variables for
feature gating, poisoning intructions due to failed CO-RE relocation,
which libbpf does actively, except it's using a call to non-existing
helper). As such, check_cfg() is a wrong place to do such validity
checks because some of the branches might never be run and validated
in practice.

This seems like a pretty obscure case of fuzzer generated test with
random jumps into the middle of ldimm64 instruction. I think the tool
should be able to avoid this or handle verifier log just fine in such
situations. On the other hand, valid code generated by compilers will
never have such jumps.

So perhaps we can improve existing "invalid BPF_LD_IMM insn" message,
but let's not teach check_cfg() more checks than necessary?

>
> >> if (e == BRANCH) {
> >> /* mark branch target for state pruning */
> >> mark_prune_point(env, w);
> >>
>