Re: Help with verifier failure
From: Brendan Jackman
Date: Thu Apr 22 2021 - 09:55:37 EST
On Wed, 21 Apr 2021 at 18:59, Yonghong Song <yhs@xxxxxx> wrote:
> On 4/21/21 8:06 AM, Yonghong Song wrote:
> > On 4/21/21 5:23 AM, Brendan Jackman wrote:
> > Thanks, Brendan. Looks at least the verifier failure is triggered
> > by recent clang changes. I will take a look whether we could
> > improve verifier for such a case and whether we could improve
> > clang to avoid generate such codes the verifier doesn't like.
> > Will get back to you once I had concrete analysis.
> >
> >>
> >> This seems like it must be a common pitfall, any idea what we can do
> >> to fix it
> >> and avoid it in future? Am I misunderstanding the issue?
>
> First, for the example code you provided, I checked with llvm11, llvm12
> and latest trunk llvm (llvm13-dev) and they all generated similar codes,
> which may trigger verifier failure. Somehow you original code could be
> different may only show up with a recent llvm, I guess.
>
> Checking llvm IR, the divergence between "w2 = w8" and "if r8 < 0x1000"
> appears in insn scheduling phase related handling PHIs. Need to further
> check whether it is possible to prevent the compiler from generating
> such codes.
>
> The latest kernel already had the ability to track register equivalence.
> However, the tracking is conservative for 32bit mov like "w2 = w8" as
> you described in the above. if we have code like "r2 = r8; if r8 <
> 0x1000 ...", we will be all good.
>
> The following hack fixed the issue,
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 58730872f7e5..54f418fd6a4a 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7728,12 +7728,20 @@ static int check_alu_op(struct bpf_verifier_env
> *env, struct bpf_insn *insn)
> insn->src_reg);
> return -EACCES;
> } else if (src_reg->type == SCALAR_VALUE) {
> + /* If src_reg is in 32bit range,
> there is
> + * no need to reset the ID.
> + */
> + bool is_32bit_src =
> src_reg->umax_value <= 0x7fffffff;
> +
> + if (is_32bit_src && !src_reg->id)
> + src_reg->id = ++env->id_gen;
> *dst_reg = *src_reg;
> /* Make sure ID is cleared
> otherwise
> * dst_reg min/max could be
> incorrectly
> * propagated into src_reg by
> find_equal_scalars()
> */
> - dst_reg->id = 0;
> + if (!is_32bit_src)
> + dst_reg->id = 0;
> dst_reg->live |= REG_LIVE_WRITTEN;
> dst_reg->subreg_def =
> env->insn_idx + 1;
> } else {
>
> Basically, for a 32bit mov insn like "w2 = w8", if we can ensure
> that "w8" is 32bit and has no possibility that upper 32bit is set
> for r8, we can declare them equivalent. This fixed your issue.
>
> Will try to submit a formal patch later.
Ah.. I did not realise this equivalence tracking with reg.id was there
for scalar values! I also didn't take any notice of the use of 32-bit
operations in the assembly, thanks for pointing that out.
Yes it sounds like this is certainly worth fixing in the kernel - even
if Clang stops generating the code today it will probably start doing
so again in the future. I can also help with the verifier work if
needed.