Re: [External] Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS

From: Andrii Nakryiko
Date: Fri Sep 15 2023 - 16:24:41 EST


On Thu, Sep 14, 2023 at 10:46 PM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
>
> Hello.
>
> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> > On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> 在 2023/9/13 21:53, Chuyi Zhou 写道:
> >>> Hello.
> >>>
> >>> 在 2023/9/12 15:01, Chuyi Zhou 写道:
> >>>> css_iter and process_iter should be used in rcu section. Specifically, in
> >>>> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
> >>>> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
> >>>> use them directly.
> >>>>
> >>>> This patch checks whether we are in rcu cs before we want to invoke
> >>>> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
> >>>> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
> >>>> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
> >>>> reject if reg->type is UNTRUSTED.
> >>>
> >>> I use the following BPF Prog to test this patch:
> >>>
> >>> SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
> >>> int iter_task_for_each_sleep(void *ctx)
> >>> {
> >>> struct task_struct *task;
> >>> struct task_struct *cur_task = bpf_get_current_task_btf();
> >>>
> >>> if (cur_task->pid != target_pid)
> >>> return 0;
> >>> bpf_rcu_read_lock();
> >>> bpf_for_each(process, task) {
> >>> bpf_rcu_read_unlock();
> >>> if (task->pid == target_pid)
> >>> process_cnt += 1;
> >>> bpf_rcu_read_lock();
> >>> }
> >>> bpf_rcu_read_unlock();
> >>> return 0;
> >>> }
> >>>
> >>> Unfortunately, we can pass the verifier.
> >>>
> >>> Then I add some printk-messages before setting/clearing state to help
> >>> debug:
> >>>
> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>> index d151e6b43a5f..35f3fa9471a9 100644
> >>> --- a/kernel/bpf/verifier.c
> >>> +++ b/kernel/bpf/verifier.c
> >>> @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct
> >>> bpf_verifier_env *env,
> >>> __mark_reg_known_zero(st);
> >>> st->type = PTR_TO_STACK; /* we don't have dedicated reg
> >>> type */
> >>> if (is_iter_need_rcu(meta)) {
> >>> + printk("mark reg_addr : %px", st);
> >>> if (in_rcu_cs(env))
> >>> st->type |= MEM_RCU;
> >>> else
> >>> @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct
> >>> bpf_verifier_env *env, struct bpf_insn *insn,
> >>> return -EINVAL;
> >>> } else if (rcu_unlock) {
> >>> bpf_for_each_reg_in_vstate(env->cur_state,
> >>> state, reg, ({
> >>> + printk("clear reg_addr : %px MEM_RCU :
> >>> %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type &
> >>> PTR_UNTRUSTED);
> >>> if (reg->type & MEM_RCU) {
> >>> - printk("clear reg addr : %lld",
> >>> reg);
> >>> reg->type &= ~(MEM_RCU |
> >>> PTR_MAYBE_NULL);
> >>> reg->type |= PTR_UNTRUSTED;
> >>> }
> >>>
> >>>
> >>> The demsg log:
> >>>
> >>> [ 393.705324] mark reg_addr : ffff88814e40e200
> >>>
> >>> [ 393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0
> >>> PTR_UNTRUSTED : 0
> >>>
> >>> [ 393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0
> >>> PTR_UNTRUSTED : 0
> >>>
> >>> [ 393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0
> >>> PTR_UNTRUSTED : 0
> >>> ....
> >>> ....
> >>>
> >>> I didn't see ffff88814e40e200 is cleared as expected because
> >>> bpf_for_each_reg_in_vstate didn't find it.
> >>>
> >>> It seems when we are doing bpf_read_unlock() in the middle of iteration
> >>> and want to clearing state through bpf_for_each_reg_in_vstate, we can
> >>> not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in
> >>> mark_stack_slots_iter().
> >>>
> >>
> >> bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in
> >> mark_stack_slots_iter() we has marked the slots *STACK_ITER*
> >>
> >> With the following change, everything seems work OK.
> >>
> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> index a3236651ec64..83c5ecccadb4 100644
> >> --- a/include/linux/bpf_verifier.h
> >> +++ b/include/linux/bpf_verifier.h
> >> @@ -387,7 +387,7 @@ struct bpf_verifier_state {
> >>
> >> #define bpf_get_spilled_reg(slot, frame) \
> >> (((slot < frame->allocated_stack / BPF_REG_SIZE) && \
> >> - (frame->stack[slot].slot_type[0] == STACK_SPILL)) \
> >> + (frame->stack[slot].slot_type[0] == STACK_SPILL ||
> >> frame->stack[slot].slot_type[0] == STACK_ITER)) \
> >> ? &frame->stack[slot].spilled_ptr : NULL)
> >>
> >> I am not sure whether this would harm some logic implicitly when using
> >> bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so,
> >> maybe we should add a extra parameter to control the picking behaviour.
> >>
> >> #define bpf_get_spilled_reg(slot, frame, stack_type)
> >> \
> >> (((slot < frame->allocated_stack / BPF_REG_SIZE) && \
> >> (frame->stack[slot].slot_type[0] == stack_type)) \
> >> ? &frame->stack[slot].spilled_ptr : NULL)
> >>
> >> Thanks.
> >
> > I don't think it's safe to just make bpf_get_spilled_reg, and
> > subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg
> > just suddenly start iterating iterator states and/or dynptrs. At least
> > some of existing uses of those assume they are really working just
> > with registers.
>
> IIUC, when we are doing bpf_rcu_unlock, we do need to clear the state of
> reg including STACK_ITER.
>
> Maybe here we only need change the logic when using
> bpf_for_each_reg_in_vstate to clear state in bpf_rcu_unlock and keep
> everything else unchanged ?

Right, maybe. I see 10 uses of bpf_for_each_reg_in_vstate() in
kernel/bpf/verifier.c. Before we change the definition of
bpf_for_each_reg_in_vstate() we should validate that iterating dynptr
and iter states doesn't break any of them, that's all.

>
> Thanks.