Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS

From: Chuyi Zhou
Date: Thu Sep 14 2023 - 04:57:08 EST




在 2023/9/13 21:53, Chuyi Zhou 写道:
Hello.

在 2023/9/12 15:01, Chuyi Zhou 写道:
css_iter and process_iter should be used in rcu section. Specifically, in
sleepable progs explicit bpf_rcu_read_lock() is needed before use these
iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
use them directly.

This patch checks whether we are in rcu cs before we want to invoke
bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
reject if reg->type is UNTRUSTED.

I use the following BPF Prog to test this patch:

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int iter_task_for_each_sleep(void *ctx)
{
    struct task_struct *task;
    struct task_struct *cur_task = bpf_get_current_task_btf();

    if (cur_task->pid != target_pid)
        return 0;
    bpf_rcu_read_lock();
    bpf_for_each(process, task) {
        bpf_rcu_read_unlock();
        if (task->pid == target_pid)
            process_cnt += 1;
        bpf_rcu_read_lock();
    }
    bpf_rcu_read_unlock();
    return 0;
}

Unfortunately, we can pass the verifier.

Then I add some printk-messages before setting/clearing state to help debug:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d151e6b43a5f..35f3fa9471a9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
                __mark_reg_known_zero(st);
                st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
                if (is_iter_need_rcu(meta)) {
+                       printk("mark reg_addr : %px", st);
                        if (in_rcu_cs(env))
                                st->type |= MEM_RCU;
                        else
@@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
                        return -EINVAL;
                } else if (rcu_unlock) {
                        bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+                               printk("clear reg_addr : %px MEM_RCU : %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type & PTR_UNTRUSTED);
                                if (reg->type & MEM_RCU) {
-                                       printk("clear reg addr : %lld", reg);
                                        reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
                                        reg->type |= PTR_UNTRUSTED;
                                }


The demsg log:

[  393.705324] mark reg_addr : ffff88814e40e200

[  393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0 PTR_UNTRUSTED : 0

[  393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0 PTR_UNTRUSTED : 0

[  393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0 PTR_UNTRUSTED : 0
....
....

I didn't see ffff88814e40e200 is cleared as expected because bpf_for_each_reg_in_vstate didn't find it.

It seems when we are doing bpf_read_unlock() in the middle of iteration and want to clearing state through bpf_for_each_reg_in_vstate, we can not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in mark_stack_slots_iter().


bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in mark_stack_slots_iter() we has marked the slots *STACK_ITER*

With the following change, everything seems work OK.

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index a3236651ec64..83c5ecccadb4 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -387,7 +387,7 @@ struct bpf_verifier_state {

#define bpf_get_spilled_reg(slot, frame) \
(((slot < frame->allocated_stack / BPF_REG_SIZE) && \
- (frame->stack[slot].slot_type[0] == STACK_SPILL)) \
+ (frame->stack[slot].slot_type[0] == STACK_SPILL || frame->stack[slot].slot_type[0] == STACK_ITER)) \
? &frame->stack[slot].spilled_ptr : NULL)

I am not sure whether this would harm some logic implicitly when using bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so, maybe we should add a extra parameter to control the picking behaviour.

#define bpf_get_spilled_reg(slot, frame, stack_type)
\
(((slot < frame->allocated_stack / BPF_REG_SIZE) && \
(frame->stack[slot].slot_type[0] == stack_type)) \
? &frame->stack[slot].spilled_ptr : NULL)

Thanks.