Re: [PATCH bpf-next v5 4/5] bpf: Make fs kfuncs available for SYSCALL and TRACING program types

From: Juntong Deng
Date: Mon Dec 16 2024 - 18:08:55 EST


On 2024/12/14 00:41, Alexei Starovoitov wrote:
On Fri, Dec 13, 2024 at 10:51 AM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:


sched-ext is struct_ops only. No syscall progs there.


I saw some on Github [0], sorry, yes they are not in the Linux tree.

[0]:
https://github.com/search?q=repo%3Asched-ext%2Fscx%20SEC(%22syscall%22)&type=code

Ahh. I see. Those are executed from user space via prog_run.
https://github.com/sched-ext/scx/blob/e8e68e8ee80f65f62a6e900d457306217b764e58/scheds/rust/scx_lavd/src/main.rs#L794

These progs are not executed by sched-ext core,
so not really sched-ext progs.
They're auxiliary progs that populate configs and knobs in bpf maps
that sched-ext progs use later.


As BPF_PROG_TYPE_SYSCALL becomes more general, it would be valuable to
make more kfuncs available for BPF_PROG_TYPE_SYSCALL.

Maybe. I still don't understand how it helps CRIB goal.

For CRIB goals, the program type is not important. What is important is
that CRIB bpf programs are able to call the required kfuncs, and that
CRIB ebpf programs can be executed from userspace.

In our previous discussion, the conclusion was that we do not need a
separate CRIB program type [1].

BPF_PROG_TYPE_SYSCALL can be executed from userspace via prog_run, which
fits the CRIB use case of calling the ebpf program from userspace to get
process information.

So BPF_PROG_TYPE_SYSCALL becomes an option.

[1]:
https://lore.kernel.org/bpf/etzm4h5qm2jhgi6d4pevooy2sebrvgb3lsa67ym4x7zbh5bgnj@feoli4hj22so/

In fs/bpf_fs_kfuncs.c, CRIB currently needs bpf_fget_task (dump files
opened by the process), bpf_put_file, and bpf_get_task_exe_file.

So I would like these kfuncs to be available for BPF_PROG_TYPE_SYSCALL.

bpf_get_dentry_xattr, bpf_get_file_xattr, and bpf_path_d_path have
nothing to do with CRIB, but they are all in bpf_fs_kfunc_set_ids.

Should we make bpf_fs_kfunc_set_ids available to BPF_PROG_TYPE_SYSCALL
as a whole? Or create a separate set? Maybe we can discuss.

I don't think it's necessary to slide and dice that match.
Since they're all safe from syscall prog it's cleaner to enable them all.

When I said:

I still don't understand how it helps CRIB goal.

I meant how are you going to use them from CRIB ?

Patch 5 selftest does:

+ file = bpf_fget_task(task, test_fd1);
+ if (file == NULL) {
+ err = 2;
+ return 0;
+ }
+
+ if (file->f_op != &pipefifo_fops) {
+ err = 3;
+ bpf_put_file(file);
+ return 0;
+ }
+
+ bpf_put_file(file);


It's ok for selftest, but not enough to explain the motivation and
end-to-end operation of CRIB.

Patch 2 selftest is also weak.
It's not using bpf_iter_task_file_next() like iterators are
normally used.

When selftests are basic sanity tests, it begs the question: what's next?
How are they going to be used for real?

In my plan CRIB ebpf program will look like this

SEC("syscall")
int dump_task_files(struct task_arg *arg)
...

SEC("syscall")
int dump_task_socket(struct socket_arg *arg)
...

SEC("syscall")
int dump_task_pipe(struct pipe_arg *arg)
...

Since the complexity of an ebpf program is limited, I am unable to
implement dumping all types of files within one ebpf program, so I need
to provide separate bpf programs for different file types (restoring
part is similar).

In dump_task_files will use bpf_iter_task_file to obtain the file
descriptor, the file type (socket, pipe, ...) and other necessary
information of all files opened by the process.

And then the userspace program will pass the file descriptors to
different dump_task_xxx ebpf programs based on the different file
types. bpf_fget_task will be used in the dump_task_xxx ebpf programs.

In restoring part, ebpf program is used only as minimal necessary help,
and most of the restoring part continue to use the original kernel
interface.

For example, when restoring sockets, we still use the original kernel
interface "socket" to create sockets, and only use the ebpf program
when necessary (e.g. to set TCP internal states).

Thanks to BPF CO-RE, it would be more convenient to extend the data
structure of kfuncs than the data structure of the system call
interface, so this still has value.

Once I have finished a good enough solution, I will send out this part
of the patch series (as soon as I can, even though I cannot do it
full time).

All other patch series related to opened 'files' (socket, pipe, ...)
depend on bpf_iter_task_file and bpf_fget_task.