Re: [RFC] bpf: Rethinking BPF safety, BPF open-coded iterators, and possible improvements (runtime protection)

From: Alexei Starovoitov
Date: Tue Feb 04 2025 - 19:00:13 EST


On Tue, Feb 4, 2025 at 11:35 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:
>
> This discussion comes out of the open-coded BPF file iterator patch
> series, which was NACKed and thus ended [0].
>
> Thanks for the feedback from Christian, Linus, and Al, all very helpful.
>
> The problems encountered in this patch series may also be encountered in
> other BPF open-coded iterators to be added in the future, or in other
> BPF usage scenarios.
>
> So maybe this is a good opportunity for us to discuss all of this and
> rethink BPF safety, BPF open-coded iterators, and possible improvements.
>
> [0]:
> https://lore.kernel.org/bpf/AM6PR03MB50801990BD93BFA2297A123599EC2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#t
>
> What do we expect from BPF safety?
> ----------------------------------
>
> Christian points out the important fact that BPF programs can hold
> references for a long time and cause weird issues.
>
> This is an inherent flaw in BPF. Since the addition of bpf_loop and
> BPF open-coded iterators, the myth that BPF is "absolutely" safe has
> been broken.
>
> The BPF verifier is a static verifier and has no way of knowing how
> long a BPF program will actually run.
>
> For example, the following BPF program can freeze your computer, but
> can pass the BPF verifier smoothly.
>
> SEC("raw_tp/sched_switch")
> int BPF_PROG(on_switch)
> {
>     struct bpf_iter_num it;
>     int *v;
>     bpf_iter_num_new(&it, 0, 100000);
>     while ((v = bpf_iter_num_next(&it))) {
>         struct bpf_iter_num it2;
>         bpf_iter_num_new(&it2, 0, 100000);
>         while ((v = bpf_iter_num_next(&it2))) {
>             bpf_printk("BPF Bomb\n");
>         }
>         bpf_iter_num_destroy(&it2);
>     }
>     bpf_iter_num_destroy(&it);
>     return 0;
> }
>
> This BPF program runs a huge nested loop (up to 100000 * 100000 = 10^10
> iterations) on every context switch.
>
> bpf_iter_num_new is a common iterator that we can use in almost any
> context, including LSM, sched-ext, tracing, etc.
>
> We can run large, long loops on any critical code path and freeze the
> system, since the BPF verifier has no way of knowing how long the
> iteration will run.

This is completely orthogonal to the issue that Christian explained.
The long runtime of *malicious* bpf progs is a known issue and
there are wip patches to address that.

> Then holding references or holding locks in BPF programs doesn't seem
> to be a problem?

It's a known issue.

> This brings us back to the question at the beginning, what do we expect
> from BPF safety?

Safety is paramount.

> What do we expect from BPF and BPF open-coded iterators?

They are not special. All progs can be exploited if bad actors
try hard enough. Including unprivileged progs like tcpdump.
That's why unpriv is disabled by default.

> Would we expect BPF programs to have flexible access to more information
> in the kernel?

Yes, but the tracing progs must be free of side effects.

> Would we expect to have more BPF open-coded iterators allowing BPF
> programs to iterate through various data structures in the kernel?

True, but it's nuanced.

> What are the boundaries of what we expect BPF to be able to do?

Tracing bpf progs are read-only. If they cause side effects
they must be fixed.

> Of course, there may be risks, but maybe those risks can be solved by
> improving BPF?

Please help by contributing patches instead of screaming "fire fire".