Re: [RFC] bpf: Rethinking BPF safety, BPF open-coded iterators, and possible improvements (runtime protection)
From: Alexei Starovoitov
Date: Fri Feb 07 2025 - 21:40:32 EST
On Tue, Feb 4, 2025 at 4:40 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:
>
> On 2025/2/4 23:59, Alexei Starovoitov wrote:
> > On Tue, Feb 4, 2025 at 11:35 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:
> >>
> >> This discussion comes from the open-coded BPF file iterator patch
> >> series, which was Nack-ed and thus ended [0].
> >>
> >> Thanks for the feedback from Christian, Linus, and Al, all very helpful.
> >>
> >> The problems encountered in this patch series may also be encountered in
> >> other BPF open-coded iterators to be added in the future, or in other
> >> BPF usage scenarios.
> >>
> >> So maybe this is a good opportunity for us to discuss all of this and
> >> rethink BPF safety, BPF open-coded iterators, and possible improvements.
> >>
> >> [0]:
> >> https://lore.kernel.org/bpf/AM6PR03MB50801990BD93BFA2297A123599EC2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#t
> >>
> >> What do we expect from BPF safety?
> >> ----------------------------------
> >>
> >> Christian points out the important fact that BPF programs can hold
> >> references for a long time and cause weird issues.
> >>
> >> This is an inherent flaw in BPF. Since the addition of bpf_loop and
> >> BPF open-coded iterators, the myth that BPF is "absolutely" safe has
> >> been broken.
> >>
> >> The BPF verifier is a static verifier and has no way of knowing how
> >> long a BPF program will actually run.
> >>
> >> For example, the following BPF program can freeze your computer, yet
> >> it passes the BPF verifier without complaint.
> >>
> >> #include <vmlinux.h>
> >> #include <bpf/bpf_helpers.h>
> >> #include <bpf/bpf_tracing.h>
> >> #include "bpf_experimental.h" /* bpf_iter_num_* kfunc declarations */
> >>
> >> SEC("raw_tp/sched_switch")
> >> int BPF_PROG(on_switch)
> >> {
> >>         struct bpf_iter_num it;
> >>         int *v;
> >>
> >>         bpf_iter_num_new(&it, 0, 100000);
> >>         while ((v = bpf_iter_num_next(&it))) {
> >>                 struct bpf_iter_num it2;
> >>
> >>                 bpf_iter_num_new(&it2, 0, 100000);
> >>                 while ((v = bpf_iter_num_next(&it2))) {
> >>                         bpf_printk("BPF Bomb\n");
> >>                 }
> >>                 bpf_iter_num_destroy(&it2);
> >>         }
> >>         bpf_iter_num_destroy(&it);
> >>         return 0;
> >> }
> >>
> >> This BPF program runs a huge nested loop on every context switch.
> >>
> >> bpf_iter_num is a generic iterator that we can use in almost any
> >> context, including LSM, sched-ext, tracing, etc.
> >>
> >> We can run large, long loops on any critical code path and freeze the
> >> system, since the BPF verifier has no way of knowing how long the
> >> iteration will run.
> >
> > This is completely orthogonal to the issue that Christian explained.
>
> Thanks for your reply!
>
> Completely orthogonal? Sorry, I may have misunderstood something.
...
> program runs a huge loop at each schedule
You've discovered bpf iterators and said, rephrasing,
"loops can take a long time" and concluded with:
"This is an inherent flaw in BPF".
This kind of rhetoric is not helpful.
People that wanted to abuse bpf powers could have done it 10 years
ago without iterators, loops, etc.
One could create a hash map and populate it with collisions,
building long per-bucket linked lists. Though we use a random hash
seed, with enough persistence the hashtab becomes slow.
Then just do bpf_map_lookup_elem() from the prog.
This was a known issue that is gradually being fixed.
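
For illustration, a minimal sketch of what that could look like
(hypothetical map and prog names; the colliding keys would have to be
found from user space, e.g. by timing lookups, since the hash seed is
random):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Plain hash map that user space has filled with colliding keys. */
struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1 << 20);
        __type(key, __u64);
        __type(value, __u64);
} collide_map SEC(".maps");

SEC("raw_tp/sched_switch")
int BPF_PROG(on_switch_lookup)
{
        /* Hypothetical key arranged to share one bucket with many
         * colliding entries; the lookup walks the whole per-bucket
         * linked list on every context switch. */
        __u64 key = 0;

        bpf_map_lookup_elem(&collide_map, &key);
        return 0;
}

A single helper call per event, yet its cost grows with the bucket
length, which is why this predates iterators as a slowdown vector.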
> Could you please share a link to the patch? I am curious how we can
> fix this.
There is no "fix" for the iterator. There is no single patch either.
The issues were discussed over many _years_ at LPC and LSFMM.
The exception logic was a step toward fixing it.
Now we will do "exceptions part 2" or will rip out exceptions completely
and go with a "fast execute" approach.
When either approach works, we can add a watchdog (and other mechanisms)
to cancel program execution.
Unlike user space, there is no easy way to sigkill a bpf prog.
We have to free up all resources cleanly.
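
(For context on why cancellation is hard: the verifier tracks acquired
resources, so any watchdog has to run the release half of every
acquire/release pair the prog still holds. A minimal sketch, using the
task kfuncs as one example of such a pair:)

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

extern struct task_struct *bpf_task_from_pid(s32 pid) __ksym;
extern void bpf_task_release(struct task_struct *p) __ksym;

SEC("tp_btf/sched_switch")
int BPF_PROG(holds_a_ref)
{
        struct task_struct *t;

        /* Acquires a task reference that the verifier tracks. */
        t = bpf_task_from_pid(1);
        if (!t)
                return 0;

        /* Killing the prog at this point would leak t, so the
         * matching release below must still run. */
        bpf_task_release(t);
        return 0;
}

Unwinding such state cleanly is what the exceptions / "fast execute"
work above is about.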
> Yes, I am willing to help, so I included a "Possible improvements"
> section.
With rants like "inherent flaw in BPF" it's hard to take
your offer of help seriously.
> I am also working on another patch about filters that we discussed
> earlier, although it still needs some time.
Pls focus on landing that first.