Re: [RFC PATCH 1/2] mm, oom: Introduce bpf_select_task

From: Roman Gushchin
Date: Tue Aug 15 2023 - 15:12:33 EST


On Thu, Aug 10, 2023 at 12:41:01PM -0700, Martin KaFai Lau wrote:
> > > > > First, I'm a bit concerned about implicit restrictions we apply to bpf programs
> > > > > which will be executed potentially thousands of times under a very heavy memory
> > > > > pressure. We will need to make sure that they don't allocate (much) memory, don't
> > > > > take any locks which might deadlock with other memory allocations etc.
> > > > > It will potentially require hard restrictions on what these programs can and can't
> > > > > do and this is something that the bpf community will have to maintain long-term.
> > > >
> > > > Right, BPF callbacks operating under OOM situations will be really
> > > > constrained but this is more or less by definition. Isn't it?
> > >
> > > What do you mean?
> >
> > Callbacks cannot depend on any direct or indirect memory allocations.
> > Dependencies on any sleeping locks (again directly or indirectly) are not
> > allowed, just to name the most important ones.
> >
> > > In general, the bpf community is trying to make it as generic as possible and
> > > adding more and more features. Bpf programs are not as constrained as they were
> > > when it all started.
>
> bpf supports different running contexts. For example, only a non-sleepable bpf
> prog is allowed to run at the NIC driver. A sleepable bpf prog is only
> allowed to run at some bpf_lsm hooks that are known to be safe to call
> blocking bpf-helpers/kfuncs. From the bpf side, it ensures a non-sleepable bpf
> prog cannot do things that may block.

Yeah, you're right: non-sleepable bpf should be ok here.

>
> fwiw, Dave has recently proposed something for iterating the task vma
> (https://lore.kernel.org/bpf/20230810183513.684836-4-davemarchevsky@xxxxxx/).
> Potentially, a similar iterator can be created for a bpf program to iterate
> cgroups and tasks.

Yes, it looks like a much better approach than adding a hook into
the existing iteration over all tasks.
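
To make the difference concrete, here is a rough user-space model of the
iterator-driven shape. It is only a sketch, not kernel or bpf code: every
name in it (struct model_task, task_iter_new/next/destroy, select_victim,
demo_victim_pid) is hypothetical and is not taken from Dave's series or from
any existing API. The point is that the policy drives the walk itself, keeps
the iterator state on its own stack, and neither allocates nor blocks inside
the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the few per-task fields a policy might read. */
struct model_task {
	int pid;
	unsigned long rss_pages;
	int unkillable;		/* models a task the policy must never pick */
};

/* Iterator state is caller-owned (stack), so the walk needs no allocation. */
struct task_iter {
	const struct model_task *tasks;
	size_t nr;
	size_t pos;
};

static void task_iter_new(struct task_iter *it,
			  const struct model_task *tasks, size_t nr)
{
	assert(tasks != NULL || nr == 0);
	it->tasks = tasks;
	it->nr = nr;
	it->pos = 0;
}

/* Returns the next task, or NULL once the walk is finished. */
static const struct model_task *task_iter_next(struct task_iter *it)
{
	if (it->pos >= it->nr)
		return NULL;
	return &it->tasks[it->pos++];
}

static void task_iter_destroy(struct task_iter *it)
{
	it->tasks = NULL;
	it->nr = 0;
	it->pos = 0;
}

/*
 * The "policy": pick the killable task with the largest RSS.
 * Returns the victim's pid, or -1 if no task is eligible.
 */
static int select_victim(const struct model_task *tasks, size_t nr)
{
	const struct model_task *t, *victim = NULL;
	struct task_iter it;

	task_iter_new(&it, tasks, nr);
	while ((t = task_iter_next(&it)) != NULL) {
		if (t->unkillable)
			continue;
		if (!victim || t->rss_pages > victim->rss_pages)
			victim = t;
	}
	task_iter_destroy(&it);

	return victim ? victim->pid : -1;
}

/* Small fixed table to exercise the policy. */
static int demo_victim_pid(void)
{
	static const struct model_task tasks[] = {
		{ .pid = 100, .rss_pages = 5000,  .unkillable = 0 },
		{ .pid = 200, .rss_pages = 90000, .unkillable = 1 },
		{ .pid = 300, .rss_pages = 42000, .unkillable = 0 },
	};

	return select_victim(tasks, sizeof(tasks) / sizeof(tasks[0]));
}
```

With the table above, demo_victim_pid() picks pid 300: pid 200 has the
largest RSS but is marked unkillable, so it is skipped. With a per-task hook
the kernel would own the loop and call into the program once per task; here
the program owns the loop and only sees what the iterator hands out.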

Thanks!