Re: [RFC patch 14/19] bpf: Use migrate_disable() in hashtab code

From: Alexei Starovoitov
Date: Wed Feb 19 2020 - 23:20:00 EST


On Wed, Feb 19, 2020 at 10:17:28AM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 18, 2020, at 6:36 PM, Alexei Starovoitov alexei.starovoitov@xxxxxxxxx wrote:
>
> [...]
>
> > If I can use migrate_disable() without RT it will help my work on sleepable
> > BPF programs. I would only have to worry about rcu_read_lock() since
> > preempt_disable() is nicely addressed.
>
> Hi Alexei,
>
> You may want to consider using SRCU rather than RCU if you need to sleep while
> holding an RCU read-side lock.
>
> This is the synchronization approach I consider for adding the ability to take page
> faults when doing syscall tracing.
>
> Then you'll be able to replace preempt_disable() by combining SRCU and
> migrate_disable():
>
> AFAIU eBPF currently uses preempt_disable() for two reasons:
>
> - Ensure the thread is not migrated,
>   -> can be replaced by migrate_disable() in RT
> - Provide RCU existence guarantee through sched-RCU
>   -> can be replaced by SRCU, which allows sleeping and taking page faults.

bpf is using normal rcu to protect map values
and rcu+preempt to protect per-cpu map values.
srcu is certainly under consideration. It hasn't been used due to performance
implications: atomics and barriers are too heavy for certain use cases. So we
have to keep rcu where performance matters, but cannot fork the map
implementations into rcu and srcu variants due to huge code bloat. So far I've
been thinking of introducing an explicit helper bpf_rcu_read_lock() and letting
programs use it directly instead of the implicit rcu_read_lock() that is done
outside of the bpf prog. The tricky part is teaching the verifier to enforce the
critical section.
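
To make the "enforce critical section" part concrete, here is a rough userspace
sketch of the kind of rule a verifier pass might apply. It is an illustration
only, not kernel code: the instruction kinds, the error codes,
check_rcu_section() and the matching bpf_rcu_read_unlock() are all hypothetical
names. It handles straight-line programs only and forbids nesting for
simplicity; a real verifier would have to track the lock state along every
branch.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified instruction kinds (not the real BPF ISA). */
enum insn_kind {
	INSN_OTHER,            /* anything that does not affect the check */
	INSN_RCU_LOCK,         /* call to the proposed bpf_rcu_read_lock() */
	INSN_RCU_UNLOCK,       /* call to an assumed bpf_rcu_read_unlock() */
	INSN_MAP_VALUE_ACCESS, /* dereference of an rcu-protected map value */
	INSN_MAY_SLEEP,        /* call to a helper that may sleep or fault */
	INSN_EXIT,             /* program exit */
};

/* Hypothetical result codes for the sketch. */
enum check_result {
	CHECK_OK = 0,
	ERR_NESTED_LOCK,         /* lock taken while already held (disallowed here) */
	ERR_UNLOCK_WITHOUT_LOCK, /* unlock with no matching lock */
	ERR_ACCESS_OUTSIDE,      /* map value touched outside the section */
	ERR_SLEEP_INSIDE,        /* sleeping helper called inside the section */
	ERR_EXIT_LOCKED,         /* program ends with the lock still held */
};

/*
 * Walk a straight-line program once and track whether the explicit
 * rcu read-side section is currently open.
 */
enum check_result check_rcu_section(const enum insn_kind *prog, size_t len)
{
	int locked = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		switch (prog[i]) {
		case INSN_RCU_LOCK:
			if (locked)
				return ERR_NESTED_LOCK;
			locked = 1;
			break;
		case INSN_RCU_UNLOCK:
			if (!locked)
				return ERR_UNLOCK_WITHOUT_LOCK;
			locked = 0;
			break;
		case INSN_MAP_VALUE_ACCESS:
			if (!locked)
				return ERR_ACCESS_OUTSIDE;
			break;
		case INSN_MAY_SLEEP:
			if (locked)
				return ERR_SLEEP_INSIDE;
			break;
		case INSN_EXIT:
			return locked ? ERR_EXIT_LOCKED : CHECK_OK;
		case INSN_OTHER:
			break;
		}
	}
	/* Falling off the end is treated like an exit. */
	return locked ? ERR_EXIT_LOCKED : CHECK_OK;
}
```

With this rule a program that sleeps, then locks, touches a map value, unlocks
and sleeps again is accepted, while one that sleeps between lock and unlock, or
touches a map value outside the section, is rejected.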