Re: [RFC] Add BPF_SYNCHRONIZE bpf(2) command

From: Joel Fernandes
Date: Sat Jul 07 2018 - 16:33:51 EST


On Fri, Jul 06, 2018 at 07:54:28PM -0700, Alexei Starovoitov wrote:
> On Fri, Jul 06, 2018 at 06:56:16PM -0700, Daniel Colascione wrote:
> > BPF_SYNCHRONIZE waits for any BPF programs active at the time of
> > BPF_SYNCHRONIZE to complete, allowing userspace to ensure atomicity of
> > RCU data structure operations with respect to active programs. For
> > example, userspace can update a map->map entry to point to a new map,
> > use BPF_SYNCHRONIZE to wait for any BPF programs using the old map to
> > complete, and then drain the old map without fear that BPF programs
> > may still be updating it.
> >
> > Signed-off-by: Daniel Colascione <dancol@xxxxxxxxxx>
> > ---
> > include/uapi/linux/bpf.h | 1 +
> > kernel/bpf/syscall.c | 14 ++++++++++++++
> > 2 files changed, 15 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index b7db3261c62d..4365c50e8055 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -98,6 +98,7 @@ enum bpf_cmd {
> > BPF_BTF_LOAD,
> > BPF_BTF_GET_FD_BY_ID,
> > BPF_TASK_FD_QUERY,
> > + BPF_SYNCHRONIZE,
> > };
> >
> > enum bpf_map_type {
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index d10ecd78105f..60ec7811846e 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2272,6 +2272,20 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
> > if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
> > return -EPERM;
> >
> > + if (cmd == BPF_SYNCHRONIZE) {
> > + if (uattr != NULL || size != 0)
> > + return -EINVAL;
> > + err = security_bpf(cmd, NULL, 0);
> > + if (err < 0)
> > + return err;
> > + /* BPF programs are run with preempt disabled, so
> > + * synchronize_sched is sufficient even with
> > + * RCU_PREEMPT.
> > + */
> > + synchronize_sched();
> > + return 0;
>
> I don't think it's necessary. sys_membarrier() can do this already
> and some folks use it exactly for this use case.

Alexei, the use of sys_membarrier for this purpose seems kind of weird to me
though. No where does the manpage say membarrier should be implemented this
way so what happens if the implementation changes?

Further, membarrier manpage says that a memory barrier should be matched with
a matching barrier. In this use case there is no matching barrier, so it
makes it weirder.

Lastly, sys_membarrier seems will not work on nohz-full systems, so its a bit
fragile to depend on it for this?

case MEMBARRIER_CMD_GLOBAL:
/* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. */
if (tick_nohz_full_enabled())
return -EINVAL;
if (num_online_cpus() > 1)
synchronize_sched();
return 0;


Adding Mathieu as well who I believe is author/maintainer of membarrier.

thanks!

- Joel