Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command

From: Alexei Starovoitov
Date: Fri Aug 10 2018 - 18:52:53 EST


On Tue, Jul 31, 2018 at 02:36:39AM -0700, Daniel Colascione wrote:
>
> > An API command name
> > such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and
> > exposes specific map details (here: map-in-map) into the UAPI whereas it
> > should reside within a specific implementation instead similar to other ops
> > we have for maps.
>
> But synchronize isn't conceptually a command that applies to a
> specific map. It waits on all references. Did you address my point
> about your proposed map-specific interface requiring redundant
> synchronize_rcu calls in the case where we swap multiple maps and want
> to wait for all the references to drain? Under my proposal, you'd just
> BPF_SYNCHRONIZE_WHATEVER and call schedule_rcu once. Under your
> proposal, we'd make it a per-map operation, so we'd issue one
> synchronize_rcu per map.

optimizing for multi-map sync sounds like premature optimization.
In general I'd prefer DanielB proposal to make the sync logic map and fd specific,
but before we argue about implementation further let's agree on
the problem we're trying to solve.
I believe the only issue being discussed is user space doesn't know
when it's ok to start draining the inner map when it was replaced
by bpf_map_update syscall command with another map, right?
If we agree on that, should bpf_map_update handle it then?
Wouldn't it be much easier to understand and use from user pov?
No new commands to learn. map_update syscall replaced the map
and old map is no longer accessed by the program via this given map-in-map.
But if the replaced map is used directly or it sits in some other
map-in-map slot the progs can still access it.

My issue with DanielC SYNC cmd that it exposes implementation details
and introduces complex 'synchronization' semantics. To majority of
the users it won't be obvious what is being synchronized.

My issue with DanielB WAIT_REF map_fd cmd that it needs to wait for all refs
to this map to be dropped. I think combination of usercnt and refcnt
can answer that, but feels dangerous to sleep potentially forever
in a syscall until all prog->map references are gone, though such
cmd is useful beyond map-in-map use case.