On 8/10/23 4:15 PM, Stanislav Fomichev wrote:
On 08/10, David Vernet wrote:
On Thu, Aug 10, 2023 at 03:46:18PM -0700, Stanislav Fomichev wrote:
On 08/10, David Vernet wrote:
Currently, if a struct_ops map is loaded with BPF_F_LINK, it must also
define the .validate() and .update() callbacks in its corresponding
struct bpf_struct_ops in the kernel. Enabling struct_ops link is useful
in its own right to ensure that the map is unloaded if an application
crashes. For example, with sched_ext, we want to automatically unload
the host-wide scheduler if the application crashes. We would likely
never support updating elements of a sched_ext struct_ops map, so we'd
have to implement these callbacks showing that they _can't_ support
element updates just to benefit from the basic lifetime management of
struct_ops links.
Let's enable struct_ops maps to work with BPF_F_LINK even if they
haven't defined these callbacks, by assuming that a struct_ops map
element cannot be updated by default.
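
As a rough illustration of the default behavior described above, here is a
userspace sketch, not the patch itself. The struct, the function names, and
the -EOPNOTSUPP return value are all assumptions made for illustration; the
real logic lives in the kernel's struct_ops map and link code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct bpf_struct_ops; only the
 * two optional callbacks discussed above are modeled. */
struct sketch_struct_ops {
	int (*validate)(void *kdata);
	int (*update)(void *kdata, void *old_kdata);
};

/* Map element update: call validate() only if the subsystem provides one.
 * A missing validate() is treated as "nothing to check". */
int sketch_map_update_elem(const struct sketch_struct_ops *st_ops, void *kdata)
{
	if (st_ops->validate)
		return st_ops->validate(kdata);
	return 0;
}

/* Link update: a missing update() is treated as "this struct_ops cannot be
 * updated". The error code is an assumption of this sketch. */
int sketch_link_update(const struct sketch_struct_ops *st_ops,
		       void *kdata, void *old_kdata)
{
	if (!st_ops->update)
		return -EOPNOTSUPP;
	return st_ops->update(kdata, old_kdata);
}

/* A struct_ops that defines neither callback can still be loaded, but any
 * attempt to update its link is rejected. */
void sketch_demo(void)
{
	struct sketch_struct_ops no_callbacks = { NULL, NULL };

	assert(sketch_map_update_elem(&no_callbacks, NULL) == 0);
	assert(sketch_link_update(&no_callbacks, NULL, NULL) == -EOPNOTSUPP);
}
```

With this shape, a subsystem that never wants element updates does not need
to write stub callbacks just to get link lifetime management.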
Any reason this is not part of sched_ext series? As you mention,
we don't seem to have such users in the tree?
Hi Stanislav,
The sched_ext series [0] implements these callbacks. See
bpf_scx_update() and bpf_scx_validate().
[0]: https://lore.kernel.org/all/20230711011412.100319-13-tj@xxxxxxxxxx/
We could add this into that series and remove those callbacks, but this
patch is fixing a UX / API issue with struct_ops links that's not really
relevant to sched_ext. I don't think there's any reason to couple
updating struct_ops map elements with allowing the kernel to manage the
lifetime of struct_ops maps -- just because we only have 1 (non-test)
Agreed that link update is not necessarily coupled with link creation,
so removing the enforcement of the 'link' update function is ok. The
intention was to avoid an inconsistent struct_ops link experience (one
struct_ops link supports update and another does not), because
consistency was one of the reasons for the true kernel-backed link
support that Kui-Feng did. tcp-cc is the only struct_ops user for now
and it can support update, hence the enforcement. I can see Stan's point
that removing it now looks premature, before a struct_ops has landed in
the kernel showing that supporting 'link' update does not make sense or
is very hard. However, the scx patch set has shown this, so I think it
is good enough.
For 'validate', it is not related to a 'link' update; it is for the
struct_ops 'map' update. If the loaded struct_ops map is invalid, we end
up with a useless struct_ops map from which no link can be created. I
can see some struct_ops subsystems checking every 'ops' function for
NULL before calling it (like the FUSE RFC). I can also see some future
struct_ops preferring not to check for NULL at all and instead assuming
that a subset of the ops is always valid. Is the 'validate' enforcement
blocking the scx patchset in some way? If not, I would like to keep it
for now. Once it is removed, there is no turning back.
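
To make the two styles mentioned above concrete, here is a small userspace
sketch. Every name in it is hypothetical and it is not taken from any
in-tree or proposed struct_ops user; it only contrasts checking once at
map update time with checking for NULL at each call site.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table with one required and one optional callback. */
struct sketch_ops {
	int (*required_op)(int arg);
	int (*optional_op)(int arg);
};

/* Style 1: a validate() callback rejects an incomplete table once, at map
 * update time, so later callers may assume required_op is non-NULL. */
int sketch_validate(const struct sketch_ops *ops)
{
	return ops->required_op ? 0 : -EINVAL;
}

/* Style 2: no validate(); each call site checks for NULL and falls back
 * to a caller-supplied default when the op is missing. */
int sketch_call_optional(const struct sketch_ops *ops, int arg, int fallback)
{
	return ops->optional_op ? ops->optional_op(arg) : fallback;
}

/* Example callback used only to exercise the sketch. */
int sketch_double(int arg)
{
	return 2 * arg;
}

void sketch_validate_demo(void)
{
	struct sketch_ops complete = { sketch_double, NULL };
	struct sketch_ops incomplete = { NULL, sketch_double };

	assert(sketch_validate(&complete) == 0);
	assert(sketch_validate(&incomplete) == -EINVAL);
	assert(sketch_call_optional(&complete, 3, -1) == -1);
	assert(sketch_call_optional(&incomplete, 3, -1) == 6);
}
```

In the first style an invalid map is rejected before any link can be
created from it; in the second, the map always loads and the cost moves to
every caller.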
struct_ops implementation in-tree doesn't mean we shouldn't improve APIs
where it makes sense.
Thanks,
David
Ack. I guess up to you and Martin. Just trying to understand whether I'm
missing something or the patch does indeed fix some use-case :-)