Re: [RFC PATCH 00/30] Kernel NET policy

From: Hannes Frederic Sowa
Date: Mon Jul 18 2016 - 15:04:43 EST

On 18.07.2016 17:45, Andi Kleen wrote:
>> It seems strange to me to add such policies to the kernel.
>> Admittedly, documentation of some settings is non-existent and one needs
>> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> The problem is that different applications need different policies.

I fear that if those policies get changed in future, people will rely on
some of their side-effects, causing us to add more and more policies
which basically just differ in those side-effects.

If you compare your policies to madvise or fadvise options, the latter
seem to have much stricter and narrower effects, which are much easier
to reason about.

> The only entity which can efficiently negotiate between different
> applications' conflicting requests is the kernel. And that is pretty
> much the basic job description of a kernel: multiplex hardware
> efficiently between different users.

The multiplexing part does not seem really relevant for the per-device
settings, which can already be controlled from user space just fine.
Per-task settings could conflict with per-socket settings, which could
lead to non-deterministic behavior. The semantics of what overrides what
should probably be spelled out (here == in the cover letter). Things
like nondeterministic allocation of sockets in a threaded environment
come to mind. The allocation strategy could also depend heavily on the
installed RSS key.
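
To make the RSS-key dependency concrete: many NICs pick the receive
queue by hashing the flow tuple with a Toeplitz hash keyed by the
installed RSS key (on Linux, `ethtool -x <dev>` shows the key and the
indirection table). The following is a minimal software sketch of that
hash; the function names are hypothetical, real hardware may use other
hash functions, and the queue pick is deliberately simplified to
`hash % nqueues` instead of an indirection-table lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Return key byte i, treating bytes past the end of the key as zero. */
static uint8_t key_byte(const uint8_t *key, size_t key_len, size_t i)
{
    return i < key_len ? key[i] : 0;
}

/* Software Toeplitz hash as commonly used for RSS (sketch only, not any
 * driver's code). For every set bit of the input, MSB first, XOR the 32-bit
 * window of the key that starts at that bit position into the result.
 * A 40-byte key covers inputs up to 36 bytes (an IPv6 4-tuple). */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t len)
{
    uint32_t result = 0;
    /* window holds key bits [p, p + 32) for the current input bit p */
    uint32_t window = ((uint32_t)key_byte(key, key_len, 0) << 24) |
                      ((uint32_t)key_byte(key, key_len, 1) << 16) |
                      ((uint32_t)key_byte(key, key_len, 2) << 8) |
                      (uint32_t)key_byte(key, key_len, 3);

    for (size_t i = 0; i < len; i++) {
        /* key byte whose bits get shifted into the window for input byte i */
        uint8_t next = key_byte(key, key_len, i + 4);

        for (int b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                result ^= window;
            /* slide the window left by one bit, pulling in the next key bit */
            window = (window << 1) | ((uint32_t)(next >> b) & 1u);
        }
    }
    return result;
}

/* Hypothetical, simplified queue selection: real NICs index an indirection
 * table with the low hash bits instead of taking a modulo. nqueues > 0. */
static unsigned int rss_queue(uint32_t hash, unsigned int nqueues)
{
    return hash % nqueues;
}
```

Hashing the same tuple under two different keys will in general select
different queues, and therefore different CPUs. This is why a
per-socket allocation decision cannot be reasoned about independently of
the key.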

> So yes the user space tuning approach works for simple cases
> ("only run workloads that require the same tuning"), but is ultimately not
> very interesting nor scalable.

I wonder if this can be attacked from a different angle. What is
missing to support this in user space? The first possibility that comes
to my mind is to just multiplex those hints in the kernel: implement a
generic way to attach metadata to sockets and allow tuning daemons to
retrieve it via sockdiag. I could imagine that if the SO_INCOMING_CPU
information were visible in sockdiag, one could already do more
automatic tuning and basically implement your policy in user space.
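
For reference, SO_INCOMING_CPU can already be read today via
getsockopt(2) (Linux-specific, gettable since 3.19), but only by a
process that holds the socket. An external tuning daemon therefore needs
something like sockdiag to see it. A minimal sketch of the in-process
query follows. The helper name is hypothetical, and the fallback
constant is the asm-generic value, which differs on a few architectures.

```c
#include <sys/socket.h>

#ifndef SO_INCOMING_CPU
/* asm-generic value (x86, arm, ...); some architectures use another number */
#define SO_INCOMING_CPU 49
#endif

/* Hypothetical helper: return the CPU that last processed incoming packets
 * for this socket, or -1 if the kernel does not know (yet) or the call fails. */
static int socket_incoming_cpu(int fd)
{
    int cpu = -1;
    socklen_t len = sizeof(cpu);

    if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) < 0)
        return -1;
    return cpu;
}
```

A freshly created socket that has not received any packets typically
reports -1. Once traffic arrives, the value reflects where the kernel
processed it, which is the per-socket hint a daemon would want to
collect system-wide.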