Hello,
On Tue, Oct 12, 2021 at 02:12:18PM +0530, Pratik Sampat wrote:
> > > The control and the display interface is fairly disjoint with each
> > > other. Restrictions can be set through control interfaces like cgroups,
> >
> > A task wouldn't really opt-in to cpu isolation with CLONE_NEWCPU; it
> > would only affect resource reporting. So it would be one half of the
> > semantics of a namespace.
>
> I completely agree with you on this, fundamentally a namespace should
> isolate both the resource as well as the reporting. As you mentioned
> too, cgroups handles the resource isolation while this namespace
> handles the reporting and this seems to break the semantics of what a
> namespace should really be.
>
> The CPU resource is unique in that sense, at least in this context,
> which makes it tricky to design an interface that presents coherent
> information.

It's only unique in the context that you're trying to place CPU distribution
into the namespace framework when the resource in question isn't distributed
that way. All of the three major local resources - CPU, memory and IO - are
in the same boat. Computing resources, the physical ones, don't lend
themselves naturally to accounting and distributing by segmenting _name_
spaces which ultimately just show and hide names. This direction is a
dead-end.

> I too think that having a brand new interface altogether and teaching
> userspace about it is a much cleaner approach.
> On the same lines, if we were to do that, we could also add more useful
> metrics in that interface like ballpark number of threads to saturate
> usage as well as gather more such metrics as suggested by Tejun Heo.
>
> My only concern for this would be that if today applications aren't
> modifying their code to read the existing cgroup interface and would
> rather resort to using userspace side-channel solutions like LXCFS or
> wrapping them up in kata containers, would it now be compelling enough
> to introduce yet another interface?

While I'm sympathetic to the compatibility argument, identifying available
resources was never well-defined with the existing interfaces. Most of the
available information is what hardware is available but there's no
consistent way of knowing what the software environment is like. Is the
application the only one on the system? How much memory should be set aside
for system management, monitoring and other administrative operations?

In practice, the numbers that are available can serve as the starting points
on top of which application and environment specific knowledge has to be
applied to actually determine deployable configurations, which in turn would
go through iterative adjustments unless the workload is self-sizing.

Given such variability in requirements, I'm not sure what numbers should be
baked into the "namespaced" system metrics. Some numbers, e.g., number of
CPUs, can maybe be mapped from cpuset configuration but even that requires
quite a bit of assumptions about how cpuset is configured and the
expectations the applications would have while other numbers - e.g.
available memory - are a total non-starter.
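
As a rough illustration of the cpuset mapping mentioned above, the sketch
below counts the CPUs named in a kernel-style CPU list string, the format a
cgroup v2 cpuset.cpus.effective file uses. The function name
count_cpus_in_list and the sample strings are made up for illustration, and
only plain "N" and "N-M" entries are handled. The result carries exactly the
caveat above: it says which CPUs are allowed, not how much of them the
application can expect to get.

```python
def count_cpus_in_list(cpu_list: str) -> int:
    """Count the CPUs in a kernel CPU list string such as "0-3,8-11".

    Assumption: only plain "N" and "N-M" entries appear in the list.
    """
    cpus = set()
    for chunk in cpu_list.strip().split(","):
        if not chunk:
            continue  # an empty cpuset file yields an empty string
        if "-" in chunk:
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return len(cpus)


if __name__ == "__main__":
    # Hypothetical sample value, not read from a real system.
    print(count_cpus_in_list("0-3,8-11"))
```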

If we try to fake these numbers for containers, what's likely to happen is
that the service owners would end up tuning workload size against whatever
number the kernel is showing factoring in all the environmental factors
knowingly or just through iterations. And that's not *really* an interface
which provides compatibility. We're just piping new numbers which don't
really mean what they used to mean and whose meanings can change depending
on configuration through existing interfaces and letting users figure out
what to do with the new numbers.

To achieve compatibility where applications don't need to be changed, I
don't think there is a solution which doesn't involve going through
userspace. For other cases and long term, the right direction is providing
well-defined resource metrics that applications can make sense of and use to
size themselves dynamically.
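
A minimal sketch of what such self-sizing could look like in userspace,
assuming the contents of a cgroup v2 cpu.max file (format "$MAX $PERIOD",
with "max" meaning no limit) have already been read into a string. The
helper names, the reserve_fraction knob and the sample values are all
hypothetical. The reserve stands in for the environment specific knowledge
that no kernel-provided number can supply on its own.

```python
import math
from typing import Optional


def cpu_quota(cpu_max: str) -> Optional[float]:
    """Turn a cpu.max string into a CPU count, or None when unlimited."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def worker_count(hw_cpus: int, cpu_max: str,
                 reserve_fraction: float = 0.0) -> int:
    """Pick a worker count from the hardware count and the cgroup quota.

    reserve_fraction is an operator-supplied guess (an assumption, not a
    kernel metric) for how much to leave to everything else on the system.
    """
    limit = cpu_quota(cpu_max)
    usable = hw_cpus if limit is None else min(hw_cpus, limit)
    usable *= 1.0 - reserve_fraction
    return max(1, math.floor(usable))  # always run at least one worker


if __name__ == "__main__":
    # Hypothetical inputs: 8 hardware CPUs, a quota worth 2 CPUs.
    print(worker_count(8, "200000 100000"))
```

The starting number still has to be adjusted iteratively for a given
workload, which is the point made above about the existing numbers.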

> While I concur with Tejun Heo's comment in the mail thread that overloading
> existing interfaces of sys and proc, which were originally designed for
> system wide resources, may not be a great idea:
>
> > There is a fundamental problem with trying to represent a resource shared
> > environment controlled with cgroup using system-wide interfaces including
> > procfs
>
> A fundamental question we probably need to ascertain could be -
> Today, is it incorrect for applications to look at the sys and procfs to
> get resource information, regardless of their runtime environment?

Well, it's incomplete even without containerization. Containerization just
amplifies the shortcomings. All of these problems existed well before
cgroups / namespaces. How would you know how much resource you can consume
on a system just looking at hardware resources without implicit knowledge of
what else is on the system? It's just that we are now more likely to load
systems dynamically with containerization.

> Also, if an application were to only be able to view the resources
> based on the restrictions set regardless of the interface - would there
> be a disadvantage for them if they could only see an overloaded context
> sensitive view rather than the whole system view?

Can you elaborate further? I have a hard time understanding what's being
asked.