Re: [PATCH v1 2/2] tests/pid_namespace: add pid_max tests

From: Tycho Andersen
Date: Mon Feb 26 2024 - 10:31:36 EST


On Mon, Feb 26, 2024 at 09:57:47AM +0100, Christian Brauner wrote:
> > > > A small quibble, but I wonder about the semantics here. "You can write
> > > > whatever you want to this file, but we'll ignore it sometimes" seems
> > > > weird to me. What if someone (CRIU) wants to spawn a pid numbered 450
> > > > in this case? I suppose they read pid_max first, they'll be able to
> > > > tell it's impossible and can exit(1), but returning E2BIG from write()
> > > > might be more useful.
> > >
> > > That's a good idea. But it's a bit tricky. The straightforward thing is
> > > to walk upwards through all ancestor pid namespaces and use the lowest
> > > pid_max value as the upper bound for the current pid namespace. This
> > > will guarantee that you get an error when you try to write a value that
> > > you would't be able to create. The same logic should probably apply to
> > > ns_last_pid as well.
> > >
> > > However, that still leaves cases where the current pid namespace writes
> > > a pid_max limit that is allowed (IOW, all ancestor pid namespaces are
> > > above that limit.). But then immediately afterwards an ancestor pid
> > > namespace lowers the pid_max limit. So you can always end up in a
> > > scenario like this.
> >
> > I wonder if we can push edits down too? Or an render .effective file, like
>
> I don't think that works in the current design? The pid_max value is per
> struct pid_namespace. And while there is a 1:1 relationship between a
> child pid namespace to all of its ancestor pid namespaces there's a 1 to
> many relationship between a pid namespace and it's child pid namespaces.
> IOW, if you change pid_max in pidns_level_1 then you'd have to go
> through each of the child pid namespaces on pidns_level_2 which could be
> thousands. So you could only do this lazily. IOW, compare and possibly
> update the pid_max value of the child pid namespace everytime it's read
> or written. Maybe that .effective is the way to go; not sure right now.

I wonder then, does it make sense to implement this as a cgroup thing
instead, which is used to doing this kind of traversal?

Or I suppose not, since the idea is to get legacy software that's
writing to pid_max to work?

Tycho