Re: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

From: Eric W. Biederman
Date: Mon Oct 03 2022 - 13:08:11 EST

David Laight <David.Laight@xxxxxxxxxx> writes:

> From: Eric W. Biederman
>> Sent: 30 September 2022 17:17
>> David Laight <David.Laight@xxxxxxxxxx> writes:
>> > From: Eric W. Biederman
>> >> Sent: 29 September 2022 23:48
>> >>
>> >> Since common apparmor policies don't allow access /proc/tgid/task/tid/net
>> >> point the code at /proc/tid/net instead.
>> >>
>> >> Link:
>> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> >> ---
>> >>
>> >> I have only compile tested this. All of the boiler plate is a copy of
>> >> /proc/self and /proc/thread-self, so it should work.
>> >>
>> >> Can David or someone who cares and has access to the limited apparmor
>> >> configurations could test this to make certain this works?
>> >
>> > It works with a minor 'cut & paste' fixup.
>> > (Not nested inside a program that changes namespaces.)
>> Were there any apparmor problems? I just want to confirm that is what
>> you tested.
> I know nothing about apparmor - I just tested that /proc/net
> pointed to somewhere that looked right.

Fair enough. We should attempt to verify with an apparmor configuration
before merging this just in case there is a detail someone overlooked.
It doesn't help much if there is a fix that has to be reverted right

>> Assuming not this patch looks like it reveals a solution to this
>> issue.
>> > Although if it is reasonable for /proc/net -> /proc/tid/net
>> > why not just make /proc/thread-self -> /proc/tid
>> > Then /proc/net can just be thread-self/net
>> There are minor differences between the process directories that
>> tend to report process wide information and task directories that
>> only report some of the same information per-task. So in general
>> thread-self makes much more sense pointing to a per-task directory.
>> The hidden /proc/tid/ directories use the per process code to generate
>> themselves. The difference is that they assume the tid is the leading
>> thread instead of the other process. Those directories are all a bit of
>> a scrambled mess. I was suspecting the other day we might be able to
>> fix gdb and make them go away entirely in a decade or so.
>> So I don't think it makes sense in general to point /proc/thread-self at
>> the hidden per /proc/tid/ directories.
> Ok - I hadn't actually looked in them.
> But if you have a long-term plan to remove them directing /proc/net
> thought them might not be such a good idea.

Nah. I just want to grouse about them and encourage people not to
use them in general. They are a weird special case. They aren't
painful enough to maintain to make me want to do something else.

It would actually be less work to fix the apparmor security polices,
and the to verify over the course of a several years that the broken
security policies are no longer shipped.

>> > I have wondered if the namespace lookup could be done as a 'special'
>> > directory lookup for "net" rather that changing everything when the
>> > namespace is changed.
>> > I can imagine scenarios where a thread needs to keep changing
>> > between two namespaces, at the moment I suspect that is rather
>> > more expensive than a lookup and changing the reference counts.
>> You can always open the net directories once, and then change as
>> an open directory will not change between namespaces.
> Part of the problem is that changing the net namespace isn't
> enough, you also have to remount /sys - which isn't entirely
> trivial.

Yes. That is actually a much more maintainable model. But it is still
imperfect. I was thinking about the proc/net directories when
I made my comment. Unlike proc where we have task ids there is nothing
in /proc that can do anything.

> It might be possibly to mount a network namespace version
> of /sys on a different mountpoint - I've not tried very
> hard to do that.

It is a bug if that doesn't work.

>> > Notwithstanding the apparmor issues, /proc/net could actuall be
>> > a symlink to (say) /proc/net_namespaces/namespace_name with
>> > readlink returning the name based on the threads actual namespace.
>> There really aren't good names for namespaces at the kernel level. As
>> one of their use cases is to make process migration possible between
>> machines. So any kernel level name would need to be migrated as well.
>> So those kernel level names would need a name in another namespace,
>> or an extra namespace would have to be created for those names.
> Network namespaces do seem to have names.
> Although I gave up working out how to change to a named network
> namespace from within the kernel (especially in a non-GPL module).

Network namespaces have mount points. The mount points have names.

It is just a matter of finding the right filesystem and calling

There are a some network namespace local names for other network
namespaces. For those I don't see how it would make any sense
to change the name. If you need to you can always create a
new network namespace and ensure you get the name you want there.
Which is good enough for process migration. I don't know why else
anyone would want to change names.

> ...
>> > FWIW I'm pretty sure there a sequence involving unshare() that
>> > can get you out of a chroot - but I've not found it yet.
>> Out of a chroot is essentially just:
>> chdir("/");
>> chroot("/somedir");
>> chdir("../../../../../../../../../../../../../../../..");
> A chdir() inside a chroot anchors at the base of the chroot.
But the check is very simple.
If (working_directory == root_directory) make chdir("...") a noop.

Once the working directory is below the root directory (as
chroot("/somedir") achieves the chroot checks are no longer usable.

> fchdir() will get you out if you have an open fd to a directory
> outside the chroot.
> The 'usual' way out requires a process outside the chroot to
> just use mvdir().
> But there isn't supposed to be a way to get out.

As I recall the history chroot was a quick hack to allow building a
building against a different version of the binaries than were currently
installed. It was not built as a security feature.