Re: + proc-fix-the-threaded-proc-self.patch added to -mm tree

From: Eric W. Biederman
Date: Thu Nov 29 2007 - 16:43:03 EST


"Albert Cahalan" <acahalan@xxxxxxxxx> writes:

> On Nov 28, 2007 6:31 AM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>> Ingo Molnar <mingo@xxxxxxx> writes:
>> > * Albert Cahalan <acahalan@xxxxxxxxx> wrote:
>> >> On Nov 27, 2007 7:49 PM, Guillaume Chazarain <guichaz@xxxxxxxx> wrote:
>
>> In a lot of ways if you access /proc/self and you get back information
>> that does not correspond to yourself the result is nonsense. Which
>> is a fairly mighty problem.
>
> In general, this is not a problem we have.

True in general we do not have a problem, in specific corner cases
we do have a problem, and being a system software developer corner
cases that do not work properly disturb me.




Linux does not have processes.

Linux has tasks
Linux has task groups.

Linux tasks when used in one particular way can fulfill the posix
requirements for single threaded processes.

Linux task groups when used in one particular way can fulfill the
posix requirements for processes.

Prior to 2.6 and good threading support task and process were
used interchangeably in linux.



Claiming something linux specific should be one way or another way
because the documentation said process is not an argument but rather
an indication that the documentation was ambiguous.



When the CLONE_THREAD support was added in 2.5 we left a lot of
corner cases ill-defined and not thought through and that continues
to give us trouble. In particular peoples intuition varies on what
the right set of semantics in a different situations are.



Regardless of the original intent the way the files in /proc are today
is that both /proc/<tgid>/ and /proc/<tgid>/task/<tid>/ contain all of
the same status files with the files under /proc/<tgid>/ containing
information summed across the entire thread group. To preserve
the pre thread group understanding of /proc this needs to continue
to be the case. Otherwise the /proc/<tgid>/ directory would quickly
become uninteresting to everything except tools like ps and top.


As for where /proc/self points given that procps seems to read
files like /proc/self/stat. It looks to me like we have a clear
case of a user space application that cares about the current
behavior and would break if we changed things.


> Note that it was intended that non-legacy additions
> would normally be added to either the process directory
> or the thread directory, not both. I think somebody may
> have ripped out the ability to do this; at the very least
> there have been numerous illogical additions.

The rationale was not conveyed and the policy you describe
seems like deprecating the /proc/<tgid> directory in favor
of the /proc/<tgid>/task/<pid>/. Which was a pattern
never established and it doesn't seem to make anything better
so I don't see the point there.


>> I'm still trying to understand which will break user space more,
>> adding /proc/task or changing /proc/self.
>
> Changing /proc/self makes you get per-thread data
> when you asked for per-process data. That's bad.

/proc/self used to ask for per task data. Which is why there
is some confusion.

>> >> This one is probably best:
>> >> /proc/task -> 123/task/456
>> >> (with both numbers showing)
>> >
>> > this sounds good to me. If it's a symlink then there's not much other
>> > choice because the thread PIDs do not even show up under /proc anymore.
>>
>> The name sounds good to me.

I will see about writing the patch for this in a bit and sending
it to Andrew.

>> I am not certain the two components make sense as we have a possible
>> permission problem where it is remotely possible that a task will
>> have permission to access /proc/<tid> but not /proc/<tgid>.
>
> If it hurts, don't do that. We allow foot shooting.

Well as I replied earlier I have a solution to this so it isn't
an issue.

However we don't go around pointing guns at peoples feet. If
there isn't a reason for problems we should not do it. Now
I do admit security modules in general and selinux in particular
seem to be foot shooting by definition but that is another issue
entirely. Waving too many problems away as foot shooting is a
reckless attitude can cause problems later.

> As I predicted, the container bloat would be a never-ending
> source of bugs. You're discovering bugs where there were none.

Nope. /proc/mounts was a symlink to /proc/self/mounts long before
/proc/self was modified to stop pointing at the task directory and
changed it point at the new task group directory.

Frankly from what I have seen of the code the task-group work
seems to be a larger source of bugs, and complications, because
people have a darn hard time wrapping their head around how it
is supposed to behave, and all of the corner cases were not
resolved at the time it was developed.

My favorite ongoing issue is what is needed to allow a threaded
init to actually function properly. I think enough fixes have
gone in that it might even work.

> You'll never run out of this sort of problem. Keeping Linux lean
> and simple would be far better.

Nah. The control group stuff has all kinds of corner cases because
it is a new and untested API. The namespace work after we get the
code cleanup up so it is maintainable and we can work with it
is usually just finding our globals through a pointer instead of
from a static variable. Hardly a measurable cost on the best day.

The rule is that the user space interfaces change as little as
possible. For /proc that could mean either we capture all of
the namespaces at mount time so things stay exactly the way
the are today, or that we find a way to show things that have
now become namespace dependent. Given that monitoring works
best if you can access the status files of other namespaces
it seems to make sense to follow the lead of /proc mounts.
Although I believe we should shortly reach the point where
/proc/.../mounts will have the same inode number if it is
the same mount namespace.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/