Re: Killing clones (threads)

Peeter Joot (peeter@accessv.com)
Fri, 15 Aug 1997 02:08:18 -0400


Please forgive me for this long note, but I wanted to make some comments
on this mail thread in a non-piecemeal fashion.

Linus Torvalds wrote:

> Encoding the thread ID in the high bits was one of the ideas from the
> very
> beginning. That's what CLONE_PID is there for: the _intent_ was that
> CLONE_PID would change only the high bits, and then you could do a
> global
> kill (anything with the high bits zero would send a signal to _all_
> threads that shared the same low bits).
>
> Again, this is still an interesting approach. I'd like to see it done
> some
> day. The fact that the /proc fs makes it a bit harder is a misfeature,
> but
> that's actually due to bad /proc design (which used to make a lot of
> sense
> back when inode numbers was all we had, but we could do better these
> days).
>
> My personal favourite for /proc would be that any CLONE_PID threads
> would
> show up _inside_ the original parent (that's kind of the basic idea
> with
> CLONE_PID). So you'd have
>
> /proc/155/ "original" process (ie something that
> was
> created without the CLONE_PID bit)
> /proc/155/1 "1st CLONE_PID child"
> /proc/155/2 "2nd CLONE_PID child"
>

This was roughly what I had in mind, with :

/proc/155/ "stuff that is common to all threads"
/proc/155/x "stuff that is specific to thread x"
...

However, Linus' method would be a lot simpler to implement, and could be
an intermediate stage if it is desirable to make the thread specific
information distinct.

Another suggestion for /proc was not to do anything special, and let the
userspace programs figure out the where the proc entry for the thread is
based on the encoded pid/tid. This doesn't appeal to me, as it seems to
expose kernel information that most userspace applications shouldn't
need or care about, and makes the possible future changes to the kernel
pid/tid encoding difficult.

> CLONE_PID is that. It currently has a very limited use: the kernel
> uses it
> to allocate the SMP idle processes for each task, and those all have
> to
> have pid 0 (also high bits). But my real intent was to have something
> like
> this:
>
> if (flags & CLONE_PID) {
> newpid = current->pid;
> /* zero is special - the idle process */
> if (newpid) {
> create linked list of processes
> sharing the same low 16 bits,
> make "newpid" be the largest to
> date plus 0x10000 (ie "increment" the
> high 16 bit counter on a per-PID basis)
> }
> } else
> newpid = traditional_newpid();
>

As it stands now we don't neccessarily need to add a linked list of the
processes sharing the same low 16 bits (pid) if we use the pid hash
table and hash all the threads with the common pids to the same hash
entry. This is what I have done at this point. To iterate over the
threads we just need to get the start of the hash list for the pid, and
skip over any task_struct entries that have a different pid.

There have been some comments about what sort of encoding method for the
pid/tid's should be used or not used.

* One objection with putting a tid in the upper bits of the pid was that
some experimental code is already using the upper 8 bits. A possible
work around for this is restricting the number of threads per pid to 256
(8 bits of the pid bit space). This is not actually too severe a
restriction as NR_TASKS is currently only 512 at this point anyhow, but
it limits future expansion.
* If the tid is to be encoded in the pid bits then there was some
question of whether to put the tid in the upper or lower parts of the
pid bits. I am not too clear about what the advantages of putting the
pid in the upper bits would be. I can see that this would place all the
threads of a single "process" within a partition of the pid space, and
could possibly make the pid to /proc inode translation tidier. However
this means that all the system calls that accept or return pids would
have to shift the bits around. Are there other advantages that I am
missing?
* Also suggested is having separate pid and tid entries in task_struct.
The simplicity of this solution makes me think that it is perhaps the
best way to go. If we go with this method we eliminate pid range
restrictions that the tid/pid encoding imply, and we eliminate the need
to do a lot of pid/tid bit masking in a lot of places. However the
beauty of using the encoded tid/pid's was that a lot of system calls
(kill, wait4, *sched_* ...) would have worked with little or no changes
on both processes and threads, just by passing in an encoded tid/pid
(operate on a thread), or just a pid (operate on just all threads of a
process). This is still doable, but perhaps would take more work.

Another point of relevance at this point is whether we really want to
have system calls accept encoded tid/pids. There was some discussion as
to whether this kind of behavior would violate the posix requirements
for things like kill. Although I don't have the specs my though was
that it wouldn't, as kill is probably defined to operate on pids, and
that we would probably be free to do what we want with it if we pass it
something that is not a pid, but perhaps this is not true. Anyhow, if
we wanted to at some point, go to a bigger pid space then we probably
don't want to encode the tid in the pid bits as we would start break the
whole thread interface as soon as we started using, say, 24 or 32 bits
for the pid space.

What we may want to do is to add additional system calls that accept an
additional tid parameter in addition to existing pid parameters and have
the existing versions just call the new system calls with a tid of 0
(all threads), or 1 (main thread), as appropriate.

Andries.Brouwer@cwi.nl wrote:

> I have one comment: Several Linux applications have security holes
> caused by the fact that a pid has only 16 bits (so that one can
> trick some suid process into sending a signal to some other privileged
>
> process by spawning 32000 processes).
> Because of such and other reasons it might be a good idea to move to
> a 32-bit pid in the future, so that pid's will be unique for a
> process.
>
> This again might mean that the outside world must know as little
> as possible about the structure of a tid.

I don't see how this security hole could exist since NR_TASKS is limited
by default to 512, but I agree that userspace, with perhaps the
exception of specialized things like libpthreads, should probably know
as little about kernel structures as possible.

I started off just doing some coding changes, without thinking that
there would be too many "design" issues to deal with. But I guess that
there are some things to resolve before I continue blindly coding along.

Peeter

p.s.:

In order to make things like setpgid() simpler, it seemed to me that a
couple other things should be split out of task_struct into a common
structure for all threads (ie: when CLONE_PID is used ) and I have done
it as follows:

struct process_struct {
/* int last_tid, next_safe; */ /* maybe need this ? */
int pgrp;
int tty_old_pgrp;
int session;
/* boolean value for session group leader */
int leader;
int ngroups;
gid_t groups[NGROUPS];
unsigned short uid,euid,suid,fsuid;
unsigned short gid,egid,sgid,fsgid;
};

and have added a 'struct process_struct * ps' to task_struct.

Does anybody want to comment on this. It seems right to me but perhaps
there other task_struct fields that should or should not be in this
structure that I am not thinking of. I also noticed that not all the
task_struct sub-structures include a locking mechanism (files_struct and
fs_struct). What locking mechanism is used for these?

Peeter