Re: [patch] threading fix, tid-2.5.47-A3

From: Ulrich Drepper (drepper@redhat.com)
Date: Sun Nov 17 2002 - 22:33:39 EST


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Linus Torvalds wrote:
> On Sun, 17 Nov 2002, Ulrich Drepper wrote:

>> sys_clone(CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, 0,
>> NULL, NULL, &THREAD_SELF->tid)
>>
>>I.e., the information is stored only in the child.
>
>
> And the point of this is? The child has to re-initialize it's pthread
> structures anyway after a fork,

No, that's the whole point. The thread descriptor for the one thread in
the new process is fine with the one exception: the TID. There is not
one change to the thread descriptor in the fork code left if this change
is available.

> since the child certainly doesn't have the
> threads that the fork()ing parent had.

No, not all the threadsm only the one is available. But all the memory
is still available and deallocating it requires only two list
operations. Again, all thread descriptors are just fine.

> The thing is, I don't see the _point_. I refuse to add magic flags that
> are just some obscure implementation issue inside of glibc. A flag should
> have a _meaning_, and I seriously dislike the "meaning" behind
> CLONE_CHILD_SETTID. I dislike in particular its asynchronous behaviour,
> which is visible in the parent if CLONE_VM isn't set.

The parent isn't affected at all if CLONE_VM and CLONE_PARENT_SETTID are
not set.

> Actually, the above interface is most easily done by just havign
> CLONE_SETTID take effect _before_ we start copying the VM space. Which may
> well make sense (and avoids any extra page dirtying and COW breakage, as
> well as all asynchronous behaviour).
>
> So moving the CLONE_SETTID check to _before_ copy_mm() will give you
> exactly the semantics you want. I wouldn't have any issues with that
> approach (it's certainly synchronous, and in fact it has to be done there
> if we want to use this for non-CLONE_VM anyway).

But this is wrong. This would make it impossible to use CLONE_SETTID in
the fork() replacement and it would require the extra set_tid_address
call in the child.

The memory the clone() call used to implement fork() uses points to the
very same memory location which the currently running thread uses. It's
just in the other VM and it is therefore absolutely necessary that the
pid_t gets written into memory after the VM got cloned.

To be clear(er):

  each thread descriptor contains a field tid

  in the fork() implementation clone gets called with a parameter
pointing to the tid field of the parent's thread descriptor.

  the parent's tid field must not be modified

  the child must have the correct value from the very beginning

  after the child starts the only things it does are these:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      if (__fork_generation_pointer != NULL)
        *__fork_generation_pointer += 4;

      /* Reset the file list. These are recursive mutexes. */
      fresetlockfiles ();

      /* We execute this even if the 'fork' call failed. */
      _IO_list_resetlock ();

      /* Run the handlers registered for the child. */
      list_for_each (runp, &__fork_child_list)
        {
          struct fork_handler *curp;

          curp = list_entry (runp, struct fork_handler, list);

          curp->handler ();
        }

      /* Initialize the fork lock. */
      __fork_lock = (lll_lock_t) LLL_LOCK_INITIALIZER;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  i.e., it's bumping a magic generation counter needed to implement
pthread_once

  it re-initialized all the FILE handles which could be locked

  it executes the user-registered handlers (one of the is the cleanup
handling of the memory of all the other threads)

  finally it re-initializes the fork mutex

That's all! Without the CLONE_CHILD_SETTID flag I'd have to add a call
to set_tid_address at the very beginning of the block. This would
basically work but it means there is a time when the TID field is not
set correctly and in this time a signal can arrive or an external
program can access the process. Especially the first can only be
handled by never trusting the TID field and always making gettid() syscalls.

I know that the CLONE_CHILD_SETTID interface is a bit awkward but I
cannot see a cleaner way (misusing CLONE_VM being even worse). Because
clone() can create a new VM or not and since all four possible
combinations of writing/not writing are useful I think the two flags are
an OK solution.
                                CLONE_PARENT_SETTID
                            no yes

                      no normal fork() new thread with CLONE_VM
  CLONE_CHILD_SETTID
                      yes fork() in MT app cfork()

And again: not having the flag means making an additional syscall and
for a brief period the thread descriptor of the one thread in the new
process wouldn't be initialized.

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQE92F+T2ijCOnn/RHQRArOfAJwOco6h27qkoPxB8W6NSNLMYs4FhwCfVchk
b5+5W2/3/eitAEYpRwLc9Ok=
=Iqpl
-----END PGP SIGNATURE-----

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Nov 23 2002 - 22:00:20 EST