Re: SMP-ix86-threads-fork: Linux 2.4.x kernel problem identified [phantom read()]

From: Andreas Jaeger (aj@suse.de)
Date: Tue Sep 11 2001 - 02:31:56 EST


Wolfram Gloger <wg@malloc.de> writes:

> Hi,
>
>> So it's perfectly possible to observe the behavior you are seeing
>> if __libc_read() returns -1. Because then sz_read will acquire the
>> value -1, and the guard expresssion sz_read < sizeof(request) will yield
>> zero, terminating the loop.
>
> Argh! That indeed seems to be the case, _no_ apparent kernel problem
> is involved, sorry for the false alarm. read() 'just' returns -1 with
> errno becoming EINTR. One should never code up a loop like that in
> the last minute (I only found the effect yesterday)..
>
>> Could you recode the test patch to eliminate these suspicions and re-test?
>
> The analysis was nevertheless correct, LinuxThreads gets out of synch
> with disastrous effects. I was puzzled how read() could return -1,
> however it may make sense with a __pthread_sig_cancel pending in the
> manager thread due to a child exiting (?). But then, why wasn't this
> possibility discovered before?

Because we never compiled with -DDEBUG so that the ASSERT was
triggered?

There's another such place (line 135) with __libc_read in
__pthread_manager:

  /* Synchronize debugging of the thread manager */
  n = __libc_read(reqfd, (char *)&request, sizeof(request));
  ASSERT(n == sizeof(request) && request.req_kind == REQ_DEBUG);

We should account for a return value of -1 here also, shouldn't we?

> Below is a corrected patch for glibc-2.2.4. I've run fork-malloc with
> this for a couple of hours.

Is it still running? ;-) That would be excellent!

Thanks a lot Wolfram,
Andreas

> Thanks,
> Wolfram.
>
> 2001-09-11 Wolfram Gloger <wg@malloc.de>
>
> * manager.c (__pthread_manager): When reading from pipe. account
> for possible error return from read().
>
> --- linuxthreads/manager.c.orig Mon Jul 23 19:54:13 2001
> +++ linuxthreads/manager.c Tue Sep 11 00:47:48 2001
> @@ -150,8 +150,19 @@
> }
> /* Read and execute request */
> if (n == 1 && (ufd.revents & POLLIN)) {
> - n = __libc_read(reqfd, (char *)&request, sizeof(request));
> - ASSERT(n == sizeof(request));
> + int sz_read = 0;
> +
> + while (sz_read < sizeof(request)) {
> + n = __libc_read(reqfd, (char *)&request + sz_read,
> + sizeof(request) - sz_read);
> + if (n < 0) {
> +#ifdef DEBUG
> + char d[64];
> + sprintf(d, "*** read err %d\n", errno);
> +#endif
> + } else
> + sz_read += n;
> + }
> switch(request.req_kind) {
> case REQ_CREATE:
> request.req_thread->p_retcode =
>

-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Sep 15 2001 - 21:00:31 EST