Re: 2.2.16 crashes on ES40 (with spinlock messages...)

From: Andrew Pochinsky (avp@honti.mit.edu)
Date: Mon Jul 24 2000 - 15:40:30 EST


Hi,

Strange. I have run the code from both NFS-mounted and ext2 partitions
and never saw such a message. I'm running RH6.2 with a 2.2.16 kernel
(with knfsd). The NFS disks are hard-mounted and used to complain
about the NFS server not responding, but since the switch upgrade that
message has disappeared from the syslog. What's your configuration?

--andrew

   Hi,

   on our ES40 running 2.2.14, in an NFS-mounted directory, I get in syslog:
   /.nfs0000000021364c6b0000000f, error=-13
   NFS: can't silly-delete mqueue/.nfs0000000021358c8700000015, error=-13
   NFS: can't silly-delete tmp/.nfs0000000021364c6f0000001a, error=-13
   NFS: can't silly-delete tmp/.nfs0000000021364c710000001c, error=-13
   NFS: can't silly-delete tmp/.nfs0000000021364c730000001e, error=-13
   NFS: can't silly-delete mqueue/.nfs0000000021358c8900000028, error=-13
   NFS: can't silly-delete tmp/.nfs0000000021364c700000001b, error=-13
   NFS: can't silly-delete tmp/.nfs0000000021364c720000001d, error=-13
   NFS: can't silly-delete tmp/.nfs0000000021364c740000001f, error=-13
   NFS: can't silly-delete mqueue/.nfs0000000021358c8c0000002b, error=-13

   The test4_out.1 file says:

   The e-mail address is frey
   The hostname of this machine is es0.scs.ch
   The architecture of this machine is ALPHA
   The process id is 873
   The executable is szin_w-11-2.ALPHA
   The reader is read-szin-11-15

   Time = map_shmem: error in shmget: Invalid argument
   test4_a : Error in szin_w-11-2.ALPHA : program crashed, status=1

   Exactly the same output (including the kernel messages) appears when
   running on a local SCSI disk with an ext2 filesystem. What's
   wrong?
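
The two error codes in the output above can be decoded independently of the crash itself. error=-13 in the NFS lines is the kernel's negative errno for EACCES (permission denied), and "Invalid argument" from shmget is EINVAL, which shmget returns, among other cases, when the requested segment size exceeds the system-wide SHMMAX limit. Whether SHMMAX is the cause here is an assumption; nothing in this thread confirms it. The Python sketch below is not from the thread: the helper names `describe` and `read_shmmax` are hypothetical, and the path /proc/sys/kernel/shmmax is assumed to exist on the machine being checked.

```python
import errno
import os


def describe(err):
    """Map a kernel-style error value (e.g. -13) to (symbolic name, message)."""
    code = abs(err)
    # errno.errorcode maps numeric values to names for the running platform.
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def read_shmmax(path="/proc/sys/kernel/shmmax"):
    """Return the maximum shared memory segment size in bytes, or None.

    The default path is an assumption; None is returned if it is missing
    or does not contain an integer.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print(describe(-13))           # the NFS silly-delete error
    print(describe(errno.EINVAL))  # the shmget failure
    print("shmmax:", read_shmmax())
```

If the reported shmmax turns out to be smaller than the segment the program requests, that would be one possible explanation for the shmget failure; the size the program asks for is not stated in this thread.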

   Andrew Pochinsky wrote:
>
> Hi,
>
> Some time ago I posted a message to this list about mysterious crashes
> on Alpha ES40. Peter Rival, Pat O'Rourke and Michal Jaegermann made
> some interesting suggestions for a possible cause. Unfortunately, I
> was not able to fix the problem. This time, however, one of our users
> built code that reliably crashes the system after running for a few
> seconds. Each crash is accompanied by a 'spinlock ... stuck' message
> repeated for every processor in the system. Once all the processors
> are stuck, the system goes catatonic. To check that the problem is not
> related to some flaky hardware, I rebooted the same kernel with the
> nosmp flag. The problem is gone (of course, the machine now runs four
> times slower ;( ).
>
> --andrew
>
> P.S. The tarball of the executable can be found at
> <ftp://ftp.lns.mit.edu/pub/avp/smp-crash.tar.gz>. Simply start runme
> and wait.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/

   --
   Supercomputing Systems AG email: frey@scs.ch
   Martin Frey www: http://www.scs.ch/~frey
   Technoparkstrasse 1 phone: +41 (0)1 445 16 00
   CH-8005 Zurich fax: +41 (0)1 445 16 10




This archive was generated by hypermail 2b29 : Mon Jul 31 2000 - 21:00:17 EST