Re: Hang in wait_on_inode with SMP 2.1.87

Ard van Breemen (ard@cstmel.hobby.nl)
Sat, 21 Feb 1998 19:23:56 +0100 (MET)


On Sat, 21 Feb 1998, Bill Hawes wrote:
> Carsten Gross wrote:
> > Today, I started a 'df' and the 'df' hung. I used ps -afxl to look at
> > the process state, and 'df' hung in "wait_on_inode" with status
> > 'uninterruptible sleep'. The 'bdflush' process was in the same state; no
> > write access to the disks was possible until reboot. (A sync hung too, also
> > in "wait_on_inode".)
>
> > I cannot find a function that actually clears the I_LOCK flag. Yes, sync_one
> > does, but it checks for I_LOCK first. I think 'get_new_inode' is only
> > for new inodes? Would one of the kernel experts have a look at this,
> > please? Thanks a lot for your work.
>
> Hi Carsten,
>
> I looked over the inode code and it appears to me that a wakeup is done whenever
> the I_LOCK flag is cleared. It might be possible for the inode i_state field to
> get trashed somehow, so it would be very helpful if you could determine which
> filesystem's inode is getting stuck. Did this problem just start recently?
>
> I've attached a patch with some debugging code that may help track things down.
> It enables a magic sysrq option to display a table of the inode state on
> alt-sysrq-i, so if you could get your system to hang again and then display the
> inode table, we may get some additional clues.
Hmmm, I got these freezes regularly using ncpfs and mars_nwe on 2.0.29 and
on 2.0.30. df is not affected, but all IPX traffic to the Novell server is
halted. Mars continues to work, but even init suffers. (Load rises above
30... No new gettys are spawned... Lots of zombies...) The only thing that
helps at that point is to stop mars. All ncpfs mounts then return an
error, and that's it. Just start mars again, unmount and remount, and it
works again... But all ncpfs mounts were stalled.
The only mounts that seem pretty stable/fault-tolerant are pure nfsfs
mounts. NFS mounted through amd over a faulty Ethernet driver results in
long stalls (> 2 hours), but with pure nfsfs it seems like nothing is
wrong.... Amd even suffers from server reboots, while nfsfs
continues as if nothing had happened (as it should)...
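
For anyone following the I_LOCK discussion quoted above, here is a rough
user-space sketch of the "set a lock flag, sleep until it is cleared, wake
the sleepers on clear" pattern. It is only an analogy built on POSIX threads,
not the kernel's inode code: the toy_* names, the TOY_I_LOCK value and the
timeout parameter are all made up for illustration. The timeout is there so
that a flag which is never cleared (or an i_state that got trashed) shows up
as a failed wait instead of an endless sleep.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical flag value; the real I_LOCK bit may differ. */
#define TOY_I_LOCK 0x1u

/* Toy stand-in for an inode: a state word plus something to sleep on. */
struct toy_inode {
    unsigned int    i_state;
    pthread_mutex_t mtx;
    pthread_cond_t  waitq;
};

void toy_inode_init(struct toy_inode *in)
{
    in->i_state = 0;
    pthread_mutex_init(&in->mtx, NULL);
    pthread_cond_init(&in->waitq, NULL);
}

/* Sleep until the flag is clear, then take it. */
void toy_lock_inode(struct toy_inode *in)
{
    pthread_mutex_lock(&in->mtx);
    while (in->i_state & TOY_I_LOCK)
        pthread_cond_wait(&in->waitq, &in->mtx);
    in->i_state |= TOY_I_LOCK;
    pthread_mutex_unlock(&in->mtx);
}

/* Clear the flag and wake every sleeper: the "wakeup on clear" step. */
void toy_unlock_inode(struct toy_inode *in)
{
    pthread_mutex_lock(&in->mtx);
    in->i_state &= ~TOY_I_LOCK;
    pthread_cond_broadcast(&in->waitq);
    pthread_mutex_unlock(&in->mtx);
}

/*
 * Wait for the flag to clear, giving up after timeout_ms milliseconds.
 * Returns true if the flag is clear on return, false if it is still set.
 */
bool toy_wait_on_inode(struct toy_inode *in, long timeout_ms)
{
    struct timespec deadline;
    bool unlocked;

    /* Build an absolute deadline, as pthread_cond_timedwait expects. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&in->mtx);
    while (in->i_state & TOY_I_LOCK) {
        if (pthread_cond_timedwait(&in->waitq, &in->mtx, &deadline) == ETIMEDOUT)
            break;
    }
    unlocked = !(in->i_state & TOY_I_LOCK);
    pthread_mutex_unlock(&in->mtx);
    return unlocked;
}
```

In this toy version, setting TOY_I_LOCK directly in i_state without ever
calling toy_unlock_inode makes every later toy_wait_on_inode time out, which
is roughly what a stuck or corrupted lock flag would look like from the
waiter's side.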
--
dec1:  6:54pm  up 5 days, 23:30,  6 users,  load average: 0.12, 0.13, 0.10
