Re: load avg += 1

Andi Kleen (ak@muc.de)
12 Oct 1999 14:53:55 +0200


[cc'ed Andreas because he noticed it earlier too]

ookhoi@dds.nl (Ookhoi) writes:

> Hi Steve,
>
> > > I had something similar. Every x hour a script tried to access a mounted
> > > dir from a server that had crashed. For some reason, the scripts didn't
> > > die, and the load went up with 1 every x hour. Just the load; the system
> > > was fine.
> >
> > What filesystem?
>
> NFS? The script runs (from cron) on a Linux 2.2.7SMP. The dir used by
> this script is mounted from a Solaris 5.7 machine (which crashed etc).
> I just killed the scripts that stayed alive, and the load went back to
> normal.
>
> I would be happy to provide more info if useful.

The NFS (or rather the sunrpc) code sometimes does uninterruptible sleeps,
each of which counts as one in the loadavg. This seems to happen only when
NFS drops the synchronous operation: when r/wsize is < PAGE_SIZE (4096) or
when an error occurs (? at least the comments say this, but the error
handling looks very suspicious).
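
To make the "counts as one" part concrete, here is a rough userspace sketch
of the fixed-point load average update. It assumes that only the stuck tasks
are active and that the constants match the kernel's (FSHIFT, FIXED_1 and
EXP_1 are written from memory, so check them against include/linux/sched.h).
simulate_load() and print_load() are hypothetical helpers for illustration,
not kernel functions.

```c
#include <stdio.h>

/* Fixed-point constants as used by the kernel's load average code
 * (from memory; verify against include/linux/sched.h). */
#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1    1884UL             /* 1/exp(5sec/1min) in fixed point */

/* One 5-second update step: exponential decay towards n_active. */
static unsigned long calc_load(unsigned long load, unsigned long n_active)
{
	load *= EXP_1;
	load += n_active * FIXED_1 * (FIXED_1 - EXP_1);
	return load >> FSHIFT;
}

/* Hypothetical helper: 1-minute load after `ticks` updates with
 * `nr_stuck` tasks in uninterruptible sleep and nothing else active. */
static unsigned long simulate_load(unsigned long nr_stuck, unsigned int ticks)
{
	unsigned long load = 0;

	while (ticks--)
		load = calc_load(load, nr_stuck);
	return load;
}

/* Print a fixed-point load value as a decimal with two digits. */
static void print_load(unsigned long load)
{
	printf("%lu.%02lu\n", load >> FSHIFT,
	       ((load & (FIXED_1 - 1)) * 100) >> FSHIFT);
}
```

Calling print_load(simulate_load(1, 120)), i.e. ten minutes of updates with
one stuck task, should print a value just under 1, and each further stuck
task adds roughly one more. That matches the reported load going up by one
per hung script while the machine is otherwise idle.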

At least this part of __rpc_execute looks very wrong:

	sti();
	__wait_event(task->tk_wait, RPC_IS_RUNNING(task));
	cli();

	/*
	 * When the task received a signal, remove from
	 * any queues etc, and make runnable again.
	 */
	if (signalled())
		__rpc_wake_up(task);

__wait_event is a non-interruptible sleep; that should be
__wait_event_interruptible, at least for sync NFS, no? Otherwise checking
for signals afterwards doesn't make much sense, since a signal can never
wake the sleeper.

-Andi

-- 
This is like TV. I don't like TV.
