Longstanding bug in tty_write/swap/nfs_readpage interaction

Wolfram Gloger (Wolfram.Gloger@dent.med.uni-muenchen.de)
Fri, 13 Dec 1996 12:41:50 +0100


Hi,

Ever since the asynchronous NFS client code was installed (sometime in
the 1.3.x series), I have had modem getty processes (whose executables
lie on an NFS-mounted file system) mysteriously hang in the disk wait
state. I have now finally tracked down the problem. write_chan() (the
N_TTY line discipline's write routine, used on the serial write path)
does:

	...
	add_wait_queue(&tty->write_wait, &wait);
	while (1) {
		current->state = TASK_INTERRUPTIBLE;
		...
		if (...) {
			...
		} else {
			/* current->state is still TASK_INTERRUPTIBLE here */
			c = tty->driver.write(tty, 1, b, nr);
			...
		}
		...
		schedule();
	}
	current->state = TASK_RUNNING;
	remove_wait_queue(&tty->write_wait, &wait);
	...

But when tty->driver.write() is rs_write(), it does a memcpy_fromfs()
(copy_from_user() in 2.1; the problem probably remains the same), which
may have to swap in a page via nfs_readpage(). That only works when
current->state is TASK_RUNNING, because the asynchronous NFS code calls
schedule(): with current->state still TASK_INTERRUPTIBLE, schedule()
takes the task off the run queue, and it stays asleep unless something
explicitly wakes it up.
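To make the failure mode concrete, here is a minimal user-space C model of the scheduling rule involved. It is a sketch under assumptions, not kernel code: struct task, model_schedule() and model_page_in() are hypothetical names; the rule follows the 2.0-era schedule() (a TASK_INTERRUPTIBLE caller with a pending signal is made runnable again, any other caller not in TASK_RUNNING is removed from the run queue); timeouts are ignored; and the page-in stand-in assumes the NFS path calls schedule() expecting to be rescheduled without an explicit wake_up().

```c
#include <stdbool.h>

/* Hypothetical user-space model of the scheduler rule, not kernel code. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

struct task {
	enum task_state state;
	bool on_runqueue;
	bool signal_pending;
};

/* Model of schedule(): an interruptible task with a pending signal is
 * made runnable again; any other task not in TASK_RUNNING is taken off
 * the run queue and stays off until someone wakes it (not modelled). */
static void model_schedule(struct task *t)
{
	if (t->state == TASK_INTERRUPTIBLE && t->signal_pending)
		t->state = TASK_RUNNING;
	if (t->state != TASK_RUNNING)
		t->on_runqueue = false;
}

/* Hypothetical stand-in for the page-in path: ASSUMES it calls
 * schedule() expecting to be rescheduled without an explicit wake_up().
 * Returns true if the task is still runnable afterwards. */
static bool model_page_in(struct task *t)
{
	model_schedule(t);
	return t->on_runqueue;
}
```

Called from a TASK_RUNNING task, model_page_in() returns true; called with the state left at TASK_INTERRUPTIBLE, as in the write_chan() loop above, it returns false, which is the model's version of the hung getty.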

I've discussed this with Olaf Kirch, and he thinks that current->state
should perhaps be set only _after_ calling the tty->driver.write()
handler, so as to maintain the invariant that any code a task runs can
call schedule() without first checking current->state.
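Olaf's suggestion can be sketched the same way, again as a hypothetical user-space model rather than a patch: model_driver_write() and model_write_chan() are illustrative names, and the driver is assumed to accept at most room bytes per call. The assert in the driver stand-in encodes the invariant that the driver, which may fault and sleep, is only entered in TASK_RUNNING. One thing the model does not simulate is interrupts: in the real loop the task stays on tty->write_wait, so a wakeup arriving between driver.write() returning and the state change could be missed unless the code re-checks for room after setting TASK_INTERRUPTIBLE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model, not kernel code. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task {
	enum task_state state;
};

/* Stand-in for tty->driver.write(): it may fault in the user buffer and
 * sleep, so it checks that the caller is TASK_RUNNING. 'room' models
 * how many bytes the driver can accept per call. */
static size_t model_driver_write(struct task *t, size_t nr, size_t room)
{
	assert(t->state == TASK_RUNNING);
	return nr < room ? nr : room;
}

/* Write loop with the state change moved after the driver call.
 * Returns how many times the task would have gone to sleep. */
static int model_write_chan(struct task *t, size_t nr, size_t room)
{
	int sleeps = 0;

	assert(room > 0);	/* otherwise the model never makes progress */
	while (1) {
		/* Call the driver while still TASK_RUNNING. */
		nr -= model_driver_write(t, nr, room);
		if (nr == 0)
			break;
		/* Only now prepare to sleep on tty->write_wait. */
		t->state = TASK_INTERRUPTIBLE;
		/* schedule() would block here until a write wakeup; the model
		 * assumes that wakeup arrives and the task is RUNNING again. */
		sleeps++;
		t->state = TASK_RUNNING;
	}
	return sleeps;
}
```

For example, writing 10 bytes with room 4 goes through two sleeps (4 + 4 + 2 bytes) and leaves the task in TASK_RUNNING, without the driver stand-in ever seeing TASK_INTERRUPTIBLE.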

But if the serial code isn't the only place where something that may
sleep is called with current->state not TASK_RUNNING, perhaps the NFS
code needs to be changed.

Regards,
Wolfram.

-- 
`Surf the sea, not double-u three...'
Wolfram.Gloger@dent.med.uni-muenchen.de, Gloger@lrz.uni-muenchen.de