Re: n_tty: Check the other end of pty pair before returning EAGAIN on a read()

From: Marc Aurele La France
Date: Fri Dec 11 2015 - 08:37:20 EST


On Thu, 10 Dec 2015, Peter Hurley wrote:
On 12/10/2015 02:48 PM, Marc Aurele La France wrote:
On Thu, 10 Dec 2015, Peter Hurley wrote:
On 12/09/2015 01:06 PM, Marc Aurele La France wrote:

After sshd has been SIGCHLD'ed about the shell's termination, it
continues to read the master pty until an error occurs. This error
will be EIO if no process has the slave pty open. Otherwise (for
example when the shell spawned long-running processes in the
background before terminating), that error is expected to be EAGAIN.
sshd cannot continue to read until an EIO in all cases, because doing
so causes the session to hang until all processes have closed the
slave pty, which is not the desired behaviour. Thus a spurious EAGAIN
return causes sshd to lose data, whether or not the slave pty is
completely closed.
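As a rough illustration of the drain loop described above (not sshd's actual code), the sketch below reads a non-blocking descriptor until something other than data comes back, then reports which condition ended the loop. The names drain_fd, set_nonblocking and the drain_result labels are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome labels for this sketch; not taken from sshd. */
enum drain_result {
    DRAIN_EOF,    /* read() returned 0 */
    DRAIN_CLOSED, /* EIO: on a pty master, no process has the slave open */
    DRAIN_AGAIN,  /* EAGAIN: nothing readable right now */
    DRAIN_ERROR   /* any other error */
};

/* Put the descriptor into non-blocking mode, keeping its other flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Read from a non-blocking descriptor until read() stops returning data.
 * The number of bytes consumed is added to *total; the return value says
 * why the loop stopped.
 */
enum drain_result drain_fd(int fd, size_t *total)
{
    char buf[4096];

    assert(total != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            *total += (size_t)n;
            continue;
        }
        if (n == 0)
            return DRAIN_EOF;
        if (errno == EINTR)
            continue; /* interrupted by a signal, try again */
        if (errno == EIO)
            return DRAIN_CLOSED;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DRAIN_AGAIN;
        return DRAIN_ERROR;
    }
}
```

A caller in sshd's position would treat both DRAIN_CLOSED and DRAIN_AGAIN as "stop reading", which is why an EAGAIN returned while data is still in flight translates into lost output.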

Ah, the games userspace will be up to :)

Not really.

Definitely.

The idea that a read with O_NONBLOCK set should have synchronous behavior
is ridiculous.

The fact different OSes behave differently in this regard can
hardly be said to be userland's fault. The lower the number of distinct
behaviours userland needs to deal with, the better. Furthermore, sshd
"knows" there should be data there, so it makes no sense to befuddle it
with false EAGAIN returns.

But sshd doesn't "know". sshd "knows" the data has been sent and that's all.
sshd is extrapolating from one known condition to another unknown condition,
and assuming it "should" be that way because it has been.

For example, try the same idea with real ttys on loopback. Wouldn't work,
because it's asynchronous.

The only reason this needs fixing is because it's a userspace regression.

It's the kernel that introduced this regression, not OpenSSH.

I am not asking to read data before it has been produced. I am puzzled that despite knowing that the data exists, I can now be lied to when I try to retrieve it, when I wasn't before. We are talking about what is essentially a two-way pipe, not some network or serial connection with transmission delays userland has long experience in dealing with.

These previously internal additional delays, that are now exposed to userland, are simply an implementation detail that userland did not, and should not, need to worry about.

This is just one of those unfortunate situations where userspace has come
to rely on an unspecified behavior because it worked.

Whether the behaviour is specified or not is irrelevant. This simply means there is no standard to debunk the fact that the kernel's previous behaviour mimics that of other systems.

So, how am I supposed to avoid these spurious EAGAINs and finally be allowed to read the data I know exists? How long do I have to wait? Do I have to run a calibration loop to figure that out? Why should I need to do that only on Linux?

I don't know, but there's nonsense in here somewhere.

Marc.