Re: n_tty: Check the other end of pty pair before returning EAGAIN on a read()

From: Peter Hurley
Date: Thu Dec 10 2015 - 09:59:41 EST


On 12/09/2015 01:06 PM, Marc Aurele La France wrote:
> Greetings.
>
> The following four commits are some of the changes that have been made
> to the tty layer since kernel version 3.11:
>
> 1) f95499c3030fe1bfad57745f2db1959c5b43dca8
> n_tty: Don't wait for buffer work in read() loop
>
> 2) f8747d4a466ab2cafe56112c51b3379f9fdb7a12
> tty: Fix pty master read() after slave closes
>
> 3) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73
> pty, n_tty: Simplify input processing on final close
>
> 4) 1a48632ffed61352a7810ce089dc5a8bcd505a60
> pty: Fix input race when closing
>
> Commit "4)" corrected an issue whereby EIO could be prematurely
> returned on a read of one end of a master/slave pty pair after the
> other had been completely closed. Yet, I would argue that EAGAIN
> should not be returned either when there actually is data to be
> returned. This whether or not the other end has been completely
> closed.
>
> Indeed, the previous code (before commit "1)") checked the other end
> of the pty pair for any data before returning EAGAIN. This mimics the
> behaviour of other System-V variants (Solaris, AIX, etc.)
^^^^
What other SysV systems were tested?

> in this
> regard and ensured that EAGAIN really did mean no data was available
> at the time of the call.
>
> Portable OpenSSH, since release 4.6p1 in 2007, relies on this being
> the case and has been broken since commit "1)" introduced spurious
> EAGAIN returns (i.e. as of 3.12 kernels). The scenario at hand is
> as follows.
>
> After sshd has been SIGCHLD'ed about the shell's termination, it
> continues to read the master pty until an error occurs. This error
> will be EIO if no process has the slave pty open. Otherwise (for
> example when the shell spawned long-running processes in the
> background before terminating), that error is expected to be EAGAIN.
> sshd cannot continue to read until an EIO in all cases, because doing
> so causes the session to hang until all processes have closed the
> slave pty, which is not the desired behaviour. Thus a spurious EAGAIN
> return causes sshd to lose data, whether or not the slave pty is
> completely closed.

Ah, the games userspace will be up to :)


> I've been using the following script to reproduce the problem. It
> loops until the issue is detected.
>
> #! /bin/bash
>
> LOG=sshlog-`date "+%F.%T"`
>
> touch ${LOG}
>
> while test -z "`grep -n '^Connection' ${LOG} | grep -v '0:Connection'`"
> do
> ssh -p 22 -tt root@localhost \
> '/bin/bash -c "/bin/ping -c4 8.8.8.8"' 2>&1 | \
> tee -a ${LOG}
> done
>
> It should be noted that the problem is extremely rare, but still
> occurs, on real hardware. This bug is easier to replicate in a
> virtual machine such as those that can be created using Google Cloud.
>
> The patch below is a suggested fix. It was developed using a 4.3.0
> kernel and should apply, modulo fuzz, to any release >= 4.0.5. My
> suggested fix is modeled after commit "2)" mentionned above. Given
> commit "2)" was later reworked by commit "3)", I fully expect my fix
> to be reworked as well.
>
> I volunteer to backport the fix this ends up being to any stable
> release >= 3.12 deemed needed.
>
> Please Reply-To-All.
>
> Thanks and have a great day.
>
> Marc.
>
> Reported-by: Volth <openssh@xxxxxxxxx>
> BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52
> BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492
> Signed-off-by: Marc Aurele La France <tsi@xxxxxxxxxx>
>
> --- a/drivers/tty/n_tty.c
> +++ a/drivers/tty/n_tty.c
> @@ -2281,20 +2281,26 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
> if (!timeout)
> break;
> if (file->f_flags & O_NONBLOCK) {
> - retval = -EAGAIN;
> - break;
> - }
> - if (signal_pending(current)) {
> - retval = -ERESTARTSYS;
> - break;
> - }
> - up_read(&tty->termios_rwsem);
> + up_read(&tty->termios_rwsem);
> + flush_work(&tty->port->buf.work);
> + down_read(&tty->termios_rwsem);
> + if (!input_available_p(tty, 0)) {
> + retval = -EAGAIN;
> + break;
> + }
> + } else {
> + if (signal_pending(current)) {
> + retval = -ERESTARTSYS;
> + break;
> + }
> + up_read(&tty->termios_rwsem);

No sense in doing this just for O_NONBLOCK; might as well do it before
all the condition tests.

Which renders the earlier fixes for the slave end closing superfluous,
so might as well rip those out.

n_tty_poll() will need to be fixed as well, because if one application
used read() with O_NONBLOCK to expect to block until i/o became available,
then I guarantee some other application uses poll() with no timeout
for the same purpose.

Regards,
Peter Hurley

>
> - timeout = wait_woken(&wait, TASK_INTERRUPTIBLE,
> - timeout);
> + timeout = wait_woken(&wait, TASK_INTERRUPTIBLE,
> + timeout);
>
> - down_read(&tty->termios_rwsem);
> - continue;
> + down_read(&tty->termios_rwsem);
> + continue;
> + }
> }
>
> if (ldata->icanon && !L_EXTPROC(tty)) {
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/