Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite

From: OGAWA Hirofumi
Date: Thu Sep 03 2009 - 21:41:34 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Thu, 3 Sep 2009, OGAWA Hirofumi wrote:
>>
>> If I'm not missing, I think it doesn't have big change with old
>> code. But I would need to check more deeply.
>
> The thing is, the old pty code pushed _directly_ to the receiving ldisc,
> with no buffering.

Yes.

> I'm not entirely sure why Alan felt it needed changing,
> but moving over to the generic tty buffering code did get rid of some
> duplicate logic, and the locking is now done in one place, so that's
> probably the main reason.

IIRC, ppp had the locking issue without that patch?

> Anyway, the old pty code would be entirely synchronous, and would do the
>
> ld->ops->receive_buf(to, buf, NULL, c);
>
> to push the data all the way to the receive side frm pty_write(). So with
> the old code, the destination "receive_room" was always accurate, because
> both the reading side and the writing side basically accessed it directly.
>
> With the new code, it all goes through tty_buffer.c, and the bugs have
> been mostly about the receiving side not seeing all the data in the
> buffers. And those buffers simply didn't use to exist before.

Yes. However, pty_write() checks tty_buffer instead of receive_room. So
I thought, the change of write side is mainly buffer size (receive_room
size + tty_buffer size). It will stop after filling tty_buffer, not
receive_room.

And (I hope) the read side guarantees to consume both buffers. If it is
right, I guessed the change is timing issues with more larger buffer
size.

>> Um.., If "receive_room == 0 && tty->read_cnt == 0" is possible, I wonder
>> why reverting buffer handling fixes the problem.
>
> In the old code, if 'receive_room' was zero, then the writer would simply
> stop writing (no buffers in between). So in the old code, you could never
> get into a situation where receive_room was zero and there was still
> pending data.
>
> At least that's how I read the situation.

Another possibility in my guess is the change of pty_flush_buffer() and
pty_chars_in_buffer(). I'm not sure at all though, especially, I'm
suspecting pty_flush_buffer() may change the behaviors.

> If I'm right, I'm hoping that the patch I sent out fixes it, and if so,
> we'll do that for 2.6.31 (and then after that maybe re-think whether the
> extra buffering is worth all this pain).

I also hope it works.

> And if it _doesn't_ fix it, then I think we'll just have to revert the
> commits in question. We won't have time to root-cause it if the above
> isn't it.

At least for me, it sounds like good if revert works. I have no
preference about it.

FWIW, meanwhile, I'll just try to see the root-cause of this as
another/fallback solution.

Thanks.
--
OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/