Re: PROBLEM: Data corruption when pasting large data to terminal

From: Egmont Koblinger
Date: Mon Feb 20 2012 - 12:19:00 EST


Further investigation reveals that:

- In case of emacs, strace shows that it receives the correct data on
its standard input, so it's an emacs bug, not a kernel one. My bad.

- For the three remaining readline-based apps (bash, python,
bc), strace shows that wherever the data is correct, lines are
terminated by '\r' (which seems to be the norm in raw terminal
mode; the terminal emulator always writes this character to the pty),
whereas as soon as the data goes wrong, the received character becomes
a '\n' (which seems to be the norm in cooked terminal mode). Here's an
excerpt of 'strace bash', grepping only the reads from stdin:
read(0, "8", 1) = 1
read(0, "9", 1) = 1
read(0, ")", 1) = 1
read(0, "\r", 1) = 1 <-- everything's fine
read(0, "a", 1) = 1
read(0, "=", 1) = 1
read(0, "(", 1) = 1
read(0, "1", 1) = 1
...
read(0, "2", 1) = 1
read(0, "3", 1) = 1 <-- a line shouldn't end with '3',
read(0, "\n", 1) = 1 <-- and it's a '\n' where it's buggy
read(0, "a", 1) = 1
read(0, "=", 1) = 1
read(0, "(", 1) = 1
read(0, "1", 1) = 1
read(0, "2", 1) = 1

- This, in combination with the fact that we haven't been able to
reproduce the bug with a raw-only or cooked-only terminal, suggests
that there's somehow a race condition when writes, reads and termios
changes are all involved.
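
The '\r' versus '\n' difference seen in the strace excerpt can be
reproduced in isolation. The following is only a minimal sketch in
Python, not code from any of the programs discussed. It assumes the
translation comes from the ICRNL input flag, which cooked mode normally
has set and raw mode clears; the helper name read_cr_through_pty is
made up for illustration.

```python
import os
import termios
import tty


def read_cr_through_pty(raw: bool) -> bytes:
    """Write one CR to the pty master and return the byte the slave reads."""
    master, slave = os.openpty()
    try:
        if raw:
            # Raw mode: clears ICRNL, ICANON, ECHO and friends.
            tty.setraw(slave)
        else:
            # Cooked mode, set explicitly rather than relying on defaults.
            attrs = termios.tcgetattr(slave)
            attrs[0] |= termios.ICRNL    # iflag: map CR to NL on input
            attrs[3] |= termios.ICANON   # lflag: line-buffered input
            attrs[3] &= ~termios.ECHO    # lflag: do not echo back to master
            termios.tcsetattr(slave, termios.TCSANOW, attrs)
        # This is what the terminal emulator does for the Enter key.
        os.write(master, b"\r")
        return os.read(slave, 1)
    finally:
        os.close(master)
        os.close(slave)
```

With the slave in raw mode the reader should get b"\r" back unchanged,
and in cooked mode it should get b"\n". So a '\n' showing up in
readline's reads would be consistent with that byte having passed
through the line discipline while the terminal was in cooked mode.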

I'll keep on investigating. There's quite a lot for me to learn, e.g.
I'm wondering whether readline might be using the TCSETS* ioctls
incorrectly.

Right now readline only uses TCSETSW to change the terminal settings. It
toggles back and forth between two states (raw when expecting user
input, cooked when processing a command) and only read()s in the raw
state. Is this the correct behavior? Even if it uses the wrong ioctl,
would that explain data missing from the input stream? TCSETSF seems to
be the one that can cause data to be dropped, but according to strace,
readline doesn't use it.
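
To make the TCSETSW versus TCSETSF distinction concrete, here is a
small Python sketch. It is an illustration only and does not reproduce
what readline does. TCSADRAIN and TCSAFLUSH are the tcsetattr() names
that correspond to the TCSETSW and TCSETSF ioctls on Linux. The helper
name pending_input_after_tcsetattr is hypothetical, and the sketch
re-applies unchanged raw-mode settings so that only the flush behaviour
differs between the two calls.

```python
import os
import select
import termios
import tty


def pending_input_after_tcsetattr(when: int) -> bytes:
    """Queue input on a pty, call tcsetattr(when), return what is still readable."""
    master, slave = os.openpty()
    try:
        tty.setraw(slave)
        os.write(master, b"abc")
        # Wait until the bytes have actually reached the slave's input queue.
        select.select([slave], [], [], 1.0)
        # Re-apply the same settings; only the 'when' argument differs.
        attrs = termios.tcgetattr(slave)
        termios.tcsetattr(slave, when, attrs)
        # Check whether the queued input survived the settings change.
        readable, _, _ = select.select([slave], [], [], 0.2)
        return os.read(slave, 16) if readable else b""
    finally:
        os.close(master)
        os.close(slave)
```

Called with termios.TCSADRAIN, the queued bytes should still be
readable afterwards. Called with termios.TCSAFLUSH, the pending input
should be discarded. That matches the expectation that TCSETSW alone
should not drop input.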

I'm quite new to this area, so any hint from terminal experts on how
it should work would be appreciated.

thanks a lot,
egmont


On Sun, Feb 19, 2012 at 22:41, Egmont Koblinger <egmont@xxxxxxxxx> wrote:
> Hi Bruno,
>
> On Sun, Feb 19, 2012 at 22:14, Bruno Prémont <bonbons@xxxxxxxxxxxxxxxxx> wrote:
>> Hi Egmont,
>>
>> On Sun, 19 February 2012 Egmont Koblinger <egmont@xxxxxxxxx> wrote:
>>> Unfortunately the lost tail is a different thing: the terminal is in
>>> cooked mode by default, so the kernel intentionally keeps the data in
>>> its buffer until it sees a complete line. A quick-and-dirty way of
>>> changing to byte-based transmission (I'm too lazy to look up the actual
>>> system calls, apologies for the terribly ugly way of doing this) is:
>>>                  pty = open(ptsdname, O_RDWR);
>>>                  if (pty == -1) { ... }
>>> +                char cmd[100];
>>> +                sprintf(cmd, "stty raw <>%s", ptsdname);
>>> +                system(cmd);
>>>                  ptmx_slave_test(pty, line, rsz);
>>>
>>> Anyway, thanks very much for your test program, I'll try to modify it
>>> to trigger the data corruption bug.
>>
>> Well, not sure but the closing of ptmx on sender side should force kernel
>> to flush whatever is remaining independently on end-of-line (I was
>> thinking I should push an EOF over the ptmx instead of closing it before
>> waiting for child process though I have not yet looked-up how to do so!).
>
> As Alan also pointed out, the way to close stuff is not handled very
> nicely in the example. However, I didn't face a problem with that -
> I'm not particularly interested in whether the application receives
> all the data if I kill the underlying terminal. My problem is data
> corruption way before the end of the stream, and actually incorrect
> bytes received by the application (not just an early eof due to a
> closed terminal). I'm trying hard to reproduce that with a single
> example, but I haven't succeeded so far.
>
> Note that I've triggered the bug with 4 apps so far: emacs (which is
> always in char-based input mode), and three readline apps (which keep
> switching back and forth between the two modes). I have no clue yet
> whether the bug itself is related to raw char-based mode or not, but I
> guess switching to this mode might not hurt.
>
>
> egmont
>
>>
>> The amount of missing tail for my few runs of the test program were of
>> varying length, but in all cases way more than a single line, thus I would
>> hope it's not line-buffering by the kernel which causes the missing data!
>>
>> Bruno
>>
>>
>>> egmont
>>>
>>> On Fri, Feb 17, 2012 at 22:57, Bruno Prémont <bonbons@xxxxxxxxxxxxxxxxx> wrote:
>>> > Hi,
>>> >
>>> > On Fri, 17 February 2012 Pavel Machek <pavel@xxxxxx> wrote:
>>> >> > > Sorry, I didn't emphasize the point that makes me suspect it's a kernel issue:
>>> >> > >
>>> >> > > - strace reveals that the terminal emulator writes the correct data
>>> >> > > into /dev/ptmx, and the kernel reports no short writes(!), all the
>>> >> > > write(..., ..., 68) calls actually return 68 (the length of the
>>> >> > > example file's lines incl. newline; I'm naively assuming I can trust
>>> >> > > strace here.)
>>> >> > > - strace reveals that the receiving application (bash) doesn't receive
>>> >> > > all the data from /dev/pts/N.
>>> >> > > - so: the data gets lost after writing to /dev/ptmx, but before
>>> >> > > reading it out from /dev/pts/N.
>>> >> >
>>> >> > Which it will, if the reader doesn't read fast enough, right? Is the
>>> >> > data somewhere guaranteed to never "overrun" the buffer? If so, how do
>>> >> > we handle not just running out of memory?
>>> >>
>>> >> Start blocking the writer?
>>> >
>>> > I did quickly write a small test program (attached). It forks a reader child
>>> > and sends data over to it, at the end both write down their copy of the buffer
>>> > to a /tmp/ptmx_{in,out}.txt file for manual comparing results (in addition
>>> > to basic output of mismatch start line)
>>> >
>>> > From the time it took the writer to write larger buffers (as seen using strace)
>>> > it seems there *is* some kind of blocking, but it's not blocking long enough
>>> > or unblocking too early if the reader does not keep up.
>>> >
>>> >
>>> > For quick and dirty testing of effects of buffer sizes, tune "rsz", "wsz"
>>> > and "line" in main() as well as total size with BUFF_SZ define.
>>> >
>>> >
>>> > The effects for me are that writer writes all data but reader never sees tail
>>> > of written data (how much is being seen seems variable, probably matter of
>>> > scheduling, frequency scaling and similar racing factors).
>>> >
>>> > My test system is single-core uniprocessor centrino laptop (32bit x86) with
>>> > 3.2.5 kernel.
>>> >
>>> > Bruno