Re: PROBLEM: Data corruption when pasting large data to terminal

From: Egmont Koblinger
Date: Sun Feb 19 2012 - 16:42:29 EST


Hi Bruno,

On Sun, Feb 19, 2012 at 22:14, Bruno Prémont <bonbons@xxxxxxxxxxxxxxxxx> wrote:
> Hi Egmont,
>
> On Sun, 19 February 2012 Egmont Koblinger <egmont@xxxxxxxxx> wrote:
>> Unfortunately the lost tail is a different thing: the terminal is in
>> cooked mode by default, so the kernel intentionally keeps the data in
>> its buffer until it sees a complete line.  A quick-and-dirty way of
>> changing to byte-based transmission (I'm too lazy to look up the actual
>> system calls, apologies for the terribly ugly way of doing this) is:
>>                  pty = open(ptsdname, O_RDWR);
>>                  if (pty == -1) { ... }
>> +                char cmd[100];
>> +                sprintf(cmd, "stty raw <>%s", ptsdname);
>> +                system(cmd);
>>                  ptmx_slave_test(pty, line, rsz);
>>
>> Anyway, thanks very much for your test program, I'll try to modify it
>> to trigger the data corruption bug.
>
> Well, I'm not sure, but closing ptmx on the sender side should force the
> kernel to flush whatever remains, independently of end-of-line (I was
> thinking I should push an EOF over the ptmx instead of closing it before
> waiting for the child process, though I have not yet looked up how to do so!).

As Alan also pointed out, the way to close stuff is not handled very
nicely in the example. However, I didn't face a problem with that -
I'm not particularly interested in whether the application receives
all the data if I kill the underlying terminal. My problem is data
corruption way before the end of the stream, and actually incorrect
bytes received by the application (not just an early eof due to a
closed terminal). I'm trying hard to reproduce that with a single
example, but I haven't succeeded so far.
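
On the "push an EOF over the ptmx" idea: one common approach is to write the
terminal's EOF character (Ctrl-D by default) to the master while the slave is
in cooked mode. A minimal sketch, not taken from the test program; the name
pty_send_eof() is made up here, and it assumes Linux, where tcgetattr() on the
master reports the slave's settings:

```c
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Signal end-of-file to the reader on the slave side by writing the
 * terminal's VEOF character (Ctrl-D by default) to the pty master.
 * Assumption: tcgetattr() on the master reports the slave's settings,
 * which holds on Linux.
 * Returns 0 on success, -1 on failure (errno is set). */
int pty_send_eof(int master_fd)
{
    struct termios tio;
    unsigned char eof;

    if (tcgetattr(master_fd, &tio) == -1)
        return -1;

    /* The EOF character is only interpreted in canonical (cooked) mode. */
    if (!(tio.c_lflag & ICANON) || tio.c_cc[VEOF] == _POSIX_VDISABLE) {
        errno = ENOTSUP;
        return -1;
    }

    eof = tio.c_cc[VEOF];
    return write(master_fd, &eof, 1) == 1 ? 0 : -1;
}
```

The reader's read() returns 0 only if the EOF character arrives at the start
of a line; in the middle of a line it merely hands the partial line to the
reader, so it should be sent right after a newline (or sent twice).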

Note that I've triggered the bug with 4 apps so far: emacs (which is
always in char-based input mode), and three readline apps (which keep
switching back and forth between the two modes). I have no clue yet
whether the bug itself is related to raw char-based mode or not, but I
guess switching to this mode might not hurt.
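
For reference, the "stty raw" hack quoted above can most likely be replaced by
the termios calls directly. A minimal sketch, assuming glibc (cfmakeraw() is
not in POSIX); set_raw_mode() and open_raw_slave() are names made up for this
example and are not part of Bruno's test program:

```c
#define _DEFAULT_SOURCE   /* exposes cfmakeraw() on glibc */
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Switch the terminal behind fd to raw, byte-based mode.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int set_raw_mode(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) == -1)
        return -1;
    cfmakeraw(&tio);                 /* clears ICANON, ECHO, ISIG, OPOST, ... */
    return tcsetattr(fd, TCSANOW, &tio);
}

/* Hypothetical helper: open the slave side and make it raw before use. */
int open_raw_slave(const char *ptsdname)
{
    int pty = open(ptsdname, O_RDWR);

    if (pty == -1)
        return -1;
    if (set_raw_mode(pty) == -1) {
        close(pty);
        return -1;
    }
    return pty;
}
```

In the test program this would presumably replace the open() plus system()
lines, i.e. pty = open_raw_slave(ptsdname); followed by the existing error
check.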


egmont

>
> The amount of missing tail varied over my few runs of the test program,
> but in all cases it was way more than a single line, so I would
> hope it's not line-buffering by the kernel that causes the missing data!
>
> Bruno
>
>
>> egmont
>>
>> On Fri, Feb 17, 2012 at 22:57, Bruno Prémont <bonbons@xxxxxxxxxxxxxxxxx> wrote:
>> > Hi,
>> >
>> > On Fri, 17 February 2012 Pavel Machek <pavel@xxxxxx> wrote:
>> >> > > Sorry, I didn't emphasize the point that makes me suspect it's a kernel issue:
>> >> > >
>> >> > > - strace reveals that the terminal emulator writes the correct data
>> >> > > into /dev/ptmx, and the kernel reports no short writes(!), all the
>> >> > > write(..., ..., 68) calls actually return 68 (the length of the
>> >> > > example file's lines incl. newline; I'm naively assuming I can trust
>> >> > > strace here.)
>> >> > > - strace reveals that the receiving application (bash) doesn't receive
>> >> > > all the data from /dev/pts/N.
>> >> > > - so: the data gets lost after writing to /dev/ptmx, but before
>> >> > > reading it out from /dev/pts/N.
>> >> >
>> >> > Which it will, if the reader doesn't read fast enough, right?  Is the
>> >> > data somewhere guaranteed to never "overrun" the buffer?  If so, how do
>> >> > we handle not just running out of memory?
>> >>
>> >> Start blocking the writer?
>> >
>> > I quickly wrote a small test program (attached). It forks a reader child
>> > and sends data over to it; at the end, both write their copy of the buffer
>> > to a /tmp/ptmx_{in,out}.txt file so the results can be compared manually
>> > (in addition to the basic output of the line where the mismatch starts).
>> >
>> > From the time it took the writer to write larger buffers (as seen using strace)
>> > it seems there *is* some kind of blocking, but it's not blocking long enough
>> > or unblocking too early if the reader does not keep up.
>> >
>> >
>> > For quick and dirty testing of effects of buffer sizes, tune "rsz", "wsz"
>> > and "line" in main() as well as total size with BUFF_SZ define.
>> >
>> >
>> > The effect for me is that the writer writes all the data but the reader
>> > never sees the tail of it (how much is seen seems variable, probably a
>> > matter of scheduling, frequency scaling and similar racing factors).
>> >
>> > My test system is single-core uniprocessor centrino laptop (32bit x86) with
>> > 3.2.5 kernel.
>> >
>> > Bruno