Re: data loss when doing ls-remote and piped to command

From: Junio C Hamano
Date: Thu Sep 16 2021 - 16:42:31 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Thu, Sep 16, 2021 at 5:17 AM Rolf Eike Beer <eb@xxxxxxxxx> wrote:
>>
>> Am Donnerstag, 16. September 2021, 12:12:48 CEST schrieb Tobias Ulmer:
>> > > The redirection seems to be an important part of it. I now did:
>> > >
>> > > git ... 2>&1 | sha256sum
>> >
>> > I've tried to reproduce this since yesterday, but couldn't until now:
>> >
>> > 2>&1 made all the difference, took less than a minute.
>
> So if that redirection is what matters, and what causes problems, I
> can almost guarantee that the reason is very simple:
> ...
> Anyway. That was a long email just to tell people it's almost
> certainly user error, not the kernel.

Yes, 2>&1 will mix messages from the standard error stream into the
output at random places, which explains the changing checksum quite well.
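Here is a minimal sketch of the checksum effect. It assumes a POSIX shell and GNU coreutils' sha256sum. The emit function is a hypothetical stand-in for a command like "git ls-remote" and does not reflect how git actually writes its output.

```sh
#!/bin/sh
# Hypothetical stand-in for a command such as "git ls-remote": the data
# goes to stdout, and a progress-style message goes to stderr.
emit() {
    printf 'line 1\n'
    printf 'progress...\r' >&2
    printf 'line 2\n'
}

# stderr is discarded, so only the data bytes are hashed.
clean=$(emit 2>/dev/null | sha256sum)

# stderr is merged into the pipe, so the progress bytes are hashed too.
mixed=$(emit 2>&1 | sha256sum)

echo "stdout only:     $clean"
echo "stdout + stderr: $mixed"
```

Discarding stderr, or sending it to a file with 2>err.txt, keeps the checksummed stream limited to the real data. In a real program, stdout going into a pipe is usually block-buffered while stderr is not. As a result, the position where the stderr bytes land can change from run to run.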

I am not sure if it explains the initial report where

ls-remote 2>&1 | less

produced

> 6f38b5d6cfd43dde3058a10c68baae9cf17af912 refs/tags/v5.0-rc2
> 1c7fc5cbc33980acd13ae83d0b416db002fe95601e7f97f64b59514d936 refs/tags/v5.7-rc2^{}
> d0709bb6da2ab6d49b11643e98abdf79b1a2817f refs/tags/v5.7-rc3

What we see on the second line is the beginning of the line for the
peeled v5.0-rc2^{} up to "acd13" (that is, the first 19 bytes of that
line), followed by the full line for the peeled v5.7-rc2^{} (which
begins with "ae83d"). The 12407 bytes in between are missing, which is
all the more puzzling because it is not a nice round number.
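The splice can be checked by hand. An object name printed by ls-remote is 40 hexadecimal digits, and the garbled one is longer than that. The sketch below assumes a POSIX shell and cut(1). It only examines the characters quoted in the report, so it cannot tell us what the missing bytes were.

```sh
#!/bin/sh
# The garbled object name from the second line of the report.
garbled=1c7fc5cbc33980acd13ae83d0b416db002fe95601e7f97f64b59514d936

# A full object name is 40 hex digits; this one is longer.
echo "length: ${#garbled}"                    # length: 59

# The first 19 bytes: the start of the peeled v5.0-rc2^{} line.
printf '%s\n' "$garbled" | cut -c1-19         # 1c7fc5cbc33980acd13

# The remaining 40 bytes: a complete object name for peeled v5.7-rc2^{}.
printf '%s\n' "$garbled" | cut -c20-          # ae83d0b416db002fe95601e7f97f64b59514d936
```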

I can sort of guess that the progress display during the transfer is
contributing. It goes to the standard error stream and uses terminal
control sequences like "go back to the beginning of the line without
feeding a new line", "erase to the end of the line", etc. But because
the output is piped to "less", which makes those sequences "visible"
(i.e. you do not get the raw escape but see the three capital letters
ESC in reverse video), that does not quite explain how the display got
broken.
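The following is a minimal sketch of a progress meter that uses these sequences. It assumes a POSIX shell and a cat that supports -v (GNU and BSD cat both do). The progress function and its exact format are hypothetical and are not git's actual progress code. Also, cat -v renders the control bytes as ^M and ^[, which differs from how less displays them.

```sh
#!/bin/sh
# Hypothetical progress meter: each update starts with a carriage return
# (back to the beginning of the line) and ESC [ K (erase to the end of
# the line), then prints the new text without a newline.
progress() {
    for pct in 25 50 100; do
        printf '\r\033[KReceiving objects: %3d%%' "$pct" >&2
    done
    printf '\n' >&2
}

# On a terminal, each update overwrites the previous one in place.
progress

# Through a pipe, the control bytes are just data. cat -v makes them
# visible, showing CR as ^M and ESC as ^[.
progress 2>&1 | cat -v
```

On a terminal, the first call leaves only the last update visible. The piped call shows every update along with every control byte. This is the kind of output that ends up in the data stream once stderr is merged with 2>&1.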

In any case, I do not think the kernel is involved, or more
generally I do not think any "loss of output bytes" is happening
here. Why "| less" failed to show a range about 12k bytes long is
just a mystery to me ;-).