Re: INFO: task hung in pipe_write (2)

From: Darrick J. Wong
Date: Mon Oct 14 2019 - 18:16:48 EST


On Mon, Oct 14, 2019 at 10:40:44PM +0200, Andreas Gruenbacher wrote:
> Hi Darrick,
>
> On Thu, Sep 19, 2019 at 11:10 PM Darrick J. Wong
> <darrick.wong@xxxxxxxxxx> wrote:
> > On Thu, Sep 19, 2019 at 10:55:44PM +0200, Rasmus Villemoes wrote:
> > > On 19/09/2019 19.19, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit: 288b9117 Add linux-next specific files for 20190918
> > > > git tree: linux-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17e86645600000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3
> > > > dashboard link:
> > > > https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d
> > > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11855769600000
> > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=143580a1600000
> > > >
> > > > The bug was bisected to:
> > > >
> > > > commit cfb864757d8690631aadf1c4b80022c18ae865b3
> > > > Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > Date: Tue Sep 17 16:05:22 2019 +0000
> > > >
> > > > splice: only read in as much information as there is pipe buffer space
> > >
> > > The middle hunk (the one before splice_pipe_to_pipe()) accesses
> > > opipe->{buffers, nrbufs}, but opipe is not locked at that point. So
> > > maybe we end up passing len==0, which (once there's room in opipe)
> > > would put a zero-length pipe_buffer in opipe, and that probably
> > > violates an invariant somewhere.
> > >
> > > But does the splice_pipe_to_pipe() case even need that extra logic?
> > > Doesn't it handle short writes correctly already?
> >
> > Yep. I missed the part where splice_pipe_to_pipe is already perfectly
> > capable of detecting insufficient space in opipe and kicking opipe's
> > readers to clear out the buffer. So that hunk isn't needed, and now I'm
> > wondering how in the other clause we return 0 from wait_for_space yet
> > still don't have buffer space...
> >
> > Oh well, back to the drawing board. Good catch, though now it's become
> > painfully clear that xfstests lacks rigorous testing of splice()...
>
> have you had any luck figuring out how to fix this? We're still
> suffering from the regression I reported a while ago (*).

Nope, that's slipped along with everything else because I'm burning out
on all the buggy sh*t that has gone into the kernel for 5.4, which has
made it difficult to get regression tests to run reliably enough to find
out if there's anything wrong with XFS.

Oh, sure, if I turn off kmemleak and whatever the hell "haltpoll" is,
then it tidies up enough to run fstests, but now "sleep 0.5" runs in
anywhere between a jiffy and 10s. WTH.

> If not, I wonder if reverting commit 8f67b5adc030 would make sense for now.

Ugh, no, splice shouldn't be asking the filesystem for a 75k buffered
read and then *oopsie* running out of pages after ~64k or so. Even more
frighteningly, the syzbot reproducer asks for a 20GB read into a pipe,
which gets sent right into the fs without any size clamping.

OK, I'll at least cough up a v3 patch, which maybe will work.

--D

>
> * https://lore.kernel.org/linux-fsdevel/CAHpGcM+WQYFHOOC8SzKq+=DuHVZ4fw4RHLTMUDN-o6GX3YtGvQ@xxxxxxxxxxxxxx/T/#u
>
> Thanks,
> Andreas