Re: INFO: rcu detected stall in memcpy

From: Takashi Iwai
Date: Thu Jan 04 2018 - 12:03:12 EST


On Thu, 04 Jan 2018 15:17:23 +0100,
Takashi Iwai wrote:
>
> On Thu, 04 Jan 2018 15:01:06 +0100,
> Dmitry Vyukov wrote:
> >
> > On Thu, Jan 4, 2018 at 1:57 PM, Takashi Iwai <tiwai@xxxxxxx> wrote:
> > > On Thu, 04 Jan 2018 13:08:45 +0100,
> > > Dmitry Vyukov wrote:
> > >>
> > >> On Thu, Jan 4, 2018 at 1:03 PM, syzbot
> > >> <syzbot+387f48da65cb522abfe8@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >> > Hello,
> > >> >
> > >> > syzkaller hit the following crash on
> > >> > 30a7acd573899fd8b8ac39236eff6468b195ac7d
> > >> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> > >> > compiler: gcc (GCC) 7.1.1 20170620
> > >> > .config is attached
> > >> > Raw console output is attached.
> > >> > Unfortunately, I don't have any reproducer for this bug yet.
> > >> >
> > >> >
> > >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > >> > Reported-by: syzbot+387f48da65cb522abfe8@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >> > It will help syzbot understand when the bug is fixed. See footer for
> > >> > details.
> > >> > If you forward the report, please keep this part and the footer.
> > >>
> > >> This looks ALSA-related. +ALSA maintainers.
> > >
> > > Not sure exactly what triggers it. It's the simple memcpy(), and I
> > > don't know where RCU is involved in that code path.
> > >
> > > BTW, other two suspicious RCU usage reports are actually stopped at
> > > the second WARN_ON() after the RCU message, and the second WARN_ON()
> > > is independent from RCU; it's the known spurious WARN_ON() and was
> > > already removed in the sound git tree.
> >
> >
> > Hi Takashi,
> >
> > Another similar one just popped up:
> >
> > https://groups.google.com/forum/#!topic/syzkaller-bugs/X3d6-PIrJM0
> >
> > This looks like mulaw_decode enters an infinite loop, or at least
> > does a very large amount of computation without a resched, e.g.
> > (uint64_t)-1 iterations of something along these lines.
>
> OK, that makes sense.
>
> My rough guess is that it's an aloop device misconfigured by
> concurrent setup. The aloop device allows restricting the parameters
> of the other side of the connection, and something bad may happen
> there if both sides are updated concurrently.
>
> We've seen a segfault in memset() at loopback_prepare() in
> sound/drivers/aloop.c, reported as syzbot+3902b5220e8ca27889ca, too,
> which also points to wrongly set-up parameters that overflow the
> allocated buffer.

The two patches below may plug the holes, but I'm not entirely
sure whether that's the exact culprit. Could you feed them to syzbot
to see whether they have any effect?

In any case, they are obvious bugs that need fixing, so I'm going to
queue them to my tree.


thanks,

Takashi

Attachment: 0001-ALSA-pcm-Add-missing-error-checks-in-OSS-emulation-p.patch
Description: Binary data