Re: INFO: task hung in wdm_flush

From: Dmitry Vyukov
Date: Mon Feb 10 2020 - 05:09:44 EST


On Mon, Feb 10, 2020 at 11:06 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > > Oliver Neukum <oneukum@xxxxxxx> writes:
> > > > On Tuesday, 19.11.2019 at 10:14 +0100, Bjørn Mork wrote:
> > > >
> > > >> Anyway, I believe this is not a bug.
> > > >>
> > > >> wdm_flush will wait forever for the IN_USE flag to be cleared or the
> > > >
> > > > Damn. Too obvious. So you think we simply have pending output that does
> > > > just not complete?
> > >
> > > I do miss a lot of stuff so I might be wrong, but I can't see any other
> > > way this can happen. The out_callback will unconditionally clear the
> > > IN_USE flag and wake up the wait_queue.
> > >
> > > >> DISCONNECTING flag to be set. The only way you can avoid this is by
> > > >> creating a device that works normally up to a point and then completely
> > > >> ignores all messages,
> > > >
> > > > Devices may crash. I don't think we can ignore that case.
> > >
> > > Sure, but I've never seen that happen without the device falling off the
> > > bus. Which is a disconnect.
> > >
> > > But I am all for handling this *if* someone reproduces it with a real
> > > device. I just don't think it's worth the effort if it's only a
> > > theoretical problem.
> > >
> > > >> but without resetting or disconnecting. It is
> > > >> obviously possible to create such a device. But I think the current
> > > >> error handling is more than sufficient, unless you show me some way to
> > > >> abuse this or reproduce the issue with a real device.
> > > >
> > > > Malicious devices are real. Potentially at least.
> > > > But you are right, we need not bend over to handle them well, but we
> > > > ought to be able to handle them.
> > >
> > > Sure, we need to handle malicious devices. But only if they can be used
> > > for real harm.
> > >
> > > This warning requires physical access and is only slightly annoying.
> > > Like a USB device making loud farting sounds. You'd just disconnect the
> > > device. No need for Linux to detect the sound and handle it
> > > automatically, I think.
> >
> > Hi Bjørn,
> >
> > Besides the production use you are referring to, there are 2 cases we
> > should take into account as well:
> > 1. Testing.
> > Any kernel testing system needs a binary criterion for detecting kernel
> > bugs. It seems right to detect unkillable hung tasks as kernel bugs.
> > Which means that we need to resolve this in some way regardless of the
> > production scenario.
> > 2. Reliable killing of processes.
> > It's a very important property that an admin or script can reliably
> > kill whatever process/container they need to kill for whatever reason.
> > This case results in an unkillable process, which means scripts will
> > fail, automated systems will misbehave, admins will waste time (if
> > they are qualified to resolve this at all).
>
> On Mon, Feb 10, 2020 at 11:00 AM Tetsuo Handa
> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hello.
> >
> > Will you check whether patch testing is working? I tried
> >
> > #syz test: https://github.com/google/kasan.git usb-fuzzer
> >
> > but the reproducer did not trigger a crash either "with a patch"
> > or "without a patch", even though the dashboard is still adding crashes.
> > I suspect something is wrong. Is it possible that reproducer is
> > trying to test a bug which was already fixed but a different new
> > bug is still reported as the same bug?

Hi Tetsuo,

The simplest and fastest thing you can try is to request testing on
another, simpler bug. I have not seen any other signals suggesting that
patch testing in general is broken.

You may also try testing on the exact commit on which the bug was
reported: usb-fuzzer is a tracking branch, so things may change there.

If the old bug was fixed but syzbot is not aware of it, then new bugs
piling into the same bucket is exactly what will happen. So that's
definitely possible.