RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

From: Liang, Kan
Date: Mon Jul 17 2017 - 10:47:02 EST




> On Mon, 17 Jul 2017, Liang, Kan wrote:
> > > That doesn't make sense. What's the exact test procedure?
> >
> > I don't know the exact test procedure. The test case is from our customer.
> > I only know that the test case makes calls into the x11 libs.
>
> Sigh. This starts to be silly. You test something and have no idea what it does?

As I said, the test case is from our customer. They only share binaries with us.
Actually, it is more accurate to call it a test suite. It includes dozens of small tests.
I just reproduced the issue and verified all three patches in our lab,
then reported the results here immediately, as requested.
So I know little about the test case for now.
I will share more when I learn more.
Sorry about that.

>
> > > > According to our test, only patch 3 works well.
> > > > The other two patches will hang the system eventually.
>
> Hang the system eventually? Does that mean that the system stops working
> and the watchdog does not catch the problem?


Right, the system stops working and the watchdog does not catch the problem.

>
> > > > BTW: We set 1 to watchdog_thresh when we did the test.
> > > > It's believed that can speed up the failure.
> > >
> > > Believe is not really a technical measure....
> > >
> >
> > 1 is a valid value for watchdog_thresh.
> > It was set through the standard proc interface.
> > /proc/sys/kernel/watchdog_thresh
> > It should not impacts the final test result.
>
> I know that 1 is a valid value and I know how that can be set. Still, it does not
> help if you believe that setting the threshold to 1 can speed up the failure.
> Either you know it for sure or not. You can believe in god or whatever, but
> here we talk about facts.

I personally have not compared watchdog_thresh=1 against the default of 10 for
this test case.
Before we had the test case from the customer, we developed another micro test
that reproduces a similar issue.
For that micro test, setting watchdog_thresh to 1 speeds up the failure.
(BTW: all three patches fix the issue reproduced by that micro test.)

If you think it's meaningful to verify 10 as well, I can do the comparison.

Thanks,
Kan