RE: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

From: Oza (Pawandeep) Oza
Date: Fri May 08 2015 - 01:21:37 EST


It seems odd to me to use BUG() for what you appear to be using it for..
not that I know exactly what that is, mind you, but when you said when
some other gizmo in your box has a problem you crash the kernel, my head
tilted to the side - surely there's a more controlled response possible
than poking the big red self destruct button ;-)

Oza:
We have to keep the red button as our last resort; if we don't press it, we lose time or miss the point at which we can still go back and debug.
So that is something by design.

Regards,
-Oza


-----Original Message-----
From: Mike Galbraith [mailto:umgwanakikbuti@xxxxxxxxx]
Sent: Friday, May 08, 2015 10:42 AM
To: Oza (Pawandeep) Oza
Cc: pawandeep oza; linux-kernel@xxxxxxxxxxxxxxx; malayasen rout
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17

On Fri, 2015-05-08 at 04:16 +0000, Oza (Pawandeep) Oza wrote:
> So Mike, is this reason strong enough for you ?

Nope. I think you did the right thing in removing your dependency on
jiffies reliability in a dying box. You don't have to convince me of
anything though, CC timer subsystem maintainer, see what he says.

> I understand your point: solve the BUG, and I do tend to agree with you.
>
> But by design and implementation, the BUG() is just a beginning of the end for dying kernel.
> And what happens in between this 'the beginning' and 'the end' is not less important.
> (because say, on our platform we want to get clean RAMDUMP to analyze what happened, and for that we want to get clean reboot)

I don't see anybody else having any trouble getting crash dumps. I
spent yet another long day just yesterday, rummaging through one.

> Also,
> If somebody's design is to legally Crash the kernel (e.g. where kernel is actually not faulty).
> Then I do expect the tick/timekeeping framework to do its job for as long as it can, and it should, because the kernel is not faulty.
> But in this case it doesn't hand over the jiffies-incrementing job sanely.

It seems odd to me to use BUG() for what you appear to be using it for..
not that I know exactly what that is, mind you, but when you said when
some other gizmo in your box has a problem you crash the kernel, my head
tilted to the side - surely there's a more controlled response possible
than poking the big red self destruct button ;-)

-Mike