Re: [BISECTED]: Kernel panic (was: Linux 5.7-rc2)

From: Peter Zijlstra
Date: Tue Apr 21 2020 - 17:24:01 EST


On Tue, Apr 21, 2020 at 12:03:10PM -0700, Linus Torvalds wrote:
> On Mon, Apr 20, 2020 at 1:52 AM Harald Arnesen <harald@xxxxxxxxxxx> wrote:
> >
> > Neither rc1 nor rc2 will boot on my laptop. The attached picture is all
> > I have been able to capture.
>
> I know you saw the reply about this probably being fixed by
>
> https://lore.kernel.org/lkml/20200416054745.740-1-ggherdovich@xxxxxxx/
>
> but it would be lovely if you could actually verify that that series
> of four patches does indeed fix it for you.

(not seeing the original report in the archives or my list copy)

I'm assuming it's some sort of dodgy virt setup; actual real proper
hardware should never get here like that.

> Your oops is on that divide instruction:
>
> freq_scale = div64_u64(acnt, mcnt);
>
> and while we had a check for mcnt not being zero earlier, we did
>
> mcnt *= arch_max_freq_ratio;
>
> after that check. I could see it becoming zero either due to an
> overflow, or due to arch_max_freq_ratio being 0.

Right, so that's not supposed to happen. As you say, we should not
enable this code if the ratio is 0, and mcnt should not overflow since
we read that reg once per tick.

But yeah, virt, anything can happen :/

> I think the first commit in that series is supposed to fix that
> arch_max_freq_ratio being 0 case, but it still feels like the code
> that does the divide is checking for zero in the wrong place...

Yeah, we can certainly modify that. As is, real actual hardware should
never even hit that case either. So we might as well move that check and
then also make it disable all this frequency scaling stuff if we ever do
hit it.
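
A rough userspace sketch of that idea follows; it is not an actual kernel
patch. The names freq_invariance_enabled and compute_freq_scale() are made
up for illustration, plain 64-bit division stands in for div64_u64(), and
the ratio argument plays the role of arch_max_freq_ratio. The explicit
overflow check is an added assumption about how the multiplication might
be guarded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "frequency scaling is enabled" state. */
static bool freq_invariance_enabled = true;

/*
 * Hypothetical helper: scale mcnt by ratio, then divide acnt by the result.
 * Returns true and stores the quotient in *scale on success. If the divisor
 * would overflow or end up zero, it disables the feature and returns false.
 */
static bool compute_freq_scale(uint64_t acnt, uint64_t mcnt,
                               uint64_t ratio, uint64_t *scale)
{
    if (!freq_invariance_enabled)
        return false;

    /* Assumed guard: mcnt * ratio must fit in 64 bits. */
    if (ratio != 0 && mcnt > UINT64_MAX / ratio)
        goto disable;

    mcnt *= ratio;

    /* Check the value actually used as the divisor, after scaling. */
    if (mcnt == 0)
        goto disable;

    *scale = acnt / mcnt;
    return true;

disable:
    /* Never expected on real hardware, so stop trying from here on. */
    freq_invariance_enabled = false;
    return false;
}
```

With the check placed after the multiplication, a zero ratio and a zero
mcnt are both caught by the same test, and once the bad case is seen every
later call returns early instead of dividing.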