Re: [next] arm: Internal error: Oops: 5 PC is at __read_once_word_nocheck

From: Naresh Kamboju
Date: Wed Mar 09 2022 - 11:51:31 EST


Hi Russell,

On Wed, 9 Mar 2022 at 20:37, Russell King (Oracle)
<linux@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> wrote:
> > >
<trim>
> Well, we unwound until:
>
> __irq_svc from migrate_disable+0x0/0x70
>
> and then crashed - and the key thing there is that we're at the start
> of migrate_disable() when we took an interrupt.
>
> For some reason, this triggers an access to address 0x10, which faults.
> We then try unwinding again, and successfully unwind all the way back
> to the same point (the line above) which then causes the unwinder to
> again access address 0x10, and the cycle repeats with the stack
> growing bigger and bigger.
>
> I'd suggest also testing without the revert but with my patch.

I have tested your patch on top of linux next-20220309 and still see a kernel
crash, as shown below [1]. Build link [2].

[ 26.812060] 8<--- cut here ---
[ 26.813459] Unhandled fault: page domain fault (0x01b) at 0xb6a3ab70
[ 26.816139] [b6a3ab70] *pgd=fb28a835
[ 26.817770] Internal error: : 1b [#1] SMP ARM
[ 26.819636] Modules linked in:
[ 26.820956] CPU: 0 PID: 211 Comm: haveged Not tainted 5.17.0-rc7-next-20220309 #1
[ 26.824519] Hardware name: Generic DT based system
[ 26.827148] PC is at __read_once_word_nocheck+0x0/0x8
[ 26.829856] LR is at unwind_frame+0x7dc/0xab4

- Naresh

[1] https://lkft.validation.linaro.org/scheduler/job/4688599#L596
[2] https://builds.tuxbuild.com/269gYLGuAdmltuLhIUDAjS2fg1Q/