Re: [next] arm: Internal error: Oops: 5 PC is at __read_once_word_nocheck

From: Ard Biesheuvel
Date: Wed Mar 09 2022 - 10:10:40 EST


On Wed, 9 Mar 2022 at 16:07, Russell King (Oracle)
<linux@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> wrote:
> > >
> > > On Wed, 9 Mar 2022 at 19:37, Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, 9 Mar 2022 at 16:16, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, 9 Mar 2022 at 11:37, Russell King (Oracle)
> > > > > <linux@xxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Mar 09, 2022 at 03:18:12PM +0530, Naresh Kamboju wrote:
> > > > > > > > While booting linux next-20220308 on BeagleBoard-X15 and qemu arm, the following
> > > > > > > > kernel crash was reported with a CONFIG_KASAN enabled build [1] & [2].
> > > > > >
> > > > > > The unwinder is currently broken in linux-next. Please try reverting
> > > > > > 532319b9c418 ("ARM: unwind: disregard unwind info before stack frame is
> > > > > > set up")
> > >
> > > I reverted the suggested commit and rebuilt, but the boot still failed with the
> > > reported kernel crash [1].
> > >
> > > - Naresh
> > >
> >
> > Thanks Naresh,
> >
> > This looks like it might be related to the issue Russell just sent a fix for:
> > https://lore.kernel.org/linux-arm-kernel/CAMj1kXEqp2UmsyUe1eWErtpMk3dGEFZyyno3nqydC_ML0bwTLw@xxxxxxxxxxxxxx/T/#t
> >
> > Could you please try that?
>
> Well, we unwound until:
>
> __irq_svc from migrate_disable+0x0/0x70
>
> and then crashed - and the key thing there is that we're at the start
> of migrate_disable() when we took an interrupt.
>
> For some reason, this triggers an access to address 0x10, which faults.
> We then try unwinding again, and successfully unwind all the way back
> to the same point (the line above) which then causes the unwinder to
> again access address 0x10, and the cycle repeats with the stack
> growing bigger and bigger.
>
> I'd suggest also testing without the revert but with my patch.
>

Indeed.

And as I suggested the other day, maybe it wouldn't be so bad to
harden the vsp dereference, like below:

--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -27,6 +27,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/uaccess.h>
 #include <linux/list.h>
 
 #include <asm/sections.h>
@@ -236,10 +237,11 @@ static int unwind_pop_register(struct unwind_ctrl_block *ctrl,
 		if (*vsp >= (unsigned long *)ctrl->sp_high)
 			return -URC_FAILURE;
 
-	/* Use READ_ONCE_NOCHECK here to avoid this memory access
-	 * from being tracked by KASAN.
+	/* Use get_kernel_nofault() here to avoid this memory access
+	 * from causing a fatal fault, and from being tracked by KASAN.
 	 */
-	ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
+	if (get_kernel_nofault(ctrl->vrs[reg], *vsp))
+		return -URC_FAILURE;
 	if (reg == 14)
 		ctrl->lr_addr = *vsp;
 	(*vsp)++;
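
For anyone who wants to play with the idea outside the kernel, here is a rough userspace sketch of the same pattern. It is only an analogy, not the kernel code: read_word_nofault() is a hypothetical helper that stands in for get_kernel_nofault() by letting the kernel copy the word through a pipe, so a bad address yields EFAULT from write() rather than a SIGSEGV. The struct is a cut-down, made-up version of unwind_ctrl_block, and the URC_* values are assumed stand-ins.

```c
#include <unistd.h>

/* Stand-ins for the kernel's unwind return codes (assumed values). */
#define URC_OK      0
#define URC_FAILURE 9

/* Cut-down, hypothetical version of struct unwind_ctrl_block. */
struct unwind_ctrl_block {
	unsigned long vrs[16];        /* virtual register set */
	const unsigned long *lr_addr; /* where LR was loaded from */
};

/*
 * Userspace analogue of get_kernel_nofault(): have the kernel copy one
 * word from src through a pipe. If src is not readable, write() fails
 * with EFAULT and we return an error instead of taking a fault.
 */
static int read_word_nofault(unsigned long *dst, const unsigned long *src)
{
	int fd[2];
	int ret = -1;

	if (pipe(fd) != 0)
		return -1;
	if (write(fd[1], src, sizeof(*src)) == (ssize_t)sizeof(*src) &&
	    read(fd[0], dst, sizeof(*dst)) == (ssize_t)sizeof(*dst))
		ret = 0;
	close(fd[0]);
	close(fd[1]);
	return ret;
}

/* Same shape as the patched function: fail the unwind on a bad vsp. */
static int unwind_pop_register(struct unwind_ctrl_block *ctrl,
			       const unsigned long **vsp, unsigned int reg)
{
	if (read_word_nofault(&ctrl->vrs[reg], *vsp))
		return -URC_FAILURE;
	if (reg == 14)
		ctrl->lr_addr = *vsp;
	(*vsp)++;
	return URC_OK;
}
```

With a valid pointer the pop succeeds and vsp advances by one word. With a bogus pointer such as 0x10, the function returns -URC_FAILURE and leaves vsp alone, so the caller can stop unwinding instead of faulting and re-entering the unwinder.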