Re: [syzbot] upstream test error: KASAN: invalid-access Read in __entry_tramp_text_end
From: Dmitry Vyukov
Date: Tue Sep 28 2021 - 06:19:49 EST
On Mon, 27 Sept 2021 at 19:18, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> On Mon, Sep 27, 2021 at 06:01:22PM +0100, Mark Rutland wrote:
> > On Mon, Sep 27, 2021 at 04:27:30PM +0200, Dmitry Vyukov wrote:
> > > On Tue, 21 Sept 2021 at 18:51, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > >
> > > > Hi Dmitry,
> > > >
> > > > The good news is that the bad unwind is a known issue; the bad news is
> > > > that we don't currently have a way to fix it (and I'm planning to talk
> > > > about this at the LPC "objtool on arm64" talk this Friday).
> > > >
> > > > More info below: the gist is we can produce spurious entries at an
> > > > exception boundary, but shouldn't miss a legitimate value, and there's a
> > > > plan to make it easier to spot when entries are not legitimate.
> > > >
> > > > On Fri, Sep 17, 2021 at 05:03:48PM +0200, Dmitry Vyukov wrote:
> > > > > > Call trace:
> > > > > > dump_backtrace+0x0/0x1ac arch/arm64/kernel/stacktrace.c:76
> > > > > > show_stack+0x18/0x24 arch/arm64/kernel/stacktrace.c:215
> > > > > > __dump_stack lib/dump_stack.c:88 [inline]
> > > > > > dump_stack_lvl+0x68/0x84 lib/dump_stack.c:105
> > > > > > print_address_description+0x7c/0x2b4 mm/kasan/report.c:256
> > > > > > __kasan_report mm/kasan/report.c:442 [inline]
> > > > > > kasan_report+0x134/0x380 mm/kasan/report.c:459
> > > > > > __do_kernel_fault+0x128/0x1bc arch/arm64/mm/fault.c:317
> > > > > > do_bad_area arch/arm64/mm/fault.c:466 [inline]
> > > > > > do_tag_check_fault+0x74/0x90 arch/arm64/mm/fault.c:737
> > > > > > do_mem_abort+0x44/0xb4 arch/arm64/mm/fault.c:813
> > > > > > el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:357
> > > > > > el1h_64_sync_handler+0xb0/0xd0 arch/arm64/kernel/entry-common.c:408
> > > > > > el1h_64_sync+0x78/0x7c arch/arm64/kernel/entry.S:567
> > > > > > __entry_tramp_text_end+0xdfc/0x3000
> > > > >
> > > > > /\/\/\/\/\/\/\
> > > > >
> > > > > This is broken unwind on arm64. d_lookup statically calls __d_lookup,
> > > > > not __entry_tramp_text_end (which is not even a function).
> > > > > See the following thread for some debugging details:
> > > > > https://lore.kernel.org/lkml/CACT4Y+ZByJ71QfYHTByWaeCqZFxYfp8W8oyrK0baNaSJMDzoUw@xxxxxxxxxxxxxx/
> > Looking at this again (and as you point out below), my initial analysis
> > was wrong, and this isn't to do with the LR -- this value should be the
> > PC at the exception boundary.
> Whoops, I accidentally nuked the more complete/accurate analysis I just
> wrote and sent the earlier version. Today is not a good day for me and
> computers. :(
> What's happened here is that __d_lookup() (via a few layers of inlining) called
> load_unaligned_zeropad(). The `LDR` at the start of the asm faulted (I suspect
> due to a tag check fault), and so the exception handler replaced the PC with
> the (anonymous) fixup function. This is akin to a tail or sibling call, and so
> the fixup function entirely replaces __d_lookup() in the trace.
> The fixup function itself has an `LDR` which faulted (because it's
> designed to fixup page alignment problems, not tag check faults), and
> that is what's reported here.
> As the fixup function is anonymous, and the nearest prior symbol in .text is
> __entry_tramp_text_end, it gets symbolized as an offset from that.
> We can make the unwinds a bit nicer by adding some markers (e.g. patch
> below), but actually fixing this case will require some more thought.
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 709d2c433c5e..127096a0faea 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -111,6 +111,11 @@ jiffies = jiffies_64;
> #define TRAMP_TEXT
> +#define FIXUP_TEXT \
> + __fixup_text_start = .; \
> + *(.fixup); \
> + __fixup_text_end = .;
> * The size of the PE/COFF section that covers the kernel image, which
> * runs from _stext to _edata, must be a round multiple of the PE/COFF
> @@ -161,7 +166,7 @@ SECTIONS
> - *(.fixup)
> + FIXUP_TEXT
> . = ALIGN(16);
> *(.got) /* Global offset table */
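The symbolization Mark describes can be modelled with a short userspace sketch. It assumes a kallsyms-like lookup that attributes an address to the nearest symbol at or below it and reports the gap to the following symbol as the "size". The addresses and the `next_text_symbol` name are hypothetical, chosen only so that the `before` case reproduces the `+0xdfc/0x3000` suffix from the report. The `after` table mimics what the `__fixup_text_start`/`__fixup_text_end` markers from the patch would add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One entry in a hypothetical, address-sorted symbol table. */
struct sym {
	const char *name;
	unsigned long addr;
};

/* Result of a lookup: which symbol, offset into it, and its "size". */
struct sym_pos {
	size_t idx;
	unsigned long off;
	unsigned long size;
};

/*
 * Attribute addr to the nearest symbol at or below it (kallsyms-style).
 * The "size" is the distance to the next symbol, or to text_end for the
 * last one. Returns 0 on success, -1 if addr is below the first symbol
 * or at/after text_end.
 */
static int lookup(const struct sym *tab, size_t n, unsigned long text_end,
		  unsigned long addr, struct sym_pos *pos)
{
	size_t i;
	int found = 0;

	for (i = 0; i < n; i++) {
		assert(i == 0 || tab[i - 1].addr <= tab[i].addr); /* sorted */
		if (tab[i].addr <= addr) {
			pos->idx = i;	/* last symbol starting at or below addr */
			found = 1;
		}
	}
	if (!found || addr >= text_end)
		return -1;

	unsigned long next = pos->idx + 1 < n ? tab[pos->idx + 1].addr : text_end;
	pos->off = addr - tab[pos->idx].addr;
	pos->size = next - tab[pos->idx].addr;
	return 0;
}

/* Print "name+0xoff/0xsize" in the style of the backtrace above. */
static void print_pos(const struct sym *tab, const struct sym_pos *pos)
{
	printf("%s+0x%lx/0x%lx\n", tab[pos->idx].name, pos->off, pos->size);
}

/* Hypothetical layout without markers: the anonymous .fixup code has no
 * symbol of its own, so it is attributed to __entry_tramp_text_end. */
static const struct sym before[] = {
	{ "__entry_tramp_text_end", 0x1000 },
	{ "next_text_symbol",       0x4000 },	/* hypothetical */
};

/* Same hypothetical layout with FIXUP_TEXT-style markers added. */
static const struct sym after[] = {
	{ "__entry_tramp_text_end", 0x1000 },
	{ "__fixup_text_start",     0x1d00 },
	{ "__fixup_text_end",       0x2000 },
	{ "next_text_symbol",       0x4000 },	/* hypothetical */
};
```

For example, `lookup(before, 2, 0x5000, 0x1dfc, &pos)` followed by `print_pos(before, &pos)` prints `__entry_tramp_text_end+0xdfc/0x3000`, while the same address looked up in `after` prints `__fixup_text_start+0xfc/0x300`. The markers only make the fault site recognisable as fixup code; they do not recover the `__d_lookup()` frame that the fixup replaced.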
Oh, good: it's local to the .fixup case rather than a common issue that
affects all unwinds.
In the other x86 thread, Josh Poimboeuf suggested using asm goto to branch
to a cold part of the function instead of using .fixup.
That sounds like a more reliable solution with less maintenance burden.
Would it work for arm64 as well?