Re: arm64/acpi: NULL dereference reports from UBSAN at boot

From: Ard Biesheuvel
Date: Mon Jun 01 2020 - 17:57:51 EST


On Mon, 1 Jun 2020 at 23:52, Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
>
> On Mon, Jun 1, 2020 at 12:05 AM Will Deacon <will@xxxxxxxxxx> wrote:
> >
> > On Wed, May 27, 2020 at 02:41:04PM +0100, Lorenzo Pieralisi wrote:
> > > On Tue, May 26, 2020 at 09:21:57PM +0100, Will Deacon wrote:
> > > > Hi Lorenzo, Hanjun, [+Nick]
> > > >
> > > > On Thu, May 21, 2020 at 06:37:38PM +0100, Lorenzo Pieralisi wrote:
> > > > > On Thu, May 21, 2020 at 11:09:53AM +0100, Will Deacon wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > I just tried booting the arm64 for-kernelci branch under QEMU (version
> > > > > > 4.2.50 (v4.2.0-779-g4354edb6dcc7)) with UBSAN enabled, and I see a
> > > > > > couple of NULL pointer dereferences reported at boot. I think they're
> > > > > > both GIC related (log below). I don't see a panic with UBSAN disabled,
> > > > > > so something's fishy here.
> > > > >
> > > > > May I ask you the QEMU command line please - just to make sure I can
> > > > > replicate it.
> > > >
> > > > As it turns out, I'm only able to reproduce this when building with Clang,
> > > > but I don't know whether that's because GCC is missing something of Clang
> > > > is signalling a false positive. You also don't need all of those whacky
> > > > fuzzing options enabled.
> > > >
> > > > Anyway, to reproduce:
> > > >
> > > > $ git checkout for-next/kernelci
> > > > $ make ARCH=arm64 CC=clang CROSS_COMPILE=aarch64-linux-gnu- defconfig
> > > > <then do a menuconfig and enable UBSAN>
> > > > $ make ARCH=arm64 CC=clang CROSS_COMPILE=aarch64-linux-gnu- Image
> > > >
> > > > I throw that at QEMU using:
> > > >
> > > > qemu-system-aarch64 -M virt -machine virtualization=true \
> > > > -machine virt,gic-version=3 \
> > > > -cpu max,sve=off -smp 2 -m 4096 \
> > > > -drive if=pflash,format=raw,file=efi.img,readonly \
> > > > -drive if=pflash,format=raw,file=varstore.img \
> > > > -drive if=virtio,format=raw,file=disk.img \
> > > > -device virtio-scsi-pci,id=scsi0 \
> > > > -device virtio-rng-pci \
> > > > -device virtio-net-pci,netdev=net0 \
> > > > -netdev user,id=net0,hostfwd=tcp::8222-:22 \
> > > > -nographic \
> > > > -kernel ~/work/linux/arch/arm64/boot/Image \
> > > > -append "earlycon root=/dev/vda2"
> > > >
> > > > I built QEMU a while ago according to:
> > > >
> > > > https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/docs/qemu/qemu-arm64-howto.html
> > > >
> > > > and its version 4.2.50 (v4.2.0-779-g4354edb6dcc7).
> > > >
> > > > My clang is version 11.0.1.
> > >
> > > Thanks a lot Will.
> > >
> > > I *think* I was right - it is the ACPI_OFFSET() macro:
> > >
> > > #define ACPI_OFFSET(d, f) ACPI_PTR_DIFF (&(((d *) 0)->f), (void *) 0)
> > >
> > > that triggers the warnings (I suspected it because at least in one of
> > > the warnings I could not see any dereference of any dynamically
> > > allocated data).
> >
> > Cheers, Lorenzo.
> >
> > > Now on what to do with it - thoughts welcome.
> >
> > Nick -- any idea what to do about the above? The '#define' pasted by
> > Lorenzo is causing a couple of spurious UBSAN splats when compiling with
> > clang 11.
>
> If there's undefined behavior from that macro soup, we should be able
> to reproduce it outside of the kernel and regardless of target
> architecture, right? The macros aren't too much to throw into a file:
>
> ```foo.c
> #define acpi_uintptr_t void *
> #define ACPI_CAST_PTR(t, p) ((t *) (acpi_uintptr_t) (p))
> typedef unsigned char u8;
> typedef unsigned long u64;
> typedef u64 acpi_size;
> #define ACPI_PTR_DIFF(a, b) ((acpi_size) (ACPI_CAST_PTR (u8, (a)) -
> ACPI_CAST_PTR (u8, (b))))
> #define ACPI_OFFSET(d, f) ACPI_PTR_DIFF (&(((d *) 0)->f), (void *) 0)
>
> struct foo {
> unsigned char bar;
> int baz;
> };
>
> int main() {
> return ACPI_OFFSET(struct foo, baz);
> }
> ```
> I think looks right. If we run that through -E, and clean that up
> further, we get:
> ```bar.c
> typedef unsigned char u8;
> typedef unsigned long u64;
>
> struct foo {
> unsigned char bar;
> int baz;
> };
>
> int main() {
> return ((u64) (((u8 *) (void *) ((&(((struct foo *) 0)->baz)))) - ((u8
> *) (void *) (((void *) 0)))));
> }
> ```
> I may be miscounting my parentheses, but how do you take the address
> of `type`->`member`? What does that even mean?
>
> + some more sanitizer folks and Ard for ACPI.
>
> anyways, running foo.c through a compiler:
> $ clang -O2 foo.c -fsanitize=undefined
> $ ./a.out
> foo.c:15:12: runtime error: member access within null pointer of type
> 'struct foo'
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior foo.c:15:12 in
>
> (msg looks truncated, wtf)
>
> Anyways, it looks like the address of member from NULL subexpression
> looks problematic. I wonder if offsetof can be used here?
>
> #define ACPI_OFFSET(d, f) ACPI_PTR_DIFF (offsetof(d, f), (void *) 0)
>
> Seems to work in my basic test case. Untested in the kernel.
>
> IIUC, ACPI_OFFSET is trying to calculate the difference between the
> offset of a member of a struct and 0? Isn't that the tautology `x - 0
> == x`?

No. ACPI_OFFSET() is just a poor person's version of offsetof().

(Note that it calculates the difference between &(((d *) 0)->f) and
(void *)0x0, so the 0x0 term is there on both sides)