Re: arm64/acpi: NULL dereference reports from UBSAN at boot

From: Nick Desaulniers
Date: Mon Jun 01 2020 - 17:52:11 EST


On Mon, Jun 1, 2020 at 12:05 AM Will Deacon <will@xxxxxxxxxx> wrote:
>
> On Wed, May 27, 2020 at 02:41:04PM +0100, Lorenzo Pieralisi wrote:
> > On Tue, May 26, 2020 at 09:21:57PM +0100, Will Deacon wrote:
> > > Hi Lorenzo, Hanjun, [+Nick]
> > >
> > > On Thu, May 21, 2020 at 06:37:38PM +0100, Lorenzo Pieralisi wrote:
> > > > On Thu, May 21, 2020 at 11:09:53AM +0100, Will Deacon wrote:
> > > > > Hi folks,
> > > > >
> > > > > I just tried booting the arm64 for-kernelci branch under QEMU (version
> > > > > 4.2.50 (v4.2.0-779-g4354edb6dcc7)) with UBSAN enabled, and I see a
> > > > > couple of NULL pointer dereferences reported at boot. I think they're
> > > > > both GIC related (log below). I don't see a panic with UBSAN disabled,
> > > > > so something's fishy here.
> > > >
> > > > May I ask you the QEMU command line please - just to make sure I can
> > > > replicate it.
> > >
> > > As it turns out, I'm only able to reproduce this when building with Clang,
> > > but I don't know whether that's because GCC is missing something of Clang
> > > is signalling a false positive. You also don't need all of those whacky
> > > fuzzing options enabled.
> > >
> > > Anyway, to reproduce:
> > >
> > > $ git checkout for-next/kernelci
> > > $ make ARCH=arm64 CC=clang CROSS_COMPILE=aarch64-linux-gnu- defconfig
> > > <then do a menuconfig and enable UBSAN>
> > > $ make ARCH=arm64 CC=clang CROSS_COMPILE=aarch64-linux-gnu- Image
> > >
> > > I throw that at QEMU using:
> > >
> > > qemu-system-aarch64 -M virt -machine virtualization=true \
> > > -machine virt,gic-version=3 \
> > > -cpu max,sve=off -smp 2 -m 4096 \
> > > -drive if=pflash,format=raw,file=efi.img,readonly \
> > > -drive if=pflash,format=raw,file=varstore.img \
> > > -drive if=virtio,format=raw,file=disk.img \
> > > -device virtio-scsi-pci,id=scsi0 \
> > > -device virtio-rng-pci \
> > > -device virtio-net-pci,netdev=net0 \
> > > -netdev user,id=net0,hostfwd=tcp::8222-:22 \
> > > -nographic \
> > > -kernel ~/work/linux/arch/arm64/boot/Image \
> > > -append "earlycon root=/dev/vda2"
> > >
> > > I built QEMU a while ago according to:
> > >
> > > https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/docs/qemu/qemu-arm64-howto.html
> > >
> > > and its version 4.2.50 (v4.2.0-779-g4354edb6dcc7).
> > >
> > > My clang is version 11.0.1.
> >
> > Thanks a lot Will.
> >
> > I *think* I was right - it is the ACPI_OFFSET() macro:
> >
> > #define ACPI_OFFSET(d, f) ACPI_PTR_DIFF (&(((d *) 0)->f), (void *) 0)
> >
> > that triggers the warnings (I suspected it because at least in one of
> > the warnings I could not see any dereference of any dynamically
> > allocated data).
>
> Cheers, Lorenzo.
>
> > Now on what to do with it - thoughts welcome.
>
> Nick -- any idea what to do about the above? The '#define' pasted by
> Lorenzo is causing a couple of spurious UBSAN splats when compiling with
> clang 11.

If there's undefined behavior from that macro soup, we should be able
to reproduce it outside of the kernel and regardless of target
architecture, right? The macros aren't too much to throw into a file:

```foo.c
#define acpi_uintptr_t void *
#define ACPI_CAST_PTR(t, p) ((t *) (acpi_uintptr_t) (p))
typedef unsigned char u8;
typedef unsigned long u64;
typedef u64 acpi_size;
#define ACPI_PTR_DIFF(a, b) ((acpi_size) (ACPI_CAST_PTR (u8, (a)) -
ACPI_CAST_PTR (u8, (b))))
#define ACPI_OFFSET(d, f) ACPI_PTR_DIFF (&(((d *) 0)->f), (void *) 0)

struct foo {
unsigned char bar;
int baz;
};

int main() {
return ACPI_OFFSET(struct foo, baz);
}
```
I think looks right. If we run that through -E, and clean that up
further, we get:
```bar.c
typedef unsigned char u8;
typedef unsigned long u64;

struct foo {
unsigned char bar;
int baz;
};

int main() {
return ((u64) (((u8 *) (void *) ((&(((struct foo *) 0)->baz)))) - ((u8
*) (void *) (((void *) 0)))));
}
```
I may be miscounting my parentheses, but how do you take the address
of `type`->`member`? What does that even mean?

+ some more sanitizer folks and Ard for ACPI.

anyways, running foo.c through a compiler:
$ clang -O2 foo.c -fsanitize=undefined
$ ./a.out
foo.c:15:12: runtime error: member access within null pointer of type
'struct foo'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior foo.c:15:12 in

(msg looks truncated, wtf)

Anyways, it looks like the address of member from NULL subexpression
looks problematic. I wonder if offsetof can be used here?

#define ACPI_OFFSET(d, f) ACPI_PTR_DIFF (offsetof(d, f), (void *) 0)

Seems to work in my basic test case. Untested in the kernel.

IIUC, ACPI_OFFSET is trying to calculate the difference between the
offset of a member of a struct and 0? Isn't that the tautology `x - 0
== x`?
--
Thanks,
~Nick Desaulniers