Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

From: Thomas Gleixner
Date: Thu Aug 01 2019 - 03:28:02 EST


Daniel,

On Thu, 1 Aug 2019, Daniel Drake wrote:
> Working with a new consumer laptop based on AMD R7-3700U, we are
> seeing a kernel panic during early boot (before the display
> initializes). It's a new product and there is no previous known
> working kernel version (tested 5.0, 5.2 and current linus master).
>
> We may have also seen this problem on a MiniPC based on AMD APU 7010
> from another vendor, but we don't have it on hand right now to
> confirm that it's the exact same crash.
>
> earlycon shows the details: a NULL dereference under
> setup_boot_APIC_clock(), which actually happens in
> calibrate_APIC_clock():
>
> /* Replace the global interrupt handler */
> real_handler = global_clock_event->event_handler;
> global_clock_event->event_handler = lapic_cal_handler;
>
> global_clock_event is NULL here. This is a "reduced hardware" ACPI
> platform so acpi_generic_reduced_hw_init() has set timer_init to NULL,
> avoiding the usual codepaths that would set up global_clock_event.
>
> I tried the obvious:
>	if (!global_clock_event)
>		return -1;
>
> However I'm probably missing part of the big picture here, as this
> only makes boot fail later on. It continues until the next point that
> something leads to schedule(), such as a driver calling msleep() or
> mark_readonly() calling rcu_barrier(), etc. Then it hangs.
>
> Is something missing in terms of timer setup here? Suggestions
> appreciated...

So that trips over the problem that there is no timer to calibrate against
and the LAPIC frequency is obviously unknown.

How is the kernel supposed to figure that out?

The only possible option in that case is to use the RTC, but we have no
support for this at all.
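
To illustrate why a reference is indispensable, here is a minimal userspace
sketch of the arithmetic behind such a calibration. It is not the kernel's
calibrate_APIC_clock() code: lapic_hz_from_window() and its parameters are
hypothetical names made up for this example, and it assumes the LAPIC timer
is a 32-bit down-counter and that some independent clock has measured the
length of the sampling window in nanoseconds.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code.
 *
 * start_count/end_count: LAPIC timer counter values sampled at the start
 *                        and end of the window (the counter counts down).
 * ref_ns:                window length in nanoseconds as measured by an
 *                        independent reference clock; 0 means that no
 *                        reference is available.
 *
 * Returns the estimated LAPIC timer frequency in Hz, or 0 when there is
 * nothing to calibrate against.
 */
uint64_t lapic_hz_from_window(uint32_t start_count, uint32_t end_count,
			      uint64_t ref_ns)
{
	uint64_t ticks;

	/* Without a reference the tick count cannot be turned into a rate. */
	if (ref_ns == 0)
		return 0;

	/* Down-counter: elapsed ticks = start - end, modulo 2^32. */
	ticks = (uint32_t)(start_count - end_count);

	/* ticks < 2^32, so ticks * 10^9 fits in 64 bits. */
	return ticks * 1000000000ULL / ref_ns;
}
```

For example, a counter that drops by 1000000 during a 10 ms window works
out to 100 MHz. With ref_ns unknown the same tick count says nothing about
the frequency, which is the situation on this platform.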

Thanks,

tglx