Re: [REGRESSION] ? system is stuck in clocksource, >60s delay at boot time without tsc=unstable

From: Thomas Gleixner
Date: Wed Jan 15 2025 - 11:59:29 EST


On Sat, Jan 04 2025 at 23:02, Fab Stz wrote:
> On 03/01/2025 at 20:02, John Stultz wrote:
> When building the kernel from the sources from the stable repo of the
> kernel to try a git bisect I couldn't reproduce a case where the warning
> is before loading '/init' with the versions I mentioned as working.
> Maybe I was just lucky as you mentioned. If the warning comes before the
> loading of USB modules, there is no delay. If it comes after, there is a
> delay.

This is a timing problem, which depends on kernel configuration and
run-time differences, but that's just a symptom. It explains why you are
seeing it sometimes and sometimes not. Nothing else.

> If I break/pause at the beginning of the /init script, the warning never
> comes before. I don't really understand what is happening and where the
> problem actually lies (kernel? systemd? udev? somewhere else?). If I add
> a "sleep 5" as 1st command in "/init" it would take ages. So as long as
> the warning from the clocksource is not displayed, the delays seem
> completely wrong.

That's an interesting data point because that 'sleep 5' puts the system
into idle and probably into deep idle for the first time during boot.

> Maybe the USB drivers somehow rely on a reliable clock source for
> proper functioning.

The kernel relies on a reliable clocksource. Loading the USB driver merely
exposes the problem because it probably causes a long enough delay to
get the CPUs into a state which exposes the issue.
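
For reference, the clocksource the kernel is currently using can be read
from sysfs. The following is only a sketch: it assumes the usual
clocksource0 sysfs files (current_clocksource, available_clocksource),
and the function name show_clocksource and its optional directory
argument are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the clocksource in use and the ones the kernel offers.
# show_clocksource and its optional argument are hypothetical; the argument
# only exists so the function can be pointed at a copy of the sysfs files.
show_clocksource() {
    base="${1:-/sys/devices/system/clocksource/clocksource0}"

    for f in current_clocksource available_clocksource; do
        if [ -r "$base/$f" ]; then
            echo "$f: $(cat "$base/$f")"
        else
            echo "$f: unavailable"
        fi
    done
}

show_clocksource "$@"
```

If the kernel has given up on the TSC, current_clocksource would be
expected to name something else, for example hpet or acpi_pm.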

AFAICT, that iMac 9.1 is Core 2 Duo based and that generation of
processors definitely had issues with the TSC in deeper idle states.

> BTW, I tried the "processor.max_cstate=1" you mentioned but it didn't
> change anything on the delay and/or warning.

That's weird, but we have no idea what kind of magic the BIOS implements
there for power management behind the kernel's back. I assume that it
does because this generation of CPUs uses the ACPI processor idle driver
and that disables TSC when it detects that the system supports
C-states > 1.

# cat /sys/devices/system/cpu/cpuidle/

tells which idle driver is actually in use.

# ls /sys/devices/system/cpu/cpu0/cpuidle/

tells which states are supported by the driver.

# cat /sys/devices/system/cpu/cpu0/cpuidle/state$N/name
# cat /sys/devices/system/cpu/cpu0/cpuidle/state$N/disable

tells the actual C-state name and the disabled state, but I expect that
there is nothing to see.
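
The per-state checks above could be gathered into one small script. This
is a sketch under assumptions: the function name show_cpuidle and its
optional directory argument are hypothetical, and current_driver is
assumed to be the file in the cpuidle directory which names the active
idle driver.

```shell
#!/bin/sh
# Sketch: dump the cpuidle driver and the name/disable flag of each
# idle state of cpu0. show_cpuidle and its optional argument are
# hypothetical; the argument only exists so the function can be pointed
# at a copy of the sysfs tree.
show_cpuidle() {
    base="${1:-/sys/devices/system/cpu}"

    # Assumption: current_driver names the idle driver in use.
    if [ -r "$base/cpuidle/current_driver" ]; then
        echo "driver: $(cat "$base/cpuidle/current_driver")"
    else
        echo "driver: unknown"
    fi

    # One directory per idle state exposed by the driver (state0, state1, ...).
    for dir in "$base"/cpu0/cpuidle/state*; do
        [ -d "$dir" ] || continue
        echo "${dir##*/}: name=$(cat "$dir/name") disable=$(cat "$dir/disable")"
    done
}

show_cpuidle "$@"
```

Only cpu0 is inspected here, matching the commands above. If no state
directories exist, only the driver line is printed.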

Can you try 'idle=halt' instead?
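
After rebooting, it is worth confirming that the parameter actually
reached the kernel. A minimal sketch, assuming /proc/cmdline is
available; the helper name has_boot_param and its optional file argument
are hypothetical.

```shell
#!/bin/sh
# Sketch: check whether an exact parameter is on the kernel command line.
# has_boot_param is a hypothetical helper; the second argument defaults
# to /proc/cmdline and only exists to allow checking a saved copy.
has_boot_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"

    # Put each space-separated word on its own line, then look for an
    # exact, whole-line, fixed-string match.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

if has_boot_param idle=halt; then
    echo "idle=halt is set"
else
    echo "idle=halt is not set"
fi
```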

Thanks,

tglx