[BUG] bad runqueue clock/ktime value at init?

From: Nicolas Morey-Chaisemartin
Date: Tue Nov 29 2016 - 11:41:54 EST


Hi everyone,


After upgrading my workstation (ASUS Rampage IV GENE motherboard,
Core(TM) i7-3820) to a kernel >= 4.6, I noticed poor performance and
htop showing "0" CPU usage for all processes.
/sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq showed <unknown>
for all cores.

Bisecting pointed me to a commit in cpufreq that is unrelated to the
underlying issue but made the <unknown> value and the other symptoms
appear.

Adding some logs showed that intel_pstate_update_util was always
called with the same time parameter (a different value for each core,
but constant on any given core).

I compiled a small module to periodically dump the sched_cpu_clock()
value to dmesg, and that value was smaller than the one passed to
intel_pstate_update_util.

From what I could understand, the runqueue clock is monotonic and
computed using sched_cpu_clock().
After a while (once sched_cpu_clock() becomes greater than the
runqueue clock, which took between 10 and 30 minutes), things go back
to normal and cpufreq starts working again.
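
To illustrate what I think is happening, here is a minimal userspace
sketch in C. It is not the kernel code: fake_rq, fake_update_rq_clock
and count_stalled_ticks are made-up names, and it assumes the runqueue
clock is only ever advanced by a positive delta against the raw clock.
Under that assumption, a raw clock that restarts below the runqueue
clock leaves the runqueue clock frozen until the raw clock catches up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a runqueue: only the clock field matters. */
struct fake_rq {
    uint64_t clock;   /* nanoseconds, must never go backwards */
};

/*
 * Advance rq->clock to `now` only if `now` is ahead of it (assumed
 * behaviour, modelled on a monotonic clamp).
 * Returns 1 if the clock moved, 0 if the update was dropped.
 */
static int fake_update_rq_clock(struct fake_rq *rq, uint64_t now)
{
    int64_t delta = (int64_t)(now - rq->clock);

    if (delta <= 0)
        return 0;
    rq->clock += (uint64_t)delta;
    return 1;
}

/*
 * Feed a raw clock that starts at `raw_start` while the runqueue clock
 * already sits at `rq_start`, advancing the raw clock by `step` on each
 * tick. Returns how many ticks left the runqueue clock unchanged.
 */
static unsigned int count_stalled_ticks(uint64_t rq_start, uint64_t raw_start,
                                        uint64_t step, unsigned int ticks)
{
    struct fake_rq rq = { .clock = rq_start };
    unsigned int stalled = 0;
    uint64_t raw = raw_start;

    assert(step > 0);
    for (unsigned int i = 0; i < ticks; i++) {
        raw += step;
        if (!fake_update_rq_clock(&rq, raw))
            stalled++;
    }
    return stalled;
}
```

Calling count_stalled_ticks(1000, 0, 100, 20) from a small main()
models a raw clock restarting at 0 while the runqueue clock is at 1000:
every consumer reading the runqueue clock during the stalled ticks sees
the same timestamp, which would match what intel_pstate_update_util
received on my machine.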

Looking into dmesg, it seems the TSC was broken on my system (BIOS
issue), so the kernel is using another source for clocks.

Is it expected in this case (no TSC) that sched_cpu_clock "rewinds"
some time after boot?

Upgrading the BIOS fixed the TSC issue and solved the bug for me. So
this is not critical, but I've seen a few posts here and there from
people who hit the same bug.

Nicolas

P.S.: I have another workstation with the old BIOS version, so I can
try out patches and give more info if needed.