Re: AMD Athlon bogus performance value causing RCU stalls?

From: Thomas Gleixner
Date: Sun Sep 23 2018 - 17:19:13 EST


On Sun, 23 Sep 2018, Rob Prowel wrote:
> Sep 23 01:51:28 files kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
> Sep 23 01:51:28 files kernel: 1-...!: (0 ticks this GP) idle=27c/0/0 softirq=35425/35425 fqs=0
> Sep 23 01:51:28 files kernel: (detected by 0, t=60009 jiffies, g=20812, c=20811, q=121)
> Sep 23 01:51:28 files kernel: Sending NMI from CPU 0 to CPUs 1:
> Sep 23 01:51:28 files kernel: NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0x2/0x10
> Sep 23 01:51:28 files kernel: rcu_sched kthread starved for 60009 jiffies! g20812 c20811 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
> Sep 23 01:51:28 files kernel: RCU grace-period kthread stack dump:
> Sep 23 01:51:28 files kernel: rcu_sched I 0 10 2 0x80000000
> Sep 23 01:51:33 files kernel: Call Trace:
> Sep 23 01:51:33 files kernel: ? __schedule+0x25c/0x860
> Sep 23 01:51:33 files kernel: schedule+0x28/0x80
> Sep 23 01:51:33 files kernel: schedule_timeout+0x174/0x370
> Sep 23 01:51:33 files kernel: ? __next_timer_interrupt+0xc0/0xc0
> Sep 23 01:51:33 files kernel: rcu_gp_kthread+0x4b6/0x8c0
> Sep 23 01:51:33 files kernel: ? _synchronize_rcu_expedited.constprop.68+0x310/0x310
> Sep 23 01:51:33 files kernel: kthread+0x113/0x130
> Sep 23 01:51:33 files kernel: ? kthread_create_worker_on_cpu+0x70/0x70
> Sep 23 01:51:33 files kernel: ret_from_fork+0x35/0x40
>
> -----------------------------------------------------------------------
>
> The kernel reported bogoMIPS for the cores are as follows:
>
> $ grep bogo /proc/cpuinfo
> bogomips : 4219.49
> bogomips : 184253.06
> $
>
> What is that value for the second Athlon core (seems extremely bogus), and
> would/could that be the reason for the schedule_timeouts? This bogus value
> also shows up in the bootup log when the second core is activated. Seems to
> be AMD specific, as the values are correct on my Xeon machines.

That's a 32-bit machine, I assume.

> Kernel is a stock Fedora 4.18.7-100 release. Machine is an old Dell Experion
> that I've repurposed as a fileserver and postgresql machine.
>
> Other than RTFM, or please build a bunch of kernels from source on your slow
> machine, using differing config options to help track down the cause of
> this...any thoughts about a solution?

Yes. This was recently tracked down as a 32-bit issue: a calculation is
done in 'unsigned long', which is only 32 bits wide there, but it needs
64 bits.

The fix is in the 4.18.8 stable kernel, which should be available from
your Fedora repo any time soon.

Thanks,

tglx