Re: BUG: rcu_sched detected stalls on CPUs

From: mitko
Date: Mon May 22 2017 - 07:57:07 EST

On 2017-05-10 17:32, Paul E. McKenney wrote:
On Wed, May 10, 2017 at 02:00:32PM +0300, mitko@xxxxxxxxxx wrote:
On 2017-05-09 17:49, Paul E. McKenney wrote:
>On Tue, May 09, 2017 at 10:53:25AM +0300, mitko@xxxxxxxxxx wrote:
>>On 2017-05-08 16:24, Paul E. McKenney wrote:
>>>On Mon, May 08, 2017 at 09:43:15AM +0300, mitko@xxxxxxxxxx wrote:
>>>>On 2017-01-20 17:19, Steven Rostedt wrote:
>>>>>On Fri, 20 Jan 2017 10:43:50 +0200
>>>>>mitko@xxxxxxxxxx wrote:
>>>>>>[1.] One line summary of the problem:
>>>>>>rcu_sched detected stalls on CPUs, and for a few minutes the server does not respond.
>>>>>Is this reproducible? Or was this a one-time ordeal?
>>>>>>[2.] Full description of the problem/report:
>>>>>>The load on my server (a postgres database) isn't big, less than
>>>>>>0.50, and when the error "rcu_sched detected stalls on CPUs" occurs
>>>>>>the server freezes and nothing works for 3-5 minutes.
>>>>>>No network, no video signal, no keyboard, no mouse. Nothing works.
>>>>>>After these few minutes everything continues normally.
>>>>>>This usually happens once per day. When I searched Google I found
>>>>>>a lot of people complaining about this error, but no solution.
>>>>>>Does anyone know how to resolve it? I spoke with a few
>>>>>>friends and
>>>>>>they tried to convince me the problem is in the CPU.
>>>>>>I don't believe that after 3 years of working a CPU would suddenly
>>>>>>stop working correctly, but I might be wrong.
>>>>>>[3.] Keywords (i.e., modules, networking, kernel):
>>>>>>[4.] Kernel information
>>>>>>[4.1.] Kernel version (from /proc/version):
>>>>>>Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2
>>>>>>SMP Sun
>>>>>>Dec 11 16:11:02 CST 2016
>>>>>Have you tried a newer version of the kernel?
>>>>>-- Steve
>>>>Yesterday I changed to a new kernel, 4.9.26, and it had no effect.
>>>>I am trying to figure out what I need to buy, because I read a lot
>>>>of posts on Google about this problem:
>>>>some people suggest it is a BIOS firmware bug, others say that when
>>>>they swapped the CPU the problem was resolved.
>>>>Maybe the problem started when I first booted a 4.x kernel, which
>>>>has CPU microcode updates.
>>>>In 3.x kernels this feature was not there, but I don't know, it is
>>>>only a guess.
>>>>Can someone tell me clearly: "You need to change this one and the
>>>>problem will be solved"?
>>>Looking back at your earlier email, I see the following:
>>>[Wed Jan 4 10:19:12 2017] rcu_sched kthread starved for 60004
>>>jiffies! g61967937 c61967936 f0x0 s3 ->state=0x1
>>>That indicates that the task "rcu_sched" isn't being allowed to run.
>>>The "->state=0x1" indicates that this task is currently asleep and the
>>>"s3" indicates that it is looking to start a grace period, but has not
>>>progressed through this task. My guess is that your system gets very
>>>busy, and that rcu_sched's normal scheduling priority is not sufficient
>>>for it to be allowed to run.
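
As an aside, the timeout governing these stall warnings can be
inspected at run time, assuming the rcupdate module parameters are
exposed under /sys/module:

 # RCU CPU stall timeout, in seconds (60004 jiffies ~ 60 s at HZ=1000)
 cat /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
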
>>>I therefore suggest that you try raising the rcu_sched task's priority.
>>>You can find this task like this:
>>> ps -ef | grep rcu_sched
>>>On my laptop, I get this:
>>> $ ps -ef | grep rcu_sched
>>> root 7 2 0 Apr11 ? 00:19:19 [rcu_sched]
>>> paulmck 18307 22926 0 06:11 pts/35 00:00:00 grep --color=auto rcu_sched
>>>You can use the chrt command to set the priority. For example,
>>>given that
>>>on my laptop, rcu_sched is PID 7:
>>> sudo chrt -f -p 1 7
>>>You can double-check as follows:
>>> $ chrt -p 7
>>> pid 7's current scheduling policy: SCHED_FIFO
>>> pid 7's current scheduling priority: 1
>>>Does that help?
>>> Thanx, Paul
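
For what it's worth, on systems with pgrep the lookup and the priority
change can be combined into one command (pgrep -x matches the exact
task name):

 # set the rcu_sched kthread to SCHED_FIFO priority 1 in one step
 sudo chrt -f -p 1 $(pgrep -x rcu_sched)

Note that a chrt setting does not survive a reboot; on kernels that
support it, booting with rcutree.kthread_prio=1 on the kernel command
line sets the RCU kthread priority at boot time instead.
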
>>Interesting... I changed the priority as you suggested and will keep
>>you posted
>>if the problem happens again, because I can't reproduce it. To reply
>>to your next email: I don't have any
>>virtual servers or guest OSes on the server. It's a server with only
>>one process, a redis server,
>>and at the time the error happened one of the other
>>servers had actually started loading data
>>into the redis server. The data is small, 3-4 million keys, and every
>>key contains around 150 characters.
>>I have this problem on two servers; the other server is a database
>>server with postgresql,
>>and there the problem happens more often, usually once per day or
>>every few days.
>>I never played with changing the priority of processes before. When I
>>checked the iLO log I saw the error "power lost" logged, but the
>>server did not actually lose power;
>>it just stopped responding for 2-3 minutes and then continued normal
>>operation, with
>>an uptime of more than 100 days.
>Does redis keep any useful log? If so, it would be worth looking into
>what it thought that it was doing.


Only the standard logs:

1333:M 06 May 04:36:43.019 * 10000 changes in 60 seconds. Saving...
1333:M 06 May 04:36:43.025 * Background saving started by pid 4325
4325:C 06 May 04:36:46.745 * DB saved on disk
4325:C 06 May 04:36:46.749 * RDB: 40 MB of memory used by copy-on-write
1333:M 06 May 04:36:46.825 * Background saving terminated with success
1333:M 06 May 04:37:47.018 * 10000 changes in 60 seconds. Saving...
1333:M 06 May 04:37:47.026 * Background saving started by pid 4326
4326:C 06 May 04:37:50.902 * DB saved on disk
4326:C 06 May 04:37:50.907 * RDB: 208 MB of memory used by copy-on-write
1333:M 06 May 04:37:50.926 * Background saving terminated with success
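
Those "10000 changes in 60 seconds" saves are driven by Redis's RDB
snapshot thresholds; a typical redis.conf stanza that produces exactly
this behaviour (a sketch, the actual values may differ on this server):

 # save <seconds> <changes>: snapshot if that many keys changed in that window
 save 900 1
 save 300 10
 save 60 10000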

I don't know if that information is of value, but before loading data
the server does not
do anything for 3-4 hours, and probably the kernel put some
processes into sleeping mode
before the data began loading. The logs show
10k changes every few seconds until all the data of 3-4 million keys is loaded.
Maybe, as you suggest, the kernel cannot start the processes quickly enough.
Last night I did not have the problem, but for the moment I cannot
confirm
that the suggested solution worked.

If the problem was happening every few days, it will take a few weeks
before we can have much confidence that this change helped. I was
hoping that the event triggering the problem was somehow visible.

Depending on what your users expect, it might still be necessary to
work out the root cause of what appears to be the busy period, but
I cannot help you much with that. One approach is to use monitoring
tools or perhaps collect some ftrace data (this last if the busy
period is mostly kernel activity).

Thanx, Paul
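
A minimal way to collect such ftrace data around the busy period,
assuming debugfs/tracefs is mounted at /sys/kernel/debug (the path may
differ by distribution):

 cd /sys/kernel/debug/tracing
 echo function > current_tracer    # or function_graph for call timing
 echo 1 > tracing_on               # start just before the busy period
 sleep 30                          # cover the suspect window
 echo 0 > tracing_on               # stop tracing
 cat trace > /tmp/ftrace-busy.txt  # save the buffer for later analysis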


> Thanx, Paul
>>Thanks for your suggestion.
>>>>[239940.067938] clocksource: timekeeping watchdog on CPU26: Marking
>>>>clocksource 'tsc' as unstable because the skew is too large:
>>>>[239940.067943] clocksource: 'hpet' wd_now:
>>>>ecb521ce wd_last: ca45912d mask: ffffffff
>>>>[239940.067946] clocksource: 'tsc' cs_now:
>>>>2d12df5f88d08 cs_last: 29fffbe2d36d6 mask: ffffffffffffffff
>>>>[239940.068357] clocksource: Switched to clocksource hpet
>>>>[240000.066457] INFO: rcu_sched detected stalls on CPUs/tasks:
>>>>[240000.066488] 2-...: (38 GPs behind) idle=f1a/0/0
>>>>softirq=2417028/2417028 fqs=0
>>>>[240000.066491] 3-...: (27 GPs behind) idle=0bc/0/0
>>>>softirq=3076046/3076047 fqs=0
>>>>[240000.066494] 4-...: (1006 GPs behind) idle=308/0/0
>>>>softirq=1474922/1474922 fqs=0
>>>>[240000.066497] 5-...: (8034 GPs behind) idle=7b4/0/0
>>>>softirq=69165/69166 fqs=0
>>>>[240000.066499] 6-...: (1665 GPs behind) idle=716/0/0
>>>>softirq=16465/16465 fqs=0
>>>>[240000.066502] 7-...: (8034 GPs behind) idle=bd0/0/0
>>>>softirq=15923/15924 fqs=0
>>>>[240000.066505] 8-...: (1665 GPs behind) idle=a62/0/0
>>>>softirq=93861/93861 fqs=0
>>>>[240000.066507] 9-...: (1258 GPs behind) idle=a8c/0/0
>>>>softirq=21608/21608 fqs=0
>>>>[240000.066510] 11-...: (1665 GPs behind) idle=8e6/0/0
>>>>softirq=18844/18844 fqs=0
>>>>[240000.066512] 13-...: (1665 GPs behind) idle=ab8/0/0
>>>>softirq=21278/21278 fqs=0
>>>>[240000.066515] 14-...: (4854 GPs behind) idle=578/0/0
>>>>softirq=25624/25626 fqs=0
>>>>[240000.066517] 15-...: (8034 GPs behind) idle=f5e/0/0
>>>>softirq=18114/18115 fqs=0
>>>>[240000.066520] 16-...: (1665 GPs behind) idle=b66/0/0
>>>>softirq=19511/19511 fqs=0
>>>>[240000.066528] 17-...: (0 ticks this GP) idle=e18/0/0
>>>>softirq=16163/16163 fqs=0
>>>>[240000.066531] 18-...: (1665 GPs behind) idle=a18/0/0
>>>>softirq=16424/16424 fqs=0
>>>>[240000.066533] 19-...: (1665 GPs behind) idle=944/0/0
>>>>softirq=17920/17920 fqs=0
>>>>[240000.066549] 20-...: (20991 GPs behind) idle=750/0/0
>>>>softirq=17549/17552 fqs=0
>>>>[240000.066552] 21-...: (0 ticks this GP) idle=9e0/0/0
>>>>softirq=18937/18937 fqs=0
>>>>[240000.066555] 22-...: (14519 GPs behind) idle=95e/0/0
>>>>softirq=16122/16124 fqs=0
>>>>[240000.066557] 23-...: (1665 GPs behind) idle=bbc/0/0
>>>>softirq=16316/16316 fqs=0
>>>>[240000.066560] 24-...: (0 ticks this GP) idle=c56/0/0
>>>>softirq=20070/20070 fqs=0
>>>>[240000.066563] 27-...: (8034 GPs behind) idle=bdc/0/0
>>>>softirq=18291/18292 fqs=0
>>>>[240000.066566] 28-...: (8034 GPs behind) idle=4f8/0/0
>>>>softirq=22313/22314 fqs=0
>>>>[240000.066568] 29-...: (3443 GPs behind) idle=22a/0/0
>>>>softirq=24975/24976 fqs=0
>>>>[240000.066571] 30-...: (1665 GPs behind) idle=940/0/0
>>>>softirq=89382/89382 fqs=0
>>>>[240000.066572] (detected by 10, t=60002 jiffies, g=4843721,
>>>>c=4843720, q=659)
>>>>[240000.066578] Task dump for CPU 2:
>>>>[240000.066581] swapper/2 R running task 0 0
>>>>1 0x00200000
>>>>[240000.066588] 0000000000000083 ffffffffa5cecd79 ffff8aca3eea1e00
>>>>[240000.066593] 0000000000000005 ffffffffa64cad38 ffffa96c80073eb0
>>>>[240000.066597] ffff8aca3eea1e00 000000000e94fec6 ffff8aca38535940
>>>>[240000.066601] Call Trace:
>>>>[240000.066617] [<ffffffffa5cecd79>] ? intel_idle+0x109/0x110
>>>>[240000.066625] [<ffffffffa5ba9d93>] ?
>>>>[240000.066629] [<ffffffffa5ba9f77>] ? cpuidle_enter+0x17/0x20
>>>>[240000.066638] [<ffffffffa50d5d73>] ? call_cpuidle+0x23/0x40
>>>>[240000.066641] [<ffffffffa50d5fd7>] ? cpu_startup_entry+0x147/0x220
>>>>[240000.066647] [<ffffffffa503ecfd>] ? start_secondary+0x12d/0x140
>>>>[240000.066650] Task dump for CPU 3:
>>>>[240000.066651] swapper/3 R running task 0 0
>>>>1 0x00200000

Hello again,
it seems the solution you suggested is working fine.
It has been more than 10 days now. Usually, after this:
[97631.792300] clocksource: timekeeping watchdog on CPU8: Marking clocksource 'tsc' as unstable because the skew is too large:
[97631.792304] clocksource: 'hpet' wd_now: 804486e7 wd_last: 79f06045 mask: ffffffff
[97631.792307] clocksource: 'tsc' cs_now: b9983cc93374e cs_last: b9868fb1a9b88 mask: ffffffffffffffff
[97631.792616] clocksource: Switched to clocksource hpet

I would get the "INFO: rcu_sched detected stalls on CPUs/tasks" problem, but so far there has been no error.
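
If the TSC keeps being marked unstable like this, the clocksource the
watchdog switched to can be confirmed via sysfs (assuming the standard
layout):

 cat /sys/devices/system/clocksource/clocksource0/current_clocksource
 cat /sys/devices/system/clocksource/clocksource0/available_clocksource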

I think the problem is resolved, but if something happens I'll send you an email.

Best regards and many thanks,
Mitko Dimitrov