Re: Subject: Warning in workqueue.c

From: Jason J. Herne
Date: Thu Feb 13 2014 - 12:58:58 EST


On 02/12/2014 10:31 PM, Lai Jiangshan wrote:
> On 02/12/2014 11:18 PM, Jason J. Herne wrote:
>
> Could you use the following patch for test if Tejun doesn't give you a new one.

Lai,

Here is the output using the patch you asked me to run with.

[ 5779.795687] ------------[ cut here ]------------
[ 5779.795695] WARNING: at kernel/workqueue.c:2159
[ 5779.795698] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi tape_3590 qeth_l2 tape tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[ 5779.795733] CPU: 4 PID: 270 Comm: kworker/5:1 Not tainted 3.14.0-rc1 #1
[ 5779.795738] task: 0000000001938000 ti: 00000000f4d9c000 task.ti: 00000000f4d9c000
[ 5779.795750] Krnl PSW : 0404c00180000000 000000000015b452 (process_one_work+0x666/0x688)
[ 5779.795756] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 000003210f9db000 0000000000bc2a52 0000000001b640c0 0000000000000001
[ 5779.795757] 0000000000000000 0000000000000004 0000000000000005 00000000ffffffff
[ 5779.795759] 0000000000000000 0000000084a43500 0000000084a3f000 0000000084a3f018
[ 5779.795763] 0000000001b640c0 0000000000735d18 00000000f4d9fdc8 00000000f4d9fd50
[ 5779.795781] Krnl Code: 000000000015b444: dd1a9640c05b trt 1600(27,%r9),91(%r12)
000000000015b44a: a7f4fd9e brc 15,15af86
#000000000015b44e: a7f40001 brc 15,15b450
>000000000015b452: 92011000 mvi 0(%r1),1
000000000015b456: a7f4fe63 brc 15,15b11c
000000000015b45a: c03000533af9 larl %r3,bc2a4c
000000000015b460: 95003000 cli 0(%r3),0
000000000015b464: a774ff3e brc 7,15b2e0
[ 5779.795810] Call Trace:
[ 5779.795814] ([<000000000015b0ea>] process_one_work+0x2fe/0x688)
[ 5779.795817] [<000000000015ba62>] worker_thread+0x1a6/0x3d4
[ 5779.795822] [<00000000001648c2>] kthread+0x10e/0x128
[ 5779.795828] [<0000000000728ed6>] kernel_thread_starter+0x6/0xc
[ 5779.795832] [<0000000000728ed0>] kernel_thread_starter+0x0/0xc
[ 5779.795834] Last Breaking-Event-Address:
[ 5779.795837] [<000000000015b44e>] process_one_work+0x662/0x688
[ 5779.795840] ---[ end trace 8b6353b0f2821ec9 ]---
[ 5779.795844] XXX: worker->flags=0x1 pool->flags=0x0 cpu=4 pool->cpu=5(1) rescue_wq= (null)
[ 5779.795848] XXX: last_unbind=-44 last_rebind=0 last_rebound_clear=0 nr_exected_after_rebound_clear=0
[ 5779.795852] XXX: sleep=-39 wakeup=0
[ 5779.795855] XXX: cpus_allowed=5
[ 5779.795857] XXX: cpus_allowed_after_rebinding=5
[ 5779.795861] XXX: after schedule(), cpu=4
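
For anyone reading the debug lines by script, here is a minimal sketch that pulls the flag and CPU fields out of the first XXX line and checks whether the worker ran on a different CPU than the one its pool belongs to. The field names are taken from the output above; the function names and the dictionary keys are hypothetical, and the sketch does not try to reproduce the kernel's own warning condition.

```python
import re

# First XXX line from the log above, copied verbatim.
SAMPLE = ("[ 5779.795844] XXX: worker->flags=0x1 pool->flags=0x0 "
          "cpu=4 pool->cpu=5(1) rescue_wq= (null)")

# Matches the four leading fields of the XXX line.
_FIELDS = re.compile(
    r"worker->flags=(0x[0-9a-fA-F]+) pool->flags=(0x[0-9a-fA-F]+) "
    r"cpu=(\d+) pool->cpu=(\d+)"
)


def parse_debug_line(line):
    """Return the flag and CPU fields as integers, or None if absent."""
    m = _FIELDS.search(line)
    if m is None:
        return None
    return {
        "worker_flags": int(m.group(1), 16),
        "pool_flags": int(m.group(2), 16),
        "cpu": int(m.group(3)),
        "pool_cpu": int(m.group(4)),
    }


def on_wrong_cpu(fields):
    """True when the worker's current CPU differs from its pool's CPU."""
    return fields["cpu"] != fields["pool_cpu"]
```

Applied to the line above, this reports cpu=4 against pool->cpu=5, which matches the "CPU: 4 ... Comm: kworker/5:1" header of the warning.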

You had asked about reproducing this. This is on the S390 platform; I'm not sure whether that makes any difference.

The workload is:
2 processes onlining random cpus in a tight loop by using 'echo 1 > /sys/bus/cpu.../online'
2 processes offlining random cpus in a tight loop by using 'echo 0 > /sys/bus/cpu.../online'
Otherwise the system is fairly idle. Load average: 5.82, 6.27, 6.27
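
The workload above can be sketched roughly as follows. This is only an illustration, not the script that was actually run: the helper names are hypothetical, and the sysfs layout `<root>/cpu<N>/online` is an assumption, since the path in the commands above is abbreviated. Write errors are ignored here, on the assumption that some writes will be rejected while CPUs are changing state underneath the loop.

```python
import multiprocessing
import os
import random


def cpu_online_path(sysfs_root, cpu):
    # Assumed layout: <sysfs_root>/cpu<N>/online. The path in the
    # message is abbreviated, so the real location may differ.
    return os.path.join(sysfs_root, "cpu%d" % cpu, "online")


def toggle_random_cpu(sysfs_root, cpus, value, rng=random):
    """Write value (1 = online, 0 = offline) for one randomly chosen CPU."""
    cpu = rng.choice(cpus)
    try:
        with open(cpu_online_path(sysfs_root, cpu), "w") as f:
            f.write("%d\n" % value)
    except OSError:
        # A rejected or failed write is ignored; the loop just moves on.
        pass
    return cpu


def hotplug_loop(sysfs_root, cpus, value, iterations=None):
    """Tight loop of random writes; iterations=None runs until killed."""
    rng = random.Random()  # per-process generator, seeded by the OS
    count = 0
    while iterations is None or count < iterations:
        toggle_random_cpu(sysfs_root, cpus, value, rng)
        count += 1


def run_workload(sysfs_root, cpus):
    """Start 2 onlining and 2 offlining processes and wait for them."""
    procs = [
        multiprocessing.Process(target=hotplug_loop,
                                args=(sysfs_root, cpus, value))
        for value in (1, 1, 0, 0)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

On a 10-CPU machine this would be started as root with something like `run_workload("/sys/devices/system/cpu", list(range(10)))`, where both the root directory and the CPU numbering are assumptions about the test system.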

The machine has 10 processors.
The warning message sometimes hits within a few minutes of starting the workload. Other times it takes several hours.

Please let me know if you have further questions.

--
-- Jason J. Herne (jjherne@xxxxxxxxxxxxxxxxxx)
