Re: [PATCH kernel] commit 4fbdf9cb ("lpfc: Fix for lun discovery issue with saturn adapter.")

From: Alexey Kardashevskiy
Date: Tue Apr 28 2015 - 06:36:21 EST


On 04/28/2015 07:18 PM, Sebastian Herbszt wrote:
> Alexey Kardashevskiy wrote:
>> This reverts 4fbdf9cb as it breaks LPFC on a POWER7 machine, big endian kernel.
>>
>> This is the hardware used for verification:
>> 0005:01:00.0 Fibre Channel [0c04]: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter [10df:f100] (rev 03)
>> 0005:01:00.1 Fibre Channel [0c04]: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter [10df:f100] (rev 03)
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
>
> This issue is not specific to POWER7. I hit it on x86 [1] and James
> promised to look at it.
>
> [1] http://marc.info/?l=linux-scsi&m=142938432414173
>
> Sebastian

Well, I hope so. I just wanted to be more specific, and the fault looks quite different (and much cooler! :) ) on my hardware (it actually enters an infinite loop of oopses):



Welcome to Fedora 20 (Heisenbug)!

INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched self-detected stall on CPU


1: (2100 ticks this GP) idle=981/140000000000001/0 softirq=234/234 fqs=2083
2: (2100 ticks this GP) idle=c3d/140000000000001/0 softirq=259/259 fqs=2083

(t=2100 jiffies g=-7 c=-8 q=11820)
(t=2100 jiffies g=-7 c=-8 q=11820)
Task dump for CPU 0:
kworker/u97:0 R running task 8192 7 2 0x00000804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c000000ffa29ef80] [c000000ffa29f060] 0xc000000ffa29f060 (unreliable)
Task dump for CPU 1:
kworker/u97:2 R running task 10304 1636 2 0x00000804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c000000ff2fd2f80] [c000000ff2fd3060] 0xc000000ff2fd3060 (unreliable)
Task dump for CPU 2:
kworker/u97:1 R running task 8288 1633 2 0x00000804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c000000ff2f92eb0] [c0000000000cf610] .sched_show_task+0xf0/0x180 (unreliable)
[c000000ff2f92f30] [c0000000001041d8] .rcu_dump_cpu_stacks+0xd8/0x150
[c000000ff2f92fd0] [c000000000108794] .rcu_check_callbacks+0x674/0x990
[c000000ff2f93110] [c00000000010e994] .update_process_times+0x44/0x90
[c000000ff2f93190] [c0000000001223f0] .tick_sched_handle.isra.16+0x20/0xa0
[c000000ff2f93210] [c0000000001224cc] .tick_sched_timer+0x5c/0xb0
[c000000ff2f932b0] [c00000000010f108] .__run_hrtimer+0x98/0x260
[c000000ff2f93350] [c00000000010fff8] .hrtimer_interrupt+0x138/0x2f0
[c000000ff2f93460] [c00000000001be1c] .__timer_interrupt+0x8c/0x230
[c000000ff2f93500] [c00000000001c488] .timer_interrupt+0x98/0xd0
[c000000ff2f93580] [c0000000000025d0] decrementer_common+0x150/0x180
--- interrupt: 901 at .string_get_size+0x120/0x250
LR = .sd_revalidate_disk+0x57c/0x1c10
[c000000ff2f93870] [c00000000048f84c] .string_get_size+0x18c/0x250 (unreliable)
[c000000ff2f93940] [c0000000005e7c1c] .sd_revalidate_disk+0x57c/0x1c10
[c000000ff2f93a70] [c0000000005e951c] .sd_probe_async+0xac/0x230
[c000000ff2f93b00] [c0000000000c28ec] .async_run_entry_fn+0x6c/0x180
[c000000ff2f93ba0] [c0000000000b7b78] .process_one_work+0x1a8/0x4a0
[c000000ff2f93c40] [c0000000000b7ff0] .worker_thread+0x180/0x5a0
[c000000ff2f93d30] [c0000000000bee08] .kthread+0x108/0x130
[c000000ff2f93e30] [c000000000009590] .ret_from_kernel_thread+0x58/0xc8
Task dump for CPU 0:
kworker/u97:0 R running task 8192 7 2 0x00000804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c000000ffa29ef80] [c000000ffa29f060] 0xc000000ffa29f060 (unreliable)
Task dump for CPU 1:
kworker/u97:2 R running task 9488 1636 2 0x00000804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c000000ff2fd2eb0] [c0000000000cf610] .sched_show_task+0xf0/0x180 (unreliable)
[c000000ff2fd2f30] [c0000000001041d8] .rcu_dump_cpu_stacks+0xd8/0x150
[c000000ff2fd2fd0] [c000000000108794] .rcu_check_callbacks+0x674/0x990
[c000000ff2fd3110] [c00000000010e994] .update_process_times+0x44/0x90
[c000000ff2fd3190] [c0000000001223f0] .tick_sched_handle.isra.16+0x20/0xa0
[c000000ff2fd3210] [c0000000001224cc] .tick_sched_timer+0x5c/0xb0
[c000000ff2fd32b0] [c00000000010f108] .__run_hrtimer+0x98/0x260
[c000000ff2fd3350] [c00000000010fff8] .hrtimer_interrupt+0x138/0x2f0
[c000000ff2fd3460] [c00000000001be1c] .__timer_interrupt+0x8c/0x230
[c000000ff2fd3500] [c00000000001c488] .timer_interrupt+0x98/0xd0
[c000000ff2fd3580] [c0000000000025d0] decrementer_common+0x150/0x180
--- interrupt: 901 at .string_get_size+0x110/0x250
LR = .sd_revalidate_disk+0x57c/0x1c10
[c000000ff2fd3870] [c00000000048f84c] .string_get_size+0x18c/0x250 (unreliable)
[c000000ff2fd3940] [c0000000005e7c1c] .sd_revalidate_disk+0x57c/0x1c10
[c000000ff2fd3a70] [c0000000005e951c] .sd_probe_async+0xac/0x230
[c000000ff2fd3b00] [c0000000000c28ec] .async_run_entry_fn+0x6c/0x180
[c000000ff2fd3ba0] [c0000000000b7b78] .process_one_work+0x1a8/0x4a0
[c000000ff2fd3c40] [c0000000000b7ff0] .worker_thread+0x180/0x5a0
[c000000ff2fd3d30] [c0000000000bee08] .kthread+0x108/0x130
[c000000ff2fd3e30] [c000000000009590] .ret_from_kernel_thread+0x58/0xc8
Task dump for CPU 2:
kworker/u97:1 R running task 8288 1633 2 0x00000804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c000000ff2f92f80] [c000000ff2f93060] 0xc000000ff2f93060 (unreliable)

0: (2098 ticks this GP) idle=155/140000000000001/0 softirq=477/477 fqs=2083
(t=2100 jiffies g=-7 c=-8 q=11820)
Task dump for CPU 0:
kworker/u97:0 R running task 8192 7 2 0x00000804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c000000ffa29eeb0] [c0000000000cf610] .sched_show_task+0xf0/0x180 (unreliable)
[c000000ffa29ef30] [c0000000001041d8] .rcu_dump_cpu_stacks+0xd8/0x150
[c000000ffa29efd0] [c000000000108794] .rcu_check_callbacks+0x674/0x990
[c000000ffa29f110] [c00000000010e994] .update_process_times+0x44/0x90
[c000000ffa29f190] [c0000000001223f0] .tick_sched_handle.isra.16+0x20/0xa0
[c000000ffa29f210] [c0000000001224cc] .tick_sched_timer+0x5c/0xb0
[c000000ffa29f2b0] [c00000000010f108] .__run_hrtimer+0x98/0x260
[c000000ffa29f350] [c00000000010fff8] .hrtimer_interrupt+0x138/0x2f0
[c000000ffa29f460] [c00000000001be1c] .__timer_interrupt+0x8c/0x230
[c000000ffa29f500] [c00000000001c488] .timer_interrupt+0x98/0xd0
[c000000ffa29f580] [c0000000000025d0] decrementer_common+0x150/0x180
--- interrupt: 901 at .string_get_size+0x118/0x250
LR = .sd_revalidate_disk+0x57c/0x1c10
[c000000ffa29f870] [c00000000048f84c] .string_get_size+0x18c/0x250 (unreliable)
[c000000ffa29f940] [c0000000005e7c1c] .sd_revalidate_disk+0x57c/0x1c10
[c000000ffa29fa70] [c0000000005e951c] .sd_probe_async+0xac/0x230
[c000000ffa29fb00] [c0000000000c28ec] .async_run_entry_fn+0x6c/0x180
[c000000ffa29fba0] [c0000000000b7b78] .process_one_work+0x1a8/0x4a0
[c000000ffa29fc40] [c0000000000b7ff0] .worker_thread+0x180/0x5a0
[c000000ffa29fd30] [c0000000000bee08] .kthread+0x108/0x130
[c000000ffa29fe30] [c000000000009590] .ret_from_kernel_thread+0x58/0xc8
Task dump for CPU 1:
kworker/u97:2 R running task 9488 1636 2 0x00000804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c000000ff2fd2f80] [c000000ff2fd3060] 0xc000000ff2fd3060 (unreliable)
Task dump for CPU 2:
kworker/u97:1 R running task 8288 1633 2 0x00000804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c000000ff2f92f80] [c000000ff2f93060] 0xc000000ff2f93060 (unreliable)
NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [kworker/u97:2:1636]
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u97:0:7]
NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u97:1:1633]
Modules linked in: autofs4 lpfc
CPU: 0 PID: 7 Comm: kworker/u97:0 Not tainted 4.1.0-rc1-be-aik #470
Workqueue: events_unbound .async_run_entry_fn
task: c000000ff3588f00 ti: c000000ffa29c000 task.ti: c000000ffa29c000
NIP: c00000000048f7e0 LR: c0000000005e7c1c CTR: 0000000000000000
REGS: c000000ffa29f5f0 TRAP: 0901 Not tainted (4.1.0-rc1-be-aik)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 48008028 XER: 00000000
CFAR: c00000000048f7e8 SOFTE: 1
GPR00: c0000000005e7c1c c000000ffa29f870 c000000000e8c5a8 0000000000000000
GPR04: 0000000000000200 0000000000000000 0000000000000200 000000000000000a
GPR08: 0000000000000000 00000000000003e8 0000000000000000 000000002eb72fa3
GPR12: 0000000028008028 c00000000fdc0000
NIP [c00000000048f7e0] .string_get_size+0x120/0x250
LR [c0000000005e7c1c] .sd_revalidate_disk+0x57c/0x1c10
Call Trace:
[c000000ffa29f870] [c00000000048f84c] .string_get_size+0x18c/0x250 (unreliable)
[c000000ffa29f940] [c0000000005e7c1c] .sd_revalidate_disk+0x57c/0x1c10
[c000000ffa29fa70] [c0000000005e951c] .sd_probe_async+0xac/0x230
[c000000ffa29fb00] [c0000000000c28ec] .async_run_entry_fn+0x6c/0x180
[c000000ffa29fba0] [c0000000000b7b78] .process_one_work+0x1a8/0x4a0
[c000000ffa29fc40] [c0000000000b7ff0] .worker_thread+0x180/0x5a0
[c000000ffa29fd30] [c0000000000bee08] .kthread+0x108/0x130
[c000000ffa29fe30] [c000000000009590] .ret_from_kernel_thread+0x58/0xc8
Instruction dump:

Modules linked in: autofs4 lpfc
CPU: 2 PID: 1633 Comm: kworker/u97:1 Not tainted 4.1.0-rc1-be-aik #470
Workqueue: events_unbound .async_run_entry_fn
task: c000000ff2f56580 ti: c000000ff2f90000 task.ti: c000000ff2f90000
NIP: c00000000048f7e0 LR: c0000000005e7c1c CTR: 0000000000000000
REGS: c000000ff2f935f0 TRAP: 0901 Not tainted (4.1.0-rc1-be-aik)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 48008028 XER: 00000000
CFAR: c00000000048f7e8 SOFTE: 1
GPR00: c0000000005e7c1c c000000ff2f93870 c000000000e8c5a8 0000000000000000
GPR04: 0000000000000200 0000000000000000 0000000000000200 000000000000000a
GPR08: 0000000000000000 00000000000003e8 0000000000000000 ffffffffe5dd553e
GPR12: 0000000028008028 c00000000fdc0900
NIP [c00000000048f7e0] .string_get_size+0x120/0x250
LR [c0000000005e7c1c] .sd_revalidate_disk+0x57c/0x1c10
Call Trace:
[c000000ff2f93870] [c00000000048f84c] .string_get_size+0x18c/0x250 (unreliable)
[c000000ff2f93940] [c0000000005e7c1c] .sd_revalidate_disk+0x57c/0x1c10
[c000000ff2f93a70] [c0000000005e951c] .sd_probe_async+0xac/0x230
[c000000ff2f93b00] [c0000000000c28ec] .async_run_entry_fn+0x6c/0x180
[c000000ff2f93ba0] [c0000000000b7b78] .process_one_work+0x1a8/0x4a0
[c000000ff2f93c40] [c0000000000b7ff0] .worker_thread+0x180/0x5a0
[c000000ff2f93d30] [c0000000000bee08] .kthread+0x108/0x130
[c000000ff2f93e30] [c000000000009590] .ret_from_kernel_thread+0x58/0xc8
Instruction dump:


...
[snip]
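
For anyone skimming the traces above: every stuck worker sits in .string_get_size, called from .sd_revalidate_disk via .sd_probe_async. Below is a minimal sketch, not part of the patch, for pulling the symbolized frames out of a saved copy of such a log and counting them. The names parse_frames and count_symbols are made up for illustration, and the regular expression only assumes the "[sp] [ip] .symbol+0xoff/0xsize" layout seen in this dump.

```python
import re
from collections import Counter

# Matches symbolized frames such as:
#   [c000000ff2f93940] [c0000000005e7c1c] .sd_revalidate_disk+0x57c/0x1c10
# Frames with no symbol (a bare 0x... address) are deliberately skipped.
FRAME_RE = re.compile(
    r"\[(?P<sp>[0-9a-f]{16})\]\s+\[(?P<ip>[0-9a-f]{16})\]\s+"
    r"\.?(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return (symbol, offset) tuples for every symbolized frame in the log."""
    return [
        (m.group("sym"), int(m.group("off"), 16))
        for m in FRAME_RE.finditer(log_text)
    ]


def count_symbols(log_text):
    """Count how often each symbol appears across all call traces."""
    return Counter(sym for sym, _ in parse_frames(log_text))
```

Feeding the whole console log to count_symbols() and looking at the most common entries is a quick way to confirm that all CPUs are spinning in the same call path.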




--
Alexey