Re: [PATCH] scsi: lpfc: Move work items to a stack list

From: Daniel Wagner
Date: Tue Nov 19 2019 - 08:32:58 EST


On Tue, Nov 19, 2019 at 02:28:54PM +0100, Daniel Wagner wrote:
> On Tue, Nov 12, 2019 at 10:15:00PM -0500, Martin K. Petersen wrote:
> > > While trying to understand what's going on in the Oops below I figured
> > > that it could be the result of the invalid pointer access. The patch
> > > still needs testing by our customer but independent of this I think the
> > > patch fixes a real bug.
>
> I was able to reproduce the same stack trace with this patch
> applied... That is obviously bad. The good news, I have access to this
> machine, so maybe I'll be able to figure out what's the root cause of this
> crash.

Forgot to append the KASAN trace which points at the same place. Don't
know if this is the same thing or not.


[ 329.217804] ==================================================================
[ 329.280494] BUG: KASAN: slab-out-of-bounds in lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc]
[ 329.351654] Read of size 8 at addr ffff88984f160000 by task kworker/77:1/488
[ 329.396559] nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 329.412326]
[ 329.412335] CPU: 77 PID: 488 Comm: kworker/77:1 Kdump: loaded Tainted: G E 5.4.0-rc1-default+ #3
[ 329.412338] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 07/21/2019
[ 329.412414] Workqueue: lpfc_wq lpfc_sli4_hba_process_cq [lpfc]
[ 329.428650] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 329.765863] Call Trace:
[ 329.765888] dump_stack+0x71/0xab
[ 329.765967] ? lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc]
[ 329.765981] print_address_description.constprop.6+0x1b/0x2f0
[ 329.912961] ? lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc]
[ 329.913001] ? lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc]
[ 330.009190] __kasan_report+0x14e/0x192
[ 330.009255] ? lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc]
[ 330.009261] kasan_report+0xe/0x20
[ 330.120620] lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc]
[ 330.120660] lpfc_sli4_sp_handle_abort_xri_wcqe.isra.55+0x59/0x280 [lpfc]
[ 330.226013] ? __update_load_avg_cfs_rq+0x244/0x470
[ 330.226052] ? lpfc_sli4_fp_handle_cqe+0x127/0x8e0 [lpfc]
[ 330.226089] lpfc_sli4_fp_handle_cqe+0x127/0x8e0 [lpfc]
[ 330.358896] ? lpfc_sli4_sp_handle_abort_xri_wcqe.isra.55+0x280/0x280 [lpfc]
[ 330.358907] ? __switch_to_asm+0x40/0x70
[ 330.452995] ? __switch_to_asm+0x34/0x70
[ 330.452998] ? __switch_to_asm+0x40/0x70
[ 330.453000] ? __switch_to_asm+0x34/0x70
[ 330.453002] ? __switch_to_asm+0x40/0x70
[ 330.453005] ? __switch_to_asm+0x34/0x70
[ 330.453041] __lpfc_sli4_process_cq+0x1e1/0x470 [lpfc]
[ 330.453078] ? lpfc_sli4_sp_handle_abort_xri_wcqe.isra.55+0x280/0x280 [lpfc]
[ 330.728428] ? __switch_to_asm+0x40/0x70
[ 330.728466] __lpfc_sli4_hba_process_cq+0x88/0x1d0 [lpfc]
[ 330.728503] ? lpfc_sli4_fp_handle_cqe+0x8e0/0x8e0 [lpfc]
[ 330.855605] process_one_work+0x46e/0x7f0
[ 330.855610] worker_thread+0x69/0x6b0
[ 330.855615] ? process_one_work+0x7f0/0x7f0
[ 330.855620] kthread+0x1b3/0x1d0
[ 330.855624] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 330.855627] ret_from_fork+0x35/0x40
[ 330.855631]
[ 330.855634] Allocated by task 5171:
[ 330.855644] save_stack+0x19/0x80
[ 330.855650] __kasan_kmalloc.constprop.9+0xa0/0xd0
[ 331.175452] __kmalloc+0xfb/0x5d0
[ 331.175461] alloc_pipe_info+0xff/0x210
[ 331.175464] create_pipe_files+0x66/0x2e0
[ 331.175467] __do_pipe_flags+0x2c/0x100
[ 331.175470] do_pipe2+0x80/0x130
[ 331.175472] __x64_sys_pipe2+0x2b/0x30
[ 331.175486] do_syscall_64+0x73/0x230
[ 331.395309] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 331.395310]
[ 331.395312] Freed by task 5171:
[ 331.395317] save_stack+0x19/0x80
[ 331.395319] __kasan_slab_free+0x105/0x150
[ 331.395321] kfree+0xa6/0x150
[ 331.395324] free_pipe_info+0x106/0x120
[ 331.395327] pipe_release+0xcb/0xf0
[ 331.395335] __fput+0x11d/0x330
[ 331.395338] task_work_run+0xc6/0xf0
[ 331.395344] exit_to_usermode_loop+0x11d/0x120
[ 331.730019] do_syscall_64+0x203/0x230
[ 331.730023] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 331.730023]
[ 331.730027] The buggy address belongs to the object at ffff88984f160040
[ 331.730027] which belongs to the cache kmalloc-1k of size 1024
[ 331.730030] The buggy address is located 64 bytes to the left of
[ 331.730030] 1024-byte region [ffff88984f160040, ffff88984f160440)
[ 331.730031] The buggy address belongs to the page:
[ 331.730036] page:ffffea00613c5800 refcount:1 mapcount:0 mapping:ffff888107c00700 index:0x0 compound_mapcount: 0
[ 331.730042] flags: 0x97ffffc0010200(slab|head)
[ 331.730050] raw: 0097ffffc0010200 ffffea00613c4608 ffffea00613c7f88 ffff888107c00700
[ 332.266508] raw: 0000000000000000 ffff88984f160040 0000000100000007 0000000000000000
[ 332.266509] page dumped because: kasan: bad access detected
[ 332.266510]
[ 332.266511] Memory state around the buggy address:
[ 332.266516] ffff88984f15ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 332.266518] ffff88984f15ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 332.266521] >ffff88984f160000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[ 332.266522] ^
[ 332.266525] ffff88984f160080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 332.266527] ffff88984f160100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 332.266528] ==================================================================

The kernel I used to create the above KASAN trace is mkp/queue (clean
without my patch), c0bf9a264e10 ("scsi: iscsi: Don't send data to
unbound connection")
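
For anyone reading the report, here is a small sketch that re-derives
the "64 bytes to the left" line and the position of the object in the
shadow dump. The addresses and the shadow row are copied from the trace
above. The 8-bytes-per-shadow-byte granule and the meaning of fc/fb
(kmalloc redzone / freed object) are what I understand generic KASAN to
use on this kernel, so treat them as assumptions. The helper names are
made up for illustration and are not kernel functions.

```python
# Values copied from the KASAN report above.
BUGGY_ADDR = 0xFFFF88984F160000
OBJ_START = 0xFFFF88984F160040
OBJ_END = 0xFFFF88984F160440

# Assumption: generic KASAN, one shadow byte per 8 bytes of memory.
KASAN_GRANULE = 8

# Assumption: shadow values as defined for generic KASAN in this kernel.
KMALLOC_REDZONE = 0xFC
KMALLOC_FREE = 0xFB

# Shadow row printed for ffff88984f160000 (the line marked with '>').
ROW_BASE = 0xFFFF88984F160000
ROW = [0xFC] * 8 + [0xFB] * 8


def describe(addr, start, end):
    """Hypothetical helper: phrase the offset the way the report does."""
    size = end - start
    if addr < start:
        return f"{start - addr} bytes to the left of {size}-byte region"
    if addr >= end:
        return f"{addr - end} bytes to the right of {size}-byte region"
    return f"{addr - start} bytes inside of {size}-byte region"


def shadow_index(addr, row_base=ROW_BASE):
    """Hypothetical helper: index of addr's shadow byte within the row."""
    return (addr - row_base) // KASAN_GRANULE


if __name__ == "__main__":
    print(describe(BUGGY_ADDR, OBJ_START, OBJ_END))
    print("shadow byte at buggy address: %#x" % ROW[shadow_index(BUGGY_ADDR)])
    print("shadow byte at object start:  %#x" % ROW[shadow_index(OBJ_START)])
```

If those assumptions hold, the read lands in the redzone in front of
the freed pipe buffer and not inside it. The "Allocated by" and "Freed
by" stacks would then only describe the nearest slab object, which may
well be unrelated to lpfc.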