Re: [PATCH] CIFS: fix oplock break deadlocks

From: Steve French
Date: Wed May 03 2017 - 12:18:43 EST


merged into cifs-2.6.git for-next

cc: stabled

On Wed, May 3, 2017 at 11:04 AM, Steve French <smfrench@xxxxxxxxx> wrote:
> Thoughts about whether cc: stable?
>
> On Wed, May 3, 2017 at 10:54 AM, Rabin Vincent <rabin.vincent@xxxxxxxx> wrote:
>> From: Rabin Vincent <rabinv@xxxxxxxx>
>>
>> When the final cifsFileInfo_put() is called from cifsiod and an oplock
>> break work is queued, lockdep complains loudly:
>>
>> =============================================
>> [ INFO: possible recursive locking detected ]
>> 4.11.0+ #21 Not tainted
>> ---------------------------------------------
>> kworker/0:2/78 is trying to acquire lock:
>> ("cifsiod"){++++.+}, at: flush_work+0x215/0x350
>>
>> but task is already holding lock:
>> ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock("cifsiod");
>> lock("cifsiod");
>>
>> *** DEADLOCK ***
>>
>> May be due to missing lock nesting notation
>>
>> 2 locks held by kworker/0:2/78:
>> #0: ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
>> #1: ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0
>>
>> stack backtrace:
>> CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21
>> Workqueue: cifsiod cifs_writev_complete
>> Call Trace:
>> dump_stack+0x85/0xc2
>> __lock_acquire+0x17dd/0x2260
>> ? match_held_lock+0x20/0x2b0
>> ? trace_hardirqs_off_caller+0x86/0x130
>> ? mark_lock+0xa6/0x920
>> lock_acquire+0xcc/0x260
>> ? lock_acquire+0xcc/0x260
>> ? flush_work+0x215/0x350
>> flush_work+0x236/0x350
>> ? flush_work+0x215/0x350
>> ? destroy_worker+0x170/0x170
>> __cancel_work_timer+0x17d/0x210
>> ? ___preempt_schedule+0x16/0x18
>> cancel_work_sync+0x10/0x20
>> cifsFileInfo_put+0x338/0x7f0
>> cifs_writedata_release+0x2a/0x40
>> ? cifs_writedata_release+0x2a/0x40
>> cifs_writev_complete+0x29d/0x850
>> ? preempt_count_sub+0x18/0xd0
>> process_one_work+0x304/0x8e0
>> worker_thread+0x9b/0x6a0
>> kthread+0x1b2/0x200
>> ? process_one_work+0x8e0/0x8e0
>> ? kthread_create_on_node+0x40/0x40
>> ret_from_fork+0x31/0x40
>>
>> This is a real warning. Since the oplock is queued on the same
>> workqueue this can deadlock if there is only one worker thread active
>> for the workqueue (which will be the case during memory pressure when
>> the rescuer thread is handling it).
>>
>> Furthermore, there is at least one other kind of hang possible due to
>> the oplock break handling if there is only worker. (This can be
>> reproduced without introducing memory pressure by having passing 1 for
>> the max_active parameter of cifsiod.) cifs_oplock_break() can wait
>> indefintely in the filemap_fdatawait() while the cifs_writev_complete()
>> work is blocked:
>>
>> sysrq: SysRq : Show Blocked State
>> task PC stack pid father
>> kworker/0:1 D 0 16 2 0x00000000
>> Workqueue: cifsiod cifs_oplock_break
>> Call Trace:
>> __schedule+0x562/0xf40
>> ? mark_held_locks+0x4a/0xb0
>> schedule+0x57/0xe0
>> io_schedule+0x21/0x50
>> wait_on_page_bit+0x143/0x190
>> ? add_to_page_cache_lru+0x150/0x150
>> __filemap_fdatawait_range+0x134/0x190
>> ? do_writepages+0x51/0x70
>> filemap_fdatawait_range+0x14/0x30
>> filemap_fdatawait+0x3b/0x40
>> cifs_oplock_break+0x651/0x710
>> ? preempt_count_sub+0x18/0xd0
>> process_one_work+0x304/0x8e0
>> worker_thread+0x9b/0x6a0
>> kthread+0x1b2/0x200
>> ? process_one_work+0x8e0/0x8e0
>> ? kthread_create_on_node+0x40/0x40
>> ret_from_fork+0x31/0x40
>> dd D 0 683 171 0x00000000
>> Call Trace:
>> __schedule+0x562/0xf40
>> ? mark_held_locks+0x29/0xb0
>> schedule+0x57/0xe0
>> io_schedule+0x21/0x50
>> wait_on_page_bit+0x143/0x190
>> ? add_to_page_cache_lru+0x150/0x150
>> __filemap_fdatawait_range+0x134/0x190
>> ? do_writepages+0x51/0x70
>> filemap_fdatawait_range+0x14/0x30
>> filemap_fdatawait+0x3b/0x40
>> filemap_write_and_wait+0x4e/0x70
>> cifs_flush+0x6a/0xb0
>> filp_close+0x52/0xa0
>> __close_fd+0xdc/0x150
>> SyS_close+0x33/0x60
>> entry_SYSCALL_64_fastpath+0x1f/0xbe
>>
>> Showing all locks held in the system:
>> 2 locks held by kworker/0:1/16:
>> #0: ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0
>> #1: ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0
>>
>> Showing busy workqueues and worker pools:
>> workqueue cifsiod: flags=0xc
>> pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
>> in-flight: 16:cifs_oplock_break
>> delayed: cifs_writev_complete, cifs_echo_request
>> pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3
>>
>> Fix these problems by creating a a new workqueue (with a rescuer) for
>> the oplock break work.
>>
>> Signed-off-by: Rabin Vincent <rabinv@xxxxxxxx>
>> ---
>> fs/cifs/cifsfs.c | 15 +++++++++++++--
>> fs/cifs/cifsglob.h | 1 +
>> fs/cifs/misc.c | 2 +-
>> fs/cifs/smb2misc.c | 5 +++--
>> 4 files changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
>> index 34fee9f..5c52c8f 100644
>> --- a/fs/cifs/cifsfs.c
>> +++ b/fs/cifs/cifsfs.c
>> @@ -87,6 +87,7 @@ extern mempool_t *cifs_req_poolp;
>> extern mempool_t *cifs_mid_poolp;
>>
>> struct workqueue_struct *cifsiod_wq;
>> +struct workqueue_struct *cifsoplockd_wq;
>> __u32 cifs_lock_secret;
>>
>> /*
>> @@ -1374,9 +1375,16 @@ init_cifs(void)
>> goto out_clean_proc;
>> }
>>
>> + cifsoplockd_wq = alloc_workqueue("cifsoplockd",
>> + WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
>> + if (!cifsoplockd_wq) {
>> + rc = -ENOMEM;
>> + goto out_destroy_cifsiod_wq;
>> + }
>> +
>> rc = cifs_fscache_register();
>> if (rc)
>> - goto out_destroy_wq;
>> + goto out_destroy_cifsoplockd_wq;
>>
>> rc = cifs_init_inodecache();
>> if (rc)
>> @@ -1424,7 +1432,9 @@ init_cifs(void)
>> cifs_destroy_inodecache();
>> out_unreg_fscache:
>> cifs_fscache_unregister();
>> -out_destroy_wq:
>> +out_destroy_cifsoplockd_wq:
>> + destroy_workqueue(cifsoplockd_wq);
>> +out_destroy_cifsiod_wq:
>> destroy_workqueue(cifsiod_wq);
>> out_clean_proc:
>> cifs_proc_clean();
>> @@ -1447,6 +1457,7 @@ exit_cifs(void)
>> cifs_destroy_mids();
>> cifs_destroy_inodecache();
>> cifs_fscache_unregister();
>> + destroy_workqueue(cifsoplockd_wq);
>> destroy_workqueue(cifsiod_wq);
>> cifs_proc_clean();
>> }
>> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
>> index 37f5a41..17f0e73 100644
>> --- a/fs/cifs/cifsglob.h
>> +++ b/fs/cifs/cifsglob.h
>> @@ -1683,6 +1683,7 @@ void cifs_oplock_break(struct work_struct *work);
>>
>> extern const struct slow_work_ops cifs_oplock_break_ops;
>> extern struct workqueue_struct *cifsiod_wq;
>> +extern struct workqueue_struct *cifsoplockd_wq;
>> extern __u32 cifs_lock_secret;
>>
>> extern mempool_t *cifs_mid_poolp;
>> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
>> index d3fb115..b578c6d 100644
>> --- a/fs/cifs/misc.c
>> +++ b/fs/cifs/misc.c
>> @@ -492,7 +492,7 @@ is_valid_oplock_break(char *buffer, struct TCP_Server_Info *srv)
>> CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2,
>> &pCifsInode->flags);
>>
>> - queue_work(cifsiod_wq,
>> + queue_work(cifsoplockd_wq,
>> &netfile->oplock_break);
>> netfile->oplock_break_cancelled = false;
>>
>> diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
>> index 1a04b3a..7b08a14 100644
>> --- a/fs/cifs/smb2misc.c
>> +++ b/fs/cifs/smb2misc.c
>> @@ -499,7 +499,7 @@ smb2_tcon_has_lease(struct cifs_tcon *tcon, struct smb2_lease_break *rsp,
>> else
>> cfile->oplock_break_cancelled = true;
>>
>> - queue_work(cifsiod_wq, &cfile->oplock_break);
>> + queue_work(cifsoplockd_wq, &cfile->oplock_break);
>> kfree(lw);
>> return true;
>> }
>> @@ -643,7 +643,8 @@ smb2_is_valid_oplock_break(char *buffer, struct TCP_Server_Info *server)
>> CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2,
>> &cinode->flags);
>> spin_unlock(&cfile->file_info_lock);
>> - queue_work(cifsiod_wq, &cfile->oplock_break);
>> + queue_work(cifsoplockd_wq,
>> + &cfile->oplock_break);
>>
>> spin_unlock(&tcon->open_file_lock);
>> spin_unlock(&cifs_tcp_ses_lock);
>> --
>> 2.1.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Thanks,
>
> Steve



--
Thanks,

Steve