Re: [PATCH] CIFS: fix oplock break deadlocks
From: Steve French
Date: Wed May 03 2017 - 12:05:31 EST
Thoughts about whether cc: stable?
On Wed, May 3, 2017 at 10:54 AM, Rabin Vincent <rabin.vincent@xxxxxxxx> wrote:
> From: Rabin Vincent <rabinv@xxxxxxxx>
>
> When the final cifsFileInfo_put() is called from cifsiod and an oplock
> break work is queued, lockdep complains loudly:
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 4.11.0+ #21 Not tainted
> ---------------------------------------------
> kworker/0:2/78 is trying to acquire lock:
> ("cifsiod"){++++.+}, at: flush_work+0x215/0x350
>
> but task is already holding lock:
> ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock("cifsiod");
> lock("cifsiod");
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 2 locks held by kworker/0:2/78:
> #0: ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
> #1: ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0
>
> stack backtrace:
> CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21
> Workqueue: cifsiod cifs_writev_complete
> Call Trace:
> dump_stack+0x85/0xc2
> __lock_acquire+0x17dd/0x2260
> ? match_held_lock+0x20/0x2b0
> ? trace_hardirqs_off_caller+0x86/0x130
> ? mark_lock+0xa6/0x920
> lock_acquire+0xcc/0x260
> ? lock_acquire+0xcc/0x260
> ? flush_work+0x215/0x350
> flush_work+0x236/0x350
> ? flush_work+0x215/0x350
> ? destroy_worker+0x170/0x170
> __cancel_work_timer+0x17d/0x210
> ? ___preempt_schedule+0x16/0x18
> cancel_work_sync+0x10/0x20
> cifsFileInfo_put+0x338/0x7f0
> cifs_writedata_release+0x2a/0x40
> ? cifs_writedata_release+0x2a/0x40
> cifs_writev_complete+0x29d/0x850
> ? preempt_count_sub+0x18/0xd0
> process_one_work+0x304/0x8e0
> worker_thread+0x9b/0x6a0
> kthread+0x1b2/0x200
> ? process_one_work+0x8e0/0x8e0
> ? kthread_create_on_node+0x40/0x40
> ret_from_fork+0x31/0x40
>
> This is a real warning. Since the oplock is queued on the same
> workqueue this can deadlock if there is only one worker thread active
> for the workqueue (which will be the case during memory pressure when
> the rescuer thread is handling it).
>
> Furthermore, there is at least one other kind of hang possible due to
> the oplock break handling if there is only worker. (This can be
> reproduced without introducing memory pressure by having passing 1 for
> the max_active parameter of cifsiod.) cifs_oplock_break() can wait
> indefintely in the filemap_fdatawait() while the cifs_writev_complete()
> work is blocked:
>
> sysrq: SysRq : Show Blocked State
> task PC stack pid father
> kworker/0:1 D 0 16 2 0x00000000
> Workqueue: cifsiod cifs_oplock_break
> Call Trace:
> __schedule+0x562/0xf40
> ? mark_held_locks+0x4a/0xb0
> schedule+0x57/0xe0
> io_schedule+0x21/0x50
> wait_on_page_bit+0x143/0x190
> ? add_to_page_cache_lru+0x150/0x150
> __filemap_fdatawait_range+0x134/0x190
> ? do_writepages+0x51/0x70
> filemap_fdatawait_range+0x14/0x30
> filemap_fdatawait+0x3b/0x40
> cifs_oplock_break+0x651/0x710
> ? preempt_count_sub+0x18/0xd0
> process_one_work+0x304/0x8e0
> worker_thread+0x9b/0x6a0
> kthread+0x1b2/0x200
> ? process_one_work+0x8e0/0x8e0
> ? kthread_create_on_node+0x40/0x40
> ret_from_fork+0x31/0x40
> dd D 0 683 171 0x00000000
> Call Trace:
> __schedule+0x562/0xf40
> ? mark_held_locks+0x29/0xb0
> schedule+0x57/0xe0
> io_schedule+0x21/0x50
> wait_on_page_bit+0x143/0x190
> ? add_to_page_cache_lru+0x150/0x150
> __filemap_fdatawait_range+0x134/0x190
> ? do_writepages+0x51/0x70
> filemap_fdatawait_range+0x14/0x30
> filemap_fdatawait+0x3b/0x40
> filemap_write_and_wait+0x4e/0x70
> cifs_flush+0x6a/0xb0
> filp_close+0x52/0xa0
> __close_fd+0xdc/0x150
> SyS_close+0x33/0x60
> entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> Showing all locks held in the system:
> 2 locks held by kworker/0:1/16:
> #0: ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0
> #1: ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0
>
> Showing busy workqueues and worker pools:
> workqueue cifsiod: flags=0xc
> pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
> in-flight: 16:cifs_oplock_break
> delayed: cifs_writev_complete, cifs_echo_request
> pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3
>
> Fix these problems by creating a a new workqueue (with a rescuer) for
> the oplock break work.
>
> Signed-off-by: Rabin Vincent <rabinv@xxxxxxxx>
> ---
> fs/cifs/cifsfs.c | 15 +++++++++++++--
> fs/cifs/cifsglob.h | 1 +
> fs/cifs/misc.c | 2 +-
> fs/cifs/smb2misc.c | 5 +++--
> 4 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 34fee9f..5c52c8f 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -87,6 +87,7 @@ extern mempool_t *cifs_req_poolp;
> extern mempool_t *cifs_mid_poolp;
>
> struct workqueue_struct *cifsiod_wq;
> +struct workqueue_struct *cifsoplockd_wq;
> __u32 cifs_lock_secret;
>
> /*
> @@ -1374,9 +1375,16 @@ init_cifs(void)
> goto out_clean_proc;
> }
>
> + cifsoplockd_wq = alloc_workqueue("cifsoplockd",
> + WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
> + if (!cifsoplockd_wq) {
> + rc = -ENOMEM;
> + goto out_destroy_cifsiod_wq;
> + }
> +
> rc = cifs_fscache_register();
> if (rc)
> - goto out_destroy_wq;
> + goto out_destroy_cifsoplockd_wq;
>
> rc = cifs_init_inodecache();
> if (rc)
> @@ -1424,7 +1432,9 @@ init_cifs(void)
> cifs_destroy_inodecache();
> out_unreg_fscache:
> cifs_fscache_unregister();
> -out_destroy_wq:
> +out_destroy_cifsoplockd_wq:
> + destroy_workqueue(cifsoplockd_wq);
> +out_destroy_cifsiod_wq:
> destroy_workqueue(cifsiod_wq);
> out_clean_proc:
> cifs_proc_clean();
> @@ -1447,6 +1457,7 @@ exit_cifs(void)
> cifs_destroy_mids();
> cifs_destroy_inodecache();
> cifs_fscache_unregister();
> + destroy_workqueue(cifsoplockd_wq);
> destroy_workqueue(cifsiod_wq);
> cifs_proc_clean();
> }
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 37f5a41..17f0e73 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1683,6 +1683,7 @@ void cifs_oplock_break(struct work_struct *work);
>
> extern const struct slow_work_ops cifs_oplock_break_ops;
> extern struct workqueue_struct *cifsiod_wq;
> +extern struct workqueue_struct *cifsoplockd_wq;
> extern __u32 cifs_lock_secret;
>
> extern mempool_t *cifs_mid_poolp;
> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
> index d3fb115..b578c6d 100644
> --- a/fs/cifs/misc.c
> +++ b/fs/cifs/misc.c
> @@ -492,7 +492,7 @@ is_valid_oplock_break(char *buffer, struct TCP_Server_Info *srv)
> CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2,
> &pCifsInode->flags);
>
> - queue_work(cifsiod_wq,
> + queue_work(cifsoplockd_wq,
> &netfile->oplock_break);
> netfile->oplock_break_cancelled = false;
>
> diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
> index 1a04b3a..7b08a14 100644
> --- a/fs/cifs/smb2misc.c
> +++ b/fs/cifs/smb2misc.c
> @@ -499,7 +499,7 @@ smb2_tcon_has_lease(struct cifs_tcon *tcon, struct smb2_lease_break *rsp,
> else
> cfile->oplock_break_cancelled = true;
>
> - queue_work(cifsiod_wq, &cfile->oplock_break);
> + queue_work(cifsoplockd_wq, &cfile->oplock_break);
> kfree(lw);
> return true;
> }
> @@ -643,7 +643,8 @@ smb2_is_valid_oplock_break(char *buffer, struct TCP_Server_Info *server)
> CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2,
> &cinode->flags);
> spin_unlock(&cfile->file_info_lock);
> - queue_work(cifsiod_wq, &cfile->oplock_break);
> + queue_work(cifsoplockd_wq,
> + &cfile->oplock_break);
>
> spin_unlock(&tcon->open_file_lock);
> spin_unlock(&cifs_tcp_ses_lock);
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Thanks,
Steve