Re: [PATCH 3.4 61/92] ocfs2/dlm: fix deadlock when dispatch assert master
From: Zefan Li
Date: Mon Apr 18 2016 - 20:18:51 EST
On 2016/4/18 19:29, Joseph Qi wrote:
> Hi Zefan,
> In dlm_master_request_handler, it has missed the following change :
> - dlm_put(dlm);
> + if (!dispatched)
> + dlm_put(dlm);
Oops..I guess I forgot to refresh the patch after adjusting the context.
Thanks for the review!
>
> Thanks,
> Joseph
>
> On 2016/4/18 18:46, lizf@xxxxxxxxxx wrote:
>> From: Joseph Qi <joseph.qi@xxxxxxxxxx>
>>
>> 3.4.112-rc1 review patch. If anyone has any objections, please let me know.
>>
>> ------------------
>>
>>
>> commit 012572d4fc2e4ddd5c8ec8614d51414ec6cae02a upstream.
>>
>> The order of the following three spinlocks should be:
>> dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock
>>
>> But dlm_dispatch_assert_master() is called while holding
>> dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
>> dlm_grab() which will take dlm_domain_lock.
>>
>> Once another thread (for example, dlm_query_join_handler) has already
>> taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock
>> happens.
>>
>> Signed-off-by: Joseph Qi <joseph.qi@xxxxxxxxxx>
>> Cc: Joel Becker <jlbec@xxxxxxxxxxxx>
>> Cc: Mark Fasheh <mfasheh@xxxxxxxx>
>> Cc: "Junxiao Bi" <junxiao.bi@xxxxxxxxxx>
>> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> [lizf: Backported to 3.4: adjust context]
>> Signed-off-by: Zefan Li <lizefan@xxxxxxxxxx>
>> ---
>> fs/ocfs2/dlm/dlmmaster.c | 4 +++-
>> fs/ocfs2/dlm/dlmrecovery.c | 6 +++++-
>> 2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
>> index 7ba6ac1..751efa8 100644
>> --- a/fs/ocfs2/dlm/dlmmaster.c
>> +++ b/fs/ocfs2/dlm/dlmmaster.c
>> @@ -1411,6 +1411,7 @@ int dlm_master_request_handler(struct o2net_msg *msg, u32 len, void *data,
>> int found, ret;
>> int set_maybe;
>> int dispatch_assert = 0;
>> + int dispatched = 0;
>>
>> if (!dlm_grab(dlm))
>> return DLM_MASTER_RESP_NO;
>> @@ -1617,6 +1618,8 @@ send_response:
>> mlog(ML_ERROR, "failed to dispatch assert master work\n");
>> response = DLM_MASTER_RESP_ERROR;
>> dlm_lockres_put(res);
>> + } else {
>> + dispatched = 1;
>> }
>> } else {
>> if (res)
>> @@ -2041,7 +2044,6 @@ int dlm_dispatch_assert_master(struct dlm_ctxt *dlm,
>>
>>
>> /* queue up work for dlm_assert_master_worker */
>> - dlm_grab(dlm); /* get an extra ref for the work item */
>> dlm_init_work_item(dlm, item, dlm_assert_master_worker, NULL);
>> item->u.am.lockres = res; /* already have a ref */
>> /* can optionally ignore node numbers higher than this node */
>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
>> index d15b071..0e5013e 100644
>> --- a/fs/ocfs2/dlm/dlmrecovery.c
>> +++ b/fs/ocfs2/dlm/dlmrecovery.c
>> @@ -1689,6 +1689,7 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
>> unsigned int hash;
>> int master = DLM_LOCK_RES_OWNER_UNKNOWN;
>> u32 flags = DLM_ASSERT_MASTER_REQUERY;
>> + int dispatched = 0;
>>
>> if (!dlm_grab(dlm)) {
>> /* since the domain has gone away on this
>> @@ -1710,6 +1711,8 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
>> mlog_errno(-ENOMEM);
>> /* retry!? */
>> BUG();
>> + } else {
>> + dispatched = 1;
>> }
>> } else /* put.. incase we are not the master */
>> dlm_lockres_put(res);
>> @@ -1717,7 +1720,8 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
>> }
>> spin_unlock(&dlm->spinlock);
>>
>> - dlm_put(dlm);
>> + if (!dispatched)
>> + dlm_put(dlm);
>> return master;
>> }
>>
>>
>
>
> .
>