Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE
From: Gang He
Date: Wed Dec 27 2017 - 21:11:32 EST
Hi Jun,
>>>
> Hi Gang,
>
> Thanks for your explanation. I have just one more question: could
> we use 'ocfs2_inode_lock' instead of 'ocfs2_inode_lock_full' to avoid
> getting -EAGAIN repeatedly?
No. Please see the comment above ocfs2_inode_lock_with_page():
a deadlock is possible between tasks that acquire DLM locks while
holding a page lock and the downconvert thread, which blocks DLM
lock acquisition while it is acquiring page locks.
The OCFS2_LOCK_NONBLOCK flag was introduced as a workaround to
avoid this case, so the first attempt has to stay non-blocking.
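To illustrate the pattern outside the kernel, here is a minimal user-space
sketch. It is not the ocfs2 code: two pthread mutexes stand in for the page
lock and the DLM inode lock, and every name in it (page_lock, inode_lock,
RETRY_PAGE, inode_lock_with_page, demo_retries) is hypothetical. The point
is only that the blocking wait happens after the page lock has been dropped,
so the caller retries once instead of spinning.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical stand-ins for the page lock and the DLM inode lock. */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;

#define RETRY_PAGE 1 /* plays the role of AOP_TRUNCATED_PAGE */

/*
 * Called with page_lock held. Returns 0 with both locks held, or
 * RETRY_PAGE with neither held; the caller then re-takes page_lock
 * and tries again.
 */
static int inode_lock_with_page(void)
{
	int ret = pthread_mutex_trylock(&inode_lock);

	if (ret == EBUSY) {
		/* Never sleep on inode_lock while holding page_lock. */
		pthread_mutex_unlock(&page_lock);
		/* Wait for the current holder, then release at once. */
		if (pthread_mutex_lock(&inode_lock) == 0)
			pthread_mutex_unlock(&inode_lock);
		return RETRY_PAGE;
	}
	return ret;
}

/* Holds inode_lock for a while to force contention. */
static void *holder(void *arg)
{
	sem_t *ready = arg;

	pthread_mutex_lock(&inode_lock);
	sem_post(ready);
	usleep(100 * 1000);
	pthread_mutex_unlock(&inode_lock);
	return NULL;
}

/* Returns how many times the caller had to retry. */
int demo_retries(void)
{
	pthread_t t;
	sem_t ready;
	int retries = 0;

	sem_init(&ready, 0, 0);
	pthread_create(&t, NULL, holder, &ready);
	sem_wait(&ready); /* holder now owns inode_lock */

	pthread_mutex_lock(&page_lock);
	while (inode_lock_with_page() == RETRY_PAGE) {
		retries++;
		pthread_mutex_lock(&page_lock);
	}
	pthread_mutex_unlock(&inode_lock);
	pthread_mutex_unlock(&page_lock);

	pthread_join(t, NULL);
	sem_destroy(&ready);
	return retries;
}
```

In this sketch the caller should need a single retry, because it sleeps
on inode_lock until the holder is done. If the blocking lock/unlock pair
is removed, the same loop busy-retries for the whole time the holder keeps
the lock, which is the user-space analogue of the softlockup.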
Thanks
Gang
>
> thanks,
> Jun
>
> On 2017/12/27 18:37, Gang He wrote:
>> Hi Jun,
>>
>>
>>>>>
>>> Hi Gang,
>>>
>>> Do you mean that too many retries in the loop cost lots of CPU time and
>>> block the page-fault interrupt? We should not add any delay in
>>> ocfs2_fault(), right? And I am still a little confused about why your
>>> method solves this problem.
>> You can see the related code in filemap_fault(): if ocfs2 fails to read
>> a page because it cannot get the inode lock in non-blocking mode, the VFS
>> layer invokes the ocfs2 readpage callback again and again, which leads
>> to a softlockup (see the back trace below).
>> So we should take a blocking lock, which lets the DLM lock be granted to
>> this node and also avoids the CPU loop.
>> Second, based on my testing, the patch also improves efficiency when the
>> same file is modified frequently from multiple nodes, since lock
>> acquisition becomes fairer.
>> In fact, this code was changed by patch 1cce4df04f37 ("ocfs2: do not
>> lock/unlock() inode DLM lock"). Before that patch the code was the same
>> as what is proposed here, so this patch can be considered a revert of
>> that one, except that it adds clearer comments.
>>
>> Thanks
>> Gang
>>
>>
>>>
>>> thanks,
>>> Jun
>>>
>>> On 2017/12/27 17:29, Gang He wrote:
>>>> If we can't get the inode lock immediately in
>>>> ocfs2_inode_lock_with_page() when reading a page, we should not
>>>> return directly, since this leads to a softlockup.
>>>> Instead, take a blocking lock and release it immediately before
>>>> returning. This avoids wasting CPU on repeated retries, makes lock
>>>> acquisition fairer among nodes, and improves efficiency when the
>>>> same file is modified frequently from multiple nodes.
>>>> The softlockup looks like this:
>>>> Kernel panic - not syncing: softlockup: hung tasks
>>>> CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
>>>> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>>> Call Trace:
>>>> <IRQ>
>>>> dump_stack+0x5c/0x82
>>>> panic+0xd5/0x21e
>>>> watchdog_timer_fn+0x208/0x210
>>>> ? watchdog_park_threads+0x70/0x70
>>>> __hrtimer_run_queues+0xcc/0x200
>>>> hrtimer_interrupt+0xa6/0x1f0
>>>> smp_apic_timer_interrupt+0x34/0x50
>>>> apic_timer_interrupt+0x96/0xa0
>>>> </IRQ>
>>>> RIP: 0010:unlock_page+0x17/0x30
>>>> RSP: 0000:ffffaf154080bc88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
>>>> RAX: dead000000000100 RBX: fffff21e009f5300 RCX: 0000000000000004
>>>> RDX: dead0000000000ff RSI: 0000000000000202 RDI: fffff21e009f5300
>>>> RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaf154080bb00
>>>> R10: ffffaf154080bc30 R11: 0000000000000040 R12: ffff993749a39518
>>>> R13: 0000000000000000 R14: fffff21e009f5300 R15: fffff21e009f5300
>>>> ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
>>>> ocfs2_readpage+0x41/0x2d0 [ocfs2]
>>>> ? pagecache_get_page+0x30/0x200
>>>> filemap_fault+0x12b/0x5c0
>>>> ? recalc_sigpending+0x17/0x50
>>>> ? __set_task_blocked+0x28/0x70
>>>> ? __set_current_blocked+0x3d/0x60
>>>> ocfs2_fault+0x29/0xb0 [ocfs2]
>>>> __do_fault+0x1a/0xa0
>>>> __handle_mm_fault+0xbe8/0x1090
>>>> handle_mm_fault+0xaa/0x1f0
>>>> __do_page_fault+0x235/0x4b0
>>>> trace_do_page_fault+0x3c/0x110
>>>> async_page_fault+0x28/0x30
>>>> RIP: 0033:0x7fa75ded638e
>>>> RSP: 002b:00007ffd6657db18 EFLAGS: 00010287
>>>> RAX: 000055c7662fb700 RBX: 0000000000000001 RCX: 000055c7662fb700
>>>> RDX: 0000000000001770 RSI: 00007fa75e909000 RDI: 000055c7662fb700
>>>> RBP: 0000000000000003 R08: 000000000000000e R09: 0000000000000000
>>>> R10: 0000000000000483 R11: 00007fa75ded61b0 R12: 00007fa75e90a770
>>>> R13: 000000000000000e R14: 0000000000001770 R15: 0000000000000000
>>>>
>>>> Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
>>>> Signed-off-by: Gang He <ghe@xxxxxxxx>
>>>> ---
>>>> fs/ocfs2/dlmglue.c | 9 +++++++++
>>>> 1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>>> index 4689940..5193218 100644
>>>> --- a/fs/ocfs2/dlmglue.c
>>>> +++ b/fs/ocfs2/dlmglue.c
>>>> @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>>>  	ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>>>>  	if (ret == -EAGAIN) {
>>>>  		unlock_page(page);
>>>> +		/*
>>>> +		 * If we can't get inode lock immediately, we should not return
>>>> +		 * directly here, since this will lead to a softlockup problem.
>>>> +		 * The method is to get a blocking lock and immediately unlock
>>>> +		 * before returning, this can avoid CPU resource waste due to
>>>> +		 * lots of retries, and benefits fairness in getting lock.
>>>> +		 */
>>>> +		if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
>>>> +			ocfs2_inode_unlock(inode, ex);
>>>>  		ret = AOP_TRUNCATED_PAGE;
>>>>  	}
>>>>
>>>>