Re: [f2fs-dev] Writes stoped working on f2fs after the compression support was added

From: Chao Yu
Date: Tue Feb 25 2020 - 20:58:09 EST


On 2020/2/25 20:27, OndÅej Jirman wrote:
> On Tue, Feb 25, 2020 at 01:08:14PM +0100, megi xff wrote:
>> Hello,
>>
>> On Tue, Feb 25, 2020 at 07:24:12PM +0800, Chao Yu wrote:
>>> On 2020/2/24 22:31, OndÅej Jirman wrote:
>>>>> Just by a looks of it:
>>>>>
>>>>> root@tbs2[/proc/sys/kernel] # dmesg | grep up_write | wc -l
>>>>> 324
>>>>> root@tbs2[/proc/sys/kernel] # dmesg | grep down_write | wc -l
>>>>> 347
>>>>>
>>>>> there seems to be a mismatch of lock/unlock counts.
>>>>
>>>> Sorry, a wrong grep expression.
>>>>
>>>> root@tbs2[~] # dmesg | grep inode_down_write | wc -l
>>>> 357
>>>> root@tbs2[~] # dmesg | grep inode_up_write | wc -l
>>>> 357
>>>> root@tbs2[~] # dmesg | grep inode_up_read | wc -l
>>>> 16
>>>> root@tbs2[~] # dmesg | grep inode_down_read | wc -l
>>>> 16
>>>
>>> I don't know why we have consistent down/up pair, but through disassembled
>>> code, I doubt it's the f2fs_inode->i_sem.
>>
>> Because we were counting attempts, and not a successful lock. ;)

[ 214.508943] inode_down_write from f2fs_write_cache_pages+0x2b4/0x7c4
[ 306.213325] inode_down_write from f2fs_write_cache_pages+0x2b4/0x7c4

Actually, down_write count is 320, up_write count is 319, kworker didn't call
up_write at 214.508943.

>>
>>> c0435d7c: ebf54af8 bl c0188964 <printk>
>>> c0435d80: e1a00006 mov r0, r6
>>> c0435d84: eb138135 bl c0916260 <down_write>
>>>
>>> inode_down_write()
>>>
>>> c0435d88: e284ce1d add ip, r4, #464 ; 0x1d0
>>>
>>> We are stuck here.
>>>
>>> [ 430.675754] [<c0916260>] (down_write) from [<c0435d88>] (f2fs_write_single_data_page+0x600/0x7d8)
>>> ^^^^^^^^^
>>> [ 430.675764] [<c0435788>] (f2fs_write_single_data_page) from [<c0436214>] (f2fs_write_cache_pages+0x2b4/0x7c4)
>>>
>>>
>>> c0435d8c: e14b0ad4 ldrd r0, [fp, #-164] ; 0xffffff5c
>>> c0435d90: e1cc20d0 ldrd r2, [ip]
>>> c0435d94: e1520000 cmp r2, r0
>>> c0435d98: e0d33001 sbcs r3, r3, r1
>>> c0435d9c: b1cc00f0 strdlt r0, [ip]
>>> c0435da0: e1a00006 mov r0, r6
>>> c0435da4: ebf52227 bl c017e648 <up_write>
>>> c0435da8: e51b2098 ldr r2, [fp, #-152] ; 0xffffff68
>>> c0435dac: e30c0730 movw r0, #50992 ; 0xc730
>>> c0435db0: e59f11a4 ldr r1, [pc, #420] ; c0435f5c <f2fs_write_single_data_page+0x7d4>
>>> c0435db4: e34c00b6 movt r0, #49334 ; 0xc0b6
>>> c0435db8: ebf54ae9 bl c0188964 <printk>
>>>
>>> inode_up_write()
>>
>> The patch you sent helped so far. I'll keep the tablet running for a while,
>> but so far the issue did not reappear within a few minutes after boot as
>> usual.
>
> So this time it just took several times longer to appear (8-20mins to the hang):
>
> https://megous.com/dl/tmp/dmesg1
> https://megous.com/dl/tmp/dmesg2

Alright, I still didn't see any possible deadlock in f2fs.

Can you try below patch? I'd like to see whether spinlock can cause the same issue.