Re: [PATCH v3] fuse: invalidate the page cache after direct write

From: Jingbo Xu

Date: Tue Mar 03 2026 - 01:48:38 EST




On 3/3/26 5:19 AM, Bernd Schubert wrote:
>
>
> On 3/2/26 20:29, Bernd Schubert wrote:
>>
>>
>> On 2/27/26 16:09, Miklos Szeredi wrote:
>>> On Sun, 11 Jan 2026 at 08:37, Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> This fixes xfstests generic/451 (for both O_DIRECT and FOPEN_DIRECT_IO
>>>> direct write).
>>>>
>>>> Commit b359af8275a9 ("fuse: Invalidate the page cache after
>>>> FOPEN_DIRECT_IO write") tries to fix a similar issue for
>>>> FOPEN_DIRECT_IO writes, which can be reproduced by xfstests generic/209.
>>>> It only fixes the issue for synchronous direct writes and omits
>>>> the asynchronous direct write case (exactly what generic/451
>>>> targets).
>>>>
>>>> O_DIRECT direct writes are somewhat more complicated. For a
>>>> synchronous direct write, generic_file_direct_write() invalidates
>>>> the page cache after the write, so generic/209 passes. For an
>>>> asynchronous direct write, the invalidation in
>>>> generic_file_direct_write() is bypassed, since the invalidation is
>>>> supposed to be done when the asynchronous IO completes. FUSE omits
>>>> this step, so generic/451 fails.
>>>>
>>>> Fix this by performing the invalidation for both synchronous and
>>>> asynchronous writes.
>>>>
>>>> - with FOPEN_DIRECT_IO
>>>>   - sync write, invalidate in fuse_send_write()
>>>>   - async write, invalidate in fuse_aio_complete() with FUSE_ASYNC_DIO,
>>>>     fuse_send_write() otherwise
>>>> - without FOPEN_DIRECT_IO
>>>>   - sync write, invalidate in generic_file_direct_write()
>>>>   - async write, invalidate in fuse_aio_complete() with FUSE_ASYNC_DIO,
>>>>     generic_file_direct_write() otherwise
>>>>
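
For reference, the list above can be read as a small decision table. The
following is only a user-space sketch of that table, not code from the
patch; the enum, the function name and the three boolean parameters are
made up for illustration.

```c
#include <stdbool.h>

/* Where the page cache invalidation happens after a direct write,
 * following the list in the commit message. All names here are
 * illustrative and do not exist in fs/fuse. */
enum inval_site {
	INVAL_FUSE_SEND_WRITE,
	INVAL_FUSE_AIO_COMPLETE,
	INVAL_GENERIC_FILE_DIRECT_WRITE,
};

enum inval_site inval_site_for(bool fopen_direct_io, bool async_write,
			       bool fuse_async_dio)
{
	/* Truly asynchronous completion: invalidate when the IO completes. */
	if (async_write && fuse_async_dio)
		return INVAL_FUSE_AIO_COMPLETE;

	/* Everything else completes in the submitter's context. */
	return fopen_direct_io ? INVAL_FUSE_SEND_WRITE
			       : INVAL_GENERIC_FILE_DIRECT_WRITE;
}
```

Only the (async write, FUSE_ASYNC_DIO) combination ends up in
fuse_aio_complete(), which is the path seen in the trace below.
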
>>>> Reviewed-by: Bernd Schubert <bschubert@xxxxxxx>
>>>> Signed-off-by: Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx>
>>>
>>> Applied, thanks.
>>>
>>
>> Hi Miklos,
>>
>> just back from a week off and we got a QA report last week. This commit
>> leads to a deadlock. Is there a chance you can revert and not send it
>> to Linus yet?
>>
>> [Wed Feb 25 07:14:29 2026] INFO: task clt_reactor_3:49041 blocked for more than 122 seconds.
>> [Wed Feb 25 07:14:29 2026] Tainted: G OE 6.8.0-79-generic #79-Ubuntu
>> [Wed Feb 25 07:14:29 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Wed Feb 25 07:14:29 2026] task:clt_reactor_3 state:D stack:0 pid:49041 tgid:49014 ppid:1 flags:0x00000006
>> [Wed Feb 25 07:14:29 2026] Call Trace:
>> [Wed Feb 25 07:14:29 2026] <TASK>
>> [Wed Feb 25 07:14:29 2026] __schedule+0x27c/0x6b0
>> [Wed Feb 25 07:14:29 2026] schedule+0x33/0x110
>> [Wed Feb 25 07:14:29 2026] io_schedule+0x46/0x80
>> [Wed Feb 25 07:14:29 2026] folio_wait_bit_common+0x136/0x330
>> [Wed Feb 25 07:14:29 2026] __folio_lock+0x17/0x30
>> [Wed Feb 25 07:14:29 2026] invalidate_inode_pages2_range+0x1d2/0x4f0
>> [Wed Feb 25 07:14:29 2026] fuse_aio_complete+0x258/0x270 [fuse]
>> [Wed Feb 25 07:14:29 2026] fuse_aio_complete_req+0x87/0xd0 [fuse]
>> [Wed Feb 25 07:14:29 2026] fuse_request_end+0x18e/0x200 [fuse]
>> [Wed Feb 25 07:14:29 2026] fuse_uring_req_end+0x87/0xd0 [fuse]
>> [Wed Feb 25 07:14:29 2026] fuse_uring_cmd+0x241/0xf20 [fuse]
>> [Wed Feb 25 07:14:29 2026] io_uring_cmd+0x9f/0x140
>> [Wed Feb 25 07:14:29 2026] io_issue_sqe+0x193/0x410
>> [Wed Feb 25 07:14:29 2026] io_submit_sqes+0x128/0x3e0
>> [Wed Feb 25 07:14:29 2026] __do_sys_io_uring_enter+0x2ea/0x490
>> [Wed Feb 25 07:14:29 2026] __x64_sys_io_uring_enter+0x22/0x40
>>
>>
>> The issue is that invalidate_inode_pages2_range() might trigger another
>> write to the same core (in our case a reactor / coroutine), which
>> then deadlocks.

Besides, I don't understand why the process hangs in folio_lock() (inside
invalidate_inode_pages2_range()) rather than in folio_wait_writeback() in
fuse_launder_folio(), if the root cause is that
invalidate_inode_pages2_range() triggers another write.


>> Cheng suggests offloading that to a work queue, but the FOPEN_DIRECT_IO
>> code is starting to get complex - I'm more inclined to go back to my
>> patches from about 3 years ago that unified the DIO handlers and let it
>> go through the normal vfs handlers.
>>
>
> Hmm, in the short term maybe the better solution is to update the
> patch (not posted to the list) that Cheng made and to use
> i_sb->s_dio_done_wq, similar to what iomap_dio_bio_end_io() does.
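
To make the idea concrete: the point of the work queue is that the
completion handler only records that an invalidation is needed, and the
invalidation itself runs later from another context. Below is a
single-threaded user-space toy model of that pattern, not kernel code. It
does not use the kernel workqueue API, and every name in it
(deferred_queue, toy_inode, toy_aio_complete, ...) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEFERRED 16

typedef void (*work_fn)(void *arg);

struct work_item {
	work_fn fn;
	void *arg;
};

/* Toy stand-in for a work queue: a fixed-size ring of pending items. */
struct deferred_queue {
	struct work_item items[MAX_DEFERRED];
	size_t head;	/* next item to run */
	size_t tail;	/* next free slot */
};

/* Queue fn(arg) for later; returns false if the ring is full. */
bool defer_work(struct deferred_queue *q, work_fn fn, void *arg)
{
	if (q->tail - q->head == MAX_DEFERRED)
		return false;
	q->items[q->tail % MAX_DEFERRED] = (struct work_item){ fn, arg };
	q->tail++;
	return true;
}

/* Run everything queued so far; returns the number of items run. */
size_t run_deferred(struct deferred_queue *q)
{
	size_t n = 0;

	while (q->head != q->tail) {
		struct work_item w = q->items[q->head % MAX_DEFERRED];

		q->head++;
		w.fn(w.arg);
		n++;
	}
	return n;
}

struct toy_inode {
	bool in_completion;	/* true while the completion handler runs */
	int invalidated_inline;	/* invalidations run inside the handler */
	int invalidated_later;	/* invalidations run from the queue */
};

/* Stand-in for the page cache invalidation; records where it ran. */
void toy_invalidate(void *arg)
{
	struct toy_inode *ino = arg;

	if (ino->in_completion)
		ino->invalidated_inline++;
	else
		ino->invalidated_later++;
}

/* Completion handler: queues the invalidation instead of calling it. */
bool toy_aio_complete(struct toy_inode *ino, struct deferred_queue *q)
{
	bool queued;

	ino->in_completion = true;
	queued = defer_work(q, toy_invalidate, ino);
	ino->in_completion = false;
	return queued;
}
```

In this model nothing that can block runs while in_completion is set, which
is the property we would want from the real fix. Whether the actual patch
ends up looking like this is up to Cheng's version.
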

BTW, I also spent some time on this this morning, since the previous patch
is already pending in our internal release. Let me know if Cheng would
like to send a patch ;)

--
Thanks,
Jingbo