Re: [PATCH v3] fuse: invalidate the page cache after direct write
From: Bernd Schubert
Date: Mon Mar 02 2026 - 17:15:50 EST
On 3/2/26 20:29, Bernd Schubert wrote:
>
>
> On 2/27/26 16:09, Miklos Szeredi wrote:
>> On Sun, 11 Jan 2026 at 08:37, Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>>>
>>> This fixes xfstests generic/451 (for both O_DIRECT and FOPEN_DIRECT_IO
>>> direct write).
>>>
>>> Commit b359af8275a9 ("fuse: Invalidate the page cache after
>>> FOPEN_DIRECT_IO write") tries to fix a similar issue for
>>> FOPEN_DIRECT_IO write, which can be reproduced by xfstests generic/209.
>>> It only fixes the issue for synchronous direct write, while omitting
>>> the case for asynchronous direct write (exactly targeted by
>>> generic/451).
>>>
>>> For O_DIRECT direct write, it's somewhat more complicated. For
>>> synchronous direct write, generic_file_direct_write() will invalidate
>>> the page cache after the write, and thus it can pass generic/209. For
>>> asynchronous direct write, the invalidation in
>>> generic_file_direct_write() is bypassed since the invalidation shall be
>>> done when the asynchronous IO completes. This is omitted in FUSE, and
>>> generic/451 fails as a result.
>>>
>>> Fix this by performing the invalidation for both synchronous and
>>> asynchronous write.
>>>
>>> - with FOPEN_DIRECT_IO
>>>   - sync write, invalidate in fuse_send_write()
>>>   - async write, invalidate in fuse_aio_complete() with FUSE_ASYNC_DIO,
>>>     fuse_send_write() otherwise
>>> - without FOPEN_DIRECT_IO
>>>   - sync write, invalidate in generic_file_direct_write()
>>>   - async write, invalidate in fuse_aio_complete() with FUSE_ASYNC_DIO,
>>>     generic_file_direct_write() otherwise
>>>
>>> Reviewed-by: Bernd Schubert <bschubert@xxxxxxx>
>>> Signed-off-by: Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx>
>>
>> Applied, thanks.
>>
>
> Hi Miklos,
>
> just back from a week off and we got a QA report last week. This commit
> leads to a deadlock. Is there a chance you can revert and not send it
> to Linus yet?
>
> [Wed Feb 25 07:14:29 2026] INFO: task clt_reactor_3:49041 blocked for more than 122 seconds.
> [Wed Feb 25 07:14:29 2026] Tainted: G OE 6.8.0-79-generic #79-Ubuntu
> [Wed Feb 25 07:14:29 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Feb 25 07:14:29 2026] task:clt_reactor_3 state:D stack:0 pid:49041 tgid:49014 ppid:1 flags:0x00000006
> [Wed Feb 25 07:14:29 2026] Call Trace:
> [Wed Feb 25 07:14:29 2026] <TASK>
> [Wed Feb 25 07:14:29 2026] __schedule+0x27c/0x6b0
> [Wed Feb 25 07:14:29 2026] schedule+0x33/0x110
> [Wed Feb 25 07:14:29 2026] io_schedule+0x46/0x80
> [Wed Feb 25 07:14:29 2026] folio_wait_bit_common+0x136/0x330
> [Wed Feb 25 07:14:29 2026] __folio_lock+0x17/0x30
> [Wed Feb 25 07:14:29 2026] invalidate_inode_pages2_range+0x1d2/0x4f0
> [Wed Feb 25 07:14:29 2026] fuse_aio_complete+0x258/0x270 [fuse]
> [Wed Feb 25 07:14:29 2026] fuse_aio_complete_req+0x87/0xd0 [fuse]
> [Wed Feb 25 07:14:29 2026] fuse_request_end+0x18e/0x200 [fuse]
> [Wed Feb 25 07:14:29 2026] fuse_uring_req_end+0x87/0xd0 [fuse]
> [Wed Feb 25 07:14:29 2026] fuse_uring_cmd+0x241/0xf20 [fuse]
> [Wed Feb 25 07:14:29 2026] io_uring_cmd+0x9f/0x140
> [Wed Feb 25 07:14:29 2026] io_issue_sqe+0x193/0x410
> [Wed Feb 25 07:14:29 2026] io_submit_sqes+0x128/0x3e0
> [Wed Feb 25 07:14:29 2026] __do_sys_io_uring_enter+0x2ea/0x490
> [Wed Feb 25 07:14:29 2026] __x64_sys_io_uring_enter+0x22/0x40
>
>
> The issue is that invalidate_inode_pages2_range() might trigger another
> write to the same core (in our case a reactor / coroutine) and
> then deadlocks.
> Cheng suggests offloading that to a work queue, but the FOPEN_DIRECT_IO
> code is starting to get complex. I'm more inclined to get back to my patches
> from about 3 years ago that unified the DIO handlers and let it go
> through the normal VFS handlers.
>
Hmm, maybe in the short term the better solution is to update the
patch (not posted to the list) that Cheng made and to use
i_sb->s_dio_done_wq, similar to what iomap_dio_bio_end_io() does.
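
To make the deferral idea concrete, here is a toy userspace model, not
kernel code and not Cheng's patch. All names in it (struct work_queue,
complete_async_write(), run_deferred_invalidations()) are made up for
illustration. The point is only that the completion path records the
range and returns without blocking, and the invalidation runs later
from a context that is allowed to wait on a folio lock.

```c
/* Toy userspace model, NOT kernel code: every name here is hypothetical. */
#include <stddef.h>

#define MAX_PENDING 16

/* One deferred invalidation request: an inclusive byte range of a file. */
struct inval_work {
    long start;
    long end;
};

/* Stand-in for a per-superblock work queue such as s_dio_done_wq. */
struct work_queue {
    struct inval_work items[MAX_PENDING];
    size_t head;     /* next item to run */
    size_t tail;     /* next free slot */
    int invalidated; /* number of ranges invalidated so far */
};

/* Completion path: only records the range, never blocks on any lock. */
int complete_async_write(struct work_queue *wq, long pos, long len)
{
    if (wq->tail == MAX_PENDING)
        return -1; /* queue full; real code would need a fallback */
    wq->items[wq->tail].start = pos;
    wq->items[wq->tail].end = pos + len - 1;
    wq->tail++;
    return 0;
}

/* Worker context: the place where blocking on a folio lock would be safe. */
int run_deferred_invalidations(struct work_queue *wq)
{
    int ran = 0;

    while (wq->head < wq->tail) {
        /* Real code would invalidate items[head].start..end here. */
        wq->head++;
        wq->invalidated++;
        ran++;
    }
    return ran;
}
```

In this model nothing is invalidated until the worker runs, so a
completion arriving on the reactor thread can no longer block that
thread on a locked folio.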