Re: Subject: [BUG/RFC] write-open file THP cache purge can discard dirty page cache
From: Pedro Falcato
Date: Tue Jun 30 2026 - 14:31:17 EST

+CC some relevant THP folks

Quick note, your email client's spacing seems to be all over the place, making
this extremely hard to read.

On Tue, Jun 30, 2026 at 01:01:53PM -0400, Gregg Leventhal wrote:
> Hello,
>
> We (Gregg Leventhal <gleventhal@xxxxxxxxxxxxxx> and Eric Hagberg
>
> <ehagberg@xxxxxxxxxxxxxx>) have a reproducible data-loss issue involving file
>
> THPs and write-open, impacting filesystems that do not support
> writable large folios.
>
>
> Attached are:
>
>
> - thp_write_open_cancel_dirty_repro.c
>
> - thp-open-writeback-before-purge.patch
>
>
>
> Summary
>
> =======
>
>
> On an affected 6.12 kernel with CONFIG_READ_ONLY_THP_FOR_FS=y, a file can
>
> contain read-only file THPs installed by khugepaged / MADV_COLLAPSE. When that
>
> same file is later opened for write, do_dentry_open() notices
>
> filemap_nr_thps() and drops the page cache:
>
>
> /*
>
> * XXX: Huge page cache doesn't support writing yet. Drop all page
>
> * cache for this file before processing writes.
>
> */
>
> if (f->f_mode & FMODE_WRITE) {
>
> if (filemap_nr_thps(inode->i_mapping)) {
>
> struct address_space *mapping = inode->i_mapping;
>
>
> filemap_invalidate_lock(inode->i_mapping);
>
> unmap_mapping_range(mapping, 0, 0, 0);
>
> truncate_inode_pages(mapping, 0);
>
> filemap_invalidate_unlock(inode->i_mapping);
>
> }
>
> }

Ugh, this is embarrassing. So, good news: this code doesn't exist anymore
in mainline! Bad news: it exists on every other upstream-stable-maintained
release :|

FWIW I don't think your fix works, there's still a race there: what if
you write and wait, then someone dirties a folio, then you truncate the
pagecache? You lost data again. I'm attaching a very quick WIP patch
that I wrote against 6.12 LTS (again, this code does not exist in mainline).
I _think_ we want to go roughly in that direction, either here or in
the collapse file paths. There are still problems which are invasive and
which I haven't dealt with (GUP and other "temporary" folio references
being the main one). Some of these problems may simply mean that opening
these files writable can fail (there is, AFAIK, no way of waiting for
GUP and other temporary folio holders).
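
To make the interleaving above concrete, here is a minimal userspace toy
model of it. This is only a sketch: every name in it (toy_file, toy_write,
toy_writeback, toy_truncate, toy_race_lost_pages) is hypothetical, nothing
here is a kernel API, and it is not taken from either patch. It only shows
why "flush, then purge" still loses a page that was dirtied in between.

```c
/* Userspace toy model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

struct toy_page {
	bool present;	/* page is in the toy cache */
	bool dirty;	/* page holds data not yet on "disk" */
	int data;	/* cached contents */
};

struct toy_file {
	struct toy_page cache[NPAGES];
	int disk[NPAGES];
};

/* Write a value through the cache, marking the page dirty. */
static void toy_write(struct toy_file *f, size_t idx, int val)
{
	assert(idx < NPAGES);
	f->cache[idx].present = true;
	f->cache[idx].dirty = true;
	f->cache[idx].data = val;
}

/* "Write and wait": flush every dirty page to the toy disk. */
static void toy_writeback(struct toy_file *f)
{
	for (size_t i = 0; i < NPAGES; i++) {
		if (f->cache[i].present && f->cache[i].dirty) {
			f->disk[i] = f->cache[i].data;
			f->cache[i].dirty = false;
		}
	}
}

/*
 * Drop the whole cache without looking at dirty state.
 * Returns how many dirty pages were discarded.
 */
static int toy_truncate(struct toy_file *f)
{
	int lost = 0;

	for (size_t i = 0; i < NPAGES; i++) {
		if (f->cache[i].present && f->cache[i].dirty)
			lost++;
		f->cache[i] = (struct toy_page){0};
	}
	return lost;
}

/* Replays the interleaving: flush, concurrent dirtying, then purge. */
static int toy_race_lost_pages(void)
{
	struct toy_file f = {0};

	toy_write(&f, 0, 1);
	toy_writeback(&f);	/* opener: write and wait */
	toy_write(&f, 1, 2);	/* someone else dirties another page */
	return toy_truncate(&f);	/* opener: purge, page 1 never hit disk */
}
```

In this model toy_race_lost_pages() reports one discarded dirty page: the
one written after the flush and before the purge.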

We would probably be better served by a custom loop that forcibly yanks
only THPs out of the pagecache, though. But that requires a bit more
code for a stable-only issue...
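
As a rough illustration of what "yank only THPs" could mean, here is a
userspace toy sketch. It is an assumption-laden model, not kernel code and
not the attached patch: struct toy_folio, its pinned flag (standing in for
GUP or another temporary reference) and toy_evict_large() are all
hypothetical names. The point is only that small folios, dirty or not, are
never touched, and that a pinned large folio is reported instead of waited
for.

```c
/* Userspace toy model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_folio {
	bool present;	/* folio is in the toy cache */
	bool dirty;	/* folio holds unwritten data */
	bool pinned;	/* stand-in for GUP or another temporary reference */
	unsigned order;	/* 0 = small folio, >0 = large folio */
};

/*
 * Evict only large folios from the toy cache.
 * Small folios are skipped entirely, so their dirty data survives.
 * Returns the number of large folios that could not be evicted
 * because they were pinned; a caller could fail the open on nonzero.
 */
static int toy_evict_large(struct toy_folio *folios, size_t n)
{
	int busy = 0;

	assert(folios != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct toy_folio *fo = &folios[i];

		if (!fo->present || fo->order == 0)
			continue;	/* not cached, or a small folio */
		if (fo->pinned) {
			busy++;		/* cannot wait for the holder */
			continue;
		}
		fo->present = false;	/* large and unpinned: drop it */
	}
	return busy;
}
```

With an array holding one dirty small folio, one clean large folio and one
pinned large folio, the loop drops only the unpinned large folio and
returns 1 for the pinned one.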

Anyway, the patch is obviously ungood and uncromulent and is only
here as a rough conversation starter. I don't think it works and
it will probably never work. Mapping invalidation is simply too
best-effort for something that Just Needs(tm) to work.

--
Pedro