Re: [PATCH] ext4: use private version of page_zero_new_buffers() for data=journal mode

From: Dave Hansen
Date: Mon Jan 27 2025 - 15:53:05 EST


On 1/26/25 14:45, Mateusz Guzik wrote:
>>
>> So if you don't get around to it, and _if_ I remember this when the
>> merge window is open, I might do it in my local tree, but then it will
>> end up being too late for this merge window.
>>
> The to-be-unreverted change was written by Dave (cc'ed).
>
> I had a brief chat with him on irc, he said he is going to submit an
> updated patch.

I poked at it a bit today. There's obviously been the page=>folio churn,
and iov_iter_fault_in_readable() got renamed and picked up some slightly
new semantics.

Additionally, I'm doubting the explicit pagefault_disable(). The
original patch did this:

+ pagefault_disable();
copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
+ pagefault_enable();

and the modern generic_perform_write() is using:

copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);

But the "atomic" copy uses kmap_atomic() internally, which has a
built-in pagefault_disable(). It wouldn't be super atomic if it were
handling page faults, of course.

So I don't think generic_perform_write() needs to do its own
pagefault_disable().

I actually still had the original decade-old test case sitting around
compiled on my test box. It still triggers the issue and _will_ livelock
if fault_in_iov_iter_readable() isn't called somewhere.

Anyway, here's a patch that compiles, boots and doesn't immediately fall
over on ext4, in case anyone else wants to poke at it. I'll do a real
changelog, SoB, etc. and send it out for real tomorrow if it holds up.

---

b/mm/filemap.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)

diff -puN mm/filemap.c~generic_perform_write-1 mm/filemap.c
--- a/mm/filemap.c~generic_perform_write-1 2025-01-27 09:53:13.219120969 -0800
+++ b/mm/filemap.c 2025-01-27 12:28:40.333920434 -0800
@@ -4027,17 +4027,6 @@ retry:
bytes = min(chunk - offset, bytes);
balance_dirty_pages_ratelimited(mapping);

- /*
- * Bring in the user page that we will copy from _first_.
- * Otherwise there's a nasty deadlock on copying from the
- * same page as we're writing to, without it being marked
- * up-to-date.
- */
- if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
- status = -EFAULT;
- break;
- }
-
if (fatal_signal_pending(current)) {
status = -EINTR;
break;
@@ -4055,6 +4044,11 @@ retry:
if (mapping_writably_mapped(mapping))
flush_dcache_folio(folio);

+ /*
+ * This needs to be atomic because actually handling page
+ * faults on 'i' can deadlock if the copy targets a
+ * userspace mapping of 'folio'.
+ */
copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
flush_dcache_folio(folio);

@@ -4080,6 +4074,15 @@ retry:
bytes = copied;
goto retry;
}
+ /*
+ * 'folio' is now unlocked and faults on it can be
+ * handled. Ensure forward progress by trying to
+ * fault it in now.
+ */
+ if (fault_in_iov_iter_readable(i, bytes) == bytes) {
+ status = -EFAULT;
+ break;
+ }
} else {
pos += status;
written += status;
_