> On Mon 15-04-24 12:28:01, Baokun Li wrote:
> > On 2023/6/5 23:08, Jan Kara wrote:
> > > On Mon 05-06-23 15:55:35, Matthew Wilcox wrote:
> > > > On Mon, Jun 05, 2023 at 02:21:41PM +0200, Jan Kara wrote:
> > > > > On Mon 05-06-23 11:16:55, Jan Kara wrote:
> > > > > > Yeah, I agree, that is also the conclusion I have arrived at when thinking
> > > > > > about this problem now. We should be able to just remove the conversion
> > > > > > from ext4_page_mkwrite() and rely on write(2) or truncate(2) doing it when
> > > > > > growing i_size.
> > > > >
> > > > > OK, thinking more about this and searching through the history, I've
> > > > > realized why the conversion is originally in ext4_page_mkwrite(). The
> > > > > problem is described in commit 7b4cc9787fe35b ("ext4: evict inline data
> > > > > when writing to memory map") but essentially it boils down to the fact that
> > > > > ext4 writeback code does not expect a dirty page for a file with inline data
> > > > > because ext4_write_inline_data_end() should have copied the data into the
> > > > > inode and cleared the folio's dirty flag.
> > > > >
> > > > > Indeed messing with xattrs from the writeback path to copy page contents
> > > > > into the inline data xattr would be ... interesting. Hum, out of good ideas
> > > > > for now :-|.
> > > >
> > > > Is it so bad? Now that we don't have writepage in ext4, only
> > > > writepages, it seems like we have a considerably more benign locking
> > > > environment to work in.
> > >
> > > Well, yes, without ->writepage() it might be *possible*. But still rather
> > > ugly. The problem is that in ->writepages() i_size is not stable. Thus also
> > > whether the inode data is inline or not is not stable. We'd need inode_lock
> > > for that but that is not easily doable in the writeback path - inode lock
> > > would then become fs_reclaim unsafe...
> > >
> > > Honza
> >
> > Hi Honza!
> > Hi Ted!
> > Hi Matthew!
> >
> > Coming back to this a long time later: while discussing another, similar
> > ABBA problem with Hou Tao, he mentioned VM_FAULT_RETRY, and I realized it
> > could be used to solve this problem as well.
> >
> > The general idea is that if we see a file with inline data in
> > ext4_page_mkwrite(), we release the mmap_lock and grab the inode_lock to
> > convert the inline data, and then return VM_FAULT_RETRY so that the fault
> > is retried and the mmap_lock is reacquired.
> >
> > The code implementation is as follows; do you have any thoughts?
>
> So the problem with this is that VM_FAULT_RETRY is not always an option -
> in particular the caller has to set FAULT_FLAG_ALLOW_RETRY to indicate it
> is prepared to handle a VM_FAULT_RETRY return. See how
> maybe_unlock_mmap_for_io() is carefully checking this.

Yes, at least we need to check for FAULT_FLAG_RETRY_NOWAIT.

> There are callers (most notably some get_user_pages() users) that don't
> set FAULT_FLAG_ALLOW_RETRY, so the escape through VM_FAULT_RETRY is sadly
> not a reliable solution.

It is indeed sad. I'm going to go learn more about the code for

> My long-term wish is that we were always allowed to use VM_FAULT_RETRY, and
> that was actually what motivated some get_user_pages() cleanups I did a
> couple of years ago. But dealing with all the cases in various drivers was
> too difficult and I've run out of time. Now maybe it would be worth it to
> revisit this since things have changed noticeably and maybe now it would be
> easier to achieve the goal...
>
> Honza

That sounds like a great idea. I will try to get the history on it and
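The constraint Jan raises can be illustrated with a small userspace model (not kernel code) of the decision Baokun's proposal would have to make in ext4_page_mkwrite(). Everything prefixed with `model_` is hypothetical: the enum only borrows the names of the kernel's FAULT_FLAG_ALLOW_RETRY, FAULT_FLAG_RETRY_NOWAIT and FAULT_FLAG_TRIED, not their real bit values, and no locks are actually taken. The retry test is assumed to mirror the check in maybe_unlock_mmap_for_io() (drop the mmap_lock only on a first attempt that allows retry and is not NOWAIT); treat it as a sketch, not as the actual ext4 or mm implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits: only the names mirror the kernel's
 * FAULT_FLAG_* values; the numeric values are made up for this model. */
enum model_fault_flag {
	MODEL_FAULT_FLAG_ALLOW_RETRY  = 1 << 0, /* caller can handle a retry */
	MODEL_FAULT_FLAG_RETRY_NOWAIT = 1 << 1, /* don't drop the lock and wait */
	MODEL_FAULT_FLAG_TRIED        = 1 << 2, /* fault was already tried once */
};

/* Hypothetical outcomes of the proposed ext4_page_mkwrite() logic. */
enum model_outcome {
	MODEL_PROCEED,             /* no inline data: normal mkwrite path */
	MODEL_RETRY_AFTER_CONVERT, /* drop mmap_lock, convert, VM_FAULT_RETRY */
	MODEL_NO_SAFE_ESCAPE,      /* inline data, but retry is not permitted */
};

/* Assumed to mirror maybe_unlock_mmap_for_io(): the mmap_lock may be
 * dropped only on a first attempt that allows retry and is not NOWAIT. */
static bool model_can_drop_mmap_lock(unsigned int flags)
{
	bool allow_retry_first = (flags & MODEL_FAULT_FLAG_ALLOW_RETRY) &&
				 !(flags & MODEL_FAULT_FLAG_TRIED);

	return allow_retry_first && !(flags & MODEL_FAULT_FLAG_RETRY_NOWAIT);
}

static enum model_outcome model_page_mkwrite(bool has_inline_data,
					     unsigned int flags)
{
	if (!has_inline_data)
		return MODEL_PROCEED;

	if (model_can_drop_mmap_lock(flags)) {
		/* Proposal: release mmap_lock, take inode_lock, convert the
		 * inline data, then return VM_FAULT_RETRY. Not done here. */
		return MODEL_RETRY_AFTER_CONVERT;
	}

	/* E.g. a get_user_pages() caller that did not set ALLOW_RETRY. */
	return MODEL_NO_SAFE_ESCAPE;
}

/* Human-readable name for an outcome, e.g. for logging in a test. */
static const char *model_outcome_name(enum model_outcome outcome)
{
	switch (outcome) {
	case MODEL_PROCEED:
		return "proceed";
	case MODEL_RETRY_AFTER_CONVERT:
		return "retry-after-convert";
	case MODEL_NO_SAFE_ESCAPE:
		return "no-safe-escape";
	}
	assert(!"unknown model_outcome");
	return "unknown";
}
```

In this model, a fault from a caller that does not set FAULT_FLAG_ALLOW_RETRY (some get_user_pages() users, per Jan) lands in MODEL_NO_SAFE_ESCAPE even though the file has inline data, which is why the VM_FAULT_RETRY escape alone does not cover every path.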