Re: [RFC PATCH] mm: Replace ->launder_folio() with flush and wait
From: David Howells
Date: Tue Mar 19 2024 - 10:15:09 EST
Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > (2) invalidate_inode_pages2() is used in some places to effect
> > invalidation of the pagecache in the case where the server tells us
> > that a third party modified the server copy of a file. What the
> > right behaviour should be here, I'm not sure, but at the moment, any
> > dirty data will get laundered back to the server. Possibly it should
> > be simply invalidated locally or the user asked how they want to
> > handle the divergence.
>
> Skipping ->launder_page will mean there's a window where the data
> *will* be lost, AFAICS.
>
> Of course concurrent cached writes on different hosts against the same
> region (the size of which depends on how the caching is done) will
> conflict.
Indeed. Depending on when you're using invalidate_inode_pages2() and co. and
what circumstances you're using it for, you *are* going to suffer data loss.
For instance, if you have dirty data on the local host and get an invalidation
notification from the server: if you write just your dirty data back, you may
corrupt the file on the server, losing the third party changes; if you write
back your entire copy of the file, you might avoid corrupting the file, but
completely obliterate the third party changes; if you discard your changes,
you lose those instead, but save the third party changes.
I'm working towards supporting disconnected operation where I'll need to add
some sort of user interaction mechanism that will allow the user to say how
they want to handle this.
> But if concurrent writes are to different regions, then they shouldn't
> be lost, no? Without the current ->launder_page thing I don't see how
> that could be guaranteed.
Define "different regions". If they're not on the same folios, then why would
they be lost by simply flushing the data before doing the invalidation? If
they are on different parts of the same folio, all the above still apply when
you flush the whole folio.
Now, you can mitigate the latter case by keeping track of which bytes changed,
but that still allows you to corrupt the file by writing back just your
particular changes.
And then there's the joker in the deck: mmap. The main advantage of
invalidate_inode_pages2() is that it forcibly unmaps the page before
laundering it. However, this doesn't prevent you then corrupting the upstream
copy by writing the changes back.
What particular usage case of invalidate_inode_pages2() are you thinking of?
DIO read/write can only be best effort: flush, invalidate then do the DIO
which may bring the buffers back in because they're mmapped. In which case
doing a flush and a non-laundering invalidate that leaves dirty pages in place
ought to be fine.
David