Re: [PATCH v2] ext4: set csum seed in tmp inode while migrating to extents

From: Luís Henriques
Date: Wed Dec 15 2021 - 10:38:06 EST


On Wed, Dec 15, 2021 at 03:12:37PM +0100, Lukas Czerner wrote:
> On Wed, Dec 15, 2021 at 12:28:52PM +0100, Jan Kara wrote:
> > On Tue 14-12-21 16:49:45, Darrick J. Wong wrote:
> > > On Tue, Dec 14, 2021 at 05:50:58PM +0000, Luís Henriques wrote:
> > > > When migrating to extents, the temporary inode will have it's own checksum
> > > > seed. This means that, when swapping the inodes data, the inode checksums
> > > > will be incorrect.
> > > >
> > > > This can be fixed by recalculating the extents checksums again. Or simply
> > > > by copying the seed into the temporary inode.
> > > >
> > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=213357
> > > > Reported-by: Jeroen van Wolffelaar <jeroen@xxxxxxxxxxxxx>
> > > > Signed-off-by: Luís Henriques <lhenriques@xxxxxxx>
> > > > ---
> > > > fs/ext4/migrate.c | 12 +++++++++++-
> > > > 1 file changed, 11 insertions(+), 1 deletion(-)
> > > >
> > > > changes since v1:
> > > >
> > > > * Dropped tmp_ei variable
> > > > * ->i_csum_seed is now initialised immediately after tmp_inode is created
> > > > * New comment about the seed initialization and stating that recovery
> > > > needs to be fixed.
> > > >
> > > > Cheers,
> > > > --
> > > > Luís
> > > >
> > > > diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
> > > > index 7e0b4f81c6c0..36dfc88ce05b 100644
> > > > --- a/fs/ext4/migrate.c
> > > > +++ b/fs/ext4/migrate.c
> > > > @@ -459,6 +459,17 @@ int ext4_ext_migrate(struct inode *inode)
> > > > ext4_journal_stop(handle);
> > > > goto out_unlock;
> > > > }
> > > > + /*
> > > > + * Use the correct seed for checksum (i.e. the seed from 'inode'). This
> > > > + * is so that the metadata blocks will have the correct checksum after
> > > > + * the migration.
> > > > + *
> > > > + * Note however that, if a crash occurs during the migration process,
> > > > + * the recovery process is broken because the tmp_inode checksums will
> > > > + * be wrong and the orphans cleanup will fail.
> > >
> > > ...and then what does the user do?
> >
> > Run fsck of course! And then recover from backups :) I know this is sad but
> > the situation is that our migration code just is not crash-safe (if we
> > crash we are going to free blocks that are still used by the migrated
> > inode) and Luis makes it work in case we do not crash (which should be
> > hopefully more common) and documents it does not work in case we crash.
> > So overall I'd call it a win.
> >
> > But maybe we should just remove this online-migration functionality
> > completely from the kernel? That would be also a fine solution for me. I
> > was thinking whether we could somehow make the inode migration crash-safe
> > but I didn't think of anything which would not require on-disk format
> > change...
>
> Since this is not something that anyone can honestly recommend doing
> without a prior backup and a word of warning I personaly would be in favor
> of removing it.

It's odd that my first fix to ext4 ends up wiping out a whole feature :-)

Anyway, in case the decision is to go ahead with the feature removal, I'm
inlining bellow an RFC that simply removes the whole file (and returns
-EOPNOTSUPP in the appropriate places). This was only compile-tested, so
I may have missed something.

(Oh, I almost forgot: I also wanted to mention the word dromedary just to
mess a little bit with Jon's article in tomorrow's LWN :-) )

Cheers,
--
Luís