Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode

From: Jaegeuk Kim
Date: Thu Sep 11 2014 - 01:38:23 EST


On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@xxxxxxxxxx> wrote:
>
> > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@xxxxxxxxxx> wrote:
> > >
> > > > Hi,
> > > >
> > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > Hi Huang,
> > > > > >
> > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > inode is not checkpointed. The non-inode dnode may be written
> > before
> > > > > > > inode. So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > recovery fail.
> > > > > > >
> > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > nid
> > > > of
> > > > > > > inode < nid of non-inode dnode. But it is possible for the
> > reverse.
> > > > > > > For example, because of alloc_nid_failed.
> > > > > > >
> > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > find_fsync_dnodes.
> > > > > > >
> > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > patch, that is, from big number to small number.
> > > > > > >
> > > > > > > Signed-off-by: Huang, Ying <ying.huang@xxxxxxxxx>
> > > > > > > ---
> > > > > > > fs/f2fs/recovery.c | 7 ++++---
> > > > > > > 1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > set_inode_flag(F2FS_I(entry->inode),
> > > > > > > FI_INC_LINK);
> > > > > > > - } else {
> > > > > > > - if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > >
> > > > > > If this is not inode block, we should add this inode to recover its
> > > > data blocks.
> > > > >
> > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > find_fsync_dnodes checking dnodes? Per my understanding, any
> > changes to
> > > > > file will cause inode page dirty (for example, mtime changed), so
> > that
> > > > > we will write inode block. Is it right? If so, the solution in this
> > > > > patch should work too.
> > > >
> > > > Your description says that f2fs_iget will fail, which causes the
> > recovery
> > > > fail.
> > > > So, I thought it would be better to handle the f2fs_iget failure
> > directly.
> > > >
> > >
> > > Yes. That is another way to fix the issue.
> > >
> > >
> > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > For exmaple,
> > > > 1. the inode is written by flusher or kswapd, then,
> > > > 2. f2fs_sync_file writes its dnode.
> > > >
> > > > In that case, we can get only non-inode dnode in the node chain, since
> > the
> > > > inode
> > > > has not fsync_mark.
> > > >
> > >
> > > I think your solution is better here, but does not fix all scenarios. If
> > > the inode is checkpointed, the file can be recovered, although the inode
> > > information may be not up to date. But if the inode is not checkpointed,
> > > f2fs_iget will fail too and recover will fail.
> >
> > Ok, let me consider your scenarios.
> >
> > Term: F: fsync_mark, D: dentry_mark
> >
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Lose the latest inode(x). Need to fix.
> >
> > 2. inode(x) | CP | dnode(F) | inode(x)
> > -> Impossible, but recover latest dnode(F)
> >
> > 3. CP | inode(x) | dnode(F)
> > -> Need to write inode(DF) in f2fs_sync_file.
> >
> > 4. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next.
> >
> > 5. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next. But, this is an impossible
> > scenario.
> > Drop this dnode(F).
> >
> > Indeed, there were some missing scenarios.
> >
> > So, how about this patch?
> >
> > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> >
> > We can summarize the roll forward recovery scenarios as follows.
> >
> > [Term] F: fsync_mark, D: dentry_mark
> >
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Update the latest inode(x).
> >
> > 2. inode(x) | CP | inode(F) | dnode(F)
> > -> No problem.
> >
> > 3. inode(x) | CP | dnode(F) | inode(x)
> > -> Impossible, but recover latest dnode(F)
> >
>
> I think this is possible. If f2fs_sync_file runs concurrently with
> writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).

If the inode(x) was written, f2fs_sync_file will do write the inode again with
fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.

In f2fs_sync_file,
...
while (!sync_node_pages(sbi, ino, &wbc)) {
if (fsync_mark_done(sbi, ino))
goto out;
mark_inode_dirty_sync(inode);
ret = f2fs_write_inode(inode, NULL);
if (ret)
goto out;
}
...

>
>
> > 4. inode(x) | CP | dnode(F) | inode(F)
> > -> No problem.
> >
> > 5. CP | inode(x) | dnode(F)
> > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> >
>
> I think this is possible. If inode(x) is written by writeback and then
> dnode(F) is written by f2fs_sync_file.

-> The inode(DF) was missing. Should drop this dnode(F).
Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
This is fixed by the below updated patch.

>
>
> > 6. CP | inode(DF) | dnode(F)
> > -> No problem.
> >
> > 7. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> >
> > 8. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> > Drop this dnode(F).
> >
>
> I think this is possible. as 3.

ditto.

>
>
> > So, this patch adds some missing points such as #1, #5, #7, and #8.
> >
> > Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> > ---
> > fs/f2fs/file.c | 10 ++++++++
> > fs/f2fs/recovery.c | 70
> > +++++++++++++++++++++++++++++++++++++++++++++---------
> > 2 files changed, 69 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 5cde363..2660af2 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > loff_t end, int datasync)
> > up_write(&fi->i_sem);
> > }
> > } else {
> > + /*
> > + * CP | inode(x) | dnode(F)
> > + * We need to remain inode(DF) for roll-forward recovery.
> > + */
> > + if (fsync_mark_done(sbi, inode->i_ino) &&
> >
>
> How can this happen? We have not called sync_node_pages(sbi, inode->i_ino,
> &wbc). Where is the fsync mark set?
>
>
> > + !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > {
> >
>
> if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> checkpoint. Right?

Yes, whole conditions were wrong.
Please, refer the following patch.
Thanks,