Re: Fun with fdatasync()

From: Chris Mason
Date: Tue Oct 13 2009 - 12:02:59 EST


On Tue, Oct 13, 2009 at 12:00:43AM +0200, Jan Kara wrote:
> Hi,
>
> On Mon 12-10-09 10:00:49, Chris Mason wrote:

[ clearing of I_DIRTY_DATASYNC by pdflush ]

> >
> > Am I missing something? I don't see how fdatasync is safe in our
> > current usage.
> Yeah, we already discussed similar I_DIRTY flag problems with Ted and
> others in thread "fsync on ext[34] working only by an accident" on
> linux-ext4.
> I don't quite like clearing dirty flags only on sync - pdflush would then
> unnecessarily try to get rid of those inodes and burn CPU on them.
> Actually, mapping->private_list (and bh->b_assoc_buffers) is meant to be
> used exactly for the purpose of tracking what needs to be written on fsync
> so my current plan is to somehow utilize that list to fix the problem.
> Maybe I even get to that tomorrow ;) Thanks for the reminder.

I honestly don't remember all the details now, but I know that when
reiserfs stopped using the b_assoc_buffers stuff, life got much less
complex. From an outsider's point of view, the last thing jbd needs is
another list of buffers to live on.

It seems like ext3 and ext4 need to be able to answer three questions
during an fsync or fdatasync:

Which transaction last changed this file in a way fdatasync must
persist (filled a hole, changed i_size)?

Which transaction last logged this inode (needed for a full fsync)?

Which transaction was most recently committed, so fsync can tell
whether its work is already done?

Filling holes and changing i_size only happen in a handful of places,
so it would be easy to update a transid field in the in-memory inode for
that.

The inode logging code could bump a second transid field to catch all
the other ways inodes change.

The transaction code could (or already does?) export an easy way to
check the last commit. Put the three together and you can safely jump
out of fsync or fdatasync based on what the inode really needs instead
of guessing with the I_ flags or page dirty bits.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/