Re: Are there u32 atomic bitops? (or dealing w/ i_flags)

From: Dave Chinner
Date: Thu Dec 20 2012 - 02:04:08 EST


On Tue, Dec 18, 2012 at 02:20:06PM -0800, Andy Lutomirski wrote:
> On Tue, Dec 18, 2012 at 1:30 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Mon, Dec 17, 2012 at 06:42:44PM -0800, Andy Lutomirski wrote:
> >> On Mon, Dec 17, 2012 at 5:57 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >> > On Mon, Dec 17, 2012 at 05:10:21PM -0800, Andy Lutomirski wrote:
> >> >> I want to change inode->i_flags access to be atomic -- there are some
> >> >> locking oddities right now, I think, and I want to use a new inode
> >> >> flag to signal mtime updates from page_mkwrite. The problem is that
> >> >> i_flags is an unsigned int, and making it an unsigned long seems like
> >> >> a waste, but there aren't any u32 atomic bitops.
> >> >
> >> > ... and atomic accesses cost more. A lot more on some architectures.
> >> > FWIW, atomic_t *is* 32bit on 32bit architectures, which still doesn't
> >> > make it a good idea.
> >>
> >> Are atomic_set_mask and atomic_clear_mask as fast as set_bit and
> >> friends on all archs?
> >>
> >> In any case, i_flags looks like it's rarely written, so I find it a
> >> bit hard to believe that making it atomic would hurt. Isn't
> >> atomic_read equivalent to non-atomic reads everywhere?
> >>
> >> I want page_mkwrite to set a flag (without taking i_mutex) but *not*
> >> call file_update_time and then to have the writeback paths update the
> >> inode time.
> >
> > Deadlocks ahoy!
> >
> > We don't currently take the i_mutex anywhere in the writeback path
> > as the writeback path is nested inside the i_mutex. Hence this seems
> > like an extremely dangerous thing to do...
>
> Hmm.
>
> This can all be made fs-specific: page_mkclean_file sets a bit in the
> inode flags or inode state or address_space_flags. (The latter might
> be nice -- it's already atomic and I think there are plenty of bits
> free.) Then a filesystem could stop calling file_update_time in
> ->page_mkwrite and instead test-clear that bit in ->writepage.

writepage happens with pages locked. There's a limit to what you can
do in a filesystem while a page is locked, and you most certainly
don't want to be taking the i_mutex at that point...

> At that point, maybe the fs knows it's safe to mark the inode dirty
> and update times. (The actual time fields don't seem to be uniformly
> protected by any lock.) Can ext4_mark_inode_dirty be called from
> ->writepage?

Probably, but that's a filesystem internal function that has to be
called from with a valid transaction handle.

The fact you are conerned about this function tells me something
important - that you aren't having problems with i_mutex (show me
where i_mutex is taken on the page_mkwrite path ;), but you are
having latency problems with the ext4 .dirty_inode method starting a
new transaction when it is called from mark_inode_dirty_sync().

So, a filesystem specific problem, perhaps?

> Filesystems that haven't been converted can will continue to update
> times in ->page_mkwrite.

You don't need to change this at all. If you have ext4
implement .update_timestamp to do whatever timestamp trickery you
want and avoid ext4 starting a new transaction in .dirty_inode for
pure timestamp updates, you can move the timestamp update into the
ext4 writeback path. The ext4 writeback path is already quite
special, so I'm sure people would welcome another weird behaviour
being added to it :)

IOWs, what you want to do doesn't seem to require any changes to the
generic code. Just make it do timestamp updates in a manner similar
to XFS and btrfs, and you can handle it all completely internally to
the filesystem...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/