Re: Linux 2.6.29

From: Theodore Tso
Date: Wed Mar 25 2009 - 11:01:52 EST


On Wed, Mar 25, 2009 at 01:37:44PM +0100, Jan Kara wrote:
> > Also, we do have to reliably get a lock on the buffer when moving it
> > between lists and inspecting its internal state. Otherwise a competing
> > read from the underlying block device can trigger an assertion failure,
> > and a competing write to the underlying block device can confuse ext3
> > journalling state completely.
>
> I've looked at this a bit. I suppose you mean the contention arising from
> us taking the buffer lock in do_get_write_access()? But it's not obvious
> to me why we'd be contending there... We call this function only for
> metadata buffers (unless in data=journal mode) so there isn't huge amount
> of these blocks.

There isn't a huge number of those blocks, but if inode #1220 was
modified in the previous transaction which is now being committed, and
we then need to modify and write out inode #1221 in the current
transaction, and they share the same inode table block, that would
cause the contention. That probably doesn't happen that often in a
synchronous code path, but it probably happens more often than you're
thinking. I still think the fsync() problem is the much bigger deal,
and solving the contention problem isn't going to solve the fsync()
latency problem with ext3 data=ordered mode.
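
To make the inode table sharing concrete, here is a rough sketch of the
arithmetic. The geometry (4 KiB blocks, 128-byte on-disk inodes, 8192
inodes per group) is an assumption picked for illustration and is not
taken from this thread; the function names are hypothetical.

```python
# Illustrative only: these geometry values are assumptions, not from the thread.
BLOCK_SIZE = 4096        # bytes per filesystem block (assumed)
INODE_SIZE = 128         # bytes per on-disk inode (assumed)
INODES_PER_GROUP = 8192  # inodes per block group (assumed)


def inode_table_slot(ino):
    """Return (group, block index within that group's inode table) for `ino`."""
    inodes_per_block = BLOCK_SIZE // INODE_SIZE
    index = ino - 1  # inode numbers start at 1
    group = index // INODES_PER_GROUP
    block_in_table = (index % INODES_PER_GROUP) // inodes_per_block
    return group, block_in_table


def share_block(a, b):
    """True if inodes `a` and `b` live in the same inode table block."""
    return inode_table_slot(a) == inode_table_slot(b)


if __name__ == "__main__":
    for ino in (1220, 1221):
        print(ino, inode_table_slot(ino))
    print("same block:", share_block(1220, 1221))
```

With these assumed values there are 32 inodes per block, so inodes
#1220 and #1221 both fall in block 38 of group 0's inode table, and a
change to either one dirties the same metadata buffer.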

> Also when I emailed with a few people about these sync problems, they
> wrote that switching to data=writeback mode helps considerably so this
> would indicate that handling of ordered mode data buffers is causing most
> of the slowdown...

Yes, but we need to be clear whether this was an fsync() problem or
some other random delay problem. If it's the fsync() problem,
obviously data=writeback will solve the fsync() latency problem.
(As will using delayed allocation in ext4 or XFS.)
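
One way to tell the two cases apart might be to time fsync()
separately from write() on the affected filesystem. The sketch below
is only a minimal measurement harness; the helper name and the 4 KiB
payload size are assumptions, and it says nothing about which mount
options are in effect.

```python
import os
import tempfile
import time


def time_write_and_fsync(directory, size=4096):
    """Return (write_seconds, fsync_seconds) for one small write in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        payload = b"x" * size
        t0 = time.monotonic()
        os.write(fd, payload)   # normally lands in the page cache only
        t1 = time.monotonic()
        os.fsync(fd)            # blocks until the file's data reaches the device
        t2 = time.monotonic()
        return t1 - t0, t2 - t1
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    w, f = time_write_and_fsync(tempfile.gettempdir())
    print("write: %.6fs  fsync: %.6fs" % (w, f))
```

If the fsync() time dominates while some other process is streaming
data to the same filesystem, that points at the fsync() problem rather
than at some other delay.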

- Ted