Re: ext3_ordered_writepage() questions

From: Stephen C. Tweedie
Date: Fri Mar 17 2006 - 17:20:25 EST


On Fri, 2006-03-17 at 13:32 -0800, Badari Pulavarty wrote:

> I have a patch which eliminates adding buffers to the journal, if
> we are doing just re-write of the disk block. ...

>          2.6.16-rc6    2.6.16-rc6+patch
> real     0m6.606s      0m3.705s

OK, that's a really significant win! What exactly was the test case for
this, and does that performance edge persist for a longer-running test?

> In real world, does this ordering guarantee matter ?

Not that I am aware of. Even with the ordering guarantee, there is
still no guarantee of the order in which the writes hit disk within that
transaction, which makes it hard to depend on it.

I recall that some versions of fsync depended on ordered mode flushing
dirty data on transaction commit, but I don't think the current
ext3_sync_file() will have any problems there.

Other than that, the only thing I can think of that had definite
dependencies in this area was InterMezzo, and that's no longer in the
tree. Even then, I'm not 100% certain that InterMezzo had a dependency
for overwrites (it was certainly strongly dependent on the ordering
semantics for allocates.)

It is theoretically possible to write applications that depend on that
ordering, but they would be necessarily non-portable anyway. I think
relaxing it is fine, especially for a 100% (wow) performance gain.
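
To illustrate the portable alternative: an application that needs
overwritten data to be on disk before it proceeds can request that
explicitly with fsync(), rather than relying on ordered-mode commit
behaviour. The following is only a minimal sketch; the helper name
overwrite_block_durably() is hypothetical and does not come from the
patch under discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): overwrite len bytes at
 * offset in an existing file, then flush them with fsync().
 * Returns 0 on success, -1 on error.
 */
int overwrite_block_durably(const char *path, off_t offset,
                            const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY);  /* no O_CREAT: existing files only */

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);

        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, retry the write */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;                     /* handle short writes */
        offset += n;
        len -= (size_t)n;
    }

    /* Explicit durability point, independent of the journalling mode. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Because the flush is requested by the application itself, this pattern
does not care whether the filesystem orders overwrites against the
transaction commit.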

There is one other perspective to be aware of, though: the current
behaviour means that by default ext3 generally starts flushing pending
writeback data within 5 seconds of a write. Without that, we may end up
accumulating a lot more dirty data in memory, shifting the task of write
throttling from the filesystem to the VM.

That's not a problem per se, just a change of behaviour to keep in mind,
as it could expose different corner cases in the performance of
write-intensive workloads.
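
For anyone wanting to see which VM settings would then govern writeback
on a test machine, the sketch below reads a single integer tunable from
a procfs-style file. The helper name read_tunable() is hypothetical.
The paths mentioned afterwards are the standard Linux VM sysctls, but
which of them matter for a given workload is an assumption to verify by
measurement.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer value from a procfs-style
 * file such as /proc/sys/vm/dirty_ratio.
 * Returns 0 and stores the value on success, -1 on error.
 */
int read_tunable(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;

    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Candidate paths to pass in are /proc/sys/vm/dirty_background_ratio,
/proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_expire_centisecs.
Comparing these against the 5 second ext3 commit interval gives a rough
idea of how much longer dirty data may sit in memory once the
filesystem no longer forces it out at commit time.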

