Re: [RFC] ext3: per-process soft-syncing data=ordered mode

From: Al Boldi
Date: Sun Feb 10 2008 - 09:55:27 EST


Andreas Dilger wrote:
> On Jan 26, 2008 08:27 +0300, Al Boldi wrote:
> > Jan Kara wrote:
> > > > data=ordered mode has proven reliable over the years, and it does
> > > > this by ordering filedata flushes before metadata flushes. But this
> > > > sometimes causes contention in the order of a 10x slowdown for
> > > > certain apps, either due to the misuse of fsync or due to inherent
> > > > behaviour like db's, as well as inherent starvation issues exposed
> > > > by the data=ordered mode.
> > > >
> > > > data=writeback mode alleviates data=order mode slowdowns, but only
> > > > works per-mount and is too dangerous to run as a default mode.
> > > >
> > > > This RFC proposes to introduce a tunable which allows to disable
> > > > fsync and changes ordered into writeback writeout on a per-process
> > > > basis like this:
> > > >
> > > > echo 1 > /proc/`pidof process`/softsync
> > >
> > > I guess disabling fsync() was already commented on enough. Regarding
> > > switching to writeback mode on per-process basis - not easily possible
> > > because sometimes data is not written out by the process which stored
> > > them (think of mmaped file).
> >
> > Do you mean there is a locking problem?
> >
> > > And in case of DB, they use direct-io
> > > anyway most of the time so they don't care about journaling mode
> > > anyway.
> >
> > Testing with sqlite3 and mysql4 shows that performance drastically
> > improves with writeback writeout.
> >
> > > But as Diego wrote, there is definitely some room for improvement in
> > > current data=ordered mode so the difference shouldn't be as big in the
> > > end.
> >
> > Yes, it would be nice to get to the bottom of this starvation problem,
> > but even then, the proposed tunable remains useful for misbehaving apps.
>
> Al, can you try a patch posted to linux-fsdevel and linux-ext4 from
> Hisashi Hifumi <hifumi.hisashi@xxxxxxxxxxxxx> to see if this improves
> your situation? Dated Mon, 04 Feb 2008 19:15:25 +0900.
>
> [PATCH] ext3,4:fdatasync should skip metadata writeout when
> overwriting
>
> It may be that we already have a solution in that patch for database
> workloads where the pages are already allocated by avoiding the need
> for ordered mode journal flushing in that case.

Well, it seems that it does have a positive effect for the 'konqueror hangs'
case, but doesn't improve the db case.

This shouldn't be surprising, as the db redundant writeout problem is
localized not in fsync but rather in ext3_ordered_write_end.

Maybe some form of a staged merged commit could help.


Thanks!

--
Al

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/