Re: Ext4 and the "30 second window of death"

From: Theodore Tso
Date: Tue Mar 31 2009 - 09:46:04 EST


On Tue, Mar 31, 2009 at 02:52:05PM +0200, Alberto Gonzalez wrote:
>
> You've proposed that in laptop mode, fsync's should be held until next write
> cycle (say every 30 seconds) so that the disk is not spun up unnecessarily,
> wasting battery and shortening its lifespan too. I absolutely agree with
> this, and as a trade-off I'm ok with losing my last paragraph even if I did hit
> Ctrl+S to save it a few seconds before a crash. But again, with Ext4 will I
> just lose that last paragraph or the whole book in this case?

Laptop mode is already set up such that the moment the disk spins up,
any pending writes are immediately flushed to disk --- the idea being
that if the disk is spinning, we might as well take advantage of it to
get everything pushed out to disk. As long as we actually keep a
linked list of those fsync's which were "held up", and we make sure
all of the delayed allocation blocks are also allocated before we push
them out, the right thing will happen. If we just ignore the fsync's,
then we might not allocate the delayed allocation blocks. So
basically, we need to be careful about how we implement this addition
to laptop_mode.
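
To make the ordering concrete, here is a minimal user-space sketch in
Python of the "remember held-up fsync's, allocate first, then push
out" idea. It is a toy model and not kernel code: the names File,
HeldFsyncQueue, allocate_delayed, push_out and on_spin_up are all
hypothetical and were made up for this illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class File:
    """Toy file: 'delayed' holds data written but with no blocks allocated yet."""
    name: str
    delayed: list = field(default_factory=list)    # dirty data, no blocks assigned
    allocated: list = field(default_factory=list)  # data with blocks assigned
    on_disk: list = field(default_factory=list)    # data that has been written out

    def write(self, data):
        self.delayed.append(data)

    def allocate_delayed(self):
        # Resolve delayed allocation: give every pending write real blocks.
        self.allocated.extend(self.delayed)
        self.delayed.clear()

    def push_out(self):
        # Only allocated data is written; anything still delayed is skipped.
        self.on_disk.extend(self.allocated)
        self.allocated.clear()


class HeldFsyncQueue:
    """Hypothetical laptop-mode helper: remembers fsyncs instead of ignoring them."""

    def __init__(self):
        self.disk_spinning = False
        self.held = deque()  # files whose fsync was held up, in arrival order

    def fsync(self, f):
        if self.disk_spinning:
            self._flush(f)
        else:
            self.held.append(f)  # remember the request; do not drop it

    def on_spin_up(self):
        # The disk is spinning anyway, so service every held fsync in order.
        self.disk_spinning = True
        while self.held:
            self._flush(self.held.popleft())

    @staticmethod
    def _flush(f):
        f.allocate_delayed()  # allocate the delayed blocks first ...
        f.push_out()          # ... and only then push them out
```

In this model, an fsync issued while the disk is spun down leaves
on_disk empty until on_spin_up() runs, at which point the data is
allocated and written. If fsync() simply returned without queueing,
nothing would ever call allocate_delayed() for that file, which is the
failure mode described above.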

Jeff Garzik has also pointed out that there are additional concerns
for databases which may have issued multiple fsync()'s while the disk
has been spun down, where we wouldn't want to mix writes between
fsync()'s. This basically boils down to how much protection do we
want to give for the case where the system crashes while the disk
blocks are being pushed out to disk. (Which isn't that farfetched;
consider the case where the laptop is very low on battery, and runs
out when the disk is woken up and crashes before all of the writes
could be processed.)
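
One way to picture the "don't mix writes between fsync()'s" concern is
to group writes into epochs, one per held fsync(), and write the
epochs out strictly in order. The Python sketch below is only an
illustration under that assumption: EpochLog and its
crash_after_epochs parameter are hypothetical, and a real disk gives
no guarantee that a whole epoch lands atomically.

```python
class EpochLog:
    """Toy model: writes are grouped into epochs, one epoch per held fsync()."""

    def __init__(self):
        self.closed = []   # epochs sealed by an fsync(), oldest first
        self.current = []  # writes issued since the last fsync()
        self.on_disk = []  # what has actually been written out

    def write(self, data):
        self.current.append(data)

    def fsync(self):
        # Seal the current epoch; later writes belong to the next one.
        self.closed.append(self.current)
        self.current = []

    def flush(self, crash_after_epochs=None):
        """Write sealed epochs strictly in order.

        crash_after_epochs simulates losing power after that many epochs
        have been written. Returns the number of epochs written.
        """
        done = 0
        while self.closed:
            if crash_after_epochs is not None and done == crash_after_epochs:
                break  # simulated crash: remaining epochs never reach disk
            self.on_disk.extend(self.closed.pop(0))
            done += 1
        return done
```

Because epochs are never interleaved, a crash partway through the
flush leaves a prefix of the fsync() history on disk: a database could
see transaction 1 without transaction 2, but not pieces of
transaction 2 without transaction 1.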

So there are some things that would be tricky in terms of implementing
this perfectly, and maybe we would disable the fsync suppression
machinery if the battery level is getting critical --- and then do
either a clean shutdown or a suspend-to-disk (although here too there
had better be enough juice in the battery to write all of memory to
your swap partition).
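
The battery cutoff could be expressed as a small policy check like the
Python sketch below. The function name should_hold_fsync, its
parameters, and the 10% default threshold are assumptions made for
illustration only; nothing here corresponds to an existing kernel
interface.

```python
def should_hold_fsync(laptop_mode, disk_spinning, battery_pct, critical_pct=10):
    """Return True if an fsync() may be deferred until the next spin-up.

    All parameters are hypothetical inputs for this sketch:
      laptop_mode   -- whether laptop mode is enabled
      disk_spinning -- whether the disk is currently spun up
      battery_pct   -- remaining battery charge, 0-100
      critical_pct  -- at or below this level, suppression is disabled
    """
    if not laptop_mode or disk_spinning:
        return False  # nothing to gain: write through immediately
    if battery_pct <= critical_pct:
        return False  # battery is critical: stop holding fsyncs
    return True
```

With these assumptions, should_hold_fsync(True, False, 50) allows the
fsync to be held, while should_hold_fsync(True, False, 5) forces it
through so that the data is on disk before a shutdown or
suspend-to-disk is attempted.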

The bottom line is that it *can* be implemented safely, but there are
some things that we would need to pay attention to in order to make
sure it *was* safe.

- Ted