Re: True fsync() in Linux (on IDE)

From: Peter Zaitsev
Date: Sat Mar 20 2004 - 14:51:08 EST


On Sat, 2004-03-20 at 02:20, Jamie Lokier wrote:
> Peter Zaitsev wrote:
> > If file system would guaranty atomicity of write() calls (synchronous
> > would be enough) we could disable it and get good extra performance.
>
> Store an MD5 or SHA digest of the page in the page itself, or elsewhere.
> (Obviously the digest doesn't include the bytes used to store it).
>
> Then partial write errors are always detectable, even if there's a
> hardware failure, so journal writes are effectively atomic.

Jamie,

The problem is not detecting the partial page writes, but dealing with
them. Obviously there is checksum on the page (it is however not
MD5/SHA which are designed for cryptographic needs) and so page
corruption is detected if it happens for whatever reason.

The problem is you can't do anything with the page if only unknown
portion of it was modified.

Innodb uses sort of "logical" logging which just says something like
delete row #2 from page #123, so if page is badly corrupted it will not
help to recover.

Of course you can log full pages, but this will increase overhead
significantly, especially for small row sizes.

This is why solution now is to use long term "logical" log and short
term "physical" log, which is used by background page writer, before
writing pages to their original locations.


--
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

Meet the MySQL Team at User Conference 2004! (April 14-16, Orlando,FL)
http://www.mysql.com/uc2004/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/