Re: [PATCH 0/4] jbd: possible filesystem corruption fixes
From: Hidehiro Kawai
Date: Wed Apr 23 2008 - 08:46:18 EST
Andreas Dilger wrote:
> On Apr 18, 2008 12:26 -0700, Mingming Cao wrote:
>
>>On Fri, 2008-04-18 at 10:09 -0400, Josef Bacik wrote:
>>
>>>On Fri, Apr 18, 2008 at 10:00:54PM +0900, Hidehiro Kawai wrote:
>>>
>>>>Subject: [PATCH 0/4] jbd: possible filesystem corruption fixes
>>>>
>>>>The current JBD is not sufficient for I/O error handling. It can
>>>>cause filesystem corruption. An example scenario:
>>>>
>>>>1. fail to write a metadata buffer to block B in the journal
>>>>2. succeed to write the commit record
>>>>3. the system crashes, reboots and mount the filesystem
>>>>4. in the recovery phase, succeed to read data from block B
>>>>5. write back the read data to the filesystem, but it is a stale
>>>> metadata
>>>>6. lose some files and directories!
>>>>
>>>>This scenario is a rare case, but it (temporal I/O error)
>>>>can occur. If we abort the journal between 1. and 2., this
>>>>tragedy can be avoided.
>>>>
>>>>This patch set fixes several error handling problems to protect
>>>>from filesystem corruption caused by I/O errors. It has been
>>>>done only for JBD and ext3 parts.
>>
>>Could you sent Ext4/JBD2 version patches? Thanks!
>
>
> Actually, the journal checksum in ext4/jbd2 detects this kind of error,
> as well as errors that are NOT reported to the caller (e.g. media errors
> not reported to the kernel).
It's interesting feature. I read the journal checksum patch,
it seems to fix the problem addressed by PATCH 3/4.
However, journal checksum feature is optional, so PATCH 3/4
will be needed as long as checksuming feature isn't turned
on always.
> One question is whether we want to _introduce_ a point of failure to the
> filesystem that may never actually cause a problem for the system,
> since the journal is only needed in the case of a crash. By aborting
> the journal at this point instead of letting the checkpoint write the
> data to the filesystem then we are guaranteed a filesystem failure
> instead of "likely no problem at all".
I think it depends on the system and administrator.
When we failed to write metadata to the journal, we...
(a) abort journaling
- the filesystem can keep a consistent state if the system
crashed
- the system will stop because the filesystem becomes read-only
state (default)
(b) only do printk()
- the system can continue to work
- bad journalled data may break the file system if the system
crashed
A user who demands high data integrity will choose (a), and
a user who demands high availability will choose (b).
We might want to enable the user to specify the behavior
on error such as the "errors" mount option.
> The journal checksum would detect the bad data in the transaction in the
> cases where it is important, and during operation it makes more sense
> to report the error via printk() so the administrator has some chance to
> do something about it. There is no reason why the jbd2 change couldn't be
> merged back to jbd so ext3 could use the journal checksumming. It is a
> "COMPAT" journal feature.
It's interesting. For example, when a fsync operation is issued,
commit the current transaction, then read the journalled data of
that transaction to check the checksum. If the bad data is detected,
flush the whole journal. Aborting the journal will also make sense
because the journal space is errorneous.
Regards,
--
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/