Re: [Ocfs2-devel] [PATCH v2 4/4] ocfs2: check/fix inode block for online file check

From: Junxiao Bi
Date: Wed Nov 25 2015 - 00:45:14 EST


On 11/25/2015 01:04 PM, Gang He wrote:
> Hi Mark and Junxiao,
>
>
>>>>
>> Hi Mark,
>>
>> On 11/25/2015 06:16 AM, Mark Fasheh wrote:
>>> Hi Junxiao,
>>>
>>> On Tue, Nov 03, 2015 at 03:12:35PM +0800, Junxiao Bi wrote:
>>>> Hi Gang,
>>>>
>>>> This is not like a right patch.
>>>> First, online file check only checks inode's block number, valid flag,
>>>> fs generation value, and meta ecc. I never see a real corruption
>>>> happened only on this field, if these fields are corrupted, that means
>>>> something bad may happen on other place. So fix this field may not help
>>>> and even cause corruption more hard.
>>>
>>> I agree that these are rather uncommon, we might even consider removing the
>>> VALID_FL fixup. I definitely don't think we're ready for anything more
>>> complicated than this though either. We kind of have to start somewhere too.
>>>
>> Yes, the fix is too simple, and just a start, I think we'd better wait
>> more useful parts done before merging it.
> I agree, just remark VALID_FL flag to fix this field is too simple, we should delay this field fix before
> I have a flawless solution, I will remove these lines code in the first version patches. In the future submits,
> I also hope your guys to help review the code carefully, shout out your comments when you doubt somewhere.
Sure.

>
>
>
>>>
>>>> Second, the repair way is wrong. In
>>>> ocfs2_filecheck_repair_inode_block(), if these fields in disk don't
>>>> match the ones in memory, the ones in memory are used to update the disk
>>>> fields. The question is how do you know these field in memory are
>>>> right(they may be the real corrupted ones)?
>>>
>>> Your second point (and the last part of your 1st point) makes a good
>>> argument for why this shouldn't happen automatically. Some of these
>>> corruptions might require a human to look at the log and decide what to do.
>>> Especially as you point out, where we might not know where the source of the
>>> corruption is. And if the human can't figure it out, then it's probably time
>>> to unmount and fsck.
>> The point is that the fix way is wrong, just flush memory info to disk
>> is not right. I agree online fsck is good feature, but need carefully
>> design, it should not involve more corruptions. A rough idea from mine
>> is that maybe we need some "frezee" mechanism in fs, which can hung all
>> fs op and let fs stop at a safe area. After freeze fs, we can do some
>> fsck work on it and these works should not cost lots time. What's your idea?
> If we need to touch some global data structures, freezing fs can be considered when we can't
> get any way in case using the locks.
> If we only handle some independent problem, we just need to lock the related data structures.
Hmm, I am not sure whether it's hard to decide an independent issue.

Thanks,
Junxiao.
>
>>
>> Thanks,
>> Junxiao.
>>
>>>
>>> Thanks,
>>> --Mark
>>>
>>> --
>>> Mark Fasheh
>>>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/