Re: Ext4: deadlock occurs when running fsstress and ENOSPC errors are seen.
From: Andreas Dilger
Date: Wed Apr 16 2014 - 01:23:12 EST
On Apr 15, 2014, at 11:07 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> On Wed, Apr 16, 2014 at 10:30:10AM +0530, Amit Sahrawat wrote:
>> 4) Corrupt the block group ‘1’ by writing all ‘1’, we had one file
>> with all 1’s, so using ‘dd’ –
>> dd if=i_file of=/dev/sdb1 bs=4096 seek=17 count=1
>> After this mount the partition – create few random size files and then
>> ran ‘fsstress,
>
> Um, sigh. You didn't say that you were deliberately corrupting the
> file system. That wasn't in the subject line, or anywhere else in the
> original message.
>
> So the question isn't how the file system got corrupted, but that
> you'd prefer that the system recovers without hanging after this
> corruption.
>
> I wish you had *said* that. It would have saved me a lot of time,
> since I was trying to figure out how the system had gotten so
> corrupted (not realizing you had deliberately corrupted the file
> system).
Don't we check the bitmaps upon load to verify they are not bogus?
Looks like this is disabled completely if flex_bg is enabled, though
I guess that it is not impossible to do some checking even if flex_bg
is enabled.
For example, we have the background thread to do the inode table zeroing,
and it could load the block bitmaps and check the group descriptors for
itable and bitmap locations against the various bitmaps. With flex_bg,
this would probably only load a small subset of block bitmaps.
> So I think if you run "tune2fs -e remount-ro /dev/sdb1" before you
> started the fsstress, the file system would have remounted the
> filesystem read-only at the first EXT4-fs error message. This would
> avoid the hang that you saw, since the file system would hopefully
> "failed fast", before th euser had the opportunity to put data into
> the page cache that would be lost when the system discovered there was
> no place to put the data.
Even without remount-ro, it would be possible to flag the block bitmaps
for a particular group as corrupt, so that no new allocations are done
from that group until the next e2fsck is run.
Cheers, Andreas
Attachment:
signature.asc
Description: Message signed with OpenPGP using GPGMail