On Fri 11-10-24 10:18:04, Baokun Li wrote:
On 2024/10/9 23:50, Jan Kara wrote:Yes, we could implement something like this be as you wrote, it's going to
Yes, there can be a lot of work involved.Or go one step further and add a mechanism like xfs Reverse-Mapping, whichWell, that is a rather big change. It requires significant on-disk format
makes sure that allocated blocks do point to the target inode, which could
replace the current block_validity, and could also be used in future online
fscks.
change and also performance cost when to maintain. Furthermore for xattr
blocks which can be shared by many inodes it is not even clear how to
implement this... So I'm not sure we really want to do this either.
* Perhaps we could create an rmap file to store the rmap tree to avoid
on-disk format changes.
* The performance impact of maintaining rmap really needs to be evaluated,
perhaps by writing a simple DEMO to test it.
* XFS supports shared blocks(A.K.A. reflink.), so even if the physical
blocks are the same, but the inodes are different or the logical blocks
are different, they will be recorded multiple times in the tree. So the
shared xattr block can be handled similarly.
We have plans to support online fsck in the future, and implementing rmap
is one of the steps. Perhaps one can wait until rmap is implemented to
assess whether it is worth a strict check here.
be a lot of work. We've briefly discussed this with Ted on ext4 call and we
came to a conclusion that this is a type of corruption ext4 may never
protect agaist. You simply should not mount arbitrarily corrupted
filesystems...
But if you want to try, sure go ahead :)As server clusters get larger and larger, server maintenance becomes very
One relatively easy solution to similar class of problems would be to storeThis solution looks great. If we save checksum further, we can sense not
the type of metadata buffer inside the buffer_head when we are verifying
checksum, clear the info when freeing the block in __ext4_forget(), and
fail with EFSCORRUPTED error when one type -> another type transition would
happen.
Implementing rmap may take some time, until then we can avoid the problemHum, there are more places in xattr code that access a buffer that could
as much as possible by checking the magic and xattr block csum.
Maybe something like this?
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 7647e9f6e190..cd3ae1e3371c 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1676,6 +1676,13 @@ static int ext4_xattr_set_entry(struct
ext4_xattr_info *i,
}
}
+ if (WARN_ON_ONCE(last < here)) {
+ EXT4_ERROR_INODE(inode, "corrupted xattr entries in %s",
+ is_block ? "block" : "ibody");
+ ret = -EFSCORRUPTED;
+ goto out;
+ }
+
/* Check whether we have enough space. */
if (i->value) {
size_t free;
@@ -1923,6 +1930,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode
*inode,
}
if (s->base) {
+ struct ext4_xattr_header *hdr;
int offset = (char *)s->here - bs->bh->b_data;
BUFFER_TRACE(bs->bh, "get_write_access");
@@ -1932,6 +1940,16 @@ ext4_xattr_block_set(handle_t *handle, struct inode
*inode,
goto cleanup;
lock_buffer(bs->bh);
+ hdr = header(s->base);
+
+ if (hdr->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
+ (ext4_has_metadata_csum(inode->i_sb) &&
+ (ext4_xattr_block_csum(inode, bs->bh->b_blocknr, hdr)
!=
+ hdr->h_checksum))) {
+ unlock_buffer(bs->bh);
+ error = -EFSCORRUPTED;
+ goto bad_block;
+ }
if (header(s->base)->h_refcount == cpu_to_le32(1)) {
__u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash);
have been modified. So why do you add check into this place? Is it somehow
special?
Honza