Re: 2.6.20-rc4: known unfixed regressions (v2)

From: Vladimir V. Saveliev
Date: Wed Jan 10 2007 - 19:25:54 EST


Hello

On Tuesday 09 January 2007 21:30, Linus Torvalds wrote:
>
> On Tue, 9 Jan 2007, Malte Schröder wrote:
> >
> > > So something interesting is definitely going on, but I don't know exactly
> > > what it is. Why does reiserfs do the truncate as part of a close, if the
> > > same inode is actually mapped somewhere else?

on file close reiserfs tries to "pack" content of last incomplete page of file into metadata blocks.
It should not if that page is still mapped somewhere.
It does not actually truncate, it calls the same function which does truncate, but file size does not change.

Please consider the below patch.

From: Vladimir Saveliev <vs@xxxxxxxxxxx>

This patch fixes a confusion reiserfs has for a long time.

On release file operation reiserfs used to try to pack file data stored in last incomplete page of some files
into metadata blocks. After packing the page got cleared with clear_page_dirty.
It did not take into account that the page may be mmaped into
other process's address space. Recent replacement for clear_page_dirty cancel_dirty_page found the confusion
with sanity check that page has to be not mapped.

The patch fixes the confusion by making reiserfs to avoid tail packing if an inode was ever mmapped.
reiserfs_mmap and reiserfs_file_release are serialized with mutex in reiserfs specific inode.
reiserfs_mmap locks the mutex and sets a bit in reiserfs specific inode flags.
reiserfs_file_release checks the bit having the mutex locked. If bit is set - tail packing is avoided.
This eliminates a possibility that mmapped page gets cancel_page_dirty-ed.

Signed-off-by: Vladimir Saveliev <vs@xxxxxxxxxxx>



diff -puN fs/reiserfs/file.c~reiserfs-dont-convert-if-tail-page-mapped fs/reiserfs/file.c
--- linux-2.6.20-rc4/fs/reiserfs/file.c~reiserfs-dont-convert-if-tail-page-mapped 2007-01-11 02:09:19.000000000 +0300
+++ linux-2.6.20-rc4-vs/fs/reiserfs/file.c 2007-01-11 02:09:19.000000000 +0300
@@ -48,6 +48,11 @@ static int reiserfs_file_release(struct
}

mutex_lock(&inode->i_mutex);
+
+ mutex_lock(&(REISERFS_I(inode)->i_mmap));
+ if (REISERFS_I(inode)->i_flags & i_ever_mapped)
+ REISERFS_I(inode)->i_flags &= ~i_pack_on_close_mask;
+
reiserfs_write_lock(inode->i_sb);
/* freeing preallocation only involves relogging blocks that
* are already in the current transaction. preallocation gets
@@ -100,11 +105,24 @@ static int reiserfs_file_release(struct
err = reiserfs_truncate_file(inode, 0);
}
out:
+ mutex_unlock(&(REISERFS_I(inode)->i_mmap));
mutex_unlock(&inode->i_mutex);
reiserfs_write_unlock(inode->i_sb);
return err;
}

+static int reiserfs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ inode = file->f_path.dentry->d_inode;
+ mutex_lock(&(REISERFS_I(inode)->i_mmap));
+ REISERFS_I(inode)->i_flags |= i_ever_mapped;
+ mutex_unlock(&(REISERFS_I(inode)->i_mmap));
+
+ return generic_file_mmap(file, vma);
+}
+
static void reiserfs_vfs_truncate_file(struct inode *inode)
{
reiserfs_truncate_file(inode, 1);
@@ -1527,7 +1545,7 @@ const struct file_operations reiserfs_fi
#ifdef CONFIG_COMPAT
.compat_ioctl = reiserfs_compat_ioctl,
#endif
- .mmap = generic_file_mmap,
+ .mmap = reiserfs_file_mmap,
.open = generic_file_open,
.release = reiserfs_file_release,
.fsync = reiserfs_sync_file,
diff -puN fs/reiserfs/inode.c~reiserfs-dont-convert-if-tail-page-mapped fs/reiserfs/inode.c
--- linux-2.6.20-rc4/fs/reiserfs/inode.c~reiserfs-dont-convert-if-tail-page-mapped 2007-01-11 02:09:19.000000000 +0300
+++ linux-2.6.20-rc4-vs/fs/reiserfs/inode.c 2007-01-11 02:14:57.000000000 +0300
@@ -1125,6 +1125,7 @@ static void init_inode(struct inode *ino
REISERFS_I(inode)->i_prealloc_count = 0;
REISERFS_I(inode)->i_trans_id = 0;
REISERFS_I(inode)->i_jl = NULL;
+ mutex_init(&(REISERFS_I(inode)->i_mmap));
reiserfs_init_acl_access(inode);
reiserfs_init_acl_default(inode);
reiserfs_init_xattr_rwsem(inode);
@@ -1832,6 +1833,7 @@ int reiserfs_new_inode(struct reiserfs_t
REISERFS_I(inode)->i_attrs =
REISERFS_I(dir)->i_attrs & REISERFS_INHERIT_MASK;
sd_attrs_to_i_attrs(REISERFS_I(inode)->i_attrs, inode);
+ mutex_init(&(REISERFS_I(inode)->i_mmap));
reiserfs_init_acl_access(inode);
reiserfs_init_acl_default(inode);
reiserfs_init_xattr_rwsem(inode);
diff -puN include/linux/reiserfs_fs_i.h~reiserfs-dont-convert-if-tail-page-mapped include/linux/reiserfs_fs_i.h
--- linux-2.6.20-rc4/include/linux/reiserfs_fs_i.h~reiserfs-dont-convert-if-tail-page-mapped 2007-01-11 02:09:19.000000000 +0300
+++ linux-2.6.20-rc4-vs/include/linux/reiserfs_fs_i.h 2007-01-11 02:09:19.000000000 +0300
@@ -25,6 +25,7 @@ typedef enum {
i_link_saved_truncate_mask = 0x0020,
i_has_xattr_dir = 0x0040,
i_data_log = 0x0080,
+ i_ever_mapped = 0x0100
} reiserfs_inode_flags;

struct reiserfs_inode_info {
@@ -52,6 +53,7 @@ struct reiserfs_inode_info {
** flushed */
unsigned long i_trans_id;
struct reiserfs_journal_list *i_jl;
+ struct mutex i_mmap;
#ifdef CONFIG_REISERFS_FS_POSIX_ACL
struct posix_acl *i_acl_access;
struct posix_acl *i_acl_default;

_



> > > And if it's a race with two
> > > different CPU's (one doing a "munmap()" and the other doing a "close()",
> > > then the unmap should _still_ have actually unmapped the pages before it
> > > actually did _its_ "release()" call.
> >
> > This was on a single core. But with CONFIG_PREEMPT_VOLUNTARY=y.
> > It didn't happen again since then.
>
> Yeah, PREEMPT would be able to show most races like this too. In fact,
> some races show up much better with preemption than they do with real SMP.
>
> But I haven't looked at what exactly reiserfs does. I did check that the
> VM layer definitely does the remove_vma() stuff (that actually closes the
> files) _after_ it has unmapped everything. It would have surprised me if
> we had had that kind of bug, but still..
>
> Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/