[RFC] mount flag "direct"

From: Peter T. Breuer (ptb@it.uc3m.es)
Date: Tue Sep 03 2002 - 10:01:24 EST


I'll rephrase this as an RFC, since I want help and comments.

Scenario:
I have a driver which accesses a "disk" at the block level, to which
another driver on another machine is also writing. I want to have
an arbitrary FS on this device which can be read from and written to
from both kernels, and I want support at the block level for this idea.

Question:
What do people think of adding a "direct" option to mount, with the
semantics that the VFS then makes all opens on files on the FS mounted
"direct" use O_DIRECT, which means that file r/w is not cached by the VM,
but instead goes straight to and from the real device? Is this enough
or nearly enough for what I have in mind?

Rationale:
No caching means that each kernel doesn't go off with its own idea of
what is on the disk in a file, at least. Dunno about directories and
metadata.
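
One practical consequence of forcing O_DIRECT on every open, sketched in C
below: on Linux, direct I/O generally requires the user buffer address, the
file offset and the transfer length to be aligned to a block size, so
applications doing unaligned reads or writes could start seeing EINVAL on a
"direct" mount. The helper name and the block size passed in are assumptions
for illustration; the real alignment unit depends on kernel version and
filesystem.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): returns true if a request
 * satisfies the usual O_DIRECT alignment rules for block size blksz.
 * blksz is an assumption supplied by the caller.
 */
bool direct_io_request_ok(const void *buf, uint64_t offset, size_t len,
                          size_t blksz)
{
    /* The mask trick below only works for a power-of-two block size. */
    if (blksz == 0 || (blksz & (blksz - 1)) != 0)
        return false;

    size_t mask = blksz - 1;

    return ((uintptr_t)buf & mask) == 0 &&  /* buffer address aligned */
           (offset & mask) == 0 &&          /* file offset aligned */
           (len & mask) == 0;               /* transfer length aligned */
}
```

A caller would check a request with this before issuing it, or simply
allocate its buffers with posix_memalign() and keep offsets and lengths as
multiples of the block size.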

Wish:
If that mount option looks promising, can somebody make provision for
it in the kernel? Details to be ironed out later?

What I have explored or will explore:
1) I have put shared zoned read/write locks on the remote resource, so each
kernel request locks precisely the "disk" area that it should, in
precisely the mode it should, for precisely the duration of each block
layer request.

2) I have maintained request write order from individual kernels.

3) IMO I should also intercept and share the FS superblock lock, but that's
for later, and please tell me about it. What about dentries? Does
O_DIRECT get rid of them? What happens with mkdir?

4) I would LIKE the kernel to emit a "tag request" on the underlying
device before and after every atomic FS operation, so that I can maintain
FS atomicity at the block level. Please comment. Can somebody make this
happen, please? Or do I add the functionality to VFS myself? Where?
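
The zoned read/write locks of point 1 could be modelled along the following
lines. This is only a sketch, not the driver's actual code: the struct, the
function names and the use of sector units are all assumptions. The rule it
shows is that two zones conflict only if they overlap and at least one of
them is held for writing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum zone_mode { ZONE_READ, ZONE_WRITE };

/* Hypothetical description of one locked "disk" area. nsect is assumed > 0. */
struct zone_lock {
    uint64_t start;         /* first sector covered */
    uint64_t nsect;         /* number of sectors covered */
    enum zone_mode mode;    /* shared (read) or exclusive (write) */
};

/* Half-open ranges [start, start + nsect) overlap if each begins before the other ends. */
bool zones_overlap(const struct zone_lock *a, const struct zone_lock *b)
{
    return a->start < b->start + b->nsect &&
           b->start < a->start + a->nsect;
}

/* Readers may share a zone; any overlap involving a writer is a conflict. */
bool zones_conflict(const struct zone_lock *a, const struct zone_lock *b)
{
    return zones_overlap(a, b) &&
           (a->mode == ZONE_WRITE || b->mode == ZONE_WRITE);
}

/* A request may be granted only if it conflicts with no lock already held. */
bool zone_may_grant(const struct zone_lock *held, size_t nheld,
                    const struct zone_lock *req)
{
    for (size_t i = 0; i < nheld; i++)
        if (zones_conflict(&held[i], req))
            return false;
    return true;
}
```

In this model each block layer request would take a zone covering exactly
its sectors, in read or write mode, and drop it when the request completes.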

I have patched the kernel to support mount -o direct, creating MS_DIRECT
and MNT_DIRECT flags for the purpose. And it works. But I haven't
dared do too much to the remote FS by way of testing yet. I have
confirmed that individual file contents can be changed without problem
when the file size does not change.
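
The kind of in-place test described above can be approximated from user
space with an explicit O_DIRECT open, without the mount option. What follows
is a minimal sketch under stated assumptions: the function name is
hypothetical, the caller supplies the block size, and the fallback to
buffered I/O is added only so that the example still runs on filesystems
that reject O_DIRECT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* assumption: not available, degrade to buffered I/O */
#endif

/*
 * Hypothetical helper: overwrite block number `block` (blksz bytes each)
 * with the byte `fill`, refusing any write that would extend the file.
 * Returns 1 if the O_DIRECT attempt succeeded, 0 if the buffered
 * fallback was used, -1 on error.
 */
int overwrite_block_in_place(const char *path, off_t block, size_t blksz,
                             int fill)
{
    struct stat st;
    void *buf = NULL;
    int rc = -1;
    off_t pos = block * (off_t)blksz;

    if (block < 0 || stat(path, &st) != 0 ||
        pos + (off_t)blksz > st.st_size)
        return -1;              /* would change the file size: refuse */
    if (posix_memalign(&buf, blksz, blksz) != 0)
        return -1;              /* blksz must be a power of two */
    memset(buf, fill, blksz);

    /* First pass tries O_DIRECT, second pass is the buffered fallback. */
    for (int direct = 1; direct >= 0 && rc < 0; direct--) {
        int fd = open(path, O_WRONLY | (direct ? O_DIRECT : 0));
        if (fd < 0)
            continue;
        if (pwrite(fd, buf, blksz, pos) == (ssize_t)blksz)
            rc = direct;
        close(fd);
    }
    free(buf);
    return rc;
}
```

Running this from one machine and reading the block back on the other would
exercise only the fixed-size case; anything that allocates blocks or touches
metadata is outside what it checks.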

Comments?

Here is the tiny proof of concept patch for VFS that implements the
"direct" mount option.

Peter

The idea embodied in this patch is that if we get the MS_DIRECT flag when
the vfs do_mount() is called, we pass it across into the mnt flags used
by do_add_mount() as MNT_DIRECT and thus make it a permanent part of the
vfsmnt object that is the mounted fs. Then, in the generic
dentry_open() call for any file, we examine the flags on the mnt
parameter and set the O_DIRECT flag on the file pointer if MNT_DIRECT
is set on the vfsmnt object.

That makes all file opens O_DIRECT on the file system in question,
and makes all file accesses uncached by the VM.

The patch in itself works fine.
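
The flag flow described above can be modelled in user space as follows. This
is an illustrative sketch, not kernel code: the SK_-prefixed constants and
both function names are hypothetical stand-ins. The two DIRECT values (256)
and the MNT_* values are taken from the patch; the MS_NOSUID, MS_NODEV and
MS_NOEXEC values follow the usual kernel headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#endif
#include <fcntl.h>

#ifndef O_DIRECT
#define O_DIRECT 040000         /* fallback for illustration only: the x86 value */
#endif

/* Stand-ins for the mount(2) flags (MS_*) handled in do_mount(). */
#define SK_MS_NOSUID    2UL
#define SK_MS_NODEV     4UL
#define SK_MS_NOEXEC    8UL
#define SK_MS_DIRECT    256UL   /* value proposed by the patch */

/* Stand-ins for the per-mount flags (MNT_*) kept in the vfsmount. */
#define SK_MNT_NOSUID   1UL
#define SK_MNT_NODEV    2UL
#define SK_MNT_NOEXEC   4UL
#define SK_MNT_DIRECT   256UL   /* value proposed by the patch */

/*
 * Models the do_mount() step: translate per-mount MS_* bits into MNT_*
 * bits and strip them from *flags so they are not passed further down.
 */
unsigned long sk_split_mount_flags(unsigned long *flags)
{
    unsigned long mnt_flags = 0;

    if (*flags & SK_MS_NOSUID)
        mnt_flags |= SK_MNT_NOSUID;
    if (*flags & SK_MS_NODEV)
        mnt_flags |= SK_MNT_NODEV;
    if (*flags & SK_MS_NOEXEC)
        mnt_flags |= SK_MNT_NOEXEC;
    if (*flags & SK_MS_DIRECT)
        mnt_flags |= SK_MNT_DIRECT;
    *flags &= ~(SK_MS_NOSUID | SK_MS_NOEXEC | SK_MS_NODEV | SK_MS_DIRECT);
    return mnt_flags;
}

/* Models the dentry_open() step: force O_DIRECT on a "direct" mount. */
int sk_open_flags(unsigned long mnt_flags, int f_flags)
{
    if (mnt_flags & SK_MNT_DIRECT)
        f_flags |= O_DIRECT;
    return f_flags;
}
```

The point of the model is that the decision is made once per mount and then
applied to every open, whatever flags the application itself asked for.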

--- linux-2.5.31/fs/open.c.pre-o_direct Mon Sep 2 20:36:11 2002
+++ linux-2.5.31/fs/open.c Mon Sep 2 17:12:08 2002
@@ -643,6 +643,9 @@
                 if (error)
                         goto cleanup_file;
         }
+        if (mnt->mnt_flags & MNT_DIRECT)
+                f->f_flags |= O_DIRECT;
+
         f->f_ra.ra_pages = inode->i_mapping->backing_dev_info->ra_pages;
         f->f_dentry = dentry;
         f->f_vfsmnt = mnt;
--- linux-2.5.31/fs/namespace.c.pre-o_direct Mon Sep 2 20:37:39 2002
+++ linux-2.5.31/fs/namespace.c Mon Sep 2 17:12:04 2002
@@ -201,6 +201,7 @@
                 { MS_MANDLOCK, ",mand" },
                 { MS_NOATIME, ",noatime" },
                 { MS_NODIRATIME, ",nodiratime" },
+                { MS_DIRECT, ",direct" },
                 { 0, NULL }
         };
         static struct proc_fs_info mnt_info[] = {
@@ -734,7 +741,9 @@
                 mnt_flags |= MNT_NODEV;
         if (flags & MS_NOEXEC)
                 mnt_flags |= MNT_NOEXEC;
-        flags &= ~(MS_NOSUID|MS_NOEXEC|MS_NODEV);
+        if (flags & MS_DIRECT)
+                mnt_flags |= MNT_DIRECT;
+        flags &= ~(MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_DIRECT);
 
         /* ... and get the mountpoint */
         retval = path_lookup(dir_name, LOOKUP_FOLLOW, &nd);
--- linux-2.5.31/include/linux/mount.h.pre-o_direct Mon Sep 2 20:31:16 2002
+++ linux-2.5.31/include/linux/mount.h Mon Sep 2 18:06:14 2002
@@ -17,6 +17,7 @@
 #define MNT_NOSUID 1
 #define MNT_NODEV 2
 #define MNT_NOEXEC 4
+#define MNT_DIRECT 256
 
 struct vfsmount
 {
--- linux-2.5.31/include/linux/fs.h.pre-o_direct Mon Sep 2 20:32:05 2002
+++ linux-2.5.31/include/linux/fs.h Mon Sep 2 18:05:57 2002
@@ -104,6 +104,9 @@
 #define MS_REMOUNT 32 /* Alter flags of a mounted FS */
 #define MS_MANDLOCK 64 /* Allow mandatory locks on an FS */
 #define MS_DIRSYNC 128 /* Directory modifications are synchronous */
+
+#define MS_DIRECT 256 /* Make all opens be O_DIRECT */
+
 #define MS_NOATIME 1024 /* Do not update access times. */
 #define MS_NODIRATIME 2048 /* Do not update directory access times */
 #define MS_BIND 4096
