Re: [RFC PATCH] Introduce filesystem type tracking

From: Tom Spink
Date: Wed May 21 2008 - 10:50:11 EST

Next message: Tarkan Erimer: "Re: Suggestion About Kernel Releases"
Previous message: Ray Lee: "Re: completely lost as to what to do, or where to look."
In reply to: Tom Spink: "Re: [RFC PATCH] Introduce filesystem type tracking"
Next in thread: Jan Engelhardt: "Re: [RFC PATCH] Introduce filesystem type tracking"

2008/5/20 Tom Spink <tspink@xxxxxxxxx>:
> 2008/5/20 Matthew Wilcox <matthew@xxxxxx>:
>> On Tue, May 20, 2008 at 10:08:04PM +0100, Tom Spink wrote:
>>> I've taken some more time to go over the locking semantics. I wrote a
>>> quick toy filesystem to simulate delays, blocking, memory allocation,
>>> etc. in the init and exit routines, and with an appropriately large
>>> number of printks everywhere, I saw quite a few interleavings.
>>>
>>> I *think* I may have got it right, but please let me know what you
>>> think! The only part of this patch that I suspect may be wrong is the
>>> spin_lock/spin_unlock at the end of sget(), where the superblock is
>>> added to the super_blocks list with list_add_tail(). I believe this
>>> opens the possibility of the same superblock being added twice... can
>>> anyone else see this code path, and is it a problem?
>>
>> Hi Tom,
>
> Hi Matthew,
>
>> I spotted one definite bug; on failure, you leave the superblock on
>> the super_blocks list.
>
> I spotted this while I was coding, and I was careful not to let it get
> added to the list... If the ->init routine fails, the superblock
> hasn't even been added to the list yet. The patch moves this line:
>
> list_add_tail(&s->s_list, &super_blocks);
>
> down to after the ->init call.
>
>> Your locking may well be correct, but it has the hallmarks of being "a bit
>> tricky" and a bit tricky means potentially buggy. How about doing the
>> nesting the other way round, i.e. take the mutex first, then the spinlock?
>
> Thanks for the suggestion!
>
>> The code needs a bit of tweaking because you don't want to put the
>> superblock on any list where it can be found until it's fully
>> initialised. This may not be quite right:
>>
>>> + mutex_lock(&type->fs_supers_lock);
>>> spin_lock(&sb_lock);
>>> /* should be initialized for __put_super_and_need_restart() */
>>> list_del_init(&sb->s_list);
>>> list_del(&sb->s_instances);
>>> spin_unlock(&sb_lock);
>>> +
>>> + if (list_empty(&type->fs_supers) && type->exit)
>>> + type->exit();
>>> + mutex_unlock(&type->fs_supers_lock);
>>> +
>>> up_write(&sb->s_umount);
>>> }
>>>
>
> I'll definitely give it a go.
>
>> sget is a little more complex ... the fs_supers_lock would need to be
>> dropped in a lot more places than I've shown here:
>>
>> @@ -365,11 +372,31 @@ retry:
>> retry:
>> + mutex_lock(&type->fs_supers_lock);
>> spin_lock(&sb_lock);
>>
>> destroy_super(s);
>> return ERR_PTR(err);
>> }
>> s->s_type = type;
>> strlcpy(s->s_id, type->name, sizeof(s->s_id));
>> + if (list_empty(&type->fs_supers) && type->init) {
>> + spin_unlock(&sb_lock);
>> + err = type->init();
>> + if (err) {
>> + mutex_unlock(&type->fs_supers_lock);
>> + destroy_super(s);
>> + return ERR_PTR(err);
>> + }
>> + spin_lock(&sb_lock);
>> + }
>> list_add_tail(&s->s_list, &super_blocks);
>> list_add(&s->s_instances, &type->fs_supers);
>> spin_unlock(&sb_lock);
>> + mutex_unlock(&type->fs_supers_lock);
>> get_filesystem(type);
>> return s;
>> }
>
> I had something similar earlier, but it started to look slightly
> messy once I discovered that dropping the spinlock would lead to a
> racy ->init... I hadn't thought of putting the mutex outside the
> spinlock, with the mutex protecting ->init and ->exit (I was getting
> caught up in trying not to go to sleep inside a spinlock).
>
> Thanks!
> --
> Tom Spink
>
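
The nesting Matthew suggests can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: a `pthread_mutex_t` stands in for `fs_supers_lock`, a `pthread_spinlock_t` for `sb_lock`, and a singly linked list for `type->fs_supers`. The names `fs_type_model`, `instance_get()` and `instance_put()` are hypothetical, and the model leaves out sget()'s test/set callbacks and grab_super(). It shows the two properties discussed above: ->init runs with only the mutex held (so it may sleep), and a new instance is linked into the list only after ->init has succeeded, so a failed ->init leaves nothing to unlink.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace model; names are illustrative, not kernel APIs. */
struct instance {
	struct instance *next;
};

struct fs_type_model {
	pthread_mutex_t supers_lock;  /* outer, sleepable: serialises init/exit */
	pthread_spinlock_t list_lock; /* inner: guards only the list (like sb_lock) */
	struct instance *supers;      /* live instances (like type->fs_supers) */
	int (*init)(void);            /* may sleep; 0 or a negative errno */
	void (*exit)(void);           /* run when the last instance goes away */
};

static void fs_type_model_setup(struct fs_type_model *t,
				int (*init)(void), void (*exit)(void))
{
	pthread_mutex_init(&t->supers_lock, NULL);
	pthread_spin_init(&t->list_lock, PTHREAD_PROCESS_PRIVATE);
	t->supers = NULL;
	t->init = init;
	t->exit = exit;
}

/* Like sget(): on success the instance is on the list; on failure it never was. */
static struct instance *instance_get(struct fs_type_model *t, int *err)
{
	struct instance *s = malloc(sizeof(*s));
	int first;

	*err = 0;
	if (!s) {
		*err = -ENOMEM;
		return NULL;
	}

	pthread_mutex_lock(&t->supers_lock);   /* mutex first ... */
	pthread_spin_lock(&t->list_lock);      /* ... then the spinlock */
	first = (t->supers == NULL);
	pthread_spin_unlock(&t->list_lock);    /* drop it before calling init */

	if (first && t->init) {
		*err = t->init();                  /* may block; mutex still held */
		if (*err) {
			pthread_mutex_unlock(&t->supers_lock);
			free(s);                       /* never published: nothing to unlink */
			return NULL;
		}
	}

	pthread_spin_lock(&t->list_lock);
	s->next = t->supers;                   /* publish only after init succeeded */
	t->supers = s;
	pthread_spin_unlock(&t->list_lock);
	pthread_mutex_unlock(&t->supers_lock);
	return s;
}

/* Like generic_shutdown_super(): unlink, then run exit if the list is empty. */
static void instance_put(struct fs_type_model *t, struct instance *s)
{
	struct instance **p;
	int last;

	pthread_mutex_lock(&t->supers_lock);
	pthread_spin_lock(&t->list_lock);
	for (p = &t->supers; *p != s; p = &(*p)->next)
		assert(*p != NULL);                /* s must be on the list */
	*p = s->next;
	last = (t->supers == NULL);
	pthread_spin_unlock(&t->list_lock);

	if (last && t->exit)
		t->exit();                         /* still serialised by the mutex */
	pthread_mutex_unlock(&t->supers_lock);
	free(s);
}

/* Example hooks for the usage below: count calls, optionally fail init once. */
static int init_calls, exit_calls, fail_next_init;

static int counting_init(void)
{
	if (fail_next_init) {
		fail_next_init = 0;
		return -EIO;
	}
	init_calls++;
	return 0;
}

static void counting_exit(void)
{
	exit_calls++;
}
```

In this model the mutex is what stops two racing callers from both seeing an empty list and running ->init twice; the spinlock alone could not, because it has to be dropped before ->init is called.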

Ready for another? <g>

Here's another try, with Matthew's suggestion of moving the mutex
outside the spinlock. Again, I've used a wee stress test that tries
to mount a toy filesystem many times, with random pauses in the init
routines. It seems to pass this (and again I've seen quite a few
interleavings of the calls), and a mental scan of the code paths leads
me to believe the locking is correct.
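
A userspace approximation of such a stress test might look like the sketch below. It is not the toy filesystem used here; it assumes POSIX threads, lets a counter `users` stand in for the `fs_supers` list, and uses hypothetical `toy_init()`/`toy_exit()` hooks that sleep for a pseudo-random interval, much as the toy filesystem's init routine paused. Each worker repeatedly "mounts" and "unmounts", and the hooks record a violation if ->init runs while the shared resource is already live or ->exit runs when it is not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/*
 * Hypothetical stress harness (userspace model, not kernel code):
 * 'users' stands in for the length of type->fs_supers, and
 * toy_init()/toy_exit() for a filesystem's ->init and ->exit hooks.
 */
static pthread_mutex_t supers_lock = PTHREAD_MUTEX_INITIALIZER;
static int users;                 /* all counters below: under supers_lock */
static int resource_live;         /* set by toy_init(), cleared by toy_exit() */
static int init_calls, exit_calls, violations;

/* Sleep for a pseudo-random 0..199 microseconds (per-thread LCG). */
static void random_pause(unsigned *seed)
{
	struct timespec ts = { 0, 0 };

	*seed = *seed * 1103515245u + 12345u;
	ts.tv_nsec = (long)((*seed >> 16) % 200u) * 1000L;
	nanosleep(&ts, NULL);
}

static void toy_init(unsigned *seed)
{
	random_pause(seed);           /* simulate a blocking ->init */
	if (resource_live)
		violations++;             /* second init without an exit */
	resource_live = 1;
	init_calls++;
}

static void toy_exit(unsigned *seed)
{
	random_pause(seed);           /* simulate a blocking ->exit */
	if (!resource_live)
		violations++;             /* exit without a matching init */
	resource_live = 0;
	exit_calls++;
}

/* One mount/unmount cycle; init and exit run with the mutex held. */
static void mount_once(unsigned *seed)
{
	pthread_mutex_lock(&supers_lock);
	if (users++ == 0)
		toy_init(seed);
	pthread_mutex_unlock(&supers_lock);

	random_pause(seed);           /* the "mounted" period */

	pthread_mutex_lock(&supers_lock);
	if (--users == 0)
		toy_exit(seed);
	pthread_mutex_unlock(&supers_lock);
}

struct worker {
	unsigned seed;
	int rounds;
};

static void *worker_main(void *arg)
{
	struct worker *w = arg;

	for (int i = 0; i < w->rounds; i++)
		mount_once(&w->seed);
	return NULL;
}

/* Run nthreads (1..16) workers, each doing 'rounds' mount cycles. */
static void run_stress(int nthreads, int rounds)
{
	pthread_t tids[16];
	struct worker w[16];

	assert(nthreads > 0 && nthreads <= 16);
	for (int i = 0; i < nthreads; i++) {
		w[i].seed = (unsigned)i + 1u;
		w[i].rounds = rounds;
		pthread_create(&tids[i], NULL, worker_main, &w[i]);
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
}
```

The intended use is to call `run_stress(8, 100)` and then check that `violations` is zero and that `init_calls == exit_calls`. A clean run only shows that no bad interleaving was observed, not that none exists, which is why the reasoning about code paths still matters.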

Thanks for putting up with me, guys!

-- Tom

--

From: Tom Spink <tspink@xxxxxxxxx>
Date: Wed, 21 May 2008 13:29:07 +0100
Subject: [PATCH] Introduce on-demand filesystem initialisation

This patch adds on-demand filesystem initialisation capabilities to the VFS,
whereby an init routine will be executed on first use of a particular
filesystem type. Also, an exit routine will be executed when the last
superblock of a filesystem type is deactivated.

This is useful for filesystems that share global resources between all
instances of the filesystem, but only need those resources while the
filesystem has users. This lets the filesystem initialise those
resources (kernel threads or caches, say) when the first superblock is
created, and clean them up when the last superblock is deactivated.

Signed-off-by: Tom Spink <tspink@xxxxxxxxx>
---
fs/filesystems.c | 2 ++
fs/super.c | 29 ++++++++++++++++++++++++++++-
include/linux/fs.h | 3 +++
3 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index f37f872..59b2eaa 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -79,6 +79,7 @@ int register_filesystem(struct file_system_type * fs)
 		res = -EBUSY;
 	else
 		*p = fs;
+	mutex_init(&fs->fs_supers_lock);
 	write_unlock(&file_systems_lock);
 	return res;
 }
@@ -105,6 +106,7 @@ int unregister_filesystem(struct file_system_type * fs)
 	tmp = &file_systems;
 	while (*tmp) {
 		if (fs == *tmp) {
+			mutex_destroy(&fs->fs_supers_lock);
 			*tmp = fs->next;
 			fs->next = NULL;
 			write_unlock(&file_systems_lock);
diff --git a/fs/super.c b/fs/super.c
index 453877c..65252c2 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -287,7 +287,9 @@ int fsync_super(struct super_block *sb)
 void generic_shutdown_super(struct super_block *sb)
 {
 	const struct super_operations *sop = sb->s_op;
+	struct file_system_type *type = sb->s_type;

+	mutex_lock(&type->fs_supers_lock);
 	if (sb->s_root) {
 		shrink_dcache_for_umount(sb);
 		fsync_super(sb);
@@ -317,7 +319,12 @@ void generic_shutdown_super(struct super_block *sb)
 	list_del_init(&sb->s_list);
 	list_del(&sb->s_instances);
 	spin_unlock(&sb_lock);
+
+	if (list_empty(&type->fs_supers) && type->exit)
+		type->exit();
+
 	up_write(&sb->s_umount);
+	mutex_unlock(&type->fs_supers_lock);
 }

 EXPORT_SYMBOL(generic_shutdown_super);
@@ -338,6 +345,7 @@ struct super_block *sget(struct file_system_type *type,
 	struct super_block *old;
 	int err;

+	mutex_lock(&type->fs_supers_lock);
 retry:
 	spin_lock(&sb_lock);
 	if (test) {
@@ -348,14 +356,17 @@ retry:
 				goto retry;
 			if (s)
 				destroy_super(s);
+			mutex_unlock(&type->fs_supers_lock);
 			return old;
 		}
 	}
 	if (!s) {
 		spin_unlock(&sb_lock);
 		s = alloc_super(type);
-		if (!s)
+		if (!s) {
+			mutex_unlock(&type->fs_supers_lock);
 			return ERR_PTR(-ENOMEM);
+		}
 		goto retry;
 	}

@@ -363,14 +374,30 @@ retry:
 	if (err) {
 		spin_unlock(&sb_lock);
 		destroy_super(s);
+		mutex_unlock(&type->fs_supers_lock);
 		return ERR_PTR(err);
 	}
+
+	if (list_empty(&type->fs_supers) && type->init) {
+		spin_unlock(&sb_lock);
+		err = type->init();
+		if (err < 0) {
+			destroy_super(s);
+			mutex_unlock(&type->fs_supers_lock);
+			return ERR_PTR(err);
+		}
+		spin_lock(&sb_lock);
+	}
+
 	s->s_type = type;
 	strlcpy(s->s_id, type->name, sizeof(s->s_id));
+
 	list_add_tail(&s->s_list, &super_blocks);
 	list_add(&s->s_instances, &type->fs_supers);
+
 	spin_unlock(&sb_lock);
 	get_filesystem(type);
+	mutex_unlock(&type->fs_supers_lock);
 	return s;
 }

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f413085..92d446f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1477,8 +1477,11 @@ struct file_system_type {
 	int (*get_sb) (struct file_system_type *, int,
 		       const char *, void *, struct vfsmount *);
 	void (*kill_sb) (struct super_block *);
+	int (*init) (void);
+	void (*exit) (void);
 	struct module *owner;
 	struct file_system_type * next;
+	struct mutex fs_supers_lock;
 	struct list_head fs_supers;

 	struct lock_class_key s_lock_key;
--
1.5.4.3
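
To illustrate the use case described in the changelog, here is a userspace sketch of a hypothetical `toyfs` whose shared cache exists only while at least one instance is mounted. `toyfs_init()` and `toyfs_exit()` play the roles of the proposed `->init` and `->exit` hooks, returning 0 or a negative errno as the patch expects; `toy_mount()` and `toy_umount()` stand in for the VFS side and, for brevity, use a single mutex and counter instead of the patch's `fs_supers_lock`, `sb_lock` and superblock list. All of these names are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/*
 * Hypothetical "toyfs" global state: one cache shared by every mounted
 * instance, created by the first mount and freed after the last unmount.
 * Userspace model only; none of these names are kernel APIs.
 */
struct toy_cache {
	size_t nobjs;
	void **objs;
};

static struct toy_cache *toyfs_cache;      /* NULL while nothing is mounted */

static int toyfs_init(void)                /* plays the role of ->init */
{
	struct toy_cache *c = malloc(sizeof(*c));

	if (!c)
		return -ENOMEM;
	c->nobjs = 64;
	c->objs = calloc(c->nobjs, sizeof(*c->objs));
	if (!c->objs) {
		free(c);
		return -ENOMEM;
	}
	toyfs_cache = c;
	return 0;
}

static void toyfs_exit(void)               /* plays the role of ->exit */
{
	assert(toyfs_cache != NULL);           /* only after a successful init */
	free(toyfs_cache->objs);
	free(toyfs_cache);
	toyfs_cache = NULL;
}

/* Generic side: count instances of the type under one mutex. */
static pthread_mutex_t supers_lock = PTHREAD_MUTEX_INITIALIZER;
static int nr_supers;

struct toy_sb {
	struct toy_cache *cache;               /* shared, not owned */
};

static int toy_mount(struct toy_sb *sb)
{
	int err = 0;

	pthread_mutex_lock(&supers_lock);
	if (nr_supers == 0)
		err = toyfs_init();                /* first user sets up the cache */
	if (!err) {
		nr_supers++;
		sb->cache = toyfs_cache;
	}
	pthread_mutex_unlock(&supers_lock);
	return err;
}

static void toy_umount(struct toy_sb *sb)
{
	pthread_mutex_lock(&supers_lock);
	sb->cache = NULL;
	if (--nr_supers == 0)
		toyfs_exit();                      /* last user tears it down */
	pthread_mutex_unlock(&supers_lock);
}
```

Two mounts share one cache, which is freed only when the second is unmounted; a failed `toyfs_init()` leaves the mount count unchanged, so the next mount simply retries the initialisation.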