Re: [RFC] Change locking in block_dev.c:do_open()

From: Richard Gooch (
Date: Fri Dec 14 2001 - 12:07:42 EST

Alexander Viro writes:
> On Thu, 13 Dec 2001, Richard Gooch wrote:
> > > Sigh... Please, take a look at sys_swapon() or get_sb_bdev() or
> > > devfs_open(). Any version starting with 2.3.46-pre<something> when
> > > devfs went into the tree.
> >
> > Wait a minute. Before poking holes at other code paths, can you please
> > answer the question I'm asking? I repeat: for the devfs revalidate
> > code, do you see any remaining problems if I increment the module
> > usage count?
> For revalidate - yes.

Really? So what problem remains for revalidate? Below is my current
implementation of this code. Note that this function has a refcount on
the devfs entry, so it can't be freed at this point, although it can
be unregistered (unhooked from it's parent's list). At the point I
unlock the parent directory, the module cannot be unloaded, and it
isn't already unloading. So the bdops calls should be safe now.

static int check_disc_changed (struct devfs_entry *de)
    int tmp;
    int retval = 0;
    kdev_t dev = MKDEV (de->u.fcb.u.device.major, de->u.fcb.u.device.minor);
    struct block_device_operations *bdops = de->u.fcb.ops;
    extern int warn_no_part;

    if (!S_ISBLK (de->mode) || !bdops) return 0;
    read_lock (&de->parent->u.dir.lock); /* Prevent module from unloading */
    if ( (de->next == de) || !try_inc_mod_count (bdops->owner) )
    { /* Entry is already unhooked or module is unloading */
        read_unlock (&de->parent->u.dir.lock);
        return 0;
    read_unlock (&de->parent->u.dir.lock); /* Module can continue unloading*/
    if (bdops->check_media_change == NULL) goto out;
    if ( !bdops->check_media_change (dev) ) goto out;
    retval = 1;
    printk ( KERN_DEBUG "VFS: Disk change detected on device %s\n",
             kdevname (dev) );
    if (invalidate_device(dev, 0))
        printk("VFS: busy inodes on changed media..\n");
    /* Ugly hack to disable messages about unable to read partition table */
    tmp = warn_no_part;
    warn_no_part = 0;
    if (bdops->revalidate) bdops->revalidate (dev);
    warn_no_part = tmp;
    if (bdops->owner) __MOD_DEC_USE_COUNT (bdops->owner);
    return retval;
} /* End Function check_disc_changed */

> > > All of them have ->bd_op set from devfs handle and follow that with
> > > blkdev_get() - directly or indirectly via That
> > > function blocks. Think what will happen if entry is removed while
> > > blkdev_get() sleeps on semaphore.
> >
> > When you say "entry is removed", are you actually referring to a devfs
> > entry being removed? Or are you talking more generally about the
> > module being removed?
> devfs_unregister() being called.
> > For the case where devfs_open() is running, the devfs entry remains
> > valid over the lifetime of the open(). In fact, it remains valid over
> > the lifetime of the dentry.
> That it does, but it doesn't keep its contents alive. In particular,
> pointer to methods table is junk as soon as devfs_unregister() is done
> with it.
> > > BTW, I'd described that to you several times - last one couple of
> > > months ago on l-k.
> >
> > And since then the refcounting code has gone in, and many of your old
> > concerns have been addressed. I'm trying to engage you in a productive
> > dialogue to resolve any remaining issues.
> This one is unaffected by refcounting of devfs objects - what happens is
> * find devfs object
> * check that it's alive and pick data from it
> * store that data (outside of devfs). At that point devfs object
> is not involved anymore.
> * block (outside of devfs)
> * use that data (outside of devfs)
> * die, since the module had told devfs to invalidate the contents of
> devfs object and freed the methods table that used to be pointed to by
> data stored in devfs object.
> And any attempt to pick that data from devfs would fail (as it
> should). The trouble being, we'd already picked it - before
> devfs_unregister() had been called.

Yes, so the solution I sent you last night should fix the sequence of
events you describe above. This is what I said:

I've been wading through the various code paths. It seems to me that
devfs_get_ops() should bump the module refcount, and the 4 places
which call devfs_get_ops() should be changed to make a call to
devfs_put_ops() once they have done their stuff (or at least
implicitely bumped the module refcount again). A new function
devfs_put_ops() should be added.

As an addendum to that, after I fired off that message I decided I
should clean up the way external code gets hold of a devfs handle
(basically push forward the worked tagged by that FIXME in
devfs_find_handle()). I'll create a new function devfs_get_handle()
which keeps the refcount to the entry, and export devfs_put().


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Sat Dec 15 2001 - 21:00:28 EST