Re: [bug, 2.6.26-rc4/rc5] sporadic bootup crashes inblk_lookup_devt()/prepare_namespace()

From: Greg KH
Date: Mon Jun 09 2008 - 23:13:39 EST


On Mon, Jun 09, 2008 at 09:15:40AM -0700, Linus Torvalds wrote:
>
>
> On Mon, 9 Jun 2008, Ingo Molnar wrote:
> >
> > ah. I suspect that explains the sporadic nature as well: normally there
> > is 'some' object at the list address, just with an invalid type.
>
> Yes. It could cause two kinds of problems:
>
> - it might end up returning the wrong 'dev_t'. This is unlikely, since we
> only have two cases: the working whole-disk case, and the case where we
> find a partition.
>
> But if we find a partition, we'd still get the right dev_t *most* of
> the time, because we'd first get called with "part=0", and then we have
>
> if (part < disk->minors)
> devt = MKDEV(MAJOR(dev->devt),
> MINOR(dev->devt) + part);
> break;
>
> where we would only fail if that conditional statement would be untrue
> (and then we'd incorrectly return MKDEV(0,0)). Otherwise, 'devt' ends
> up being correct anyway.
>
> So one effect of this bug would be that it would use the random
> "disk->minors" value to either return the right devt, or return one
> that is all zeroes. But if we return the all-zeroes case, then
> init/do_mounts.c will just try again, this time with the numbers
> removed, and now it wouldn't hit the "strcmp()" on any partition, and
> the next time around it would find a disk and work again.
>
> So this is a bug, but it's one that essentially is hidden by the
> caller.
>
> - The other alternative is that the bogus "disk->minors" thing would
> cause a page fault. This would only happen if the partition allocation
> was the first thing in a page, and the previous page was unused, and
> you had DEBUG_PAGEALLOC enabled.
>
> This is obviously the case you saw.
>
> My trivial fix makes it ignore partitions entirely.
>
> We *could* (and perhaps should) do something slightly more involved
> instead, which actually uses a partition if it's there). Like this. That
> would avoid my one nagging worry (that some clever usage makes partitions
> with a different numbering or without a base block device).
>
> And this is all still ignoring the locking issue, of course. It would be
> trivial to just remove the block_class_lock, and change
>
> mutex_[un]lock(&block_class_lock);
>
> into
>
> down|up(&block_class.sem);

The locking for struct class has turned into a mutex in the -next tree
already, but I have left the block_class_lock alone for the moment.

Now that I have also cleaned up the places in the /proc files where we
grabbed it, I think it might be safe to remove, I'll poke at that
tomorrow.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/