Re: a major regression in recent kernels? - was: Re: Null pointer OOPS in sync_inodes_sb+0xa9/0x104

From: Christoph Hellwig
Date: Fri Mar 04 2011 - 07:53:00 EST


On Wed, Mar 02, 2011 at 10:31:15AM -0800, Linus Torvalds wrote:
> The whole "backing_dev_info" has been a total disaster. The thing is
> crap. It violates all the normal kernel memory management rules ("Thou
> shalt use reference counts and free only when it goes to zero") and
> the whole thing has been a constant source of "oh, that driver didn't
> set it, but we changed all the code to require it to be correct".
>
> And the reason we set it to NULL when the device goes away is exactly
> that it's not ref-counted correctly, so we really _have_ to set it to
> NULL, because it's not going to be around.
>
> (And the reverse of that is why all kernel data structures should use
> refcounts, and not some external lifetime notion)

Yes. But the bdi is even worse than that, as it conflates things with
different lifetimes into a single object. We have the "old school" bdi,
which mostly contains various bits of tuning for the VM and read-ahead
algorithms. This one has to stay around even when no fs is mounted on
the block device, because people expect it to be there regardless.
And then we have the writeback context entangled into it, which only
makes sense with an active filesystem (or block device node) on it,
just to make things extra fun. Even more fun is that we have one
pointer to it from the superblock and one from the inode, and the
latter might point to la-la land if this is, say, a /dev/mem node,
which has a different bdi for the "old-school" MM usage.

I had various stages of prototypes for separating the two into:

 1) the old bdi. Lifetime rules are: allocated and reference counted
    with the containing device. That is the gendisk for block devices,
    the server context for remote devices, and static at module init
    time for /dev/zero and similar.
 2) the writeback context. It exists only while it has a user, and is
    thus refcounted by itself. For non-blockdevice filesystem instances
    it's trivially always allocated with the superblock, and goes away
    with it. For block-device instances we need to keep a pointer to it
    from struct block_device and properly look it up on mount, or on
    opening of the block device nodes.
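
To make the split concrete, here is a rough userspace sketch of the two
objects and their lifetimes. Every name in it (bdi_tuning, wb_context,
blockdev, wb_get, wb_put) is made up for illustration and is not taken
from the prototypes or from any kernel tree. The plain int refcounts
stand in for what would have to be atomic counters (kref or similar)
plus locking around the lookup in real kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the kernel's actual code. */

/* 1) Tuning knobs: allocated and refcounted with the containing device. */
struct bdi_tuning {
	int refcount;
	unsigned long ra_pages;		/* read-ahead window, in pages */
};

/* 2) Writeback context: exists only while it has a user. */
struct wb_context {
	int refcount;
	struct bdi_tuning *tuning;	/* holds a reference on the tuning object */
};

/* Stand-in for struct block_device: keeps a pointer to the context. */
struct blockdev {
	struct bdi_tuning *tuning;	/* the device's own reference */
	struct wb_context *wb;		/* NULL while nothing is mounted or open */
};

static struct bdi_tuning *tuning_alloc(unsigned long ra_pages)
{
	struct bdi_tuning *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->refcount = 1;		/* reference owned by the device */
	t->ra_pages = ra_pages;
	return t;
}

static void tuning_get(struct bdi_tuning *t)
{
	t->refcount++;
}

/* Drop a reference; returns 1 if this was the last one and t was freed. */
static int tuning_put(struct bdi_tuning *t)
{
	assert(t->refcount > 0);
	if (--t->refcount == 0) {
		free(t);
		return 1;
	}
	return 0;
}

/* Look up the writeback context on mount or open, creating it on first use. */
static struct wb_context *wb_get(struct blockdev *bdev)
{
	if (!bdev->wb) {
		struct wb_context *wb = malloc(sizeof(*wb));

		if (!wb)
			return NULL;
		wb->refcount = 0;
		wb->tuning = bdev->tuning;
		tuning_get(wb->tuning);	/* context pins the tuning object */
		bdev->wb = wb;
	}
	bdev->wb->refcount++;
	return bdev->wb;
}

/* Drop a user; the context goes away with its last user. */
static void wb_put(struct blockdev *bdev)
{
	struct wb_context *wb = bdev->wb;

	assert(wb && wb->refcount > 0);
	if (--wb->refcount == 0) {
		tuning_put(wb->tuning);
		free(wb);
		bdev->wb = NULL;	/* nothing left to point at */
	}
}
```

In this sketch a mount and an open of the device node would each call
wb_get() and share one context, and the last wb_put() frees it while the
tuning object stays around for as long as the device holds its reference.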

I guess I need to get back to it, but I have put it off for now, as the
code has reached relative stability and I really fear touching it again.

It's for sure not .38 material, though.