Re: [ANNOUNCE] block device interfaces changes

From: Alexander Viro (viro@math.psu.edu)
Date: Sat Jan 08 2000 - 21:27:13 EST


On Sun, 9 Jan 2000 Andries.Brouwer@cwi.nl wrote:

> 2) character and block devices.
> You still talk about these as if there exists a dichotomy.
> But that is not a productive point of view, I think. There is
> a class: all devices, and a subclass: all block devices.
> Some things need to be handled for all devices - and there is
> no need to distinguish - like stat, mknod, open in the userspace
> interface, and register_device() and getname() kernel-internally.
> We will have lots of devices, both block and non-block devices,
> and 64-bits at first, arbitrary strings later, identifying them.
> Thus there is a cache, an associative memory, to find them in.
> This is common to all devices. There is no reason to implement
> this twice, first for block devices and then for non-block devices.

What _is_ a device from the kernel POV? stat() is an interface-only thing
and it works with the external representation of the object. Yes, both
classes are currently indexed by numbers. So? register_device() simply
doesn't exist and register_blkdev()/register_chrdev() have different types.
For a good reason. mknod() works with the external representation too.
open()... What is special about open() wrt either block or character
devices? getname()? Hmm... Well, they have preferred names. Fine. Not too
serious a thing to share, IMO.

> Yes, that is how my code works. The pointer type is called kdev_t.
> But it works like that both for character and for block devices, and
> the search key is b/c+major+minor. And my timing is different -
> foo_read_inode() is the wrong moment to do this - see below.
>
> When you set up the data structures, keep in mind that the kernel
> often derives one dev_t from another. If you want to avoid lookups

It should not do that. I know, and I know how to fix it.

> then you need to be able to find the entire disk from a partition,
> and all partitions from a disk. Check all cases of MKDEV().
        All _opened_ partitions. I.e. we need a per-disk structure and a
cyclic list going through the opened struct block_device instances that
belong to that disk. Fine with me, but why bother with MKDEV() here? Once
blkdev_open()/blkdev_get() is done we have this information. Once we have
closed the device we don't care anymore.
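
A minimal sketch of that per-disk structure, in plain userspace C. All names
here (struct disk, struct opened_bdev, disk_attach() and so on) are
hypothetical and are not the kernel's actual structures; the point is only
that the disk/partition relationship is recorded at open time and dropped at
close, so nothing has to be derived from device numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not actual kernel code. */
struct disk;

struct opened_bdev {
    struct disk *disk;          /* whole disk this device belongs to */
    int partno;                 /* 0 = whole disk, 1.. = partition */
    struct opened_bdev *next;   /* cyclic list of opened siblings */
    struct opened_bdev *prev;
};

struct disk {
    struct opened_bdev *opened; /* any member of the ring, NULL if none */
};

/* Open time: link the device into its disk's ring. */
static void disk_attach(struct disk *d, struct opened_bdev *b)
{
    assert(b->disk == NULL);
    b->disk = d;
    if (d->opened == NULL) {
        b->next = b->prev = b;
        d->opened = b;
    } else {
        struct opened_bdev *head = d->opened;
        b->next = head;
        b->prev = head->prev;
        head->prev->next = b;
        head->prev = b;
    }
}

/* Last close: unlink; after this the disk no longer knows about us. */
static void disk_detach(struct opened_bdev *b)
{
    struct disk *d = b->disk;
    assert(d != NULL);
    if (b->next == b) {
        d->opened = NULL;
    } else {
        b->prev->next = b->next;
        b->next->prev = b->prev;
        if (d->opened == b)
            d->opened = b->next;
    }
    b->next = b->prev = NULL;
    b->disk = NULL;
}

/* Walk the ring once, e.g. to visit every opened partition of a disk. */
static int disk_count_opened(const struct disk *d)
{
    const struct opened_bdev *b = d->opened;
    int n = 0;
    if (b == NULL)
        return 0;
    do {
        n++;
        b = b->next;
    } while (b != d->opened);
    return n;
}
```

With this shape, partition-to-disk is one pointer dereference and
disk-to-opened-partitions is one walk of the ring.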

> I found eliminating all occurrences of MKDEV one of the more
> time-consuming parts of the change. Many driver structures
> need to change a little.

Yes, they do.

> Another thing to watch out for is that there may be no device
> corresponding to a dev_t. If I read a CDROM then it may contain
> special device nodes for many devices that are not represented
> in the kernel. Thus, the init_special_inode() in isofs_read_inode()
> must not yet create the device_struct; it is too early. Only
> when the inode is opened should you try to construct the device struct.
> (Maybe you saw my October '99 post on this stuff. I needed MKBDEV,
> MKCDEV and MKXDEV, where the latter occurs in a
> "dev_t number only, no device" context.)

        block_device is created (or found in cache) when you access the
inode. It's bound to _anything_ in the driver only at open time.
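
Roughly, as a userspace sketch. The names (bdev_entry, bdev_lookup(),
bdev_open(), driver_ops) and the hash size are made up for illustration, and
the driver is simply passed in by the caller where a real open would have to
find it; the only thing shown is the split between "cache element" and
"bound to a driver".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BDEV_HASH_SIZE 64           /* arbitrary size for the sketch */

/* Hypothetical stand-in for whatever the driver side provides. */
struct driver_ops {
    const char *name;
};

struct bdev_entry {
    unsigned int devno;             /* lookup key */
    int openers;                    /* number of current opens */
    const struct driver_ops *ops;   /* NULL until the first open */
    struct bdev_entry *hash_next;
};

static struct bdev_entry *bdev_hash[BDEV_HASH_SIZE];

/* Inode access time: find the entry in the cache or create it.
 * Nothing driver-related is touched here. */
static struct bdev_entry *bdev_lookup(unsigned int devno)
{
    struct bdev_entry **bucket = &bdev_hash[devno % BDEV_HASH_SIZE];
    struct bdev_entry *b;

    for (b = *bucket; b != NULL; b = b->hash_next)
        if (b->devno == devno)
            return b;

    b = calloc(1, sizeof(*b));
    if (b == NULL)
        return NULL;
    b->devno = devno;
    b->hash_next = *bucket;
    *bucket = b;
    return b;
}

/* Open time: only now is the entry bound to a driver.
 * Returns 0 on success, -1 if there is no driver behind the node. */
static int bdev_open(struct bdev_entry *b, const struct driver_ops *ops)
{
    if (ops == NULL)
        return -1;                  /* node exists, device does not */
    if (b->openers == 0)
        b->ops = ops;
    b->openers++;
    return 0;
}

/* Last close: back to being a plain cache element. */
static void bdev_close(struct bdev_entry *b)
{
    assert(b->openers > 0);
    if (--b->openers == 0)
        b->ops = NULL;
}
```

A device node on a CDROM with no driver behind it just gets an unbound
cache entry; the failure shows up at open, not at read_inode time.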

> > Don't forget to include the bdev/cdev bit in devno.]
>
> No, thanks. Character devices _are_ different; by coincidence we
> are using 16bit integers to refer both to block and character
> devices, but that's about it.
>
> If you forget to take the b/c bit into account then you take away the
> possibility of simplifying the kernel later.

Not really. If you check the places where we have kdev_t you'll
see that almost none of them can have _both_ character and block devices.
Most of those places are dealing with the requests interface. I.e. can
have only block devices. Where in the code can _both_ happen? ROOT_DEV,
->b_{r}dev, ->i_dev, ->s_dev handling is out of the question - it's block
only. Ditto for IO requests. Now, check_media_change() and revalidate()
are block-only too. Ditto for functions in buffer.c and ll_rw_blk.c. Ditto
for partitions and the code around struct gendisk. About the only thing I
can think of is ->i_rdev. And in the new scheme block devices are not going
to care about it. Comments?

> 4. Names.
>
> > I needed a little more stuff.
> > One thing is that the kernel does not need names for files,
> > but it needs names for devices since it uses device names in
> > printk. Thus you need something like
> > char (*devname)(struct block_device *bdev);
>
> Umm... I'd rather have that done statically, at register_...()
> time. No need to breed complexity...
>
> That is what I did first. But it didn't suffice.
> We want to generate the familiar names like hda6 and fd0h1200 and initrd,
> and only the driver itself knows the algorithm to convert minors
> to strings. But of course there is a generic routine that works
> in most cases.

        Maybe. BTW, I have _big_ problems with the floppy.c design (mainly
the lack of fdc.c that could be shared with ftape and the ioctl set in
floppy drivers).
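
Back on the names: the two positions quoted above (a name registered
statically versus a per-driver method) can coexist if the method is optional.
A sketch, with hypothetical names (named_dev, dev_name(), generic_devname(),
initrd_devname()) and a deliberately simplified minor-to-name rule that is
not the real IDE numbering:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical structure; not an existing kernel type. */
struct named_dev {
    const char *base;   /* name given statically at registration time */
    int minor;
    /* Optional driver hook; NULL means "use the generic routine". */
    void (*devname)(const struct named_dev *dev, char *buf, size_t len);
};

/* Generic routine: base name, plus the minor if it is non-zero.
 * This rule is an assumption made for the sketch. */
static void generic_devname(const struct named_dev *dev,
                            char *buf, size_t len)
{
    if (dev->minor == 0)
        snprintf(buf, len, "%s", dev->base);
    else
        snprintf(buf, len, "%s%d", dev->base, dev->minor);
}

/* What printk-style callers would use. */
static void dev_name(const struct named_dev *dev, char *buf, size_t len)
{
    if (dev->devname != NULL)
        dev->devname(dev, buf, len);
    else
        generic_devname(dev, buf, len);
}

/* Example of a driver-specific override: a fixed name that
 * ignores the minor entirely. */
static void initrd_devname(const struct named_dev *dev,
                           char *buf, size_t len)
{
    (void)dev;
    snprintf(buf, len, "initrd");
}
```

Drivers with odd naming schemes pay for the hook; everybody else registers
a string and is done.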

> 5. Devices and subdevices.
>
> > Another thing is that I used two types, let us call them here
> > major_struct and minor_struct, where a kdev_t was a pointer to
> > a minor_struct, and the major_struct had a method
> > kdev_t getdev(major_struct *dev, int minor);
> > to retrieve a minor if it existed or create it if not.
> > Probably this is part of your open, but it is not clear to me
> > how you handle the difference between major_struct and minor_struct.
> > They are rather different.
>
> major_struct is completely bogus. We don't have 1-1 correspondence
> between majors and drivers. We don't have 1-1 correspondence between
> majors and physical units (disks).
>
> If major_struct displeases you so much that you don't understand
> what I meant, please read driver_struct or device_struct instead.
> But note that there are three levels:
> partition: hda6, disk: hda, ide disk driver: hda, hdb, hdc, ...
> When I say major_struct then I mean the disk level, not the partition
> or the driver level. Things like media_change and read_partition_table
> happen at this level. The request queue is at this level.

        Yes, indeed. block_device is on the partition layer - adding a
disk-layer representative (gendisk on steroids, one for every disk instead
of one-for-every-major) is in the queue. Now, moving
check_media_change() and friends there... Might be a good idea.

> My kdev_t is the subdevice, is the b/c+major+minor thing.
> I am trying to understand whether your struct block_device
> represents the subdevice or the entire device.

        The former, and only once it's opened - until then it's just a
cache element, waiting to be opened.

> Instead of subdevice I should perhaps say "logical device".
> The examples of md and loop show relationships other than that
> of partition to disk.

And you may want to have partitions on loop...
