Re: SD_EXTRA_DEVS, what if...

From: Guest section DW (dwguest@win.tue.nl)
Date: Sat Mar 18 2000 - 12:26:50 EST


On Thu, Mar 16, 2000 at 08:29:45PM -0500, Eric Youngdale wrote:
> On Fri, 17 Mar 2000, Guest section DW wrote:

> > Hmm. kdev_t is not 16-bit at all. It is a data type of unknown size
> > and structure (outside of kdev_t.h). On over half of my machines it
> > is a 32-bit pointer.
>
> What exactly do you do to kdev_t.h in order to get anything other
> than a 16-bit kdev_t? I knew it was theoretically possible to replace it
> with something different - I just didn't think anyone did it in practice.

I hesitated between constructing a complete patch for you and just
showing some fragments, but I have no time, so let me just do the
latter. After all, five minutes before 2.4 is not the right moment
for large patches. I'll cc Linus anyway, maybe he wants to declare
that he doesn't want this at all, or perhaps that he wants it for 2.5,
in which case I can produce a complete patch for 2.5.1 or so.

If the purpose of the exercise is to test whether the source is
kdev_t clean, then it suffices to do something like

=================================================================
typedef struct { int major, minor; } mami, *kdev_t;
#define MAJOR(dev) ((dev)->major)
#define MINOR(dev) ((dev)->minor)

#define MAMIMAX 10000
mami mamis[MAMIMAX];
int mamimax;

static inline kdev_t MKDEV(int ma, int mi) {
        int i;

        for(i=0; i<mamimax; i++)
                if(mamis[i].major == ma && mamis[i].minor == mi)
                        return &mamis[i];
        if (mamimax < MAMIMAX) {
                i = mamimax++;
                mamis[i].major = ma;
                mamis[i].minor = mi;
                return &mamis[i];
        }
        panic("out of mamis");
}
=================================================================

(with the array declared somewhere in init.c or so and the rest
in kdev_t.h). This works. In my environment about 1500 array
entries get used. Of course it is slow because of the linear search.
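
If the linear search ever matters, the same interning idea works with
a small hash table. The following is only a userspace sketch, not
kernel code: the names mami_hash, mami_bucket, mkdev_hashed and
MAMI_BUCKETS are made up for illustration, malloc() replaces the
static array, and abort() stands in for panic().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* As in the fragment above, a kdev_t is a pointer to an interned
 * (major, minor) pair, so equal pairs always yield the same pointer. */
typedef struct mami {
        int major, minor;
        struct mami *next;      /* hash chain; not in the original fragment */
} mami, *kdev_t;

#define MAMI_BUCKETS 1024       /* hypothetical size; must be a power of two */
static mami *mami_hash[MAMI_BUCKETS];

static unsigned int mami_bucket(int ma, int mi)
{
        unsigned int h;

        /* The mask below only works for a power-of-two bucket count. */
        assert((MAMI_BUCKETS & (MAMI_BUCKETS - 1)) == 0);

        /* Simple multiplicative mix; any reasonable hash would do. */
        h = ((unsigned int)ma * 2654435761u) ^ (unsigned int)mi;
        return h & (MAMI_BUCKETS - 1);
}

static kdev_t mkdev_hashed(int ma, int mi)
{
        unsigned int b = mami_bucket(ma, mi);
        mami *p;

        /* Return the existing entry if this pair was seen before. */
        for (p = mami_hash[b]; p != NULL; p = p->next)
                if (p->major == ma && p->minor == mi)
                        return p;

        /* Otherwise create a new entry at the head of the chain. */
        p = malloc(sizeof(*p));
        if (p == NULL)
                abort();        /* stands in for panic("out of mamis") */
        p->major = ma;
        p->minor = mi;
        p->next = mami_hash[b];
        mami_hash[b] = p;
        return p;
}
```

Two calls with the same major and minor return the same pointer, so
comparing two kdev_t values with == still works.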

If the purpose of the exercise is to actually have larger major and
minor numbers, then one has to get rid of all these arrays of size
MAX_BLKDEV. The trivial way is to have structures that contain the
elements of these arrays. Thus, kdev_t.h may contain something like

=================================================================
/* majorstruct contains the stuff that used to be indexed by MAJOR(dev) */
struct majorstruct {
        int major;

        /* This specifies how many sectors to read ahead on the disk. */
        int read_ahead;

        /* blk_dev_struct is:
         * *request_fn
         * *current_request
         */
        struct blk_dev_struct blk_dev;

};

/* minorstruct contains the minor number, a pointer to the majorstruct
   and the stuff that used to be indexed by [MAJOR(dev)][MINOR(dev)] */
struct minorstruct {
        int minor;
        struct majorstruct *madev;

        /* size in units of 1024 byte sectors */
        int blk_size;

        /* blocksize - default 1024 */
        int blksize_size;

        /* size of the hardware sectors - default 512 */
        int hardsect_size;

        /* this tunes the read-ahead algorithm in mm/filemap.c */
        int max_readahead;

        /* Max number of sectors per request */
        int max_sectors;

        /* Read only */
        int ro_bit;
};

typedef struct minorstruct *kdev_t;

static inline unsigned int majordev(kdev_t dev) {
        return dev->madev->major;
}

static inline unsigned int minordev(kdev_t dev) {
        return dev->minor;
}

#define MAJOR(dev) (majordev(dev))
#define MINOR(dev) (minordev(dev))
=================================================================

It is really a joy to do the corresponding editing.
The source becomes so much simpler. Code like

void set_device_ro(kdev_t dev,int flag)
{
        int minor,major;

        major = MAJOR(dev);
        minor = MINOR(dev);
        if (major < 0 || major >= MAX_BLKDEV) return;
        if (flag) ro_bits[major][minor >> 5] |= 1 << (minor & 31);
        else ro_bits[major][minor >> 5] &= ~(1 << (minor & 31));
}

now becomes

void set_device_ro(kdev_t dev, int flag)
{
        dev->ro_bit = flag;
}
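
To make the simplification concrete, here is a minimal userspace
sketch with the structures cut down to the fields needed. It assumes
the flag lives in the ro_bit field of minorstruct. Storing 0 or 1
rather than the raw flag is my choice here, made so that the reader
function (called is_read_only in this sketch) returns a clean boolean
as the bitmap version did.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down versions of the structures; only what this example needs. */
struct majorstruct {
        int major;
};

struct minorstruct {
        int minor;
        struct majorstruct *madev;
        int ro_bit;             /* read-only flag */
};

typedef struct minorstruct *kdev_t;

/* No range check on the major and no bitmap arithmetic: the device
 * itself carries the flag. */
static void set_device_ro(kdev_t dev, int flag)
{
        assert(dev != NULL);
        dev->ro_bit = (flag != 0);      /* normalise to 0 or 1 */
}

static int is_read_only(kdev_t dev)
{
        assert(dev != NULL);
        return dev->ro_bit;
}
```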

Constructing the patch is a few hours' work. First one has to find
all occurrences of read_ahead[], blk_dev[], blksize_size[], etc.,
and turn
        blksize_size[MAJOR(dev)][MINOR(dev)]
into
        dev->blksize_size.
Next one has to go through all drivers that presently initialize arrays
like hardsect_size[] and instead have each driver supply a function
that returns a kdev_t for a given major and minor; this function becomes
part of the data the driver registers. MKDEV then finds the right driver
using the major number, and the driver constructs the MKDEV result.
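
The lookup described above might be sketched as follows. This is a
userspace illustration under my own assumptions, not the patch:
get_kdev_fn, register_driver, the fixed-size drivers[] table, the
lower-case mkdev and the demo driver with major 8 are all hypothetical
names, and a real implementation would not scan a small fixed table.

```c
#include <assert.h>
#include <stddef.h>

struct majorstruct { int major; };
struct minorstruct { int minor; struct majorstruct *madev; };
typedef struct minorstruct *kdev_t;

/* Hypothetical per-driver hook: return the kdev_t for a minor, or NULL. */
typedef kdev_t (*get_kdev_fn)(int minor);

#define MAX_DRIVERS 16          /* hypothetical, for the sketch only */
static struct {
        int major;
        get_kdev_fn get_kdev;
} drivers[MAX_DRIVERS];
static int ndrivers;

/* The hook is part of the data a driver hands over when registering. */
static int register_driver(int major, get_kdev_fn fn)
{
        assert(fn != NULL);
        if (ndrivers >= MAX_DRIVERS)
                return -1;
        drivers[ndrivers].major = major;
        drivers[ndrivers].get_kdev = fn;
        ndrivers++;
        return 0;
}

/* Find the driver by major number; the driver builds the result. */
static kdev_t mkdev(int ma, int mi)
{
        int i;

        for (i = 0; i < ndrivers; i++)
                if (drivers[i].major == ma)
                        return drivers[i].get_kdev(mi);
        return NULL;            /* no driver registered for this major */
}

/* Example driver: a made-up disk driver with major 8 and 4 minors. */
static struct majorstruct demo_major = { 8 };
static struct minorstruct demo_minors[4];

static kdev_t demo_get_kdev(int minor)
{
        if (minor < 0 || minor >= 4)
                return NULL;
        demo_minors[minor].minor = minor;
        demo_minors[minor].madev = &demo_major;
        return &demo_minors[minor];
}
```

After register_driver(8, demo_get_kdev), a call mkdev(8, 2) returns
a pointer into the demo driver's own table, and repeated calls return
the same pointer.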

This operation is a bit more expensive than the current MKDEV,
but most of the current uses can be avoided. The end result is
a system that is equivalent to the present one, perhaps slightly
faster, and with 32-bit major and minor device numbers internally.

Note that I did not carefully design majorstruct and minorstruct
to have the right fields. I just took what today is indexed by
major, or by major,minor. That has the advantage that the editing
operation is mechanical, and correctness is preserved.
Later these structures can be changed to have the fields
that we really want them to have.

> The other problem with assigning more than 16 disks per major (and
> a larger kdev_t) is that the kdev_t becomes unrepresentable if you have a
> 16-bit dev_t.

This is really a non-problem. I used a larger dev_t five years ago.
glibc can handle it. The only change is a new stat call and a new mknod call.

Andries
