Bug and patch for md driver in 2.2.x. Could somebody please review?

From: Rick Bressler (bressler@mushroom.ca.boeing.com)
Date: Tue Sep 19 2000 - 22:54:57 EST


Problem:

The md driver doesn't handle large physical blocks in 2.2.x

Pardon the long introduction, but it might be interesting to the non IBM
types.

I'm doing some work with Linux/S390 and need to access about 100GB of
disk. The 3390 drives I can use are about 2.3GB each and that comes out
to a lot of disks. I was hoping to use the MD driver to combine those
40+ volumes into something like 4 md volumes.

The IBM disk is somewhat unusual in that the actual physical disk blocks
can be virtually any size. (They call it Count Key Data for those not
familiar with it.) You can even mix sizes on a track!

Obviously this isn't too healthy for Unix I/O subsystems so the solution
is to support a small number of block sizes and keep them all fixed at
the same size for a given disk. This is indeed what the formatting tool
under Linux does. (512, 1024, 2048 and 4096 appear to be supported.)
When formatted correctly the disk is then presented to Linux as a fixed
block device of one of those sizes. The 4k block size yields about 31%
more usable disk space than the 1k block size, so it is very attractive.

The problem (after banging away at it for nearly 18 hours straight) is
that the md driver doesn't export the underlying hard sector size of the
real device (512 is pretty much assumed) and when ext2fs tries to do a
1k bread for the superblock of a disk with larger than 1k blocks the
DASD driver returns an error and the machine soon dies. Mount doesn't
return and it has the big kernel lock at that point so the system is
toast. The basic problem is that the hardsect_size variable used by the
get_hardblocksize() routine is not set up by the md driver.

A quick look at the 2.4pre code shows that this has been addressed so I
have taken a crack at adding a few similar lines of code to md.c in
order to get things working. I have to confess that except for a couple
of other trivial Linux patches I haven't done any OS programming since
my IBM/S370 days, so I'm somewhat uncomfortable at touching the I/O
subsystem. Still the Linux philosophy is if you need it and it doesn't
work go fix it, and I have tried.

I've tested this on Linux/S390 with different block sizes by filling the
md device with files with a known md5sum and then verifying that they
still match. It seems solid and I don't think I've done anything to
foul up other architectures, but as I said, I don't consider myself an
expert. I suspect somebody that knows this stuff could take a look and
tell me if I've done anything really stupid.

If that person(s) with some experience with the md driver and/or I/O
subsystem could review this, I'd sure appreciate it. Since it is useful
I'd like to submit it to Alan for inclusion in 2.2.1[89].

One more thing, I've always hated when folks post to a list but claim
they need email replies because they aren't on the list. Unfortunately
while I followed the list when I could get digests from vger the
hundreds of messages from the new non-digest list give my employer
heartburn. I have not yet gotten a workaround in place so I find myself
pleading for email responses. Sigh.

Anyway, here is the patch. If you have suggestions or can test it, I'd
sure appreciate it.

There is also one other small patch that is part of the S390 system.
You'll need to apply it first to keep the context the same (unless
you're running a S390 Linux system with the August patches) but it is
conditional on the 390 system and shouldn't have an effect on
anything else, except to get my patch to apply.

Thanks in advance!

Here is the IBM patch to md.c Apply first in the drivers/block
directory.

-------------------- cut here ----------------------------------
diff -u -r --new-file linux-2.2.16/drivers/block/md.c linux-2.2.16-s390/drivers/block/md.c
--- md.c Wed Jun 7 23:26:42 2000
+++ md.c Fri Jun 16 11:51:35 2000
@@ -445,8 +445,13 @@
   }
   
   factor = min = 1 << FACTOR_SHIFT(FACTOR((md_dev+minor)));
-
+
+#ifdef CONFIG_ARCH_S390
   md_blocksizes[minor] <<= FACTOR_SHIFT(FACTOR((md_dev+minor)));
+ if ( md_blocksizes[minor] > PAGE_SIZE ) {
+ md_blocksizes[minor] = PAGE_SIZE;
+ }
+#endif
 
   for (i=0; i<md_dev[minor].nb_dev; i++)
     if (md_dev[minor].devices[i].size<min)
-------------------- cut here ----------------------------------

Here is my update, also applied in the drivers/block directory.

-------------------- cut here ----------------------------------
--- md.c Tue Sep 19 19:44:52 2000
+++ md.c.new Tue Sep 19 19:49:04 2000
@@ -76,6 +76,7 @@
 
 int md_size[MAX_MD_DEV]={0, };
 
+static int md_hardsect_sizes[MAX_MD_DEV];
 static void md_geninit (struct gendisk *);
 
 static struct gendisk md_gendisk=
@@ -464,7 +465,13 @@
   for (i=0; i<md_dev[minor].nb_dev; i++) {
     fsync_dev(md_dev[minor].devices[i].dev);
     invalidate_buffers(md_dev[minor].devices[i].dev);
+ md_hardsect_sizes[minor]=512;
+ if(get_hardblocksize(md_dev[minor].devices[i].dev) > md_hardsect_sizes[minor])
+ md_hardsect_sizes[minor] = get_hardblocksize(md_dev[minor].devices[i].dev);
   }
+
+ if(md_blocksizes[minor] < md_hardsect_sizes[minor])
+ md_blocksizes[minor] = md_hardsect_sizes[minor];
   
   /* Resize devices according to the factor. It is used to align
      partitions size on a given chunk size. */
@@ -933,6 +940,7 @@
   for(i=0;i<MAX_MD_DEV;i++)
   {
     md_blocksizes[i] = 1024;
+ md_hardsect_sizes[i] = 512;
     md_maxreadahead[i] = MD_DEFAULT_DISK_READAHEAD;
     md_gendisk.part[i].start_sect=-1; /* avoid partition check */
     md_gendisk.part[i].nr_sects=0;
@@ -941,6 +949,7 @@
 
   blksize_size[MD_MAJOR] = md_blocksizes;
   max_readahead[MD_MAJOR] = md_maxreadahead;
+ hardsect_size[MD_MAJOR] = md_hardsect_sizes;
 
 #ifdef CONFIG_PROC_FS
   proc_register(&proc_root, &proc_md);
-------------------- cut here ----------------------------------

-- 
+--------------------------------------------+ Rick Bressler
|Mushrooms and other fungi have several      | G-4781 (425)342-1554
|important roles in nature.  They help things| 
|grow, they are a source of food, they       |
|decompose organic matter and they           | bressler@mushroom.ca.boeing.com
|infect, debilitate and kill organisms.      | Linux: Because an S390 is a
+--------------------------------------------+ terrible thing to waste.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 21:00:22 EST