Re: PROBLEM: Oops when running mkreiserfs on large (9TB) raid0 set on AMD64 SMP

From: Neil Brown
Date: Wed Jul 13 2005 - 17:35:59 EST


On Wednesday July 13, paul+nospam@xxxxxxxxxx wrote:
> After having installed a base system (Debian) on our shiny new AMD64 SMP
> system, with 2 x 3ware 9000 SATA controllers for a total of 24 disks
> (excluding the 2 connected to the motherboard), I wanted as a "quick"
> test to put them all in a raid0 and put a reiserfs on it (just to see
> the 'df' output :-). It all went OK up to the mkreiserfs, when an Oops
> happened,.....
....
> Output from kern.log:
>
> Aug 9 20:08:37 localhost kernel: raid0: done.
> Aug 9 20:08:37 localhost kernel: raid0 : md_size is 9374734848 blocks.
> Aug 9 20:08:37 localhost kernel: raid0 : conf->hash_spacing is 9374734848 blocks.
> Aug 9 20:08:37 localhost kernel: raid0 : nb_zone is 12.
> Aug 9 20:08:37 localhost kernel: raid0 : Allocating 96 bytes for hash.
> Aug 9 20:09:18 localhost kernel: Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
> Aug 9 20:09:18 localhost kernel: <ffffffff8808eb98>{:raid0:raid0_make_request+472}

Looks like the problem is at:
sector_div(x, (unsigned long)conf->hash_spacing);
zone = conf->hash_table[x];

I had (naively) assumed that sector_div divided a "sector_t" by an
"unsigned long". This is partly based on the code in
include/linux/blkdev.h:

#ifdef CONFIG_LBD
# include <asm/div64.h>
# define sector_div(a, b) do_div(a, b)
#else
# define sector_div(n, b)( \
{ \
int _res; \
_res = (n) % (b); \
(n) /= (b); \
_res; \
} \
)
#endif

which suggests (to me) that on a 64bit arch, anything that the
compiler copes with it OK.

However, CONFIG_LBD is defined for x86-64, and asm/div64.h includes
asm-generic/div64.h which says:

#if BITS_PER_LONG == 64

# define do_div(n,base) ({ \
uint32_t __base = (base); \
uint32_t __rem; \
__rem = ((uint64_t)(n)) % __base; \
(n) = ((uint64_t)(n)) / __base; \
__rem; \
})


which is a significantly different macro and suggests that the divisor
must be 32 bit.

A bit more documentation and a bit less code duplication would go a
long way here.
(and WHY do we have uint32_t, u32, and __u32 all actively used in the
kernel???)

Anyway, the following patch, if it compiles, might changed the
behaviour of raid0 -- possibly even improve it :-)

Thanks for the report.

Success/failure reports of this patch would be most welcome.

NeilBrown
--
Fix raid0's attempt to divide by 64bit numbers

Apparently sector_div is only guaranteed to work with a 32bit divisor,
even on 64bit architectures. So allow for this in raid0.

Signed-off-by: Neil Brown <neilb@xxxxxxxxxxxxxxx>

### Diffstat output
./drivers/md/raid0.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff ./drivers/md/raid0.c~current~ ./drivers/md/raid0.c
--- ./drivers/md/raid0.c~current~ 2005-07-14 08:18:48.000000000 +1000
+++ ./drivers/md/raid0.c 2005-07-14 08:18:53.000000000 +1000
@@ -314,16 +314,16 @@ static int raid0_run (mddev_t *mddev)
sector_t space = conf->hash_spacing;
int round;
conf->preshift = 0;
- if (sizeof(sector_t) > sizeof(unsigned long)) {
+ if (sizeof(sector_t) > sizeof(u32)) {
/*shift down space and s so that sector_div will work */
- while (space > (sector_t) (~(unsigned long)0)) {
+ while (space > (sector_t) (~(u32)0)) {
s >>= 1;
space >>= 1;
s += 1; /* force round-up */
conf->preshift++;
}
}
- round = sector_div(s, (unsigned long)space) ? 1 : 0;
+ round = sector_div(s, (u32)space) ? 1 : 0;
nb_zone = s + round;
}
printk("raid0 : nb_zone is %d.\n", nb_zone);
@@ -443,7 +443,7 @@ static int raid0_make_request (request_q
volatile
#endif
sector_t x = block >> conf->preshift;
- sector_div(x, (unsigned long)conf->hash_spacing);
+ sector_div(x, (u32)conf->hash_spacing);
zone = conf->hash_table[x];
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/