On Thu, Mar 01 2007, Frank Seidel wrote:I am puzzled as well, although I do not think many people run raid6
> Am Mittwoch, 28. Februar 2007 19:02 schrieb Dan Williams:
> > I can reliably reproduce a null pointer dereference on 2.6.20 and
> > 2.6.21-rc2. I will keep digging to find the kernel version where
> > this last worked, but wanted to see if there were any immediate
> > experiments I should try.
> > ...
> > Kernel 2.6.21-rc2 on an i686
> > ...
> > [ 431.709022] BUG: unable to handle kernel NULL pointer dereference
> > at virtual address 0000005c [ 431.717993] printing eip:
> > ...
> > [ 431.825386] EIP is at cfq_dispatch_insert+0xb/0x53
> > ...
> > [ 431.887396] [<c01e1fc9>] cfq_dispatch_requests+0x138/0x3f0
> Hi,
> unfortunately i yet don't really have much/enough knowledge of cfq and
> the kernels inwards at the moment...
> but looking at cfq_dispatch_insert+0xb it seems the struct request
> pointer given (as second parameter by cfq_dispatch_request) was NULL
> and dereferencing it in the RQ_CFQQ macro leads to this oops.
>
> The "break"-out patch below for __cfq_dispatch_request might be at least
> a possible workaround for this, but it could also be total bullsh..
> Perhaps someone smarter might pick this up.. and give a real fix.
>
> Have fun,
> Frank
> ---
>
> block/cfq-iosched.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/block/cfq-iosched.c
> ===================================================================
> --- linux-2.6.orig/block/cfq-iosched.c
> +++ linux-2.6/block/cfq-iosched.c
> @@ -962,7 +962,8 @@ __cfq_dispatch_requests(struct cfq_data
> * follow expired path, else get first next available
> */
> if ((rq = cfq_check_fifo(cfqq)) == NULL)
> - rq = cfqq->next_rq;
> + if ((rq = cfqq->next_rq) == NULL)
> + break;
>
> /*
> * finally, insert request into driver dispatch list
That is not the right fix. A little further up in this function, a check
(well BUG_ON()) is done for a non-empty sort list. So we know at this
point, that we have requests pending for this queue. When that is the
case, ->next_rq must always be kept uptodate and non-NULL. The oops at
least tells us this, it should not be papered around. The real fix is
finding out _where_ this now isn't being updated.
I'm puzzled why this is hitting Dan, but no one else has reported
anything. Dan, did 2.6.19 work for you?
--
Jens Axboe