RE: in 2.6.23-rc3-git7 in do_cciss_intr

From: Miller, Mike (OS Dev)
Date: Thu Sep 25 2008 - 16:57:41 EST




> -----Original Message-----
> From: Randy Dunlap [mailto:randy.dunlap@xxxxxxxxxx]
> Sent: Thursday, September 25, 2008 3:40 PM
> To: scsi
> Cc: Jens Axboe; Miller, Mike (OS Dev); James Bottomley; lkml; akpm
> Subject: Re: in 2.6.23-rc3-git7 in do_cciss_intr
>
> On Thu, 25 Sep 2008 13:33:07 -0700 Randy Dunlap wrote:
>
> > Jens Axboe wrote:
> > > On Thu, Sep 04 2008, Miller, Mike (OS Dev) wrote:
> > >>>>>> 0x3bb2 <do_cciss_intr+1649>: mov 0x2(%r8),%dx
> > >>>>>> 0x3bb7 <do_cciss_intr+1654>: test %dx,%dx
> > >>>>>> 0x3bba <do_cciss_intr+1657>: je 0x3f0e <do_cciss_intr+2509>
> > >>>>>>
> > >>>>>>
> > >>>>>> $ addr2line -e cciss.o -f do_cciss_intr+0x627 SA5_fifo_full
> > >>>>>>
> > >>>>>> /home/rdunlap/linsrc/linux-2.6.27-rc3-git7/drivers/block/cciss.h:206
> > >>>>> OK ... that's confusing. It seems to be saying that ctrlr_info_t *
> > >>>>> was NULL. However, I can't see a way of getting into the fifo_full
> > >>>>> callback from do_cciss_intr ... especially not with a NULL host.
> > >>>>>
> > >>>>> James
> > >>>> That is weird. Even if we could get there, fifo_full doesn't
> > >>>> do anything but wait for a bit.
> > >>>
> > >>> Hi,
> > >>>
> > >>> This just happened again. This time it's on 2.6.27-rc5-git3.
> > >>>
> > >>> ~Randy
> > >> Thanks Randy. I think. :)
> > >>
> > >> I'll try to recreate in my lab.
> > >
> > > This looks somewhat strange, mostly like 'c' is NULL and it's
> > > oopsing in removeQ (I don't think Randy's analysis is correct in
> > > assuming it's 'h' and it's in fifo_full). Given that 'c' cannot be
> > > NULL, it's c->prev or c->next that is NULL.
> >
> > Yes, correct IMO. I checked my daily test logs and I have had this
> > problem in do_cciss_intr() 3 times, all at the same location, which
> > appears to be in removeQ(), as Jens says.
>
> Mike, also notice this: it's always during driver init, as indicated
> by the (+) in the dump ('+' means that the module is in the process
> of being loaded, but module load has not completed):
>
> calling cciss_init+0x0/0x2e [cciss]
> HP CISS Driver (v 3.6.20)
> ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 54
> cciss 0000:42:08.0: PCI INT A -> Link[LNKA] -> GSI 54 (level, high) -> IRQ 54
> cciss0: <0x3238> at PCI 0000:42:08.0 IRQ 503 using DAC
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000248
> IP: [<ffffffffa001bb68>] do_cciss_intr+0x627/0xa6c [cciss]
> PGD 17e422067 PUD 17e423067 PMD 0
> Oops: 0002 [1] SMP
> CPU 2
> Modules linked in: cciss(+) ehci_hcd ohci_hcd uhci_hcd
> Pid: 0, comm: swapper Not tainted 2.6.27-rc3-git7 #1
> RIP: 0010:[<ffffffffa001bb68>] [<ffffffffa001bb68>] do_cciss_intr+0x627/0xa6c [cciss]

Thanks, Randy