Re: hpsa driver bug crack kernel down!
From: Davidlohr Bueso
Date: Wed Apr 09 2014 - 19:40:32 EST
On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote:
> On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote:
> > [+linux-scsi]
> > On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote:
> > > On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote:
> > > > Hi,
> > > >
> > > > The kernel is 3.14.0+ which is pulled just now.
> > >
> > > Cc'ing more people.
> > >
> > > While the hpsa driver appears to be involved in some way, I'm not sure
> > > if this is a related issue, but as of today's pull I'm getting another
> > > problem that causes my DL980 not to come up.
> > >
> > > *Massive* amounts of:
> > >
> > > DMAR:[fault reason 02] Present bit in context entry is clear
> > > dmar: DRHD: handling fault status reg 602
> > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
> > >
> > > Then:
> > >
> > > hpsa 0000:03:00.0: Controller lockup detected: 0xffff0000
> > > ...
> > > Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
> > > ...
> > >
> > > Screenshot of the actual LOCKUP:
> > > http://stgolabs.net/hpsa-hard-lockup-3.14+.png
> > >
> > > While I haven't bisected, things worked fine until at least commit
> > > 39de65aa2c3e (April 2nd).
> > >
> > > Any ideas?
> >
> > Well, it's either a DMA remapping issue or a hpsa one. Your assertion
> > that everything worked fine until 39de65aa2c3e would tend to vindicate
> > hpsa,
Hmm here you mean DMA, right?
> > > because all the hpsa changes went in before that under
> Missing crucial info:
>
> commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1
>
> > Merge: 3e75c6d b2bff6c
> > Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Date: Tue Apr 1 18:49:04 2014 -0700
> >
> > Merge tag 'scsi-misc' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> >
> > can you revalidate that this commit works OK just to make sure?
OK, so with that commit I don't see those DMA messages and the system
starts just fine. I'm thinking perhaps something broke after the IOMMU
changes in commit 3f583bc21977a608908b83d03ee2250426a5695c... could this
be indirectly causing the CPU stalls, with hpsa just getting blamed as a
side effect because it happens to be in the path?
/me goes out to try the commit.
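
A minimal sketch, assuming the kernel log has been saved as text, for tallying the DMAR faults quoted above per requesting device, so that a good and a bad kernel can be compared by numbers rather than by eye. The function name and the regular expression are illustrative assumptions derived only from the three log lines shown in this thread; other kernels may format these messages differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
# The pattern is an assumption based on the log excerpt in this thread.
FAULT_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device "
    r"\[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr ([0-9a-f]+)"
)


def count_dmar_faults(log_text):
    """Return a Counter keyed by (device, access type)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            access, device, _addr = match.groups()
            counts[(device, access)] += 1
    return counts


# The three lines quoted in the report, used here as sample input.
sample = """\
DMAR:[fault reason 02] Present bit in context entry is clear
dmar: DRHD: handling fault status reg 602
dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
"""

for (device, access), n in count_dmar_faults(sample).most_common():
    print(f"{device} DMA {access}: {n}")
```

Running the same tally on a log captured at the known-good merge commit and on one from the failing pull would show whether the fault flood is absent or merely smaller.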