Re: pstore dump inside an nmi handler

From: Don Zickus
Date: Tue Jul 12 2011 - 11:34:41 EST


On Mon, Jul 11, 2011 at 05:55:41PM -0400, Don Zickus wrote:
> On Fri, Jul 08, 2011 at 02:40:13PM -0700, Luck, Tony wrote:
> > > Inside pstore_dump(), the first thing it tries to grab is a mutex_lock()
> > > (inside an NMI handler). This seems to be the root cause of my problems.
> >
> > Someone else pointed out that mutex_lock() is a problem here too. They
> > wondered whether spin_lock_irqsave() would work - or whether pstore
> > backends were allowed to sleep - to which I said I hoped they didn't,
> > but wasn't really sure what the future will hold.
> >
> > So ... ideas (and patches) are most welcome.
>
> I tested the spin_lock_irqsave thing on my one box where it was failing
> and got past my initial problem into kdump. So that is a positive and I
> can post the patch for that. Though it probably isn't a complete
> solution, it is better than a mutex.
>
> However, I have been scratching my head at a follow-up problem: when I
> inject an error which produces an NMI->GHES->panic, the error record
> doesn't get stored under pstore (or maybe ERST too). I do see the ERST
> code follow all the correct steps in storing the kmsg_dump logs into the
> ERST table. But after the reboot, when I mount pstore, it isn't there.
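
For illustration only, here is a minimal userspace sketch of the locking
concern quoted above. A mutex may sleep, which an NMI handler cannot do, and
spin_lock_irqsave() does not mask NMIs, so a handler that spins on a lock
already held on the same CPU would never return. The sketch swaps in a
different technique, a try-lock that gives up instead of waiting; it is not
the pstore code and not the patch mentioned above. All names (dump_lock,
dump_buf, dump_trylock, dump_unlock, sketch_dump) are hypothetical, and C11
atomics stand in for kernel primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the lock protecting the dump buffer. */
atomic_flag dump_lock = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for a backend's record buffer. */
char dump_buf[64];

/* Try once to take the lock; never blocks and never sleeps. */
bool dump_trylock(void)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&dump_lock,
                                              memory_order_acquire);
}

void dump_unlock(void)
{
    atomic_flag_clear_explicit(&dump_lock, memory_order_release);
}

/*
 * Sketch of a dump path for a context that must not wait.
 * Returns true if the record was stored, false if it was skipped
 * because the lock was already held.
 */
bool sketch_dump(const char *msg)
{
    if (!dump_trylock())
        return false;   /* lock busy: skip rather than spin or sleep */

    /* Copy with truncation and guarantee NUL termination. */
    strncpy(dump_buf, msg, sizeof(dump_buf) - 1);
    dump_buf[sizeof(dump_buf) - 1] = '\0';

    dump_unlock();
    return true;
}
```

In this sketch, a call made while the lock is free stores the message, and a
call made while the lock is held returns false and leaves the buffer
untouched. The trade-off is that a record can be dropped under contention,
in exchange for the caller never waiting.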

Actually, is it expected that the ERST can handle only 8 records? Also, if
you remove those records with pstore mounted under /mnt ('rm -rf
/mnt/dmesg-*'), are those records removed immediately or are they cached to
be removed later? IOW, if I did a 'rm -rf ..' and then an 'echo c >
/proc/sysrq-trigger' immediately after it, should I expect those records to
be removed or not? Testing shows they are removed on reboot, but the later
'echo c > ..' didn't save any new error records. :-/

Cheers,
Don