RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

From: 河合英宏 / KAWAI，HIDEHIRO
Date: Tue Aug 04 2015 - 07:53:39 EST


Hi,

> From: Michal Hocko [mailto:mhocko@xxxxxxxxxx]
> On Fri 31-07-15 11:23:00, 河合英宏 / KAWAI，HIDEHIRO wrote:
> > > From: Michal Hocko [mailto:mhocko@xxxxxxxxxx]
> > > > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
> > > > why CPU 130 waits so long. I'll try to consider for a while.
> > >
> > > Yes, I do not understand the timing here either and the fact that the
> > > log is a complete mess in the important parts doesn't help a wee bit.
> >
> > I'm interested in where "kernel panic -not syncing: " is.
> > It may give us a clue.
>
> This one is lost in the mangled text:
> [ 167.843771] U<0>[ 167.843771] hhuh. NMI received for unkn<0><0>[ 167.843765] Uh[ 16NM843774I own rea reived for
> unknow<0 r 16n 2d 765] Uhhuh. CPU recei11. <0known reason 7. on770] Ker<[ - not rn NMI:nic - not contt sing
>
> <0 >[ : Not con.inu437azed and confused, b] Dtryingaed annue
>
> fu 167.8ut trying>[ to 7.<0377 167.843775] U<0>[ 167.843776] ]hhu.ived for u3nknown rMason 3 re oived for [nk167.843781]

Thanks for the information.

I had suspected that lock contention while printing messages (e.g.
on locks used by the network/netconsole driver) delayed the panic
procedure, but that seems to be unrelated, because the panic message
was printed early.

If I come up with something, I will post a follow-up mail. I think
there may still be potential issues.

> 1.
> <. N0>[ 167.843781] Uh. NMI recen 3d on CPU 0.i< >[ nowon 3d on] Chhuh.MI
> eceived[ or7.843nknoUhhuh.wn rMason e3d ceCPivUd 120.
> <0nk>no 167.wn843ason 3na s p120.
> o<0er savi d6 e843ab88] Do yeu have a
> <trange0>[ er saving mode e nabl1d?7<4][ 167 84hu94]MIuh. NceIived for unknown reas vdfor 1no3was0>[ 2d 67.84380on CI
> rUe 12e.
> ive7d8u3800wn rveaseo f2d on CPo3.r< u>k[o 1 rea6s.o2d8 oo you hn aPve <0st>a e power 1s7.843816] Do yoauv ng moade
> enbslra?ng[ e 167.8438p41o]er shhuhavi.ngIroenived fbled?nknow
> < reaso0> 2d on [PU1626.41]0> Uh67.h. NM387I] receihed for .nknown reason 2Nn MC U ceived for .
> [son 2d on CPU 6.
> < 160>7.8467.84873] Uhhuh. 3MI received 908 o knstra
> [ n167.843908] Do ygo pave westrangesa pvnv mode enableng mode ed?
> n<b0ed?

Regards,
Kawai