Re: frequent lockups in 3.18rc4

From: Juergen Gross
Date: Wed Nov 26 2014 - 01:53:04 EST


On 11/26/2014 07:21 AM, Linus Torvalds wrote:
On Tue, Nov 25, 2014 at 9:52 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

And leave it running for a while, and see if the trace is always the
same, or if there are variations on it...

Amusing.

Lookie here:

http://lists.xenproject.org/archives/html/xen-changelog/2005-08/msg00310.html

That's from 2005.

:-)


Anyway, I don't see why the cr3 issue matters, *unless* there is some
situation where the scheduler can run with interrupts enabled. And why
this is Xen-related, I have no idea.

The Xen patches seem to have lost that

/* On Xen the line below does not always work. Needs investigating! */

line when backporting the 2.6.29 patches to Xen. And clearly nobody
investigated.

So please do get me back-traces, and we'll investigate. Better late
than never. But it does sound Xen-specific - although it's possible
that Xen just triggers some timing (and has apparently been able to
trigger it since 2005) that DaveJ now triggers on his one machine.

Yeah, this sounds plausible.

I'm working on the back traces right now, hope to have them soon.


Juergen


So DaveJ, even though this does appear Xen-centric (Xentric?) and
you're running on bare hardware, maybe you could do the same thing in
that x86-64 vmalloc_fault(). The timing with Jürgen is kind of
intriguing - if 3.18-rc made it happen much more often for him, maybe
it really is very timing-sensitive, and you actually are seeing a
non-Xen version of the same thing...

Linus

