Re: frequent lockups in 3.18rc4

From: Andy Lutomirski
Date: Fri Nov 21 2014 - 16:34:34 EST

On Fri, Nov 21, 2014 at 1:32 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> On Fri, Nov 21, 2014 at 12:01:51PM -0500, Steven Rostedt wrote:
>> On Fri, Nov 21, 2014 at 11:25:06AM -0500, Tejun Heo wrote:
>> >
>> > * Static percpu areas wouldn't trigger faults lazily. Note that this
>> > is not necessarily because the first percpu chunk which contains the
>> > static area is embedded inside the kernel linear mapping. Depending
>> > on the memory layout and boot param, percpu allocator may choose to
>> > map the first chunk in vmalloc space too; however, this still works
>> > out fine because at that point there are no other page tables and
>> > the PUD entries covering the first chunk are faulted in before other
>> > page tables are copied from the kernel one.
>> That sounds correct.
>> >
>> > * NMI used to be a problem because vmalloc fault handler couldn't
>> > safely nest inside NMI handler but this has been fixed since and it
>> > should work fine from NMI handlers now.
>> Right. Of course "should work fine" does not exactly mean "will work fine".
>> >
>> > * Function tracers are problematic because they may end up nesting
>> > inside themselves through triggering a vmalloc fault while accessing
>> > dynamic percpu memory area. This may lead to recursive locking and
>> > other surprises.
>> The function tracer infrastructure now has a recursive check that happens
>> rather early in the call. Unless the registered OPS specifically states
>> it handles recursions (FTRACE_OPS_FL_RECURSION_SAFE), ftrace will add the
>> necessary recursion checks. If a registered OPS lies about being recursion
>> safe, well we can't stop suicide.
> Same if the recursion state is based on per cpu memory.
>> Looking at kernel/trace/trace_functions.c: function_trace_call() which is
>> registered with RECURSION_SAFE, I see that the recursion check is done
>> before the per_cpu_ptr() call to the dynamically allocated per_cpu data.
>> It looks OK, but...
>> Oh! but if we trace the page fault handler, and we fault here too
>> we just nuked the cr2 register. Not good.
> If we fault in the page fault handler, we double fault and apparently
> recovering from that isn't quite expected anyway.

Not quite. We only double fault if we fault while pushing the
hardware part of the state onto the stack. That happens even before
the entry asm gets run.

Otherwise if we have a page fault inside do_page_fault, it's just a
nested page fault.
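
The cr2 concern quoted above can be illustrated with a toy userspace model. This is only a sketch, not kernel code: fake_cr2 stands in for the real CR2 register, and traced_hook_that_faults(), handler_reads_late() and handler_reads_early() are hypothetical names made up for this example. It assumes a traced hook runs inside the fault handler and itself takes a nested fault, which overwrites the recorded faulting address.

```c
/* Toy stand-in for CR2: holds the most recent faulting address. */
static unsigned long fake_cr2;

/*
 * Models a traced hook that itself takes a nested (vmalloc-style)
 * fault. The nested fault overwrites the stand-in register.
 */
static void traced_hook_that_faults(unsigned long nested_addr)
{
    fake_cr2 = nested_addr;
}

/*
 * Handler that reads the register only after the hook has run.
 * It ends up seeing the nested fault's address, not the original.
 */
unsigned long handler_reads_late(unsigned long addr,
                                 unsigned long nested_addr)
{
    fake_cr2 = addr;                      /* "hardware" records the fault */
    traced_hook_that_faults(nested_addr); /* hook runs first and faults */
    return fake_cr2;                      /* original address is lost */
}

/*
 * Handler that saves the register before anything that can fault.
 * The nested fault still clobbers fake_cr2, but the saved copy survives.
 */
unsigned long handler_reads_early(unsigned long addr,
                                  unsigned long nested_addr)
{
    unsigned long saved;

    fake_cr2 = addr;                      /* "hardware" records the fault */
    saved = fake_cr2;                     /* read before any hook runs */
    traced_hook_that_faults(nested_addr);
    return saved;
}
```

In this model the nested fault is survivable either way; the only difference is whether the outer handler still knows which address it was asked to handle. That is the distinction between a nested page fault and a double fault drawn above.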


Andy Lutomirski
AMA Capital Management, LLC