Re: frequent lockups in 3.18rc4

From: Dave Jones
Date: Fri Dec 26 2014 - 13:12:44 EST


On Fri, Dec 26, 2014 at 11:34:10AM -0500, Dave Jones wrote:

> One thing I think I'll try is to try and narrow down which
> syscalls are triggering those "Clocksource hpet had cycles off"
> messages. I'm still unclear on exactly what is doing
> the stomping on the hpet.

First I ran trinity with "-g vm", which limits it to a subset
of syscalls, specifically the VM-related ones.
That triggered the messages. Further experiments revealed:

-c mremap triggered it, but only when I also passed -C256
to crank up the number of child processes. The same thing
occurred with mprotect, madvise and remap_file_pages.

I couldn't trigger it with -c mmap, or msync, mbind, move_pages,
migrate_pages, mlock, regardless of how many child processes there were.
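
For anyone who wants to repeat the per-syscall runs above, here is a rough
sketch of how they could be scripted. It is not what I actually ran. The
only trinity options used are the -c and -C forms mentioned above; the
helper names, the 300 second run time and the use of dmesg to count the
warnings are assumptions made for illustration. Note that killing trinity
on timeout may leave some of its children behind, and that the count can
be off if the kernel log buffer wraps during a run.

```python
import subprocess

# Warning text quoted earlier in this thread.
HPET_MARKER = "Clocksource hpet had cycles off"

# Syscalls named in this mail, grouped by the outcome reported for them.
TRIGGERED = ["mremap", "mprotect", "madvise", "remap_file_pages"]
NOT_TRIGGERED = ["mmap", "msync", "mbind", "move_pages", "migrate_pages", "mlock"]


def trinity_cmd(syscall, children=256):
    """Build a trinity command line restricted to a single syscall."""
    return ["trinity", "-c", syscall, "-C{}".format(children)]


def count_marker(log_text):
    """Count kernel log lines that contain the hpet warning."""
    return sum(1 for line in log_text.splitlines() if HPET_MARKER in line)


def read_kernel_log():
    """Return the current dmesg output (may need privileges on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def run_one(syscall, children=256, seconds=300):
    """Run trinity for a bounded time and return how many new warnings appeared.

    The 300 second default is an arbitrary placeholder, not a measured value.
    """
    before = count_marker(read_kernel_log())
    try:
        subprocess.run(trinity_cmd(syscall, children), timeout=seconds)
    except subprocess.TimeoutExpired:
        pass  # subprocess.run() kills the trinity parent when the timeout hits
    return count_marker(read_kernel_log()) - before
```

Calling run_one() for each name in TRIGGERED and NOT_TRIGGERED, once with
the default child count and once with a low one, would cover the same
combinations described above.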


Given the high child count necessary to trigger it,
it's nigh on impossible to weed through all the calls
that trinity made to figure out which one actually
triggered the messages.

I'm not convinced that the syscall parameters are
even particularly interesting. The "needs high load to trigger"
aspect of the bug still has a smell of scheduler interaction or
a side effect of lock contention. Looking at one child's
syscall params in isolation might look quite dull, but if
we have N processes hammering on the same mapping, that's
probably a lot more interesting.
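
To make "N processes hammering on the same mapping" concrete, here is a
minimal sketch of that access pattern. It is not a reproducer and it is not
what trinity does internally; the child count, iteration count and the
choice of madvise(MADV_DONTNEED) are assumptions picked only to show
several processes operating on one shared region. It needs Python 3.8 or
later on Linux for mmap.madvise().

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def hammer(mapping, iterations):
    """Repeatedly dirty the shared region and then madvise() it away."""
    for i in range(iterations):
        mapping[0] = i % 256                  # touch the first page
        mapping.madvise(mmap.MADV_DONTNEED)   # drop this process's pages for the region


def run(children=8, iterations=1000, pages=16):
    """Fork `children` processes that all operate on one inherited mapping.

    Returns the list of wait statuses, one per child (0 means a clean exit).
    """
    # Anonymous mapping; Python creates it MAP_SHARED by default on Unix,
    # so every forked child works on the same underlying pages.
    shared = mmap.mmap(-1, pages * PAGE)
    pids = []
    for _ in range(children):
        pid = os.fork()
        if pid == 0:
            code = 1
            try:
                hammer(shared, iterations)
                code = 0
            finally:
                os._exit(code)                # never return into the parent's code
        pids.append(pid)
    statuses = [os.waitpid(pid, 0)[1] for pid in pids]
    shared.close()
    return statuses
```

Any single child here makes the same dull call over and over; whatever
interest there is comes from all of them contending on the one region.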

Dave
