On Tue, 2013-07-16 at 11:19 -0700, Srinivas Pandruvada wrote:
> Thanks for your help in debugging and isolating.

Thanks.

> How did you trigger this error condition? Is it a code review or
> you have some way to reproduce?

No, my tests do a cpu hotplug stress and the system would hang. I had to
bisect it to find the bug and it came to this code. What was weird is
that the module wasn't loaded. Then I ran the ftrace function tracer
started by the kernel command line with the following:
ftrace=function ftrace_filter=get_online_cpus,put_online_cpus
and after I booted up, I ran:
cat /debug/tracing/trace | perl -e '
	my @stack;
	while (<>) {
		if (/get_online/) {
			push @stack, $_;
		} elsif (/put_online/) {
			pop @stack;
		}
	}
	foreach my $line (@stack) {
		print $line;
	}'
And it showed that get_online_cpus() was called twice without a matching
put_online_cpus(). The strange thing was the calls had no parent
function. Which is when I realized that the module was loaded but then
failed to init, and was unloaded. Which explains why it didn't show up
in my lsmod.
Then it was just a matter of looking at all the calls to
get_online_cpus() in the commit, and it was rather obvious what the
bug was.
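
To illustrate the kind of imbalance the trace pointed at, here is a
minimal user-space sketch. It is not the code from the commit: the
counter and the two init functions are hypothetical stand-ins, and
get_online_cpus()/put_online_cpus() are mocked as a plain reference
count. The assumption is an init path that takes the reference and then
returns on failure without dropping it.

```c
/* Hypothetical mock of the hotplug reference count (not kernel code). */
static int hotplug_refcount;

static void get_online_cpus(void) { hotplug_refcount++; }
static void put_online_cpus(void) { hotplug_refcount--; }

/*
 * Hypothetical init routine with the imbalance: on failure it returns
 * while still holding the reference taken by get_online_cpus().
 */
static int buggy_init(int fail)
{
	get_online_cpus();
	if (fail)
		return -1;	/* bug: no matching put_online_cpus() */
	put_online_cpus();
	return 0;
}

/* Same routine with every exit path dropping the reference. */
static int fixed_init(int fail)
{
	int ret = 0;

	get_online_cpus();
	if (fail)
		ret = -1;
	put_online_cpus();	/* balanced on success and on failure */
	return ret;
}
```

Calling buggy_init(1) leaves the mock count at 1, while fixed_init(1)
leaves it unchanged. In the kernel, a reference left held like this is
what a later cpu hotplug operation would end up waiting on.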
With the patch applied, the lockup went away.
-- Steve