Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)

From: Guenter Roeck
Date: Sat Apr 18 2015 - 20:37:13 EST


On 04/18/2015 05:04 PM, Linus Torvalds wrote:
On Sat, Apr 18, 2015 at 7:40 PM, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
On Sat, Apr 18, 2015 at 04:23:25PM -0700, Guenter Roeck wrote:

my qemu test for arm:vexpress fails with the latest upstream kernel. It fails
hard - I don't get any output from the console. Bisect points to commit
8053871d0f7f ("smp: Fix smp_call_function_single_async() locking").
Reverting this commit fixes the problem.

Hmm. It being qemu, can you look at where it seems to lock?

I'll try. It must be very early in the boot process, prior to console
initialization - if I load qemu without -nographic I only get "Guest
has not initialized the display (yet)".

Additional observation: The system boots if I add "-smp cpus=4" to the qemu
options. It does still hang, however, with "-smp cpus=2" and "-smp cpus=3".

Funky.

That patch still looks obviously correct to me after looking at it
again, but I guess we need to revert it if somebody can't see what's
wrong.

It does make async (wait=0) smp_call_function_single() possibly be
*really* asynchronous, ie the 'csd' ends up being released and can be
re-used even before the call-single function has completed. That
should be a good thing, but I wonder if that triggers some ARM bug.

Instead of doing a full revert, what happens if you replace this part:

+ /* Do we wait until *after* callback? */
+ if (csd->flags & CSD_FLAG_SYNCHRONOUS) {
+         func(info);
+         csd_unlock(csd);
+ } else {
+         csd_unlock(csd);
+         func(info);
+ }

with just

+ func(info);
+ csd_unlock(csd);

ie keeping the csd locked until the function has actually completed? I
guess for completeness, we should do the same thing for the cpu ==
smp_processor_id() case (see the "We can unlock early" comment).

Now, if that makes a difference, I think it implies a bug in the
caller, so it's not the right fix, but it would be an interesting
thing to test.

I applied the above. No difference. Applying the same change for the cpu ==
smp_processor_id() case does not make a difference either.

Guenter
