Re: [PATCH RFC 1/3] Add a trigger API for efficient non-blocking waiting

From: Jeremy Fitzhardinge
Date: Wed Aug 20 2008 - 14:42:40 EST


Andrew Morton wrote:
> On Sat, 16 Aug 2008 09:34:13 -0700 Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
>
>
>> There are various places in the kernel which wish to wait for a
>> condition to come true while in a non-blocking context. Existing
>> examples of this are stop_machine() and smp_call_function_mask().
>> (No doubt there are other instances of this pattern in the tree.)
>>
>> Thus far, the only way to achieve this is by spinning with a
>> cpu_relax() loop. This is fine if the condition becomes true very
>> quickly, but it is not ideal:
>>
>> - There's little opportunity to put the CPUs into a low-power state.
>> cpu_relax() may do this to some extent, but if the wait is
>> relatively long, then we can probably do better.
>>
>
> If this change saves a significant amount of power then we should fix
> the offending callsites.
>

Fix them how? In general we're talking about contexts where we can't
block, and where the wait time is limited by some property of the
platform, such as IPI time or interrupt latency (though doing a
cross-cpu call of a long-running function would be something we could fix).

>> - In a virtual environment, spinning virtual CPUs just waste CPU
>> resources, and may steal CPU time from vCPUs which need it to make
>> progress. The trigger API allows the vCPUs to give up their CPU
>> entirely. The s390 people observed a problem with stop_machine
>> taking a very long time (seconds) when there are more vcpus than
>> available cpus.
>>
>
> If this change saves a significant amount of virtual-cpu-time then we
> should fix the offending callsites.
>

This case isn't particularly about saving vcpu time, but about making timely
progress. stop_machine() gets all the cpus into a spinloop, where they
spin waiting for an event to tell them to go to their next state-machine
state. By definition this can't be a blocking operation (since the
whole point is that they're high priority threads that prevent anything
else from running). But in the virtual case, the fact that they're all
spinning means that the underlying hypervisor has no idea who's just
spinning, and who's trying to do some work needed to make overall
progress, so the whole thing gets bogged down.

Now perhaps we could solve stop_machine by modifying the scheduler in
some way, where you can block the run queue so that you sit in the idle
loop even though there are runnable processes waiting. But even then,
stop_machine requires that interrupts be disabled, which means that we're
pretty much limited to spinning.

So my proposal is to add a non-scheduler-blocking operation which is
semantically equivalent to spinning, but gives the underlying platform
more information about what's going on.

Arjan suggested that since this is more or less equivalent to a
completion, we should just implement "spinpletions" - a spinning
completion. This should be more familiar to kernel programmers, and
should be just as useful as triggers.
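
To make the idea concrete, here is a rough userspace sketch of what a
spinning completion might look like. It is not code from the patch
series: every name in it (struct spinpletion, spinpletion_wait(),
spin_relax_fn, the fake_platform test harness) is hypothetical, and it
uses C11 atomics rather than kernel primitives. The point is only that
the waiter still spins, but each iteration goes through a hook that the
platform can implement as cpu_relax(), a low-power wait, or a vCPU
yield.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type: a completion that is waited on by spinning. */
struct spinpletion {
	atomic_bool done;
};

/*
 * Hypothetical platform hook, called once per failed poll.  On bare
 * metal it could be cpu_relax(); under a hypervisor it could give the
 * vCPU away until the completion is signalled.
 */
typedef void (*spin_relax_fn)(void *ctx);

static void spinpletion_init(struct spinpletion *s)
{
	atomic_init(&s->done, false);
}

static void spinpletion_complete(struct spinpletion *s)
{
	/* Release: writes made before completing are visible to the waiter. */
	atomic_store_explicit(&s->done, true, memory_order_release);
}

/*
 * Semantically a spin: never sleeps and never calls the scheduler.
 * Returns how many times the relax hook ran.
 */
static unsigned long spinpletion_wait(struct spinpletion *s,
				      spin_relax_fn relax, void *ctx)
{
	unsigned long polls = 0;

	while (!atomic_load_explicit(&s->done, memory_order_acquire)) {
		relax(ctx);
		polls++;
	}
	return polls;
}

/* Stand-in for "the other CPU": completes the target after N polls. */
struct fake_platform {
	struct spinpletion *target;
	unsigned long remaining;
};

static void fake_relax(void *ctx)
{
	struct fake_platform *p = ctx;

	if (--p->remaining == 0)
		spinpletion_complete(p->target);
}

/* Single-threaded demo: wait on a completion that fires on the 3rd poll. */
static unsigned long spinpletion_demo(void)
{
	struct spinpletion s;
	struct fake_platform p = { .target = &s, .remaining = 3 };

	spinpletion_init(&s);
	return spinpletion_wait(&s, fake_relax, &p);
}
```

In a real implementation the hook would presumably be chosen per
architecture or per hypervisor rather than passed in by the caller; it
is a parameter here only so the sketch can be exercised without
threads.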

I've run out of time to work on this now, but Rusty has hinted he'll
pick up the baton...

(I'd also like to hear from other architecture folks, particularly s390,
to make sure this is going to be useful to them too.)

J