Re: [PATCH] Kick CPUS that might be sleeping in cpus_idle_wait

From: Andrew Morton
Date: Wed Jan 09 2008 - 18:44:19 EST



> Subject: [PATCH] Kick CPUS that might be sleeping in cpus_idle_wait

s/cpus_/cpu_/

On Wed, 09 Jan 2008 15:42:10 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> This patch is different than the first patch I sent out.
> This one just sends an IPI to all CPUS that don't check in after 1 sec.
>
>
> Sometimes cpu_idle_wait gets stuck because it might miss CPUS that are
> already in idle, have no tasks waiting to run and have no interrupts
> going to them. This is common on bootup when switching cpu idle
> governors.
>
> This patch gives those CPUS that don't check in an IPI kick.
>

(wakes up)

>
> Index: linux-compile-i386.git/arch/x86/kernel/process_32.c
> ===================================================================
> --- linux-compile-i386.git.orig/arch/x86/kernel/process_32.c 2008-01-09 14:09:36.000000000 -0500
> +++ linux-compile-i386.git/arch/x86/kernel/process_32.c 2008-01-09 14:09:45.000000000 -0500
> @@ -204,6 +204,10 @@ void cpu_idle(void)
>  	}
>  }
>
> +static void do_nothing(void *unused)
> +{
> +}
> +
>  void cpu_idle_wait(void)
>  {
>  	unsigned int cpu, this_cpu = get_cpu();
> @@ -228,6 +232,13 @@ void cpu_idle_wait(void)
>  				cpu_clear(cpu, map);
>  		}
>  		cpus_and(map, map, cpu_online_map);
> +		/*
> +		 * We waited 1 sec, if a CPU still did not call idle
> +		 * it may be because it is in idle and not waking up
> +		 * because it has nothing to do.
> +		 * Give all the remaining CPUS a kick.
> +		 */
> +		smp_call_function_mask(map, do_nothing, 0, 0);
>  	} while (!cpus_empty(map));
>
>  	set_cpus_allowed(current, tmp);

This seems rather hacky. Although it may turn out to be the most efficient
fix, dunno.

I'd have thought that the right fix would be to plug the race which you
described at the top-of-thread. That might require some redesign, but it
sounds like the design is wrong anyway.

Maybe your proposed fix is suitable for a 2.6.24 bandaid..

<looks at cpu_idle_wait()>

<pokes his tongue out at the person who put in a global,
exported-to-modules interface and didn't bother documenting it>

OK, it's called infrequently, so a few extra IPIs there won't hurt.


btw, it's pretty damn sad that cpu_idle_wait() will always stall for at
least one second. That's a huge amount of time and I bet it's thousands of
times longer than is actually needed..
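
For anyone following along, here is a rough userspace sketch of the behaviour under discussion. It is a toy model and not kernel code: the names toy_system, toy_poll, toy_kick and toy_idle_wait are made up for illustration, each bit of a mask stands for one CPU, and one loop iteration stands for one polling period (the one-second sleep in the real function). It only tries to show why the wait can hang when some CPUs sit in idle with nothing to wake them, and why a do-nothing kick lets the loop finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT kernel code: bit n of each mask stands for CPU n. */
struct toy_system {
	uint32_t online;	/* CPUs present in the model */
	uint32_t sleeping;	/* CPUs halted in idle with nothing to wake them */
};

/*
 * One polling period. Every awake CPU passes through its idle loop and
 * checks in, so only sleeping (and still online) CPUs remain pending.
 */
static uint32_t toy_poll(const struct toy_system *sys, uint32_t pending)
{
	return pending & sys->sleeping & sys->online;
}

/* Hypothetical stand-in for the do-nothing IPI: wakes the CPUs in mask. */
static void toy_kick(struct toy_system *sys, uint32_t mask)
{
	sys->sleeping &= ~mask;
}

/*
 * Wait until every online CPU has checked in.
 * Returns the number of polling periods used, or -1 if the model gives
 * up after max_rounds (the real loop would simply keep spinning).
 */
int toy_idle_wait(struct toy_system *sys, bool kick, int max_rounds)
{
	uint32_t pending = sys->online;
	int rounds = 0;

	assert(max_rounds > 0);
	do {
		if (rounds == max_rounds)
			return -1;
		pending = toy_poll(sys, pending);
		rounds++;
		if (kick)
			toy_kick(sys, pending);	/* kick only the stragglers */
	} while (pending != 0);

	return rounds;
}
```

With four CPUs online and CPUs 1 and 2 asleep (online 0xF, sleeping 0x6), toy_idle_wait() without the kick never completes and hits the round limit, while with the kick it completes on the second polling period. In this model the first period is spent entirely on discovering which CPUs need a kick.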