Re: [CPUISOL] CPU isolation extensions

From: Max Krasnyanskiy
Date: Mon Jan 28 2008 - 13:48:22 EST


Steven Rostedt wrote:
On Mon, Jan 28, 2008 at 08:59:10AM -0600, Paul Jackson wrote:
Thanks for the CC, Peter.

Thanks from me too.

Max wrote:
We've had scheduler support for CPU isolation ever since O(1) scheduler went it. I'd like to extend it further to avoid kernel activity on those CPUs as much as possible.
I recently added the per-cpuset flag 'sched_load_balance' for some
other realtime folks, so that they can disable the kernel scheduler
load balancing on isolated CPUs. It essentially allows for dynamic
control of which CPUs are isolated by the scheduler, using the cpuset
hierarchy, rather than enhancing the 'isolated_cpus' mask. That
'isolated_cpus' mask remained a minimal kernel boottime parameter.
I believe this went to Linus's tree about Oct 2007.

It looks like you have three additional tweaks for realtime in this
patch set, with your patches:

[PATCH] [CPUISOL] Do not route IRQs to the CPUs isolated at boot

I didn't know we still routed IRQs to isolated CPUs. I guess I need to
look deeper into the code on this one. But I agree that isolated CPUs
should not have IRQs routed to them.
Also note that it's just a convenience feature. In other words it's not that with this patch
we'll never route IRQs to those CPUs. They can still be explicitly routed by writing to irq/N/smp_affitnity.

[PATCH] [CPUISOL] Support for workqueue isolation

The thing about workqueues is that they should only be woken on a CPU if
something on that CPU accessed them. IOW, the workqueue on a CPU handles
work that was called by something on that CPU. Which means that
something that high prio task did triggered a workqueue to do some work.
But this can also be triggered by interrupts, so by keeping interrupts
off the CPU no workqueue should be activated.
No no no. That's what I though too ;-). The problem is that things like NFS and friends
expect _all_ their workqueue threads to report back when they do certain things like
flushing buffers and stuff. The reason I added this is because my machines were getting
stuck because CPU0 was waiting for CPU1 to run NFS work queue threads even though no IRQs
or other things are running on it.

[PATCH] [CPUISOL] Isolated CPUs should be ignored by the "stop machine"

This I find very dangerous. We are making an assumption that tasks on an
isolated CPU wont be doing things that stopmachine requires. What stops
a task on an isolated CPU from calling something into the kernel that
stop_machine requires to halt?
I agree in general. The thing is though that stop machine just kills any kind of latency guaranties. Without the patch the machine just hangs waiting for the stop-machine to run
when module is inserted/removed. And running without dynamic module loading is not very practical on general purpose machines. So I'd rather have an option with a big red warning than no option at all :).

Thanx
Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/