Re: Tiny cpusets -- cpusets for small systems?

From: Max Krasnyanskiy
Date: Mon Feb 25 2008 - 21:38:01 EST


Paul Jackson wrote:
>> So, I see cpusets as a higher-level API/mechanism and cpu_isolated_map as a
>> lower-level mechanism that actually makes the kernel aware of what is isolated
>> and what is not. Kind of like the sched domain/cpuset relationship, i.e. cpusets
>> affect sched domains, but the scheduler does not use cpusets directly.

> One could use cpusets to control the setting of cpu_isolated_map,
> separate from the code such as your select_irq_affinity() that
> uses it.
Yes, that's what I proposed too, in one of the CPU isolation threads with Peter. The only issue is that you need to simulate a CPU_DOWN hotplug event in order to clean up what is already running on those CPUs.

>> In the foreseeable future, 2-8 cores will be the most common configuration.
>> Do you think that cpusets are needed/useful for those machines?
>> I'm asking because, given the restrictions you mentioned above,
>> it seems that you might as well just do
>>   taskset -c 1,2,3 app1
>>   taskset -c 3,4,5 app2

> People tend to manage the CPU and memory placement of the threads
> and processes within a single co-operating job using taskset
> (sched_setaffinity) and numactl (mbind, set_mempolicy).
>
> They tend to manage the placement of multiple unrelated jobs onto
> a single system, whether on separate or shared CPUs and nodes,
> using cpusets.
>
> Something like cpu_isolated_map looks to me like a system-wide
> mechanism, which should, like sched_domains, be managed system-wide.
> Managing it with a mechanism that encourages each thread to update
> it directly, as if that thread owned the system, will break down,
> resulting in conflicting updates, as multiple, insufficiently
> co-operating threads issue conflicting settings.
I'm not sure how to interpret that. I think you might have mixed a couple of the things I asked about into one reply ;-).
The question was: given the restrictions you described when you explained the tiny-cpusets functionality, how much does one gain from using them compared to taskset/numactl? I.e. on machines with 2-8 cores it is fairly easy to manage CPUs with simple affinity masks.
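
For example, here is a minimal sketch of roughly what "taskset -c 1,2,3 app1" does under the hood, using the standard sched_setaffinity(2) call (the CPU numbers and error handling are only illustrative):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(1, &set);       /* allow CPUs 1, 2 and 3, like   */
        CPU_SET(2, &set);       /* "taskset -c 1,2,3 app1" would */
        CPU_SET(3, &set);

        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof(set), &set) < 0) {
                perror("sched_setaffinity");
                return 1;
        }

        /* exec the real application, or run the workload, here */
        return 0;
}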

The second part of your reply seems to imply that I somehow made you think that cpu_isolated_map would be managed per thread. That is of course not the case: it is definitely a system-wide mechanism, and individual threads have nothing to do with managing it.
Btw, I just re-read my previous reply, and I definitely did not say anything about threads managing cpu_isolated_map :).

>> Stuff that I'm working on these days (wireless basestations) is designed
>> with the following model:
>>   cpuN             - runs soft-RT networking and management code
>>   cpuN+1 to cpuN+x - are used as dedicated engines
>> i.e. the simplest example would be:
>>   cpu0 - runs IP, L2 and control plane
>>   cpu1 - runs hard-RT MAC
>>
>> So if CPU isolation is implemented on top of cpusets, what kind of API do you envision for such an app?

> That depends on what more API is needed. Do we need to place
> irqs better ... cpusets might not be a natural for that use.
> Aren't irqs directed to specific CPUs, not to hierarchically
> nested subsets of CPUs?

You clipped the part where I elaborated, which was:
So if CPU isolation is implemented on top of cpusets, what kind of API do you envision for such an app? I mean, currently cpusets seem to deal mostly with entire processes, whereas in this case we are really dealing with threads. I.e. different threads of the same process require different policies: some must run on isolated CPUs, some must not. I guess one could write a thread's pid into the cpuset fs, but that's not very convenient. pthread_set_affinity() is exactly what's needed. In other words, how would an app place its individual threads into different cpusets?

The IRQ stuff is separate; as we said above, cpusets could simply update cpu_isolated_map, which would take care of IRQs. I was talking specifically about thread management.
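
To make the per-thread placement concrete, here is a minimal sketch using glibc's pthread_setaffinity_np() (the thread names and CPU numbers just mirror the cpu0/cpu1 example above and are purely illustrative):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin an already-created thread to a single CPU. */
static int pin_thread(pthread_t t, int cpu)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(t, sizeof(set), &set);
}

/* Hypothetical thread bodies, named after the example above. */
static void *control_plane(void *arg) { (void)arg; return NULL; }
static void *hard_rt_mac(void *arg)   { (void)arg; return NULL; }

int main(void)
{
        pthread_t ctl, mac;

        pthread_create(&ctl, NULL, control_plane, NULL);
        pthread_create(&mac, NULL, hard_rt_mac, NULL);

        pin_thread(ctl, 0);     /* cpu0: IP, L2 and control plane */
        pin_thread(mac, 1);     /* cpu1: isolated, hard-RT MAC    */

        pthread_join(ctl, NULL);
        pthread_join(mac, NULL);
        return 0;
}

The cpuset-fs alternative would be writing each thread's tid into the "tasks" file of the appropriate cpuset, which is exactly the part that is not very convenient from inside an application.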

> Separate question:
> Is it desired that the dedicated CPUs cpuN+1 ... cpuN+x even appear
> as general purpose systems running a Linux kernel in your systems?
> These dedicated engines seem more like intelligent devices to me,
> such as disk controllers, which the kernel controls via device
> drivers, not by loading itself on them too.
We still want to be able to run normal threads on them, which means IPIs, memory management, etc. are still needed. So yes, they had better show up as normal CPUs :)
Also, with dynamic isolation you can, for example, un-isolate a CPU while you're compiling stuff on the machine and then isolate it again when you're running the special app(s).
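
Just to sketch what such a dynamic isolate/un-isolate knob could look like from userspace: the sysfs attribute used below is purely hypothetical (nothing like it exists in mainline today); it only illustrates the kind of runtime toggle I have in mind, analogous to the existing cpuN/online hotplug file:

#include <stdio.h>

/*
 * Toggle isolation of a CPU at run time.  NOTE: the "isolated" sysfs
 * attribute is hypothetical and only stands in for whatever interface
 * dynamic isolation would actually expose.
 */
static int set_cpu_isolated(int cpu, int isolated)
{
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/isolated", cpu);
        f = fopen(path, "w");
        if (!f)
                return -1;
        fprintf(f, "%d\n", isolated);
        return fclose(f);
}

int main(void)
{
        set_cpu_isolated(1, 0); /* un-isolate cpu1, e.g. while compiling  */
        /* ... kernel build or other bulk work runs here ...              */
        set_cpu_isolated(1, 1); /* isolate cpu1 again for the special app */
        return 0;
}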

Max