Please, wrap your emails at 78 - most mailers can do this.Done.
On Fri, 2008-02-22 at 14:05 -0800, Max Krasnyanskiy wrote:Unless I'm missing something that's only possible for a very static system. What I mean is that yes you could go and set irq affinity, apps affinity, workqueue thread affinity, etc not to run on the isolated cpus. It works _until_ something changes, at which point the system needs to know that it's not supposed to touch CPU N.Peter Zijlstra wrote:On Thu, 2008-02-21 at 18:38 -0800, Max Krasnyanskiy wrote:
That's not not currently possible and will introduce a lot of complexity.List of commitscpu_isolated_map was a bad hack when it was introduced, I feel we should
cpuisol: Make cpu isolation configrable and export isolated map
deprecate it and fully integrate the functionality into cpusets. That would
give a much more flexible end-result.
I'm pretty sure you missied the discussion I had with Paul (you were cc'ed on that btw).
In fact I provided the link to that discussion in the original email.
Here it is again:
http://marc.info/?l=linux-kernel&m=120180692331461&w=2
I read it, I just firmly disagree.
Basically the problem is very simple. CPU isolation needs a simple/efficient way to check if CPU N is isolated.
I'm not seeing the need for that outside of setting up the various
states. That is, once all the affinities are set up, you'd hardly ever
would (or should - imho) need to know if a particular CPU is isolated or not.
No, not really.cpuset/cgroup APIs are not designed for that. In other to figure
out if a CPU N is isolated one has to iterate through all cpusets and checks their cpu maps.
That requires several levels of locking (cgroup and cpuset).
The other issue is that cpusets are a bit too dynamic (see the thread above for more details)
we'd need notified mechanisms to notify subsystems when a CPUs become isolated. Again more
complexity. Since I integrated cpu isolation with cpu hotplug it's already addressed in a nice
simple way.
I guess you have another definition of nice than I do.
I do not see a specific proposals here. The funny part that we're not even disagreeing on the high level. Yes It'd be nice to have such a flag ;-)Please take a look at that discussion. I do not think it's worth the effort to put this into
cpusets. cpu_isolated_map is very clean and simple concept and integrates nicely with the rest of the cpu maps. ie It's very much the same concept and API as cpu_online_map, etc.
I'm thinking cpu_isolated_map is a very dirty hack.
If we want to integrate this stuff with cpusets I think the best approach would be is to have cpusets update the cpu_isolated_map just like it currently updates scheduler domains.CPU-sets can already isolate cpus by either creating a cpu outside of any set,This only works for user-space. As I mentioned about for full CPU isolation various kernel subsystems need to be aware of that CPUs are isolated in order to avoid activities on them.
or a set with a single cpu not shared by any other sets.
Yes, hence the proposed system flag to handle the kernel bits like
unbounded kernel threads and IRQs.
One way I can think of how to support groups and allow for RT balancer is this: Make scheduler ignore cpu_isolated_map and give cpusets full control of the scheduler domains. Use cpu_isolated_map to only for hw irq and other kernel sub-systems. That way cpusets could mark cpus in the group as isolated to get rid of the kernel activity and build sched domain such that tasks get balanced in it.This also allows for isolated groups, there are good reasons to isolate groups,
esp. now that we have a stronger RT balancer. SMP and hard RT are not
exclusive. A design that does not take that into account is too rigid.
You're thinking scheduling only. Paul had the same confusion ;-)
I'm not, I'm thinking it ought to allow for it.
It's still optional. It's the sane default though. I mean users are welcome to change affinity mask of an active IRQ.As I explained before I'm redefining (or proposing to redefine) CPU isolation to
1. Isolated CPU(s) must not be subject to scheduler load balancing
Users must explicitly bind threads in order to run on those CPU(s).
I'm thinking that should be optional, load-balancing might make sense
under some scenarios.
I guess maybe you're not refuting them. But you're not giving me concrete examples of how low level stuff would work. Lets talks specifics. I'm not stuck on the cpu_isolated_map, if there are better options I'm all for it.2. By default interrupts must not be routed to the isolated CPU(s)
User must route interrupts (if any) to those CPUs explicitly.
That is what I allowed for by the system flag.
3. In general kernel subsystems must avoid activity on the isolated CPU(s) as much as possible
Includes workqueues, per CPU threads, etc.
This feature is configurable and is disabled by default.
How am I refuting any of these points? I'm very clear on what you want,
I'm just saying I want to get there differently.
"Default" affinity was the keyword here. Genirq definitely deals with affinities in general. Anyway, as you saw I rewrote it on top of genirq.Ah, good point. This patches started before genirq was merged and I did not realizecpuisol: Do not route IRQs to the CPUs isolated at boot>From the diffstat you're not touching the genirq stuff, but instead hack a
single architecture to support this feature. Sounds like an ill designed hack.
that there is a way to set default irq affinity with genirq.
I'll definitely take a look at that.
I said so before, but good to see you've picked this up. I didn't know
if there was a genirq way to set smp affinities - but if there wasn't
this was a good moment to introduce that.
Please be specific. Lets say we get rid of the cpu_isolated_map. How would that "system" flag be propagated to the lowest layers like genirq, workqueue, etc ?A better approach would be to add a flag to the cpuset infrastructure that saysYou're talking about very high level API. I'm totally all for it. What the patches deal with is actual low level stuff that is needed to do the "push the system away from those cpus".
whether its a system set or not. A system set would be one that services the
general purpose OS and would include things like the IRQ affinity and unbound
kernel threads (including unbound workqueues - or single workqueues). This flag
would default to on, and by switching it off for the root set, and a select
subset you would push the System away from those cpus, thereby isolating them.
As I mentioned above we can have cpuset update cpu_isolated_map for example.
I'm not wanting to extend cpu_isolated_map - I'm wanting to get rid of
it entirely.
I think there were other cases besides flushing that wanted all threads to report back. But I do not remember exact scenario of the top of my head.Oh boy, back to square one. I covered this already.cpuisol: Do not schedule workqueues on the isolated CPUs(per-cpu workqueues, the single ones are treated in the previous section)
I still strongly disagree with this approach. Workqueues are passive, they
don't do anything unless work is provided to them. By blindly not starting them
you handicap the system and services that rely on them.
I even started a thread on that and explained what this is and why its needed.
http://marc.info/?l=linux-kernel&m=120217173001671&w=2
You guys never replied.
Right, sorry, I did indeed miss that one (might still be in my inbox -
I'm having trouble getting to 0 unread these days).
I'm thinking moving away from flush would be a good idea. Often there is
no need for a full flush but just wait for completion of all 'your'
worklets - as opposed by all worklets.
I've ran into this in -rt in another context but managed to work around
it. It might be time to look at it again.
(fwiw, workqueues in -rt are priority aware)That does not help me because I need full isolation.
I agree in general. From my experience though it is kind of hard and the system might be a bit fragile.You last statement above (about handicap) does not make much sense with respect to the CPU isolation. I mean we cannot have it both ways, for isolated CPUs not to run kernel subsystems
and at the same time say that we must not impact the system. Of course it impacts the system,
in a predictable and controlled way. Non-isolated CPUs are not affected at all. And app threads that need full system services should not be running on the isolated CPUs.
I'm saying we should remove the generation of work, not the execution of
work when its needed. By keeping the workqueues around you might even
allow routing a single non custom IRQ to an isolated CPU. One that uses
softirqs for instance.
Its all about allowing other scenarios than the one you need, while
serving your needs equally well.