Re: [BUG] sched: big numa dynamic sched domain memory corruption

From: Siddha, Suresh B
Date: Tue Aug 01 2006 - 15:08:19 EST

On Tue, Aug 01, 2006 at 01:25:33AM -0700, Paul Jackson wrote:
> I wish you well on any further code improvements you have planned for
> this code. It's tough to understand, with such issues as many #ifdef's,
> an interesting memory layout of the key sched domain arrays that I
> didn't see described much in the comments, and a variety of memory
> allocation calls that are tough to unravel on error. Portions of
> the code could use some more comments, explaining what is going on.
> For example, I still haven't figured exactly what 'cpu_power' means.

I will add some info to Documentation/sched-domains.txt aswell as some
comments to the code where appropriate. I did some cleanup of the code
but unfortunately that got dropped because of some issues. I will repost
that cleanup patch aswell.

> The allocations of sched_group_allnodes, sched_group_phys and
> sched_group_core are -big- on our ia64 SN2 systems (1024 CPUs),
> and could fail once a system has been up for a while and is
> getting memory tight and fragmented.

I have to agree with you. I have an idea(basically passing cpu_map info
to functions which determine the group) to solve this issue. Let me work
on it and post a fix.

> It is not obvious to me from the code or comments just how sched
> domains are arranged on various large systems with hyper-threading
> (SMT) and/or multiple cores (MC) and/or multiple processor packages
> per node, and how scheduling is affected by all this.

Enabling SCHED_DOMAIN_DEBUG should atleast show how sched domains
and groups are arranged. Adding an example in Documentation might
be a good idea.

> This was about the third bug that has come by in it -- which I
> in particular notice when it is someone playing with cpu_exclusive
> cpusets who hits the bug. Any kernel backtrace with 'cpuset' buried in
> it tends to migrate to my inbox. This latest bug was particularly
> nasty, as is usually the case with random memory corruption bugs,
> costing us a bunch of hours.
> Good luck.
> If you are aware of any other fixes/patches besides the above that us
> big honkin numa iron SLES10 users need for reliable operation, let me
> know.

Will keep you in loop.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at