On Wednesday 09 October 2002 6:02 pm, Erich Focht wrote:
> > > Starting migration thread for cpu 3
> > > Bringing up 4
> > > CPU>dividNOWrro!
> >
> > I got the same thing on 2.5.40-mm1. It looks like it may be a divide
> > by zero in calc_pool_load. I am attempting to boot a band-aid version
> > right now. OK, got a little further:
>
> This opened my eyes, thanks for all your help and patience!!!
>
> The problem is that the load balancer is called before the CPU pools
> were set up. That's fine, I thought, because I define in sched_init
> the default pool 0 to include all CPUs. But: in find_busiest_queue()
> the cpu_to_node(this_cpu) delivers a non-zero pool which is not set up
> yet, therefore pool_nr_cpus[pool]=0 and we get a zero divide.
>
> I'm still wondering why this doesn't happen on our architecture. Maybe
> the interrupts are disabled longer, I'll check. Anyway, a fix is to
> force this_pool to be 0 as long as numpools=1. The attached patch is a
> quick untested hack, maybe one can do it better. Has to be applied on top
> of the other 2.
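
For illustration only, the fix described above could be sketched roughly as
follows. This is not the attached patch: the arrays are user-space stand-ins
for the scheduler's per-pool bookkeeping, and effective_pool() and
avg_pool_load() are hypothetical helper names. Only numpools, pool_nr_cpus
and the "force pool 0 while numpools=1" idea come from the message.

```c
#define MAX_POOLS 4

/* Stand-ins for the scheduler's per-pool state (assumed layout).
 * Pool 0 is the default pool that sched_init fills with all CPUs. */
static int numpools = 1;
static int pool_nr_cpus[MAX_POOLS]    = { 16, 0, 0, 0 };
static int pool_nr_running[MAX_POOLS] = { 32, 0, 0, 0 };

/* Hypothetical helper: while only the default pool exists, ignore the
 * node reported for this CPU and use pool 0 instead. */
static int effective_pool(int node)
{
    if (numpools <= 1 || node < 0 || node >= numpools)
        return 0;
    return node;
}

/* Hypothetical helper: average number of runnable tasks per CPU in the
 * pool. The extra check keeps an empty pool from causing a zero divide. */
static int avg_pool_load(int node)
{
    int pool = effective_pool(node);
    int ncpus = pool_nr_cpus[pool];

    if (ncpus == 0)
        return 0;
    return pool_nr_running[pool] / ncpus;
}
```

With numpools=1, a call such as avg_pool_load(3) is answered from pool 0
rather than dividing by pool_nr_cpus[3], which is still 0 at that point.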

Thanks very much, Erich. I did come across another problem here on numa-q: in
task_to_steal() there is a divide by cache_decay_ticks, which apparently is 0
on my system. This may have to do with notsc (I cannot boot without using
notsc), but I am not sure. As a workaround I set cache_decay_ticks to 8, which
is probably not correct, but I can now boot a 16-processor numa-q on
2.5.40-mm1 with your patches! I'll get some benchmark results soon.

Andrew Theurer
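
A minimal sketch of a guard for the cache_decay_ticks case, assuming the
divide looks roughly like "elapsed ticks / cache_decay_ticks". The actual
expression in task_to_steal() is not shown in this thread, so decay_periods()
and safe_decay_ticks() are hypothetical names, and the fallback of 8 only
mirrors the workaround above; it is not a tuned value.

```c
/* Fallback used when cache_decay_ticks was never calibrated.
 * 8 mirrors the workaround in the message, nothing more. */
#define FALLBACK_DECAY_TICKS 8UL

/* Stand-in for the kernel variable; 0 models the uncalibrated case. */
static unsigned long cache_decay_ticks = 0;

/* Hypothetical helper: never hand a zero divisor to the caller. */
static unsigned long safe_decay_ticks(void)
{
    return cache_decay_ticks ? cache_decay_ticks : FALLBACK_DECAY_TICKS;
}

/* Hypothetical helper: number of whole decay periods that have passed
 * since the task last ran (assumed form of the division). */
static unsigned long decay_periods(unsigned long now, unsigned long last_ran)
{
    return (now - last_ran) / safe_decay_ticks();
}
```

A guard like this only hides the symptom; why cache_decay_ticks ends up as 0
on this machine would still need to be tracked down.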