Re: 2.6.12-mm1 boot failure on NUMA box.

From: Martin J. Bligh
Date: Fri Jun 24 2005 - 12:19:38 EST


>> OK, still broken with the last 3 backed out, but works with the last 4
>> backed out. So I guess it's scheduler-cache-hot-autodetect.patch that
>> breaks it. Con just sent me something else to try to fix it in order
>> to run next ... will do that.
>
> hm. Does it work if you disable migration-autodetect via passing in e.g.
> migration_cost=1000,2000,3000 on the boot line? Is it perhaps the
> excessive debugging that hurts.
>
> or does it work if you undo the chunk below? Seemed harmless, but has
> CONFIG_NUMA relevance.
>
> Ingo
>
> --- linux/arch/i386/kernel/timers/timer_tsc.c.orig
> +++ linux/arch/i386/kernel/timers/timer_tsc.c
> @@ -133,18 +133,15 @@ static unsigned long long monotonic_cloc
>
> /*
> * Scheduler clock - returns current time in nanosec units.
> + *
> + * it's not a problem if the TSC is unsynchronized,
> + * as the scheduler will carefully compensate for it.
> */
> unsigned long long sched_clock(void)
> {
> unsigned long long this_offset;
>
> - /*
> - * In the NUMA case we dont use the TSC as they are not
> - * synchronized across all CPUs.
> - */
> -#ifndef CONFIG_NUMA
> - if (!use_tsc)
> -#endif
> + if (!cpu_has_tsc)
> /* no locking but a rare wrong value is not a big deal */
> return jiffies_64 * (1000000000 / HZ);

Humpf. That does look dangerous on a NUMA-Q. The TSCs aren't synced,
and we can't use them .... have to use PIT, whether the CPUs have TSC
or not.

M.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/