Re: Linux 2.6.32-rc1

From: Martin Schwidefsky
Date: Mon Sep 28 2009 - 16:56:37 EST


On Mon, 28 Sep 2009 20:41:41 +0200
Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:

> I did a bisection and found commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
> was the origin of the problem on my x86_32 machine.
>
> def0a9b2573e00ab0b486cb5382625203ab4c4a6 is first bad commit
> commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
> Author: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Date: Fri Sep 18 20:14:01 2009 +0200
>
> sched_clock: Make it NMI safe
>
> Arjan complained about the suckyness of TSC on modern machines, and
> asked if we could do something about that for PERF_SAMPLE_TIME.
>
> Make cpu_clock() NMI safe by removing the spinlock and using
> cmpxchg. This also makes it smaller and more robust.
>
> Affects architectures that use HAVE_UNSTABLE_SCHED_CLOCK, i.e. IA64
> and x86.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> LKML-Reference: <new-submission>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

Confirmed. The bisect run on my machine gave me the same bad commit.
The new logic in sched_clock_remote seems racy: the old code got the
locks for the sched_clock_data of the local and the remote cpu before
it changed any value. The new code tries to get to the same result with
a single cmpxchg. Bad things happen if two cpus try to update the clock
values crosswise, no?
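
To make the concern concrete, here is a minimal user-space sketch, not the
kernel code. It assumes C11 atomics in place of the kernel's cmpxchg, and
the names clock_data, snapshot, sample, couple and replay_interleaving are
hypothetical. It replays one hand-picked interleaving on a single thread
and shows only the general property in question: a single compare-and-swap
validates the one word it writes, not the second value that was sampled
alongside it. It does not show that the kernel code misbehaves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-cpu clock data; not the kernel struct. */
struct clock_data {
    _Atomic uint64_t clock;
};

/* The two values one cpu sampled before deciding what to write. */
struct snapshot {
    uint64_t this_clock;
    uint64_t remote_clock;
};

/* Read both clocks. The two loads are not atomic as a pair. */
static struct snapshot sample(struct clock_data *mine,
                              struct clock_data *remote)
{
    struct snapshot s = {
        .this_clock = atomic_load(&mine->clock),
        .remote_clock = atomic_load(&remote->clock),
    };
    return s;
}

/*
 * Couple the clocks by raising the smaller one to the larger one with a
 * single compare-and-swap. Only the word being written is checked against
 * the snapshot; the other sampled value is trusted as-is.
 * Returns true and stores the written value in *result on success.
 */
static bool couple(struct clock_data *mine, struct clock_data *remote,
                   struct snapshot s, uint64_t *result)
{
    _Atomic uint64_t *ptr;
    uint64_t expected, val;

    if ((int64_t)(s.remote_clock - s.this_clock) < 0) {
        ptr = &remote->clock;
        expected = s.remote_clock;
        val = s.this_clock;
    } else {
        ptr = &mine->clock;
        expected = s.this_clock;
        val = s.remote_clock;
    }

    if (!atomic_compare_exchange_strong(ptr, &expected, val))
        return false;   /* a real caller would resample and retry */

    *result = val;
    return true;
}

/* Replay one interleaving: cpu0's own clock moves after it was sampled. */
static uint64_t replay_interleaving(struct clock_data *cpu0,
                                    struct clock_data *cpu1)
{
    uint64_t returned = 0;

    atomic_init(&cpu0->clock, 100);
    atomic_init(&cpu1->clock, 90);

    struct snapshot s = sample(cpu0, cpu1);  /* cpu0 sees (100, 90) */
    atomic_store(&cpu0->clock, 150);         /* another cpu advances cpu0 */

    /* cpu1's clock is still 90, so the compare-and-swap succeeds. */
    bool ok = couple(cpu0, cpu1, s, &returned);
    assert(ok);
    return returned;
}
```

In this replay the compare-and-swap succeeds even though cpu0's clock has
moved since it was sampled, so the value handed back is based on the older
reading. With both locks held, as in the old code, that intermediate
update could not have slipped in between the reads and the write.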

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.
