Re: [BUG] long freezes on thinkpad t60

From: Ingo Molnar
Date: Thu Jun 21 2007 - 03:30:57 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> In other words, spinlocks are optimized for *lack* of contention. If a
> spinlock has contention, you don't try to make the spinlock "fair".
> No, you try to fix the contention instead!

yeah, and if there's no easy solution, change it to a mutex. Fastpath
performance of spinlocks and mutexes is essentially the same, and if
there's any measurable contention then the scheduler is pretty good at
sorting things out. Say, if the average contention is longer than 10-20
microseconds, then we could likely already win by scheduling away to some
other task. (the best is of course to have no contention at all, but
there are cases where that is really hard, and there are cases where it's
outright unmaintainable.)
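To make the spin-versus-sleep trade-off concrete, here is a minimal userspace sketch using POSIX threads. It is not the kernel's mutex implementation. It only shows the idea: try the cheap fast path a bounded number of times, and if the lock is still contended, block and let the scheduler run something else. `SPIN_LIMIT`, `spin_then_block_lock()` and the counter demo are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical tuning knob: how many trylock attempts to make before
 * blocking. Roughly stands in for "is the expected wait longer than
 * the cost of scheduling away?". */
#define SPIN_LIMIT 100

/* Spin briefly on the fast path, then fall back to a blocking acquire
 * so the scheduler can put this thread to sleep until the lock frees up. */
static void spin_then_block_lock(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return;             /* uncontended or only briefly contended */
    }
    int rc = pthread_mutex_lock(m);  /* long contention: sleep instead */
    assert(rc == 0);
    (void)rc;
}

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Each worker increments the shared counter under the lock. */
static void *worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        spin_then_block_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run nthreads workers doing iters increments each; return the total. */
static long run_counter_demo(int nthreads, long iters)
{
    pthread_t tids[16];
    assert(nthreads > 0 && nthreads <= 16);
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

With four threads doing 100000 increments each, `run_counter_demo(4, 100000)` returns 400000 regardless of how often the slow path is taken. Only the latency profile changes, which is the point: the fast path costs the same, and under real contention sleeping is cheaper than burning the CPU.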

Hardware makers are currently producing transistors disproportionately
faster than humans are producing parallel code, and as a result we've got
more CPU cache than ever, even taking natural application bloat into
account. (it just makes no sense to spend those transistors on
parallelism when applications are not yet making use of it. Plus,
caches are a lot less power-intensive than the functional units of the
CPU, and the limit these days is power input.)

So scheduling more frequently and more aggressively makes more sense than
ever before, and that trend will likely not stop for some time to come.

> The patch I sent out was an example of that. You *can* fix contention
> problems. Does it take clever approaches? Yes. It's why we have hashed
> spinlocks, RCU, and code sequences that are entirely lockless and use
> optimistic approaches. And suddenly you get fairness *and*
> performance!

what worries me a bit, though, is that my patch that made spinlocks
as aggressive as that loop didn't solve the hangs! So there is some
issue we don't understand yet: why was the wait_inactive_task()
open-coded spin-trylock loop starving the other core, which had ... an
open-coded spin-trylock loop coded up in assembly? And we've got a
handful of other open-coded loops in the kernel (networking, for example),
so this issue could come back to haunt us in a situation where we don't
have a gifted hacker like Miklos able to spend _weeks_ tracking
down the problem...
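For readers who haven't seen one, the general shape of an open-coded spin-trylock loop can be sketched in userspace with C11 atomics. This is a simplified illustration under stated assumptions, not the actual kernel code and not a reproduction of the T60 hang: `tas_lock`, `rq_lock`, `task_running`, `wait_until_not_running()` and `finish_task()` are hypothetical stand-ins. The waiter repeatedly takes the lock, checks a condition, drops the lock and retries; nothing in the loop itself guarantees that the other CPU ever wins the lock's cache line in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock, a userspace stand-in for spinlock_t. */
typedef struct { atomic_flag held; } tas_lock;

static bool tas_trylock(tas_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void tas_unlock(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Shared state guarded by rq_lock: "the task is still running on a CPU". */
static tas_lock rq_lock = { ATOMIC_FLAG_INIT };
static int task_running = 1;

/* Open-coded loop: take the lock, test the condition, drop the lock, retry.
 * Returns how many acquisition attempts were needed. */
static long wait_until_not_running(void)
{
    long attempts = 0;
    for (;;) {
        attempts++;
        if (!tas_trylock(&rq_lock))
            continue;                   /* lock busy: spin and retry */
        int running = task_running;
        tas_unlock(&rq_lock);
        if (!running)
            return attempts;
        /* Condition not met: loop straight back into the trylock. */
    }
}

/* The "other core": repeatedly takes and drops the same lock, then
 * finally clears task_running under the lock. */
static void *finish_task(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        while (!tas_trylock(&rq_lock))
            ;                           /* spin */
        tas_unlock(&rq_lock);
    }
    while (!tas_trylock(&rq_lock))
        ;
    task_running = 0;
    tas_unlock(&rq_lock);
    return NULL;
}
```

In this toy both sides eventually make progress, but how many attempts the waiter burns depends entirely on which core happens to win the cache line after each unlock, which is exactly the kind of hardware-dependent fairness the loop silently relies on.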

Ingo