RFC: revert 43fa5460fe60

From: Jörn Engel
Date: Mon Feb 23 2015 - 19:49:10 EST


Hello Steven!

I came across a silly problem that tempted me to revert 43fa5460fe60.
A high-priority realtime thread was woken and TIF_NEED_RESCHED was set
for the running thread, yet the realtime thread didn't run for >2s.
The problem was a system call that mapped a ton of device memory and
never hit a cond_resched() point. The obvious solution is to fix the
long-running system call.

Applying that solution quickly turns into a game of whack-a-mole. Not
the worst game in the world and all those moles surely deserve a solid
hit on the head. But what is annoying in my case is that I had plenty
of idle cpus during the entire time and the high-priority thread was
allowed to run anywhere. So if the thread had been moved to a different
runqueue immediately there would have been zero delay. Sure, the cache
is semi-cold or the migration may even be cross-package. That is a
tradeoff we are willing to make and we set the cpumask explicitly that
way. We want this thread to run quickly, anywhere.

So we could check for currently idle cpus when waking realtime threads.
If there are any, immediately move the woken thread over. Maybe have a
check whether the running thread on the current cpu is in a syscall and
retain current behaviour if not.
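
To make the idea concrete, here is a rough sketch of that check as a
userspace toy model. It is not scheduler code: struct toy_cpu,
toy_select_wake_cpu() and the allowed[] array are hypothetical names
invented for illustration, with allowed[] standing in for the thread's
cpumask. It assumes the "current cpu" is the one the woken thread would
otherwise wait on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or names exist in the kernel. */
struct toy_cpu {
	bool idle;        /* nothing is running on this CPU */
	bool in_syscall;  /* the running thread is inside a system call */
};

/*
 * Pick a CPU for a freshly woken realtime thread.
 *
 * cpus[]     per-CPU state of the toy model
 * allowed[]  stand-in for the thread's cpumask
 * this_cpu   CPU the thread would wait on with current behaviour
 */
int toy_select_wake_cpu(const struct toy_cpu *cpus, const bool *allowed,
			size_t ncpus, size_t this_cpu)
{
	assert(this_cpu < ncpus);

	/* Running thread is not in a syscall: retain current behaviour. */
	if (!cpus[this_cpu].in_syscall)
		return (int)this_cpu;

	/* Otherwise move the woken thread to the first allowed idle CPU. */
	for (size_t i = 0; i < ncpus; i++) {
		if (i != this_cpu && allowed[i] && cpus[i].idle)
			return (int)i;
	}

	/* No idle CPU in the mask: nothing better than staying put. */
	return (int)this_cpu;
}
```

With cpu 0 stuck in a syscall and cpu 2 idle and allowed, the sketch
picks cpu 2. If cpu 0 is not in a syscall, or cpu 2 is excluded by the
mask, it stays on cpu 0.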

Now this is not quite the same as reverting 43fa5460fe60 and I would
like to verify the idea before I spend time on a patch you would never
consider merging anyway.

Jörn

--
As more agents are added, systems become more reliable in the total-effort
case, but less reliable in the weakest-link case. What are the implications?
Well, software companies should hire more software testers and fewer (but
more competent) programmers.
-- Ross Anderson