RFC on fixing mutex spinning on owner

From: Joel Fernandes
Date: Wed Mar 16 2016 - 19:22:24 EST


Hi,

On a fairly recent kernel and android userspace, I am seeing that with
i915 driver is in a spin loop waiting for mutex owner to release it
(mutex_spin_on_owner). I believe this because the owner of the mutex
is running on another CPU and the expectation is the mutex owner
releases the mutex or goes to sleep soon, so we avoid sleeping if we
fail to acquire mutex and continue to spin and try to acquire it much
like a spinlock (while disabling preemption through out the spinning).

My question is, what if the owner cannot or doesn't want to sleep and
holds the mutex runs for a while while holding it. (Lets also assume
that all other tasks are sleeping on the mutex owner's CPU so its not
preempted).

In this case, does it make sense to time out the spinning after a
while? Because preemption is disabled during the spinning so this
spinning business seems a very very bad thing.

Should the code holding the mutex and running (the owner) be fixed to
not hold mutex for a while? Or would a patch introducing a timeout of
a certain threshold on the spinning be welcomed?

To give numbers, I am seeing spinning of as long as 20 ms in the worst
case, while the mutex owner holds the mutex for 22 ms. The ftrace
preemptoff tracer goes off.

Thanks for any advice on what the right fix of the problem should be.

Best,
Joel