Re: __i915_spin_request() sucks

From: Jens Axboe
Date: Thu Nov 12 2015 - 17:52:11 EST


On 11/12/2015 03:19 PM, Chris Wilson wrote:
So today, I figured I'd try just killing that spin. If it fails, we'll
punt to normal completions, so easy change. And wow, MASSIVE difference.
I can now scroll in chrome and not rage! It's like the laptop is 10x
faster now.

Ran git blame, and found:

commit 2def4ad99befa25775dd2f714fdd4d92faec6e34
Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Date: Tue Apr 7 16:20:41 2015 +0100

drm/i915: Optimistically spin for the request completion

and read the commit message. Doesn't sound that impressive. Especially
not for something that screws up interactive performance by a LOT.

What's the deal? Revert?

The tests that it improved the most were the latency sensitive tests and
since my Broadwell xps13 behaves itself, I'd like to understand how it
culminates in an interactivity loss.

1. Maybe it is the uninterruptible nature of the polling, making X's
SIGIO jerky:

This one still feels bad.

2. Or maybe it is increased mutex contention:

And so does this one... I had to manually apply hunks 2-3, and after
doing seat-of-the-pants testing for both variants, I confirmed with perf
that we're still seeing a ton of time in __i915_wait_request() for both
of them.

Or maybe it is an indirect effect, such as power balancing between the
CPU and GPU, or just thermal throttling, or it may be the task being
penalised for consuming its timeslice (for which any completion polling
seems susceptible).

Look, polls in the 1-10ms range are just insane. Either you botched the
commit message and really meant "~1ms at most", in which case I'd
suspect you of smoking something good, or you hacked it up wrong and
used jiffies when you really wanted some other time check that really
did give you 1us.

I'll take an IRQ over 10 msecs of busy looping on my laptop, thanks.

"Limit the spinning to a single jiffie (~1us) at most"

is totally wrong. I have HZ=100 on my laptop. That's 10ms. 10ms!
Even if I had HZ=1000, that'd still be 1ms of spinning. That's
seriously screwed up, guys.
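
To spell out the arithmetic: one jiffy lasts 1/HZ seconds, so it is never
anywhere near 1us for common HZ values. A minimal sketch in plain C;
jiffy_usecs() is a hypothetical helper written for illustration here, not
a kernel function:

```c
#include <assert.h>

/*
 * Length of one jiffy in microseconds for a given HZ (timer ticks per
 * second). Hypothetical illustration helper, not part of the kernel.
 */
static unsigned long jiffy_usecs(unsigned long hz)
{
    assert(hz > 0);          /* HZ is always a positive tick rate */
    return 1000000UL / hz;   /* 1 second = 1,000,000 us */
}
```

With this, jiffy_usecs(100) gives 10000 (10ms) and jiffy_usecs(1000) gives
1000 (1ms), matching the numbers above.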

That's over and above the termination condition for blk_poll().

?! And this is related, how? Comparing apples and oranges. One is a test
opt-in feature for experimentation, the other is unconditionally enabled
for everyone. I believe the commit even says so. See the difference?
Would I use busy loop spinning waiting for rotating storage completions,
which are in the 1-10ms range? No, the reason being that the potential
wins for spins are in the usec range.
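
As a rough userspace sketch of what a spin bounded in microseconds could
look like (this is not the i915 or blk_poll() code; spin_for_completion(),
flag_is_set() and now_ns() are hypothetical names, and CLOCK_MONOTONIC is
assumed to be available):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(ret == 0);
    (void)ret;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical completion check: returns true once the work is done. */
typedef bool (*done_fn)(void *ctx);

/* Example check that reads a flag set by whoever completes the work. */
static bool flag_is_set(void *ctx)
{
    return atomic_load((atomic_bool *)ctx);
}

/*
 * Busy-poll done() for at most budget_us microseconds.
 * Returns true if completion was seen while spinning. A false return
 * means the budget ran out and the caller should fall back to sleeping
 * until the interrupt arrives.
 */
static bool spin_for_completion(done_fn done, void *ctx, uint64_t budget_us)
{
    uint64_t deadline = now_ns() + budget_us * 1000ull;

    do {
        if (done(ctx))
            return true;
    } while (now_ns() < deadline);

    return false;
}
```

The point of the sketch is only that the budget is stated in usecs and
checked against a clock with that resolution, so the spin length does not
depend on HZ.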

--
Jens Axboe
