Re: [PATCH 3.16 012/114] drm/i915: Exit cherryview_irq_handler() after one pass

From: Ville Syrjälä
Date: Tue Jun 14 2016 - 08:08:34 EST


On Tue, Jun 14, 2016 at 12:37:34PM +0100, Ben Hutchings wrote:
> On Tue, 2016-06-14 at 13:47 +0300, Ville Syrjälä wrote:
> > On Mon, Jun 13, 2016 at 07:36:37PM +0100, Ben Hutchings wrote:
> > > 3.16.36-rc1 review patch.  If anyone has any objections, please let me know.
> >
> > Do not backport this one. It'll break things.
>
> But this has not been re-reverted in mainline, has it?  Is it that
> 3.16-stable would need more changes backported to make this work, or is
> mainline currently broken on Cherryview hardware?

No, as of 4.7 we have a proper fix, but it's a bit too big to backport
(see [1]). I think 4.6.x is still busted, but Greg said he'd revert
this broken patch there, so it should get fixed eventually.

OTOH CHV wasn't even officially supported until maybe 4.1, so whatever
you do in 3.16 shouldn't really matter.

It's a bit tedious having to block the same patch from different stable
trees over and over again. It would be nice if there were some kind
of stable blacklist you guys could share so that we wouldn't have to
repeat this dance with every stable maintainer...

[1] http://thread.gmane.org/gmane.linux.kernel.stable/179312/focus=181316
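
For anyone reading the quoted patch below without the driver source at
hand, here is a rough standalone sketch of the difference between a
handler that loops until nothing is pending and one that makes a single
pass via do {} while (0). Everything in it (struct fake_irq_source,
read_and_ack(), both handler names) is made up for illustration and is
not i915 code; it only assumes that new events can arrive while earlier
ones are being serviced.

```c
#include <stdint.h>

/* Hypothetical stand-in for an interrupt source; not the i915 registers. */
struct fake_irq_source {
	uint32_t pending;     /* events latched right now */
	uint32_t refill;      /* events that will arrive during handling */
	unsigned int handled; /* events serviced so far */
};

/* Read and clear what is latched, then let the "hardware" latch one more. */
static uint32_t read_and_ack(struct fake_irq_source *src)
{
	uint32_t events = src->pending;

	src->pending = 0;
	if (src->refill) {
		src->pending = 1;
		src->refill--;
	}
	return events;
}

/* Old behaviour: keep going until a pass finds nothing pending. */
unsigned int handle_until_drained(struct fake_irq_source *src)
{
	unsigned int passes = 0;

	for (;;) {
		uint32_t events = read_and_ack(src);

		if (!events)
			break;
		src->handled += events;
		passes++;
	}
	return passes;
}

/* Patched behaviour: same body, but the loop can never repeat. */
unsigned int handle_single_pass(struct fake_irq_source *src)
{
	unsigned int passes = 0;

	do {
		uint32_t events = read_and_ack(src);

		if (!events)
			break; /* break still exits a do {} while (0) */
		src->handled += events;
		passes++;
	} while (0);
	return passes;
}
```

With pending = 1 and refill = 3, the looping variant services all four
events in one invocation, while the single-pass variant services one and
leaves the next latched for a later interrupt. Since a break inside the
body behaves the same in both forms, only the loop's opening and closing
lines need to change.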

>
> Ben.
>
> > >
> > > ------------------
> > >
> > > From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > >
> > > commit 9dbaab56ac09f07a73fe83bf69bec3e31060080a upstream.
> > >
> > > This effectively reverts
> > >
> > > commit 8e5fd599eb219f1054e39b40d18b217af669eea9
> > > Author: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > Date:   Wed Apr 9 13:28:50 2014 +0300
> > >
> > >     drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed
> > >
> > > as under continuous execlists load we can saturate the IRQ handler,
> > > destabilising the tsc clock and triggering the NMI watchdog to declare a hung
> > > CPU.
> > >
> > > [  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> > > [  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
> > > [  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
> > > [  552.756210] clocksource: Switched to clocksource refined-jiffies
> > > [  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> > > [  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
> > > [  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
> > > [  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
> > > [  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
> > > [  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
> > > [  575.217967] Call Trace:
> > > [  575.217973]    [] dump_stack+0x4f/0x72
> > > [  575.217994]  [] watchdog_overflow_callback+0x151/0x160
> > > [  575.218003]  [] __perf_event_overflow+0xa0/0x1e0
> > > [  575.218016]  [] perf_event_overflow+0x14/0x20
> > > [  575.218028]  [] intel_pmu_handle_irq+0x1da/0x460
> > > [  575.218042]  [] ? poll_idle+0x3e/0x70
> > > [  575.218052]  [] ? poll_idle+0x3e/0x70
> > > [  575.218064]  [] perf_event_nmi_handler+0x28/0x50
> > > [  575.218075]  [] nmi_handle+0x60/0x130
> > > [  575.218086]  [] ? poll_idle+0x3e/0x70
> > > [  575.218096]  [] do_nmi+0x140/0x470
> > > [  575.218108]  [] end_repeat_nmi+0x1a/0x1e
> > > [  575.218119]  [] ? poll_idle+0x3e/0x70
> > > [  575.218129]  [] ? poll_idle+0x3e/0x70
> > > [  575.218139]  [] ? poll_idle+0x3e/0x70
> > > [  575.218148]  <>  [] cpuidle_enter_state+0xf3/0x2f0
> > > [  575.218164]  [] cpuidle_enter+0x17/0x20
> > > [  575.218175]  [] call_cpuidle+0x2a/0x40
> > > [  575.218185]  [] cpu_startup_entry+0x273/0x330
> > > [  575.218196]  [] start_secondary+0x10e/0x130
> > >
> > > However, not servicing all available IIR within the handler does hurt the
> > > throughput of pathological nop execbuf by about 20%, with a similar effect
> > > upon the dispatch latency of a series of execbuf.
> > >
> > > v2: use do {} while(0) for a smaller patch, and easier to revert again
> > >
> > > I have reasonable confidence that we do not miss GT interrupts (as
> > > execlists provides a stress case with a failure mechanism easily
> > > detected by igt), however I have less confidence about all the other
> > > sources of interrupts and worry that we may lose a display hotplug
> > > interrupt, for example.
> > >
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
> > > Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
> > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > Cc: Antti Koskipää <antti.koskipaa@xxxxxxxxxxxxxxx>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > > Reviewed-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > Link: http://patchwork.freedesktop.org/patch/msgid/1457946117-6714-1-git-send-email-chris@xxxxxxxxxxxxxxxxxx
> > > (cherry picked from commit 579de73b048a0a4c66c25a033ac76a2836e0cf73)
> > > Signed-off-by: Jani Nikula <jani.nikula@xxxxxxxxx>
> > > Signed-off-by: Ben Hutchings <ben@xxxxxxxxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > @@ -1875,7 +1875,7 @@ static irqreturn_t cherryview_irq_handle
> > >   u32 master_ctl, iir;
> > >   irqreturn_t ret = IRQ_NONE;
> > >  
> > > - for (;;) {
> > > + do {
> > >   master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> > >   iir = I915_READ(VLV_IIR);
> > >  
> > > @@ -1897,7 +1897,7 @@ static irqreturn_t cherryview_irq_handle
> > >   POSTING_READ(GEN8_MASTER_IRQ);
> > >  
> > >   ret = IRQ_HANDLED;
> > > - }
> > > + } while (0);
> > >  
> > >   return ret;
> > >  }
> >
> --
> Ben Hutchings
> We get into the habit of living before acquiring the habit of thinking.
>                                                               - Albert Camus



--
Ville Syrjälä
Intel OTC