Re: [Intel-gfx] BUG: sleeping function called from invalid contexton 3.10.10-rt7

From: Mario Kleiner
Date: Thu Sep 26 2013 - 12:44:04 EST


On 25.09.13 09:49, Ville Syrjälä wrote:
On Wed, Sep 25, 2013 at 06:32:10AM +0200, Mario Kleiner wrote:
On 23.09.13 10:38, Ville Syrjälä wrote:
On Sat, Sep 21, 2013 at 12:07:36AM +0200, Mario Kleiner wrote:
On 09/17/2013 10:55 PM, Daniel Vetter wrote:
On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
On 09/11/2013 03:31 PM, Peter Hurley wrote:

[+cc dri-devel]

On 09/11/2013 11:38 AM, Steven Rostedt wrote:

On Wed, 11 Sep 2013 11:16:43 -0400
Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:

The funny part is, there's a comment there that shows that this was
done even for "PREEMPT_RT". Unfortunately, the call to
"get_scanout_position()" can call functions that use the rt-mutex
"sleeping spin locks" and it breaks there.

I guess we need to ask the authors of the mainline patch exactly why
that preempt_disable() is needed?


The drm core associates a timestamp with each vertical blank frame #.
Drm drivers can optionally support a 'high resolution' hw timestamp.
The vblank frame #/timestamp tuple is user-space visible.

The i915 drm driver supports a hw timestamp via this drm helper function
which computes the timestamp from the crtc scan position (based on the
pixel clock).

For mainline, the preempt_disable/_enable() isn't actually necessary
because every call tree that leads here already has preemption disabled.

For -RT, the maybe i915 register spinlock (uncore.lock) should be raw?


No, it should not. Note, any other lock that can be held when it is
held would also need to be raw.


By that, you mean "any other lock" that might be claimed "would also need
to be raw"? Hopefully not "any other lock" already held?

And by taking a quick audit of the code, I see this:

spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);

/* Reset the chip */

/* GEN6_GDRST is not in the gt power well, no need to check
* for fifo space for the write or forcewake the chip for
* the read
*/
__raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL);

/* Spin waiting for the device to ack the reset request */
ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) &
GEN6_GRDOM_FULL) == 0, 500);

That spin is unacceptable in RT with preemption and interrupts disabled.


Yep. That would be bad.

AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included
in the force-wake set, so raw reads of the registers would
probably be acceptable (thus obviating the need for claiming the
uncore.lock).

Except that _ALL_ register access is disabled with the uncore.lock
during a gpu reset. Not sure if that's meant to include crtc registers
or not, or what other synchronization/serialization issues are being
handled/hidden by forcing all register accesses to wait during a gpu
reset.

Hopefully an i915 expert can weigh in here?



Daniel,

Can you shed some light on whether the i915+ crtc registers (specifically
those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter())
read as part of the vblank counter/timestamp handling need to
be prevented during gpu reset?

The depency here in the locking is a recent addition:

commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a
Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Date: Fri Jul 19 20:36:51 2013 +0100

drm/i915: Serialize almost all register access

It's a (slightly) oversized hammer to work around a hardware issue -
we could break it down to register blocks, which can be accessed
concurrently, but that tends to be more fragile. But the chip really
dies if you access (even just reads) the same block concurrently :(

We could try break the spinlock protected section a bit in the reset
handler - register access on a hung gpu tends to be ill-defined
anyway.

The implied wait with preemption and interrupts disabled is causing grief
in -RT, but also a 4ms wait inside an irq handler seems like a bad idea.

Oops, the magic code in wait_for which is just there to make the imo
totally misguided kgdb support work papered over the aweful long wait
in atomic context ever since we've added this in

commit b6e45f866465f42b53d803b0c574da0fc508a0e9
Author: Keith Packard <keithp@xxxxxxxxxx>
Date: Fri Jan 6 11:34:04 2012 -0800

drm/i915: Move reset forcewake processing to gen6_do_reset

Reverting this change should be enough (code moved obviously a bit).

Cheers, Daniel


Regards,
Peter Hurley



What's the real issue here?


That the vblank timestamp needs to be an accurate measurement of a
realtime event. Sleeping/servicing interrupts while reading
the registers necessary to compute the timestamp would be bad too.

(edit: which hopefully Mario Kleiner clarified in his reply)

My point earlier was three-fold:
1. Don't need the preempt_disable() for mainline: all callers are already
holding interrupt-disabling spinlocks.
2. -RT still needs to prevent scheduling there.
3. the problem is i915-specific.

[update: the radeon driver should also BUG like the i915 driver but
probably
should have mmio_idx_lock spinlock as raw]


Ok, so register access must be serialized, at least within a register
block, no way around it. Thanks for explaining this. That makes me a bit
depressed. Is this also true for older hw generations than gen6?

Daniel, could we move access to the display register block to a separate
lock, so the I915_READ to PIPEDSL in i915_get_crtc_scanoutpos() can
avoid the uncore.lock? If the display registers don't need forcewake
then i assume we could have much shorter lock hold times by avoiding the
large up to 4 msecs waits in forcewake handling. Probably also much less
contention in normal operation when no modesetting happens? I think i
can get rid of the other register reads in that function. Those settings
are already cached and accessible from the intel_crtc_config->adjusted_mode.

I actually have a patch to kill most of the registers reads in
get_scanout_position on i915 somewhere. Let me dig that out and send it
to the list...

<snip>
In any case, maybe the simple retry 3x loop in the DRM for measuring the
timestamp is not good enough anymore to keep this reliable and accurate.
Maybe we would need some loop that retries until an accurate measurement
or a user configurable timeout is reached. Then users like mine could
set a very high timeout which rather totally degrades the system and
causes severe stuttering of graphics updates than silently failing with
inaccurate timestamps. The default could be something safe for normal
desktop use.

I never really understood the need for the preempt disabled retry loop in
drm core. What I would do is just something like this:

get_scanoutpos_and_timestamp()
{
local_irq_disable();
get_scanoutpos();
get_timestamp();
local_irq_enable();
}

The preempt_disable serves the same purpose for PREEMPT_RT kernels as
your local_irq_disable in your example - get rid of any preventable
interruption, so a retry is unlikely to be ever needed. At least that
was the idea.

I assume if a spin_lock_irqsave doesn't really disable interrupts on a
RT kernel with normal spinlocks then local_irq_disable won't really
disable interrupts either?


If that has a lot of error, then it seems to me that the system is just
crap and we shouldn't care. Or if you're really worried about SMIs and
such then you could still do a retry loop.


I didn't expect a lot of error with preemption and interrupts disabled,
essentially only such infrequent things as smi's, that's why the retry
loop only tries 3x max, once for real, once in case the 1st one got
spoiled by a smi or such, and once because three times a charm ;-) - In
practice i didn't ever observe more than 3-4 usecs of delay, well below
the 20 usecs retry threshold. But i didn't expect
i915_crtc_get_scanoutpos() to ever take any locks or other stuff that
could induce long waits.

But given the new situation, your proposal is great! If we push the
clock readouts into the get_scanoutpos routine, we can make this robust
without causing grief for the rt people and without the need for a new
separate lock for display regs in intel-kms.

E.g., for intel-kms:

i915_get_crtc_scanoutpos(..., ktime_t *stime, ktime_t *etime)
{
...
spin_lock_irqsave(...uncore.lock);
preempt_disable();
*stime = ktime_get();
position = __raw_i915_read32(dev_priv, PIPEDSL(pipe));
*etime = ktime_get();
preempt_enable();
spin_unlock_irqrestore(...uncore.lock)
...
}

With your patchset to reduce the amount of register reads needed in that
function, and given that forcewake handling isn't needed for these
registers, this should make it robust again and wouldn't need new locks.

Unless ktime_get is also a bad thing to do in a preempt disabled section?

The preempt_disable/enable is not needed. The spinlock serves the same
purpose already.

As far as ktime_get(), I've used it from spinlocked/irq disabled sections
and so far haven't seen it do bad things. But would be nice to get some
official statement to that effect.

To minimize the chance of breakage, I guess we could add a new func
->get_scanout_pos_and_time or something like that, fill it by default
with the code from the current drm_calc_vbltimestamp_from_scanoutpos().

Or we could just shovel the delta_ns handling from
drm_calc_vbltimestamp_from_scanoutpos() into some small helper function
that we could call from i915_get_vblank_timestamp(), and then we can
call i915_get_crtc_scanoutpos() directly from there as well.


I would like to keep most of the logic for scanoutpos timestamping in the the current drm_calc_vbltimestamp_from_scanoutpos() and only have the inner part of the retry loop in i915_get_crtc_scanoutpos(), so we have as much shared code as possible between drivers there. Makes it easier to keep track of what changes when, fix stuff, and to apply the module parameters related to timestamping.

We could add a new get_scanout_pos_with_time(), but afaik only i915 and radeon-kms currently use/implement that function, so maybe we can just convert those bits in one go, as this is only kernel internal api?

-mario

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/