Re: [PATCH REBASED v10 1/6] drm/i915/skl: Add support for the SAGV, fix underrun hangs

From: Maarten Lankhorst
Date: Thu Aug 11 2016 - 05:23:21 EST


Op 10-08-16 om 16:27 schreef Lyude:
> Since the watermark calculations for Skylake are still broken, we're apt
> to hitting underruns very easily under multi-monitor configurations.
> While it would be lovely if this was fixed, it's not. Another problem
> that's been coming from this however, is the mysterious issue of
> underruns causing full system hangs. An easy way to reproduce this with
> a skylake system:
>
> - Get a laptop with a skylake GPU, and hook up two external monitors to
> it
> - Move the cursor from the built-in LCD to one of the external displays
> as quickly as you can
> - You'll get a few pipe underruns, and eventually the entire system will
> just freeze.
>
> After doing a lot of investigation and reading through the bspec, I
> found the existence of the SAGV, which is responsible for adjusting the
> system agent voltage and clock frequencies depending on how much power
> we need. According to the bspec:
>
> "The display engine access to system memory is blocked during the
> adjustment time. SAGV defaults to enabled. Software must use the
> GT-driver pcode mailbox to disable SAGV when the display engine is not
> able to tolerate the blocking time."
>
> The rest of the bspec goes on to explain that software can simply leave
> the SAGV enabled, and disable it when we use interlaced pipes/have more
> then one pipe active.
>
> Sure enough, with this patchset the system hangs resulting from pipe
> underruns on Skylake have completely vanished on my T460s. Additionally,
> the bspec mentions turning off the SAGV with more then one pipe enabled
> as a workaround for display underruns. While this patch doesn't entirely
> fix that, it looks like it does improve the situation a little bit so
> it's likely this is going to be required to make watermarks on Skylake
> fully functional.
>
> Changes since v9:
> - Only enable/disable sagv on Skylake
> Changes since v8:
> - Add intel_state->modeset guard to the conditional for
> skl_enable_sagv()
> Changes since v7:
> - Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's
> all we use it for anyway)
> - Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification
> - Fix a styling error that snuck past me
> Changes since v6:
> - Protect skl_enable_sagv() with intel_state->modeset conditional in
> intel_atomic_commit_tail()
> Changes since v5:
> - Don't use is_power_of_2. Makes things confusing
> - Don't use the old state to figure out whether or not to
> enable/disable the sagv, use the new one
> - Split the loop in skl_disable_sagv into it's own function
> - Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail()
> Changes since v4:
> - Use is_power_of_2 against active_crtcs to check whether we have > 1
> pipe enabled
> - Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0
> enabled
> - Call skl_sagv_enable/disable() from pre/post-plane updates
> Changes since v3:
> - Use time_before() to compare timeout to jiffies
> Changes since v2:
> - Really apply minor style nitpicks to patch this time
> Changes since v1:
> - Added comments about this probably being one of the requirements to
> fixing Skylake's watermark issues
> - Minor style nitpicks from Matt Roper
> - Disable these functions on Broxton, since it doesn't have an SAGV
>
> Reviewed-by: Matt Roper <matthew.d.roper@xxxxxxxxx>
> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
> Signed-off-by: Lyude <cpaul@xxxxxxxxxx>
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
>
> squash! drm/i915/skl: Add support for the SAGV, fix underrun hangs
>
> squash! drm/i915/skl: Add support for the SAGV, fix underrun hangs
>
> Signed-off-by: Lyude <cpaul@xxxxxxxxxx>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 2 +
> drivers/gpu/drm/i915/i915_reg.h | 4 ++
> drivers/gpu/drm/i915/intel_display.c | 12 ++++
> drivers/gpu/drm/i915/intel_drv.h | 2 +
> drivers/gpu/drm/i915/intel_pm.c | 112 +++++++++++++++++++++++++++++++++++
> 5 files changed, 132 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7f2754a..fa3c9f9 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1944,6 +1944,8 @@ struct drm_i915_private {
> struct i915_suspend_saved_registers regfile;
> struct vlv_s0ix_state vlv_s0ix_state;
>
> + bool skl_sagv_enabled;
> +
> struct {
> /*
> * Raw watermark latency values:
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index da82744..6c3947f 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -7143,6 +7143,10 @@ enum {
> #define HSW_PCODE_DE_WRITE_FREQ_REQ 0x17
> #define DISPLAY_IPS_CONTROL 0x19
> #define HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL 0x1A
> +#define GEN9_PCODE_SAGV_CONTROL 0x21
> +#define GEN9_SAGV_DISABLE 0x0
> +#define GEN9_SAGV_IS_DISABLED 0x1
> +#define GEN9_SAGV_DYNAMIC_FREQ 0x3
> #define GEN6_PCODE_DATA _MMIO(0x138128)
> #define GEN6_PCODE_FREQ_IA_RATIO_SHIFT 8
> #define GEN6_PCODE_FREQ_RING_RATIO_SHIFT 16
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 0ae2707..302cb1f 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -13769,6 +13769,14 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
> intel_state->cdclk_pll_vco != dev_priv->cdclk_pll.vco))
> dev_priv->display.modeset_commit_cdclk(state);
>
> + /*
> + * SKL workaround: bspec recommends we disable the SAGV when we
> + * have more then one pipe enabled
> + */
> + if (IS_SKYLAKE(dev_priv) &&
> + hweight32(intel_state->active_crtcs) > 1)
> + skl_disable_sagv(dev_priv);
> +
> intel_modeset_verify_disabled(dev);
> }
>
> @@ -13842,6 +13850,10 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
> intel_modeset_verify_crtc(crtc, old_crtc_state, crtc->state);
> }
>
> + if (IS_SKYLAKE(dev_priv) && intel_state->modeset &&
> + hweight32(intel_state->active_crtcs) <= 1)
> + skl_enable_sagv(dev_priv);
> +
> drm_atomic_helper_commit_hw_done(state);
>
> if (intel_state->modeset)
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index cbce786..e35799d 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1698,6 +1698,8 @@ void ilk_wm_get_hw_state(struct drm_device *dev);
> void skl_wm_get_hw_state(struct drm_device *dev);
> void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv,
> struct skl_ddb_allocation *ddb /* out */);
> +int skl_enable_sagv(struct drm_i915_private *dev_priv);
> +int skl_disable_sagv(struct drm_i915_private *dev_priv);
> uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state *pipe_config);
> bool ilk_disable_lp_wm(struct drm_device *dev);
> int sanitize_rc6_option(struct drm_i915_private *dev_priv, int enable_rc6);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 81ab119..e90b974 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -2884,6 +2884,116 @@ skl_wm_plane_id(const struct intel_plane *plane)
> }
>
> static void
> +skl_sagv_get_hw_state(struct drm_i915_private *dev_priv)
> +{
> + u32 temp;
> + int ret;
> +
> + if (IS_BROXTON(dev_priv))
> + return;
> +
> + mutex_lock(&dev_priv->rps.hw_lock);
> + ret = sandybridge_pcode_read(dev_priv, GEN9_PCODE_SAGV_CONTROL, &temp);
> + mutex_unlock(&dev_priv->rps.hw_lock);
You need to initialize temp before calling sandybridge_pcode_read. It's confusing, but the function is really a pcode_writeread.

It also means you probably can't read out SAGV without adjusting its value. :(

Maybe set the SAGV state initially based on how many CRTCs are active after hw readout?
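
Roughly what I mean, as a standalone userspace sketch rather than driver code. The function names are hypothetical, and the bit counting stands in for hweight32(); the "<= 1 active pipe" rule is the same one the patch uses in intel_atomic_commit_tail():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable population count, standing in for the kernel's hweight32(). */
static unsigned int count_active_pipes(uint32_t active_crtcs)
{
	unsigned int n = 0;

	while (active_crtcs) {
		active_crtcs &= active_crtcs - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Hypothetical policy helper: SAGV may stay enabled with at most one pipe. */
static bool sagv_wanted_after_readout(uint32_t active_crtcs)
{
	return count_active_pipes(active_crtcs) <= 1;
}

/* Usage example: bit N set in the mask means pipe N is active. */
static void sagv_initial_state_selftest(void)
{
	assert(sagv_wanted_after_readout(0x0));		/* no pipes */
	assert(sagv_wanted_after_readout(0x4));		/* one pipe */
	assert(!sagv_wanted_after_readout(0x3));	/* two pipes */
}
```

This only decides what the state should be. The driver would still have to program it once after readout, because the cached flag can't be trusted until then.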

> + if (!ret) {
> + dev_priv->skl_sagv_enabled = !(temp & GEN9_SAGV_IS_DISABLED);
> + } else {
> + /*
> + * If for some reason we can't access the SAGV state, follow
> + * the bspec and assume it's enabled
> + */
> + DRM_ERROR("Failed to get SAGV state, assuming enabled\n");
> + dev_priv->skl_sagv_enabled = true;
> + }
> +}
> +
> +/*
> + * SAGV dynamically adjusts the system agent voltage and clock frequencies
> + * depending on power and performance requirements. The display engine access
> + * to system memory is blocked during the adjustment time. Having this enabled
> + * in multi-pipe configurations can cause issues (such as underruns causing
> + * full system hangs), and the bspec also suggests that software disable it
> + * when more then one pipe is enabled.
> + */
> +int
> +skl_enable_sagv(struct drm_i915_private *dev_priv)
> +{
> + int ret;
> +
> + if (IS_BROXTON(dev_priv))
> + return 0;
> + if (dev_priv->skl_sagv_enabled)
> + return 0;
> +
> + mutex_lock(&dev_priv->rps.hw_lock);
> + DRM_DEBUG_KMS("Enabling the SAGV\n");
> +
> + ret = sandybridge_pcode_write(dev_priv, GEN9_PCODE_SAGV_CONTROL,
> + GEN9_SAGV_DYNAMIC_FREQ);
> + if (!ret)
> + dev_priv->skl_sagv_enabled = true;
> + else
> + DRM_ERROR("Failed to enable the SAGV\n");
> +
> + /* We don't need to wait for SAGV when enabling */
> + mutex_unlock(&dev_priv->rps.hw_lock);
> + return ret;
> +}
> +
> +static int
> +skl_do_sagv_disable(struct drm_i915_private *dev_priv)
> +{
> + int ret;
> + uint32_t temp;
> +
> + ret = sandybridge_pcode_write(dev_priv, GEN9_PCODE_SAGV_CONTROL,
> + GEN9_SAGV_DISABLE);
> + if (ret) {
> + DRM_ERROR("Failed to disable the SAGV\n");
> + return ret;
> + }
> +
> + ret = sandybridge_pcode_read(dev_priv, GEN9_PCODE_SAGV_CONTROL,
> + &temp);
> + if (ret) {
> + DRM_ERROR("Failed to check the status of the SAGV\n");
> + return ret;
> + }
> +
> + return temp & GEN9_SAGV_IS_DISABLED;
> +}
This will work better if you remove the sandybridge_pcode_write, and initialise temp to GEN9_SAGV_DISABLE.

The read function first writes the value you pass in, then records the response. Right now it sends whatever garbage is in temp and checks the response to that.

Can you verify that this works?
> +int
> +skl_disable_sagv(struct drm_i915_private *dev_priv)
> +{
> + int ret, result;
> +
> + if (IS_BROXTON(dev_priv))
> + return 0;
> + if (!dev_priv->skl_sagv_enabled)
> + return 0;
> +
> + mutex_lock(&dev_priv->rps.hw_lock);
> + DRM_DEBUG_KMS("Disabling the SAGV\n");
> +
> + /* bspec says to keep retrying for at least 1 ms */
> + ret = wait_for(result = skl_do_sagv_disable(dev_priv), 1);
> + mutex_unlock(&dev_priv->rps.hw_lock);
> +
> + if (ret == -ETIMEDOUT) {
> + DRM_ERROR("Request to disable SAGV timed out\n");
> + } else {
> + if (result == GEN9_SAGV_IS_DISABLED)
> + dev_priv->skl_sagv_enabled = false;
> +
> + ret = result;
> + }
I've found out why SAGV doesn't work. According to the doc for BSW, once the busy bit is cleared the lower byte contains the status.
In my case the status is 01h (ILLEGAL_CMD), so I'm guessing SAGV is not supported or is forced disabled.