Re: [PATCH V2] drm/vkms: Avoid extra discount in the timestamp value

From: Daniel Vetter
Date: Wed Jun 05 2019 - 08:27:02 EST


On Tue, Jun 04, 2019 at 11:45:43PM -0300, Rodrigo Siqueira wrote:
> After the commit def35e7c5926 ("drm/vkms: Bugfix extra vblank frame")
> some of the crc tests started to fail in the vkms with the following
> error:
>
> [drm:drm_crtc_add_crc_entry [drm]] *ERROR* Overflow of CRC buffer,
> userspace reads too slow.
> [drm] failed to queue vkms_crc_work_handle
> ...
>
> The aforementioned commit fixed the extra vblank added by
> `drm_crtc_arm_vblank_event()` which is invoked inside
> `vkms_crtc_atomic_flush()` if the vblank event count was zero, otherwise
> `drm_crtc_send_vblank_event()` is invoked. The fix was implemented in
> `vkms_get_vblank_timestamp()` by subtracting one period from the current
> timestamp, as the code snippet below illustrates:
>
> if (!in_vblank_irq)
> *vblank_time -= output->period_ns;
>
> The above fix works well when `drm_crtc_arm_vblank_event()` is invoked.
> However, it does not properly work when `drm_crtc_send_vblank_event()`
> executes since it subtracts the correct timestamp, which it shouldn't.
> In this case, the `drm_crtc_accurate_vblank_count()` function will
> returns the wrong frame number, which generates the aforementioned
> error. Such decrease in `get_vblank_timestamp()` produce a negative
> number in the following calculation within `drm_update_vblank_count()`:
>
> u64 diff_ns = ktime_to_ns(ktime_sub(t_vblank, vblank->time));
>
> After this operation, the DIV_ROUND_CLOSEST_ULL macro is invoked using
> diff_ns with a negative number, which generates an undefined result;
> therefore, the returned frame is a huge and incorrect number. Finally,
> the code below is part of the `vkms_crc_work_handle()`, note that the
> while loop depends on the returned value from
> `drm_crtc_accurate_vblank_count()` which may cause the loop to take a
> long time to finish in case of huge value.
>
> frame_end = drm_crtc_accurate_vblank_count(crtc);
> while (frame_start <= frame_end)
> drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);
>
> This commit fixes this issue by checking if the vblank timestamp
> corresponding to the current software vblank counter is equal to the
> current vblank; if they are equal, it means that
> `drm_crtc_send_vblank_event()` was invoked and vkms does not need to
> discount the extra vblank, otherwise, `drm_crtc_arm_vblank_event()` was
> executed and vkms have to discount the extra vblank. This fix made the
> CRC tests work again whereas keep all tests from kms_flip working as
> well.
>
> V2: Update commit message
>
> Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@xxxxxxxxx>
> Signed-off-by: Shayenne Moura <shayenneluzmoura@xxxxxxxxx>

Thanks a lot for typing up this commit message. I'm still not following
what's going on (this stuff is tricky), but now I think I can at least ask
useful scenarios.

For my understanding: Things go wrong when in the function
vkms_crtc_atomic_flush() the call to drm_crtc_vblank_get() returns 0, and
we call drm_crtc_send_vblank_event() directly? I'm not 100% from your
description above whether that's the failure case where everything blows
up.

The other part I'm having a hard time understanding: If we are in the case
where we call drm_crtc_send_vblank_event() directly, how do we end up in
the vkms_get_vblank_timestamp().

>From what I can see there's not caller to where we sample a new vblank ...
I think a full backtrace that hits the new condition you're adding to
understand when exactly we're hitting it would be perfect.

> ---
> drivers/gpu/drm/vkms/vkms_crtc.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
> index 7508815fac11..3ce60e66673e 100644
> --- a/drivers/gpu/drm/vkms/vkms_crtc.c
> +++ b/drivers/gpu/drm/vkms/vkms_crtc.c
> @@ -74,9 +74,13 @@ bool vkms_get_vblank_timestamp(struct drm_device *dev, unsigned int pipe,
> {
> struct vkms_device *vkmsdev = drm_device_to_vkms_device(dev);
> struct vkms_output *output = &vkmsdev->output;
> + struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>
> *vblank_time = output->vblank_hrtimer.node.expires;
>
> + if (*vblank_time == vblank->time)
> + return true;

Ok, I think with the above I do now have a rough idea what's going wrong.
I think that would be a bug in the drm_vblank.c. Or at least it could be,
I think I first need to better understand still what's going on here to
decided that.

I have a bit a hunch this is fallout from our vblank fudging, but I guess
we'll see.

It's definitely clear that things blow up if the vblank time somehow goes
backwards, but I don't understand yet how that's possible.
-Daniel
> +
> if (!in_vblank_irq)
> *vblank_time -= output->period_ns;
>
> --
> 2.21.0
>



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch