Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater

From: Takashi Iwai
Date: Thu Oct 20 2016 - 10:35:38 EST


On Thu, 20 Oct 2016 16:17:25 +0200,
Ville SyrjÃlà wrote:
>
> On Thu, Oct 20, 2016 at 03:36:54PM +0200, Takashi Iwai wrote:
> > On Thu, 20 Oct 2016 15:28:14 +0200,
> > Ville SyrjÃlà wrote:
> > >
> > > On Thu, Oct 20, 2016 at 03:20:55PM +0200, Takashi Iwai wrote:
> > > > Since 4.7 kernel, we've seen the error messages like
> > > >
> > > > kernel: [TTM] Buffer eviction failed
> > > > kernel: qxl 0000:00:02.0: object_init failed for (4026540032, 0x00000001)
> > > > kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
> > > >
> > > > on QXL when switching and accessing on VT. The culprit was the generic
> > > > deferred_io code (qxl driver switched to it since 4.7). There is a
> > > > race between the dirty clip update and the call of callback.
> > > >
> > > > In drm_fb_helper_dirty(), the dirty clip is updated in the spinlock,
> > > > while it kicks off the update worker outside the spinlock. Meanwhile
> > > > the update worker clears the dirty clip in the spinlock, too. Thus,
> > > > when drm_fb_helper_dirty() is called concurrently, schedule_work() is
> > > > called after the clip is cleared in the first worker call.
> > >
> > > Why does that matter? The first worker should have done all the
> > > necessary work already, no?
> >
> > Before the first call, it clears the clip and passes the copied clip
> > to the callback. Then the second call will be with the cleared and
> > untouched clip, i.e. with x1=~0. This confuses
> > qxl_framebuffer_dirty().
> >
> > Of course, we can filter out in the callback side by checking the
> > clip. It was actually my first version. But basically it's a race
> > and should be covered better in the caller side.
>
> The race is still there AFAICS. The worker may already be executing but
> not yet in the critical section, at which point drm_fb_helper_dirty()
> will expand the dirty rectangle, and schedule another work. So the first
> worker will already see the expanded rectangle, and second worker will
> get zilch.

Hrm, right, there's a slight race window there.

> I think the only good fix is to have the worker validate the dirty
> rectangle before calling the driver.

OK, let me cook it quickly. (It was actually the second version of
the patch I wrote, and I sent the third one :)


thanks,

Takashi