Re: Signal handling in a page fault handler

From: Daniel Vetter
Date: Tue Apr 03 2018 - 11:12:30 EST


On Tue, Apr 03, 2018 at 07:48:29AM -0700, Matthew Wilcox wrote:
> On Tue, Apr 03, 2018 at 03:12:35PM +0200, Thomas Hellstrom wrote:
> > I think the TTM page fault handler originally set the standard for this.
> > First, IMO any critical section that waits for the GPU (like typically the
> > page fault handler does), should be locked at least killable. The need for
> > interruptible locks came from the X server's silken mouse relying on signals
> > for smooth mouse operations: You didn't want the X server to be stuck in the
> > kernel waiting for GPU completion when it should handle the cursor move
> > request.. Now that doesn't seem to be the case anymore but to reiterate
> > Chris' question, why would the signal persist once returned to user-space?
>
> Yeah, you graphics people have had to deal with much more recalcitrant
> hardware than most of the rest of us ... and less reasonable user
> expectations ("My graphics card was doing something and I expected
> everything else to keep going" vs "My hard drive died and my kernel
> paniced, oh well.")
>
> I don't know exactly how the signal code works at the delivery end;
> I'm not sure when TIF_SIGPENDING gets cleared. I just get concerned
> when I see one bit of kernel code doing things in a very complicated
> and careful manner and another bit of kernel code blithely assuming
> that everything's going to be OK.

I think you last line pretty much sums up the proper attitude when writing
gpu drivers:

https://i.imgflip.com/27nm7w.jpg

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch