Re: [PATCH v5 1/1] drm/doc: Document DRM device reset expectations
From: Marek Olšák
Date: Wed Aug 09 2023 - 15:23:38 EST
On Wed, Aug 9, 2023 at 3:35 AM Michel Dänzer <michel.daenzer@xxxxxxxxxxx> wrote:
>
> On 8/8/23 19:03, Marek Olšák wrote:
> > It's the same situation as SIGSEGV. A process can catch the signal,
> > but if it doesn't, it gets killed. GL and Vulkan APIs give you a way
> > to catch the GPU error and prevent the process termination. If you
> > don't use the API, you'll get undefined behavior, which means anything
> > can happen, including process termination.
>
> Got a spec reference for that?
>
> I know the spec allows process termination in response to e.g. out of bounds buffer access by the application (which corresponds to SIGSEGV). There are other causes for GPU hangs though, e.g. driver bugs. The ARB_robustness spec says:
>
> If the reset notification behavior is NO_RESET_NOTIFICATION_ARB,
> then the implementation will never deliver notification of reset
> events, and GetGraphicsResetStatusARB will always return
> NO_ERROR[fn1].
> [fn1: In this case it is recommended that implementations should
> not allow loss of context state no matter what events occur.
> However, this is only a recommendation, and cannot be relied
> upon by applications.]
>
> No mention of process termination, that rather sounds to me like the GL implementation should do its best to keep the application running.

It basically says that we can do anything.

A frozen window or flipping between 2 random frames can't be described
as "keeping the application running". That's the worst user
experience. I will not accept it.

A window system can force-enable robustness for its non-robust apps
and control that. That's the best possible user experience and it's
achievable everywhere. Everything else doesn't matter.

Marek