On Fri, 2024-03-01 at 09:38 +0100, Johannes Berg wrote:
> On Wed, 2024-02-28 at 17:56 +0000, Souza, Jose wrote:
> > >
> > > In my opinion, the timeout should depend on the type of device driver.
> > >
> > > In the case of server-class Ethernet cards, where corporate users automate most tasks, five minutes might even be considered excessive.
> > >
> > > For our case, GPUs, users might experience minor glitches and only search for what happened after finishing their current task (writing an email,
> > > ending a gaming match, watching a YouTube video, etc.).
> > > If they land on https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html or the future Xe version of that page, following the
> > > instructions alone may take inexperienced Linux users more than five minutes.
>
> That's all not wrong, but I don't see why you wouldn't automate this
> even on end user machines? I feel you're boxing the problem in by
> wanting to solve it entirely in the kernel?
The other part of the stack that we provide are the libraries implementing Vulkan and OpenGL APIs, I don't think we could ship scripts that needs
elevated privileges to read and store coredump.
>
> > > I have set the timeout to one hour in the Xe driver, but this could increase if we start receiving user complaints.
>
> At an hour now, people will probably start arguing that "indefinitely"
> is about right? But at that point you're probably back to persisting
> them on disk anyway? Or maybe glitches happen during logout/shutdown ...
i915 driver don't use coredump and it persist the error dump in memory until user frees it or reboot it and we got no complains.
>
> Anyway, I don't want to block this because I just don't care enough
> about how you do things, but I think the kernel is the wrong place to
> solve this problem... The intent here was to give some userspace time to
> grab it (and yes for that 5 minutes is already way too long), not the
> users. That's also part of the reason we only hold on to a single
> instance, since I didn't want it to keep consuming more and more memory
> for it if happens repeatedly.
>
okay so will move forward with other version applying your suggestion to make dev_coredumpm() static inline and move to the header.
thank you for the feedback
> johannes