Re: [Intel-gfx] [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5)
From: David Weinehall
Date: Wed Feb 01 2017 - 08:20:39 EST
On Wed, Jan 25, 2017 at 01:10:26PM +0100, Martin Steigerwald wrote:
> On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote:
> > Things seem to be calming down a bit, and everything looks nominal.
> >
> > There's only been about 250 changes (not counting merges) in the last
> > week, and the diffstat touches less than 300 files (with drivers and
> > architecture updates being the bulk, but there's tooling, networking
> > and filesystems in there too).
> >
> > So keep testing, and I think we'll have a regular release schedule.
>
> Testing this is no fun:
>
> Bug 99533 - black screen after switching session
> https://bugs.freedesktop.org/99533
>
>
> This after GPU hang/lockups with Kernel 4.9 reported as for example:
>
> Bug 98922 - [snb] GPU hang on PlaneShift
> https://bugs.freedesktop.org/98922
>
> Which may be a duplicate of #98747, #98794, #98860, #98891, #98288.
>
>
> I am back at kernel 4.8.15 as I need this machine for production work.
>
> Sometimes I wish for a microkernel that might be able to reincarnate drivers
> that hang or do weird things like that. That may at least give a way to
> actually do some debugging or even get the desktop session back without
> losing its state. Especially for graphics drivers and hibernating/resuming
> from hibernation, which also occasionally fails, again without leaving a way
> to interact with the machine to do further debugging. The Linux kernel usually
> just crashes completely, not even a ping or ssh possible, or it is at least stuck
> with a black display without any way to restart the graphics driver because it
> seems to be in some undefined state. Combined with occasionally happening bugs
> this makes triaging bugs time consuming and risky. I do like to help testing,
> but maybe it's time to just switch to distro kernels and be done with it, as I
> regularly come across bugs that are too expensive for me to triage.
>
> Please understand that I am not willing to bisect these occasionally happening
> bugs which have the potential to cause data loss due to having to switch off
> the machine forcefully. Fortunately at least KMail saves a mail I write from
> time to time and also Kate does swap files.
>
> I am also a bit unwilling to do further debugging of this one as I usually use
> two sessions when I am at work and I risk losing data I work on. But at
> least with this issue it seems I would have a way to SSH into the machine
> before kicking it.
>
>
> I am dissatisfied with the state of the Intel graphics driver on this ThinkPad
> T520 with Sandybridge since kernel 4.9 and wonder whether you guys at Intel
> really test things with older hardware versions.

Yes, we do. But for practical reasons we can only do testing for things
that we actually have testcases for, and obviously we don't have the
manpower to actually do *manual* testing on every platform, so issues
for older platforms that are only triggered by manual interaction tend
to slip under the radar.

We have a testfarm that tests every nightly build on all platforms we
have test machines for. The testcases are publicly available here:

https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/

Obviously most of our manpower is spent on development and testing for current
and future platforms, so for issues that involve older platforms,
especially something as old as Sandybridge (which is, by now, 6 years old)
we are happy for help with testing and bisection.

If the issues are specific to certain subsets of a platform it obviously
gets even more complex; it'd be a combinatorial nightmare to build a
testfarm that could test every variation of every platform.

If I got the count right, the i915 driver supports around a hundred
different varieties of Intel graphics. Combine that with the number of
different displays people connect, the number of eDP displays that the
vendors connect, the different BIOSes that vendors use, etc., and I
think you'll begin to see what we're combating. To make things even
more complex, you can connect several displays to each graphics card
(possibly via adapters), and those displays don't always meet the
standards that they claim to meet. Due to limited room we are also a
bit limited when it comes to testing with multi-monitor setups.

This is why any help is welcome and sometimes even necessary. If you're
afraid of data loss, be aware that it's possible to boot your system with
file systems mounted read-only; you could also boot from a USB stick or
similar.

If you can find a testcase in i-g-t that easily reproduces the issue
that'd also be very helpful. Do note that not all testcases in i-g-t
are run as part of our nightly tests, since some of them are *extremely*
time consuming; the full combinatorial testcase, for instance, can
take weeks or months to complete (I haven't done a full run recently).

I hope this helps you understand why bugs can slip under the radar,
and why a bisect is so important.

Kind regards, David Weinehall