Re: [REGRESSION] Xorg doesn't like 4e8b14526 "time: Improve sanity checking of timekeeping inputs"

From: John Stultz
Date: Fri Aug 31 2012 - 13:43:43 EST


On 08/30/2012 09:05 PM, Andreas Bombe wrote:
I have recently started to get problems with X simply shutting itself
down and returning to the login screen. In the X logs I find:

[ 1492.936]
Fatal server error:
[ 1492.936] WaitForSomething(): select: Invalid argument

No messages whatsoever are found in the kernel logs. This error happens
randomly without any correlation to user input, but with a high
likelihood (within a few minutes at most) when a video is playing. It
doesn't matter if the video is in Flash in a browser window or in a
video player playing a local file.

With that somewhat easy test I bisected it down to 4e8b14526 "time:
Improve sanity checking of timekeeping inputs". The latest Linus git
(155e36d40) with a revert of the bisected commit does not show the
problem.

Video is Radeon HD 6950 with open source drivers. Xorg version is the
one currently in Debian unstable (xserver-xorg-core: 2:1.12.3.902-1,
xserver-xorg-video-radeon: 1:6.14.4-5, libdrm: 2.4.33-3).

Thanks so much for bisecting this down!

I'm guessing X is passing crazy large timespec values into select() (via WaitForSomething()), and those values are tripping the ktime_t overflow check in timespec_valid(). Previously these would have been clamped to KTIME_MAX (which is basically infinity) in the timer subsystem.
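
To make the clamping concrete, here is a rough userspace sketch of the idea, not the kernel code itself. The struct, the MODEL_* constants and both function names are made up for illustration; the only assumption carried over is that ktime_t behaves like a signed 64-bit nanosecond count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
/* Assumption: ktime_t is modelled as a signed 64-bit nanosecond count. */
#define MODEL_KTIME_MAX INT64_MAX
/* Second counts at or above this cannot be scaled to nanoseconds safely. */
#define MODEL_KTIME_SEC_MAX (MODEL_KTIME_MAX / NSEC_PER_SEC)

/* Hypothetical stand-in for a timespec with a 64-bit seconds field. */
struct model_timespec {
    int64_t tv_sec;
    long tv_nsec;
};

/* True if converting ts to 64-bit nanoseconds could overflow. */
static bool model_overflows_ktime(const struct model_timespec *ts)
{
    return ts->tv_sec >= MODEL_KTIME_SEC_MAX;
}

/*
 * Timer-side behaviour as described above: an oversized timeout is not
 * rejected, it saturates to the maximum ("wait forever").
 */
static int64_t model_timespec_to_ns_clamped(const struct model_timespec *ts)
{
    /* Precondition: the value is otherwise well formed. */
    assert(ts->tv_sec >= 0);
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);

    if (model_overflows_ktime(ts))
        return MODEL_KTIME_MAX;
    return ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}
```

With this model, a timeout whose seconds field is near INT64_MAX converts to MODEL_KTIME_MAX rather than producing an error, which is the behaviour callers such as X appear to have been relying on.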

So the issue is that the patch in question is too strict in its validation. We want to be strict on things like timekeeping inputs, but for timers, waiting until infinity is still valid.
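
One way to picture that split is two validators: a lenient one for timer and timeout values, and a strict one for timekeeping inputs. The sketch below is a userspace illustration only and does not reproduce the attached patch; the struct and the function names model_timer_input_valid() and model_timekeeping_input_valid() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
/* Assumption: ktime_t is modelled as a signed 64-bit nanosecond count. */
#define MODEL_KTIME_SEC_MAX (INT64_MAX / NSEC_PER_SEC)

/* Hypothetical stand-in for a timespec with a 64-bit seconds field. */
struct model_timespec {
    int64_t tv_sec;
    long tv_nsec;
};

/*
 * Lenient check for timers: the value only has to be well formed.
 * Very large timeouts pass and can later be clamped to "forever".
 */
static bool model_timer_input_valid(const struct model_timespec *ts)
{
    if (ts->tv_sec < 0)
        return false;
    if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
        return false;
    return true;
}

/*
 * Strict check for timekeeping inputs: additionally reject values that
 * cannot be represented as 64-bit nanoseconds.
 */
static bool model_timekeeping_input_valid(const struct model_timespec *ts)
{
    if (!model_timer_input_valid(ts))
        return false;
    return ts->tv_sec < MODEL_KTIME_SEC_MAX;
}
```

Under this model a huge select() timeout passes the lenient check, while the same value offered as a new system time would be rejected by the strict one.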

The attached patch (sorry, not inline; I'm on the road) should fix this, but could you verify it? (I'm running my own testing concurrently.)

Linus: The issue the patch in question addresses has only been reported from trinity stress testing and a system with a crazy CMOS clock value, so I'm ok with the revert if you'd prefer that.

thanks
-john