Re: PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework

From: Michal Necasek
Date: Fri Oct 21 2016 - 15:33:43 EST



Alan,

I'll get back to you on whether increasing the timeout helps; it'll take a bit of testing. Does it actually sound plausible that some controllers could not get things done in 250ms but could in 275ms?

I will note that according to the table in <1>, a 250ms timeout with a HZ=250 kernel (what Ubuntu uses) falls into the 4ms granularity bucket, but a 275ms timeout goes into the 32ms granularity bucket. That could change things.
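
To make the bucket arithmetic concrete, here is a rough sketch of how a timeout maps onto a wheel level according to that table. It is not the kernel's code: the constants (64 buckets per level, each level 8x coarser than the previous one) and both function names are assumptions modeled on the table's documented ranges. The real timer core picks the level from a jiffies delta relative to the wheel's base clock, so values near a level boundary may land differently than this model suggests.

```c
/* Assumed constants, modeled on the granularity table in the comment
 * block of kernel/time/timer.c. Hypothetical names, not kernel macros. */
#define WHEEL_SLOTS_PER_LEVEL 64u  /* buckets in each level */
#define WHEEL_LEVEL_SHIFT     3u   /* each level is 2^3 = 8x coarser */
#define WHEEL_MAX_LEVEL       8u   /* stop searching at this level */

/* Return the level whose documented range contains timeout_ms. */
static unsigned int wheel_level(unsigned int timeout_ms, unsigned int hz)
{
    unsigned int tick_ms = 1000u / hz;                  /* 4 ms at HZ=250 */
    unsigned long limit_ms =
        (unsigned long)WHEEL_SLOTS_PER_LEVEL * tick_ms; /* first ms past level 0 */
    unsigned int level = 0;

    while (timeout_ms >= limit_ms && level < WHEEL_MAX_LEVEL) {
        limit_ms <<= WHEEL_LEVEL_SHIFT;                 /* next level spans 8x more */
        level++;
    }
    return level;
}

/* Return the expiry granularity, in ms, of the given level. */
static unsigned int wheel_granularity_ms(unsigned int level, unsigned int hz)
{
    return (1000u / hz) << (WHEEL_LEVEL_SHIFT * level);
}
```

With hz = 250, wheel_level(250, 250) gives level 0 and wheel_granularity_ms() gives 4 ms for it, while wheel_level(275, 250) gives level 1 with 32 ms granularity, matching the two buckets mentioned above.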

And as a bit of background... the 250ms timeout should not be a problem in a virtualized environment under normal conditions. A ~10ms timeout is. What makes things harder is that the watchdog routine reads the frame counter from the HCCA, which has to be updated asynchronously. Reading from a HC register would actually be more expensive but much more reliable in this case. But again, 250ms should be plenty...

Another factor is that the OHCI watchdog just kills the driver the first time there's a problem; there's no recovery attempt. So it's very noticeable when this happens.


Regards,
Michal


<1> http://lxr.free-electrons.com/source/kernel/time/timer.c

----- Original Message -----
From: stern@xxxxxxxxxxxxxxxxxxx
To: michael.thayer@xxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx, tglx@xxxxxxxxxxxxx, michal.necasek@xxxxxxxxxx, knut.osmundsen@xxxxxxxxxx
Sent: Friday, October 21, 2016 6:54:13 PM GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework

On Fri, 21 Oct 2016, Michael Thayer wrote:

> Hello Alan (LKML on CC),
>
> Contacting you about this on Thomas Gleixner's (also on CC) suggestion.
> The short summary is that when Linux 4.8.0 (also tested with a few later
> kernels) is run on a VirtualBox virtual machine with USB enabled, OHCI
> fails with the log messages "frame counter not updated; disabled" and
> "HC died; cleaning up". This seems to be due to the 250 ms interval
> watchdog running with far too short intervals, which we think is a
> consequence of the timer wheel code rework. I will refer you to a bug
> filed in Launchpad<1> for a longer description.
>
> Hope this is of interest to you.
>
> Regards,
>
> Michael
>
> <1> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1634737

That bug description says the watchdog timer routine can be called
twice in a 4-ms period, even though it requests a 250-ms delay. Is
this really true? If it is, it sounds like a real bug in the timer
core.

Bryan Paluch reported a similar problem and said that increasing the
timeout value to 275 ms fixed it:

http://marc.info/?l=linux-usb&m=147670889009451&w=2

Does that patch also fix the "frame counter not updating" problem?

Alan Stern