Re: Interactivity issue in 2.6.25-rc3
From: Ray Lee
Date: Thu Feb 28 2008 - 16:29:14 EST
Please keep all CCs. Others that will read lkml later will want to
know what you found out.
On Thu, Feb 28, 2008 at 1:06 PM, Carlos R. Mafra <crmafra2@xxxxxxxxx> wrote:
> On Thu 28.Feb'08 at 11:54:04 -0800, Ray Lee wrote:
> > On Thu, Feb 28, 2008 at 11:18 AM, Ingo Molnar <mingo@xxxxxxx> wrote:
> > >
> > > * Carlos R. Mafra <crmafra2@xxxxxxxxx> wrote:
> > >
> > > > I want to report a bad interactivity which happened in my desktop
> > > > running the latest kernel (2.6.25-rc3-00081-g7704a8b).
> > > >
> > > > I tried to play 'flightgear' and the desktop became _very_ slow while
> > > > flightgear was "loading scenery objects" (a task which never finished
> > > > and I could not play it).
> > > >
> > > > The desktop is a P4 @ 3.0 GHz, 512MB, with the nv graphics driver.
> > >
> > > yes, but this means you run the soft-3D driver under Xorg, right? That
> > > is known to starve its clients. The stats you sent show no worse than a
> > > few tens of msecs delays caused by CPU scheduling itself:
> > >
> > > mrxvt (3730, #threads: 1)
> > > se.wait_max : 18.679129
> > >
>
> Thanks Ingo!
>
> Hm, but I remember that my desktop became unusable. I was experiencing
> latencies _much_ higher than msecs. But I think I have an explanation
> for this low number, if you excuse my attempt to understand this.
>
> Could it be that your debug script generated these numbers while
> fgfs was being killed, and then it felt no more the bad latencies
> fgfs was causing?
>
> This was the first scenario:
>
> 1) Start 'fgfs' as normal user.
>
> 2) Wait a few seconds until fgfs' message "loading scenery objects"
> appeared.
>
> 3) Everything becomes _very_ slow (measured in seconds, not msecs),
> so I notice something is wrong (at least I thought it was).
>
> 4) Killed fgfs with Ctrl-c.
>
> 5) I go to Ingo's page and download his debug scripts.
>
> 6) Preparing for the battle to follow, I run cfs-debug-info-clear.sh
>
> 7) Start 'fgfs' and system becomes slow while loading scenery objects.
>
> 8) I try to reach the mrxvt terminal to run cfs-debug-info.sh. Each
> letter I type takes seconds to appear.
>
> 9) I manage to start cfs-debug-info.sh. I could read the first line after
> a few seconds:
>
> sched info dump (of tasks, modules, hw, dmesg, config, fs):
>
> But I am sure I did not see the
> "gathering statistics for 15 seconds ..."
>
> As that was the first time ever that I used this script, I didn't
> even know what I was supposed to do, but I was waiting for more
> than a minute and nothing happened.
>
> 10) I managed to change tab and kill fgfs with Crtl-c.
>
> 11) I got back to the debug script tab and waited a few seconds
> for it to finish.
>
> 12) That is the debug log which I put in the page mentioned in the
> first email.
>
> I am sorry to especulate about it, but maybe the script got the
> latencies after (or meanwhile) I was killing fgfs.
>
>
> > Also, please try disabling the group scheduler and run the test again.
> > The group scheduler has known bad interactivity issues. Also be on the
> > watch for any abnormally high disk activity, to rule out starvation
> > due to the kernel choosing poor candidates for swap-in/swap-out.
> > (Running a vmstat 1 in a console ahead of time and watching for high
> > IO Wait -- the last column printed -- will give you a good indicator
> > other than the drive LED.)
>
> I have rebooted and tried to repeat the experiment, but I couldn't
> reproduce the bad interactivity I reported earlier. Flightgear
> loaded the scenery (which it did not do before) and the airplane
> appeared in the screen. The game was slow, but I had a very good
> usability of the desktop and I could type things almost normally,
> I could switch desktops etc. Definitely not what happened before!
>
> So I am sorry for the noise, but something bad happened before
> and unfortunately I am not sure I could run Ingo's debug script
> correctly. I'll be more patient if that happens again.
>
> Anyway, I want to thank Ingo and Ray for their replies and
> would like to humbly ask:
>
> Is it an scheduler anomaly if 'se.wait_max' is bigger than
> 40 msecs for _any_ of the processes which appear in the
> debug script log? In other words, is the scheduler
> mathematically build to not allow latencies higher than
> 40 msecs?
Ingo will have to answer the scheduler part of that.
But it's good to keep in mind that the scheduler can't do anything
about slowdowns due to tasks being swapped out or waiting on reads
from the disk. I mention this as shortly after CFS got into mainline,
something changed in the VM that seems to make my system spend a lot
more time in IO wait, causing the system to be much less responsive
than it used to be.
Unfortunately it seems to be dependent upon the history of what tasks
I've run and their memory usage, so it's been hard to come up with a
reproducible test case (well that and a complete lack of time). All I
can say is that I've seen what you've reported as well, though it had
nothing to do with using any 3d applications, just a browser, editor,
gcc, etc.
Ray
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/