Re: writeout stalls in current -git

From: Fengguang Wu
Date: Fri Nov 02 2007 - 03:52:42 EST


On Fri, Nov 02, 2007 at 08:42:05AM +0100, Torsten Kaiser wrote:
> The Subject is still misleading, I'm using 2.6.23-mm1.
>
> On 11/2/07, Fengguang Wu <wfg@xxxxxxxxxxxxxxxx> wrote:
> > On Thu, Nov 01, 2007 at 07:20:51PM +0100, Torsten Kaiser wrote:
> > > On 11/1/07, Fengguang Wu <wfg@xxxxxxxxxxxxxxxx> wrote:
> > > > On Wed, Oct 31, 2007 at 04:22:10PM +0100, Torsten Kaiser wrote:
> > > > > Since 2.6.23-mm1 I also experience strange hangs during heavy writeouts.
> > > > > Each time I noticed this I was using emerge (package util from the
> > > > > gentoo distribution) to install/upgrade a package. The last step,
> > > > > where this hang occurred, is moving the prepared files from a tmpfs
> > > > > partition to the main xfs filesystem.
> > > > > The hangs were not fatal; after a few seconds everything resumed
> > > > > normal, so I was not able to capture a good image of what was
> > > > > happening.
> > > >
> > > > Thank you for the detailed report.
> > > >
> > > > How severe were the hangs? Only writeouts stalled, all apps stalled, or
> > > > cannot type and run new commands?
> > >
> > > Only writeout stalled. The emerge that was moving the files hung, but
> > > everything else worked normally.
> > > I was able to run new commands, like copying /proc/meminfo.
> >
> > But you mentioned in the next mail that `watch cat /proc/meminfo`
> > could also be blocked for some time - I guess at the same time emerge
> > was stalled?
>
> The behavior was different on these stalls.
> In the first report the writeout stopped completely, the emerge stopped,
> but at that time a cat /proc/meminfo >~/stall/meminfo did succeed and
> did not stall.
> About the watch cat /proc/meminfo, I will write in the answer to the
> other mail...

OK.
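
For capturing the next stall, here is a minimal sketch of a sampler that
logs the Dirty and Writeback counters from /proc/meminfo once per second.
It is only an illustration, not something used in this thread: the names
parse_meminfo, sample and watch are made up, and it assumes the usual
"Name:   value kB" line layout of /proc/meminfo.

```python
import time

def parse_meminfo(text):
    """Return {field: value} for lines such as 'Dirty:  1234 kB'."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not 'Name: <integer> [unit]'.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def sample(path="/proc/meminfo", keys=("Dirty", "Writeback")):
    """Read the file once and return only the requested counters."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {k: info.get(k) for k in keys}

def watch(interval=1.0, count=60):
    """Print a timestamped sample every `interval` seconds, `count` times."""
    for _ in range(count):
        print(time.strftime("%H:%M:%S"), sample())
        time.sleep(interval)
```

Running watch() in a second terminal during an emerge should show whether
Dirty stays flat while Writeback sits at zero, which is what a stalled
writeout would look like from userspace.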

> > > [snip]
> > > > > After this SysRq+W writeback resumed again. Possible that writing
> > > > > above into the syslog triggered that.
> > > >
> > > > Maybe. Are the log files on another disk/partition?
> > >
> > > No, everything was going to /
> > >
> > > What might be interesting is, that doing cat /proc/meminfo
> > > >~/stall/meminfo did not resume the writeback. So there might be some
> > > threshold that only was broken with the additional write from
> > > syslog-ng. Or syslog-ng does some flushing, I don't know. (I'm using the
> >
> > Have you tried explicit `sync`? ;-)
>
> No. I wanted to see what is stalled. So I started by collecting info
> from /proc and then the SysRq+W. And after hitting SysRq the writeout
> started to resume without any further action.
>
> But I think I have seen a `sync` stall also. During another emerge I
> noticed the system slowing down and wanted to use `sync` to speed up
> the writeout. The result was, that the writeout did not speed up
> immediately, only after around a minute. The `sync` only returned at
> that time.
> Can writers starve `sync`?

I guess the new debug printks will provide more hints on it.

> > > syslog-ng package from gentoo:
> > > http://www.balabit.com/products/syslog_ng/ , version 2.0.5)
> > >
> > > > > The source tmpfs is mounted without any special parameters, but the
> > > > > target xfs filesystem resides on a dm-crypt device that is on top of a
> > > > > 3-disk RAID5 md.
> > > > > During the hang all CPUs were idle.
> > > >
> > > > No iowaits? ;-)
> > >
> > > No, I have a KSysGuard in my taskbar that showed no activity at all.
> > >
> > > OK, the subject does not match for my case, but there was also a tmpfs
> > > involved. And I found no thread with stalls on xfs. :-)
> >
> > Do you mean it is actually related to tmpfs?
>
> I don't know. It's just that I have seen tmpfs also redirtying inodes
> in these logs and the stalling emerge is moving files from tmpfs to
> xfs.
> It could be, but I don't know enough about tmpfs internals to really be sure.
> I just wanted to mention, that tmpfs is involved somehow.

The requeue messages for tmpfs are not pleasant, but known to be fine ;-)

Fengguang
