Re: Processes spinning forever, apparently in lock_timer_base()?

From: Matthias Hensler
Date: Thu Aug 09 2007 - 13:38:09 EST


On Thu, Aug 09, 2007 at 09:55:34AM -0700, Andrew Morton wrote:
> On Thu, 9 Aug 2007 11:59:43 +0200 Matthias Hensler <matthias@xxxxxxxx> wrote:
> > On Sat, Aug 04, 2007 at 10:44:26AM +0200, Matthias Hensler wrote:
> > > On Fri, Aug 03, 2007 at 11:34:07AM -0700, Andrew Morton wrote:
> > > [...]
> > > I am also willing to try the patch posted by Richard.
> >
> > I want to give some update here:
>
> Did we ever see the /proc/meminfo and /proc/vmstat output during the stall?

So far not, sorry. The problem did not reoccur since I am running the
kernel with that patch now.

> As you're seeing this happening when multiple disks are being written
> to it is possible that the per-device-dirty-threshold patches which
> recently went into -mm (and which appear to have a bug) will fix it.

All affected systems have two devices which are cross mounted, eg. it
has /home on disk 1 and /var/spool/imap on disk 2.

On the respectively device there is the mirror partition (mirror of
/home on disk 2 and mirror of /var/spool/imap on disk 1).

Normally the system hangs when running rsync from /var/spool/imap to its
corresponding mirror partition.

We have around 10 different partitions (all in LVM), but so far the hang
mostly started running rsync on the imapspool (which has by far most of
the files on the system).

_However_: we also had this hang while running a backup to an external
FTP server (so only reading the filesystems apart from the usual system
activity).

And: the third system we had this problem on has no IMAP spool at all.
The hang occured while running a backup together with "yum update".

It might be related that all the systems uses these crossmounts over two
different disks. I wasn't not able to reproduce that on my homesystem
with only one disk, either because the activity-pattern was different,
or it really needs two disks to run into that issue.

> But I worry that the stall appears to persist *forever*. That would indicate
> that we have a dirty-memory accounting leak, or that for some reason the
> system has decided to stop doing writeback to one or more queues (might be
> caused by an error in a lower-level driver's queue congestion state management).

Well, all the time I was still connected to one of that machines it was
possible to clear the situation by killing a lot of processes. That is
mostly killing all httpd, smtpd, imapd, amavisd, crond, spamassassin and
rsync processes. Eventually the system responded normal again (and I
could cleanly reboot it). However the final killed process which
resolved the issue was non deterministic so far.

The workload is different on all the servers:

Server 1 processes around 10-20k mails per day but also servers a lot
of HTTP requests.
Server 2 is only a mailserver processing around 5-10k mails per day.
Server 3 just serves HTTP requests (and a bit DNS).

> If it is the latter, then it could be that running "sync" will clear
> the problem. Temporarily, at least. Because sync will ignore the
> queue congestion state.

I have to admit that I never tried to sync in such a case, since mostly
I had only one open SSH session and tried to find the root cause.

So far the problem occured first on server 2 around march or april
without changes to the machine (we have a Changelog for every server:
there was a single kernel update that time, however we reverted back
after the first issue and run a kernel which was stable for several
weeks before and encountered the problem again). In the beginning we
encountered the problem maybe twice a week, getting worse within the
next weeks. Several major updates (kernel and distribution were made)
without resolving the problem.

Since end of april server 1 began showing the same problem (running a
different Fedora version and kernel at that time), first slowly (once a
week or so) then more regular.

Around july we had a hang nearly every day.

Server 3 had the problem only once now, but that server has no high
workload.

We spent a lot of time investigating the issue, but since all servers
use a different hardware, different setups (beside from the crossmounts
with noatime) and even different base systems (Fedora Core 5+6, Fedora
7, different kernels: 2.6.18, .19, .20, .21 and .22) I think that we can
rule out hardware problems. I think the issue might be there some time
now, but is hit more often now since the workload increased a lot over
the last months.

We ruled a lot out over the month (eg. syslog was replaced, many not so
important services were stopped, schedular was changed), without
changes. Just reverting from "noatime" in the fstab to "default" fixed
it reliable so far.

As said I am still running "vmstat 1" and catting of /proc/meminfo just
in case. If there is anything I can do beside that to clearify the
problem I will try to help.

Regards,
Matthias

Attachment: pgp00000.pgp
Description: PGP signature