Re: Default zone_reclaim_mode = 1 on NUMA kernel is bad for file/email/web servers

From: Rob Mueller
Date: Mon Sep 20 2010 - 19:41:50 EST


> I don't think we will ever get the default value for this tunable right.
> I would also worry that avoiding the reclaim_mode for file-backed
> cache will hurt HPC applications that are dumping their data to disk
> and depending on the existing default for zone_reclaim_mode to not
> pollute other nodes.
>
> The ideal would be if distribution packages for mail, web servers
> and others that are heavily IO orientated would prompt for a change
> to the default value of zone_reclaim_mode in sysctl.

I would argue that there are a lot more mail/web/file servers out there than HPC machines, and HPC machines tend to have a team of people to monitor and tweak them. I think it would be much saner to default this to 0, which works best for most people, and have the HPC people change it.
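For reference, the tunable is exposed at /proc/sys/vm/zone_reclaim_mode, the same knob that `sysctl vm.zone_reclaim_mode` reads and writes. A minimal sketch of what an admin or a package post-install step might do to switch it off at runtime, assuming root privileges; the function name and its path parameter are hypothetical, the parameter existing only so the logic can be tried against a scratch file. A runtime write does not survive a reboot; a line such as `vm.zone_reclaim_mode = 0` in /etc/sysctl.conf is the usual way to make it persistent.

```python
from pathlib import Path

# Standard procfs location of the tunable on Linux.
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")


def disable_zone_reclaim(path: Path = ZONE_RECLAIM_PATH) -> tuple[int, int]:
    """Set zone_reclaim_mode to 0 if it is not already 0.

    Returns (old_value, new_value). Writing the real procfs file needs root.
    """
    old = int(path.read_text().strip())
    if old != 0:
        # Equivalent to `sysctl -w vm.zone_reclaim_mode=0`.
        path.write_text("0\n")
    new = int(path.read_text().strip())
    return old, new
```

Calling `disable_zone_reclaim()` with no argument acts on the live system and returns, for example, `(1, 0)` on a machine where the kernel had defaulted the mode to 1.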

However, there's still another question: why is this problem happening at all for us? I know almost nothing about NUMA, but from other posts, it sounds like the problem is that the memory allocations are all happening on one node? But I don't understand why that would be happening. The machine runs the Cyrus IMAP server, which is a classic Unix forking server with thousands of processes. Each process mmaps lots of different files to access them. Why would that all be happening on one node rather than being spread around?
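One way to check whether file-backed pages really are piling up on one node is to compare the per-node counters the kernel exports in /sys/devices/system/node/nodeN/meminfo (the `numastat -m` tool summarises the same data). A rough sketch, assuming the usual `Node N Field: value kB` line format; the helper names are hypothetical.

```python
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")


def parse_node_meminfo(text: str) -> dict[str, int]:
    """Parse lines like 'Node 0 FilePages:  123456 kB' into {field: value}."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: ['Node', '0', 'FilePages:', '123456', 'kB'];
        # HugePages_* lines carry a bare count with no 'kB' suffix.
        if len(parts) >= 4 and parts[0] == "Node":
            stats[parts[2].rstrip(":")] = int(parts[3])
    return stats


def per_node_stats(root: Path = NODE_ROOT,
                   fields=("MemFree", "FilePages")) -> dict[int, dict[str, int]]:
    """Collect selected meminfo fields for every NUMA node under root."""
    result: dict[int, dict[str, int]] = {}
    node_dirs = sorted(root.glob("node[0-9]*"), key=lambda d: int(d.name[4:]))
    for node_dir in node_dirs:
        stats = parse_node_meminfo((node_dir / "meminfo").read_text())
        result[int(node_dir.name[4:])] = {f: stats.get(f, 0) for f in fields}
    return result
```

If one node shows most of the `FilePages` and very little `MemFree` while the other node has plenty free, that would be consistent with the lopsided allocation described above.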

One thing is that the machine is far more IO-loaded than CPU-loaded; in fact, it uses very little CPU at all (usually a few percent). Does the kernel prefer to run processes on one particular node if it's available? So if a machine has very little CPU load, will every process generally end up running on the same node?
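The scheduling question can be checked empirically: field 39 of /proc/[pid]/stat ("processor") records the CPU a task last ran on, and /sys/devices/system/node/nodeN/cpulist lists each node's CPUs. A rough sketch that tallies processes per node by combining the two; the helper names are hypothetical, and a single snapshot only shows where tasks last ran, not where their memory was allocated.

```python
from collections import Counter
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Expand a kernel cpulist such as '0-3,8-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        if "-" in chunk:
            lo, hi = chunk.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(chunk))
    return cpus


def last_cpu_from_stat(stat_line: str) -> int:
    """Return field 39 ('processor') of a /proc/[pid]/stat line."""
    # comm (field 2) may contain spaces, so split after its closing ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N sits at index N - 3.
    return int(rest[39 - 3])


def processes_per_node(proc: Path = Path("/proc"),
                       nodes: Path = Path("/sys/devices/system/node")) -> Counter:
    """Count processes by the NUMA node of the CPU they last ran on (-1 = unknown)."""
    cpu_to_node: dict[int, int] = {}
    for node_dir in nodes.glob("node[0-9]*"):
        for cpu in parse_cpulist((node_dir / "cpulist").read_text()):
            cpu_to_node[cpu] = int(node_dir.name[4:])
    counts: Counter = Counter()
    for pid_dir in proc.iterdir():
        if not pid_dir.name.isdigit():
            continue
        try:
            cpu = last_cpu_from_stat((pid_dir / "stat").read_text())
        except (OSError, IndexError, ValueError):
            continue  # process exited or stat was unreadable
        counts[cpu_to_node.get(cpu, -1)] += 1
    return counts
```

Sampling `processes_per_node()` a few times under normal load would show whether the imapd processes really cluster on one node.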

Rob
