Problem: scaling of /proc/stat on large systems

From: Jack Steiner
Date: Wed Sep 29 2010 - 13:22:22 EST


I'm looking for suggestions on how to fix a scaling problem with access to
/proc/stat.

On a large x86_64 system (4096p, 256 nodes, 5530 IRQs), access to
/proc/stat takes too long - more than 12 sec:

# time cat /proc/stat >/dev/null
real 12.630s
user 0.000s
sys 12.629s

This affects top, ps (some variants), w, glibc (sysconf) and much more.


One of the items reported in /proc/stat is a total count of interrupts that
have been received. This calculation requires summation of the interrupts
received on each cpu (kstat_irqs_cpu()).

The data is kept in per-cpu arrays linked to each irq_desc. On a
4096p/5530-IRQ system, summing this data requires accessing ~90MB.
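
As a rough check of the ~90MB figure, the sketch below multiplies out the
counter footprint. It assumes one 4-byte counter per (IRQ, cpu) pair; the
counter width is an assumption, not something stated above, and the function
name is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes touched when every per-cpu counter of every IRQ is read once.
 * counter_size is an assumption (4 bytes in the usage note below),
 * not a value taken from the kernel source.
 */
static uint64_t irq_stat_footprint(uint64_t nr_cpus, uint64_t nr_irqs,
                                   size_t counter_size)
{
    return nr_cpus * nr_irqs * (uint64_t)counter_size;
}
```

With 4096 cpus, 5530 IRQs and 4-byte counters this gives 90,603,520 bytes,
i.e. about 90.6 MB, all of it read on every access to /proc/stat.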


Deleting the summation of the kstat_irqs_cpu data eliminates the high
access time but is an API breakage that I assume is unacceptable.

Another possibility would be using delayed work (similar to vmstat_update)
that periodically sums the data into a single array. The disadvantage of
this approach is that there would be a delay between receipt of an
interrupt and its count appearing in /proc/stat. Is this an issue for anyone?
Another disadvantage is that it adds to the overall "noise" introduced by
kernel threads.
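
To make the trade-off concrete, here is a minimal sketch of the periodic
folding idea. It is plain user-space C, not kernel code: there is no locking
and no delayed-work API, the array sizes are tiny placeholders, and every
name in it is hypothetical.

```c
#include <stddef.h>

#define NR_CPUS 4   /* tiny hypothetical sizes, for illustration only */
#define NR_IRQS 3

/* Per-cpu counters: bumped on the interrupt path. */
static unsigned int irq_count[NR_IRQS][NR_CPUS];

/* Folded totals: one slot per IRQ, refreshed only by fold_irq_counts(). */
static unsigned long irq_total[NR_IRQS];

/* Interrupt path: touches only one cpu's counter for one IRQ. */
static void count_irq(size_t irq, size_t cpu)
{
    irq_count[irq][cpu]++;
}

/* Periodic work: the only place that walks all per-cpu counters. */
static void fold_irq_counts(void)
{
    for (size_t irq = 0; irq < NR_IRQS; irq++) {
        unsigned long sum = 0;

        for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
            sum += irq_count[irq][cpu];
        irq_total[irq] = sum;
    }
}

/* Reader path: walks NR_IRQS slots, but may lag by one fold interval. */
static unsigned long total_irqs(void)
{
    unsigned long sum = 0;

    for (size_t irq = 0; irq < NR_IRQS; irq++)
        sum += irq_total[irq];
    return sum;
}
```

A reader calling total_irqs() between two folds sees the totals as of the
last fold, which is exactly the delay described above; the cost of walking
the per-cpu data moves from every reader to the periodic work.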

Is there a better approach to take?


--- jack