/proc/stat interrupt counter wrap-around
From: Alexei Lozovsky
Date: Fri Sep 10 2021 - 04:53:40 EST
A monitoring dashboard caught my attention when it displayed weird
spikes in the computed interrupt rates. A machine under constant
network load shows roughly 250k interrupts per second, then every
4-5 hours there is a one-off spike to 11 billion.
Turns out, if you plot the interrupt counter, you get a graph like this:
[ASCII plot elided: the cumulative interrupt counter climbs steadily,
then periodically drops by a large constant amount before resuming
its climb.]
While monitoring tools are typically prepared to handle counter
wrap-arounds, they may not be ready for dips like this.
What is the impact
------------------
Not much, actually.
The counters always decrement by exactly 2^32 (which is suggestive),
so if you mask out the high bits of the counter and consider only
the low 32 bits, the value sequence actually makes sense, given an
appropriate sampling rate.
However, if you don't mask out the value and assume it to be
accurate -- well, that assumption is incorrect. The interrupt sums
might look correct and contain some big number, but it could be
arbitrarily far from the actual number of interrupts serviced since
boot.
This concerns only the total value of "intr" and "softirq" rows:
intr 14390913189 32 11 0 0 238 0 0 0 0 0 0 0 88 0 [...]
softirq 14625063745 0 596000256 300149 272619841 0 0 [...]
     ^^^^^^^^^^^
     these ones (the first field on each row)
Why this happens
----------------
The reason for such behaviour is that the "total" interrupt counters
presented by /proc/stat are actually computed by adding up per-interrupt
per-CPU counters. Most of these are "unsigned int", while some of them
are "unsigned long", and the accumulator is "u64". What a mess...
Individual counters are monotonically increasing (modulo wrapping);
however, if you add multiple values with different bit widths into a
wider accumulator, the sum is *not* guaranteed to be monotonically
increasing: whenever one 32-bit counter wraps from 2^32 - 1 back to
0, the u64 sum drops by exactly 2^32.
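To make the mechanism concrete, here is a minimal userspace sketch
(not kernel code, names made up for illustration): two "unsigned int"
counters only ever increase, yet their u64 sum dips when one of them
wraps.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* Two per-CPU-style counters, both plain 32-bit unsigned int. */
        unsigned int irqs_cpu0 = 0xFFFFFFF0u;   /* about to wrap */
        unsigned int irqs_cpu1 = 0x00000100u;

        for (int tick = 0; tick < 4; tick++) {
                /* Each counter only ever increases (modulo 2^32). */
                irqs_cpu0 += 8;
                irqs_cpu1 += 8;

                /* The accumulator is wider than the counters, like the
                 * u64 sum printed in the "intr" row of /proc/stat. */
                uint64_t sum = (uint64_t)irqs_cpu0 + irqs_cpu1;
                printf("tick %d: sum = %llu\n", tick,
                       (unsigned long long)sum);
        }
        return 0;
}

On the tick where irqs_cpu0 wraps, the printed sum drops from just
above 2^32 down to a few hundred, even though neither counter ever
decreased.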
What can be done
----------------
1. Do nothing.
Userspace can trivially compensate for this 'curious' behavior
by masking out the high bits, observing only the low 32 bits
(the width of unsigned int), and taking care to handle
wrap-arounds; see the sketch after this list.
This maintains the status quo, but the "issue" of interrupt sums
not being quite accurate remains.
2. Change the presentation type to the lowest common denominator.
That is, unsigned int. Make the kernel mask out not-quite-accurate
bits from the value it reports. Keep it that way until every
underlying counter type is changed to something wider.
The benefit here is that users that *are* ready to handle proper
wrap-arounds will be able to handle them automagically without
undocumented hacks (see option 1).
This changes the observed value and will cause "unexpected"
wrap-arounds to happen earlier in some use-cases, which might
upset users that are not ready to handle them, or don't want
to poll /proc/stat more frequently.
It's debatable what's better: a lower-width value that might
need to be polled more often, or a wider-width value that is
not completely accurate.
3. Change the interrupt counter types to be wider.
A different take on the issue: instead of narrowing the presentation
from faux-u64 to unsigned int, widen the interrupt counters from
unsigned int to... something else:
- u64: interrupt counters are 64-bit everywhere, period
- unsigned long: interrupt counters are 64-bit if the platform
  thinks that "long" is longer than "int"
Whichever type is used, it must be the same for all interrupt
counters across the kernel, as well as for the type used to compute
and display the sum of all these counters in /proc/stat.
The advantage here is that 64-bit counters will probably be enough
for *anything* to not overflow before the heat death of the universe,
thus making the wrap-around problem irrelevant.
The disadvantage here is that some hardware counters are 32-bit,
and you can't make them wider. Some platforms also don't have
proper atomic support for 64-bit integers, making wider counters
problematic to implement efficiently.
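For reference, here is a minimal sketch of the option-1 workaround
(a hypothetical helper, not taken from any real monitoring tool):
mask each sample read from /proc/stat to its low 32 bits and let
unsigned arithmetic handle the wrap-around when computing deltas.

#include <stdint.h>
#include <stdio.h>

/*
 * Treat the "intr"/"softirq" totals parsed from /proc/stat as 32-bit
 * counters: keep only the low 32 bits of each sample, and rely on
 * modulo-2^32 subtraction to get a correct delta across a wrap.
 */
static uint32_t irq_delta(uint64_t prev_sample, uint64_t curr_sample)
{
        uint32_t prev = (uint32_t)prev_sample;  /* low 32 bits only */
        uint32_t curr = (uint32_t)curr_sample;

        return curr - prev;     /* wraps correctly, modulo 2^32 */
}

int main(void)
{
        /* Two consecutive readings straddling one of the observed
         * dips: the raw value went *down* by almost 2^32, yet only
         * 1024 interrupts were serviced in between. */
        uint64_t prev = 8589934336ull;  /* low 32 bits: 0xFFFFFF00 */
        uint64_t curr = 4294968064ull;  /* low 32 bits: 0x00000300 */

        printf("delta = %u interrupts\n",
               (unsigned)irq_delta(prev, curr));
        return 0;
}

This prints a delta of 1024 even though the raw value decreased; it
stays correct as long as fewer than 2^32 interrupts occur between two
polls, which is why a narrower value might need to be polled more
often (as noted in option 2).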
So what do we do?
-----------------
I suggest wrapping the interrupt counter sum at "unsigned int", the
same type used for (most of) the individual counters. That makes for
the most predictable behavior.
I have a patch set cooking that does this.
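Roughly, the idea is to keep accumulating in a wide type but truncate
the result to "unsigned int" before reporting it. A minimal userspace
sketch of the intended behavior (simplified, hypothetical names, not
the actual patch):

#include <stdint.h>
#include <stdio.h>

#define NR_FAKE_CPUS 2

/* Stand-ins for the per-CPU "unsigned int" interrupt counters. */
static unsigned int irqs[NR_FAKE_CPUS] = { 0xFFFFFFF0u, 0x00000100u };

/*
 * Sum the 32-bit counters in a wide accumulator, then truncate to
 * unsigned int.  The result equals the true number of interrupts
 * modulo 2^32, so it only ever increases (modulo wrapping), unlike
 * the raw u64 sum.
 */
static unsigned int irq_sum_reported(void)
{
        uint64_t sum = 0;

        for (int cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
                sum += irqs[cpu];

        return (unsigned int)sum;
}

int main(void)
{
        for (int tick = 0; tick < 4; tick++) {
                irqs[0] += 8;   /* wraps past 2^32 on the second tick */
                irqs[1] += 8;
                printf("tick %d: reported = %u\n",
                       tick, irq_sum_reported());
        }
        return 0;
}

Here the reported value grows by 16 on every tick, straight across
the wrap of irqs[0], because truncating the u64 sum to 32 bits yields
the true total modulo 2^32 -- the same semantics as the individual
32-bit counters.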
Will this be of any interest? Or do you think changing the behavior
of /proc/stat will cause more trouble than it's worth?
Prior discussion
----------------
This question is by no means new; it has been discussed several times:
2019 - genirq, proc: Speedup /proc/stat interrupt statistics
The issue of overflow and wrap-around was touched upon, with the
suggestion that userspace should just deal with it. Using u64 for
the sum was brought up too, but it did not go anywhere.
https://lore.kernel.org/all/20190208143255.9dec696b15f03bf00f4c60c2@xxxxxxxxxxxxxxxxxxxx/
https://lore.kernel.org/all/3460540b50784dca813a57ddbbd41656@xxxxxxxxxxxxxxxx/
2014 - Why do we still have 32 bit counters? Interrupt counters overflow within 50 days
Discussion on whether it's appropriate to bump counter width to
64 bits in order to avoid the overflow issues entirely.
https://lore.kernel.org/lkml/alpine.DEB.2.11.1410030435260.8324@xxxxxxxxxx/