Re: [PATCH] proc/stat: Separate out individual irq counts into /proc/stat_irqs

From: Waiman Long
Date: Thu Apr 19 2018 - 15:57:58 EST


On 04/19/2018 03:43 PM, Andrew Morton wrote:
> On Thu, 19 Apr 2018 13:09:29 -0400 Waiman Long <longman@xxxxxxxxxx> wrote:
>
>> It was found that reading /proc/stat could be time consuming on
>> systems with a lot of irqs. For example, reading /proc/stat in a
>> certain 2-socket Skylake server took about 4.6ms because it had over
>> 5k irqs. In that particular case, the majority of the CPU cycles for
>> reading /proc/stat was spent in the kstat_irqs() function. Therefore,
>> application performance can be impacted if the application reads
>> /proc/stat rather frequently.
>>
>> The "intr" line within /proc/stat contains a sum total of all the irqs
>> that have happened followed by a list of irq counts for each individual
>> irq number. In many cases, the first number is good enough. The
>> individual irq counts may not provide that much more information.
>>
>> In order to avoid this kind of performance issue, all these individual
>> irq counts are now separated into a new /proc/stat_irqs file. The
>> sum total irq count will stay in /proc/stat and be duplicated in
>> /proc/stat_irqs. Applications that need to look up individual irq counts
>> will now have to look into /proc/stat_irqs instead of /proc/stat.
>>
> (cc /proc maintainer)
>
> It's a non-backward-compatible change. For something which has
> existed for so long, it would be a mighty task to demonstrate that no
> existing userspace will be disrupted by this change.
>
> So we need to think again. A new interface which omits the per-IRQ
> stats might be acceptable.

OK, how about a new /proc/stat2 file that is the same as /proc/stat
except that it omits the per-IRQ stats? BTW, do you have any suggestion
for a better name?
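
For illustration only, here is a minimal userspace sketch of what a
reader that cares only about the sum total does with the "intr" line
today: it takes the first number and ignores the per-IRQ counts that
follow. The helper names parse_intr_line() and read_intr_total() are
hypothetical and not part of any patch; the line layout is assumed to be
as described in the changelog above (total first, then one count per
irq number).

```python
def parse_intr_line(line):
    """Split an "intr" line into (total, per_irq_counts).

    Assumes the layout described in the changelog: the keyword "intr",
    then the sum total, then one count per irq number.
    """
    fields = line.split()
    if not fields or fields[0] != "intr":
        raise ValueError("not an intr line")
    counts = [int(f) for f in fields[1:]]
    if not counts:
        raise ValueError("intr line has no counts")
    return counts[0], counts[1:]


def read_intr_total(path="/proc/stat"):
    """Return the sum total irq count from the given stat file, or None."""
    with open(path) as f:
        for line in f:
            if line.startswith("intr "):
                total, _per_irq = parse_intr_line(line)
                return total
    return None
```

A reader like this would work unchanged against a file that omits the
per-IRQ counts, since it only uses the first number. The per-IRQ counts
are still generated by the kernel on every read of /proc/stat, though,
so parsing less in userspace does not by itself avoid that cost.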

> Or, conceivably, a new /proc knob which disables the per-IRQ stats in
> /proc/stat. That would allow operators to opt in to this disabling and
> would avoid the need to alter
> whatever-application-it-is-that-is-having-trouble. This seems a bit ugly
> though.
>
> Also, the changelog is rather vague. "application performance can be
> impacted". Well, *are* applications impacted? What is the real-world
> performance gain which this change provides, in a real-world workload?

Will clarify that in the next version.

-Longman