Re: [PATCH v3 1/2] x86, msr: allow rdmsr_safe_on_cpu() to schedule

From: Eric Dumazet
Date: Sat Mar 24 2018 - 10:29:56 EST

On 03/24/2018 01:09 AM, Ingo Molnar wrote:
>
> * Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
>> I noticed high latencies caused by a daemon periodically reading
>> various MSRs on all CPUs. KASAN kernels would see ~10ms latencies
>> simply reading one MSR. Even without KASAN, sending an IPI to a CPU
>> that is in a deep sleep state or is blocking hard IRQs in a long
>> section, then waiting for the answer, can consume hundreds of usec.
>>
>> Convert rdmsr_safe_on_cpu() to use a completion instead
>> of busy polling.
>>
>> Overall daemon cpu usage was reduced by 35 %,
>> and latencies caused by msr_read() disappeared.
>
> What "daemon" is this and why is it reading MSRs?

It is named gsysd, "Google System Tool", a daemon+cli that is run
on all machines in production to provide a generic interface
for interacting with the system hardware.

I am not sure if this answers your question. I could probably
give a rough estimate of the MWh this daemon consumes on the planet,
if that helps.

Note that the source of the problem is not reading the MSR, but having CPUs
block hard IRQs for a long time.

Ingo, it looks like any loop running between lock_task_sighand() and
unlock_task_sighand() might be the main offender.

Application writers seem to love getrusage(), for example.
Can we rewrite it so that it does not block hard IRQs?
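
For reference, the userspace pattern in question looks roughly like the
sketch below. It is only an illustration: the helper name
sample_cpu_usec() is made up, and I am assuming the typical case of an
application polling RUSAGE_SELF. As far as I can tell, the kernel side of
this call walks every thread of the process while holding the sighand
lock with hard IRQs disabled, so the time spent there grows with the
number of threads.

```c
#include <stdint.h>
#include <sys/resource.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not taken from any real application): return the
 * total CPU time, user + system, consumed so far by the calling process,
 * in microseconds, or -1 if getrusage() fails.
 */
static int64_t sample_cpu_usec(void)
{
	struct rusage ru;

	/* RUSAGE_SELF asks for the sum over all threads of this process. */
	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;

	return ((int64_t)ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000
		+ ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}
```

A program that calls such a helper at a high rate from a process with
many threads is the kind of caller I have in mind.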

Thanks !