Re: [PATCH] nohz: don't kick non-idle CPUs in tick_nohz_full_kick_cpu()

From: Frederic Weisbecker
Date: Mon Jul 23 2018 - 09:15:31 EST


On Fri, Jul 20, 2018 at 07:24:00PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jul 2018, Yury Norov wrote:
> > While we're here, I just wonder: on my system IRQs are sent to nohz_full
> > CPUs on every incoming ssh connection. The trace looks like this:
> > [ 206.835533] Call trace:
> > [ 206.848411] [<ffff00000889f984>] dump_stack+0x84/0xa8
> > [ 206.853455] [<ffff0000081ea308>] _task_isolation_remote+0x130/0x140
> > [ 206.859714] [<ffff0000081bf5ec>] irq_work_queue_on+0xcc/0xfc
> > [ 206.865365] [<ffff0000081478ac>] tick_nohz_full_kick_cpu+0x88/0x94
> > [ 206.871536] [<ffff000008147930>] tick_nohz_dep_set_all+0x78/0xa8
> > [ 206.877533] [<ffff000008147b58>] tick_nohz_dep_set_signal+0x28/0x34
> > [ 206.883792] [<ffff0000081421fc>] set_process_cpu_timer+0xd0/0x128
> > [ 206.889876] [<ffff0000081422ac>] update_rlimit_cpu+0x58/0x7c
> > [ 206.895528] [<ffff0000083aa3d0>] selinux_bprm_committing_creds+0x180/0x1fc
> > [ 206.902394] [<ffff00000839e394>] security_bprm_committing_creds+0x40/0x5c
> > [ 206.909173] [<ffff00000828c4a0>] install_exec_creds+0x20/0x6c
> > [ 206.914911] [<ffff0000082e15b0>] load_elf_binary+0x368/0xbb8
> > [ 206.920561] [<ffff00000828d09c>] search_binary_handler+0xb8/0x224
> > [ 206.926645] [<ffff00000828d99c>] do_execveat_common+0x44c/0x5f0
> > [ 206.932555] [<ffff00000828db78>] do_execve+0x38/0x44
> > [ 206.937510] [<ffff00000828dd74>] SyS_execve+0x34/0x44
> >
> > I suspect that scp, ssh tunneling and similar network activities will source
> > ticks on nohz_full CPUs as well. On a highly loaded server this may generate
> > significant interrupt traffic on nohz_full CPUs. Is this desirable behavior?
>
> Suspicions and desirability are not really technically interesting aspects.
>
> Just from looking at the stack trace it's obvious that there is a CPU TIME
> rlimit on that newly spawned sshd. That's not something the kernel
> imposes; that's what user space sets.
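(For illustration: such a limit typically reaches the kernel via
setrlimit(2)/prlimit(2) on RLIMIT_CPU, e.g. applied by pam_limits from
limits.conf at login. A minimal user space sketch, with a made-up 10
second cap:

	#include <stdio.h>
	#include <sys/resource.h>

	int main(void)
	{
		/* Illustration only: cap this process's CPU time at 10
		 * seconds (a made-up value). In the trace above it is
		 * the login path applying a configured limit, not this
		 * exact call. */
		struct rlimit rl = { .rlim_cur = 10, .rlim_max = 10 };

		if (setrlimit(RLIMIT_CPU, &rl) < 0)
			perror("setrlimit");
		return 0;
	}

On exec, the SELinux bprm_committing_creds hook then re-applies the
limit, which is how the trace lands in update_rlimit_cpu().)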
>
> Now the actual mechanism which does that, i.e. set_process_cpu_timer(), ends
> up IPI'ing _ALL_ nohz_full CPUs for no real good reason. In the exec path
> this is especially pointless because the new process is not running yet and
> is single threaded, so forcing an IPI to all CPUs achieves nothing.
>
> In fact the state of the task/process for which update_rlimit_cpu() is
> called is known, so the IPI can really be either avoided completely or
> restricted to the CPUs on which this process can run or actually runs.
>
> Frederic?

Indeed, so far the tick dependency code is lazy and IPIs all nohz_full CPUs
when we add either a thread or a process timer.
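
Roughly, the current path looks like this (a simplified sketch of the
relevant bits in kernel/time/tick-sched.c, not the exact upstream code):

	/*
	 * Simplified sketch: setting a process-wide (signal) tick
	 * dependency sets the bit and, if it wasn't already set,
	 * kicks _every_ nohz_full CPU, whether or not any thread of
	 * that process runs there.
	 */
	static void tick_nohz_dep_set_all(atomic_t *dep,
					  enum tick_dep_bits bit)
	{
		int prev = atomic_fetch_or(BIT(bit), dep);

		if (!prev)
			tick_nohz_full_kick_all(); /* IPI all nohz_full CPUs */
	}

	void tick_nohz_dep_set_signal(struct signal_struct *sig,
				      enum tick_dep_bits bit)
	{
		tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
	}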

We want to make sure that any targeted thread, running somewhere without a
tick, sees the new tick dependency.

So in the case of a single thread, I can easily fix that and IPI only the CPU
it's running on, if any. In the case of a thread group, I'm concerned about
the performance penalty of walking through every thread and IPI'ing only
those that are running. But we'll probably have to come to that in the end.
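
Something along these lines for the single-thread case (an untested
sketch, assuming we can rely on the context switch path to re-evaluate
the dependency for a task that is not currently running):

	/*
	 * Untested sketch: set the per-task dependency and kick only
	 * the CPU the task runs on, instead of every nohz_full CPU.
	 */
	void tick_nohz_dep_set_task(struct task_struct *tsk,
				    enum tick_dep_bits bit)
	{
		if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) {
			if (tsk == current) {
				preempt_disable();
				tick_nohz_full_kick();
				preempt_enable();
			} else {
				/*
				 * If the task is running elsewhere, kick
				 * that CPU; if it is sleeping, the tick
				 * dependency gets re-evaluated when it is
				 * scheduled back in.
				 */
				tick_nohz_full_kick_cpu(task_cpu(tsk));
			}
		}
	}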

Thanks.