Re: Question about kill a process group

From: Eric W. Biederman
Date: Wed May 11 2022 - 14:33:46 EST


Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:

> On Thu, Apr 21 2022 at 11:12, Eric W. Biederman wrote:
>> Zhang Qiao <zhangqiao22@xxxxxxxxxx> writes:
>>>> How many children are being created in this test? Several million?
>>>
>>> There are about 300,000+ processes.
>>
>> Not as many as I was guessing, but still enough to cause a huge
>> wait on locks.
>
> Indeed. It's about 4-5us per process to send the signal on a 2GHz
> SKL-X. So with 200k processes tasklist lock is read held for 1 second.
>
>> I do agree over 1 second for holding a spin lock is ridiculous and a
>> denial of service attack.
>
> Exactly. Even holding it for 100ms (20k forks) is daft.
>
> So unless the number of PIDs for a user is limited this _is_ an
> unprivileged DoS vector.

After having slept on this a bit, it finally occurred to me that the
semi-obvious solution to this issue is to convert tasklist_lock
from a rw-spinlock to a rw-semaphore. The challenge is finding
the users (tty layer?) that generate signals from interrupt
context and redirecting that signal generation.

Once signals that need tasklist_lock are no longer generated from
interrupt context, irqs no longer need to be disabled, and
after verifying that tasklist_lock isn't held under any other
spinlocks it can be converted to a semaphore.

It won't help the signal delivery times, but it should reduce
the effect on the rest of the system, and prevent watchdogs from
firing.

I don't know if I have time to do any of that now, but it does seem a
reasonable direction to move the code in.

Eric