Re: [GIT PULL] sigqueue cache fix
From: Ingo Molnar
Date: Mon Jun 28 2021 - 14:47:06 EST
* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Jun 27, 2021 at 10:14 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > The most fundamental race we can have is this:
>
> No. It's this (all on the same CPU):
>
> sigqueue_cache_or_free():
>
> if (!READ_ONCE(current->sigqueue_cache))
> <-- Interrupt happens here
> WRITE_ONCE(current->sigqueue_cache, q);
Indeed - I was under the impression that this cannot happen, because
interrupts are disabled - but I was wrong:
__sigqueue_free() is the only user of sigqueue_cache_or_free().
Callers of __sigqueue_free():
- flush_sigqueue():
# flush_signals() is holding the siglock & disables IRQs
# __exit_signal() isn't holding the siglock but has IRQs disabled
# selinux_bprm_committed_creds() is holding the siglock & disables IRQs
- __flush_itimer_signals()
# Its single caller is holding the siglock & disables IRQs
- collect_signal()
# Its single caller is holding the siglock & disables IRQs
- dequeue_synchronous_signal()
# Its single caller is holding the siglock & disables IRQs
- flush_sigqueue_mask():
# All callers are holding the siglock & disable IRQs
- sigqueue_free()
...
Boom, the last one on the list, sigqueue_free(), does __sigqueue_free()
while not holding the siglock and not disabling interrupts. :-/
It does it in various syscall paths in the POSIX timers code through
release_posix_timer(), with interrupts clearly enabled.
> and then the interrupt sends a SIGCONT, which ends up flushing
> previous process control signals, which ends up freeing them, which
> ends up in sigqueue_cache_or_free() again, at which point you have
>
> if (!READ_ONCE(current->sigqueue_cache))
> WRITE_ONCE(current->sigqueue_cache, q);
>
> again.
>
> And both the original and the interrupting one sees a NULL
> current->sigqueue_cache, so both of them will do that WRITE_ONCE(),
> and when the interrupt returns, the original case will overwrite the
> value that the interrupt free'd.
>
> Boom - memory leak.
>
> It does seem to be very small race window, and it's "only" a memory
> leak, but it's a very simple example of how this cache was broken even
> on UP.
Yeah - a clear Producer <-> Producer IRQ preemptability race that can leak
freed sigqueue structures.
Thanks for catching this ...
But even if release_posix_timer() is changed to call sigqueue_free() with
IRQs disabled, or sigqueue_free() disables interrupts itself, I think we
need to be mindful of the Consumer <-> Producer SMP races, which only
appear to be safe due to an accidental barrier by free_uid().
Plus a lockdep_assert_irqs_disabled() would have helped a lot in catching
this sooner.
Thanks,
Ingo