Re: [GIT PULL] sigqueue cache fix

From: Linus Torvalds
Date: Thu Jun 24 2021 - 12:30:17 EST


On Thu, Jun 24, 2021 at 12:13 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> Fix a memory leak in the recently introduced sigqueue cache.

Side note: I detest the READ_ONCE/WRITE_ONCE games for the sigqueue
cache thing. I also think it's actively buggy.

The sigqueue cache is only safe when it's serialized by the sighand
lock, so all those games with "ONCE" accesses are entirely bogus and
misleading.

It looks like this is due to __sigqueue_alloc() being called without
the lock held (in which case sigqueue_flags will be SIGQUEUE_PREALLOC)
and then it does this:

q = READ_ONCE(t->sigqueue_cache);
if (!q || sigqueue_flags)
q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);

where the point is that it reads that sigqueue_cache thing even if
sigqueue_flags is non-zero (no lock), so presumably KCSAN complained
about it.

However, hiding it from KCSAN (instead of just test sigqueue_flags
*first* - which admittedly would presumably need some kind of
serialization) is not only ugly, I think it's hiding the *real* bug,
which is

release_posix_timer ->
sigqueue_free ->
__sigqueue_free ->
sigqueue_cache_or_free

where that sigqueue_free() thing explicitly calls __sigqueue_free()
*outside* the sighand lock. End result: I don't think that cache is
serialized at all.

So I think the sigqueue cache is still potentially quite buggy, and I
think the bug is hidden by the READ/WRITE_ONCE games that are
misleading and not actually valid.

But maybe there is something I'm missing that makes that
release_posix_timer case ok. Or maybe I'm misunderstanding that code
entirely. But it does look bad to me. And the ONCE accesses look
positively wrong in every single possible way: they aren't logical,
and they are actively hindering KCSAN rather than fixing anything at
all.

Linus