Re: [GIT PULL] sigqueue cache fix

From: Ingo Molnar
Date: Mon Jun 28 2021 - 01:14:27 EST

Next message: kernel test robot: "net/ceph/messenger_v1.c:1204:5: warning: stack frame size (2880) exceeds limit (2048) in function 'ceph_con_v1_try_read'"
Previous message: Justin He: "RE: [PATCH v2 1/4] fs: introduce helper d_path_unsafe()"
In reply to: Linus Torvalds: "Re: [GIT PULL] sigqueue cache fix"
Next in thread: Ingo Molnar: "Re: [GIT PULL] sigqueue cache fix"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Sun, Jun 27, 2021 at 11:52 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Ok, I may have confused myself looking at all this, but it does all
> > make me think this is dodgy.
>
> I also couldn't convince myself that the memory ordering is correct
> for the _contents_ of the sigqueue entry that had its pointer cached,
> although I suspect that is purely a theoretical concern (certainly a
> non-issue on x86).
>
> So I've reverted the sigqueue cache code, in that I haven't heard
> anything back and I'm not going to delay 5.13 over something small and
> easily undone like this.

I concur that it was safest to revert this, because it was so close to the
final release.

I think the code is safe, but only by accident. The most critical data race
isn't well-documented, unless I missed something.

The most fundamental race we can have is this:

        CPU#0

        __sigqueue_alloc()

          [ holds sighand->siglock ]
          [ IRQs off. ]

          q = READ_ONCE(t->sigqueue_cache);
          if (!q || sigqueue_flags)
                  q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
          else
                  WRITE_ONCE(t->sigqueue_cache, NULL);

        CPU#1

        __sigqueue_free()

          [ IRQs off. ]

          if (!READ_ONCE(current->sigqueue_cache))
                  WRITE_ONCE(current->sigqueue_cache, q);
          else
                  kmem_cache_free(sigqueue_cachep, q);

( Let's assume exit_task_sigqueue_cache() happens while there's no new
signal sending going on, so that angle is safe. )

Somewhat confusingly, *alloc() is the consumer and *free() is the producer
of the sigqueue_cache.
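
To make the consumer/producer roles concrete, here is a minimal userspace
model of the one-entry cache. It is only a sketch: struct entry, struct
slot_cache, cache_alloc() and cache_free() are hypothetical names, malloc()
and free() stand in for the kmem_cache calls, relaxed C11 atomics stand in for
READ_ONCE()/WRITE_ONCE(), and the sigqueue_flags check is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sigqueue. */
struct entry {
	int payload;
};

/* Hypothetical stand-in for the per-task ->sigqueue_cache pointer. */
struct slot_cache {
	_Atomic(struct entry *) slot;
};

/*
 * Consumer side (models __sigqueue_alloc()): take the cached entry if
 * there is one, otherwise fall back to the allocator. Callers are
 * assumed to be serialized by a lock, as with sighand->siglock.
 */
static struct entry *cache_alloc(struct slot_cache *c)
{
	struct entry *q = atomic_load_explicit(&c->slot, memory_order_relaxed);

	if (!q)
		return malloc(sizeof(*q));

	/* Empty the slot so the producer may refill it later. */
	atomic_store_explicit(&c->slot, NULL, memory_order_relaxed);
	return q;
}

/*
 * Producer side (models __sigqueue_free()): park the entry in the slot
 * if it is empty, otherwise hand it back to the allocator.
 */
static void cache_free(struct slot_cache *c, struct entry *q)
{
	assert(q != NULL);

	if (!atomic_load_explicit(&c->slot, memory_order_relaxed))
		atomic_store_explicit(&c->slot, q, memory_order_relaxed);
	else
		free(q);
}
```

An entry passed to cache_free() while the slot is empty is the one the next
cache_alloc() hands out, which is why "free" produces and "alloc" consumes.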

Here's how I see the 3 fundamental races these two pieces of code may have:

 - Producer <-> producer: The producer cannot race with itself, because it
   only ever produces into current->sigqueue_cache and has interrupts
   disabled. We don't send signals from NMI context.

 - Consumer <-> consumer: multiple consumers cannot race with each other,
   because they serialize on sighand->siglock.

 - Producer <-> consumer: this is the most interesting race, and I think
   it's unsafe in theory, because the producer doesn't make sure that any
   previous writes to the actual queue entry (struct sigqueue *q) have
   reached storage before the new 'free' entry is advertised to consumers.

   So in principle CPU#0 could see a new sigqueue entry and use it, before
   it's fully freed.

   In *practice* it's probably safe by accident (or by undocumented
   intent), because there's an atomic op we have shortly before putting the
   queue entry into the sigqueue_cache, in __sigqueue_free():

          if (atomic_dec_and_test(&q->user->sigpending))
                  free_uid(q->user);

   And atomic_dec_and_test() implies a full barrier - although I haven't
   found the place where we document it, and
   Documentation/memory-barriers.txt is silent on it. We should probably
   fix that too.
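
For illustration, the producer <-> consumer ordering could be made explicit
rather than inherited from a nearby atomic op. The following is a userspace
C11 sketch, not what the reverted patch did: publish() and consume() are
hypothetical names, and the release store / acquire load play the role that
smp_store_release() / smp_load_acquire() would play in kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sigqueue. */
struct entry {
	int payload;
};

/*
 * Producer: finish every write to *q first, then advertise the pointer
 * with a release store, so those writes are visible to anyone who
 * observes the pointer through an acquire load. Returns 1 if the entry
 * was cached, 0 if the slot was occupied and the caller must free it.
 * Assumes a single producer per slot, as in the analysis above.
 */
static int publish(_Atomic(struct entry *) *slot, struct entry *q)
{
	assert(q != NULL);

	if (atomic_load_explicit(slot, memory_order_relaxed))
		return 0;

	q->payload = 0;	/* models the last teardown write to the entry */
	atomic_store_explicit(slot, q, memory_order_release);
	return 1;
}

/*
 * Consumer: the acquire load pairs with the release store in publish().
 * Consumers are assumed to be serialized against each other by a lock.
 */
static struct entry *consume(_Atomic(struct entry *) *slot)
{
	struct entry *q = atomic_load_explicit(slot, memory_order_acquire);

	if (q)
		atomic_store_explicit(slot, NULL, memory_order_relaxed);
	return q;
}
```

With this pairing the guarantee sits at the point where the pointer is
published, instead of depending on an unrelated operation that happens to
precede it.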

At minimum, the patch adding the ->sigqueue_cache should include a race
analysis that explicitly documents the implicit barrier provided by the
atomic_dec_and_test().

Anyway, I agree with the revert.

Thanks,

Ingo
