Re: [BUG] Race between policy reload sidtab conversion and live conversion

From: Paul Moore
Date: Sun Feb 28 2021 - 14:22:44 EST


On Fri, Feb 26, 2021 at 6:12 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> On Fri, Feb 26, 2021 at 2:07 AM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> > > After the switch to RCU, we now have:
> > > 1. Start live conversion of new entries.
> > > 2. Convert existing entries.
> > > 3. RCU-assign the new policy pointer to selinux_state.
> > > [!!! Now actually both old and new sidtab may be referenced by
> > > readers, since there is no synchronization barrier previously provided
> > > by the write lock.]
> > > 4. Wait for synchronize_rcu() to return.
> > > 5. Now only the new sidtab is visible to readers, so the old one can
> > > be destroyed.
> > >
> > > So the race can happen between 3. and 5., if one thread already sees
> > > the new sidtab and adds a new entry there, and a second thread still
> > > has the reference to the old sidtab and also tires to add a new entry;
> > > live-converting to the new sidtab, which it doesn't expect to change
> > > by itself. Unfortunately I failed to realize this when reviewing the
> > > patch :/
> >
> > It is possible I'm not fully understanding the problem and/or missing
> > an important detail - it is rather tricky code, and RCU can be very
> > hard to reason at times - but I think we may be able to solve this
> > with some lock fixes inside sidtab_context_to_sid(). Let me try to
> > explain to see if we are on the same page here ...
> >
> > The problem is when we have two (or more) threads trying to
> > add/convert the same context into a sid; the task with new_sidtab is
> > looking to add a new sidtab entry, while the task with old_sidtab is
> > looking to convert an entry in old_sidtab into a new entry in
> > new_sidtab. Boom.
> >
> > Looking at the code in sidtab_context_to_sid(), when we have two
> > sidtabs that are currently active (old_sidtab->convert pointer is
> > valid) and a task with old_sidtab attempts to add a new entry to both
> > sidtabs it first adds it to the old sidtab then it also adds it to the
> > new sidtab. I believe the problem is that in this case while the task
> > grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which
> > allows it to race with tasks that already see only new_sidtab. I
> > think adding code to sidtab_context_to_sid() which grabs the
> > new_sidtab->lock when adding entries to the new_sidtab *should* solve
> > the problem.
> >
> > Did I miss something important? ;)
>
> Sadly, yes :) Consider this scenario (assuming we fix the locking at
> sidtab level):
>
> If it happens that a new SID (x) is added via the new sidtab and then
> another one (y) via the old sidtab, to avoid clash of SIDs, we would
> need to leave a "hole" in the old sidtab for SID x. And this will
> cause trouble if the thread that has just added SID y, then tries to
> translate the context string corresponding to SID x (without re-taking
> the RCU read lock and refreshing the policy pointer). Even if we
> handle skipping the "holes" in the old sidtab safely, the translation
> would then end up adding a duplicate SID entry for the context already
> represented by SID x - which is not a state we want to end up in.

Ah, yes, you're right. I was only thinking about the problem of
adding an entry to the old sidtab, and not the (much more likely case)
of an entry being added to the new sidtab. Bummer.

Thinking aloud for a moment - what if we simply refused to add new
sidtab entries if the task's sidtab pointer is "old"? Common sense
would tell us that this scenario should be very rare at present, and I
believe the testing mentioned in this thread adds some weight to that
claim. After all, this only affects tasks which entered into their
RCU protected session prior to the policy load RCU sync *AND* are
attempting to add a new entry to the sidtab. That *has* to be a
really low percentage, especially on a system that has been up and
running for some time. My gut feeling is this should be safe as well;
all of the calling code should have the necessary error handling in
place as there are plenty of reasons why we could normally fail to add
an entry to the sidtab; memory allocation failures being the most
obvious failure point I would suspect. This obvious downside to such
an approach is that those operations which do meet this criteria would
fail - and we should likely emit an error in this case - but is this
failure really worse than any other transient kernel failure, and is
attempting to mitigate this failure worth abandoning the RCU approach
for the sidtab?

--
paul moore
www.paul-moore.com