Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier

From: Eric W. Biederman
Date: Thu Jan 04 2018 - 09:54:59 EST


Dan Williams <dan.j.williams@xxxxxxxxx> writes:

> On Wed, Jan 3, 2018 at 9:01 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>> "Williams, Dan J" <dan.j.williams@xxxxxxxxx> writes:
>>
>>
>>
>>> Note that these are "a human looked at static analysis reports and
>>> could not rationalize that these are false positives". Specific domain
>>> knowledge about these paths may find that some of them are indeed false
>>> positives.
>>>
>>> The change to m_start in kernel/user_namespace.c is interesting because
>>> that's an example where the nospec_load() approach by itself would need
>>> to barrier speculation twice whereas if_nospec can do it once for the
>>> whole block.
>>
>>
>> This user_namespace.c change is very convoluted for what it is trying to
>> do.
>
> Sorry this was my rebase on top of commit d5e7b3c5f51f "userns: Don't
> read extents twice in m_start" the original change from Elena was
> simpler. Part of the complexity arises from converting the common
> kernel pattern of
>
> if (<invalid condition>)
> return NULL;
> do_stuff;
>
> ...to:
>
> if (<valid conidtion>) {
> barrier();
> do_stuff;
> }
>
>> It simplifies to a one liner that just adds osb() after pos >=
>> extents. AKA:
>>
>> if (pos >= extents)
>> return NULL;
>> + osb();
>>
>> Is the intent to hide which branch branch we take based on extents,
>> after the pos check?
>
> The intent is to prevent speculative execution from triggering any
> reads when 'pos' is invalid.

If that is the intent I think the patch you posted is woefully
inadequate. We have many many more seq files in proc than just
/proc/<pid>/uid_map.

>> I suspect this implies that using a user namespace and a crafted uid
>> map you can hit this in stat, on the fast path.
>>
>> At which point I suspect we will be better off extending struct
>> user_namespace by a few pointers, so there is no union and remove the
>> need for blocking speculation entirely.
>
> How does this help prevent a speculative read with an invalid 'pos'
> reading arbitrary kernel addresses?

I though the concern was extents.

I am now convinced that collectively we need a much better description
of the problem than currently exists.

Either the patch you presented missed a whole lot like 90%+ of the
user/kernel interface or there is some mitigating factor that I am not
seeing. Either way until reasonable people can read the code and
agree on the potential exploitability of it, I will be nacking these
patches.

>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>> index 246d4d4ce5c7..aa0be8cef2d4 100644
>>> --- a/kernel/user_namespace.c
>>> +++ b/kernel/user_namespace.c
>>> @@ -648,15 +648,18 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>>> {
>>> loff_t pos = *ppos;
>>> unsigned extents = map->nr_extents;
>>> - smp_rmb();
>>>
>>> - if (pos >= extents)
>>> - return NULL;
>>> + /* paired with smp_wmb in map_write */
>>> + smp_rmb();
>>>
>>> - if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>> - return &map->extent[pos];
>>> + if (pos < extents) {
>>> + osb();
>>> + if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>> + return &map->extent[pos];
>>> + return &map->forward[pos];
>>> + }
>>>
>>> - return &map->forward[pos];
>>> + return NULL;
>>> }
>>>
>>> static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
>>
>>
>>
>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>> index 8ca9915befc8..7f83abdea255 100644
>>> --- a/net/mpls/af_mpls.c
>>> +++ b/net/mpls/af_mpls.c
>>> @@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>>> if (index < net->mpls.platform_labels) {
>>> struct mpls_route __rcu **platform_label =
>>> rcu_dereference(net->mpls.platform_label);
>>> +
>>> + osb();
>>> rt = rcu_dereference(platform_label[index]);
>>> }
>>> return rt;
>>
>> Ouch! This adds a barrier in the middle of an rcu lookup, on the
>> fast path for routing mpls packets. Which if memory serves will
>> noticably slow down software processing of mpls packets.
>>
>> Why does osb() fall after the branch for validity? So that we allow
>> speculation up until then?
>
> It falls there so that the cpu only issues reads with known good 'index' values.
>
>> I suspect it would be better to have those barriers in the tun/tap
>> interfaces where userspace can inject packets and thus time them. Then
>> the code could still speculate and go fast for remote packets.
>>
>> Or does the speculation stomping have to be immediately at the place
>> where we use data from userspace to perform a table lookup?
>
> The speculation stomping barrier has to be between where we validate
> the input and when we may speculate on invalid input.

So a serializing instruction at the kernel/user boundary (like say
loading cr3) is not enough? That would seem to break any chance of a
controlled timing.

> So, yes, moving
> the user controllable input validation earlier and out of the fast
> path would be preferred. Think of this patch purely as a static
> analysis warning that something might need to be done to resolve the
> report.

That isn't what I was suggesting. I was just suggesting a serialization
instruction earlier in the pipeline.

Given what I have seen in other parts of the thread I think an and
instruction that just limits the index to a sane range is generally
applicable, and should be cheap enough to not care about. Further
it seems to apply to the pattern the static checkers were catching,
so I suspect that is the pattern we want to stress for limiting
speculation. Assuming of course the compiler won't just optimize the
and of the index out.

Eric