Re: [GIT PULL] x86/mm changes for v5.8

From: Ingo Molnar
Date: Tue Jun 02 2020 - 03:33:56 EST

Next message: Naresh Kamboju: "Re: [PATCH 4.14 00/77] 4.14.183-rc1 review"
Previous message: Sai Prakash Ranjan: "Re: [PATCH 2/2] coresight: tmc: Add shutdown callback for TMC ETR/ETF"
In reply to: Singh, Balbir: "Re: [GIT PULL] x86/mm changes for v5.8"
Next in thread: Benjamin Herrenschmidt: "Re: [GIT PULL] x86/mm changes for v5.8"

* Balbir Singh <sblbir@xxxxxxxxxx> wrote:

> > At a _minimum_, SMT being enabled should disable this kind of crazy
> > pseudo-security entirely, since it is completely pointless in that
> > situation. Scheduling simply isn't a synchronization point with SMT
> > on, so saying "sure, I'll flush the L1 at context switch" is beyond
> > stupid.
> >
> > I do not want the kernel to do things that seem to be "beyond stupid".
> >
> > Because I really think this is just PR and pseudo-security, and I
> > think there's a real cost in making people think "oh, I'm so special
> > that I should enable this".
> >
> > I'm more than happy to be educated on why I'm wrong, but for now I'm
> > unpulling it for lack of data.
> >
> > Maybe it never happens on SMT because of all those subtle static
> > branch rules, but I'd really like that to be explained.
>
> The documentation calls out the SMT restrictions.

That's not what Linus suggested though, and you didn't answer his
concerns AFAICS.

The documentation commit merely mentions that this feature is useless
with SMT:

0fcfdf55db9e: ("Documentation: Add L1D flushing Documentation")

+Limitations
+-----------
+
+The mechanism does not mitigate L1D data leaks between tasks belonging to
+different processes which are concurrently executing on sibling threads of
+a physical CPU core when SMT is enabled on the system.
+
+This can be addressed by controlled placement of processes on physical CPU
+cores or by disabling SMT. See the relevant chapter in the L1TF mitigation
+document: :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.

Linus is right that the proper response is for this feature to do
*nothing* if SMT is enabled on affected CPUs - but that's not
implemented in the patches ...
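
To make the suggested policy concrete, here is a minimal userspace
sketch of the decision, not the kernel implementation and not code from
the patches. The function names are hypothetical. The only real
interface used is the sysfs file /sys/devices/system/cpu/smt/active,
and treating an unreadable file as "SMT active" is an assumption made
here for illustration:

```python
from pathlib import Path

# Real sysfs file on kernels with SMT control; contains "1" while SMT is active.
SMT_ACTIVE_PATH = Path("/sys/devices/system/cpu/smt/active")


def smt_is_active(path: Path = SMT_ACTIVE_PATH) -> bool:
    """Return True if SMT is active; assume True when the state is unknown."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # Assumption for this sketch: if we cannot tell, treat SMT as active.
        return True


def should_flush_l1d(task_opted_in: bool, smt_active: bool) -> bool:
    """Hypothetical policy: flush at context switch only if it can help.

    With SMT active a sibling thread can observe the L1D concurrently,
    so a flush at context switch is not a synchronization point and the
    feature does nothing.
    """
    return task_opted_in and not smt_active
```

A caller would evaluate should_flush_l1d(opted_in, smt_is_active()), so
that an opt-in on an SMT-enabled system results in no flush at all.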

Or rather, we should ask a higher-level question as well: maybe we
should not do this feature at all?

Typically cloud computing systems such as AWS will have SMT enabled,
because cloud computing pricing is essentially per vCPU, and they want
to sell the hyperthreads as vCPUs. So the safest solution, disabling
SMT on affected systems, is not actually done, because it's an
economic non-starter. (I'd like to note the security double standard
there: the most secure option, to disable SMT, is not actually used ...)

BTW, I wonder how Amazon is solving the single-vCPU customer workload
problem on AWS, if the vast majority of AWS computing capacity is
running on a single vCPU, because it's the cheapest tier and because
it's more than enough capacity to run a website. Even core-scheduling
doesn't solve this fundamental SMT security problem: separate customer
workloads *cannot* share the same core - but this means that the
single-vCPU workloads will only be able to utilize 50% of all
available vCPUs if they are properly isolated.
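
The 50% figure follows from simple arithmetic, which the sketch below
spells out. It assumes 2 hardware threads per core and that a core is
never shared between tenants. The function name and parameters are
hypothetical and only serve the illustration:

```python
import math


def usable_thread_fraction(vcpus_per_tenant: int, threads_per_core: int = 2) -> float:
    """Fraction of hardware threads a tenant can use on its dedicated cores.

    Assumption: a tenant gets whole cores, so any sibling threads it does
    not need stay idle rather than being sold to another tenant.
    """
    cores_needed = math.ceil(vcpus_per_tenant / threads_per_core)
    return vcpus_per_tenant / (cores_needed * threads_per_core)
```

Under these assumptions a 1-vCPU tenant uses 1 of the 2 threads on its
core (a fraction of 0.5), while a 2-vCPU or 4-vCPU tenant fills its
cores completely (1.0).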

Or if the majority of AWS EC2 etc. customer systems are using 2, 4 or
more vCPUs, then both this feature and 'core-scheduling' are
effectively pointless from a security POV, because the cloud computing
systems are de facto partitioned into cores already, with each core
accounted as 2 vCPUs.

The way AWS (and many other cloud providers) bill system runtime,
rounded up to the hour, suggests that they are doing relatively static
partitioning of customer workloads already, i.e. customer workloads
are mapped to actual physical hardware in an exclusive fashion, with
no overcommitting of physical resources and no sharing of cores
between customers.

If I look at the pricing and capabilities table of AWS:

https://aws.amazon.com/ec2/pricing/on-demand/

Only the 't2' and 't3' On-Demand instances have 'Variable' pricing,
which is only 9% of the offered 228 configurations.

I.e. I strongly suspect that neither L1D flushing nor core-scheduling
is actually required on affected vulnerable CPUs to keep customer
workloads isolated from each other, on the majority of cloud computing
systems, because they are already isolated via semi-static
partitioning, using pricing that reflects static partitioning.

Thanks,

Ingo
