Re: [patch 12/12] x86/cacheinfo.c: check for block interference CPUs

From: Marcelo Tosatti
Date: Wed Feb 07 2024 - 08:13:09 EST

Next message: Marcelo Tosatti: "Re: [patch 04/12] clockevent unbind: use smp_call_func_single_fail"
Previous message: Marcelo Tosatti: "Re: [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate"
In reply to: Thomas Gleixner: "Re: [patch 12/12] x86/cacheinfo.c: check for block interference CPUs"
Next in thread: Marcelo Tosatti: "Re: [patch 12/12] x86/cacheinfo.c: check for block interference CPUs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Feb 07, 2024 at 01:41:36PM +0100, Thomas Gleixner wrote:
> On Tue, Feb 06 2024 at 15:49, Marcelo Tosatti wrote:
> > @@ -396,6 +397,7 @@ static void amd_l3_disable_index(struct
> > * disable index in all 4 subcaches
> > */
> > for (i = 0; i < 4; i++) {
> > + int ret;
> > u32 reg = idx | (i << 20);
> >
> > if (!nb->l3_cache.subcaches[i])
> > @@ -409,6 +411,7 @@ static void amd_l3_disable_index(struct
> > * is not sufficient.
> > */
> > ret = wbinvd_on_cpu(cpu);
> > + WARN_ON(ret == -EPERM);
>
> What? You create inconsistent state here.

That should not happen, since the caller has already checked

+ idx = block_interf_srcu_read_lock();
+
+ if (block_interf_cpu(cpu))
+ ret = -EPERM;

earlier, and still holds the SRCU read lock at this point.

Hence the WARN_ON (hmm, it could be changed to BUG_ON...).

> > - amd_l3_disable_index(nb, cpu, slot, index);
> > + ret = 0;
> > + idx = block_interf_srcu_read_lock();
> > +
> > + if (block_interf_cpu(cpu))
> > + ret = -EPERM;
> > + else
> > + amd_l3_disable_index(nb, cpu, slot, index);
> > +
> > + block_interf_srcu_read_unlock(idx);
>
> Again. This is a root only operation.
>
> This whole patch series is just voodoo programming with zero
> justification for the mess it creates.
>
> Thanks,
>
> tglx

It's not really voodoo programming: it's simply returning errors to
userspace if a CPU is set in a particular cpumask.
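
To make that concrete, here is a minimal userspace sketch of the idea. It
is not the patch code: the names blocked_cpus, block_cpu(), cpu_is_blocked()
and run_on_cpu_or_fail() are hypothetical, a plain 64-bit word stands in for
the cpumask, and the SRCU read-side protection used in the patch is omitted.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

/* Hypothetical model: one bit per CPU, set if the CPU must not be interrupted. */
static uint64_t blocked_cpus;

/* Counts how often the (pretend) remote operation ran on each CPU. */
static int ops_run[MODEL_NR_CPUS];

/* Mark a CPU as "block interference"; out-of-range CPUs are ignored. */
static void block_cpu(unsigned int cpu)
{
	if (cpu < MODEL_NR_CPUS)
		blocked_cpus |= UINT64_C(1) << cpu;
}

/* Caller must pass cpu < MODEL_NR_CPUS. */
static int cpu_is_blocked(unsigned int cpu)
{
	return (int)((blocked_cpus >> cpu) & 1);
}

/*
 * Stand-in for an operation that would need an IPI to 'cpu'.
 * Instead of interrupting a blocked CPU, fail with -EPERM so the
 * error can be propagated to the caller (ultimately userspace).
 */
static int run_on_cpu_or_fail(unsigned int cpu)
{
	if (cpu >= MODEL_NR_CPUS)
		return -EINVAL;
	if (cpu_is_blocked(cpu))
		return -EPERM;

	ops_run[cpu]++;	/* the "remote" work happens only here */
	return 0;
}
```

With CPU 3 marked via block_cpu(3), run_on_cpu_or_fail(3) returns -EPERM
and does no work, while the same call for an unblocked CPU returns 0.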

Do you have a better idea? (I am in the process of converting more
users.)
I can improve the quality of the patchset.

OK, the justification is as follows. Today, interruptions to isolated
CPUs are handled reactively:

1) Run Linux plus a latency-sensitive application on an isolated CPU.

2) Wait for an IPI or other interruption to happen on that isolated CPU,
which breaks the application.

3) Remove that interruption to the isolated CPU.

This patchset is, for a class of IPIs, a proactive approach: those IPIs
are not permitted to interrupt the latency-sensitive applications.

https://iot-analytics.com/soft-plc-industrial-innovators-dilemma/

"Hard PLCs (a market in which incumbent vendors dominate) have
historically addressed most of the needs of the existing / high end
market, such as high reliability, fast cycle times and, perhaps most
importantly, the ability of existing workforce to support and maintain
the systems. Soft PLCs, on the other hand, initially addressed the needs
of new / lower end customers by providing more flexible,
non-deterministic control solutions often at a fraction of the cost of
similar hard PLCs. Since entering the market in the 90’s, soft PLCs have
rapidly become more performant thanks to advances in virtualization
technologies, real-time Linux operating systems and more powerful edge
computing hardware, thus moving up the y-axis (product performance) in
the chart above."
