Re: [PATCH 0/4] Introduce QPW for per-cpu operations

From: Marcelo Tosatti

Date: Tue Feb 24 2026 - 13:30:03 EST


On Tue, Feb 24, 2026 at 03:40:56PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 20, 2026 at 02:35:41PM -0300, Marcelo Tosatti wrote:
> >
> > I am not sure its safe to assume that. Ask Gemini about isolcpus use
>
> Erm... ok fine let's see that :-)
>
> > cases and:
> >
> > 1. High-Frequency Trading (HFT)
> > In the world of HFT, microseconds are the difference between profit and loss.
> > Traders use isolcpus to pin their execution engines to specific cores.
> >
> > The Goal: Eliminate "jitter" caused by the OS moving other processes onto the same core.
> >
> > The Benefit: Guaranteed execution time and ultra-low latency.
>
> That would be full isolation (aka nohz_full) because the goal here is to beat
> the competitors. As such the software latency must tend toward hardware latency.
>
> I wouldn't expect any syscall here but a full userspace stack with DPDK for
> example.
>
> I put that in the 5g uRLLC (or similar low latency networking) usecase family.
>
> >
> > 2. Real-Time Audio & Video Processing
> > If you are running a Digital Audio Workstation (DAW) or a live video encoding rig, a tiny "hiccup" in CPU availability results in an audible pop or a dropped frame.
> >
> > The Goal: Reserve cores specifically for the Digital Signal Processor (DSP) or the encoder.
> >
> > The Benefit: Smooth, glitch-free media streams even when the rest of the
> > system is busy.
>
> Here I expect weaker isolation requirements with syscalls involved. Scheduler
> domain isolation alone (aka isolcpus=[domain]) would fit.
>
> >
> > 3. Network Function Virtualization (NFV) & DPDK
> > For high-speed networking (like 10Gbps+ traffic), the Data Plane Development Kit (DPDK) uses "poll mode" drivers. These drivers constantly loop to check for new packets rather than waiting for interrupts.
> >
> > The Goal: Isolate cores so they can run at 100% utilization just checking for network packets.
> >
> > The Benefit: Maximum throughput and zero packet loss in high-traffic
> > environments.
>
> I put that in the 5g uRLLC usecase family as well (again or similar low latency networking).
>
> > 4. Gaming & Simulation
> > Competitive gamers or flight simulator enthusiasts sometimes isolate a few cores to handle the game's main thread, while leaving the rest of the OS (Discord, Chrome, etc.) to the remaining cores.
> >
> > The Goal: Prevent background Windows/Linux tasks from stealing cycles from the game engine.
> >
> > The Benefit: More consistent 1% low FPS and reduced input lag.
>
> That's domain isolation because frequent syscalls are unavoidable.
>
> >
> > 5. Deterministic Scientific Computing
> > If you're running a simulation that needs to take exactly the same amount of time every time it runs (for benchmarking or safety-critical testing), you can't have the OS interference messing with your metrics.
> >
> > The Goal: Remove the variability of the Linux scheduler.
> >
> > The Benefit: Highly repeatable, deterministic results.
>
> I guess there are plenty of flavours here. The only one I know of is this
> power simulator that relies on nohz_full. I'm not sure whether the
> implementation relies on syscalls or not:
>
> https://dpsim.fein-aachen.org/docs/getting-started/real-time/
>
> > For example, AF_XDP bypass uses system calls (and wants isolcpus):
> >
> > https://www.quantvps.com/blog/kernel-bypass-in-hft?srsltid=AfmBOoryeSxuuZjzTJIC9O-Ag8x4gSwjs-V4Xukm2wQpGmwDJ6t4szuE
>
> That's HFT again and they state that they rely on polling userspace drivers so
> I don't expect syscalls.
>
> But anyway here is a summary I would propose:
>
> * Domain isolation alone is a good fit when some glitches must be avoided but
> kernel work is still necessary: non critical high volume networking or data
> capture, video games, etc...
>
> * Full isolation is a better fit for ultra low latency requirement, in this case
> the kernel is only good for preparatory work and interface layout between
> userspace and the hardware (VFIO).
>
> I've observed 3 patterns so far:
>
> - Low latency networking with DPDK, eg: 5g uRLLC (should be syscalls free)
> - Scientific simulation (not sure about syscalls)
> - HPC computation such as LLM (not sure about syscalls).
>
> Is flushing work only relevant for full isolation? If so I can't say which is
> the best solution between flushing pending work on syscall exit and doing that
> remotely. But if it's relevant also for domain isolation, then the remote
> work is better because it doesn't add unnecessary work on syscalls which still
> happen in this mode.

Yes, see my last email about HPC.

> At least doing things remotely should be free of any surprising side-effects.
> But we must determine how to properly activate the isolated mode (switch to
> spinlocks) depending on the isolation mode which can be not only defined
> on boot but also on runtime (at least for domain isolation through cpusets
> but it will be the case as well with nohz_full in the future).
>
> Thanks.

If you boot with remote spinlocks (qpw=1) today, then you can't change
that at runtime.

You could, because it's a static key:

#define qpw_lock(lock, cpu)						\
do {									\
	if (static_branch_maybe(CONFIG_QPW_DEFAULT, &qpw_sl))		\
		spin_lock(per_cpu_ptr(lock.sl, cpu));			\
	else								\
		local_lock(lock.ll);					\
} while (0)

But I haven't thought about switching at runtime (and I don't see why it
would be necessary). It is independent of switching CPUs to/from being
isolated (or nohz_full).

OK, I will address the remaining comments and repost.