Re: [PATCH] sched/topology: Allow EAS without schedutil for artificial Energy Models

From: Lucas Lima

Date: Tue Jun 30 2026 - 14:15:40 EST


On Tue, Jun 30, 2026 at 10:07 AM Rafael J. Wysocki (Intel)
<rafael@xxxxxxxxxx> wrote:
>
> On Tue, Jun 30, 2026 at 10:11 AM Lucas Lima <lucaslnobrega38@xxxxxxxxx> wrote:
> >
> > On Mon, Jun 29, 2026 at 4:06 PM Rafael J. Wysocki
> > <rafael@xxxxxxxxxx> wrote:
> > >
> > > On Monday, June 29, 2026 5:16:17 PM CEST Rafael J. Wysocki (Intel) wrote:
> > > > On Mon, Jun 29, 2026 at 10:36 AM Lucas de Lima Nóbrega
> > > > <lucaslnobrega38@xxxxxxxxx> wrote:
> > > > >
> > > > > EAS currently refuses to enable energy-aware scheduling on a root
> > > > > domain unless schedutil is the active CPUFreq governor for all of its
> > > > > CPUs (cpufreq_ready_for_eas()). This requirement exists to protect the
> > > > > accuracy of the energy estimate: EAS predicts the OPP a CPU will run
> > > > > at from its utilization, which is only meaningful if the active
> > > > > governor actually requests OPPs that way, and schedutil is the only
> > > > > one that does.
> > > > >
> > > > > That requirement does not apply to artificial Energy Models
> > > > > (EM_PERF_DOMAIN_ARTIFICIAL). An artificial EM is built from a
> > > > > get_cost() callback instead of real power numbers, and only encodes a
> > > > > cost ranking between CPUs (e.g. P-cores cost more than E-cores at a
> > > > > given utilization). It never claims to predict real energy use at any
> > > > > specific OPP, so there is no per-OPP accuracy for the governor
> > > > > requirement to protect, regardless of which governor is in control or
> > > > > whether it tracks utilization at all.
> > > >
> > > > But it is still about comparing the cost of running on different CPUs
> > > > at different performance levels.
> > > >
> > > > For instance, say the scale-invariant utilization of a task is 256 and
> > > > it can run either by itself on a P-core, or with another task whose
> > > > utilization is 128 on an E-core, and say the P-core's and E-core's
> > > > capacity is 1024 and 512, respectively.
> > > >
> > > > Say the cost function tells EAS that running a P-core at 1/4 of the
> > > > capacity is cheaper than running an E-core at 3/4 capacity, so it will
> > > > pick up the P-core to run that task, but if cpufreq ramps up the
> > > > frequency of the P-core to the max when the task gets to it, it may
> > > > actually turn out to be more expensive.
> > > >
> > > > This means that EAS still has an expectation regarding cpufreq which
> > > > is that it will generally tend to run tasks at the performance level
> > > > corresponding to the sum of their scale-invariant utilization at least
> > > > roughly.
> > > >
> > > > IIUC this actually has nothing to do with whether or not the energy
> > > > model used by EAS is artificial. The schedutil requirement is about
> > > > choosing a performance level proportional to the utilization (which
> > > > schedutil generally tends to do by design).
> > > >
> > > > > intel_pstate registers exactly this kind of artificial EM for hybrid
> > > > > (P/E-core) systems without SMT, regardless of whether it operates in
> > > > > active or passive mode. In active mode it never uses schedutil, since
> > > > > HWP picks frequency autonomously, so on these systems EAS never
> > > > > engages even though SD_ASYM_CPUCAPACITY, frequency invariance and the
> > > > > EM are all in place: find_energy_efficient_cpu() is never reached
> > > > > because is_rd_overutilized() is hardcoded to true whenever
> > > > > sched_energy_enabled() is false. cppc_cpufreq registers the same kind
> > > > > of ranking-only artificial EM and is affected the same way with any
> > > > > non-schedutil governor.
> > > > >
> > > > > Allow EAS to be enabled when every CPU's EM in the root domain is
> > > > > artificial, even when schedutil is not the active governor.
> > > > >
> > > > > Tested on a Raptor Lake-P laptop with nosmt=force and intel_pstate in
> > > > > active/HWP mode: find_energy_efficient_cpu() was never called before
> > > > > this change (confirmed via the sched_overutilized_tp tracepoint and
> > > > > ftrace) and is exercised as expected afterwards.
> > > >
> > > > If this is about allowing EAS to work with intel_pstate running in the
> > > > active mode, you may argue that what the processor firmware is doing
> > > > when intel_pstate runs in the active mode is not much different from
> > > > what schedutil would do. So a driver implementing an internal
> > > > governor (that is, using the .set_policy() callback) would need to
> > > > declare that its internal governor is as good as schedutil from EAS'
> > > > perspective and so it will pass the "cpufreq readiness" check.
> > >
> > > And I have a prototype patch (on top of 7.2-rc1) doing this which is
> > > appended.
> > >
> > > I wonder if it works for you (that is, if it allows intel_pstate and EAS to
> > > work together both with schedutil and when intel_pstate operates in the
> > > active mode with the "powersave" policy on your system).
> >
> > It does work, thank you.
>
> Great, thanks!
>
> So this approach is more straightforward IMV and that's why I prefer it.
>
> I'll need to revise the new flag description so it mentions the need
> for a "matching" EM to produce reasonable results and there are a few
> intel_pstate patches in-flight, so this one will need to be rebased.
> It also needs a changelog, of course.
>
> > >
> > > Also I wonder why exactly you want intel_pstate in the active mode to
> > > work with EAS. Do you see any significant improvement in that case?
> >
> > On that specific topic I do not have any test data, but it felt
> > like schedutil drains more battery than pstate active (likely due to
> > worse C-state management) and causes more stutter in general usage
> > (I would guess it is slower to react to load changes). After bypassing
> > schedutil, both problems were gone, and the responsiveness of the system
> > looked very similar to EAS disabled, pstate active. Since EAS does
> > prioritize spreading onto E-cores, which do consume less energy in my
> > testing, IMHO it's almost too good to leave unused.
>
> Fair enough.
>
> > I also want to point out that gaming (mainly Minecraft) stutters a lot more with
> > EAS on, even when pstate is set to active. So I wonder what you think about
> > capturing the system power mode (which currently only clamps frequency) and
> > clearing eas_compatible when it is set to "performance"? That would need
> > updating cpufreq_policy, but it feels reasonable to let the user disable EAS
> > for latency-sensitive tasks, since E-cores struggle with those.
>
> Switching the governor (or policy if you will) to "performance" on any
> CPU should cause eas_compatible to be cleared for it due to the
> cpu->policy != CPUFREQ_POLICY_PERFORMANCE check and then the scheduler
> will refuse to use EAS after rebuilding the sched domains.
>
> Or do you mean something else?

I actually mean the "performance" platform profile, as that is the most
straightforward way for the user to request more responsiveness from the
system. For now, setting the platform profile to performance does not change
the governor from powersave, so EAS is not disabled. A notifier from
platform_profile to re-evaluate eas_compatible when the profile changes would
address this.
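
To make the idea concrete, here is a rough user-space model of the decision
I have in mind. It is only a sketch, not kernel code: every type and function
name below is hypothetical, and the only thing taken from the prototype is the
rule that a "performance" policy clears eas_compatible. The assumption I am
adding is that a "performance" platform profile would clear it as well, and
that a profile change triggers a re-evaluation for every CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the platform profile and the cpufreq policy. */
enum profile { PROFILE_LOW_POWER, PROFILE_BALANCED, PROFILE_PERFORMANCE };
enum policy { POLICY_POWERSAVE, POLICY_PERFORMANCE };

struct cpu_model {
	enum policy policy;    /* models the per-CPU policy */
	bool eas_compatible;   /* models the flag from the prototype patch */
};

/*
 * The policy check mirrors what the prototype already does; the profile
 * check is the proposed (assumed) addition.
 */
static bool compute_eas_compatible(enum policy pol, enum profile prof)
{
	return pol != POLICY_PERFORMANCE && prof != PROFILE_PERFORMANCE;
}

/*
 * Models the notifier callback: refresh the flag on every CPU and report
 * whether all of them are still compatible, i.e. whether EAS could stay on.
 */
static bool profile_changed(struct cpu_model *cpus, size_t n, enum profile prof)
{
	bool all_compatible = true;

	assert(cpus != NULL && n > 0);

	for (size_t i = 0; i < n; i++) {
		cpus[i].eas_compatible =
			compute_eas_compatible(cpus[i].policy, prof);
		all_compatible = all_compatible && cpus[i].eas_compatible;
	}
	return all_compatible;
}
```

In the real thing, a false result would presumably have to end in a sched
domain rebuild so the scheduler drops EAS, the same way it does today when
the policy is switched to "performance".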

>
> > I know that for now, my observations are only anecdotal, but if needed
> > I'm eager to test those assumptions!
>
> So it would be good to have some data indicating that it is beneficial
> to use EAS when intel_pstate operates in the active mode ("powersave"
> policy).

Will be on it.

>
> Also, enabling EAS by default for the active mode may be problematic
> because people may see (and report) performance regressions due to it.
> OTOH, EAS can be disabled via sysctl, so that may not be a big deal.

I feel that concern further motivates my proposal. I would not expect
the average user to use sysctl to change EAS or even to know
about cpufreq governors at all, so linking EAS to platform profiles
already exposed by the desktop might avoid complaints.