On our setup we typically observed 10% lower power consumption in
use-cases where the CPU goes into power collapse frequently. For example,
playing audio, during which the CPU typically remains idle.
I'm probably stupid, but I don't quite get your scenario from that
description: please would you spell it out a little more clearly for me?
Are you thinking of two CPUs, one of them running a process busily
streaming audio (with no VM_MERGEABLE areas to work on), most other
processes sleeping, and ksmd "pinned" to another, otherwise idle CPU?
I'm very inexperienced in scheduler (and audio) matters, but I'd like
to think that the scheduler would migrate ksmd to the mostly busy CPU
in that case - or is it actually 100% busy, with no room for ksmd too?
To enable deferrable timers,
$ echo 1 > /sys/kernel/mm/ksm/deferrable_timer
I do share Andrew's original reservations: I'd much prefer this if we
can just go ahead and do the deferrable timer without a new tunable
to concern the user, simple though your "deferrable_timer" knob is.
In an earlier mail, you said "We have observed that KSM does maximum
savings when system is idle", as reason why some will prefer a non-
deferrable timer. I am somewhat suspicious of that observation:
because KSM waits for a page's checksum to stabilize before it saves
it in its "unstable" tree of pages to compare against. So when the
rest of the system goes idle, KSM is briefly more likely to find
matches; but that may be a short-lived "success" once the system
becomes active again. So, I'm wondering if your observation just
reflects the mechanics of KSM, and is not actually a reason to
refrain from using a deferrable timer for everyone.
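
To make the checksum point concrete, here is a minimal userspace sketch of the "wait for the checksum to stabilize" rule described above. It is not the kernel's code: struct page_track, toy_checksum() and page_is_candidate() are hypothetical names, and the FNV-1a hash is only a stand-in for whatever checksum KSM really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page record: what the scanner remembers between passes. */
struct page_track {
    int scanned;            /* has this page been checksummed before? */
    uint32_t old_checksum;  /* checksum recorded on the previous pass */
};

/* Toy FNV-1a hash, an assumption standing in for the real checksum. */
static uint32_t toy_checksum(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Returns 1 if the page content is unchanged since the previous pass,
 * i.e. it may be offered to the "unstable" tree; returns 0 if the page
 * is new or still changing, in which case only the checksum is updated.
 */
static int page_is_candidate(struct page_track *t,
                             const unsigned char *data, size_t len)
{
    uint32_t sum;

    assert(t != NULL && data != NULL);
    sum = toy_checksum(data, len);
    if (!t->scanned || sum != t->old_checksum) {
        t->scanned = 1;
        t->old_checksum = sum;
        return 0;
    }
    return 1;
}
```

In this model a page that is being written between passes never becomes a candidate, while an idle system quickly makes every page look stable, which is the effect suspected above.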
On the other hand, I have a worry about using deferrable timer here.
I think I understand the value of a deferrable timer, in doing a job
which is bound to a particular cpu (mm/slab.c's cache_reap() gives
me a good example of that). But ksmd is potentially serving every
process, every cpu: we would not want it to be deferred indefinitely,
if other cpus (running processes with VM_MERGEABLE vmas) are active.
Perhaps the likelihood of that scenario is too low; or perhaps it's
a reason why we do need to offer your "deferrable_timer" knob.
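
The worry can be illustrated with a deliberately simplified model, not kernel code: assume a deferrable timer never wakes its own CPU and only runs the next time that CPU is awake for some other reason. timer_fire_time() and the wakeup trace are hypothetical, introduced only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of when a timer callback runs on one CPU.
 *
 * expiry     - time at which the timer is due
 * deferrable - non-zero if the timer must not wake an idle CPU
 * wakeups    - ascending times at which this CPU is awake anyway
 * n          - number of entries in wakeups
 *
 * Returns the time the callback runs, or -1 if it never runs within
 * the given trace.
 */
static long timer_fire_time(long expiry, int deferrable,
                            const long *wakeups, size_t n)
{
    assert(wakeups != NULL || n == 0);

    /* A normal timer wakes the CPU itself, so it fires on time. */
    if (!deferrable)
        return expiry;

    /* A deferrable timer waits for the first unrelated wakeup. */
    for (size_t i = 0; i < n; i++) {
        if (wakeups[i] >= expiry)
            return wakeups[i];
    }
    return -1;
}
```

With wakeups at 5, 40 and 900 and an expiry of 20, the normal timer runs at 20 and the deferrable one at 40; with no later wakeup on that CPU the deferrable timer does not run at all in this model, however busy the other CPUs are.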
Please, I need to understand better before acking this change.
By the way: perhaps KSM is the right place to start, but please take
a look also at THP in mm/huge_memory.c, whose khugepaged was originally
modelled on ksmd (but now seems to be using wait_event_freezable_timeout
rather than schedule_timeout_interruptible - I've not yet researched the
history behind that difference). I expect it to need the same treatment.
+ unsigned long enable;
+ int err;
+
+ err = kstrtoul(buf, 10, &enable);
+ if (err < 0)
+         return err;
+ if (enable >= 1)
+         return -EINVAL;
I haven't studied the patch itself, I'm still worrying about the concept.
But this caught my eye just before hitting Send: I don't think we need
a tunable which only accepts the value 0 ;)
+ use_deferrable_timer = enable;
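
For illustration only, a check that accepts both 0 and 1 could look like the userspace sketch below. It is an assumption about the intent, not the patch author's code: parse_deferrable_knob() is a hypothetical helper, and strtoul() stands in for the kernel's kstrtoul(), so the error codes are not identical to what the kernel would return.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a 0/1 knob value from buf into *use_deferrable.
 * Returns 0 on success, -EINVAL on malformed or out-of-range input.
 */
static int parse_deferrable_knob(const char *buf, int *use_deferrable)
{
    char *end;
    unsigned long enable;

    assert(buf != NULL && use_deferrable != NULL);

    errno = 0;
    enable = strtoul(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -EINVAL;

    /* Writes from "echo" carry a trailing newline; allow exactly one. */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;

    /* Reject anything above 1, so both 0 and 1 are accepted. */
    if (enable > 1)
        return -EINVAL;

    *use_deferrable = (int)enable;
    return 0;
}
```

The only substantive difference from the quoted hunk is the comparison: "enable > 1" rather than "enable >= 1".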