Re: [PATCH] nohz1: Documentation
From: Rob Landley
Date: Mon Mar 18 2013 - 16:00:09 EST
On 03/18/2013 01:46:32 PM, Frederic Weisbecker wrote:
2013/3/18 Rob Landley <rob@xxxxxxxxxxx>:...
> On 03/18/2013 11:29:42 AM, Paul E. McKenney wrote:
> And really seems like it's kconfig help text?
It's more exhaustive than a Kconfig help. A Kconfig help text should
have the level of detail that describes the purpose and impact of a
feature, as well as some quick reference/pointer to the interface.
Deeper explanations, which include implementation internals, fine-grained
constraints, a TODO list, and the detailed interface, are better here.
I really think we want to keep all the detailed explanations from
Paul's doc. What we need is not a quick reference but a very detailed
It's much _longer_, I'm not sure it contains significantly more
information. ("Using more power will shorten battery life" is a nice
observation, but is it specific to your subsystem? I dunno, maybe it's
a personal idiosyncrasy, but I tend to think that people start with use
cases and need to find infrastructure. The other direction seems less
interesting somehow. Like a pan with a picture on the front of what you
might want to bake with it.)
>> +1. It increases the number of instructions executed on the
>> + to and from the idle loop.
> This detail didn't get mentioned in my summary.
And it's an important point.
I mentioned increased latency coming out of idle. Increased latency
going _to_ idle is an important point? (And pretty much _every_ kconfig
option has ramifications at that level which realtime people tend to
want to bench.)
Also, I mentioned this one because all the other details I deleted
pretty much _did_ get taken into account in my summary.
>> +5. The LB_BIAS scheduler feature is disabled by adaptive
> I have no idea what that one is, my summary didn't mention it.
Nobody seems to know what that thing is, except probably the scheduler
All I know is that it's hard to implement without the tick. So I
disabled it in my tree.
Is it also an important point?
>> +o At least one CPU must keep the scheduling-clock interrupt
>> + in order to support accurate timekeeping.
> How? You never said how to tell a processor _not_ to suppress
> when CONFIG_THE_OTHER_HALF_OF_NOHZ is enabled.
Ah indeed it would be nice to point out that there must be an online
CPU outside the value range of the nohz_mask= boot parameter.
There's a nohz_mask boot parameter?
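
To make the constraint concrete, here is a minimal sketch of the check being
described: at least one online CPU has to sit outside the nohz mask so it can
keep ticking. The helper names and the 64-bit mask representation are
assumptions for illustration only; they are not kernel APIs and say nothing
about how the boot parameter is actually parsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not kernel APIs: bit n of a mask stands for CPU n. */
static uint64_t cpu_bit(int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    return UINT64_C(1) << cpu;
}

/*
 * True when at least one online CPU is left outside nohz_mask, so that
 * some CPU can keep the scheduling-clock tick for timekeeping.
 */
static bool has_timekeeping_cpu(uint64_t online_mask, uint64_t nohz_mask)
{
    return (online_mask & ~nohz_mask) != 0;
}

/* Lowest-numbered online CPU outside nohz_mask, or -1 if there is none. */
static int first_timekeeping_cpu(uint64_t online_mask, uint64_t nohz_mask)
{
    uint64_t candidates = online_mask & ~nohz_mask;

    for (int cpu = 0; cpu < 64; cpu++)
        if (candidates & cpu_bit(cpu))
            return cpu;
    return -1;
}
```

With four CPUs online (mask 0xF) and CPUs 1-3 in the nohz mask (0xE), CPU 0
is left to keep the tick; with all four in the mask, nothing is left and the
configuration would have to be rejected or adjusted.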
> I take it the problem is the value in the sysenter page won't get
> so gettimeofday() will see a stale value until the CPU hog stops
> suppressing interrupts? I thought the first half of NOHZ had a way
> dealing with that many moons ago? (Did sysenter cause a regression?)
With CONFIG_NO_HZ, there is always a tick running that updates GTOD
and jiffies as long as there is a non-idle CPU. If all CPUs are idle
and one suddenly wakes up, the GTOD and jiffies values are caught up.
With full dynticks we have a new problem: there can be a CPU using
jiffies or GTOD without running the tick (we are not idle so there can
be such users). So there must be a ticking CPU somewhere.
I.E. because gettimeofday() just checks a memory location without
requiring a kernel transition, there's no opportunity for the kernel to
trigger and run catch-up code.
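
A toy model of the problem as described in this exchange, under stated
assumptions: the struct, the function names, and the 1 ms tick period are all
hypothetical, and this is not how the kernel lays out its timekeeping data.
It only shows that a plain read returns whatever the last tick published,
and that catch-up happens only when some CPU actually runs the update.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NSEC 1000000ULL  /* assumed 1 ms tick period, for illustration */

/* Hypothetical stand-in for timekeeping data that readers can load directly. */
struct time_page {
    uint64_t jiffies;         /* tick count published to readers */
    uint64_t last_update_ns;  /* clock value folded into jiffies so far */
};

/* What a ticking (or freshly woken) CPU does: fold in all elapsed ticks. */
static void tick_update(struct time_page *tp, uint64_t now_ns)
{
    assert(now_ns >= tp->last_update_ns);

    uint64_t missed = (now_ns - tp->last_update_ns) / TICK_NSEC;

    tp->jiffies += missed;
    tp->last_update_ns += missed * TICK_NSEC;
}

/* What a reader does: a plain load, with no chance to run catch-up code. */
static uint64_t read_jiffies(const struct time_page *tp)
{
    return tp->jiffies;
}
```

If tick_update() last ran at 3 ms and no CPU ticks for the next 5 ms, every
read_jiffies() call in that window still returns 3. The value jumps to 8 only
once some CPU runs tick_update() again.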
So you'd need a timer to remove the read flag on the page containing
the jiffies value once it was considered sufficiently stale, and then
have the page fault update the value, restore the read flag, and reset
the timer to switch it off again. Then just tell CPU-intensive code
that wants to run uninterrupted not to touch jiffies unless it is
willing to trigger interrupts to keep the value current.
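
The scheme proposed above can be modeled as a small state machine. This is
only a simulation under stated assumptions: the "read flag" is a boolean
rather than a real page permission, the fault is an ordinary function call,
and every name here is hypothetical. Nothing in the thread says the kernel
does or would do this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the page holding jiffies; not kernel code. */
struct guarded_page {
    uint64_t jiffies;  /* value stored in the page */
    bool readable;     /* models the page's read permission */
    unsigned faults;   /* how many reads had to trap */
};

/* Staleness timer fires: revoke read access so the next read traps. */
static void staleness_timer_expired(struct guarded_page *p)
{
    p->readable = false;
}

/* Fault path: refresh the value and restore read access. */
static void handle_fault(struct guarded_page *p, uint64_t current_jiffies)
{
    assert(current_jiffies >= p->jiffies);  /* jiffies never run backwards */
    p->jiffies = current_jiffies;
    p->readable = true;
    p->faults++;
}

/* A read is a plain load while the page is readable and traps otherwise. */
static uint64_t read_guarded(struct guarded_page *p, uint64_t current_jiffies)
{
    if (!p->readable)
        handle_fault(p, current_jiffies);
    return p->jiffies;
}
```

Reads made before the timer expires cost nothing and may return a slightly
stale value. The first read after expiry takes one fault and gets a fresh
value, and code that never reads jiffies never gets interrupted on its behalf.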
By the way, I find this "full" name strange if you yourself have a list
of more cases where ticks could be dropped, but which you haven't
implemented yet. The system being entirely idle means unnecessary ticks
can be dropped. The system having no scheduling decisions to make on a
processor also means unnecessary ticks can be dropped. But there are
two config options and they get treated as entirely different
I suppose one of them having a bucket of workarounds and caveats is the
reason? One is just "let the system behave more efficiently; the only
reason it's a config option is that increased latency waking up from idle
can annoy the realtime guys". The second is "let the system behave more
efficiently in a way that opens up a bunch of sharp edges and requires
extensive micromanagement". But those sharp edges seem more
"unfinished" than really a design limitation...