On Wed, 2011-09-28 at 15:19 -0300, Glauber Costa wrote:
Btw, asm output with CGROUP_SCHED disabled seem to be no worse than
On 09/27/2011 06:00 PM, Peter Zijlstra wrote:
On Fri, 2011-09-23 at 19:20 -0300, Glauber Costa wrote:
/* Must have preemption disabled for this to be meaningful. */
-#define kstat_this_cpu __get_cpu_var(kstat)
+#define kstat_this_cpu this_cpu_ptr(task_group_kstat(current))
This just lost you a debug check: the former would whinge when called
without preemption disabled, the new one won't. It's part of the this_cpu
feature set to make debugging impossible.
+#define kstat_cpu(cpu) per_cpu(kstat, cpu)
+#define kstat_this_cpu (&__get_cpu_var(kstat))
extern unsigned long long nr_context_switches(void);
@@ -52,8 +62,8 @@ struct irq_desc;
static inline void kstat_incr_irqs_this_cpu(unsigned int irq,
struct irq_desc *desc)
It might be worth looking at the asm output of that. I think you made it
worse, but I'm not quite sure how smart gcc is; it might just figure out
what you meant.
I'd say leave it alone.
The biggest difference is that we don't have access to task_group(), or
any of the fields in struct task_group. Because of that, we end up
having to export a function to do the job of dealing with it.
Users inside sched.c won't have this problem. Outside of it, we'll add a
call to some paths. True, mostly handle_irq paths, but I don't think
that's what's going to kill us.
Now if we really really want to save it, we'd have to move struct
task_group and its friends to a more visible location like a header...
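
To make the visibility point concrete, here is a minimal single-file sketch of the pattern being described: the struct layout is private to one file, so code elsewhere can only reach the stats through an exported function. Every name here (demo_task_group, demo_kstat, demo_task_group_kstat, demo_account_irq) is hypothetical and chosen for illustration; none of it is the code from the patch.

```c
/* --- What a shared header could expose (hypothetical names) --- */
struct demo_kstat {
    unsigned long irqs_sum;
};

struct demo_task_group;   /* opaque: the layout is private to one file */

/* Exported accessor: the only way outside code can reach the stats. */
struct demo_kstat *demo_task_group_kstat(struct demo_task_group *tg);

/* A caller outside the private file. It cannot touch tg's fields, so
 * every update goes through the accessor (a real function call when the
 * definition lives in another translation unit). */
static inline void demo_account_irq(struct demo_task_group *tg)
{
    demo_task_group_kstat(tg)->irqs_sum++;
}

/* --- What the private .c file would contain --- */
struct demo_task_group {
    int id;
    struct demo_kstat stat;
};

struct demo_kstat *demo_task_group_kstat(struct demo_task_group *tg)
{
    return &tg->stat;
}

/* Zero-initialised group used only to exercise the sketch. */
struct demo_task_group demo_root_group;
```

Moving the struct definition into the header would let the accessor be inlined at the call site, which is the trade-off the paragraph above is weighing.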
I'm not quite getting how task_group is relevant here.
The above will do something like:

	mov gs:$per-cpu-offset-of-kstat, reg
	inc reg + idx*8

whereas __this_cpu_inc() could end up like:

	inc gs:$per-cpu-offset-of-kstat + idx*8

or whatnot. Now clearly gcc could be smart and optimize the temporary
reg thing away in the earlier case, or it might not, I really don't know
how smart that thing is.
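
One way to probe the question from user space is a rough analogy using thread-local storage, which on x86-64 is also addressed through a segment register. This is only a sketch under stated assumptions: it uses the GCC/Clang `__thread` extension rather than the kernel's per-cpu macros, and kstat_demo, NR_IRQS_DEMO and the function names are made up for illustration. It says nothing definitive about what the kernel build will emit.

```c
#include <stddef.h>

#define NR_IRQS_DEMO 16   /* hypothetical size, for illustration only */

/* Stand-in for the per-cpu kstat: one copy per thread, not per CPU. */
struct kstat_demo {
    unsigned long irqs[NR_IRQS_DEMO];
};

static __thread struct kstat_demo kstat_demo;

/* Form 1: take a pointer to this thread's copy first, then increment
 * through it (the shape of "get the per-cpu pointer, then inc"). */
void incr_via_pointer(size_t idx)
{
    struct kstat_demo *k = &kstat_demo;
    k->irqs[idx]++;
}

/* Form 2: increment the thread-local slot directly, leaving the
 * compiler free to use a single segment-relative instruction. */
void incr_direct(size_t idx)
{
    kstat_demo.irqs[idx]++;
}

/* Helper to observe the counter from the calling thread. */
unsigned long read_irq(size_t idx)
{
    return kstat_demo.irqs[idx];
}
```

Compiling this with something like `gcc -O2 -S` and comparing the bodies of incr_via_pointer() and incr_direct() shows whether the compiler folds the temporary away in the first form for this toy case.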