Re: [PATCH] sched: fair: fix missed CONFIG_SCHEDSTATS

From: Yafang Shao
Date: Thu Mar 07 2019 - 02:50:38 EST


On Wed, Mar 6, 2019 at 8:53 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Wed, Mar 6, 2019 at 8:38 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Mar 06, 2019 at 07:49:36PM +0800, Yafang Shao wrote:
> >
> >
> > $ grep SCHEDSTAT defconfig-build/.config
> > # CONFIG_SCHEDSTATS is not set
> > $ objdump -dr defconfig-build/kernel/sched/fair.o | awk '/>:$/ { F=$2 } /sched_stat/ { print F " " $0 }'
> > <update_curr>: 24cd: R_X86_64_32S __tracepoint_sched_stat_runtime+0x28
> > <update_curr>: 24d9: R_X86_64_PC32 __tracepoint_sched_stat_runtime+0x24
> > $ patch -p1 < foo
> > patching file kernel/sched/fair.c
> > $ make O=defconfig-build kernel/sched/
> > make[1]: Entering directory '/usr/src/linux-2.6/defconfig-build'
> > Using .. as source for kernel
> > GEN Makefile
> > CALL ../scripts/checksyscalls.sh
> > CALL ../scripts/atomic/check-atomics.sh
> > DESCEND objtool
> > CC kernel/sched/fair.o
> > AR kernel/sched/built-in.a
> > make[1]: Leaving directory '/usr/src/linux-2.6/defconfig-build'
> > $ objdump -dr defconfig-build/kernel/sched/fair.o | awk '/>:$/ { F=$2 } /sched_stat/ { print F " " $0 }'
> > $ cat foo
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8213ff6e365d..6e5ceec3b662 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -839,7 +839,8 @@ static void update_curr(struct cfs_rq *cfs_rq)
> > if (entity_is_task(curr)) {
> > struct task_struct *curtask = task_of(curr);
> >
> > - trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
> > + if (schedstat_enabled())
> > + trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
> > cgroup_account_cputime(curtask, delta_exec);
> > account_group_exec_runtime(curtask, delta_exec);
> > }
> >
> >
> > _1_ line, where you wanted to add _6_ ugly #ifdefs
>
> I get your point now.
>
> Yes, this code can be removed from the call sites in kernel/sched/fair.c,
> but the definitions of these tracepoints are still there,
> and then they will be exposed in /sys/kernel/debug/tracing/events/sched/.
>
> You can try running objdump on vmlinux.
> $ objdump -dr kernel/sched/fair.o | awk '/>:$/ { F=$2 } /sched_stat/ { print F " " $0 }'    # nothing
>
> $ objdump -dr vmlinux | awk '/>:$/ { F=$2 } /sched_stat/ { print F " " $0 }'
> <perf_trace_sched_stat_runtime>: ffffffff810b3c30 <perf_trace_sched_stat_runtime>: // it is still defined
>
>
> My guess is they will be used by perf or bpf,
> so they won't be optimized out by the compiler.
>

Hi Peter,

If you do not like sprinkling #ifdefs, we can use something like the
patch below to resolve this issue.
I don't really like the code below, but it avoids exposing these
tracepoints to userspace.

What is your opinion?


diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 9a4bdfa..a0291f2 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -336,6 +336,7 @@ static inline long __trace_sched_switch_state(bool preempt, struct task_struct *
__entry->pid, __entry->old_pid)
);

+#ifdef CONFIG_SCHEDSTATS
/*
* XXX the below sched_stat tracepoints only apply to SCHED_OTHER/BATCH/IDLE
* adding sched_stat support to SCHED_FIFO/RR would be welcome.
@@ -394,6 +395,14 @@ static inline long __trace_sched_switch_state(bool preempt, struct task_struct *
DEFINE_EVENT(sched_stat_template, sched_stat_blocked,
TP_PROTO(struct task_struct *tsk, u64 delay),
TP_ARGS(tsk, delay));
+#else
+
+#define trace_sched_stat_wait(...) do {} while (0)
+#define trace_sched_stat_sleep(...) do {} while (0)
+#define trace_sched_stat_iowait(...) do {} while (0)
+#define trace_sched_stat_blocked(...) do {} while (0)
+
+#endif

/*
* Tracepoint for accounting runtime (time the task is executing


Thanks
Yafang