Re: [PATCH 2/2] sched/debug: add sched_update_nr_running tracepoint

From: Peter Zijlstra
Date: Wed Sep 04 2019 - 13:48:57 EST


On Wed, Sep 04, 2019 at 03:37:11PM +0100, Qais Yousef wrote:

> I managed to hook into sched_switch to get the nr_running of cfs tasks via
> eBPF.
>
> ```
> int on_switch(struct sched_switch_args *args) {
>         struct task_struct *prev = (struct task_struct *)bpf_get_current_task();
>         struct cgroup *prev_cgroup = prev->cgroups->subsys[cpuset_cgrp_id]->cgroup;
>         const char *prev_cgroup_name = prev_cgroup->kn->name;
>
>         if (prev_cgroup->kn->parent) {
>                 bpf_trace_printk("sched_switch_ext: nr_running=%d prev_cgroup=%s\\n",
>                                  prev->se.cfs_rq->nr_running,
>                                  prev_cgroup_name);
>         } else {
>                 bpf_trace_printk("sched_switch_ext: nr_running=%d prev_cgroup=/\\n",
>                                  prev->se.cfs_rq->nr_running);
>         }
>         return 0;
> }
> ```
>
> You can do something similar by attaching to the sched_switch tracepoint from
> a module and creating a new event to get the nr_running.
>
> Now this is not as accurate as your proposed new tracepoint in terms of where
> you sample nr_running, but it should be good enough?

The above runs after deactivate(), so it gives an up-to-date count for
decrements. Attach something to trace_sched_wakeup() as well to get the
increment update.
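
To illustrate why both hooks are needed, here is a rough userspace toy model
of the two sample points. It is not kernel code: struct toy_rq, toy_sample(),
toy_wakeup() and toy_switch() are made-up names, and the ordering it assumes
(enqueue before the wakeup hook, dequeue of a blocking prev before the switch
hook) is a simplification of the real paths.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a cfs_rq; hypothetical, not the kernel structure. */
struct toy_rq {
	unsigned int nr_running;  /* runnable tasks on this queue */
	unsigned int last_sample; /* last value a tracing hook observed */
	unsigned int nr_samples;  /* how many times a hook fired */
};

/* Hypothetical probe body: record whatever nr_running is right now. */
static void toy_sample(struct toy_rq *rq, const char *hook)
{
	rq->last_sample = rq->nr_running;
	rq->nr_samples++;
	printf("%s: nr_running=%u\n", hook, rq->nr_running);
}

/*
 * Wakeup path (assumed ordering): the task is enqueued first, then the
 * wakeup hook fires, so a probe there observes the incremented count.
 * hook_wakeup == 0 models having no probe attached to the wakeup event.
 */
static void toy_wakeup(struct toy_rq *rq, int hook_wakeup)
{
	rq->nr_running++;
	if (hook_wakeup)
		toy_sample(rq, "wakeup");
}

/*
 * Switch path (assumed ordering): a blocking prev is dequeued before the
 * switch hook fires, so a probe there observes the decremented count.
 */
static void toy_switch(struct toy_rq *rq, int prev_blocks)
{
	if (prev_blocks) {
		assert(rq->nr_running > 0);
		rq->nr_running--;
	}
	toy_sample(rq, "switch");
}
```

With only the switch hook attached (hook_wakeup == 0), increments go unseen
until the next switch; with both attached, every change to nr_running in this
model is followed by a sample.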