Re: [RFC PATCH v3 3/6] sched, tracing: add to report task state in symbolic chars
From: Ze Gao
Date: Tue Aug 01 2023 - 09:04:09 EST
On Tue, Aug 1, 2023 at 7:34 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Aug 01, 2023 at 05:01:21PM +0800, Ze Gao wrote:
> > Internal representations of task state are likely to be changed
> > or ordered, and reporting them to userspace without exporting
> > them as part of API is basically wrong, which can easily break
> > a userspace observability tool as kernel evolves. For example,
> > perf suffers from this and still reports wrong states as of this
> > writing.
> >
> > OTOH, some masqueraded states like TASK_REPORT_IDLE and
> > TASK_REPORT_MAX are also reported inadvertently, which confuses
> > things even more and most userspace tools do not even take them
> > into consideration.
> >
> > So add a new variable in company with the old raw value to
> > report task state in symbolic chars, which are self-explaining
> > and no further translation is needed. Of course this does not
> > break any userspace tool.
> >
> > Note for PREEMPT_ACTIVE, we introduce 'p' to report it and use
> > the old conventions for the rest.
>
> *sigh*... just because userspace if daft, we need to change the kernel?
Hi Peter,
Sorry that I don't quite agree with you on this one.
It's just the design that exporting internal details is fundamentally wrong.
And even worse, I did not see any userspace tool is aware of masqueraded
states like TASK_REPORT_IDLE and TASK_REPORT_MAX and let alone
parse it correctly. This confused me a lot when I decided to write my own bpf
version of sched-latency analysis tool and only after I figured out everything
underneath, I started to make things right here.
Again, I mean it's not me that deliberately "breaks" ABI here and I am
never meant
to upset anyone. My confusion is why did people forget to update in-tree perf
the very last time they decide to rearrange the task state mapping
since we all agree
this is important "ABI" here. I don't think it's the tool's fault.
And that's my initiative
to request this RFC.
> Why do we need this character anyway, why not just print the state in
> hex and leave it at that? These single character state things are a
> relic, please just let them die.
I believe hex is ok only after having the reported task state mapping
appear in the
uapi headers, otherwise it's still useless to userspace especially for
value like
TASK_REPORT_IDLE and TASK_REPORT_MAX, which need to dig into the
kernel to see what the hell is going on here.
Thoughts?
Regards,
Ze