Re: [RFC PATCH v6 1/5] perf sched: sync state char array with the kernel

From: Ze Gao
Date: Thu Aug 03 2023 - 22:39:15 EST


On Fri, Aug 4, 2023 at 10:21 AM Ze Gao <zegao2021@xxxxxxxxx> wrote:
>
> On Thu, Aug 3, 2023 at 11:10 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > On Thu, 3 Aug 2023 04:33:48 -0400
> > Ze Gao <zegao2021@xxxxxxxxx> wrote:
> >
> > > Update the state char array and remove unused, stale
> > > macros, which are kernel-internal representations whose
> > > use is no longer encouraged.
> > >
> > > Signed-off-by: Ze Gao <zegao@xxxxxxxxxxx>
> > > ---
> > > tools/perf/builtin-sched.c | 13 +------------
> > > 1 file changed, 1 insertion(+), 12 deletions(-)
> > >
> > > diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> > > index 9ab300b6f131..8dc8f071721c 100644
> > > --- a/tools/perf/builtin-sched.c
> > > +++ b/tools/perf/builtin-sched.c
> > > @@ -92,23 +92,12 @@ struct sched_atom {
> > > struct task_desc *wakee;
> > > };
> > >
> > > -#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
> > > +#define TASK_STATE_TO_CHAR_STR "RSDTtXZPI"
> >
> > Thinking about this more, this will always be wrong. Changing it just works
> > for the kernel you made the change for, but if it is run on another kernel,
> > it's broken again.
>
> Indeed. There is no easy way to maintain backward compatibility unless
> we stop using this bizarre 'prev_state' field. Basically all its users suffer
> from this. That's why I believe this needs a fix to alert people not to
> use 'prev_state' anymore.
>
> > I actually wrote code once that basically just did a:
> >
> > struct trace_seq s;
> >
> > trace_seq_init(&s);
> > tep_print_event(tep, &s, record, "%s", TEP_PRINT_INFO);
> >
> > then searched s.buffer for "prev_state=%s ", to find the state character.
> >
> > That's because the kernel should always be up to date (and why I said I
> > needed that string in the print_fmt).
>
> Turning to building the state char array from the print fmt string
> dynamically is a great idea. :)
>
> > As perf has a tep handle, this could be a helper function to extract the
> > state if needed, and get rid of relying on the above character array.
>
> I'll figure out how to make it happen.
>
> BTW, my last concern is whether there is a better way to notify userspace
> to avoid interpreting task state from 'prev_state'. Otherwise the same
> awkward thing will happen again.

By userspace, I mean all tools that consume 'prev_state' but don't have the
print fmt available, bpf tracepoints for example.
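
For tools that do have the formatted event text, the string search Steve
describes could look roughly like the sketch below. This is only an
illustration: prev_state_char() is a hypothetical name, not an existing perf
or libtraceevent function, and it assumes the caller already holds the
formatted info string (for example the s.buffer filled in by
tep_print_event() in Steve's snippet).

```c
#include <string.h>

/*
 * Hypothetical helper, not part of perf or libtraceevent.
 *
 * Given the formatted info text of a sched_switch event, return the first
 * character that follows "prev_state=", or '\0' if the field is missing
 * or empty. Only the first character is returned, so a value such as
 * "R+" yields 'R'.
 */
static char prev_state_char(const char *info)
{
	static const char key[] = "prev_state=";
	const char *p = strstr(info, key);

	if (!p)
		return '\0';

	/* Skip past the key to the start of the value. */
	p += sizeof(key) - 1;

	/* The value is terminated by a space; treat an empty value as absent. */
	if (*p == ' ' || *p == '\0')
		return '\0';

	return *p;
}
```

Since the character comes from text the running kernel's print fmt produced,
nothing here depends on a copied TASK_STATE_TO_CHAR_STR. It does not help the
bpf case above, where no print fmt is available.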

Regards,
Ze

> Thanks,
> Ze
>
> > -- Steve
> >
> >
> > >
> > > /* task state bitmask, copied from include/linux/sched.h */
> > > #define TASK_RUNNING 0
> > > #define TASK_INTERRUPTIBLE 1
> > > #define TASK_UNINTERRUPTIBLE 2
> > > -#define __TASK_STOPPED 4
> > > -#define __TASK_TRACED 8
> > > -/* in tsk->exit_state */
> > > -#define EXIT_DEAD 16
> > > -#define EXIT_ZOMBIE 32
> > > -#define EXIT_TRACE (EXIT_ZOMBIE | EXIT_DEAD)
> > > -/* in tsk->state again */
> > > -#define TASK_DEAD 64
> > > -#define TASK_WAKEKILL 128
> > > -#define TASK_WAKING 256
> > > -#define TASK_PARKED 512
> > >
> > > enum thread_state {
> > > THREAD_SLEEPING = 0,
> >