Re: Query: Regarding Notifier chain callback debugging or profiling

From: Gaurav Kohli
Date: Mon Feb 10 2020 - 23:47:14 EST




On 2/11/2020 2:36 AM, Greg KH wrote:
On Mon, Feb 10, 2020 at 05:26:16PM +0530, Gaurav Kohli wrote:
Hi,

In the Linux kernel, notification chains are used everywhere to notify
interested code of kernel events, but we don't have any debugging or
profiling mechanism to tell which callback is taking a long time, or which
callback we are currently stuck in (without dumps, the latter is hard to tell).

Callbacks are a mess, I agree.

Below are a few approaches we could implement to profile callbacks on an
as-needed basis:

1) Use trace event before and after callback:

static int notifier_call_chain(struct notifier_block **nl,
			       unsigned long val, void *v,
			       int nr_to_call, int *nr_calls)
{
	int ret = NOTIFY_DONE;
	struct notifier_block *nb, *next_nb;

	[...]
+		trace_event for entry of callback
		ret = nb->notifier_call(nb, val, v);
+		trace_event for exit of callback

Ick.

	}
	return ret;
}

2) Or use pr_debug instead of trace_event

3) Both of the above approaches have a problem: they log the callbacks of
every notifier chain, which might flood the trace buffer or dmesg.

So we could use a bool variable to control that and log only the required
notification chain.

Something like the following could be used:

struct srcu_notifier_head {
	struct mutex mutex;
	struct srcu_struct srcu;
	struct notifier_block __rcu *head;
+	bool debug_callback;
};


static int notifier_call_chain(struct notifier_block **nl,
			       unsigned long val, void *v,
-			       int nr_to_call, int *nr_calls)
+			       int nr_to_call, int *nr_calls, bool debug_callback)
{
	int ret = NOTIFY_DONE;
	struct notifier_block *nb, *next_nb;
@@ -526,6 +526,7 @@ void srcu_init_notifier_head(struct srcu_notifier_head *nh)
	if (init_srcu_struct(&nh->srcu) < 0)
		BUG();
	nh->head = NULL;
+	nh->debug_callback = false; -> by default it would be false for every notifier chain.

4) We could also consider pre and post functions that run before and after
each callback, enabled only for those who want to profile.
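
To make the pre/post idea in 4) a bit more concrete, here is a rough
userspace sketch, not kernel code. It models a notifier chain with plain
pointers (no SRCU, no locking, no priorities). The names struct
notifier_hooks, struct chain_profile, call_chain(), profile_pre() and
profile_post() are all hypothetical and exist only for this illustration.
A NULL hooks pointer means profiling is off for that chain.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long val, void *v);
	struct notifier_block *next;
};

/* Hypothetical: optional hooks run around every callback of one chain. */
struct notifier_hooks {
	void (*pre)(struct notifier_block *nb, unsigned long val, void *ctx);
	void (*post)(struct notifier_block *nb, int ret, void *ctx);
	void *ctx;
};

struct notifier_head {
	struct notifier_block *head;
	const struct notifier_hooks *hooks;	/* NULL: profiling disabled */
};

/* Walk the chain; call pre/post around each callback if hooks are set. */
static int call_chain(struct notifier_head *nh, unsigned long val, void *v)
{
	const struct notifier_hooks *h = nh->hooks;
	struct notifier_block *nb = nh->head;
	int ret = NOTIFY_DONE;

	while (nb) {
		struct notifier_block *next_nb = nb->next;

		assert(nb->notifier_call);
		if (h && h->pre)
			h->pre(nb, val, h->ctx);
		ret = nb->notifier_call(nb, val, v);
		if (h && h->post)
			h->post(nb, ret, h->ctx);
		if (ret & NOTIFY_STOP_MASK)
			break;
		nb = next_nb;
	}
	return ret;
}

/* Example hook state: number of callbacks seen and total time spent. */
struct chain_profile {
	unsigned int calls;
	long long total_ns;
	long long start_ns;
};

/* Monotonic time in nanoseconds; stands in for a kernel trace clock. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void profile_pre(struct notifier_block *nb, unsigned long val, void *ctx)
{
	struct chain_profile *p = ctx;

	(void)nb;
	(void)val;
	p->start_ns = now_ns();
}

static void profile_post(struct notifier_block *nb, int ret, void *ctx)
{
	struct chain_profile *p = ctx;

	(void)nb;
	(void)ret;
	p->total_ns += now_ns() - p->start_ns;
	p->calls++;
}

/* Trivial callback used to exercise the chain. */
static int ok_cb(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb;
	(void)val;
	(void)v;
	return NOTIFY_OK;
}
```

A client that wants profiling would point nh->hooks at a struct
notifier_hooks carrying profile_pre/profile_post and its own struct
chain_profile; every other chain pays only a NULL check per callback.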

Please let us know which approach we can use, or please suggest some other
debugging mechanism for this.

Why not just pay attention to the specific notifier you want? Trace
when the specific blocking_notifier_call_chain() is called.

What specific notifier call chain is causing you problems that you need
to debug?
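
For illustration, tracing one specific call site could look roughly like
the userspace sketch below. It is only a model: call_chain() stands in for
blocking_notifier_call_chain(), and traced_call_site(), count_cb() and
stop_cb() are hypothetical names. Note the trade-off: this reports how long
the whole chain took at that one site, not which callback was slow.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long val, void *v);
	struct notifier_block *next;
};

/* Userspace stand-in for blocking_notifier_call_chain(); no locking. */
static int call_chain(struct notifier_block *nb, unsigned long val, void *v)
{
	int ret = NOTIFY_DONE;

	while (nb) {
		struct notifier_block *next_nb = nb->next;

		assert(nb->notifier_call);
		ret = nb->notifier_call(nb, val, v);
		if (ret & NOTIFY_STOP_MASK)
			break;
		nb = next_nb;
	}
	return ret;
}

/* Monotonic time in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical wrapper used at the one call site under investigation:
 * time the whole chain walk and log it. The elapsed time is also
 * returned through *elapsed_ns when that pointer is non-NULL.
 */
static int traced_call_site(const char *site, struct notifier_block *chain,
			    unsigned long val, void *v, long long *elapsed_ns)
{
	long long start = now_ns();
	int ret = call_chain(chain, val, v);
	long long delta = now_ns() - start;

	fprintf(stderr, "%s: val=%lu ret=%d took %lld ns\n", site, val, ret, delta);
	if (elapsed_ns)
		*elapsed_ns = delta;
	return ret;
}

/* Example callbacks: one counts invocations, one stops the chain. */
static int count_cb(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb;
	(void)val;
	(*(int *)v)++;
	return NOTIFY_OK;
}

static int stop_cb(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb;
	(void)val;
	(void)v;
	return NOTIFY_STOP;
}
```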

Thanks Greg for the reply.
I agree that we can trace a specific notifier chain, but that is very hacky
(we have to add debug code here and there whenever a problem comes up).

We use a lot of SRCU notifier call chains to notify clients of events, and
if we had a generic debugging mechanism, we would just have to switch it on
for a particular client during the initial testing phase.

As mentioned above, if we can come up with something like the below, then
only the client that wants to debug has to switch it on:
>> struct srcu_notifier_head {
>> struct mutex mutex;
>> struct srcu_struct srcu;
>> struct notifier_block __rcu *head;
>> + bool debug_callback; -> this we can turn on for particular client.
>> };

Right now we don't have any generic way to debug notifier chains, so please
suggest some approach. On a live target, it is difficult to say where a
notification chain got stuck.
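
As a rough userspace model of the debug_callback switch (a sketch under
simplifying assumptions: plain pointers instead of SRCU, no mutex, no
priority ordering), something like the below shows the intended behaviour.
The name and traced fields, init_notifier_head(), chain_register() and
call_chain() are hypothetical and only serve the illustration. Because
"enter" is logged before the callback runs, the last "enter" line without
a matching "exit" would point at the callback a stuck chain is sitting in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long val, void *v);
	struct notifier_block *next;
	const char *name;	/* hypothetical: label for the log output */
};

struct notifier_head {
	struct notifier_block *head;
	bool debug_callback;	/* the proposed per-chain switch */
	unsigned int traced;	/* hypothetical: callbacks logged so far */
};

/* Debugging is off by default for every chain. */
static void init_notifier_head(struct notifier_head *nh)
{
	nh->head = NULL;
	nh->debug_callback = false;
	nh->traced = 0;
}

/* Add a block at the front of the chain (priority ordering omitted). */
static void chain_register(struct notifier_head *nh, struct notifier_block *nb)
{
	nb->next = nh->head;
	nh->head = nb;
}

/* Monotonic time in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Walk the chain; log entry/exit only when this chain has debugging on. */
static int call_chain(struct notifier_head *nh, unsigned long val, void *v)
{
	struct notifier_block *nb = nh->head;
	int ret = NOTIFY_DONE;

	while (nb) {
		struct notifier_block *next_nb = nb->next;
		long long start = 0;

		assert(nb->notifier_call);
		if (nh->debug_callback) {
			fprintf(stderr, "enter %s val=%lu\n", nb->name, val);
			start = now_ns();
		}
		ret = nb->notifier_call(nb, val, v);
		if (nh->debug_callback) {
			fprintf(stderr, "exit  %s ret=%d took %lld ns\n",
				nb->name, ret, now_ns() - start);
			nh->traced++;
		}
		if (ret & NOTIFY_STOP_MASK)
			break;
		nb = next_nb;
	}
	return ret;
}

/* Trivial callback used to exercise the chains. */
static int ok_cb(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb;
	(void)val;
	(void)v;
	return NOTIFY_OK;
}
```

A client under test would set debug_callback on its own head after
initialising it; all other chains stay silent.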


Regards
Gaurav

thanks,

greg k-h


--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.