Re: [PATCH] habanalabs: print to kernel log when reset is finished
From: Oded Gabbay
Date: Sat Aug 10 2019 - 13:46:38 EST
On Sat, Aug 10, 2019 at 8:23 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Aug 10, 2019 at 06:29:16PM +0300, Oded Gabbay wrote:
> > On Sat, Aug 10, 2019 at 3:53 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Sat, Aug 10, 2019 at 03:38:08PM +0300, Oded Gabbay wrote:
> > > > Now that we don't print the queue testing messages, we need to print when
> > > > the reset is finished so whoever looks at the kernel log will know the
> > > > reset process was finished successfully and the driver is not stuck.
> > > >
> > > > Signed-off-by: Oded Gabbay <oded.gabbay@xxxxxxxxx>
> > > > ---
> > > > drivers/misc/habanalabs/device.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
> > > > index 9a5926888b99..1fac808c2546 100644
> > > > --- a/drivers/misc/habanalabs/device.c
> > > > +++ b/drivers/misc/habanalabs/device.c
> > > > @@ -907,6 +907,8 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
> > > > else
> > > > hdev->soft_reset_cnt++;
> > > >
> > > > + dev_info(hdev->dev, "Successfully finished resetting the device\n");
> > >
> > > Really? For doing things "properly" there is no need to spam the kernel
> > > log. Only spit stuff out if an error happens.
> > >
> > > thanks,
> > >
> > > greg k-h
> >
> > I beg to differ for two reasons:
> > 1. Reset happens very rarely, if at all. So this message (that get
> > printed after reset is done) will definitely not spam the kernel log.
> > 2. When a reset starts we print an appropriate error message. I think
> > it is expected by the user that we will also print if and when the
> > reset has finished successfully. I really believe that lack of this
> > printing might be deceiving for users.
>
> How is anyone going to parse the kernel log for anything to know if
> something happens?
Actually our jenkins server parse the kernel log and look for bad
messages from the driver...
But the main reason for this, IMO, is for developers (in userspace)
that run with a terminal open on the kernel log to see if something
bad happens while they run their applications.
>
> How do you trigger a reset? Is it done by userspace? If so, just
> notify them then.
Reset can be triggered by two main reasons, but a userspace
application is *not* one of them.
It can be due to a command submission not finishing in time or from a
major error in the device (e.g. double ECC error).
That's why I think it is important to tell them the reset +
re-initialization flow was finished successfully.
Of course all this information, let's call it operational status, can
be acquired by the INFO IOCTL the driver provides, but it is *really*
convenient, IMHO, to see these type of messages in the kernel log
while they happen.
>
> thanks,
>
> greg k-h