Re: [PATCH for-next 1/3] selftests/watchdog: add count parameter for watchdog-test

From: Zhijian Li (Fujitsu)
Date: Mon Oct 28 2024 - 01:46:58 EST




On 28/10/2024 13:31, Shuah Khan wrote:
> On 10/27/24 22:02, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 28/10/2024 11:29, Shuah Khan wrote:
>>> On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
>>>>
>>>>
>>>> On 27/10/2024 08:28, Shuah Khan wrote:
>>>>> On 10/24/24 19:39, Li Zhijian wrote:
>>>>>> Currently, watchdog-test keeps running until it gets a SIGINT. However,
>>>>>> when watchdog-test is executed from the kselftests framework, it is
>>>>>> launched via timeout, which sends SIGTERM when the time is up. This could
>>>>>> lead to:
>>>>>> 1. The watchdog is not stopped, so a watchdog reset is triggered and
>>>>>>    silently reboots the OS.
>>>>>> 2. kselftests gets a timeout exit code and judges watchdog-test as
>>>>>>    'not ok'.
>>>>>>
>>>>> This test isn't really supposed to be run from kselftest framework.
>>>>> This is the reason why it isn't included in the default run.
>>>>
>>>> May I know what's the default run, is it different from `make run_tests` ?
>>>
>>> No it isn't. "make kselftest" runs only the targets mentioned in the
>>> selftests Makefile. That is considered the kselftest default run.
>>
>> Hey, Shuah,
>>
>>
>> Thanks for your explanation.
>> If that is the case, I do not have an urgent need for the current patch, except
>> that I'd like to avoid the reboot issue after an accidental `make run_tests`.
>>
>> I made some changes, shown below; please take a look. I will send them out once we reach a consensus.
>>
>>
>> commit 2296f9d88fde4921758a45bf160a7f1b9d4678a0 (HEAD)
>> Author: Li Zhijian <lizhijian@xxxxxxxxxxx>
>> Date:   Mon Oct 28 11:54:03 2024 +0800
>>
>>       selftests/watchdog-test: Fix system accidentally reset after watchdog-test
>>       After running watchdog-test via `make run_tests`, the system
>>       reboots because the watchdog is not stopped.
>>       ```
>
> The system shouldn't reboot just because watchdog test is left running.
> watchdog test keeps calling ioctl() with WDIOC_KEEPALIVE to make sure
> the watchdog card timer is reset.

Err..

How can watchdog test keep calling ioctl() with WDIOC_KEEPALIVE after ./watchdog_test has finished?

As I understand it, the cause is that ./watchdog_test reached neither
A)
end:
	/*
	 * Send specific magic character 'V' just in case Magic Close is
	 * enabled to ensure watchdog gets disabled on close.
	 */
	ret = write(fd, &v, 1);
	if (ret < 0)
		printf("Stopping watchdog ticks failed (%d)...\n", errno);

nor B)

static void term(int sig)
{
	int ret = write(fd, &v, 1);

	close(fd);
	if (ret < 0)
		printf("\nStopping watchdog ticks failed (%d)...\n", errno);
	else
		printf("\nStopping watchdog ticks...\n");
	exit(0);
}

to "ensure watchdog gets disabled on close"


The default signal sent by timeout is SIGTERM, but watchdog_test only registers a SIGINT handler.

Thanks
Zhijian



>
> If you are seeing reboots, that means watchdog test couldn't reset the
> timer. This usually means the system is unresponsive or something is wrong
> with the watchdog card on your system.
>
> This is the behavior you would expect from a watchdog timer. Does your
> system have a watchdog card, or are you enabling the softdog module?
>
> Either way there is some other reason for the system reboot.
>
> thanks,
> -- Shuah