Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model
From: Pingfan Liu
Date: Wed Sep 22 2021 - 00:27:26 EST
On Mon, Sep 20, 2021 at 10:20:46AM +0200, Petr Mladek wrote:
> On Fri 2021-09-17 23:41:31, Pingfan Liu wrote:
[...]
> >
> > I had thought about queue_work_on() in watchdog_nmi_enable(). But since
> > this work will block the worker kthread for this CPU, another worker
> > kthread would have to be created for other work.
>
> This is not a problem. workqueues use a pool of workers that are
> already created and can be used when one worker gets blocked.
>
Yes, you are right. Workers are created on demand, so a blocked worker does not hold up other work.
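To convince myself, here is a minimal userspace model of that point, using POSIX threads rather than the kernel workqueue API. It assumes a fixed pool of two workers (the kernel's concurrency-managed workqueues create workers on demand instead), and every name in it is hypothetical. One work item blocks until a second item releases it, which only works because another worker is free to pick up the second item.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: a fixed pool of NR_WORKERS threads draining a shared
 * queue of work items. Real kernel workqueues create workers on demand;
 * a fixed pool of two is a simplification. */
#define NR_WORKERS 2
#define NR_ITEMS   2

typedef void (*work_fn)(void);

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_cond = PTHREAD_COND_INITIALIZER;
static work_fn pool_queue[NR_ITEMS];
static int head, tail;          /* queue indices, protected by pool_lock */
static bool released;           /* set by releasing_work() */
static int completed;           /* number of finished work items */

/* Work item that blocks its worker until another item releases it. */
static void blocking_work(void)
{
	pthread_mutex_lock(&pool_lock);
	while (!released)
		pthread_cond_wait(&pool_cond, &pool_lock);
	completed++;
	pthread_mutex_unlock(&pool_lock);
}

/* Work item that unblocks the first one; it can only run on a free worker. */
static void releasing_work(void)
{
	pthread_mutex_lock(&pool_lock);
	released = true;
	completed++;
	pthread_cond_broadcast(&pool_cond);
	pthread_mutex_unlock(&pool_lock);
}

/* Worker loop: take items from the queue until it is empty, then exit. */
static void *worker(void *unused)
{
	(void)unused;
	for (;;) {
		work_fn fn;

		pthread_mutex_lock(&pool_lock);
		if (head == tail) {
			pthread_mutex_unlock(&pool_lock);
			return NULL;
		}
		fn = pool_queue[head++];
		pthread_mutex_unlock(&pool_lock);
		fn();
	}
}

/* Queue both items, run the pool to completion, return finished item count. */
static int run_pool(void)
{
	pthread_t tids[NR_WORKERS];

	pool_queue[tail++] = blocking_work;
	pool_queue[tail++] = releasing_work;
	for (int i = 0; i < NR_WORKERS; i++)
		pthread_create(&tids[i], NULL, worker, NULL);
	for (int i = 0; i < NR_WORKERS; i++)
		pthread_join(tids[i], NULL);
	return completed;
}
```

With NR_WORKERS set to 1, the blocking item would never be released and run_pool() would hang.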
> > But now, I think queue_work_on() may be neater.
> >
> > > must wait in a loop until someone else stops it and reads
> > > the exit code.
> > >
> > Is this behavior mandatory? This kthread can decide the exit
> > condition by itself.
>
> I am pretty sure. Unfortunately, I can't find it in the documentation.
>
> My view is the following. Each process has a task_struct. The
> scheduler needs task_struct so that it can switch processes.
> The task_struct must still exist when the process exits.
> The scheduler puts the task into TASK_DEAD state.
> Another process has to read the exit code and destroy the
> task struct.
>
Thanks for bringing this up; it gave me an opportunity to think about it.
The core of the problem is put_task_struct(), and who drops the last
reference. It should be:
finish_task_switch()->put_task_struct_rcu_user()->delayed_put_task_struct()->put_task_struct(),
under if (unlikely(prev_state == TASK_DEAD)). It does not depend on another task.
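Put differently, this is plain reference counting: the object lives until its last reference is dropped, whichever context drops it. A minimal userspace sketch in C11, assuming `<stdatomic.h>`; `struct obj`, `obj_get()` and `obj_put()` are hypothetical stand-ins for task_struct, get_task_struct() and put_task_struct(), not the kernel's implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a task_struct-like object; not the kernel's type. */
struct obj {
	atomic_int usage;   /* reference count, like a task's usage count */
	bool *freed;        /* lets the caller observe when the object is released */
};

/* Create an object that holds one initial reference. */
static struct obj *obj_alloc(bool *freed)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->usage, 1);
	o->freed = freed;
	*freed = false;
	return o;
}

/* Take an extra reference (cf. get_task_struct()). */
static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->usage, 1);
}

/* Drop a reference; whoever drops the last one frees the object
 * (cf. put_task_struct()). Returns true if this call freed it. */
static bool obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->usage, 1) == 1) {
		*o->freed = true;
		free(o);
		return true;
	}
	return false;
}
```

Here a second holder keeps the object alive across the first put; the second put releases it.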
> See do_exit() in kernel/exit.c. It ends with do_task_dead().
> It is the point when the process goes into TASK_DEAD state.
>
> For a good example, see lib/test_vmalloc.c. The kthread waits
> until someone wants it to stop:
>
> static int test_func(void *private)
> {
> 	[...]
>
> 	/*
> 	 * Wait for the kthread_stop() call.
> 	 */
> 	while (!kthread_should_stop())
> 		msleep(10);
>
> 	return 0;
> }
>
> The kthreads are started and stopped in:
>
> static void do_concurrent_test(void)
> {
> 	[...]
> 	for (i = 0; i < nr_threads; i++) {
> 		[...]
> 		t->task = kthread_run(test_func, t, "vmalloc_test/%d", i);
> 	[...]
> 	/*
> 	 * Sleep quiet until all workers are done with 1 second
> 	 * interval. Since the test can take a lot of time we
> 	 * can run into a stack trace of the hung task. That is
> 	 * why we go with completion_timeout and HZ value.
> 	 */
> 	do {
> 		ret = wait_for_completion_timeout(&test_all_done_comp, HZ);
> 	} while (!ret);
> 	[...]
> 	for (i = 0; i < nr_threads; i++) {
> 		[...]
> 		if (!IS_ERR(t->task))
> 			kthread_stop(t->task);
> 	[...]
> }
They are good and elegant examples.
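For my own understanding, the same handshake can be modelled in userspace, with pthread_join() standing in for the stopper that collects the exit code (kthread_stop() likewise returns the result of the thread function). A minimal sketch, assuming C11 atomics and POSIX threads; `struct worker`, `worker_start()`, `worker_stop()` and the exit code 42 are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical userspace model of the kthread_should_stop()/kthread_stop()
 * handshake; not kernel API. */
struct worker {
	pthread_t thread;
	atomic_bool should_stop;
};

/* Thread body: do the work, then park until asked to stop (cf. test_func()). */
static void *worker_fn(void *arg)
{
	struct worker *w = arg;
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	/* ... the real work would go here ... */

	while (!atomic_load(&w->should_stop))
		nanosleep(&ts, NULL);   /* poll every 10 ms, cf. msleep(10) */

	return (void *)(intptr_t)42;   /* hypothetical exit code for the stopper */
}

/* Start the thread; returns 0 on success like pthread_create(). */
static int worker_start(struct worker *w)
{
	atomic_init(&w->should_stop, false);
	return pthread_create(&w->thread, NULL, worker_fn, w);
}

/* Ask the thread to stop, wait for it and return its exit code
 * (cf. kthread_stop() returning the thread function's result). */
static int worker_stop(struct worker *w)
{
	void *ret;

	atomic_store(&w->should_stop, true);
	pthread_join(w->thread, &ret);
	return (int)(intptr_t)ret;
}
```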
>
>
> You do not have to solve this if you use the system workqueue
> (system_wq).
>
Yes, workqueue is a better choice.
Thanks for your great patience.
Regards,
Pingfan