Re: Query about timer wheel API

From: imran . f . khan
Date: Mon Dec 23 2024 - 09:37:15 EST


Hello Hillf,
On 23/12/2024 11:51 pm, Hillf Danton wrote:
> On Mon, 23 Dec 2024 11:14:21 +1100 imran.f.khan@xxxxxxxxxx
>>
>> Recently we have come across some bugs in the RDS code, where a delayed
>> work was being queued on an offlined CPU and as a result of that the
>
> Such a queue could not happen given irq disabled in queue_delayed_work_on().
> Did you see it upstream?
>
Do you mean upstream RDS or the upstream workqueue code? For RDS I need to check,
but with the upstream v6.6 kernel I was able to submit a delayed work to an
offlined CPU. The delayed work never runs, and I can see the corresponding timer
in the timer list of the offlined CPU (using crash).
Once the CPU is brought back online, depending on the workload, the work handler
gets executed.

I used the following test module:

===============

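#include <linux/module.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/completion.h>
#include <linux/delay.h>
#include <linux/slab.h>
#include <linux/jiffies.h>
#include <linux/mutex.h>	/* DEFINE_MUTEX, mutex_lock, mutex_trylock */
#include <linux/cpumask.h>	/* cpu_online */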

#define TIMEOUT 1 /* test timeout in secs */
#define NUM_WORK_ITEMS 1 /* number of work items to submit */


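/* Serializes writers of the module parameter. */
static DEFINE_MUTEX(mutex);

/* Serializes the work handlers so their log lines do not interleave. */
static DEFINE_MUTEX(dwork_func_mutex);

static void delayed_work_func(struct work_struct *data)
{
	int cpu;

	mutex_lock(&dwork_func_mutex);
	cpu = get_cpu();
	pr_err("%s invoked for work: 0x%px on cpu#%d\n", __func__, data, cpu);
	put_cpu();
	mutex_unlock(&dwork_func_mutex);

	/* The item was kzalloc'ed at submission time; free it once it has run. */
	kfree(to_delayed_work(data));
}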

static int param_set_queue_work_on_cpu(const char *val, const struct kernel_param *kp)
{
	unsigned int cpu;
	int i, ret;
	struct delayed_work *dwork;

	if (!mutex_trylock(&mutex))
		return -EBUSY;

	/* Base 0 accepts decimal, octal and hex, like the old simple_strtoul() call. */
	ret = kstrtouint(val, 0, &cpu);
	if (ret)
		goto out;

	/* Offline CPUs are allowed on purpose; only reject impossible ones. */
	if (cpu >= nr_cpu_ids || !cpu_possible(cpu)) {
		ret = -EINVAL;
		goto out;
	}

	for (i = 0; i < NUM_WORK_ITEMS; i++) {
		dwork = kzalloc(sizeof(*dwork), GFP_KERNEL);
		if (!dwork) {
			ret = -ENOMEM;
			break;
		}
		INIT_DELAYED_WORK(dwork, delayed_work_func);
		queue_delayed_work_on(cpu, system_wq, dwork, msecs_to_jiffies(10000));
		pr_err("Submitted dwork 0x%px on %s cpu#%u\n", dwork,
		       cpu_online(cpu) ? "online" : "offline", cpu);
	}
out:
	mutex_unlock(&mutex);
	return ret;
}

module_param_call(queue_work_on_cpu, param_set_queue_work_on_cpu, NULL, NULL, 0600);

static int __init workqueue_study_init(void)
{
	pr_err("module_init\n");

	return 0;
}

static void __exit workqueue_study_exit(void)
{
	pr_err("module_exit\n");
}

MODULE_AUTHOR("Imran Khan <imran.eie.85@xxxxxxxxx>");
MODULE_DESCRIPTION("Workqueue study");
MODULE_LICENSE("GPL");

module_init(workqueue_study_init);
module_exit(workqueue_study_exit);

===========

This module exposes an interface at:

/sys/module/<module name>/parameters/queue_work_on_cpu

Writing X there submits a delayed_work (delay 10 secs)
to CPU X.

If CPU X is online, the submitted work gets executed
after around 10 secs. But if CPU X is offline, the
work handler does not fire until the CPU has been brought
back online.
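
To make the two cases easier to reproduce from user space, here is a rough
sketch of a helper. It is not part of the module above. The function names
(cpu_in_list, cpu_is_online, submit_delayed_work) are made up for
illustration, and the caller is assumed to pass the full path of the module's
queue_work_on_cpu parameter file. It reads /sys/devices/system/cpu/online,
which holds a list such as "0-3,5", to tell whether the target CPU is online
before the CPU number is written to the parameter file.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Return 1 if cpu appears in a kernel-style CPU list such as "0-3,5",
 * 0 if it does not, and -1 if the list cannot be parsed.
 */
int cpu_in_list(const char *list, unsigned long cpu)
{
	const char *p = list;

	while (*p && *p != '\n') {
		char *end;
		unsigned long lo, hi;

		lo = strtoul(p, &end, 10);
		if (end == p)
			return -1;
		hi = lo;
		if (*end == '-') {
			/* A range "lo-hi": parse the upper bound as well. */
			p = end + 1;
			hi = strtoul(p, &end, 10);
			if (end == p)
				return -1;
		}
		if (cpu >= lo && cpu <= hi)
			return 1;
		p = end;
		if (*p == ',')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}
	return 0;
}

/* Return 1 if cpu is online, 0 if offline, -1 if sysfs cannot be read. */
int cpu_is_online(unsigned long cpu)
{
	char buf[4096];
	FILE *f = fopen("/sys/devices/system/cpu/online", "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = cpu_in_list(buf, cpu);
	fclose(f);
	return ret;
}

/*
 * Write the CPU number to the parameter file named by the caller
 * (assumption: param_path points at the module's queue_work_on_cpu file).
 * Return 0 on success, -1 on failure.
 */
int submit_delayed_work(const char *param_path, unsigned long cpu)
{
	FILE *f = fopen(param_path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", cpu) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling cpu_is_online(X) just before submit_delayed_work(path, X) records
which of the two cases a given run exercises. The check is only a snapshot,
since the CPU can still go offline between the check and the write.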

Thanks,
Imran
>> underlying timer was not firing, which in turn meant that the work was
>> never able to make it to the intended worker_pool.