kthread behavior question
From: Erik Lotspeich
Date: Thu Aug 31 2017 - 00:08:21 EST
Hi,
I have seen a behavior with kernel threads that I do not understand. I
would like to know whether there is a real issue in the kernel or whether
I am expecting something that is not possible. I have donned the flame suit.
I would expect the kernel module listed below to load successfully and
run, albeit with a very heavy load on the system. In the following
environments, this is the case:
* OpenSuSE 12.3 x86_64 (kernel: 3.7.10-1.45-desktop), VMware Workstation
12.5.7, 1GB RAM, 2 processors
* OpenSuSE Leap 42.1 x86_64 (kernel: 4.1.39-56-default), VMware
Workstation 12.5.7, 4GB RAM, 4 processors
I would expect the behavior described above because kernel threads run
in process context and the timer interrupt should cause the scheduler to
run, allowing the system to keep functioning even under the load caused
by the busy loops in the threads.
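I don't know whether it matters here, but the kernels above may well differ
in their preemption model, which affects whether the tick can actually force
a busy kthread off the CPU. As a rough sketch (not part of the module below;
report_preempt_model is just an illustrative helper that would slot into the
init function, which already pulls in linux/kernel.h), this is the kind of
check that could log it:

static void report_preempt_model(void)
{
#if defined(CONFIG_PREEMPT)
        printk("preemption model: full (CONFIG_PREEMPT)\n");
#elif defined(CONFIG_PREEMPT_VOLUNTARY)
        printk("preemption model: voluntary (CONFIG_PREEMPT_VOLUNTARY)\n");
#else
        printk("preemption model: none (CONFIG_PREEMPT_NONE)\n");
#endif
}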
What boggles my mind is why insmodding this kernel module on bare-metal
systems (no VM) causes an instant lockup. There is no output on the VGA
or serial console at all; not even the printks that should be issued
before the threads are started appear, which implies the console buffers
could not be flushed. It seems that in this case the timer interrupt is
getting lost or something, which is weird because interrupts should not
be disabled for normal kthreads. Here are the bare-metal systems I've
tried:
* Slackware i686 (kernel: custom Linux 3.18.18), Intel Pentium III, 512MB RAM
* OpenSuSE Leap 42.2 x86_64 (kernel: 4.4.79-18.26-default), Intel
Celeron J3060 (2 cores, no hyperthreading), 4GB RAM
I don't understand the difference between bare metal and a VM. If the
lockup were the expected behavior, I would expect the module not to work
in the VM environments either. The behavior also does not seem to depend
on 3.x vs. 4.x kernels or on 32-bit vs. 64-bit.
Of course, if I enable the yield() call in the code below (YIELD_ENABLED
set to 1), the kernel module loads and runs as expected in all cases.
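For comparison, I assume (untested) that cond_resched() in place of yield()
would behave similarly, since it is the usual way to put an explicit
rescheduling point into a long-running kernel loop. A sketch of that variant
of the thread function (test_thread_alt is just an illustrative name; it
would slot into the module below, which already includes linux/kthread.h):

/* Sketch only (untested): test_thread() below with cond_resched()
 * instead of yield() as the explicit rescheduling point. */
static int test_thread_alt(void *arg)
{
        while (!kthread_should_stop()) {
                /* the counter increment would go here, as in the module below */
                cond_resched();   /* yield the CPU only if a reschedule is pending */
        }
        return 0;
}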
Any advice, thoughts, questions, or comments greatly appreciated.
Thanks!
Regards
Erik
P.S. Yes, I know I should have locking around my counter variable if
this were a real program doing important work, but I'm trying to create
the simplest example that shows the behavior.
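(For reference, a properly synchronized version of the counter could simply
be an atomic_t; the plain int below is deliberate to keep the reproducer
minimal. A sketch, not used in the module:

#include <linux/atomic.h>

static atomic_t val = ATOMIC_INIT(0);   /* instead of the volatile int below */

/* in test_thread():    atomic_inc(&val);                        */
/* in monitor_thread(): printk("val: %d\n", atomic_read(&val));  */
)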
** Begin threadtest.c **
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/err.h>

#define NTHREADS      10
#define YIELD_ENABLED 0

static struct task_struct *test_th[NTHREADS];
static struct task_struct *monitor_th;
static volatile int val = 0;

static int monitor_thread(void *);
static int test_thread(void *);

/* Periodically report the counter so progress is visible on the console. */
static int monitor_thread(void *arg)
{
        while (!kthread_should_stop()) {
                printk("%s val: %d\n", __func__, val);
                msleep_interruptible(5000);
        }
        return 0;
}

/* Busy-loop incrementing the shared counter until asked to stop. */
static int test_thread(void *arg)
{
        int *num = arg;

        printk("%s %d started\n", __func__, *num);
        while (!kthread_should_stop()) {
                val++;
#if YIELD_ENABLED
                yield();
#endif
        }
        printk("%s %d stopping\n", __func__, *num);
        kfree(num);
        return 0;
}

static int __init threadtest_init(void)
{
        int i;

        printk("%s\n", __func__);
        for (i = 0; i < NTHREADS; i++) {
                int *num = kmalloc(sizeof(int), GFP_KERNEL);

                if (!num)
                        continue;
                *num = i;
                test_th[i] = kthread_run(test_thread, num, "test_th_%d", i);
                if (IS_ERR(test_th[i])) {
                        kfree(num);
                        test_th[i] = NULL;
                }
        }
        monitor_th = kthread_run(monitor_thread, NULL, "monitor_th");
        if (IS_ERR(monitor_th))
                monitor_th = NULL;
        return 0;
}

static void __exit threadtest_exit(void)
{
        int i;

        if (monitor_th)
                kthread_stop(monitor_th);
        for (i = 0; i < NTHREADS; i++) {
                if (test_th[i])
                        kthread_stop(test_th[i]);
        }
        printk("%s val: %d\n", __func__, val);
}

module_init(threadtest_init);
module_exit(threadtest_exit);

MODULE_AUTHOR("");
MODULE_LICENSE("GPL");
** End threadtest.c **