Re: [PATCH v2] Pre-emption control for userspace

From: Khalid Aziz
Date: Tue Mar 25 2014 - 14:50:01 EST


On 03/25/2014 12:20 PM, Andi Kleen wrote:
Khalid Aziz <khalid.aziz@xxxxxxxxxx> writes:

First it would be nice to have some standard reference lock library that
uses this. What would it take to support this in glibc?

I am not sure if it would be practical and useful to integrate this into any of the standard locking interfaces, but I have not looked into it much either. My initial intent is to let individual apps decide if they could benefit from this interface and code it in if so since the interface is meant to be very simple. Do you see any of the standard locking interfaces where it would make sense to integrate this feature in, or are you thinking of creating a new interface?


+==================================
+Using the preemption delay feature
+==================================
+
+This feature is enabled in the kernel by setting
+CONFIG_SCHED_PREEMPT_DELAY in kernel configuration. Once this feature is
+enabled, the userspace process communicates with the kernel using a
+4-byte memory location in its address space. It first gives the kernel
+address for this memory location by writing its address to
+/proc/<tgid>/task/<tid>/sched_preempt_delay. This memory location is
+interpreted as a sequence of 4 bytes:
+
+ byte[0] = flag to request preemption delay
+ byte[1] = flag from kernel indicating preemption delay was granted
+ byte[2] = reserved for future use
+ byte[3] = reserved for future use

Should reserve more bytes (64, 128?) and rename the proc flag
to a more generic name. I could well assume other things
using such a mechanism in the future. Also please add a flag
word with feature bits (similar to the perf mmap page)

I am reluctant to make it too big since reading larger quantities from userspace will take longer and start to impact performance. Keeping shared data limited to 32-bits allows us to move it between userspace and kernel with one instruction.


How about alignment? x86 will not care, but other architectures
may.


You are right. These 4-bytes need to be aligned to a 4-byte boundary. I will look into adding an alignment check at the time this address is passed to kernel.

#endif
+#ifdef CONFIG_SCHED_PREEMPT_DELAY
+ REG("sched_preempt_delay", S_IRUGO|S_IWUSR,
proc_tid_preempt_delay_ops),

This shouldn't be readable by group/other, as it exposes the address space,
so could help exploits.

I like Oleg's suggestion of using prctl() instead of procfs to pass the userspace address to kernel. Above problem will disappear when I switch to prctl() in v3 of this patch.


@@ -2061,6 +2069,13 @@ extern u64 scheduler_tick_max_deferment(void);
static inline bool sched_can_stop_tick(void) { return false; }
#endif

+#if defined(CONFIG_SCHED_PREEMPT_DELAY) && defined(CONFIG_PROC_FS)
+extern void sched_preempt_delay_show(struct seq_file *m,
+ struct task_struct *task);
+extern void sched_preempt_delay_set(struct task_struct *task,
+ unsigned char *val);
+#endif

Prototypes don't need to be ifdefed.

-Andi


Ah, you are right. Thanks, Andi!

--
Khalid

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/