[RFC][PATCH] sched: might_sleep(): do rate-limiting before sanity checks

From: Dave Hansen
Date: Wed Jun 24 2015 - 20:04:07 EST

From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>

I have a dumb microbenchmark. It loops doing single-byte writes
to a file. I have a few other patches to work on some things in
the filesystem write path. But after those are applied, the
fourth-hottest kernel function is ___might_sleep(), which seems a
bit silly.
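The benchmark itself is not included in this posting. As a purely hypothetical sketch of the kind of loop described (the function name, the /tmp path template, and the use of pwrite() at offset 0 to keep the file at one byte are all assumptions), each iteration issues one single-byte write, so every call goes through the kernel's write path:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical reconstruction of a "single-byte writes to a file" loop.
 * Assumption: each iteration is one pwrite() of one byte at offset 0,
 * so the file never grows beyond one byte.
 * Returns the number of successful writes, or -1 if setup fails.
 */
static long single_byte_write_loop(long iterations)
{
	char path[] = "/tmp/onebyteXXXXXX";
	const char byte = 'x';
	long done = 0;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the file is freed once fd is closed */

	for (long i = 0; i < iterations; i++) {
		if (pwrite(fd, &byte, 1, 0) != 1)
			break;
		done++;
	}
	close(fd);
	return done;
}
```

Timing this loop (for example with perf) is where a function like ___might_sleep() would show up in a profile.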

I narrowed the overhead down to the pushf/pop pair in
native_save_fl(), which sits underneath the irqs_disabled() call.
Those instructions seem to be much slower than they should be, so
they presumably serialize something in the CPU.
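For reference, on x86-64 irqs_disabled() boils down to reading the flags register and testing the interrupt-enable bit (bit 9, which the kernel calls X86_EFLAGS_IF). The user-space sketch below approximates that. It substitutes the GCC/Clang builtin __builtin_ia32_readeflags_u64() for the kernel's inline pushf/pop (compilers typically emit the same pushf/pop pair for it), and the names model_irqs_disabled() and MODEL_EFLAGS_IF are made up for illustration:

```c
/* Interrupt-enable flag in RFLAGS (bit 9); X86_EFLAGS_IF in the kernel. */
#define MODEL_EFLAGS_IF 0x200UL

/*
 * Hypothetical user-space analogue of irqs_disabled() on x86-64:
 * read the flags register, then test the IF bit.
 * Returns 1 if IF is clear, 0 if it is set, and -1 on architectures
 * this model does not cover.
 */
static int model_irqs_disabled(void)
{
#if defined(__x86_64__)
	/* Stand-in for native_save_fl()'s inline "pushf ; pop". */
	unsigned long long flags = __builtin_ia32_readeflags_u64();

	return !(flags & MODEL_EFLAGS_IF);
#else
	return -1;
#endif
}
```

User-mode code on Linux always runs with IF set, so on x86-64 this reports 0; the point is only to show that the check is a flags-register read plus a mask.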

We already rate-limit the might_sleep() reports. But we do the
rate-limiting *after* we check the other conditions for
might_sleep(), including the (costly) irqs_disabled() call.

If we flip these around and rate-limit _before_ the other checks,
I see a boost in the microbenchmark.
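A minimal user-space model of the two orderings may make the difference concrete. Everything here is hypothetical: sim_jiffies stands in for jiffies, context_ok() stands in for the preempt_count/irqs_disabled()/idle checks and counts how often it runs, and MODEL_HZ = 100 is an assumed tick rate. model_time_before() mirrors the kernel's time_before() comparison without its typecheck():

```c
#include <stdbool.h>

#define MODEL_HZ 100	/* assumed tick rate for this model */
#define model_time_before(a, b) ((long)((a) - (b)) < 0)

static unsigned long sim_jiffies;	/* simulated tick counter */
static unsigned long expensive_calls;	/* times the costly check ran */

/* Stand-in for the costly context checks; always "valid" here. */
static bool context_ok(void)
{
	expensive_calls++;
	return true;
}

/* Old order: context checks first, then rate limiting. */
static bool might_sleep_old(void)
{
	static unsigned long prev_jiffy;

	if (context_ok())
		return false;
	if (model_time_before(sim_jiffies, prev_jiffy + MODEL_HZ) && prev_jiffy)
		return false;
	prev_jiffy = sim_jiffies;
	return true;	/* would print the "BUG: sleeping function..." splat */
}

/* Patched order: rate limiting first, then context checks. */
static bool might_sleep_new(void)
{
	static unsigned long prev_jiffy;

	if (model_time_before(sim_jiffies, prev_jiffy + MODEL_HZ) && prev_jiffy)
		return false;
	prev_jiffy = sim_jiffies;

	if (context_ok())
		return false;
	return true;
}
```

With the simulated clock held still, 1000 calls from a valid context run the costly check 1000 times in the old order and only once in the new one. The model also exposes the trade-off of the new order: a call from a valid context now opens the one-second window too, so a real violation arriving within that second would not be reported.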

The downside is that we now update the global 'prev_jiffy' more
often: it used to be written only when a splat was actually
printed, and now it is written by any call that lands outside the
rate-limit window. But it is still written at most once every HZ
jiffies, i.e. once per second. I tested this on an 80-core system,
and my test scales better with this patch applied than without it,
which reassures me that the global updates to 'prev_jiffy' won't
be that painful in practice.
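To check the "at most once per second" arithmetic, this hypothetical model runs only the rate-limit prologue that the patch moves to the top, once per simulated tick, and counts the writes to prev_jiffy. MODEL_HZ = 100 and the helper name are assumptions; the comparison mirrors time_before() without the kernel's typecheck():

```c
#define MODEL_HZ 100	/* assumed tick rate for this model */
#define model_time_before(a, b) ((long)((a) - (b)) < 0)

/*
 * Run the patched rate-limit prologue once per tick for 'ticks' ticks,
 * starting at jiffy 'start' (assumed nonzero), and return how many
 * times prev_jiffy was written.
 */
static unsigned long count_prev_jiffy_updates(unsigned long start,
					      unsigned long ticks)
{
	unsigned long prev_jiffy = 0;
	unsigned long updates = 0;

	for (unsigned long i = 0; i < ticks; i++) {
		unsigned long j = start + i;

		if (model_time_before(j, prev_jiffy + MODEL_HZ) && prev_jiffy)
			continue;	/* rate-limited: read only, no write */
		prev_jiffy = j;
		updates++;
	}
	return updates;
}
```

Simulating ten seconds (10 * MODEL_HZ ticks) gives ten writes, one per window; every other call only reads the shared variable.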

Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
---

 b/kernel/sched/core.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff -puN kernel/sched/core.c~might-sleep-ratelimit-first kernel/sched/core.c
--- a/kernel/sched/core.c~might-sleep-ratelimit-first 2015-06-24 16:57:24.643850450 -0700
+++ b/kernel/sched/core.c 2015-06-24 16:57:24.650850764 -0700
@@ -7330,13 +7330,14 @@ void ___might_sleep(const char *file, in
 	static unsigned long prev_jiffy; /* ratelimiting */
 
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
+	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
+		return;
+	prev_jiffy = jiffies;
+
 	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
 	     !is_idle_task(current)) ||
 	    system_state != SYSTEM_RUNNING || oops_in_progress)
 		return;
-	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
-		return;
-	prev_jiffy = jiffies;
 
 	printk(KERN_ERR
 		"BUG: sleeping function called from invalid context at %s:%d\n",
_