Re: [PATCH][resubmit] x86: enable preemption in delay

From: Marin Mitov
Date: Mon Jun 09 2008 - 12:12:40 EST


On Monday 09 June 2008 03:13:09 pm Ingo Molnar wrote:
>
> * Marin Mitov <mitov@xxxxxxxxxxx> wrote:
>
> > Hi all,
> >
> > This is a resubmit of the patch proposed by Ingo and me few month ago:
> >
> > http://lkml.org/lkml/2007/11/20/343
> >
> > It was in -mm for a while and then removed due to a move into the
> > mainline, but it never appeared in it.
>
> hm, we've got this overlapping commit in -tip for the same general
> problem:
>
> | # x86/urgent: e71e716: x86: enable preemption in delay
> |
> | From: Steven Rostedt <rostedt@xxxxxxxxxxx>
> | Date: Sun, 25 May 2008 11:13:32 -0400
> | Subject: [PATCH] x86: enable preemption in delay
>
> i think Thomas had a concern with the original fix - forgot the details.

Here they are:

http://lkml.org/lkml/2008/5/25/251

but see my comment to it.

>
> Ingo

There is no principal difference between both patches. I have seen Steven's
one as merged in linux-2.6.26-rc5. The only difference (if it matters of all) is
that in mine patch preempt_disable()/preempt_enable() sections are shorter
and protect only the code that must be protected:

preempt_disable()
rdtscl()
smp_processor_id()
preempt_enable()

As far as Steven's patch is already merged - let it be :-)

Regards

Marin Mitov

>
> ------------------>
> commit e71e716c531557308895598002bee24c431d3be1
> Author: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Date: Sun May 25 11:13:32 2008 -0400
>
> x86: enable preemption in delay
>
> The RT team has been searching for a nasty latency. This latency shows
> up out of the blue and has been seen to be as big as 5ms!
>
> Using ftrace I found the cause of the latency.
>
> pcscd-2995 3dNh1 52360300us : irq_exit (smp_apic_timer_interrupt)
> pcscd-2995 3dN.2 52360301us : idle_cpu (irq_exit)
> pcscd-2995 3dN.2 52360301us : rcu_irq_exit (irq_exit)
> pcscd-2995 3dN.1 52360771us : smp_apic_timer_interrupt (apic_timer_interrupt
> )
> pcscd-2995 3dN.1 52360771us : exit_idle (smp_apic_timer_interrupt)
>
> Here's an example of a 400 us latency. pcscd took a timer interrupt and
> returned with "need resched" enabled, but did not reschedule until after
> the next interrupt came in at 52360771us 400us later!
>
> At first I thought we somehow missed a preemption check in entry.S. But
> I also noticed that this always seemed to happen during a __delay call.
>
> pcscd-2995 3dN.2 52360836us : rcu_irq_exit (irq_exit)
> pcscd-2995 3.N.. 52361265us : preempt_schedule (__delay)
>
> Looking at the x86 delay, I found my problem.
>
> In git commit 35d5d08a085c56f153458c3f5d8ce24123617faf, Andrew Morton
> placed preempt_disable around the entire delay due to TSC's not working
> nicely on SMP. Unfortunately for those that care about latencies this
> is devastating! Especially when we have callers to mdelay(8).
>
> Here I enable preemption during the loop and account for anytime the task
> migrates to a new CPU. The delay asked for may be extended a bit by
> the migration, but delay only guarantees that it will delay for that minimum
> time. Delaying longer should not be an issue.
>
> [
> Thanks to Thomas Gleixner for spotting that cpu wasn't updated,
> and to place the rep_nop between preempt_enabled/disable.
> ]
>
> Signed-off-by: Steven Rostedt <srostedt@xxxxxxxxxx>
> Cc: akpm@xxxxxxxx
> Cc: Clark Williams <clark.williams@xxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: "Luis Claudio R. Goncalves" <lclaudio@xxxxxxxx>
> Cc: Gregory Haskins <ghaskins@xxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Andi Kleen <andi-suse@xxxxxxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> diff --git a/arch/x86/lib/delay_32.c b/arch/x86/lib/delay_32.c
> index 4535e6d..d710f2d 100644
> --- a/arch/x86/lib/delay_32.c
> +++ b/arch/x86/lib/delay_32.c
> @@ -44,13 +44,36 @@ static void delay_loop(unsigned long loops)
> static void delay_tsc(unsigned long loops)
> {
> unsigned long bclock, now;
> + int cpu;
>
> - preempt_disable(); /* TSC's are per-cpu */
> + preempt_disable();
> + cpu = smp_processor_id();
> rdtscl(bclock);
> - do {
> - rep_nop();
> + for (;;) {
> rdtscl(now);
> - } while ((now-bclock) < loops);
> + if ((now - bclock) >= loops)
> + break;
> +
> + /* Allow RT tasks to run */
> + preempt_enable();
> + rep_nop();
> + preempt_disable();
> +
> + /*
> + * It is possible that we moved to another CPU, and
> + * since TSC's are per-cpu we need to calculate
> + * that. The delay must guarantee that we wait "at
> + * least" the amount of time. Being moved to another
> + * CPU could make the wait longer but we just need to
> + * make sure we waited long enough. Rebalance the
> + * counter for this CPU.
> + */
> + if (unlikely(cpu != smp_processor_id())) {
> + loops -= (now - bclock);
> + cpu = smp_processor_id();
> + rdtscl(bclock);
> + }
> + }
> preempt_enable();
> }
>
> diff --git a/arch/x86/lib/delay_64.c b/arch/x86/lib/delay_64.c
> index bbc6105..4c441be 100644
> --- a/arch/x86/lib/delay_64.c
> +++ b/arch/x86/lib/delay_64.c
> @@ -31,14 +31,36 @@ int __devinit read_current_timer(unsigned long *timer_value)
> void __delay(unsigned long loops)
> {
> unsigned bclock, now;
> + int cpu;
>
> - preempt_disable(); /* TSC's are pre-cpu */
> + preempt_disable();
> + cpu = smp_processor_id();
> rdtscl(bclock);
> - do {
> - rep_nop();
> + for (;;) {
> rdtscl(now);
> + if ((now - bclock) >= loops)
> + break;
> +
> + /* Allow RT tasks to run */
> + preempt_enable();
> + rep_nop();
> + preempt_disable();
> +
> + /*
> + * It is possible that we moved to another CPU, and
> + * since TSC's are per-cpu we need to calculate
> + * that. The delay must guarantee that we wait "at
> + * least" the amount of time. Being moved to another
> + * CPU could make the wait longer but we just need to
> + * make sure we waited long enough. Rebalance the
> + * counter for this CPU.
> + */
> + if (unlikely(cpu != smp_processor_id())) {
> + loops -= (now - bclock);
> + cpu = smp_processor_id();
> + rdtscl(bclock);
> + }
> }
> - while ((now-bclock) < loops);
> preempt_enable();
> }
> EXPORT_SYMBOL(__delay);
>


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/