Too much error in __const_udelay() ?

From: john stultz
Date: Sat Jun 05 2004 - 02:14:06 EST


[Resending due to lkml bouncing me for having html attachments]

Hey all,
I've been hunting a bug on one of our systems and it seems to closely
resemble the __delay() issues we saw w/ the ACPI PM time source.

Earlier when implementing the ACPI PM time source, we found that we
couldn't use the actual ACPI PM time source for the __delay function, or
else various subsystems would break. I chalked it up to the 8MHz counter
being too low res for some callers of delay(), causing the waits to be
far too long. We backed off to just using the TSC for delay() and things
seemed to get better.

However I've started to see some problems w/ 2.6 and USB on x440/x445s,
both of which use the 100MHz cyclone time source. Further digging has
pointed to the fact that certain important udelay()s in the USB
subsystem aren't actually waiting long enough.

So far I've narrowed it down to the scaled math bits in
__const_udelay() causing too much error for loops_per_jiffy values
around the 100,000 level the cyclone timesource uses.
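
For readers without an i386 box handy, the scaled math can be sketched in
portable C roughly as follows. This is an illustration and not the kernel
source: scaled_udelay_loops() and USEC_SCALE are hypothetical names, lpj and
hz are passed as parameters only to avoid globals, and the 64-bit multiply
followed by a 32-bit right shift stands in for what mull leaves in %edx.

```c
#include <assert.h>
#include <stdint.h>

/* 2**32 / 1000000, truncated: the per-usec scale factor __udelay() uses */
#define USEC_SCALE 0x000010c6u

/*
 * Hypothetical portable stand-in for __udelay() -> __const_udelay().
 * lpj is loops_per_jiffy, hz is the tick rate.
 */
uint32_t scaled_udelay_loops(uint32_t usecs, uint32_t lpj, uint32_t hz)
{
	uint32_t xloops = usecs * USEC_SCALE;
	/* high 32 bits of the 64-bit product: loops / hz, rounded down */
	uint32_t loops_div_hz = (uint32_t)(((uint64_t)xloops * lpj) >> 32);

	assert(hz != 0);
	/* the rounding already happened, so the result is a multiple of hz */
	return loops_div_hz * hz;
}
```

Because the shift truncates before the multiply by hz, the result can only
ever be a multiple of hz. With usecs=10, lpj=100000 and hz=1000 the product
stays below 2**32, so the function returns 0.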

To demonstrate this, I wrote the attached demo app using the
__const_udelay code. For those not wanting to run it themselves, its
output (for HZ=1000) looks like:

  1 usec: LPJ:  100000 __udelay:     0 vs my_udelay:   100
  1 usec: LPJ: 1500000 __udelay:  1000 vs my_udelay:  1500

  2 usec: LPJ:  100000 __udelay:     0 vs my_udelay:   200
  2 usec: LPJ: 1500000 __udelay:  2000 vs my_udelay:  3000

  5 usec: LPJ:  100000 __udelay:     0 vs my_udelay:   500
  5 usec: LPJ: 1500000 __udelay:  7000 vs my_udelay:  7500

 10 usec: LPJ:  100000 __udelay:     0 vs my_udelay:  1000
 10 usec: LPJ: 1500000 __udelay: 14000 vs my_udelay: 15000

 20 usec: LPJ:  100000 __udelay:  1000 vs my_udelay:  2000
 20 usec: LPJ: 1500000 __udelay: 29000 vs my_udelay: 30000

 50 usec: LPJ:  100000 __udelay:  4000 vs my_udelay:  5000
 50 usec: LPJ: 1500000 __udelay: 74000 vs my_udelay: 75000

100 usec: LPJ:  100000 __udelay:  9000 vs my_udelay: 10000
100 usec: LPJ: 1500000 __udelay: 149000 vs my_udelay: 150000



Here you can see that w/ the cyclone LPJ of 100000, __udelay() fails to
be even close to accurate until ~50usec and returns zero for values less
than 20usec.

I then went and measured the same udelay() values via rdtsc in the
kernel for both the cyclone based delay() as well as the tsc based
delay. The results are attached in the html file. You'll notice this
closely matches the results from the demo app.

This issue hasn't bitten me before w/ 2.4 because (as you can show w/
the demo app) __const_udelay() is more accurate w/ HZ=100.
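
One way to see the HZ dependence: the truncation drops just under one unit
that is later multiplied by HZ, so up to HZ loops are lost, and at a counter
rate of LPJ*HZ loops per second that is just under 1/LPJ seconds. The sketch
below only computes that bound. truncation_bound_nsec() is a hypothetical
helper, and it ignores the much smaller error from the truncated 0x10c6
scale factor.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: upper bound, in nanoseconds, on the delay lost
 * to the truncation in __const_udelay() for a given loops_per_jiffy.
 * Losing HZ loops at LPJ*HZ loops/sec costs 1/LPJ seconds.
 */
uint32_t truncation_bound_nsec(uint32_t lpj)
{
	assert(lpj != 0);
	return 1000000000u / lpj;
}
```

For the cyclone counter this gives 10000 nsec at HZ=1000 (LPJ 100000) but
only 1000 nsec at HZ=100 (LPJ 1000000), and 666 nsec for the 1.5GHz TSC at
HZ=1000.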

I tried replacing __const_udelay w/ my_udelay() but it didn't boot
(overflow issues, I'm guessing).

So I'm no math wiz. What's the proper fix here?
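
As a starting point for discussion, one possible direction is to apply HZ
before the high 32 bits are taken, so the truncation costs at most one loop
instead of up to HZ loops. The sketch below is a user-space illustration
under that assumption, not a tested kernel patch: wide_udelay_loops() and
USEC_SCALE are hypothetical names, and whether a wider multiply is acceptable
in the real delay path is an open question.

```c
#include <assert.h>
#include <stdint.h>

/* 2**32 / 1000000, truncated: the per-usec scale factor __udelay() uses */
#define USEC_SCALE 0x000010c6u

/*
 * Hypothetical variant of the scaled math that multiplies by hz before
 * shifting, then rounds up so the delay is never shorter than requested.
 */
uint32_t wide_udelay_loops(uint32_t usecs, uint32_t lpj, uint32_t hz)
{
	uint64_t xloops = (uint64_t)usecs * USEC_SCALE;

	assert(lpj != 0 && hz != 0);
	/* refuse inputs whose full product would not fit in 64 bits */
	assert(xloops <= UINT64_MAX / lpj / hz);

	/* scale by lpj * hz first, truncate once at the end, round up */
	return (uint32_t)((xloops * lpj * hz) >> 32) + 1;
}
```

With lpj=100000 and hz=1000 this yields 100 loops for 1 usec and 1000 loops
for 10 usec, where the current math yields 0 for both.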

thanks
-john
#include <stdio.h>

#define MILLION 1000000
#define HZ 1000

/* counter frequencies and the loops_per_jiffy values they imply */
unsigned long tsc_freq = 1500000000UL;
unsigned long cyclone_freq = 100000000UL;
unsigned long tsc_lpj = 1500000000UL/HZ;
unsigned long cyclone_lpj = 100000000UL/HZ;
unsigned long LPJ;

/* stub: return the loop count instead of actually spinning */
unsigned long __delay(unsigned long loops)
{
	return loops;
}

/* i386 only (needs 32-bit unsigned long); build w/ -m32 on x86-64 */
unsigned long __const_udelay(unsigned long xloops)
{
	int d0;

	/* edx:eax = xloops * LPJ; only the high 32 bits (edx) are kept */
	__asm__("mull %0"
		:"=d" (xloops), "=&a" (d0)
		:"1" (xloops),"0" (LPJ));
	return __delay(xloops * HZ);
}

/* straightforward reference: usecs times counter ticks per usec */
unsigned long my_udelay(unsigned long usec)
{
	unsigned long cyc_per_usec = (LPJ*HZ)/MILLION;

	return __delay(usec*cyc_per_usec);
}

unsigned long __udelay(unsigned long usecs)
{
	return __const_udelay(usecs * 0x000010c6); /* 2**32 / 1000000 */
}

unsigned long __ndelay(unsigned long nsecs)
{
	return __const_udelay(nsecs * 0x00005); /* 2**32 / 1000000000 (rounded up) */
}

int main(void)
{
	unsigned long time[7] = {1,2,5,10,20,50,100};
	int i;

	for (i = 0; i < 7; i++) {
		/* time[] is unsigned long, so print it w/ %lu rather than %i */
		LPJ = cyclone_lpj;
		printf("%3lu usec: LPJ: %7lu __udelay: %5lu vs my_udelay: %5lu\n",
			time[i], LPJ, __udelay(time[i]), my_udelay(time[i]));
		LPJ = tsc_lpj;
		printf("%3lu usec: LPJ: %7lu __udelay: %5lu vs my_udelay: %5lu\n",
			time[i], LPJ, __udelay(time[i]), my_udelay(time[i]));
		printf("\n");
	}

	return 0;
}

Attachment: delay-differences.html.gz
Description: GNU Zip compressed data