Re: sched_yield() makes OpenLDAP slow

From: Con Kolivas
Date: Wed Aug 17 2005 - 19:57:47 EST


On Thu, 18 Aug 2005 10:50 am, Bernardo Innocenti wrote:
> Hello,
>
> I've been investigating a performance problem on a
> server using OpenLDAP 2.2.26 for nss resolution and
> running kernel 2.6.12.
>
> When a CPU bound process such as GCC is running in the
> background (even at nice 10), many trivial commands such
> as "su" or "groups" become extremely slow and take a few
> seconds to complete.
>
> strace revealed that data exchange over the slapd socket
> was where most of the time was spent. Looking at the
> slapd side, I see several calls to sched_yield() like this:
>
>
> [pid 8780] 0.000033 stat64("gidNumber.dbb", 0xb7b3ebcc) = -1 EACCES (Permission denied)
> [pid 8780] 0.000059 pread(20, "\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3"..., 4096, 4096) = 4096
> [pid 8780] 0.000083 pread(20, "\0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0<\7\1\5\370"..., 4096, 16384) = 4096
> [pid 8780] 0.000078 time(NULL) = 1124322520
> [pid 8780] 0.000066 pread(11, "\0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\0000"..., 4096, 688128) = 4096
> [pid 8780] 0.000241 write(19, "0e\2\1\3d`\4$cn=bernie,ou=group,dc=d"..., 103) = 103
> [pid 8780] 0.000137 sched_yield( <unfinished ...>
> [pid 8781] 0.050020 <... sched_yield resumed> ) = 0
> [pid 8780] 0.000025 <... sched_yield resumed> ) = 0
> [pid 8781] 0.000060 futex(0x925ab20, FUTEX_WAIT, 33, NULL <unfinished ...>
> [pid 8780] 0.000026 write(19, "0\f\2\1\3e\7\n\1\0\4\0\4\0", 14) = 14
> [pid 8774] 0.000774 <... select resumed> ) = 1 (in [19])
>
>
> The relative timestamps show that about 50ms pass before
> sched_yield() returns. Meanwhile, GCC is probably being
> scheduled for a whole quantum.
>
> Reading the man-page of sched_yield() it seems this isn't
> the correct behavior:
>
> Note: If the current process is the only process in the
> highest priority list at that time, this process will
> continue to run after a call to sched_yield.
>
> I also think OpenLDAP is wrong. First, it should be calling
> pthread_yield() because slapd is a multithreaded process
> and it just wants to run the other threads. See:

The behaviour of sched_yield() changed in the 2.5 series, more than 3 years
ago. Applications that use it as a locking primitive should be updated.
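
For illustration only, here is a minimal sketch of what "updated" could look
like: a thread that waits for a flag by blocking on a condition variable
rather than looping on sched_yield(). This is not taken from slapd; the
names ready_flag, ready_flag_wait(), ready_flag_set() and ready_flag_demo()
are made up for the example, and it assumes plain POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot "ready" flag; not OpenLDAP code. */
struct ready_flag {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
};

void ready_flag_init(struct ready_flag *f)
{
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->cond, NULL);
    f->ready = false;
}

/*
 * Pattern to avoid:
 *     while (!f->ready) sched_yield();
 * The waiter stays runnable and depends on the scheduler to let the
 * other thread in. Blocking on a condition variable instead takes the
 * waiter off the run queue until it is explicitly woken.
 */
void ready_flag_wait(struct ready_flag *f)
{
    pthread_mutex_lock(&f->lock);
    while (!f->ready)                 /* re-check: wakeups can be spurious */
        pthread_cond_wait(&f->cond, &f->lock);
    pthread_mutex_unlock(&f->lock);
}

void ready_flag_set(struct ready_flag *f)
{
    pthread_mutex_lock(&f->lock);
    f->ready = true;
    pthread_cond_broadcast(&f->cond); /* wake every waiter */
    pthread_mutex_unlock(&f->lock);
}

static void *setter_thread(void *arg)
{
    ready_flag_set(arg);
    return NULL;
}

/* Small self-check: returns 1 once the waiter has seen the flag. */
int ready_flag_demo(void)
{
    struct ready_flag f;
    pthread_t tid;
    int rc;

    ready_flag_init(&f);
    rc = pthread_create(&tid, NULL, setter_thread, &f);
    assert(rc == 0);
    ready_flag_wait(&f);
    pthread_join(tid, NULL);
    return f.ready ? 1 : 0;
}
```

Build with -pthread. The waiter no longer cares how the scheduler treats a
yield, since it sleeps until ready_flag_set() wakes it.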

Cheers,
Con