Re: sched_yield() makes OpenLDAP slow
From: Howard Chu
Date: Mon Aug 22 2005 - 18:21:12 EST
Florian Weimer wrote:
* Howard Chu:
> That's not the complete story. BerkeleyDB provides a
> db_env_set_func_yield() hook to tell it what yield function it
> should use when its internal locking routines need such a function.
> If you don't set a specific hook, it just uses sleep(). The
> OpenLDAP backend will invoke this hook during some (not necessarily
> all) init sequences, to tell it to use the thread yield function
> that we selected in autoconf.
And this helps to increase performance substantially?
When the caller is a threaded program, yes, there is a substantial
(measurable and noticeable) difference. Given that sleep() blocks the
entire process, the difference is obvious.
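
For readers who have not used the hook discussed above, here is a minimal
sketch of a yield callback. It assumes the int (*)(void) callback shape that
BerkeleyDB 4.x releases accepted for db_env_set_func_yield(); later releases
changed that signature, so check the db.h you build against. The name
backend_yield is hypothetical, and the registration call appears only in a
comment so the sketch compiles without BerkeleyDB installed. It is not the
actual OpenLDAP backend code.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical yield callback: gives up the CPU without sleeping.
 * sched_yield() returns 0 on success and -1 on error, which matches
 * the "0 means success" convention a BerkeleyDB 4.x hook is assumed
 * to follow. */
static int backend_yield(void)
{
    return sched_yield();
}

/* In backend init code (assumption: BerkeleyDB 4.x headers), the hook
 * would be installed before opening the environment:
 *
 *     #include <db.h>
 *     int rc = db_env_set_func_yield(backend_yield);
 *
 * A nonzero rc would indicate that the hook was rejected. */
```

The autoconf step mentioned above decides which primitive the callback wraps;
sched_yield() is used here only as one plausible choice.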
> Note that (on systems that support inter-process mutexes) a
> BerkeleyDB database environment may be used by multiple processes
> concurrently.
Yes, I know this, and I haven't experienced that much trouble with
deadlocks. Maybe the way you structure and access the database
environment can be optimized for deadlock avoidance?
Maybe we already did this deadlock analysis and optimization, years ago
when we first started developing this backend? Do you think everyone
else in the world is a total fool?
> As such, the yield function that is provided must work both for
> threads within a single process (PTHREAD_SCOPE_PROCESS) as well as
> between processes (PTHREAD_SCOPE_SYSTEM).
If I understand you correctly, what you really need is a syscall
along the lines of "don't run me again until all threads T that share
property X have run, where the Ts aren't necessarily in the same
process". The kernel isn't psychic, it can't really know which
processes to schedule to satisfy such a requirement. I don't even
think "has joined the Berkeley DB environment" is the desired
property, but something like "is part of this cycle in the wait-for
graph" or something similar.
You seem to believe we're looking for special treatment for the
processes we're concerned with, and that's not true. If the system is
busy with other processes, so be it, the system is busy. If you want
better performance, you build a dedicated server and don't let anything
else make the system busy. This is the way mission-critical services are
delivered, regardless of the service. If you're not running on a
dedicated system, then your deployment must not be mission critical, and
so you shouldn't be surprised if a large gcc run slows down some other
activities in the meantime. If you have a large nice'd job running
before your normal priority jobs get their timeslice, then you should
certainly wonder wtf the scheduler is doing, and why your system even
claims to support nice() when clearly it doesn't mean anything on that
system.
I would have to check the Berkeley DB internals in order to tell what
is feasible to implement. This code shouldn't be on the fast path,
so some kernel-based synchronization is probably sufficient.
pthread_cond_wait() probably would be just fine here, but BerkeleyDB
doesn't work that way.
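
To illustrate the alternative being contrasted here, the following is a rough
sketch of blocking on a condition variable instead of spinning on a yield
call. The struct gate type and the gate_* and opener names are hypothetical
and are not BerkeleyDB structures. The sketch covers threads within one
process only; sharing it between processes would additionally require
process-shared attributes and placing the object in shared memory, which is
not shown.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical wait object; not a BerkeleyDB data structure. */
struct gate {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int             open;   /* 0 = resource busy, 1 = resource available */
};

static void gate_init(struct gate *g)
{
    pthread_mutex_init(&g->mu, NULL);
    pthread_cond_init(&g->cv, NULL);
    g->open = 0;
}

/* Block until the gate opens. The waiter sleeps in the kernel rather
 * than repeatedly yielding; the loop guards against spurious wakeups. */
static void gate_wait(struct gate *g)
{
    pthread_mutex_lock(&g->mu);
    while (!g->open)
        pthread_cond_wait(&g->cv, &g->mu);
    pthread_mutex_unlock(&g->mu);
}

/* Mark the resource available and wake every waiter. */
static void gate_open(struct gate *g)
{
    pthread_mutex_lock(&g->mu);
    g->open = 1;
    pthread_cond_broadcast(&g->cv);
    pthread_mutex_unlock(&g->mu);
}

/* Thread entry point that opens the gate passed as its argument. */
static void *opener(void *arg)
{
    gate_open((struct gate *)arg);
    return NULL;
}
```

A caller would run gate_wait() in the thread that needs the resource and
gate_open() in the thread that releases it.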
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/