Re: [1/1] CBUS: new very fast (for insert operations) message busbased on kenel connector.

From: Evgeniy Polyakov
Date: Fri Apr 01 2005 - 05:12:47 EST


On Fri, 2005-04-01 at 01:25 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:
> >
> > On Thu, 2005-03-31 at 23:59 -0800, Andrew Morton wrote:
> > > Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:
> > > >
> > > > On Thu, 2005-03-31 at 23:26 -0800, Andrew Morton wrote:
> > > > > Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > > +static int cbus_event_thread(void *data)
> > > > > > > > +{
> > > > > > > > + int i, non_empty = 0, empty = 0;
> > > > > > > > + struct cbus_event_container *c;
> > > > > > > > +
> > > > > > > > + daemonize(cbus_name);
> > > > > > > > + allow_signal(SIGTERM);
> > > > > > > > + set_user_nice(current, 19);
> > > > > > >
> > > > > > > Please use the kthread api for managing this thread.
> > > > > > >
> > > > > > > Is a new kernel thread needed?
> > > > > >
> > > > > > Logic behind cbus is following:
> > > > > > 1. make insert operation return as soon as possible,
> > > > > > 2. deferring actual message delivering to the safe time
> > > > > >
> > > > > > That thread does second point.
> > > > >
> > > > > But does it need a new thread rather than using the existing keventd?
> > > >
> > > > Yes, it is much cleaner [especially from performance tuning point]
> > > > to use own kernel thread than pospone all work to the queued work.
> > > >
> > >
> > > Why? Unless keventd is off doing something else (rare), it should be
> > > exactly equivalent. And if keventd _is_ off doing something else then that
> > > will slow down this kernel thread too, of course.
> >
> > keventd does very hard jobs on some of my test machines which
> > for example route big amount of traffic.
>
> As I said - that's going to cause _your_ kernel thread to be slowed down as
> well.

Yes, but it does not solve peak performance issues - all scheduled
jobs can run one after another which will decrease insert performance.

> I mean, it's just a choice between two ways of multiplexing the CPU. One
> is via a context switch in schedule() and the other is via list traversal
> in run_workqueue(). The latter will be faster.

But in case of separate thread one can control execution process,
if it will be called from work queue then insert requests
can appear one after another in a very interval,
so theirs processing will hurt insert performance.

> > > Plus keventd is thread-per-cpu and quite possibly would be faster.
> >
> > I experimented with several usage cases for CBUS and it was proven
> > to be the fastest case when only one sending thread exists which manages
> > only very limited amount of messages at a time [like 10 in CBUS
> > currently]
>
> Maybe that's because the cbus data structures are insufficiently
> percpuified. On really big machines that single kernel thread will be a
> big bottleneck.

It is not because of messages itself, but becouse of it's peaks,
if there is a peak then above mechanism will smooth it into
several pieces [for example 10 in each bundle, that value should be
changeable in run-time, will place it into TODO],
with keventd there is no guarantee that next peak will be processed
not just after the current one, but after some timeout.

> > without direct awakening [that is why wake_up() is commented in
> > cbus_insert()].
>
> You mean the
>
> interruptible_sleep_on_timeout(&cbus_wait_queue, 10);
>
> ? (That should be HZ/100, btw).
>
> That seems a bit kludgy - someone could queue 10000 messages in that time,
> although they'll probably run out of memory first, if it's doing
> GFP_ATOMIC.

GFP_ATOMIC issues will be resolved first.

> Introducing an up-to-ten millisecond latency seems a lot worse than some
> reduction in peak bandwidth - it's not as if pumping 100000 events/sec is a
> likely use case. Using prepare_to_wait/finish_wait will provide some
> speedup in SMP environments due to reduced cacheline transfers.

It is a question actually...
If we allow peak processing, then we _definitely_ will have insert
performance degradation, it was observed in my tests.
The main goal of CBUS was exactly insert speed - so
it somehow must smooth shape performance peaks, and thus
above budget was introdyced.
It is similar to NAPI in some abstract way, but with different aims -
NAPI for speed improovement, but here we have peak smootheness.

> > If too many deferred insert works will be called simultaneously
> > [which may happen with keventd] it will slow down insert operations
> > noticeably.
>
> What is a "deferred insert work"? Insertion is always synchronous?

Insert is synchronous in one CPU, but actuall message delivering is
deferred.

> > I did not try that case with the keventd but with one kernel thread
> > it was tested and showed worse performance.
>
> But your current implementation has only one kernel thread?

It has a budget and timeout between each bundle processing.
keventd does not allow to create such a timeout between each bundle
processing.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

Attachment: signature.asc
Description: This is a digitally signed message part