Re: [patch 1/2] fork_connector: add a fork connector

From: Paul Jackson
Date: Tue Mar 29 2005 - 04:00:57 EST


Evgeniy wrote:
> There is no overhead at all using CBUS.

This is unlikely. Very unlikely.

Please understand that I am not trying to critique CBUS or connector in
isolation, but rather trying to determine what mechanism is best suited
for getting this accounting data written to disk, which is where I
assume it has to go until some non-real time job gets around to
analyzing it. We already have the rest of the BSD Accounting
information taking this batched up path directly to the disk. There is
nothing (that I know of) to be gained from delivering this new fork data
with any higher quality of service, or to any other place.

>From what I can understand, correct me if I'm wrong, we have two
alternatives in front of us (ignoring relayfs for a second):

1) Using fork_connector (presumably soon to include use of CBUS):
- forking process enqueues data plus header for single fork
- context switch
- daemon process dequeues single fork data (is this a read or recv?)
- daemon process mergers multiple fork data into single buffer
- daemon process writes buffer for multiple forks (a write)
- disk driver pushes buffer with data for multiple forks to disk

2) Using a modified form of what BSD ACCOUNTING does now:
- forking process appends single fork data to in-kernel buffer
- disk driver pushes buffer with data for multiple forks to disk

It seems to me to be rather unlikely that (1) is cheaper than (2). It
is no particular fault of connector or CBUS that this is so. Even if
there were no overhead at all using CBUS (which I don't believe), (1)
still costs more, because it passes the data, with an added packet
header, into a separate process, and into user space, before it is
combined with other accounting information, and written back down into
the kernel to go to the disk.


> > ... The efficiency
> > of getting this one extra <parent-pid, child-pid> out of the kernel
> > seems to be one or two orders of magnitude worse than the rest of
> > the accounting data.
>
> It can be easily changed.
> One may send kernel/acct.c acct_t structure out of the kernel -
> overhead will be the same: kmalloc probably will get new area from the
> same 256-bytes pool, skb is still in cache.

I have no idea what you just said.


> File writing accounting [kernel/acct.c] is slower,

Sure file writing is slower than queuing on an internal list. I don't
care that getting the data where I want it is slower than getting it
some other place that's only part way.


> and requires process' context to work with system calls.

For some connector uses, that might matter. For hooks in fork,
that is no problem - we have all the process context one could
want - two of them if that helps ;).


> realyfs is interesting project, but it has different aims,

That could well be ... I can't claim to know which of relayfs or
connector would be better here, of the two.


> while connector is purely control/notification mechanism

However connector is, in this regard, overkill. We don't need a single
event notification mechanism here. One of the key ways in which
accounting such as this has historically minimized system load is to
forgo any effort to provide any notification or data packet per event,
and instead immediately work to batch the data up in bulk form, with one
buffer containing the concatenated data for multiple events. This
amortizes the cost of almost all the handling, and of all the disk i/o,
over many data collection events. Correct me if I'm wrong, but
fork_connector doesn't do this merging of events into a consolidated
data buffer, so is at a distinct disadvantage, for this use, because the
data merging is delayed, and a separate, user level process, is required
to accomplish the merging and conversion to writable blocks of data
suitable for storing on the disk.

Nothing wrong with a good screwdriver. But if you want to pound nails,
hammers, even rocks, work better.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxxxxxxx> 1.650.933.1373, 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/