Re: Speeding up screen output for dialog

tytso@mit.edu
Mon, 2 Sep 1996 12:39:46 -0400


Date: Mon, 2 Sep 1996 09:57:33 +0200 (MET DST)
From: Bernd Schmidt <crux@pool.informatik.rwth-aachen.de>

I've been wondering why some dialog-based applications
(e.g. menuconfig) take so long even on a P90 to draw the screen. So I
made some investigations with strace. I found that ncurses writes to
the screen in rather large blocks of bytes, but these get broken up
in write_chan() in n_tty.c into single byte writes to the console,
which is inefficient (I assume that setting the cursor after each
write is the most time-consuming part). So I came up with a hack:
Instead of calling opost() for each byte, I wrote a new function
opost_block() that examines the buffer and returns the number of
bytes that can be directly written. These are then passed to
con_write() instead of put_char().

You've definitely identified a real problem in the slowness of the
console. However, the solution isn't really the best one, as it
requires that characters be read an extra time: once in opost_block(),
and once in the driver's write routine. When you consider that most
interrupt-driven devices have to buffer characters themselves (what BSD
calls "pseudo-dma"), that means that there will be three copies, and
that's non-optimal.

A better solution would be to actually implement the put_char() and
flush_chars() methods for the console driver. When the console's
put_char() is called, buffer the characters to be printed. When the
n_tty line discipline is done sending the characters, it will call the
flush_chars() method of the tty driver, if the driver has provided one.
You can use this method as the cue to flush the output buffer by calling
con_write().
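
To make the idea concrete, here is a rough userland model of the
buffering, not actual kernel code. Every name in it (model_put_char,
model_flush_chars, model_con_write, CON_BUF_SIZE, the counters) is made
up for illustration, and the real tty driver methods take a tty_struct
argument that is left out here. It only shows the shape: put_char()
buffers, flush_chars() hands the whole block to the write routine, and
the cursor gets set once per block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CON_BUF_SIZE 256            /* hypothetical buffer size */

/* State for this userland model only; none of it exists in the kernel. */
static char   screen[4096];         /* stands in for the display */
static size_t screen_len;
static int    cursor_updates;       /* how many times the cursor was set */

static char   con_buf[CON_BUF_SIZE];
static size_t con_buf_len;

/* Stand-in for con_write(): output a block, then set the cursor once. */
static void model_con_write(const char *buf, size_t count)
{
    assert(screen_len + count <= sizeof(screen));
    memcpy(screen + screen_len, buf, count);
    screen_len += count;
    cursor_updates++;
}

/* Stand-in for the driver's flush_chars(): push out what was buffered. */
static void model_flush_chars(void)
{
    if (con_buf_len == 0)
        return;
    model_con_write(con_buf, con_buf_len);
    con_buf_len = 0;
}

/* Stand-in for the driver's put_char(): buffer only, flush when full. */
static void model_put_char(char ch)
{
    if (con_buf_len == CON_BUF_SIZE)
        model_flush_chars();
    con_buf[con_buf_len++] = ch;
}

/* Stand-in for the line discipline: one put_char() per character,
 * followed by a single flush_chars() once the block is done. */
static void model_ldisc_write(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        model_put_char(s[i]);
    model_flush_chars();
}
```

With this arrangement a block shorter than the buffer costs one call
to the write routine, and so one cursor update, instead of one per
character.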

Yes, this means that there will still be a procedure activation to
put_char() and to opost() per character. And for the console driver,
you will still be copying the character an extra time (because of the
buffering that I'm asking you to add). However, it eliminates the extra
memory copying that you would have forced on all of the other consumers
of the tty layer. And in any case, as you pointed out, setting the
cursor in the console driver for each character is probably the real
underlying problem anyway. Using put_char() and flush_chars() so that
the cursor is only set once per block write solves this problem.

In the long run, opost() should probably be rewritten so that it takes a
pointer, a from_user boolean, and a count. This will eliminate one of
the procedure activations per character that's in the current system.
However, I think that fixing the console driver so that it supports
put_char() and flush_chars() will provide nearly all of the performance
improvements you were looking for, while not impacting the performance
of the other tty drivers.
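
For what it's worth, a block-oriented opost() might look something
like the following userland sketch. The name model_opost_block, the
output-buffer arguments, and the return convention are all assumptions
made for illustration. The only output processing shown is the
newline to CR-LF mapping (what ONLCR does). The from_user flag is
ignored here; in the kernel it would indicate that the buffer has to
be fetched from user space.

```c
#include <stddef.h>

/* Hypothetical block version of opost(): process up to `count`
 * characters from `buf` in one call, instead of one call per character.
 * Returns the number of input characters consumed and stores the number
 * of output bytes produced in *out_len. */
static size_t model_opost_block(const char *buf, size_t count, int from_user,
                                char *out, size_t out_size, size_t *out_len)
{
    size_t i, n = 0;

    (void)from_user;    /* no user/kernel distinction in this model */

    for (i = 0; i < count; i++) {
        size_t need = (buf[i] == '\n') ? 2 : 1;

        if (n + need > out_size)
            break;              /* no room left; caller retries the rest */
        if (buf[i] == '\n')
            out[n++] = '\r';    /* newline becomes CR-LF */
        out[n++] = buf[i];
    }
    *out_len = n;
    return i;
}
```

The caller loops until everything has been consumed, so the
per-character cost becomes a loop iteration rather than a procedure
activation.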

- Ted